GrabDuck

sampointer/kapo

:

Wrap any command in a status socket.

Description

Kapo is a swiss-army knife for integrating programs that do not have their own method of presenting their status over a network with those that need it. Examples might be:

  1. Allow queue workers to be monitored by your service discovery system
  2. Allow random shell scripts to present status to your container scheduler
  3. Abuse load balancers to route traffic based on an open port
  4. Expose the running status of configuration management tools
  5. Alert if a process fails too often
  6. Start and monitor non-networked programs on-demand using systemd socket activation

When a program is executed under kapo a JSON-speaking HTTP server is started and the state of the process is reported to whomever requests it.

This server will respond correctly to HEAD requests. The open socket will function as an indicator of aliveness. The body is a JSON document reflecting process state.

Requests could be as simple as inferring the process is alive if a socket can be opened to as complex as parsing the JSON schema and performing selective actions based on the state reported therein.

Modes

kapo can be run in one of three modes: run, supervise and watch.

The first is the most useful as a container ENTRYPOINT, especially in tandem with the --ttl flag to inject some chaos.

The second will prop up a failing process by continually restarting it if it fails (with an optional wait interval), reporting interesting facts like the last return code and start time on the status listener.

The third, watch, is for use in tandem with your preferred process supervisor: it'll infer the state of the process from the process list of the operating system. By default watch will match all processes in the process list that match the binary name given as the argument. Passing --pid limits the scope to just that process.

The utility of all of this is probably best illustrated via some examples:

Examples

Abuse an ELB to keep a pool of queue workers running

  1. Create an ELB with no external ingress
  2. Configure the ELB Health check to perform a TCP or HTTP check on port 6666 with failure criteria that suit your application
  3. Create an auto-scaling group with HealthCheckType: ELB
  4. Have each host start their worker under kapo:
$ kapo --interface 0.0.0.0 --port 6666 run -- ./worker.py --dowork myqueue

Should the worker on any given node die the ELB will instruct the ASG to kill the node and another will be provisioned to replace it.

A slightly less expensive variant that resurrects worker processes if it can is:

$ kapo supervise --wait 30 -- ./worker.py --dowork myqueue

A human or computer can query the state of workers:

$ curl http://localhost:6666 2>/dev/null
[{"Arguments":["--dowork", "myqueue"],"Command":"./worker.py","EndTime":"0001-01-01T00:00:00Z","ExitCode":0,"Mode":"supervise","StartTime":"0001-01-01T00:00:00Z","Status":"running","TTL":0,"Wait":0}]

As a Container ENTRYPOINT to inject random failure, exercise your scheduler's resilience and add TTLs to containers

As all good proponents of the SRE model know, forcing failures by periodically killing execution units, forcing circuit breakers to fire, and reguarly refreshing your running environment are vital. kapo can be used as a container ENTRYPOINT to force containers to have a random TTL:

FROM ubuntu:latest
COPY kapo kapo
RUN apt-get update && apt-get install stress
EXPOSE 6666
ENTRYPOINT /bin/bash -c './kapo --interface 0.0.0.0 --port 6666 run --ttl $(($RANDOM % 30 + 1)) -- stress -c 1'

Expose Puppet run activity

$ nohup kapo watch puppet &
$ curl http://somehost:6666 2>/dev/null
[{"Arguments":null,"Command":"puppet","EndTime":"0001-01-01T00:00:00Z","ExitCode":0,"Mode":"watch","StartTime":"0001-01-01T00:00:00Z","Status":"stopped","TTL":0,"Wait":5000000000}]
$ sleep 300
$ curl http://somehost:6666 2>/dev/null
[{"Arguments":null,"Command":"puppet","EndTime":"0001-01-01T00:00:00Z","ExitCode":0,"Mode":"watch","StartTime":"2017-03-02T18:20:28.762060588Z","Status":"running","TTL":0,"Wait":5000000000}]

Socket Activation

Kapo can listen for connections via systemd socket activation by passing the global option --socket-activation (or setting KAPO_SOCKET_ACTIVATION) and configuring systemd as appropriate.

When --socket-activation is passed any configured interface or port is ignored for the purposes of binding. The --sidebind global option will attempt to bind a second listener to the incrementing values above --port on --interface until it is successful.

There are a number of interesting use-cases for this functionality, including but not limited to starting and inspecting the status of non-networked programs and scripts on-demand upon receipt of a TCP connection, using run mode and --sidebind.

# useful.service
[Unit]
Description=Most Useful Script
Requires=network.target
After=multi-user.target

[Service]
Type=notify # required to enable the HTTP handler to notify dbus
ExecStart=/usr/local/bin/kapo --socket-activation --sidebind run /usr/local/bin/useful.sh
NonBlocking=true

[Install]
WantedBy=multi-user.target
# useful.socket
[Socket]
ListenStream=0.0.0.0:6666

[Install]
WantedBy=sockets.target
$ systemctl enable useful.socket
$ systemctl start useful.socket
$ systemctl enable useful.service # But not started on boot

One can then curl http://localhost:6666 to have systemd start an instance of useful.sh. The connection to the socket will remain open for the duration of the execution of the process. We'll receive a status structure:

[{"Arguments":null,"Command":"/usr/local/bin/useful.sh","EndTime":"0001-01-01T00:00:00Z","ExitCode":0,"Mode":"run","StartTime":"2017-03-02T18:20:28.762060588Z","Status":"running","TTL":0,"SidebindPort": 6667, "Wait":5000000000}]

If we wanted to get a further update on the execution status of this process we cannot, of course, curl http://localhost:6666 as this was start another instance of the service. Instead we can see from the initial call above that we have a SidebindPort returned on which kapo has started another listener, by virtue of us passing --sidebind. We can interrogate this to our heart's content. If we were to start another instance of the service whilst the original one was still executing we'd be returned a different SidebindPort by that instance.

Configuration

Environment Variables

Switch arguments can be configured by setting an appropriately named environment variable:

  • KAPO_PORT
  • KAPO_INTERFACE
  • KAPO_SIDEBIND
  • KAPO_SOCKET_ACTIVATION
  • KAPO_STDOUT
  • KAPO_STDERR
  • KAPO_STDLOG
  • KAPO_TTL
  • KAPO_WAIT
  • KAPO_WATCHPID

Capturing STDOUT and STDERR

Many container execution environments implement logging by capturing the STDOUT and STDERR of the process executing in the container. By passing the global options --stdout and --stderr one may capture fds 1 and 2 from the supervised process and echo them up to whatever is in turn executing Kapo.

Passing the --stdlog flag alongside one or both of these options causes Kapo to emit each line read as a log line from Kapo itself. This is useful if your supervised process is writing undecorated strings and you have a need to capture the time context.

Expvar

The listener exposes basic runtime metrics via expvar for use with expvarmon.

Install

Contribution

  1. Fork (https://github.com/sampointer/kapo/fork)
  2. Create a feature branch
  3. Commit your changes
  4. Rebase your local changes against the master branch
  5. Run test suite with the go test ./... command and confirm that it passes
  6. Run gofmt -s
  7. Create a new Pull Request

Kapo?

Keep A Port Open.

Author

sampointer