Meet Einhorn

Greg Brockman, May 24, 2012

Today we're happy to release Einhorn, the language-independent shared socket manager. Einhorn makes it easy to have multiple instances of an application server listen on the same port. You can also seamlessly restart your workers without dropping any requests. Einhorn requires minimal application-level support, making it easy to use with an existing project.

Motivation

The main alternatives for achieving this functionality are FastCGI (and related options such as Phusion Passenger) and Unicorn (and derivatives such as Rainbows!). In our case, using either would have required significant application changes. They also work only for applications that speak HTTP, so we decided to build a general solution.

Unicorn's architecture has a lot going for it, though. It uses a shared socket opened by a master process and then inherited by workers. This means all concurrency is handled by your operating system's scheduler. At any time, you can ask Unicorn to upgrade your workers, and it will spin up a new pool of workers before killing off the old. Unicorn can also preload your application, meaning it loads everything prior to forking so that your code is only stored in memory once.
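To make the shared-socket idea concrete, here is a minimal Ruby sketch of the pattern: a master opens one listening socket, forks workers that inherit it, and lets the kernel decide which worker accepts each connection. This is an illustration under simplifying assumptions, not Unicorn's or Einhorn's actual code; the helper names spawn_workers and stop_workers are hypothetical.

```ruby
require 'socket'

# Fork `count` workers that all accept on the same inherited listening socket.
# Returns the worker PIDs. (Hypothetical helper, for illustration only.)
def spawn_workers(listener, count)
  Array.new(count) do
    fork do
      # Exit immediately on TERM, skipping any at_exit handlers inherited from the parent.
      Signal.trap('TERM') { exit!(0) }
      loop do
        client = listener.accept # the kernel picks which waiting worker gets the connection
        client.puts "handled by worker #{Process.pid}"
        client.close
      end
    end
  end
end

# Stop the workers and reap them so no zombies are left behind.
def stop_workers(pids)
  pids.each { |pid| Process.kill('TERM', pid) }
  pids.each { |pid| Process.wait(pid) }
end

# Example usage (the port is arbitrary):
#   listener = TCPServer.new('127.0.0.1', 2345)
#   pids = spawn_workers(listener, 3)
#   ... later ...
#   stop_workers(pids)
```

Because every worker blocks in accept on the same socket, no user-space load balancer is needed. An upgrade in this model would amount to calling spawn_workers again with new code and then stopping the old PIDs.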

We decided to take the best features of Unicorn and roll them into a language-independent shared socket manager, which we dubbed Einhorn (the German word for Unicorn).

Using Einhorn

Installing Einhorn is easy:

$ gem install einhorn

Running a process under Einhorn is as simple as:

$ einhorn -n 3 sleep 5
[MASTER 19665] INFO: Writing PID to /tmp/einhorn.pid
[MASTER 19665] INFO: Launching 3 new workers
[MASTER 19665] INFO: ===> Launched 19666
[WORKER 19666] INFO: About to exec ["/bin/sleep", "5"]
[MASTER 19665] INFO: ===> Launched 19667
[WORKER 19667] INFO: About to exec ["/bin/sleep", "5"]
[MASTER 19665] INFO: ===> Launched 19668
[WORKER 19668] INFO: About to exec ["/bin/sleep", "5"]
...

This will spawn and autorestart three copies of sleep 5. Einhorn is configured with a handful of command-line flags (run einhorn -h for usage).
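For intuition about what "spawn and autorestart" means, the following Ruby sketch keeps a fixed number of copies of a command alive by respawning a replacement whenever one exits. It is a simplified model rather than Einhorn's implementation: the supervise helper and its max_restarts bound are assumptions added so the example terminates.

```ruby
# Keep `count` copies of `command` running, respawning one whenever a child exits.
# `max_restarts` is a hypothetical bound so that this sketch eventually returns;
# a real supervisor would loop indefinitely.
def supervise(command, count:, max_restarts:)
  launched = []
  spawn_one = lambda do
    pid = Process.spawn(*command)
    launched << pid
    pid
  end

  count.times { spawn_one.call }

  restarts = 0
  while restarts < max_restarts
    Process.wait      # block until any child exits
    spawn_one.call    # launch a replacement
    restarts += 1
  end

  Process.waitall     # let the remaining children finish
  launched            # every PID that was ever launched
end

# Example usage, loosely mirroring `einhorn -n 3 sleep 5`:
#   supervise(['sleep', '5'], count: 3, max_restarts: 10)
```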

Einhorn ships with a sample app, time_server, that shows how to use Einhorn's shared-socket features. To run it, cd into the example directory, and execute something like the following:

$ einhorn -m manual ./time_server srv:127.0.0.1:2345,so_reuseaddr
[MASTER 20265] INFO: Writing PID to /tmp/einhorn.pid
[MASTER 20265] INFO: Binding to 127.0.0.1:2345 with flags ["so_reuseaddr"]
[MASTER 20265] INFO: Launching 1 new workers
[MASTER 20265] INFO: ===> Launched 20266
[WORKER 20266] INFO: About to exec ["./time_server", "6"]
Called with ["6"]
[MASTER 20265] INFO: [client 2:7] client connected
[MASTER 20265] INFO: Received a manual ACK from 20266
[MASTER 20265] INFO: Up to 1 / 1 manual ACKs
[MASTER 20265] INFO: [client 2:7] client disconnected
...

Let's break down the arguments here. The -m manual flag indicates that Einhorn should wait for an explicit acknowledgement from the time_server before considering it "up". (By default, Einhorn will consider a worker up if it's been alive for one second.) When it's ready, the time_server worker connects to the Einhorn master and sends an ACK command.

The remaining arguments serve as a template of the program to run. Einhorn scans for server socket specifications of the form srv:(IP:PORT)[<,OPT>...]. When it finds one, it configures a corresponding socket and replaces the specification with the socket's file descriptor number. The specification srv:127.0.0.1:2345,so_reuseaddr is taken to mean "create a socket listening on 127.0.0.1:2345 with the SO_REUSEADDR flag set". In the above case, the opened socket had file descriptor number 6. See the README for more details on specifying server sockets.
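The substitution step described above can be sketched in Ruby roughly as follows. This is not Einhorn's parser: the SRV_SPEC pattern and the expand_template helper are hypothetical, the sketch handles only IPv4 addresses, and it recognizes only the so_reuseaddr option from the example.

```ruby
require 'socket'

# Matches srv:IP:PORT followed by optional comma-separated options (IPv4 only).
SRV_SPEC = /\Asrv:([^:,]+):(\d+)((?:,\w+)*)\z/

# Replace each srv: spec in `argv` with the fd number of a freshly bound
# listening socket. Returns [expanded_argv, sockets]. (Hypothetical helper.)
def expand_template(argv)
  sockets = []
  expanded = argv.map do |arg|
    match = SRV_SPEC.match(arg)
    next arg unless match # ordinary arguments pass through untouched

    host = match[1]
    port = match[2].to_i
    flags = match[3].split(',').reject(&:empty?)

    sock = Socket.new(:INET, :STREAM)
    sock.setsockopt(:SOCKET, :REUSEADDR, true) if flags.include?('so_reuseaddr')
    sock.bind(Addrinfo.tcp(host, port))
    sock.listen(Socket::SOMAXCONN)
    sockets << sock
    sock.fileno.to_s # the worker sees this number in place of the spec
  end
  [expanded, sockets]
end

# Example usage:
#   argv, sockets = expand_template(['./time_server', 'srv:127.0.0.1:2345,so_reuseaddr'])
#   argv  # => ["./time_server", "<fd number>"]
```

One detail this sketch leaves out: Ruby closes non-standard descriptors across exec by default, so a Ruby launcher would also need to map each socket into the child explicitly, for example by passing sock.fileno => sock as a redirection option to Process.spawn.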

Features

Einhorn lets you spin up any number of worker processes, each possessing one or more shared sockets, and the number of workers can be adjusted on the fly. Einhorn can spawn a new pool of workers and gracefully kill off the old ones, allowing seamless upgrades to new versions of your code. Einhorn also gets out of your application's way: the shared sockets are just file descriptors, which your application manipulates directly or manages with an existing framework. You can introspect a running Einhorn's state or send it administrative commands using its command shell, einhornsh.
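On the worker side, "just file descriptors" means the application only has to wrap the inherited descriptor number in a socket object and start accepting. Here is a minimal Ruby sketch of such a worker; it is not the bundled time_server, and the helper names adopt_listener and serve_time, the limit parameter, and the file name worker.rb are all hypothetical.

```ruby
require 'socket'

# Wrap an inherited descriptor number (as passed on the command line) in a Socket.
def adopt_listener(fd_arg)
  Socket.for_fd(Integer(fd_arg))
end

# Accept clients and send each one the current time.
# Serves forever when `limit` is nil; otherwise returns after `limit` clients.
def serve_time(listener, limit: nil)
  served = 0
  until limit && served >= limit
    client, _addr = listener.accept
    client.write("[#{Process.pid}] The current time is: #{Time.now}\n")
    client.close
    served += 1
  end
  served
end

# Example entry point for a script launched as
#   einhorn ./worker.rb srv:127.0.0.1:2345,so_reuseaddr
# where Einhorn has replaced the srv: spec with the fd number in ARGV[0]:
#   serve_time(adopt_listener(ARGV[0]))
# Under -m manual, the worker would also need to ACK the master once it is ready.
```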

If you happen to be using Ruby, Einhorn can also preload your application. Just pass -p PATH_TO_CODE and define a method einhorn_main as your workers' entry point:

$ einhorn -n 2 -p ./pool_worker.rb ./pool_worker.rb argument
[MASTER 20873] INFO: Writing PID to /tmp/einhorn.pid
[MASTER 20873] INFO: Set ARGV = ["argument"]
[MASTER 20873] INFO: Requiring ./pool_worker.rb (if this hangs, make sure your code can be properly loaded as a library)
From PID 20873: loading /home/gdb/stripe/einhorn/example/pool_worker.rb
[MASTER 20873] INFO: Successfully loaded ./pool_worker.rb
[MASTER 20873] INFO: Launching 2 new workers
[MASTER 20873] INFO: ===> Launched 20875
[WORKER 20875] INFO: About to tear down Einhorn state and run einhorn_main
[WORKER 20875] INFO: Set $0 = "./pool_worker.rb argument",  ARGV = ["argument"]
[MASTER 20873] INFO: ===> Launched 20878
From PID 20875: Doing some work
[WORKER 20878] INFO: About to tear down Einhorn state and run einhorn_main
[WORKER 20878] INFO: Set $0 = "./pool_worker.rb argument",  ARGV = ["argument"]
From PID 20878: Doing some work
...

As in Unicorn, this reduces memory usage and makes spawning additional workers very lightweight. Preloading is Einhorn's only language-dependent feature (and was easy to implement because Einhorn is itself written in Ruby). Adding preloading for other languages would require some architectural changes, but we might do it in the future.
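For reference, a preloadable worker file consistent with the log output above could look roughly like this. It is a reconstruction for illustration, not necessarily the pool_worker.rb that ships with Einhorn; the do_some_work helper is hypothetical. Only the einhorn_main entry point is taken from the description above.

```ruby
# Top-level code runs once, in the master, when Einhorn requires the file.
puts "From PID #{Process.pid}: loading #{__FILE__}"

# One unit of work; returns the line it logs. (Hypothetical helper.)
def do_some_work
  message = "From PID #{Process.pid}: Doing some work"
  puts message
  message
end

# Entry point that Einhorn calls in each worker after forking.
def einhorn_main
  loop do
    do_some_work
    sleep 1
  end
end
```

Anything expensive, such as requiring libraries or building lookup tables, belongs at the top level so that it happens once before forking. The file must also load cleanly as a library, which is why nothing at the top level starts the work loop.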

⁕ ⁕ ⁕

Though Einhorn requires very little cooperation from your code, we still had to do some work to make our API servers compatible. In particular, we use Thin and EventMachine, both of which needed patching to support the use of an existing file descriptor. The relevant patches are on the master branch of our public forks of the respective projects.

These days, we use Einhorn to run all of our application servers. We also use it to run our non-web processes where we want to spawn and keep alive multiple instances. We run Einhorn under a process manager (we use daemontools, but any will work), so adding Einhorn to your existing infrastructure should just require adding einhorn to the command lines of your managed processes.

We've been using Einhorn in production for a number of months now. We hope you'll find it useful as well. If you want to run a web app but can't use Unicorn, or if you have a worker process that you want to start pooling, you should check Einhorn out and let us know what you think!