
Open-sourcing tools for Hadoop

Colin Marc on November 21, 2014 in Engineering

Stripe’s batch data infrastructure is built largely on top of Apache Hadoop. We use these systems for everything from fraud modeling to business analytics, and we’re open-sourcing a few pieces today:


Timberlake is a dashboard that gives you insight into the Hadoop jobs running on your cluster. Jeff built it as a replacement for the web interfaces currently provided by YARN’s ResourceManager and MRv2’s JobHistory server, and it has some features we’ve found useful:

  • Map and reduce task waterfalls and timing plots
  • Scalding and Cascading awareness
  • Error tracebacks for failed jobs


Avi wrote a Scala framework for distributed learning of ensemble decision tree models called Brushfire. It’s inspired by Google’s PLANET, but built on Hadoop and Scalding. Designed to be highly generic, Brushfire can build and validate random forests and similar models from very large amounts of training data.


Sequins is a static database for serving data in Hadoop’s SequenceFile format. I wrote it to provide low-latency access to key/value aggregates generated by Hadoop. For example, we use it to give our API access to historical fraud modeling features, without adding an online dependency on HDFS.


At Stripe, we use Parquet extensively, especially in tandem with Cloudera Impala. Danielle, Jeff, and Avi wrote Herringbone (a collection of small command-line utilities) to make working with Parquet and Impala easier.

If you’re interested in trying out these projects, there’s more info on how to use them (and how they were designed) in the READMEs. If you’ve got feedback, please get in touch or send us a PR.

Happy Hadooping!


Game Day Exercises at Stripe:
Learning from `kill -9`

Marc Hedlund on October 28, 2014 in Engineering

We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node [0], and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them to others.

If you’re not familiar with game days, the best introductory article is this one from John Allspaw [1]. Below, we’ll lay out a playbook for how to run a game day, and describe the results from our latest exercise to show why we believe they are valuable.

How to run a game day exercise

The system we recently tested, scoring-srv, is one part of our fraud detection system. The scoring-srv processes run on a cluster of boxes and connect to a three-node Redis cluster to store fraud scoring data. Our internal charge-processing code connects to scoring-srv for each charge made on Stripe’s network, so it needs to be very low-latency; likewise, accurate scoring requires historical data, so it needs durable storage.

The scoring-srv developers and a member of our systems team, who could help run the tests, got together around a whiteboard. We drew a basic block diagram of the machines and processes, the data stores, and the network connections between the components. With that diagram, we were able to come up with a list of possible failures.

We came up with a list of six tests we could run easily:

  • destroying and restoring a scoring-srv box,
  • destroying progressively more scoring-srv boxes until calls to it began timing out,
  • partitioning the network between our charge processing code and scoring-srv,
  • increasing the load on the primary Redis node,
  • killing the primary Redis node, and
  • killing one of the Redis replicas.

Since the team was new to game days, we did not try to be comprehensive or clever. We instead chose the simplest, easiest-to-simulate failures we could think of. We’d take a blunt instrument, like kill -9 or aws ec2 terminate-instances, give the system a good hard knock, and see how it reacted [2].

For each test, we came up with one or more hypotheses for what would happen when we ran it. For instance, we guessed that partitioning the network between charge processing and scoring-srv would cause these calls to time out and fail open (that is, allow the charge to go through immediately). Then, we decided on an order to perform the tests, saved a backup of a recent Redis snapshot as a precaution, and dove in.
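To make the fail-open hypothesis concrete, here is a minimal Python sketch of a caller that stops waiting after a timeout and lets the charge through. It is not scoring-srv's actual client: score_fn stands in for a hypothetical call to the scoring service, and the 0.9 block threshold is a made-up value.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared pool, so a slow call keeps running in the background after we give up on it.
_scoring_pool = ThreadPoolExecutor(max_workers=4)


def score_or_fail_open(score_fn, charge, timeout_s):
    """Return score_fn(charge), or None if the call is too slow or fails.

    None means "no score": the caller should fail open and let the charge through.
    """
    future = _scoring_pool.submit(score_fn, charge)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None  # e.g. a network partition: stop waiting and fail open
    except Exception:
        return None  # e.g. connection refused: also fail open


def should_allow(score, block_threshold=0.9):
    """Allow the charge when there is no score, or when the score is below the threshold."""
    return score is None or score < block_threshold


def slow_scorer(charge):  # hypothetical: simulates a partitioned scoring service
    time.sleep(0.5)
    return 0.95


score = score_or_fail_open(slow_scorer, {"amount": 1000}, timeout_s=0.05)
print(score, should_allow(score))  # None True
```

The partition test described below is exactly the case where this kind of timeout either fires quickly or doesn't, so it is worth checking which one your real client does.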

Here, then, is a quick-start checklist for running a game day:

  1. Get the development team together with someone who can modify the network and destroy or provision servers, and block off an afternoon to run the exercise.
  2. Make a simple block diagram of the machines, processes, and network connections in the system you’re testing.
  3. Come up with 5-7 of the simplest failures you can easily induce in the system.
  4. Write down one or more hypotheses for what will happen after each failure.
  5. Back up any data you can’t lose.
  6. Induce each failure and observe the results, filing bugs for each surprise you encounter.
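Steps 4 and 6 are easier to keep honest if the plan is written down as data before anyone touches a server. A minimal Python sketch, with hypothetical field names; the entries paraphrase tests from this post and are illustrative, not a record of our actual notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameDayTest:
    failure: str                        # the failure to induce
    hypothesis: str                     # written down before running the test
    observed: Optional[str] = None      # filled in during the exercise
    as_expected: Optional[bool] = None  # None means "not run (yet)"


def bugs_to_file(tests: List[GameDayTest]) -> List[str]:
    """One bug per surprise, recording the expected and the actual result."""
    return [
        f"{t.failure}: expected {t.hypothesis}; observed {t.observed}"
        for t in tests
        if t.as_expected is False
    ]


def skipped(tests: List[GameDayTest]) -> List[str]:
    return [t.failure for t in tests if t.as_expected is None]


plan = [
    GameDayTest("partition charge processing from scoring-srv",
                "calls time out and fail open",
                "latency spike instead of fast timeouts; no alert fired",
                as_expected=False),
    GameDayTest("destroy and restore a scoring-srv box",
                "restored with a single command in the estimated time",
                "restored with a single command in roughly the estimated time",
                as_expected=True),
    GameDayTest("increase load on the primary Redis node",
                "latency rises but stays bounded"),
]

for bug in bugs_to_file(plan):
    print(bug)
print("skipped:", skipped(plan))
```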

Observations and results

We were able to terminate a scoring-srv machine and restore it with a single command in roughly the estimated time. This gave us confidence that replacing or adding cluster machines would be fast and easy. We also saw that killing progressively more scoring-srv machines never caused timeouts, showing we currently have more capacity than necessary. Partitioning the network between the charge-processing code and scoring-srv caused a spike in latency, where we’d expected calls to scoring-srv to time out and fail open quickly. This test also should have immediately alerted the teams responsible for this system, but did not.

The first Redis test went pretty well. When we stopped one of the replicas with kill -9, it flapped several times on restart, which was surprising and confusing to observe. As expected, though, the replica successfully restored data from its snapshot and caught up with replication from the primary.

Then we moved to the Redis primary node test, and had a bigger surprise. While developing the system, we had become concerned about latency spikes during snapshotting of the primary node. Because scoring-srv is latency-sensitive, we had configured the primary node not to snapshot its data to disk. Instead, the two replicas each made frequent snapshots. In the case of failure of the primary, we expected one of the two replicas to be promoted to primary; when the failed process came back up, we expected it to restore its data via replication from the new primary. That didn’t happen. Instead, when we ran kill -9 on the primary node (and it was restarted by daemontools), it came back up – after, again, flapping for a short time – with no data, but was still acting as primary. From there, it restarted replication and sent its empty dataset to the two replica nodes, which lost their datasets as a result. In a few seconds, we’d gone from a three-node replicated data store to an empty data set. Fortunately, we had saved a backup and were able to get the cluster re-populated quickly.
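The sequence of events is easier to see in a toy model. This Python sketch is not Redis: it models each node as an in-memory dict plus an optional on-disk snapshot, and models a replica sync as copying the primary's dataset wholesale. Under those assumptions, a primary that restarts without a snapshot and keeps its role ends up wiping every replica.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    snapshots: bool                              # does this node save snapshots to disk?
    data: dict = field(default_factory=dict)     # in-memory dataset
    on_disk: dict = field(default_factory=dict)  # last snapshot written to disk

    def snapshot(self):
        if self.snapshots:
            self.on_disk = dict(self.data)

    def kill_9_and_restart(self):
        # Memory is lost; a restarted node can only load what it saved to disk.
        self.data = dict(self.on_disk)


def full_resync(primary, replicas):
    # In this model, a replica syncing from a primary replaces its dataset with the primary's copy.
    for replica in replicas:
        replica.data = dict(primary.data)


primary = Node("primary", snapshots=False)  # snapshotting disabled for latency
replicas = [Node("replica-1", snapshots=True), Node("replica-2", snapshots=True)]

primary.data = {"feature:1": 0.2, "feature:2": 0.7}
full_resync(primary, replicas)
for r in replicas:
    r.snapshot()

primary.kill_9_and_restart()    # supervisor restarts it before any failover happens
full_resync(primary, replicas)  # replicas reconnect and copy the (empty) primary

print([n.data for n in [primary, *replicas]])  # [{}, {}, {}]
```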

The full set of tests took about 3.5 hours to run. For each failure or surprise, we filed a bug describing the expected and actual results. We ended up with 15 issues from the five tests we performed (we skipped the Redis primary load test), a good payoff for the afternoon’s work. Closing these, and re-running the game day to verify that we now know what to expect in these cases, will greatly increase our confidence in the system and its behavior.

Learning from the game day

The invalidation of our Redis hypothesis left us questioning our approach to data storage for scoring-srv. Our original Redis setup had all three nodes performing snapshots (that is, periodically saving data to disk). We had tested failover from the primary node due to a clean shutdown and it had succeeded. While analyzing the cluster once we had live data running through it, though, we observed that the low latency we’d wanted from it would hit significant spikes, above 1 second, during snapshotting:

Obviously these spikes were concerning for a latency-sensitive application. We decided to disable snapshotting on the primary node, leaving it enabled on the replica nodes, and you can see the satisfying results below, with snapshotting enabled, then disabled, then enabled again:

Since we believed that failover would not be compromised in this configuration, this seemed like a good trade-off: relying on the primary node for performance and replication, and the replica nodes for snapshotting, failover, and recovery. As it turned out, this change was made the day before the game day, as part of the final lead-up to production readiness. (One could imagine making a similar change in the run-up to a launch!)

The game day wound up being the first full test of the configuration including all optimizations and changes made during development. We had tested the system with a primary node shutdown, then with snapshotting turned off on the primary, but this was the first time we’d seen these conditions operating together. The value of testing on production systems, where you can observe failures under the conditions you intend to ship, should be clear from this result.

After we discussed the results with some friends, a long and heated discussion about the failure took place on Twitter, in which Redis’ author said he had not expected the configuration we were using. Since there is no guarantee that the software you’re using supports or expects the way you’re using it, the only way to know for certain how it will react to a failure is to try it.

While Redis is functional for scoring-srv with snapshotting turned on, the needs of our application are likely better served by other solutions. The trade-off between high-latency spikes, with primary node snapshotting enabled, versus total cluster data loss, with it disabled, leaves us feeling neither option is workable. For other configurations at Stripe – especially single-node topologies for which data loss is less costly, such as rate-limiting counters – Redis remains a good fit for our needs.


In the wake of the game day, we’ve run a simple experiment with PostgreSQL RDS as a possible replacement for the Redis cluster in scoring-srv. The results suggest that we could expect comparable latency without suffering snapshotting spikes. Our testing, using a similar dataset, had a 99th percentile read latency of 3.2 milliseconds, and a 99th percentile write latency of 11.3 milliseconds. We’re encouraged by these results and will be continuing our experiments with PostgreSQL for this application (and obviously, we will run similar game day tests for all systems we consider).
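The post doesn't say how these percentiles were computed. As one common definition, here is a nearest-rank percentile in Python, with made-up samples showing why a p99 exposes occasional snapshot stalls that a median hides entirely and an average only blurs.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up data: 98 fast requests and 2 that hit a hypothetical 1.2 s snapshot stall.
latencies_ms = [2.0] * 98 + [1200.0] * 2
print(sum(latencies_ms) / len(latencies_ms))  # 25.96
print(percentile(latencies_ms, 50))           # 2.0
print(percentile(latencies_ms, 99))           # 1200.0
```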

Any software will fail in unexpected ways unless you first watch it fail for yourself. We completely agree with Kelly Sommers’ point in the Twitter thread about this:

We’d highly recommend game day exercises to any team deploying a complex web application. Whether your hypotheses are proven out or invalidated, either way you’ll leave the exercise with greater confidence in your ability to respond to failures, and less need for on-the-fly diagnosis. Having that happen for the first time while you’re rested, ready, and watching is the best failure you can hope for.


[0] We’ve chosen to use the terms “primary” and “replica” in discussing Redis, rather than the terms “master” and “slave” used in the Redis documentation, to support inclusivity. For some interesting and heated discussion of this substitution, we’d recommend this Django pull request and this Drupal change.

[1] Some other good background articles for further reading: “Weathering the Unexpected”; “Resilience Engineering: Learning to Embrace Failure”; “Training Organizational Resilience in Escalating Situations”; “When the Nerds Go Marching In.”

[2] If you’d like to run more involved tests and you’re on AWS, this Netflix Tech Blog post from last week describes the tools they use for similar testing approaches.


Thanks much to John Allspaw, Jeff Hodges, Kyle Kingsbury, and Raffi Krikorian for reading drafts of this post, and to Kelly Sommers for permission to quote her tweet. Any errors are ours alone.


Open-Source Retreat meetup

Greg Brockman on October 16, 2014 in Engineering

A few months ago, we announced our Open-Source Retreat. Though we’d originally expected to sponsor two grantees, we ended up giving out three full grants (and then an additional shorter grant).


Stripe Open-Source Retreat

Greg Brockman on April 24, 2014 in Engineering

We rely on a lot of open-source software at Stripe, and over time we’ve contributed back our own share of patches and projects. We decided we’d like to do more, though, so we’re launching an open-source retreat program.



Alex MacCaw on February 7, 2013 in Engineering

A rising tide lifts all boats, and we’d like to help improve payment experiences for consumers everywhere, whether or not they use Stripe. Today, we’re releasing jQuery.payment, a general purpose library for building credit card forms, validating input, and formatting numbers. This library is behind a lot of the functionality in Checkout.

Some sites require a bit more flexibility than our Checkout provides. This is where jQuery.payment shines. You can have some of the same formatting and validation as in the Checkout along with as much flexibility as you need.


For example, you can ensure that a text input is formatted as a credit card number, with digits in groups of four and limited to 16 characters.
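Formatting as you type boils down to stripping non-digits, capping the length, and regrouping. Here is a minimal Python sketch of just that transformation. It is not jQuery.payment's code: the plugin also has to manage the cursor and browser events, which this ignores, and the fixed 16 digits in groups of four follow the description above rather than every card type's layout.

```python
import re


def format_card_number(raw, max_digits=16):
    """Keep only digits, cap at max_digits, and group them in fours."""
    digits = re.sub(r"\D", "", raw)[:max_digits]
    return " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))


print(format_card_number("4242424242424242"))         # 4242 4242 4242 4242
print(format_card_number("4242-4242-4242-4242-999"))  # 4242 4242 4242 4242
print(format_card_number("42424"))                    # 4242 4
```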


Or you can ensure input is formatted as a MM/YYYY card expiry:
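The expiry side involves parsing the input and checking it against today's date. A small Python sketch under stated assumptions: it accepts MM/YYYY or MM/YY with optional spaces and slash, treats two-digit years as 20YY, and considers a card usable through the end of its expiry month. These rules are illustrative, not taken from the library.

```python
import re
from datetime import date


def parse_expiry(text):
    """Parse 'MM/YYYY' or 'MM/YY' (slash and spaces optional) into (month, year)."""
    match = re.fullmatch(r"\s*(\d{1,2})\s*/?\s*(\d{4}|\d{2})\s*", text)
    if not match:
        return None
    month, year = int(match.group(1)), int(match.group(2))
    if year < 100:
        year += 2000  # assumption: two-digit years are in the 2000s
    return month, year


def expiry_valid(month, year, today):
    """Valid if the month is real and the expiry month hasn't passed yet."""
    if not 1 <= month <= 12:
        return False
    return (year, month) >= (today.year, today.month)


print(parse_expiry("05 / 15"))                 # (5, 2015)
print(expiry_valid(5, 2015, date(2013, 2, 7)))  # True
```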


The library includes a bunch of utility and validation methods, for example:

$.payment.validateCardNumber('4242 4242 4242 4242'); //=> true
$.payment.validateCardCVC('123', 'amex'); //=> false
$.payment.validateCardExpiry('05', '20'); //=> true

$.payment.cardType('4242 4242 4242 4242'); //=> 'visa'
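Card-number validators commonly combine a length check with the Luhn (mod 10) checksum. The Python below sketches the checksum and a deliberately tiny prefix table. It is not a port of jQuery.payment's source, it skips the per-type length checks a real validator needs, and the prefix table is an incomplete assumption.

```python
import re


def luhn_valid(number):
    """Luhn (mod 10) checksum over the digits, ignoring spaces and dashes."""
    digits = re.sub(r"[\s-]", "", number)
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:  # double every second digit, starting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_type(number):
    """Very small prefix table (an assumption, far from complete)."""
    digits = re.sub(r"\D", "", number)
    if digits.startswith("4"):
        return "visa"
    if digits[:2] in ("34", "37"):
        return "amex"
    if digits[:2] in ("51", "52", "53", "54", "55"):
        return "mastercard"
    return None


print(luhn_valid("4242 4242 4242 4242"))  # True
print(luhn_valid("4242 4242 4242 4241"))  # False
print(card_type("4242 4242 4242 4242"))   # visa
```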

Robust and tested

It turns out that rolling your own code to restrict and format input is particularly tricky in JavaScript. You have to handle lots of edge cases, such as users pasting text or selecting and replacing digits, as well as the different ways credit card numbers are formatted.

We’ve spent a lot of time tuning our formatting and validation logic as well as testing and ensuring cross-browser compatibility, so you don't have to reinvent the wheel. We look forward to seeing what you build! You can find a live demo of the library, as well as the source on GitHub.


Announcing MoSQL

Nelson Elhage on February 5, 2013 in Engineering

Today, we are releasing MoSQL, a tool Stripe developed for live-replicating data from a MongoDB database into a PostgreSQL database. With MoSQL, you can run applications against a MongoDB database, but also maintain a live-updated mirror of your data in PostgreSQL, ready for querying with the full power of SQL.


Here at Stripe, we use a number of different database technologies for both internal- and external-facing services. Over time, we've found ourselves with growing amounts of data in MongoDB that we would like to be able to analyze using SQL. MongoDB is great for a lot of reasons, but it's hard to beat SQL for easy ad-hoc data aggregation and analysis, especially since virtually every developer or analyst already knows it.

An obvious solution is to periodically dump your MongoDB database and re-import into PostgreSQL, perhaps using mongoexport. We experimented with this approach, but found ourselves frustrated with the ever-growing time it took to do a full refresh. Even if most of your analyses can tolerate a day or two of delay, occasionally you want to ask ad-hoc questions about "what happened last night?", and it's frustrating to have to wait on a huge dump/load refresh to do that. In response, we built MoSQL, enabling us to keep a real-time SQL mirror of our Mongo data.

MoSQL does an initial import of your MongoDB collections into a PostgreSQL database, and then continues running, applying any further changes made on the MongoDB server to the PostgreSQL mirror in near-real-time. The replication works by tailing the MongoDB oplog, in essentially the same way Mongo's own replication works.
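The heart of oplog tailing is applying each entry to the mirror. Below is a minimal Python sketch of that apply step, not MoSQL's code (MoSQL is Ruby, built on mongoriver). It reads the standard oplog fields op, ns, o, and o2, uses an in-memory SQLite table as a stand-in for PostgreSQL, and feeds it a few made-up entries instead of a live oplog cursor.

```python
import json
import sqlite3

MAPPED = ("_id", "x", "j")  # columns from the collection map in this post


def make_mirror():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE things (_id TEXT PRIMARY KEY, x INTEGER, j INTEGER, _extra_props TEXT)"
    )
    return conn


def upsert(conn, doc):
    # Unmapped fields are kept as JSON, like the _extra_props column in the psql output.
    extra = {k: v for k, v in doc.items() if k not in MAPPED}
    conn.execute(
        "INSERT OR REPLACE INTO things (_id, x, j, _extra_props) VALUES (?, ?, ?, ?)",
        (str(doc["_id"]), doc.get("x"), doc.get("j"), json.dumps(extra)),
    )


def apply_oplog_entry(conn, entry):
    """Apply one oplog entry: 'i' = insert, 'u' = update, 'd' = delete."""
    op = entry["op"]
    if op == "i":
        upsert(conn, entry["o"])
    elif op == "u":
        # Simplification: treat the update as a full-document replacement of the
        # document named in o2. Real oplog updates may instead carry modifiers
        # such as $set, which this sketch does not handle.
        upsert(conn, {**entry["o"], "_id": entry["o2"]["_id"]})
    elif op == "d":
        conn.execute("DELETE FROM things WHERE _id = ?", (str(entry["o"]["_id"]),))
    # Other op types (e.g. 'n' no-ops, 'c' commands) are ignored here.


conn = make_mirror()
for entry in [  # made-up sample entries
    {"op": "i", "ns": "mydb.things", "o": {"_id": "a1", "name": "mongo"}},
    {"op": "i", "ns": "mydb.things", "o": {"_id": "a2", "x": 3}},
    {"op": "u", "ns": "mydb.things", "o2": {"_id": "a2"}, "o": {"_id": "a2", "x": 4, "j": 1}},
    {"op": "d", "ns": "mydb.things", "o": {"_id": "a1"}},
]:
    apply_oplog_entry(conn, entry)
print(conn.execute("SELECT * FROM things").fetchall())  # [('a2', 4, 1, '{}')]
```

The initial import is the same upsert applied to every existing document. A real tailer would also record the oplog position it last applied, so it can resume after a restart.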


MoSQL can be installed like any other gem:

$ gem install mosql

To use MoSQL, you'll need to create a collection map which maps your MongoDB objects to a SQL schema. We'll use the collection from the MongoDB tutorial as an example. A possible collection map for that collection would look like:

mydb:  # assumed: the database name used in the MongoDB tutorial
  things:
    :columns:
      - _id: TEXT
      - x: INTEGER
      - j: INTEGER
    :meta:
      :table: things
      :extra_props: true

Save that file as collections.yaml, start a local mongod and postgres, and run:

$ mosql --collections collections.yaml

Now, run through the MongoDB tutorial, and then open a psql shell. You'll find all your Mongo data now available in SQL form:

postgres=# select * from things limit 5;
           _id            | x | j |   _extra_props
--------------------------+---+---+------------------
 50f445b65c46a32ca8c84a5d |   |   | {"name":"mongo"}
 50f445df5c46a32ca8c84a5e | 3 |   | {}
 50f445e75c46a32ca8c84a5f | 4 | 1 | {}
 50f445e75c46a32ca8c84a60 | 4 | 2 | {}
 50f445e75c46a32ca8c84a61 | 4 | 3 | {}
(5 rows)

mosql will continue running, syncing any further changes you make into Postgres.

For more documentation and usage information, see the README.


MoSQL comes from a general philosophy of preferring real-time, continuously-updating solutions to periodic batch jobs.

MoSQL is built on top of mongoriver, a general library for MongoDB oplog tailing that we developed. Along with the MoSQL release, we have also released mongoriver as open source today. If you find yourself wanting to write your own MongoDB tailer, to monitor updates to your data in near-real-time, check it out.


Exploring Python Using GDB

Evan Broder on June 13, 2012 in Engineering

People tend to have a narrow view of the problems they can solve using GDB. Many think that GDB is just for debugging segfaults or that it's only useful with C or C++ programs. In reality, GDB is an impressively general and powerful tool. When you know how to use it, you can debug just about anything, including Python, Ruby, and other dynamic languages. It's not just for inspection either—GDB can also be used to modify a program's behavior while it's running.

When we ran our Capture The Flag contest, a lot of people asked us about introductions to that kind of low-level work. GDB can be a great way to get started. In order to demonstrate some of GDB's flexibility, and show some of the steps involved in practical GDB work, we've put together a brief example of debugging Python with GDB.

Imagine you're building a web app in Django. The standard cycle for building one of these apps is to edit some code, hit an error, fix it, restart the server, and refresh in the browser. It's a little tedious. Wouldn't it be cool if you could hit the error, fix the code while the request is still pending, and then have the request complete successfully?

As it happens, the Seaside framework supports exactly this. Using one of Stripe's example projects, let's take a look at how we could pull it off in Python using GDB:

GDB Demo Screencast

Pretty cool, right? Though a little contrived, this example demonstrates many helpful techniques for making effective real-world use of GDB. I'll walk through what we did in a little more detail, and explain some of the GDB tricks as we go.

For the sake of brevity, I'll show the commands I type, but elide some of the output they generate. I'm working on Ubuntu 12.04 with GDB 7.4. The techniques should still work on other platforms, but you probably won't get automatic pretty-printing of Python types. You can get the same information by hand by running p PyString_AsString(PyObject_Repr(obj)) in GDB.

Getting Set Up

First, let's start the monospace-django server with --noreload so that Django's autoreloading doesn't get in the way of our GDB-based reloading. We'll also use the python2.7-dbg interpreter, which will ensure that less of the program's state is optimized away.

$ git clone
$ cd monospace-django/
$ virtualenv --no-site-packages env
$ cp /usr/bin/python2.7-dbg env/bin/python
$ source env/bin/activate
(env)$ pip install -r requirements.txt
(env)$ python monospace/ syncdb
(env)$ python monospace/ runserver --noreload

$ sudo gdb -p $(pgrep -f monospace/
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
Attaching to process 946
Reading symbols from /home/evan/monospace-django/env/bin/python...done.
(gdb) symbol-file /usr/bin/python2.7-dbg
Load new symbol table from "/usr/bin/python2.7-dbg"? (y or n) y
Reading symbols from /usr/bin/python2.7-dbg...done.

As of version 7.0 of GDB, it's possible to automatically script GDB's behavior, and even register your own code to pretty-print C types. Python comes with its own hooks which can pretty-print Python types (such as PyObject *) and understand the Python stack. These hooks are loaded automatically if you have the python2.7-dbg package installed on Ubuntu.

Whatever you're debugging, you should look to see if there are relevant GDB scripts available—useful helpers have been created for many dynamic languages.

Catching the Error

The Python interpreter creates a PyFrameObject every time it starts executing a Python stack frame. From that frame object, we can get the name of the function being executed. It's stored as a Python object, so we can convert it to a C string using PyString_AsString, and then stop the interpreter only if it begins executing a function called handle_uncaught_exception.

The obvious way to catch this would be by creating a GDB breakpoint. A lot of frames are allocated in the process of executing Python code, though. Rather than tediously continue through hundreds of false positives, we can set a conditional breakpoint that'll break on only the frame we care about:

(gdb) b PyEval_EvalFrameEx if strcmp(PyString_AsString(f->f_code->co_name), "handle_uncaught_exception") == 0
Breakpoint 1 at 0x519d64: file ../Python/ceval.c, line 688.
(gdb) c

Breakpoint conditions can be pretty complex, but it's worth noting that conditional breakpoints that fire often (like PyEval_EvalFrameEx) can slow the program down significantly.
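If you control the process from the start, Python can do a similar conditional stop on its own. As a rough pure-Python analogue of the breakpoint above (not a substitute for attaching GDB to a process that is already running), sys.settrace installs a function that sees a 'call' event for every new Python frame. The handle_uncaught_exception and get_response functions below are stand-ins, not Django's, and like the conditional breakpoint, tracing every call slows the program down.

```python
import sys


def break_on_call(func_name, on_hit):
    """Return a trace function that calls on_hit(frame) whenever func_name starts."""
    def tracer(frame, event, arg):
        # The global trace function sees a 'call' event for every new Python frame,
        # so this name check plays the role of the breakpoint condition.
        if event == "call" and frame.f_code.co_name == func_name:
            on_hit(frame)
        return None  # don't trace individual lines inside the frame
    return tracer


def handle_uncaught_exception(request):  # stand-in for Django's handler
    return "500 for " + request


def get_response(request):
    return handle_uncaught_exception(request)


hits = []
sys.settrace(break_on_call("handle_uncaught_exception",
                           lambda f: hits.append(dict(f.f_locals))))
try:
    get_response("/charge")
finally:
    sys.settrace(None)
print(hits)  # [{'request': '/charge'}]
```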

Generating the Initial Return Value

Okay, let's see if we can actually fix things during the next request. We resubmit the form. Once again, GDB halts when the app starts generating the internal server error response. While we investigate more, let's disable the breakpoint in order to keep things fast.

What we really want to do here is to let the app finish generating its original return value (the error response) and then to replace that with our own (the correct response). We find the stack frame where get_response is being evaluated. Once we've jumped to that frame with the up or frame command, we can use the finish command to wait until the currently selected stack frame finishes executing and returns.

Breakpoint 1, PyEval_EvalFrameEx (f=
    Frame 0x3534110, for file [...]/django/core/handlers/, line 186, in handle_uncaught_exception [...], throwflag=0) at ../Python/ceval.c:688
688 ../Python/ceval.c: No such file or directory.
(gdb) disable 1
(gdb) frame 3
#3  0x0000000000521276 in PyEval_EvalFrameEx (f=
    Frame 0x31ac000, for file [...]/django/core/handlers/, line 169, in get_response [...], throwflag=0) at ../Python/ceval.c:2666
2666      in ../Python/ceval.c
(gdb) finish
Run till exit from #3  0x0000000000521276 in PyEval_EvalFrameEx (f=
    Frame 0x31ac000, for file [...]/django/core/handlers/, line 169, in get_response [...], throwflag=0) at ../Python/ceval.c:2666
0x0000000000526871 in fast_function (func=<function at remote 0x26e96f0>,
    pp_stack=0x7fffb296e4b0, n=2, na=2, nk=0) at ../Python/ceval.c:4107
4107                         in ../Python/ceval.c
Value returned is $1 =
    <HttpResponseServerError[...] at remote 0x3474680>

Patching the Code

Now that we've gotten the interpreter into the state we want, we can use Python's internals to modify the running state of the application. GDB allows you to make fairly complicated dynamic function invocations, and we'll use lots of that here.

We use the C equivalent of the Python reload function to reimport the code. We have to also reload the monospace.urls module so that it picks up the new code in monospace.views.
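For comparison, here is the same reload at the Python level, using importlib.reload (the Python 3 home of Python 2's built-in reload). It also shows why the URLs module must be reloaded too: a module that did `from views import handler` keeps pointing at the old function until it is re-executed. The demo_views and demo_urls modules are throwaway stand-ins written to a temporary directory, not part of monospace-django.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo():
    workdir = Path(tempfile.mkdtemp())
    views = workdir / "demo_views.py"
    views.write_text("def handler():\n    return 'error page'\n")
    # Like a URLconf, this module holds its own reference to the view function.
    (workdir / "demo_urls.py").write_text("from demo_views import handler\n")

    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # no cached .pyc can mask the edited source
    sys.path.insert(0, str(workdir))
    try:
        demo_views = importlib.import_module("demo_views")
        demo_urls = importlib.import_module("demo_urls")

        views.write_text("def handler():\n    return 'fixed response'\n")  # the "patch"
        importlib.reload(demo_views)
        after_views_only = demo_urls.handler()  # still the old function object

        importlib.reload(demo_urls)  # re-runs the import, picking up the new function
        after_both = demo_urls.handler()
        return after_views_only, after_both
    finally:
        sys.dont_write_bytecode = saved_flag
        sys.path.remove(str(workdir))
        for name in ("demo_views", "demo_urls"):
            sys.modules.pop(name, None)


print(reload_demo())  # ('error page', 'fixed response')
```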

One handy trick, which we use to invoke git in the video and curl here, is that you can run shell commands from within GDB.

(gdb) shell curl -s -L | patch -p1
patching file monospace/
(gdb) p PyImport_ReloadModule(PyImport_AddModule("monospace.views"))
$2 = <module at remote 0x31d4b58>

(gdb) p PyImport_ReloadModule(PyImport_AddModule("monospace.urls"))
$3 = <module at remote 0x31d45a8>

We've now patched and reloaded the code. Next, let's generate a new response: we find self and request among the local variables in this stack frame, then fetch and call self's get_response method.

(gdb) p $self = PyDict_GetItemString(f->f_locals, "self")
$4 =
    <WSGIHandler([...]) at remote 0x311c610>
(gdb) set $request = PyDict_GetItemString(f->f_locals, "request")
(gdb) set $get_response = PyObject_GetAttrString($self, "get_response")
(gdb) set $args = Py_BuildValue("(O)", $request)
(gdb) p PyObject_Call($get_response, $args, 0)
$5 =
    <HttpResponse([...]) at remote 0x31b9fb0>

In the above snippet, we use GDB's set command to assign values to variables.

Alright, we now have a new response. Remember that we stopped the program right where the original get_response method returned. At the C level, the interpreter's return value for that frame is the Python return value itself (a PyObject *). So, to replace it, we just have to store the new return value in the return-value register ($rax on 64-bit x86) and then allow execution to continue.

GDB lets you refer to the value returned by every command you evaluate by number. In this case, we want $5:

(gdb) set $rax = $5
(gdb) c

And, like magic, our web request finishes successfully.

GDB is a powerful precision tool. Even if you spend most of your time writing code in a much higher-level language, it can be extremely useful to have it available when you need to investigate subtle bugs or complex issues in running applications.


Meet Einhorn

Greg Brockman on May 24, 2012 in Engineering


Today we're happy to release Einhorn, the language-independent shared socket manager. Einhorn makes it easy to have multiple instances of an application server listen on the same port. You can also seamlessly restart your workers without dropping any requests. Einhorn requires minimal application-level support, making it easy to use with an existing project.


The main alternatives for achieving this functionality are FastCGI (and related options such as Phusion Passenger) and Unicorn (and derivatives such as Rainbows!). In our case, using either would have required significant application changes. We also could only have used them for applications speaking HTTP. So we decided to build a general solution.

Unicorn's architecture has a lot going for it, though. It uses a shared socket opened by a master process and then inherited by workers. This means all concurrency is handled by your operating system's scheduler. At any time, you can ask Unicorn to upgrade your workers, and it will spin up a new pool of workers before killing off the old. Unicorn can also preload your application, meaning it loads everything prior to forking so that your code is only stored in memory once.
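The shared-socket idea fits in a few lines. The sketch below is POSIX-only Python (it uses os.fork) and is not Einhorn's or Unicorn's implementation: the master binds a listening socket once, forks workers that inherit it, and leaves it to the kernel to hand each incoming connection to whichever worker is waiting in accept(). The helper names and the one-request-per-worker behavior are illustrative choices.

```python
import os
import socket


def bind_shared_socket(host="127.0.0.1", port=0):
    """The master opens the listening socket once, before forking any workers."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))  # port 0 lets the OS pick a free port
    listener.listen(64)
    return listener


def fork_workers(listener, count, requests_each=1):
    """Fork workers that inherit the listener; the kernel decides who gets each connection."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:  # child process
            try:
                for _ in range(requests_each):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    return pids


def ask(port):
    """Connect as a client and return the PID of the worker that answered."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        reply = b""
        while chunk := client.recv(64):
            reply += chunk
    return int(reply)


if __name__ == "__main__":
    listener = bind_shared_socket()
    pids = fork_workers(listener, count=2)
    port = listener.getsockname()[1]
    served_by = {ask(port) for _ in pids}  # one request per worker
    for pid in pids:
        os.waitpid(pid, 0)
    listener.close()
    print(served_by == set(pids))  # True
```

Because the workers only share a file descriptor, the master never touches request traffic, which is what makes this approach language-independent.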

We decided to take the best features of Unicorn and roll them into a language-independent shared socket manager, which we dubbed Einhorn (the German word for Unicorn).

Using Einhorn

Installing Einhorn is easy:

$ gem install einhorn

Running a process under Einhorn is as simple as:

$ einhorn -n 3 sleep 5
[MASTER 19665] INFO: Writing PID to /tmp/
[MASTER 19665] INFO: Launching 3 new workers
[MASTER 19665] INFO: ===> Launched 19666
[WORKER 19666] INFO: About to exec ["/bin/sleep", "5"]
[MASTER 19665] INFO: ===> Launched 19667
[WORKER 19667] INFO: About to exec ["/bin/sleep", "5"]
[MASTER 19665] INFO: ===> Launched 19668
[WORKER 19668] INFO: About to exec ["/bin/sleep", "5"]

This will spawn and autorestart three copies of sleep 5. Einhorn is configured with a handful of command-line flags (run einhorn -h for usage).

Einhorn ships with a sample app, time_server, that shows how to use Einhorn's shared-socket features. To run it, cd into the example directory, and execute something like the following:

$ einhorn -m manual ./time_server srv:,so_reuseaddr
[MASTER 20265] INFO: Writing PID to /tmp/
[MASTER 20265] INFO: Binding to with flags ["so_reuseaddr"]
[MASTER 20265] INFO: Launching 1 new workers
[MASTER 20265] INFO: ===> Launched 20266
[WORKER 20266] INFO: About to exec ["./time_server", "6"]
Called with ["6"]
[MASTER 20265] INFO: [client 2:7] client connected
[MASTER 20265] INFO: Received a manual ACK from 20266
[MASTER 20265] INFO: Up to 1 / 1 manual ACKs
[MASTER 20265] INFO: [client 2:7] client disconnected

Let's break down the arguments here. The -m manual flag indicates that Einhorn should wait for an explicit acknowledgement from the time_server before considering it "up". (By default, Einhorn will consider a worker up if it's been alive for one second.) When it's ready, the time_server worker connects to the Einhorn master and sends an ACK command.

The remaining arguments serve as a template of the program to run. Einhorn scans for server socket specifications of the form srv:(IP:PORT)[<,OPT>...]. When it finds one, it configures a corresponding socket and replaces the specification with the socket's file descriptor number. The specification srv:,so_reuseaddr is taken to mean "create a socket listening on with the SO_REUSEADDR flag set". In the above case, the opened socket had file descriptor number 6. See the README for more details on specifying server sockets.
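The template substitution can be sketched in a few lines. This Python version is an illustration under assumptions, not Einhorn's parser: it handles only IPv4 host:port pairs and only the so_reuseaddr option, and the 127.0.0.1:0 address in the example is hypothetical (port 0 asks the OS for any free port).

```python
import re
import socket

# srv:HOST:PORT followed by optional ,flag entries (IPv4 only in this sketch).
SRV_SPEC = re.compile(r"^srv:([^,:]+):(\d+)((?:,\w+)*)$")


def expand_server_specs(argv):
    """Replace each srv:HOST:PORT[,OPT...] argument with the fd of a listening socket."""
    sockets, expanded = [], []
    for arg in argv:
        match = SRV_SPEC.match(arg)
        if not match:
            expanded.append(arg)  # ordinary argument: pass through unchanged
            continue
        host, port = match.group(1), int(match.group(2))
        flags = [f for f in match.group(3).split(",") if f]
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        if "so_reuseaddr" in flags:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(128)
        sock.set_inheritable(True)  # so an exec'd worker keeps the descriptor
        sockets.append(sock)
        expanded.append(str(sock.fileno()))
    return expanded, sockets


argv, socks = expand_server_specs(["./time_server", "srv:127.0.0.1:0,so_reuseaddr"])
print(argv)  # e.g. ['./time_server', '3']; the fd number varies by process
for s in socks:
    s.close()
```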


Einhorn lets you spin up any number of worker processes (the number can be adjusted on the fly), each possessing one or more shared sockets. Einhorn can spawn a new pool of workers and gracefully kill off the old ones, allowing seamless upgrades to new versions of your code. Einhorn also stays out of your application's way: the shared sockets are just file descriptors, which your application manipulates directly or manages with an existing framework. You can introspect a running Einhorn's state or send it administrative commands using its command shell, einhornsh.

If you happen to be using Ruby, Einhorn can also preload your application. Just pass a -p PATH_TO_CODE and define a method einhorn_main as your workers' entry point:

$ einhorn -n 2 -p ./pool_worker.rb ./pool_worker.rb argument
[MASTER 20873] INFO: Writing PID to /tmp/
[MASTER 20873] INFO: Set ARGV = ["argument"]
[MASTER 20873] INFO: Requiring ./pool_worker.rb (if this hangs, make sure your code can be properly loaded as a library)
From PID 20873: loading /home/gdb/stripe/einhorn/example/pool_worker.rb
[MASTER 20873] INFO: Successfully loaded ./pool_worker.rb
[MASTER 20873] INFO: Launching 2 new workers
[MASTER 20873] INFO: ===> Launched 20875
[WORKER 20875] INFO: About to tear down Einhorn state and run einhorn_main
[WORKER 20875] INFO: Set $0 = "./pool_worker.rb argument",  ARGV = ["argument"]
[MASTER 20873] INFO: ===> Launched 20878
From PID 20875: Doing some work
[WORKER 20878] INFO: About to tear down Einhorn state and run einhorn_main
[WORKER 20878] INFO: Set $0 = "./pool_worker.rb argument",  ARGV = ["argument"]
From PID 20878: Doing some work

As in Unicorn, this reduces memory usage and makes spawning additional workers very lightweight. Preloading is Einhorn's only language-dependent feature (and was easy to implement because Einhorn is itself written in Ruby). Adding preloading for other languages would require some architectural changes, but we might do it in the future.

Though Einhorn requires very little cooperation from your code, we still had to do some work to make our API servers compatible. In particular, we use Thin and EventMachine, both of which needed patching to support the use of an existing file descriptor. The relevant patches are on the master branch of our public forks of the respective projects.

These days, we use Einhorn to run all of our application servers. We also use it to run non-web processes where we want to spawn and keep alive multiple instances. We run Einhorn under a process manager (we use daemontools, but any will work), so adding Einhorn to your existing infrastructure should only require prepending einhorn to the command lines of your managed processes.

We've been using Einhorn in production for a number of months now. We hope you'll find it useful as well. If you want to run a web app but can't use Unicorn, or if you have a worker process that you want to start pooling, you should check Einhorn out and let us know what you think!
