CTF3 architecture

Greg Brockman, February 4, 2014

In philosophy, CTF3 was the same as our previous CTFs: we gave people a chance to solve problems they normally would only get to read about. However, in terms of infrastructure, this was by far our most complex CTF: we needed to build, run, and test arbitrary distributed systems code. In the course of the week it was live, our 7,500 participants pushed over 640,000 times, meaning we needed a scalable and robust architecture that provided isolation between users.

Participants have released a number of walkthroughs for the actual levels, so we won't be releasing official solutions here. Instead, we'll give you a tour of how we made the systems work. (If you'd prefer to see this in video form, we've just released the video from our CTF3 wrapup.)

As an aside, the architecture for CTF reflects a lot of what we've learned in building Stripe. If you're interested in this kind of thing, we're hiring engineers in San Francisco and remotely within US timezones. I also wrote a Quora post about the problems we're working on. (It turns out we do things besides just building CTFs :).)

Overview

CTF3 consisted of five levels. Most of the levels looked pretty similar from a high level: the user would push some code to us, we'd run it in a sandbox environment, and then we'd return a score. The one exception was the Gitcoin level, where we would just validate Git commits people had mined locally (or on their cloud vendor).

Code was submitted to us in the simplest possible way: you just ran git push. On the backend, we received your code via git-shell and used wrappers and commit hooks to implement the CTF-specific logic.

The "wrappers and commit hooks" had lot of moving parts, though. One important design goal was to decouple components and make it possible to horizontally scale any given piece of the system. Stateful pieces were few in number and were constrained to be low volume. In the following sections, we'll go into detail about how all the pieces worked, but here's how things roughly fit together:

Submission pipeline

Wondering what actually happened after you ran git push? The following steps were common to all levels.

  1. You resolved stripe-ctf.com to the public IP for one of our gate frontend servers.

  2. You connected to port 22 on your chosen gate server. An haproxy daemon load-balanced your traffic to one of our submitter boxes. We had three submitter boxes in the pool for much of the event.

    As an optimization, the load balancing used IP stickiness to route you to the same submitter backend on each connection. The submitters were mostly stateless: all that they held was the code you were pushing and convenience tags for each submission. If you'd committed a large blob though, being routed to the same submitter was nice since you wouldn't have to re-upload it on each push.

    In previous CTFs, rather than load balancing, we'd just exposed our machine hostnames (so you'd connect directly to e.g. level0-01.stripe-ctf.com). In that case, it was hard to drop a machine out of the pool or rebalance traffic. Controlling the load balancing here gave us operational flexibility at the cost of additional constraints on our system design (e.g. haproxy knew only your IP address, so we couldn't do stickiness based on username).

  3. The public-facing sshd on your chosen submitter received the username we'd given you in the web interface, which looked like lvl0-ohngii5M.

    We'd configured our PAM stack to use LDAP. So that we could share the user database with the web interface, we put together a quick-and-dirty LDAP server implementation (called fakeuser) to grab usernames directly out of our central database. The users had empty passwords, which (given appropriate settings in sshd_config and PAM) meant that you could log in without pasting a password or giving us your SSH key. Of course, the downside was that your username became a secret credential.

    At this point the sshd ran your user's shell, which was a custom script in /usr/local/bin/login-shell. The shell was pretty simple: it set some environment variables, took out a flock on a per-user file, and then (conceptually) ran a bunch of Ruby code that did all of the level-specific work.

    At first, we'd actually spawn a new Ruby interpreter and load our code on each login. This turned out to be untenable. First of all, loading Bundler plus all our code took a few seconds, which was way too slow for a login session. So we split out the code intended for just the login session into a module we called CTF3NoBundler. This was painful to manage, and meant the no-bundler code couldn't use most of the libraries we were writing over in Bundler-land.

    Even with this split, it still took about 100-200 milliseconds to load our code, which was effectively all CPU time. When we tested continuously running about 20 concurrent logins, the submitter box ground to a halt under the load. We effectively DoSed ourselves through the work of loading the same code over and over again.

    At this point, perhaps the most obvious thing to do would be to rewrite in a faster-loading language. However, there's actually a decent amount of code involved in submission, and there was nothing wrong with the code once it was up and running. So instead, we decided to try a load-once, fork-for-each-login model. We took a look at using Zeus for this purpose. It's a cool tool, but unfortunately it's aimed at development rather than production, and doesn't have the kind of robust failure handling we'd need for something as core as this. Instead, we wrote a simpler implementation based on similar ideas, called Poseidon. (There's a rough sketch of the preload-and-fork model just after this list.)
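
The heart of that model is a master process that loads everything once and then forks a cheap worker per login. The sketch below shows the idea in miniature; the socket path, module names, and wire protocol are invented for illustration, and the real Poseidon had much more careful error handling:

    # Hypothetical sketch of a preload-and-fork ("Poseidon-style") master.
    require "socket"
    require_relative "ctf3"   # load Bundler and all of our code exactly once, up front

    SOCKET_PATH = "/var/run/poseidon.sock"   # invented path
    File.unlink(SOCKET_PATH) if File.exist?(SOCKET_PATH)
    server = UNIXServer.new(SOCKET_PATH)

    # Reap finished workers so we don't accumulate zombies.
    trap("CHLD") do
      begin
        nil while Process.wait(-1, Process::WNOHANG)
      rescue Errno::ECHILD
      end
    end

    loop do
      client = server.accept
      fork do
        # The child inherits the already-loaded code, so a login costs roughly a
        # fork() rather than a multi-second interpreter-plus-Bundler boot.
        server.close
        request = client.gets                           # e.g. username and original SSH command
        CTF3::LoginSession.handle(request, io: client)  # invented entry point
        client.close
        exit!
      end
      client.close
    end

With something like this in place, the login shell only has to take its per-user flock and connect to the socket, rather than booting Ruby itself.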

Standard pipeline

Here's the point at which the Gitcoin and standard pipelines diverged. The remainder of the standard submission pipeline looked like the following:

  1. Next, we constructed your user's level repository (that is, the actual repository that you would clone) if it didn't already exist on disk. This lazy assembly meant we didn't have to waste disk space on users until they'd actually fetched some code.

  2. In the case of a pull, we would just run git-shell and be done with it. Pushes had a lot more going on, however.

  3. In order to make submission as easy to test drive as possible, we wanted it to be possible to git push straight from a fresh clone. So before running git-shell, we played some branch renaming tricks.

  4. We then invoked git-shell, which in turn invoked a post-receive hook. The hook was also implemented as a Poseidon client for fast boot.

  5. The post-receive code in the Poseidon master then served as the coordinator of your scoring run. First, it called out to a test_case_assigner service, which ran on the singleton colossus server. For this and other services which required synchronous responses, we used the Ruby Thrift abstractions we use internally at Stripe.

    The test_case_assigner simply grabbed some free test case records from the database, marked them as allocated, and then returned the resulting cases. These test cases were originally created by the test_case_generator daemon (running on the testasaurus boxes — ok, we ran out of good names at some point). The generator simply ran our benchmark solution against random test cases. We stored metadata in our database, with the actual blob data stored on S3 so your client could later download it.

  6. Once the post-receive hook had its test cases, it started listening on two new RabbitMQ queues: one for results and one for output to display to the user. The hook then submitted a build RPC over RabbitMQ. We used RabbitMQ as a buffer for RPCs that we expected might get backed up, or where a synchronous response wasn't needed. (There's a rough sketch of this queue flow just after the list.)

  7. At the other end of the queue was a builder daemon, running on one of our aptly-named build boxes. Upon receiving the RPC, the daemon fetched the code from the relevant submitter's git-daemon into a temporary directory.

    The builder then asked a central build_cacher Thrift service if the built commit was cached. Assuming not, the builder spawned a Docker container with your code mounted at your user's home directory and ran ./build.sh in the container. We then streamed back the first few hundred KB of output.

    The builder then tarred up your output directory and generated a RabbitMQ score RPC for each test case. The score RPC contained a URL to fetch the tarball from an nginx running on the build box. Finally, the builder uploaded the built tarball to S3 and informed the build_cacher about the new SHA1.

    In the cached case, the builder just short-circuited this logic and sent the score RPCs right away.

  8. Each score RPC was serviced by an executor daemon on a worker box. The executor fetched the build product and then spawned a new container with the code mounted into it. It then (finally!) ran your code, again streaming output back to you. Once complete, the executor determined the results of your trial and then sent a result RPC back to the post-receive hook.

  9. The post-receive hook aggregated your results and from there compiled a final result. It sent a single FinalScoreRPC representing the results of the test run to RabbitMQ.

  10. At the other end of the wire, a resulter daemon hung around on the colossus box waiting to consume the FinalScoreRPC. Upon consuming the RPC, it updated your user's high score.
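
To make the queue choreography a bit more concrete, here's a rough sketch of the post-receive hook's side of it. This uses the Bunny RabbitMQ client (the actual client library isn't specified above), and the queue names, payload shape, and environment variable are invented for illustration; our real code wrapped all of this in an internal RPC layer:

    # Hypothetical sketch of the post-receive hook's RabbitMQ flow (Bunny client).
    # Assumes `test_cases` (the cases handed out by test_case_assigner) is in scope.
    require "bunny"
    require "json"
    require "securerandom"

    old_sha, new_sha, ref = $stdin.gets.split   # post-receive hooks read "old new ref" on stdin

    conn = Bunny.new
    conn.start
    channel = conn.create_channel

    # Per-push queues: one for streamed output, one for per-test-case results.
    run_id       = SecureRandom.hex(8)
    output_queue = channel.queue("output.#{run_id}", exclusive: true)
    result_queue = channel.queue("results.#{run_id}", exclusive: true)

    # Submit the build RPC; a builder daemon consumes from this shared queue.
    build_queue = channel.queue("build_rpcs", durable: true)
    build_queue.publish(JSON.dump(
      "run_id"     => run_id,
      "user"       => ENV["CTF_USER"],       # invented variable
      "rev"        => new_sha,
      "test_cases" => test_cases.map(&:id)
    ))

    # Stream builder and executor output back to the user's terminal as it arrives.
    output_queue.subscribe do |_delivery, _props, chunk|
      $stderr.write(chunk)
    end

    # Block until every test case has reported, then aggregate into the final score.
    results = []
    result_queue.subscribe(block: true) do |delivery, _props, body|
      results << JSON.parse(body)
      delivery.consumer.cancel if results.size == test_cases.size
    end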

Gitcoin pipeline

Gitcoin had its own architecture. Since we didn't need to run any of your code (we just needed to validate the purported Gitcoin), we could get by with a lot less complexity.

Our mining bots

To clear the level, you just needed to mine faster than our bots. The obvious design is to spawn a new miner for each end-user. However, this would be pretty expensive, as we'd have to be mining hundreds of Gitcoins at any one time.

So instead, we ran miners against a single central repository on the gitcoin box, which produced a steady stream of Gitcoins. Each submitter had a gitcoin daemon whose job was to periodically fetch from the central repository and then release at most one new commit to a machine-local Gitcoin instance.

We'd started out with a coin release interval of 25 + rand(20) seconds, but after seeing how many people were struggling to mine that quickly, we slowed the release rate to a flat 90 seconds.
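
Conceptually, each submitter's daemon boiled down to a small loop like the one below. The repository URL, paths, and exact git plumbing are invented for illustration, and the real daemon was more careful about failures:

    # Hypothetical sketch of a submitter's gitcoin daemon: fetch the centrally mined
    # chain, then release at most one new commit per interval to the local instance.
    CENTRAL = "git://gitcoin.stripe-ctf.com/central.git"   # invented URL
    LOCAL   = "/srv/gitcoin/local.git"                     # invented path

    def sh(*cmd)
      system(*cmd) or raise "command failed: #{cmd.join(' ')}"
    end

    loop do
      # Pull down whatever the central miners have produced so far.
      sh("git", "--git-dir", LOCAL, "fetch", CENTRAL, "master:refs/remotes/central/master")

      released = `git --git-dir #{LOCAL} rev-parse refs/heads/master`.strip
      # Centrally mined commits we haven't released yet, oldest first.
      pending  = `git --git-dir #{LOCAL} rev-list --reverse #{released}..refs/remotes/central/master`.split

      unless pending.empty?
        # Advance the local tip by exactly one commit; this is a fast-forward, since
        # the released chain is a prefix of the centrally mined one.
        sh("git", "--git-dir", LOCAL, "update-ref", "refs/heads/master", pending.first)
      end

      sleep 90   # the flat release interval we eventually settled on
    end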

When you pushed, a git update hook performed a bunch of sanity checks to ensure the pushed commit was a valid Gitcoin. Once your commit was accepted, the bots had to stop, because our pre-mined Gitcoins would no longer apply cleanly to your repository.
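
For a flavor of what those sanity checks can look like, here's a minimal sketch of a Gitcoin-style update hook in Ruby. The difficulty value is illustrative, and the real hook checked more than this:

    #!/usr/bin/env ruby
    # Hypothetical sketch of a Gitcoin update hook. git invokes it as:
    #   hooks/update <ref> <old-sha> <new-sha>
    ref, old_sha, new_sha = ARGV
    DIFFICULTY = "000001"   # illustrative value

    abort "only master may be updated" unless ref == "refs/heads/master"

    # Exactly one new commit, built directly on the current tip.
    parents = `git rev-list --parents -n 1 #{new_sha}`.split[1..-1]
    abort "commit must have exactly one parent" unless parents == [old_sha]

    # The commit's SHA-1 must sort lexicographically below the difficulty.
    abort "not enough work: #{new_sha}" unless new_sha < DIFFICULTY

    exit 0   # accept the push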

Gitcoin bonus round

In this round, we pitted everyone against one another in a single master Gitcoin instance. Conveniently for us, we didn't have to run our own miners, since participants provided plenty of competition for each other.

The architecture here was a single global Gitcoin repository on the gitcoin box, created as a shared repository using git init --bare --shared=all. The submitters each maintained their own clone of this repository.

On pull, you just hit the submitter repository. On push, the commit was validated by the submitter, which then pushed (via a new SSH connection) to the backend gitcoin box. If the backend push was successful, a Thrift service on the gitcoin box would synchronously push the new commit to all other submitters.
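
Conceptually, that replication step amounted to a loop over the submitters. Here's a sketch with invented hostnames and paths; the real version was a Thrift handler with proper error handling:

    # Hypothetical sketch of the post-push fan-out from the gitcoin box.
    SUBMITTERS = %w[submitter1 submitter2 submitter3]   # invented hostnames
    REPO       = "/srv/gitcoin/bonus.git"               # invented path

    def replicate(new_sha)
      SUBMITTERS.each do |host|
        # Synchronously push the new tip to each submitter's clone, so that
        # subsequent pulls anywhere see the freshly accepted coin.
        ok = system("git", "--git-dir", REPO, "push",
                    "ssh://#{host}/srv/gitcoin/bonus.git", "#{new_sha}:refs/heads/master")
        raise "replication to #{host} failed" unless ok
      end
    end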

One consequence of this architecture was that submitting Gitcoins was fairly slow: we weren't maintaining persistent connections to the backend gitcoin server, so each push carried a decent amount of overhead. We compensated by tuning the difficulty so that the time to mine a coin was large compared to the time to complete a push. By the contest's end, the difficulty was 0000000005, a full 4 (!) orders of magnitude harder than the difficulty we'd started with.
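
For a sense of scale: a commit counted as a Gitcoin when its SHA-1, compared as a hex string, sorted below the difficulty, so the expected mining work scales inversely with the difficulty value. Taking an illustrative starting difficulty of 000001 (not necessarily the exact value we launched with):

    # Rough work ratio between an illustrative starting difficulty and the final one.
    start_target = "000001".ljust(40, "0").to_i(16)       # hashes below this counted at the start
    final_target = "0000000005".ljust(40, "0").to_i(16)   # hashes below this counted at the end
    start_target.to_f / final_target                      # => 13107.2, i.e. roughly four orders of magnitude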


I hope you had as much fun playing CTF3 as we had building it. If you're curious about any details I didn't cover here, feel free to send me an email.