In
philosophy, CTF3
was the same as our previous CTFs: we gave people a chance to
solve problems they normally would only get to read
about. However, in terms of infrastructure, this was by far our
most complex CTF: we needed to build, run, and test arbitrary
distributed systems code. In the course of the week it was live,
our 7,500 participants pushed over 640,000 times, meaning we
needed a scalable and robust architecture that provided isolation
between users.
Participants have released a number of walkthroughs for the actual levels,
so we won't be releasing official solutions here. Instead, we'll
give you a tour of how we made the systems work. (If you'd prefer
to see this in video form, we've just released the video from our
CTF3 wrapup.)
As an aside, the architecture for CTF reflects a lot of what
we've learned in building Stripe. If you're interested in this
kind of thing, we're hiring engineers
in San Francisco and remotely within US timezones. I also wrote
a Quora
post about the problems we're working on. (It turns out we do
things besides just building CTFs :).)
Overview
CTF3 consisted
of five
levels. Most of the levels looked pretty similar from a high
level: the user would push some code to us, we'd run it in a sandbox
environment, and then we'd return a score. The one exception was
the Gitcoin
level, where we would just validate Git commits people had mined
locally (or on their cloud vendor).
Code was submitted to us in the simplest possible way: you just
ran git push. On the backend, we received your code
via git-shell
and used wrappers and commit hooks to implement the CTF-specific
logic.
The "wrappers and commit hooks" had lot of moving parts,
though. One important design goal was to decouple components and
make it possible to horizontally scale any given piece of the
system. Stateful pieces were few in number and were constrained to
be low volume. In the following sections, we'll go into detail about
how all the pieces worked and how everything roughly fit together.
Submission pipeline
Wondering what actually happened after you ran git push? The
following steps were common to all levels.
- You resolved stripe-ctf.com to the public IP for one of our gate frontend servers.
- You connected to port 22 on your chosen gate server. An haproxy daemon load-balanced your traffic to one of our submitter boxes. We had three submitter boxes in the pool for much of the event.
As an optimization, the load balancing used IP stickiness to route
you to the same submitter backend on each connection. The
submitters were mostly stateless: all that they held was the code
you were pushing and
convenience tags
for each submission. If you'd committed a large blob though, being
routed to the same submitter was nice since you wouldn't have to
re-upload it on each push.
In previous CTFs, rather than load balancing, we'd just exposed
our machine hostnames (so you'd connect directly to
e.g. level0-01.stripe-ctf.com). In that case, it was
hard to drop a machine out of the pool or rebalance
traffic. Controlling the load balancing here gave us operational
flexibility at the cost of additional constraints on our system
design (e.g. haproxy knew only your IP address, so we couldn't do
stickiness based on username).
- The public-facing sshd on your chosen submitter received the username we'd given you in the web interface, which looked like lvl0-ohngii5M.
We'd configured our PAM stack to
use LDAP. So that we could share the user database with the web
interface, we put
together a quick-and-dirty LDAP server implementation
(called fakeuser)
to grab usernames directly out of our central database. The users
had empty passwords, which (given appropriate settings
in sshd.conf
and PAM) meant that you could log in without pasting a password or
giving us your SSH key. Of course, the downside was that your
username became a secret credential.
- At this point the sshd ran your user's shell, which was a custom script in /usr/local/bin/login-shell. The shell was pretty simple: it set some environment variables, took out an flock on a per-user file, and then (conceptually) ran a bunch of Ruby code that did all of the level-specific work.
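In sketch form, the shell's job looked something like the following (the lock path and hand-off command are illustrative, not the real script):

```ruby
#!/usr/bin/env ruby
# Sketch of login-shell: serialize sessions per user via flock, then run
# the level-specific logic.
ENV["CTF_USER"] = ENV["USER"]  # illustrative environment setup

File.open("/var/lock/ctf3-#{ENV['USER']}.lock", File::RDWR | File::CREAT) do |f|
  f.flock(File::LOCK_EX)  # at most one active session per user
  # Conceptually: run the Ruby code that does the level-specific work.
  # (Making this step fast is the story told below.)
  exec("/usr/local/bin/ctf3-session", *ARGV)
end
```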
At first, we'd actually spawn a new Ruby interpreter and load
our code on each login. This turned out to be untenable. First of
all, loading Bundler plus all our
code took a few seconds, which was way too slow for a login
session. So we split out the code intended for just the login
session into a module we called CTF3NoBundler. This was
painful to manage, and meant the no-bundler code couldn't use most
of the libraries we were writing over in Bundler-land.
Even with this split, it still took about 100-200 milliseconds
to load our code, which was effectively all CPU time. When we tested
continuously running about 20 concurrent logins, the submitter box
ground to a halt under the load. We'd effectively DoSed ourselves
through the work of loading the same code over and over again.
At this point, perhaps the most obvious thing to do would be to
rewrite in a faster-loading language. However, there's actually a
decent amount of code involved in submission, and there was nothing
wrong with the code once it was up and running. So instead, we
decided to try a load-once, fork-for-each-login model. We took a
look at using Zeus for
this purpose. It's a cool tool, but unfortunately it's aimed at
development rather than production, and doesn't have the kind of
robust failure handling we'd need for something as core as this. So
instead, we wrote a simpler implementation based on similar ideas,
called Poseidon.
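To give a flavor of the approach, here's a minimal sketch of a load-once, fork-per-login master; all the names are illustrative, and the real Poseidon adds the production-grade failure handling mentioned above:

```ruby
require "socket"
require_relative "ctf3"  # hypothetical: loads Bundler plus all level code, once

SOCKET_PATH = "/var/run/poseidon.sock"
File.unlink(SOCKET_PATH) if File.exist?(SOCKET_PATH)
server = UNIXServer.new(SOCKET_PATH)

loop do
  client = server.accept
  pid = fork do
    # The child inherits the preloaded code, so each login pays for a
    # fork(2) rather than a multi-second interpreter-and-library boot.
    CTF3::LoginSession.run(client)  # hypothetical entry point
    exit!(0)
  end
  client.close
  Process.detach(pid)  # don't accumulate zombies
end
```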
Standard pipeline
Here's the point at which Gitcoin and the standard pipelines
diverged. The remainder of the standard submission pipeline looked
like the following:
- Next, we constructed your user's level repository (that is, the actual repository that you would clone) if it didn't already exist on disk. This lazy assembly meant we didn't have to waste disk space on users until they'd actually fetched some code.
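In sketch form (the paths and template mechanism are assumptions), the lazy construction amounted to:

```ruby
# Build the user's level repository on first contact rather than at signup.
repo_path = "/srv/repos/#{user}/level#{level}.git"  # hypothetical layout
unless Dir.exist?(repo_path)
  system("git", "init", "--bare", repo_path) or raise "git init failed"
  # Seed the fresh repository with the level's starter code.
  system("git", "push", "--all", repo_path,
         chdir: "/srv/templates/level#{level}.git") or raise "seeding failed"
end
```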
- In the case of a pull, we would just run git-shell and be done with it. Pushes had a lot more going on, however.
- In order to make submission as easy to test drive as possible, we wanted it to be possible to git push straight from a fresh clone. So before running git-shell, we played some branch renaming tricks.
- We then invoked git-shell, which in turn invoked a post-receive hook. The hook was also implemented as a Poseidon client for fast boot.
- The post-receive code in the Poseidon master then served as the coordinator of your scoring run. First, it called out to a test_case_assigner service, which ran on the singleton colossus server. For this and other services which required synchronous responses, we used the Ruby Thrift abstractions we use internally at Stripe.
The test_case_assigner simply grabbed some free test case
records from the database, marked them as allocated, and then
returned the resulting cases. These test cases were originally
created by the test_case_generator daemon (running on
the testasaurus boxes — ok, we ran out of good
names at some point). The generator simply ran our benchmark
solution against random test cases. We stored metadata in our
database, with the actual blob data stored on S3 so your client
could later download it.
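For illustration, a synchronous call with the Ruby Thrift bindings looks roughly like this; the TestCaseAssigner interface below is a stand-in, not the actual CTF3 IDL:

```ruby
require "thrift"
# TestCaseAssigner::Client stands in for the Thrift-generated client class.

socket    = Thrift::Socket.new("colossus.internal", 9090)
transport = Thrift::FramedTransport.new(socket)
protocol  = Thrift::BinaryProtocol.new(transport)
client    = TestCaseAssigner::Client.new(protocol)

transport.open
begin
  # The server marks the returned cases as allocated before responding,
  # so two concurrent pushes can't grab the same ones.
  cases = client.assign_test_cases(username, level_id, 3)
ensure
  transport.close
end
```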
- Once the post-receive hook had its test cases, it started listening on two new RabbitMQ queues: one for results and one for output to display to the user. The hook then submitted a build RPC over RabbitMQ. We used RabbitMQ as a buffer for RPCs that we expected might get backed up, or where a synchronous response wasn't needed.
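One way to write the hook's side of this in Ruby is with the Bunny client; the broker address, queue names, and payload here are all assumptions:

```ruby
require "bunny"
require "json"

conn = Bunny.new("amqp://rabbit.internal")
conn.start
ch = conn.create_channel

# Server-named, exclusive queues scoped to this particular push.
results = ch.queue("", exclusive: true)
output  = ch.queue("", exclusive: true)

ch.default_exchange.publish(
  JSON.dump(user: username, sha1: pushed_sha,
            result_queue: results.name, output_queue: output.name),
  routing_key: "build_rpcs"  # consumed by whichever builder is free
)
```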
- At the other end of the queue was a builder daemon, running on one of our aptly-named build boxes. Upon receiving the RPC, the daemon fetched the code from the relevant submitter's git-daemon into a temporary directory.
The builder then asked a central build_cacher
Thrift service if the built commit was cached. Assuming not, the
builder spawned a Docker
container with your code mounted at your user's home directory and
ran ./build.sh in the container. We then streamed back the first few
hundred KB of output.
The builder then tarred up your output directory and generated a
RabbitMQ score RPC for
each test case. The score RPC contained a URL to fetch the tarball
from an nginx running
on the build box. Finally, the builder uploaded the built tarball
to S3 and informed the build_cacher about the new SHA1.
In the cached case, the builder just short-circuited this logic
and sent the score RPCs right away.
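The core of the uncached path amounted to something like this sketch (the image name, paths, and fetch helper are assumptions):

```ruby
# Fetch the pushed code from the submitter's git-daemon (hypothetical
# helper), then build it inside a throwaway container.
build_dir = fetch_code(rpc.submitter, rpc.sha1)

output = IO.popen(
  ["docker", "run", "--rm",
   "-v", "#{build_dir}:/home/#{rpc.user}",  # code mounted at the home dir
   "-w", "/home/#{rpc.user}",
   "ctf3-sandbox",                          # hypothetical base image
   "./build.sh"],
  err: [:child, :out]
) { |io| io.read(300_000) }  # keep only the first few hundred KB
```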
- Each score RPC was serviced by an executor daemon on a worker box. The executor fetched the build product and then spawned a new container with the code mounted into it. It then (finally!) ran your code, again streaming output back to you. Once complete, the executor determined the results of your trial and then sent a result RPC back to the post-receive hook.
- The post-receive hook aggregated your results and compiled a final score, sending a single FinalScoreRPC representing the whole test run to RabbitMQ.
- At the other end of the wire, a resulter daemon hung around on the colossus box waiting to consume the FinalScoreRPC. Upon consuming the RPC, it updated your user's high score.
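The consuming side of such an RPC, again sketched with Bunny and assumed names:

```ruby
require "bunny"
require "json"

conn = Bunny.new("amqp://rabbit.internal")
conn.start
ch = conn.create_channel

# Block forever, recording a new high score whenever a final result lands.
ch.queue("final_scores", durable: true)
  .subscribe(block: true) do |_delivery, _props, body|
  result = JSON.parse(body)
  Scores.record(result["user"], result["score"])  # hypothetical model call
end
```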
Gitcoin pipeline
Gitcoin had its own architecture. Since we didn't need to run any
of your code (we just needed to validate the purported Gitcoin), we
could get by with a lot less complexity.
Our mining bots
To clear the level, you just needed to mine faster than our
bots. The obvious design would be to spawn a new miner for each
end-user. However, this would be pretty expensive, as we'd have to
be mining hundreds of Gitcoins at any one time.
So instead, we had miners on a single central repository on
the gitcoin box, which produced a steady stream of
Gitcoins. Each submitter had a gitcoin daemon whose job
was to periodically fetch from the central repository and then release
at most one new commit to a machine-local Gitcoin instance.
We'd started out releasing a coin every 25 + rand(20)
seconds, but after seeing how many people were struggling to mine
that quickly, we slowed the release rate to a flat 90 seconds.
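Each submitter's daemon boiled down to a loop like this (repository paths and the release helper are assumptions):

```ruby
loop do
  # Pull any freshly mined coins from the central repository...
  system("git", "fetch", "origin", chdir: LOCAL_MIRROR) or warn "fetch failed"
  # ...but advance the machine-local Gitcoin instance by at most one commit.
  release_one_commit(LOCAL_MIRROR, LOCAL_GITCOIN)  # hypothetical helper
  sleep 90  # originally 25 + rand(20); flattened to 90 seconds mid-contest
end
```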
When you pushed, we had a git update hook which performed a bunch of
sanity checks to ensure the pushed commit was a valid Gitcoin. Once
your commit was accepted, the bots had to stop because our pre-mined
Gitcoins wouldn't apply cleanly to your repository.
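The heart of the check is easy to sketch; a commit qualified as a Gitcoin when its SHA1 sorted lexically below the difficulty target, though the real hook validated more than this:

```ruby
#!/usr/bin/env ruby
# git update hook: invoked as `update <refname> <old-sha> <new-sha>`.
_refname, _old_sha, new_sha = ARGV
difficulty = File.read("difficulty.txt").strip

unless new_sha < difficulty  # lexical comparison of hex strings
  STDERR.puts "Commit #{new_sha} does not beat difficulty #{difficulty}"
  exit 1
end
exit 0
```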
Gitcoin bonus round
In this round, we pitted everyone against each other in a master
Gitcoin instance. Conveniently for us, we didn't have to run our own
miners, since people provided plenty of competition for one another.
The architecture here was a single global Gitcoin repository on the
gitcoin box, shared among all users (created using git init --bare
--shared=all). The submitters maintained their own clone of this
repository.
On pull, you just hit the submitter repository. On push, the
commit was validated by the submitter, which then pushed (via a new
SSH connection) to the backend gitcoin box. If the backend push was
successful, a Thrift service on the gitcoin box would synchronously
push the new commit to all other submitters.
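Sketched out (hostnames, paths, and the submitter list are illustrative), the fan-out on the gitcoin box looked like:

```ruby
# After a commit lands in the shared repository, mirror it synchronously
# to every submitter's clone so subsequent pulls see it immediately.
SUBMITTERS.each do |host|
  system("git", "push", "git@#{host}:gitcoin.git", "master",
         chdir: SHARED_REPO) or raise "fan-out to #{host} failed"
end
```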
One consequence of this architecture was that submitting Gitcoins
was fairly slow: we weren't maintaining persistent connections to
the backend gitcoin server, so each push paid a decent amount of
connection overhead. We compensated for this by tuning the difficulty
to ensure the time to mine a coin was large compared to the time to
complete a push. By the contest's end, the difficulty
was 0000000005, a full 4 (!) orders of magnitude harder
than the difficulty we'd started with.
I hope you had as much fun playing CTF3 as we had building it. If
you're curious about any details I didn't cover here, feel free
to send me an email.