Scaling email transparency

Greg Brockman on December 8, 2014

In February 2013, we blogged about email transparency at Stripe. Since then a number of other companies have implemented their own versions of it (which a few have talked about publicly). We often get asked whether email transparency is still around, and if so, how we've scaled it.

Email transparency continues to be one important tool for state transfer at Stripe. The vast majority of Stripe email (excluding particularly sensitive classes of email or threads where a participant has a strong expectation of privacy) remains publicly available throughout the company.

Today we're publishing two key components that have allowed us to scale it this far: our list manager tool and updated internal documentation reflecting what we've learned over the past year and a half. Hopefully these will make it easier for others to run email transparency at their own organizations.


In the time since our first post, we've grown our mailing list count almost linearly with headcount: from 40 employees and 119 mailing lists in February 2013 to 164 people and 428 lists today. A plurality are project lists (sys@, sys-archive@, sys-bots@, sys-ask@), but there's also a long tail covering topics ranging from country operations (australia@) to ideas for things Stripe should try (crazyideas@).

We use Google Groups for our email list infrastructure. Today we're releasing the web interface we've built on Google's APIs to make managing many list subscriptions (and associated filters) easy. This interface, called Gaps, lets you do things like:

  • Quickly subscribe to or unsubscribe from a list.
  • View your organization's lists (categorized by topic), and which you're subscribed to (including indirect subscriptions through other lists).
  • Get notifications when new lists are created.
  • Generate and upload Gmail filters.
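The filter-generation piece works by emitting a file in Gmail's filter-export format (an Atom feed of filter entries) that can then be imported through Gmail's settings. Here's a rough sketch of the idea — not Gaps's actual implementation; the property names follow Gmail's export format, but treat the details as illustrative:

```python
def gmail_filter_xml(lists):
    """Render a Gmail filter-export file that labels and archives
    traffic for each mailing list address given."""
    entries = []
    for addr in lists:
        label = addr.split("@")[0]
        entries.append(
            "  <entry>\n"
            "    <category term='filter'/>\n"
            f"    <apps:property name='hasTheWord' value='list:{addr}'/>\n"
            f"    <apps:property name='label' value='{label}'/>\n"
            "    <apps:property name='shouldArchive' value='true'/>\n"
            "  </entry>"
        )
    return (
        "<?xml version='1.0' encoding='UTF-8'?>\n"
        "<feed xmlns='http://www.w3.org/2005/Atom'\n"
        "      xmlns:apps='http://schemas.google.com/apps/2006'>\n"
        + "\n".join(entries)
        + "\n</feed>"
    )
```

Matching on the list address and setting `shouldArchive` is what makes archive-list traffic skip the inbox while staying searchable under its label.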

Here's a quick sample of what Gaps looks like:

Check it out and let us know what you think!

Updated internal documentation

Scaling email transparency has required active cultural effort and adaptation. As our team grew, we'd notice that formerly good patterns could turn sour. For example, at first email transparency would improve many conversations by letting people drop in with helpful tidbits. But with a larger team, having many people jumping into a conversation would instead grind the thread to a halt.

As we've identified cases where email transparency didn't scale well, we've made changes to our culture. Below is our updated internal documentation on how we approach email transparency. It embodies what we've learned about how to make email transparency work at an organization of our size:

Email transparency (from our internal wiki)

One of Stripe's core strategies is hiring great people and then making sure they have enough information to make good local decisions. Email transparency is one system that has helped make this possible. As with any rule at Stripe, you should consider the recommendations in this document to be strong defaults, which you should feel free to override if they don't make sense in a particular circumstance.

How it works

Email transparency is fairly simple: make your emails transparent by CCing a list, and make it easy for others to be transparent by observing the responsibilities below.

The main mechanisms of email transparency are the specially-designated archive lists, to which you should CC all mail that would normally be off-list, but only because of its apparent irrelevance rather than out of any particular desire for secrecy. The goal isn't to share things that would otherwise be secret: it's to unlock the wealth of information that would otherwise be accidentally locked up in a few people's inboxes.

In general, if you are debating including an archive list, you should include it. This includes internal person-to-person email which you would normally leave off a list, emails to vendors, and scheduling email. Don't be afraid to send "boring" email to an archive list — people have specifically chosen to subscribe to that list. You should expect most people to autoarchive this list traffic (hence the name!), and then dip into it as they prefer.

If you're new to it, email transparency always feels a bit weird at first, but it doesn't take long to get used to it.

What's the point?

Email transparency is something few other organizations try to do. It's correspondingly on us to make sure we have really good indicators for how it's valuable. Here's a sample of things people have found useful about email transparency:

  • Provides the full history on interactions that are relevant to you. If you're pulled into something, you can always pull up the relevant state. This is especially useful for external communications with users or vendors.
  • Provides a way for serendipitous interactions to happen — someone who has more state on something may notice what's happening and jump in to help (subject to the limitations about jumping in).
  • Lets you keep up with things going on in various other parts of Stripe, at whatever granularity you want. This reduces siloing, makes it easier to function as a remote employee (and even just to know what we're working on), and generally increases the feeling of connectedness.
  • Requires ~no additional effort from the sender.
  • Makes conversations persistent and linkable, which is particularly useful for new hires.
  • Forces us to think about how we're segmenting information — if you're tempted to send something off-list, you should think through why.
  • Makes spin-up easier by letting you immerse yourself in examples of Stripe tone and culture, and answer your own questions via the archives.
  • Helps you learn how different parts of the business work.

Reader responsibilities

Email transparency cuts two ways. Being able to see the raw feed of happenings at Stripe as they unfold is awesome, but it also implies an obligation to consume responsibly. Overall, threads on an archive list merit a level of civil inattention — you should feel free to read them, but be careful about adding your own contributions.

  • Talk to people rather than silently judging. If you see something on an email list that rubs you the wrong way or that you think doesn't make sense (e.g. "why are we working on that?", "that email seems overly harsh/un-Stripelike"), you should talk to that person directly (or their manager, if there's a reason you can't talk to them about it). Remember that we hire smart people, and if something seems off you're likely missing context or a view of the larger picture. No one wants their choice to send email on-list to result in a bunch of people making judgements without telling them, or chattering behind their back — if that can happen, then people will be less likely to CC a list in the future.
  • Avoid jumping in. A conversation on an archive list should be considered a private conversation between the participants. When people jump into the thread, it often grinds to a halt and nothing gets done. There will be occasions (e.g. if you have some factual knowledge the participants probably don't) where it's OK to jump into the thread, but in practice these should be very rare. By convention, the people on the thread may ignore your email; don't take it personally — it's just a way of making sure that email transparency doesn't accidentally make email communication harder. Knowing when to jump in is an art, and when in doubt, don't.
  • Don't penalize people for choosing to CC a list. Ideally, people are writing their emails exactly as they would if they were off-list. So be cognizant of creating additional overhead for people because they chose to CC the list. There may be typos, or things that you're wondering about, or things that don't make sense. If you're *concerned* about something being actively bad, then you should talk to the person, but if it's something small (e.g. "there's a typo", "this tone isn't Stripelike", "this conversation seems like a waste of time"), you should trust that there's either a reason, or that the person's manager will be on the lookout to help them (especially if they're new).
  • Help others live by the above responsibilities. The only way we can preserve email transparency is by collectively nudging each other onto the right course. Whether it's poking someone to CC a list, or telling someone to stop venting about an email and just go talk to the author, the person responsible for fixing the shortfalls you see is you.

Common scenarios/FAQs

  • I don't mind people being able to read this boring scheduling email, but I don't think it's worth anyone's time to read. You should still send it to an archive list! Archive lists are intended to be the feed of everything going on within a particular team — let the people who are subscribing decide if it's worth their time or not.
  • I have a small joke on this thread. Should I CC it to the list, or just send it to one person (or a small set of people)? Small jokes are good! The main cost is potentially derailing the relevant thread. So generally, if it's a productive, focused thread, just send your joke off-list, but if it's already fairly broad, then you should feel free to send the joke publicly.
  • I feel like I need to write my email for the broad audience that might be reading it, rather than the one person it's actually meant for. The only change between how you write emails for email transparency and how you would write them privately to other Stripes should be that one has a CC. That is, if you feel a need to rewrite your emails for the audience, then that likely indicates a bug in the organization we should fix. If you notice yourself having this tendency, talk to gdb — we should be able to shift the norms of the organization so this isn't a problem.
  • How do we make sure this respects outside people's expectations? In many ways, email transparency is just a more extreme version of what happens at other organizations — since it's opt-in, all of the emails are human-vetted to be shareable. Email transparency is mostly about changing the default thresholds. As a corollary, if someone requests that their email not be shared, then certainly respect their request.

Common exceptions

Like any tool, email transparency has its limitations. Since it's in many ways a one-way communication system, email transparency is bad for sensitive situations where people may react strongly. It's also important to preserve people's privacy. The following is a description of the classes of things which you may not see on an archive list.

  • Anything personnel related (e.g. performance).
  • Some recruiting conversations, especially during closing or when people are confidentially looking around. People's decision-making process at that stage is usually quite personal, and even if people have a hard time picking Stripe, we want to make sure that they start with a blank slate.
  • Communications of mixed personal and professional nature (e.g. recruiting a friend).
  • Early stage discussions about topics that will affect Stripes personally (e.g. changing our approach to compensation).
  • Some particularly sensitive partnerships.

As we said in the original email transparency post, it's hard to know how far it will scale. That doesn't bother us much: we continue to do unscalable things until they break down. The general sentiment at Stripe is that email transparency adds a lot of value, and it seems we'll keep being able to find tweaks to keep it going.

Hopefully these components will help you with email transparency in your own organization. If you end up implementing something similar, I'd love to hear about it!


PagerDuty analytics with Postgres

Mark McGranaghan on December 2, 2014

We’re open-sourcing the tool we use to collect and analyze on-call data from PagerDuty. We use pd2pg to improve the on-call experience for engineers at Stripe, and we think it’ll be useful for your teams too.

PagerDuty data in Postgres

PagerDuty is an important source of data about how services behave in production and the on-call load experienced by engineers. This data has been instrumental for managing and evolving our on-call rotations: over five months, we’ve reduced on-call load for our systems team by about 75%.

We import data from the PagerDuty API into a Postgres database using pd2pg, where we can use the full power of Postgres’ SQL queries.

Here’s how you import your data:

$ export PAGERDUTY_SUBDOMAIN="your-company"
$ export PAGERDUTY_API_KEY="..."
$ export DATABASE_URL="postgres://..."
$ bundle exec ruby pd2pg.rb

The script incrementally updates existing data, so it’s trivial to refresh your database periodically. (It also fetches historical data from your account, so you can get started with long-term analysis right away.)
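The incremental update boils down to an idempotent write: rows are keyed by the PagerDuty id, so re-importing a record you've already seen simply overwrites it. pd2pg itself targets Postgres, but the idea can be sketched with a pared-down schema using SQLite from the standard library (illustrative only — not pd2pg's actual code):

```python
import sqlite3

def upsert_incidents(conn, records):
    """Insert new incidents and overwrite changed ones, keyed on the
    PagerDuty incident id, so re-running an import is harmless."""
    conn.executemany(
        "insert or replace into incidents (id, incident_number, created_at) "
        "values (:id, :incident_number, :created_at)",
        records,
    )
    conn.commit()
```

Because every run converges on the same state, a periodic cron job can refresh the database without any bookkeeping about what was imported last time.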

Querying PagerDuty data with SQL

Once it's in the database, you can start analyzing and exploring your PagerDuty data with psql:

> \d incidents
           Column            |           Type           | Modifiers 
-----------------------------+--------------------------+-----------
 id                          | character varying        | not null
 incident_number             | integer                  | not null
 created_at                  | timestamp with time zone | not null
 html_url                    | character varying        | not null
 incident_key                | character varying        | 
 service_id                  | character varying        | 
 escalation_policy_id        | character varying        | 
 trigger_summary_subject     | character varying        | 
 trigger_summary_description | character varying        | 
 trigger_type                | character varying        | not null
> select count(*) from incidents;
 count 
-------
   ...
(1 row)

As an example of a real query, here’s how you’d count the number of incidents per service over the past 28 days:

select
  services.name,
  count(incidents.id)
from
  incidents,
  services
where
  incidents.created_at > now() - '28 days'::interval and
  incidents.service_id = services.id
group by
  services.name
order by
  count(incidents.id) desc

How we use pd2pg at Stripe

  • Weekly team report: Our sys team reviews a detailed on-call report each week. It covers all alerts either sent by a team-owned service or fielded by a team engineer (which can include escalations from other teams' services). This detailed report helps us understand the types of incidents we're seeing so we can prevent or respond to them better.
  • Per-service incident counts: Aggregates like per-service incident counts help give us a high-level overview. (They’re not actionable results in themselves, but do show us high-load services we should review further.)
  • Interrupted hours metric: A common way to measure on-call load is counting the number of incidents over a period of time. Sometimes, this over-represents issues that cause several related alerts to fire at the same time (which aren't actually more costly than a single alert firing). To get a more accurate view of on-call load, we calculate an "interrupted hours" metric that counts the intervals in which an engineer receives one or more pages. This metric provides pretty good insight into real on-call load by suppressing noise from issues that result in multiple pages and more heavily weighting incidents with escalations.
  • On-hours vs. off-hours alerts: Pages during the work day are less costly than ones that wake an engineer up at 3am on a Sunday. So, we look at the metrics discussed above broken down by on-hours vs. off-hours incidents.
  • Escalation rate analysis: Frequent or repeated escalations may indicate either that responders aren't able to get to a computer or that they aren't prepared to deal with the issue. Some escalations are expected, but watching escalation rates across services helps us keep an eye out for organizational bugs.
  • Individual on-call load: Being primary on-call is a major responsibility, and high on-call load can cause burnout in engineers. To help understand on-call load at the individual level, we can perform user-specific variants of the above queries.
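As a concrete illustration of the interrupted-hours idea (a sketch of the metric, not the query we run in production): bucket each page into its clock hour and count the distinct buckets.

```python
from datetime import datetime

def interrupted_hours(page_times):
    """Count distinct clock hours containing at least one page. A noisy
    incident that fires five alerts in ten minutes weighs the same as a
    single page, which keeps it from inflating the on-call load metric."""
    return len({t.replace(minute=0, second=0, microsecond=0) for t in page_times})

# Pages at 03:05, 03:40, and 14:10 are two interrupted hours, not three incidents.
pages = [
    datetime(2014, 11, 30, 3, 5),
    datetime(2014, 11, 30, 3, 40),
    datetime(2014, 11, 30, 14, 10),
]
```

The same bucketing is easy to express in SQL with `date_trunc('hour', created_at)` once the data is in Postgres.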

We’d love to hear how you use pd2pg. If you’ve got any feedback, please get in touch or send us a PR.


Open-sourcing tools for Hadoop

Colin Marc on November 21, 2014

Stripe’s batch data infrastructure is built largely on top of Apache Hadoop. We use these systems for everything from fraud modeling to business analytics, and we’re open-sourcing a few pieces today:


Timberlake

Timberlake is a dashboard that gives you insight into the Hadoop jobs running on your cluster. Jeff built it as a replacement for the web interfaces currently provided by YARN’s ResourceManager and MRv2’s JobHistory server, and it has some features we’ve found useful:

  • Map and reduce task waterfalls and timing plots
  • Scalding and Cascading awareness
  • Error tracebacks for failed jobs


Brushfire

Avi wrote a Scala framework for distributed learning of ensemble decision tree models called Brushfire. It’s inspired by Google’s PLANET, but built on Hadoop and Scalding. Designed to be highly generic, Brushfire can build and validate random forests and similar models from very large amounts of training data.


Sequins

Sequins is a static database for serving data in Hadoop’s SequenceFile format. I wrote it to provide low-latency access to key/value aggregates generated by Hadoop. For example, we use it to give our API access to historical fraud modeling features, without adding an online dependency on HDFS.


Herringbone

At Stripe, we use Parquet extensively, especially in tandem with Cloudera Impala. Danielle, Jeff, and Avi wrote Herringbone (a collection of small command-line utilities) to make working with Parquet and Impala easier.

If you’re interested in trying out these projects, there’s more info on how to use them (and how they were designed) in the READMEs. If you’ve got feedback, please get in touch or send us a PR.

Happy Hadooping!


Avi Bryant on November 4, 2014

Earlier this year, after raising $1M in May, Lawrence Lessig’s Mayday PAC announced an ambitious goal to raise $5M by the 4th of July—a goal which they met mere hours before the deadline.

One of the remarkable things about this campaign is how transparent they’ve been through the whole process. In August, they released anonymized records of all contributions from the prior three months to “enable researchers to study the pattern and nature of the contributions” they received.

Stripe helps Mayday to accept credit card payments, and with Mayday’s blessing, we did some digging of our own into the data relating to their $5M campaign. While we couldn’t look at every contribution (only those made using credit cards), we were able to discover certain patterns that wouldn’t necessarily show up in the published data set. We’d like to share here some of the interesting things we discovered.

Meeting a deadline

It shouldn’t be too surprising that the volume of donations went up as the July 4th deadline approached. By our count, over half of the donations were made in the last 48 hours of the campaign. We also saw some subtler changes in the final days:

  • Overall, a healthy 17% of donations came in via mobile devices. But on the last day of the campaign, mobile use doubled: 32% of donors donated from their phones or tablets instead of waiting to get to their laptops.
  • Repeat donations were three times as common in the final week. Between June 25th and July 4th, 14% of donations were from email addresses that had contributed at least once already. Although it’s true that repeat donations are more likely the longer a campaign goes on, it’s notable that in the previous week, repeat donations only made up 4% of the total—and only 1% the week before that.
Deadlines can be incredibly effective in fundraising: Mayday’s supporters were motivated to donate both immediately and repeatedly.

Checking out

Mayday’s donation page uses Stripe Checkout to collect payment information. Checkout optionally allows customers to store their payment info with Stripe, making future purchases easier. Since this works across all sites that use Checkout, Stripe already remembered the payment info for a portion of the people visiting Mayday for the first time. We were very curious to see how this would perform. Here’s what we found:

  • The overall conversion rate, once a visitor got to Checkout on Mayday for the first time, was 78%.
  • For users already logged in to a Stripe account, the rate shot up to 90%.

To put it another way, the chance that a visitor would abandon their donation at the Checkout step halved from 22% to 10%.

It’s worth repeating that these users weren’t on the Mayday site when they stored their details, and there’s no reason to expect they were any more likely to donate than anyone else—they just happened to have already used Stripe to buy something online in the past.

Even for repeat visitors to Mayday, who are more likely to donate than anyone else, having a Stripe account made a substantial difference. In general, visitors who had donated before had a healthy 87% conversion rate, but for those who were already logged in to Stripe, it was 94%.

Coming back for more

Looking at repeat donations prompted us to ask: do people donate more or less their second time? On average, the answer is roughly 50% more. While first donations had a mean of $88 and a median of $30, repeat donations had a mean of $114 and a median of $50.

Average doesn’t mean typical, however. If you look at each repeat donor one by one, it turns out they’re split almost exactly into thirds: 33% donate less the second time (most commonly half), 35% donate more (most commonly double), and 32% donate exactly the same. The averages get pushed up because doubling (and the occasional tripling or even quadrupling) makes a bigger difference overall than halving does.

Supporting your supporters

Supporting repeat donors was critical to the campaign’s success. When donors return to your site (probably at the last minute), make it easy for them: don’t make them find their laptop, and don’t make them enter their credit card again. Encourage them to increase their donation, but don’t expect it. When you make it easy enough, they’ll almost certainly help you out—94% of the time, anyway.


Stripe Dublin Meetup

Christina Mairs on October 29, 2014

Come join us and our friends from Intercom for a meetup in Dublin on Thursday night. A handful of Stripes will be around, and we’d love to see you all at Intercom’s new offices for a chat and a pint.

Thursday, November 6th, starting at 6:30 PM
Intercom (2nd Floor, Stephen Court)

RSVP via our event page.


Game Day Exercises at Stripe: Learning from `kill -9`

Marc Hedlund on October 28, 2014

We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node [0], and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others.

If you’re not familiar with game days, the best introductory article is this one from John Allspaw [1]. Below, we’ll lay out a playbook for how to run a game day, and describe the results from our latest exercise to show why we believe they are valuable.

How to run a game day exercise

The system we recently tested, scoring-srv, is one part of our fraud detection system. The scoring-srv processes run on a cluster of boxes and connect to a three-node Redis cluster to store fraud scoring data. Our internal charge-processing code connects to scoring-srv for each charge made on Stripe’s network, so it needs to be very low-latency; likewise, accurate scoring requires historical data, so it needs durable storage.

The scoring-srv developers and a member of our systems team, who could help run the tests, got together around a whiteboard. We drew a basic block diagram of the machines and processes, the data stores, and the network connections between the components. With that diagram, we were able to come up with a list of possible failures.

We came up with a list of six tests we could run easily:

  • destroying and restoring a scoring-srv box,
  • destroying progressively more scoring-srv boxes until calls to it began timing out,
  • partitioning the network between our charge processing code and scoring-srv,
  • increasing the load on the primary Redis node,
  • killing the primary Redis node, and
  • killing one of the Redis replicas.

Since the team was new to game days, we did not try to be comprehensive or clever. We instead chose the simplest, easiest to simulate failures we could think of. We’d take a blunt instrument, like kill -9 or aws ec2 terminate-instances, give the system a good hard knock, and see how it reacted [2].

For each test, we came up with one or more hypotheses for what would happen when we ran it. For instance, we guessed that partitioning the network between charge processing and scoring-srv would cause these calls to time out and fail open (that is, allow the charge to go through immediately). Then, we decided on an order to perform the tests, saved a backup of a recent Redis snapshot as a precaution, and dove in.

Here, then, is a quick-start checklist for running a game day:

  1. Get the development team together with someone who can modify the network and destroy or provision servers, and block off an afternoon to run the exercise.
  2. Make a simple block diagram of the machines, processes, and network connections in the system you’re testing.
  3. Come up with 5-7 of the simplest failures you can easily induce in the system.
  4. Write down one or more hypotheses for what will happen after each failure.
  5. Back up any data you can’t lose.
  6. Induce each failure and observe the results, filing bugs for each surprise you encounter.

Observations and results

We were able to terminate a scoring-srv machine and restore it with a single command in roughly the estimated time. This gave us confidence that replacing or adding cluster machines would be fast and easy. We also saw that killing progressively more scoring-srv machines never caused timeouts, showing we currently have more capacity than necessary. Partitioning the network between the charge-processing code and scoring-srv caused a spike in latency, where we’d expected calls to scoring-srv to time out and fail open quickly. This test also should have immediately alerted the teams responsible for this system, but did not.

The first Redis test went pretty well. When we stopped one of the replicas with kill -9, it flapped several times on restart, which was surprising and confusing to observe. As expected, though, the replica successfully restored data from its snapshot and caught up with replication from the primary.

Then we moved to the Redis primary node test, and had a bigger surprise. While developing the system, we had become concerned about latency spikes during snapshotting of the primary node. Because scoring-srv is latency-sensitive, we had configured the primary node not to snapshot its data to disk. Instead, the two replicas each made frequent snapshots. In the case of failure of the primary, we expected one of the two replicas to be promoted to primary; when the failed process came back up, we expected it to restore its data via replication from the new primary. That didn’t happen. Instead, when we ran kill -9 on the primary node (and it was restarted by daemontools), it came back up – after, again, flapping for a short time – with no data, but was still acting as primary. From there, it restarted replication and sent its empty dataset to the two replica nodes, which lost their datasets as a result. In a few seconds, we’d gone from a three-node replicated data store to an empty data set. Fortunately, we had saved a backup and were able to get the cluster re-populated quickly.

The full set of tests took about 3.5 hours to run. For each failure or surprise, we filed a bug describing the expected and actual results. We ended up with 15 total issues from the five tests we performed (we skipped the Redis primary load test) – a good payoff for the afternoon’s work. Closing these, and re-running the game day to verify that we now know what to expect in these cases, will greatly increase our confidence in the system and its behavior.

Learning from the game day

The invalidation of our Redis hypothesis left us questioning our approach to data storage for scoring-srv. Our original Redis setup had all three nodes performing snapshots (that is, periodically saving data to disk). We had tested failover from the primary node due to a clean shutdown and it had succeeded. While analyzing the cluster once we had live data running through it, though, we observed that the low latency we’d wanted from it would hit significant spikes, above 1 second, during snapshotting:

Obviously these spikes were concerning for a latency-sensitive application. We decided to disable snapshotting on the primary node, leaving it enabled on the replica nodes, and you can see the satisfying results below, with snapshotting enabled, then disabled, then enabled again:

Since we believed that failover would not be compromised in this configuration, this seemed like a good trade-off: relying on the primary node for performance and replication, and the replica nodes for snapshotting, failover, and recovery. As it turned out, this change was made the day before the game day, as part of the final lead-up to production readiness. (One could imagine making a similar change in the run-up to a launch!)
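In redis.conf terms, that trade-off looks roughly like this (illustrative only — the replica snapshot intervals shown are Redis’s defaults, not necessarily the ones we used):

```
# Primary: disable RDB snapshotting entirely to avoid latency spikes.
save ""

# Replicas: snapshot frequently so they can serve failover and recovery.
save 900 1
save 300 10
save 60 10000
```

An empty `save` directive is how Redis disables RDB snapshots; the numeric pairs mean "snapshot after N seconds if at least M keys changed."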

The game day wound up being the first full test of the configuration including all optimizations and changes made during development. We had tested the system with a primary node shutdown, then with snapshotting turned off on the primary, but this was the first time we’d seen these conditions operating together. The value of testing on production systems, where you can observe failures under the conditions you intend to ship, should be clear from this result.

After discussing the results we observed with some friends, a long and heated discussion about the failure took place on Twitter, in which Redis’ author said he had not expected the configuration we were using. Since there is no guarantee the software you’re using supports or expects the way you’re using it, the only way to see for certain how it will react to a failure is to try it.

While Redis is functional for scoring-srv with snapshotting turned on, the needs of our application are likely better served by other solutions. The trade-off between high-latency spikes, with primary node snapshotting enabled, versus total cluster data loss, with it disabled, leaves us feeling neither option is workable. For other configurations at Stripe – especially single-node topologies for which data loss is less costly, such as rate-limiting counters – Redis remains a good fit for our needs.


In the wake of the game day, we’ve run a simple experiment with PostgreSQL RDS as a possible replacement for the Redis cluster in scoring-srv. The results suggest that we could expect comparable latency without suffering snapshotting spikes. Our testing, using a similar dataset, had a 99th percentile read latency of 3.2 milliseconds, and a 99th percentile write latency of 11.3 milliseconds. We’re encouraged by these results and will be continuing our experiments with PostgreSQL for this application (and obviously, we will run similar game day tests for all systems we consider).

Any software will fail in unexpected ways unless you first watch it fail for yourself. We completely agree with Kelly Sommers’ point in the Twitter thread about this:

We’d highly recommend game day exercises to any team deploying a complex web application. Whether your hypotheses are proven out or invalidated, either way you’ll leave the exercise with greater confidence in your ability to respond to failures, and less need for on-the-fly diagnosis. Having that happen for the first time while you’re rested, ready, and watching is the best failure you can hope for.


[0] We’ve chosen to use the terms “primary” and “replica” in discussing Redis, rather than the terms “master” and “slave” used in the Redis documentation, to support inclusivity. For some interesting and heated discussion of this substitution, we’d recommend this Django pull request and this Drupal change.

[1] Some other good background articles for further reading: “Weathering the Unexpected”; “Resilience Engineering: Learning to Embrace Failure”; “Training Organizational Resilience in Escalating Situations”; “When the Nerds Go Marching In.”

[2] If you’d like to run more involved tests and you’re on AWS, this Netflix Tech Blog post from last week describes the tools they use for similar testing approaches.


Thanks much to John Allspaw, Jeff Hodges, Kyle Kingsbury, and Raffi Krikorian for reading drafts of this post, and to Kelly Sommers for permission to quote her tweet. Any errors are ours alone.

October 28, 2014

Apple Pay

Ray Morgan on October 20, 2014

Starting today, any Stripe user can begin accepting Apple Pay in their iOS apps. Apple Pay lets your customers frictionlessly pay with one touch using a stored credit card. We think Apple Pay will make starting a mobile business easier than ever.

Apple Pay doesn’t replace In-App Purchases. You should use Apple Pay when charging for physical goods (such as groceries, clothing, and appliances) or for services (such as club memberships, hotel reservations, and tickets for events). You should continue to use In-App Purchases to charge for virtual goods such as premium content in your app.

When your customer is ready to pay, they’ll authorize a payment using Touch ID. Then, Stripe generates a card token, which you can use to create charges as you normally would through the Stripe API. It just takes a few lines of code to set up and display the Apple Pay UI:

- (void)paymentAuthorizationViewController:(PKPaymentAuthorizationViewController *)controller
                       didAuthorizePayment:(PKPayment *)payment
                                completion:(void (^)(PKPaymentAuthorizationStatus))completion {

    [Stripe createTokenWithPayment:payment
                        completion:^(STPToken *token, NSError *error) {
        if (error) {
            completion(PKPaymentAuthorizationStatusFailure);
            return;
        }
        // Charge your Stripe token as normal, then report the result
        // back to the Apple Pay sheet.
        completion(PKPaymentAuthorizationStatusSuccess);
    }];
}

The following Stripe-powered apps already have Apple Pay enabled. You can try it out as soon as their updates hit the App Store. We owe them special thanks for all their feedback and bug-squashing over the past few weeks.

If you’ve got any questions, or need help getting started, please get in touch.

Get started with Apple Pay
View documentation

October 20, 2014

Open-Source Retreat meetup

Greg Brockman on October 16, 2014

A few months ago, we announced our Open-Source Retreat. Though we’d originally expected to sponsor two grantees, we ended up giving out three full grants (and then an additional shorter grant).

Here’s what happened with those grants:

If you’d like more details, we’ll be hosting a meetup at Stripe on Tuesday, October 21st. The grantees will talk about their projects and where they plan to go next. RSVP on our event page if you’d like to attend in person, or view our livestream.

If you have any questions about the retreat, the projects, or anything else, please get in touch!

October 16, 2014


Steve Woodrow on October 15, 2014

As you’ve likely seen, a design flaw in SSL 3.0 was announced to the internet yesterday, nicknamed POODLE. Unfortunately, it’s not just an implementation flaw: the only way to prevent the attack is to disable the affected ciphers altogether. Fortunately, the only common browser that still relies on SSL 3.0 is Internet Explorer 6 on Windows XP, which now accounts for a small fraction of internet traffic.

We’ve deployed changes to ensure Stripe traffic remains secure.

Our response

We’ve taken an approach similar to Google’s: We’ve disabled the now easily-exploited CBC-mode SSL 3.0 ciphers. We’ve also deployed OpenSSL with support for TLS_FALLBACK_SCSV, which prevents newer browsers from being tricked into using SSL 3.0 at all. This means that IE6 customers will (for now) continue to be able to purchase from Stripe users, and there will be no immediate user-facing impact.
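The same posture can be expressed in most modern TLS stacks. Purely as an illustrative sketch (using Python’s `ssl` module, which is not how our edge is configured), a context can be pinned so that SSL 3.0 handshakes are refused outright:

```python
import ssl

# Build a client context and refuse anything older than TLS 1.0.
# (Recent OpenSSL and Python builds disable SSL 3.0 entirely anyway.)
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1  # SSL 3.0 connections now fail
```

Pinning a floor on the protocol version, rather than just removing CBC ciphers, also protects against future downgrade tricks that target whatever SSL 3.0 ciphers remain.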

Ending support for SSL 3.0

While some mitigations exist, there is no configuration under which SSL 3.0 is totally secure. Moreover, with so many websites responding to POODLE by dropping SSL 3.0 support entirely, we expect that IE6 on XP will soon stop working on most of the web.

Our plan going forward:

  • Starting today, new Stripe users will not be able to send API requests or receive webhooks using SSL 3.0.
  • On November 15, 2014, we will drop SSL 3.0 support entirely (including for Stripe.js and Checkout).

In the meantime, we’ll notify any of our users who we expect to be affected by this change. If you have any questions, please don’t hesitate to get in touch.

October 15, 2014


Karl-Aksel Puulmann on September 26, 2014

We’re open-sourcing Pagerbot, a tool we developed to make it easy to interact with PagerDuty through your internal chat system. (At the very least, we hope it'll help other companies respond to incidents like Shellshock or the ongoing AWS reboot cycle.)


Like many tech companies, Stripe uses PagerDuty to help coordinate on-call schedules and incident response. The service is super reliable, does a great job of handling our normal rotations, and we appreciate being able to individually set preferences for how we want to get notified.

Fairly frequently, though, people will trade on-call shifts, whether because of travel, vacation, or even just making sure someone is keeping an eye on things while they’re out watching a movie. The communication about the trades mainly happens in one of our Slack channels.

Inspired by GitHub’s idea of chat-driven ops, we wanted PagerDuty schedule changes to happen in the same place as the rest of our communication.

We’ve tried to make Pagerbot handle our previous scheduling woes gracefully. For instance, with Stripes scattered all around the world, juggling timezones is error-prone. If you don’t specify a timezone in your query, Pagerbot automatically uses the timezone configured in your PagerDuty profile.
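The timezone-defaulting behavior amounts to a simple fallback rule. As a minimal sketch (Pagerbot itself is written in Ruby; the helper name here is hypothetical):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

def localize(naive_dt, profile_tz="America/Los_Angeles", explicit_tz=None):
    """Attach a timezone to a parsed time: the one spelled out in the
    query if present, otherwise the user's PagerDuty profile timezone."""
    return naive_dt.replace(tzinfo=ZoneInfo(explicit_tz or profile_tz))

# "pagerbot: put me on call at 3pm" with no timezone mentioned
# resolves against the requesting user's profile timezone:
shift_start = localize(datetime(2014, 9, 26, 15, 0))
```

The key design choice is that the fallback comes from the user’s own PagerDuty profile rather than a single server-wide default, so two people in different offices can type the same query and each get the time they meant.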

Over time, we’ve added more commands to Pagerbot. For instance, based on Heroku’s incident response blog post, we added support for explicitly paging an individual:

Deploying to Heroku

Pagerbot supports both Slack and IRC. Although you can always run Pagerbot on your own infrastructure, we’ve also made it compatible with the new Heroku Button:


(Note: Heroku requires you to provide a credit card to enable the Heroku MongoDB addon, though you won’t actually be charged anything.)

Once you’ve deployed Pagerbot to Heroku, there’s a built-in admin panel you can use to get things set up. You’ll need to tell Pagerbot about your PagerDuty subdomain, your chat credentials, and any aliases you want for either people or schedules.

We’ve also tried to make it easy to add new commands to Pagerbot by building a simple plugin architecture. Feel free to fork Pagerbot and add your own plugins.
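A plugin in this style boils down to registering a handler for a chat-command prefix and dispatching incoming messages against the registry. Purely as an illustrative sketch of the pattern (Pagerbot’s real plugin API is Ruby, and these names are hypothetical):

```python
# Registry mapping a chat-command prefix to its handler function.
PLUGINS = {}

def plugin(command):
    """Decorator that registers a handler for a command prefix."""
    def register(fn):
        PLUGINS[command] = fn
        return fn
    return register

@plugin("who is on call")
def who_is_on_call(args):
    # A real plugin would query the PagerDuty API here.
    return "karl is on call for ops"

def dispatch(message):
    """Route a chat message to the first plugin whose prefix matches."""
    for command, handler in PLUGINS.items():
        if message.startswith(command):
            return handler(message[len(command):].strip())
    return None  # not a Pagerbot command; stay silent
```

Adding a command is then just a matter of writing one more decorated function, which keeps the bot core untouched as the command set grows.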

We’ve been using Pagerbot as our main interface to PagerDuty for over two years now. If you use PagerDuty and either Slack or IRC, we hope you’ll find it useful — check it out, and let us know what you think!

September 26, 2014