Announcing MoSQL

Nelson Elhage, February 5, 2013

Today, we are releasing MoSQL, a tool Stripe developed for live-replicating data from a MongoDB database into a PostgreSQL database. With MoSQL, you can run applications against a MongoDB database, but also maintain a live-updated mirror of your data in PostgreSQL, ready for querying with the full power of SQL.

Motivation

Here at Stripe, we use a number of different database technologies for both internal- and external-facing services. Over time, we've found ourselves with growing amounts of data in MongoDB that we would like to be able to analyze using SQL. MongoDB is great for a lot of reasons, but it's hard to beat SQL for easy ad-hoc data aggregation and analysis, especially since virtually every developer or analyst already knows it.

An obvious solution is to periodically dump your MongoDB database and re-import into PostgreSQL, perhaps using mongoexport. We experimented with this approach, but found ourselves frustrated with the ever-growing time it took to do a full refresh. Even if most of your analyses can tolerate a day or two of delay, occasionally you want to ask ad-hoc questions about "what happened last night?", and it's frustrating to have to wait on a huge dump/load refresh to do that. In response, we built MoSQL, enabling us to keep a real-time SQL mirror of our Mongo data.

MoSQL does an initial import of your MongoDB collections into a PostgreSQL database, and then continues running, applying any changes to the MongoDB server in near-real-time to the PostgreSQL mirror. The replication works by tailing the MongoDB oplog, in essentially the same way Mongo's own replication works.

Usage

MoSQL can be installed like any other gem:

$ gem install mosql

To use MoSQL, you'll need to create a collection map which maps your MongoDB objects to a SQL schema. We'll use the collection from the MongoDB tutorial as an example. A possible collection map for that collection would look like:

mydb:
  things:
    :columns:
      - _id: TEXT
      - x: INTEGER
      - j: INTEGER
    :meta:
     :table: things
     :extra_props: true

Save that file as collections.yaml, start a local mongod and postgres, and run:

$ mosql --collections collections.yaml

Now, run through the MongoDB tutorial, and then open a psql shell. You'll find all your Mongo data now available in SQL form:

postgres=# select * from things limit 5;
           _id            | x | j |   _extra_props   
--------------------------+---+---+------------------
 50f445b65c46a32ca8c84a5d |   |   | {"name":"mongo"}
 50f445df5c46a32ca8c84a5e | 3 |   | {}
 50f445e75c46a32ca8c84a5f | 4 | 1 | {}
 50f445e75c46a32ca8c84a60 | 4 | 2 | {}
 50f445e75c46a32ca8c84a61 | 4 | 3 | {}
(5 rows)

mosql will continue running, syncing any further changes you make into Postgres.

For more documentation and usage information, see the README.

mongoriver

MoSQL comes from a general philosophy of preferring real-time, continuously-updating solutions to periodic batch jobs.

MoSQL is built on top of mongoriver, a general library for MongoDB oplog tailing that we developed. Along with the MoSQL release, we have also released mongoriver as open source today. If you find yourself wanting to write your own MongoDB tailer, to monitor updates to your data in near-realtime, check it out.