Access data in your data warehouse with Data Pipeline
Data Pipeline is a no-code product that sends all your Stripe data and reports to Snowflake or Amazon Redshift. This allows you to centralize your Stripe data with other business data to close your books faster and get more detailed business insights.
With Data Pipeline, you can:
- Automatically export your complete Stripe data in a fast and reliable manner.
- Stop relying on third-party extract, transform, and load (ETL) pipelines or home-built API integrations.
- Combine data from all your Stripe accounts into one data warehouse.
- Integrate Stripe data with your other business data for more complete business insights.
Data Pipeline currently supports Snowflake (deployed on AWS) and Amazon Redshift data regions. For additional information on supported instances, view the table below.
|AWS Region||Snowflake||Amazon Redshift RA3 (with encryption)||Amazon Redshift DS2/DC2|
|us-east-1 (N. Virginia)|
|us-west-1 (N. California)||Coming soon||Coming soon|
|ca-central-1 (Central Canada)||Coming soon||Coming soon||Coming soon|
If you’re using another data warehouse besides Snowflake or Amazon Redshift, let us know at firstname.lastname@example.org.
You can access your non-US Stripe data in Snowflake or Amazon Redshift as long as you export the data to a warehouse region Stripe supports. Data Pipeline doesn’t support any non-AWS instances yet, such as Google Cloud Storage (GCS), or Microsoft Azure.
Because of data localization requirements, Stripe doesn’t offer Data Pipeline services to customers, merchants, or users in India.
When you subscribe to Data Pipeline, Stripe sends a data share to your Snowflake or Amazon Redshift account. After you accept the data share, you can access your core Stripe data in Snowflake or Amazon Redshift within 12 hours. After the initial load, your Stripe data refreshes regularly.
Query Stripe data in your data warehouse
In Snowflake and Amazon Redshift, your data is available as secure views. To query your data, follow the steps below.
Your warehouse data is split into two database schemas, based on the API mode used to create the data.
|Data populated from live mode|
|Data populated from test mode|
Example use case
In some cases, you might want to combine information from your proprietary data with Stripe data. The following schema shows an
orders table that lists data about an order for a company:
|1||bt_xcVXgHcBfi83m94||John Smith||5||1 book|
The table above doesn’t contain data regarding transaction fees or payouts because that data is contained solely within Stripe. In Stripe, the
balance_transactions table contains the following information, but lacks proprietary data regarding customer names and items purchased:
To access your proprietary data alongside your Stripe data, combine the
orders table with Stripe’s
select orders.date, orders.order_no, orders.stripe_txn_no, bts.amount, bts.fee, bts.automatic_transfer_id from mycompany.orders join stripe.balance_transactions bts on orders.stripe_txn_no = bts.id;
After it completes, the following information is available:
Financial reports in Data Pipeline
To speed up your financial close, you can access Stripe’s financial reports directly in your data warehouse.
At this time, financial reports aren’t available for Amazon Redshift.
Financial report templates have a
FINANCIAL_REPORT prefix and are available as views in your data warehouse.
Generating financial reports in Snowflake
Generating financial reports from Data Pipeline requires setting a few custom variables. These are the same variables you set when generating the report through the dashboard or API:
START_DATE(varchar)—The starting date of the report (inclusive).
END_DATE(varchar)—The ending date of the report (exclusive).
TIMEZONE(varchar)—The time zone of non-UTC datetime columns.
To set these variables and run the report query:
- Create a new worksheet.
- Set the database schema and required variables to your desired values.
-- set schema based on the name you gave your Stripe database use schema db_name.stripe; -- set financial report template variables set (TIMEZONE, START_DATE, END_DATE) = ('UTC', '2021-09-01', '2021-10-01');
Run these lines of code separately before attempting to query tables that require them. Otherwise, you might receive an error that a session variable doesn’t exist.
If you’re using the Snowflake Connector for Python, set the session parameter
TIMEZONE. You can do this using the command
ALTER SESSION SET TIMEZONE = 'UTC'.
- After running the code that sets the necessary variables, query the view of the report you want to generate. For example, running:
select * from FINANCIAL_REPORT_BALANCE_CHANGE_FROM_ACTIVITY_ITEMIZED;
Yields the same results as you would find for the itemized balance change from activity report on the Stripe Dashboard or through the API:
Unsubscribing from Data Pipeline
If you currently have an active Data Pipeline subscription and want to cancel it, you can unsubscribe from Data Pipeline in the settings page of the Stripe Dashboard by clicking Unsubscribe. After you unsubscribe, you lose access to your data share immediately. To maintain your data tables, copy them to your local data warehouse instance before unsubscribing.