ETL vs. ELT pipelines: When each data processing model works best

Data Pipeline

Stripe Data Pipeline sends all your up-to-date Stripe data and reports to Snowflake or Amazon Redshift in a few clicks.

  1. Introduction
  2. ETL vs. ELT pipelines: What’s the difference?
  3. How does ETL work?
  4. How does ELT work?
  5. How do you choose between ETL and ELT?
    1. Data volume and velocity
    2. Transformation ownership
    3. Raw data access
    4. Governance and compliance
    5. Team workflow
  6. How do data connectors fit into ETL and ELT pipelines?

If you’re building a data pipeline to feed a business intelligence tool or train a machine learning model, one of the earliest decisions you need to make is where the transformation happens: before the data lands in your warehouse or after. This means you have to choose between extract, transform, load (ETL) and extract, load, transform (ELT) pipelines.

The choice determines how quickly analysts can access raw data, how much compute you’re spending where, and how easily you can adapt when your data model changes. The global market for data pipeline tools is projected to be worth $48.33 billion by 2030, and ETL is losing ground to ELT and other types of pipelines.

Below, we’ll go over how each model works, when ETL or ELT makes more sense, and how purpose-built connectors fit into both approaches for specific data sources.

Highlights

  • ETL transforms data before it reaches the destination. ELT loads raw data and transforms it inside the warehouse using its own compute.

  • ELT is the default pattern for most modern data teams. ETL remains popular for constrained destinations, strict governance requirements, and legacy pipelines.

  • Purpose-built connectors for specific data sources can reduce the risk of data quality issues that generic connectors sometimes introduce.

ETL vs. ELT pipelines: What’s the difference?

ETL stands for “extract, transform, load.” In an ETL model, you pull data from a source system, run it through a transformation layer that cleans, reshapes, and filters it, and then load the processed result into a destination, such as a data warehouse or data mart.

With ETL, a separate processing engine does the heavy lifting before data reaches the destination. The warehouse receives structured data that is ready to be queried rather than the raw source records. Because the warehouse never holds those raw records, reprocessing means either re-extracting from the source or maintaining a separate raw copy somewhere else.

ELT, which stands for “extract, load, transform,” reverses the last two steps. Raw data lands at the destination first, and then transformations run inside the warehouse using its own compute.

That shift became practical when cloud data warehouses made it cheap and fast to run Structured Query Language (SQL) transformations across large datasets. Instead of a separate transformation engine sitting between the source and destination, your warehouse is the transformation environment.

How does ETL work?

An ETL pipeline runs in three stages, each with a distinct responsibility.

Here’s how ETL works:

  • Extract: A connector or ingestion job pulls data from the source, such as a database, a payment application programming interface (API), a software-as-a-service (SaaS) platform, or a flat file. This step handles authentication, pagination, incremental versus full refresh logic, and rate limits.

  • Transform: The extracted data moves into a processing layer where it gets cleaned (e.g., nulls handled, types cast), reshaped (e.g., joins, aggregations, column renaming), filtered, and sometimes enriched with reference data from other sources.

  • Load: The processed dataset loads into the destination as a full overwrite, an upsert, or an append, depending on how the pipeline is configured.
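
The three stages above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the source records, the `charges` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the destination. SQLite's `INSERT OR REPLACE` is used here as a simple stand-in for an upsert.

```python
import sqlite3

# Hypothetical source records; a real extract step would call an API or database.
SOURCE_ROWS = [
    {"id": "ch_1", "amount": "1250", "currency": "USD", "status": "succeeded"},
    {"id": "ch_2", "amount": "400", "currency": None, "status": "succeeded"},
    {"id": "ch_3", "amount": "900", "currency": "eur", "status": "failed"},
]


def extract():
    """Extract: pull records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: filter rows, cast types, and handle nulls before loading."""
    cleaned = []
    for row in rows:
        if row["status"] != "succeeded":
            continue  # filter out charges that did not succeed
        amount = int(row["amount"])  # cast the string amount to an integer
        currency = (row["currency"] or "unknown").lower()  # handle null currency
        cleaned.append((row["id"], amount, currency))
    return cleaned


def load(conn, rows):
    """Load: write processed rows to the destination, replacing rows by id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charges ("
        "id TEXT PRIMARY KEY, amount INTEGER, currency TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO charges (id, amount, currency) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in order; return the destination rows."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT id, amount, currency FROM charges ORDER BY id"
    ).fetchall()
```

Because the load step is keyed on `id`, running `run_etl` twice against the same connection leaves the destination unchanged. Note that the failed charge never reaches the destination, which is the point made earlier: only the processed result lands in the warehouse.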

ETL still makes sense to use in these specific situations:

  • Constrained destinations: Older data warehouses or on-premises systems with limited compute can’t efficiently run large transformations in place, so loading pretransformed data is the only realistic option.

  • Strict governance requirements: Some organizations need to ensure that personally identifiable information or sensitive financial data never lands in the warehouse in raw form. Transforming and masking upstream satisfies that requirement architecturally.

  • Edge environments: Internet of Things (IoT) and embedded systems that push data to central infrastructure often can’t send large raw payloads, which makes local transformation before transmission a practical necessity.

  • Legacy pipeline debt: Many mature organizations have ETL pipelines that have run reliably for years. Rebuilding them as ELT carries a risk that might not be worth taking.

How does ELT work?

ELT pipelines extract and load raw or lightly formatted source data directly to the destination. Transformations run afterward, triggered on a schedule or on demand, using the warehouse’s own compute.

For example, a connector can extract data from a source and write it to a raw schema in your warehouse. An analytics engineer writes dbt (data build tool) models that read from that raw schema, apply business logic, and create cleaned tables that business intelligence (BI) tools can query. When the logic needs to change, they just update the model and run it again against data that’s already there.
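
As a rough sketch of that flow, the Python below loads untyped records into a raw table and then builds a cleaned table with SQL that runs inside the database. It is an illustration under stated assumptions: an in-memory SQLite database stands in for a cloud warehouse, plain SQL strings stand in for models that a tool such as dbt would manage, and the table names, records, and `MODEL_V1`/`MODEL_V2` definitions are hypothetical.

```python
import sqlite3

# Hypothetical raw records, loaded as text exactly as they came from the source.
RAW_RECORDS = [
    {"id": "ch_1", "amount": "1250", "currency": "USD", "status": "succeeded"},
    {"id": "ch_2", "amount": "400", "currency": "EUR", "status": "refunded"},
    {"id": "ch_3", "amount": "900", "currency": "usd", "status": "failed"},
]

# First version of the business logic: only succeeded charges count.
MODEL_V1 = """
SELECT id, CAST(amount AS INTEGER) AS amount, LOWER(currency) AS currency
FROM raw_charges
WHERE status = 'succeeded'
"""

# Revised logic: refunded charges are now included as well.
MODEL_V2 = """
SELECT id, CAST(amount AS INTEGER) AS amount, LOWER(currency) AS currency
FROM raw_charges
WHERE status IN ('succeeded', 'refunded')
"""


def load_raw(conn, records):
    """Load: write source records untransformed into a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_charges ("
        "id TEXT, amount TEXT, currency TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_charges (id, amount, currency, status) VALUES (?, ?, ?, ?)",
        [(r["id"], r["amount"], r["currency"], r["status"]) for r in records],
    )
    conn.commit()


def build_model(conn, select_sql):
    """Transform: rebuild the cleaned table from raw data already loaded."""
    conn.execute("DROP TABLE IF EXISTS clean_charges")
    conn.execute("CREATE TABLE clean_charges AS " + select_sql)
    conn.commit()
    return conn.execute(
        "SELECT id, amount, currency FROM clean_charges ORDER BY id"
    ).fetchall()
```

Calling `build_model(conn, MODEL_V1)` and later `build_model(conn, MODEL_V2)` rebuilds the cleaned table without touching the source again, because `raw_charges` still holds every record, including the ones the first model filtered out.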

ELT is a smart choice for these reasons:

  • Faster access to raw data: Data lands in the warehouse quickly, so teams can start exploring and validating it before the transformation layer is finished.

  • Cheaper improvements: Changing a transformation in ETL often means modifying a pipeline, reprocessing, and reloading. In ELT, you just update a SQL model and run it again against data you already have.

  • Warehouse-scale compute: Cloud warehouses are built to process large datasets fast, which means you’re applying serious compute to transformation rather than scaling a separate middleware layer alongside the warehouse.

  • Full raw history: Because the raw data is always in the warehouse, you can derive any downstream table from scratch again, which matters when business definitions change or when you have to debug a data quality issue.

How do you choose between ETL and ELT?

Large organizations typically run both data processing models in parallel for different sources. Your organization’s needs will help you decide which method is the right fit.

Here’s what you need to consider.

Data volume and velocity

High-volume, high-frequency data generally favors an ELT data pipeline. Cloud warehouses handle it well, and the cost of running a heavy transformation engine upstream grows fast. As a trade-off, your warehouse compute bill increases when transformations are heavy, and storing raw data carries costs.

Lower-volume pipelines with stable, well-defined transformation logic might stay on ETL without much pain.

Transformation ownership

If your transformations are SQL-based and owned by analytics engineers who need to adapt quickly, ELT with dbt or a similar tool is likely the better fit.

If your transformations are encoded in Python or Java, and they’re owned by a data engineering team, ETL might fit naturally into your existing toolchain.

Raw data access

If analysts need access to raw source data (e.g., for ad hoc exploration, auditing, or because transformation logic is still developing), ELT is the better default. ETL pipelines that don’t preserve raw data make that kind of access hard to retrofit.

Governance and compliance

If specific data must be masked or anonymized before it reaches a shared destination, ETL gives you architectural enforcement of that requirement.
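
One way to picture upstream masking is a transform step that drops or pseudonymizes sensitive fields before anything is loaded. The sketch below is illustrative only: the field names, the `SECRET_KEY` value, and the helper functions are hypothetical, and keyed hashing is pseudonymization rather than full anonymization, so whether it meets a given compliance requirement is an assumption to verify.

```python
import hashlib
import hmac

# Hypothetical key; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Return a keyed SHA-256 digest of a normalized string value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def mask_record(record, pii_fields=("email",), drop_fields=("card_number",)):
    """Drop or pseudonymize sensitive fields before the record is loaded."""
    masked = {}
    for field, value in record.items():
        if field in drop_fields:
            continue  # this field never leaves the transformation layer
        if field in pii_fields and value is not None:
            masked[field] = pseudonymize(value)
        else:
            masked[field] = value
    return masked
```

Because the digest is deterministic for a given key, the masked `email` can still serve as a join key across tables in the destination even though the raw address is never stored there.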

ELT can satisfy many of the same requirements through column-level security and warehouse-level access controls, but the implementation differs and is worth reviewing against your compliance obligations.

Team workflow

A team with deep SQL fluency and version control habits around dbt will extract more value from an ELT setup than from maintaining a custom ETL pipeline they didn’t build. Compatibility with how your team works is a legitimate factor.

How do data connectors fit into ETL and ELT pipelines?

Generic ETL and ELT tools handle a wide range of sources through configurable connectors. That breadth is useful, but it comes with trade-offs.

Payments data is a good example. Stripe’s data model covers events, charges, refunds, disputes, balance transactions, customers, subscriptions, and invoices. A generic connector that doesn’t keep pace with Stripe’s API versioning or that handles incremental sync logic incorrectly can produce errors that are hard to catch until they affect a dashboard or a financial reconciliation.
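
To make the incremental sync risk concrete, here is a minimal sketch of cursor-based syncing. It does not describe how any particular connector or the Stripe API works: the `updated` field, the in-memory source list, and the dictionary destination are all hypothetical. The sketch re-reads records at the cursor boundary and writes them keyed by `id`, so a record that shares a timestamp with the last synced one is not skipped, and re-reading a record is harmless.

```python
def fetch_since(source, cursor):
    """Hypothetical source query: records updated at or after the cursor."""
    return [record for record in source if record["updated"] >= cursor]


def sync(source, destination, cursor):
    """Upsert changed records by id and return the advanced cursor."""
    for record in fetch_since(source, cursor):
        destination[record["id"]] = record  # idempotent write keyed by id
        cursor = max(cursor, record["updated"])
    return cursor
```

If `fetch_since` used a strict `>` comparison instead, a record written with the same `updated` value as the cursor after the previous run would never be picked up, which is the kind of silent gap that tends to surface later in a reconciliation.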

Stripe Data Pipeline addresses this directly:

  • Built and maintained by Stripe: Stripe owns the connector, which mirrors internal Stripe infrastructure directly, so you can sync complete, accurate data reliably on an ongoing basis, no matter how much data you have and with no API rate limits.

  • Direct warehouse sync: Data moves to Amazon Redshift, Snowflake, Databricks, and more, without routing through a third-party connector, which keeps sensitive financial data out of additional vendor infrastructure.

  • Complete Stripe dataset: The sync covers Stripe objects, prebuilt financial reports, and curated datasets to accelerate analysis and reporting.

  • Lightweight setup: The connection is configured in the Stripe Dashboard, with no code required. If you’re already running ELT pipelines for other sources, Stripe data drops into the same pattern.

The content in this article is for general information and education purposes only and should not be construed as legal or tax advice. Stripe does not warrant or guarantee the accuracy, completeness, adequacy, or currency of the information in the article. You should seek the advice of a competent lawyer or accountant licensed to practise in your jurisdiction for advice on your particular situation.
