Data pipelines used to follow a strict sequence: pull data from source systems, transform it into a clean, structured format, then load the finished product into a database. That sequence (extract, transform, load, or ETL) made sense when storage was expensive and destination systems weren’t powerful enough to do heavy computation.
But cloud warehouses changed all of that, and ELT (extract, load, transform) became the pattern many analytics teams now default to. While traditional ETL maintains a nearly 40% market share, it’s quickly losing ground to other data pipeline types such as ELT.
Below, we discuss what ELT is, how it works, and where it does and doesn’t fit.
Highlights
An ELT (extract, load, transform) process loads raw data into a cloud warehouse first, then transforms it. This gives analysts faster access to data and more flexibility to improve models.
Cloud warehouses are well-suited to ELT because they separate storage from compute, grow elastically, and let analysts write transformation logic in Structured Query Language (SQL).
Working with a payments provider that syncs directly to your warehouse without routing through third-party infrastructure reduces pipeline complexity and security exposure.
What is ELT, and how does it work?
ELT (extract, load, transform) is a data integration pattern built around three steps: extracting data from source systems, loading it raw into a central destination, then transforming it inside the destination for analytics, reporting, or machine learning.
Here’s how a typical ELT workflow runs in practice:
Extract: A connector or pipeline job pulls records from a source system (incrementally, on a schedule, or triggered by events) and writes them to the destination in their original structure.
Load: Raw tables land in the warehouse, often in a dedicated schema, preserving field names, data types, and relationships exactly as they existed in the source.
Transform: Transformation logic runs inside the warehouse as SQL-based tools, such as data build tool (dbt), let teams control versions of those models, test them, and build datasets on top of the raw layer.
Because raw data lands first, analysts can query it immediately, and multiple downstream views can be built from a single source without extracting again. If a transformation model is wrong, you rewrite it and run it again against data that’s already in the warehouse.
How does ELT differ from ETL in practice?
The core architectural difference between ETL and ELT is where transformation happens: outside the destination in ETL, inside the destination in ELT.
Here are the biggest differences between ETL and ELT:
Data availability: In an ETL pipeline, data can’t be queried until it’s been transformed and loaded. In ELT, raw data lands first, so analysts can work with it while transformation jobs run separately.
Improvement speed: Changing a transformation in an ETL pipeline often means rebuilding the pipeline itself. In ELT, you update a SQL model and re-run it against data that’s already in the warehouse.
Where compute lives: ETL transformations run on the pipeline infrastructure (a separate server or service you provision for that purpose). ELT transformations run on data warehouse compute, which scales elastically and is already paid for.
Raw data retention: ETL typically discards intermediate states once the transformed output is produced. ELT preserves the raw layer by default, which means you can reprocess historical data when requirements change.
Why does ELT fit modern cloud data warehouses?
The following properties make ELT a natural fit for cloud warehouses:
Separated storage and compute: Platforms such as Snowflake and BigQuery charge separately for storing data and querying it. You can land as much raw data as you want, and transformation jobs only consume compute when they run.
Elastic scaling: You can create the compute you need, run the job, and reduce it again. The warehouse handles parallelization.
Columnar storage: Cloud warehouses store data in columnar format, which makes analytical queries (aggregations, filters, joins across large datasets) dramatically faster than row-based storage.
SQL native: Most analysts and analytics engineers already know SQL. Running transformations inside the warehouse means the people who understand the business logic can write and own the transformation models directly.
When is ELT a good fit?
ELT isn’t the right answer for every data pipeline, but it fits a recognizable set of conditions.
Here’s where ELT tends to work well:
Large or fast-growing datasets: If you’re syncing millions of events per day, running transformations before loading requires significant pipeline infrastructure. Loading raw and transforming inside the warehouse is more practical at scale.
Frequently changing transformation logic: If transformations live inside the warehouse as SQL models, updating analytics requirements doesn’t require redeploying pipeline infrastructure.
Multiple downstream use cases: Raw data in the warehouse can feed a business intelligence dashboard, a machine learning feature store, and an operations report simultaneously. You build one pipeline and model the data differently for each user.
Teams with analytics engineering capability: ELT shifts transformation ownership to the analytics layer. Teams using tools such as dbt are well-suited for this.
ELT isn’t a good fit under the following circumstances:
Regulatory constraints on raw data: Some healthcare and financial data environments prohibit landing sensitive data before masking or filtering. In those cases, transformations need to happen upstream.
Rigid destination schemas: If your destination can’t accommodate raw or semi-structured data, ELT’s load-first approach creates more problems than it solves.
Small, stable datasets: If a simple ETL pipeline does the job, adding warehouse transformation infrastructure is overhead without much benefit.
What challenges come with ELT?
Landing raw data in a warehouse is powerful, but it can come with governance and cost challenges.
Consider these potential challenges of ELT.
Access control
Raw tables often contain data that shouldn’t be broadly accessible, such as personally identifiable information (PII), financial records, or internal identifiers. In ETL, sensitive fields can be masked or dropped before data ever reaches the destination. But in ELT, the data lands first. Without row-level security, column masking, or tightly scoped warehouse roles, analysts can query fields they shouldn’t be able to see.
Data quality
Bad data from upstream sources lands in your warehouse before any checks run. Null fields, duplicate records, schema changes, and type mismatches can propagate into downstream models if you don’t build quality tests into the transformation layer with dbt’s test framework or warehouse-native data quality features.
Cost management
Transformation jobs run on warehouse compute, and poorly written SQL (such as unbounded scans, missing partition filters, or redundant full-table refreshes) can generate substantial costs. Setting limits on query costs, using incremental models where possible, and monitoring compute usage are worth building into your workflow from the start.
How does an ELT workflow handle payments data?
Many teams build a custom integration against their payments provider’s application programming interface (API) or use a third-party connector. But custom integrations require ongoing maintenance as the API changes, and third-party connectors mean routing sensitive financial data through an additional vendor’s infrastructure.
Stripe Data Pipeline is a native option built directly into the Stripe Dashboard. It syncs Stripe data and reports to Snowflake, Databricks, Amazon Redshift, and more in a few clicks. There’s no code, no connector configuration, and no separate credentials to manage. The sync runs using Stripe’s own infrastructure, so no third-party systems handle your financial data in transit.
A few specifics worth knowing:
Historical coverage: Stripe Data Pipeline includes historical data from the start of your Stripe account rather than from the point you enable the sync.
Schema: Data lands in tables that mirror Stripe’s data model (e.g., charges, customers, subscriptions), which makes it straightforward to build transformation models on top without extensive raw-layer cleanup.
Data completeness: In addition to raw object data, Stripe Data Pipeline includes access to prebuilt financial reports and curated datasets, which can accelerate reporting and reduce transformation work for common use cases such as monthly recurring revenue (MRR) and fraud analysis.
O conteúdo deste artigo é apenas para fins gerais de informação e educação e não deve ser interpretado como aconselhamento jurídico ou tributário. A Stripe não garante a exatidão, integridade, adequação ou atualidade das informações contidas no artigo. Você deve procurar a ajuda de um advogado competente ou contador licenciado para atuar em sua jurisdição para aconselhamento sobre sua situação particular.