Modernization succeeds or fails on how you move data - we compare SFTP, batch ETL/ELT, and CDC across ownership, source load, TCO, and risk. See why the 'safe' options quietly create bottlenecks and costs, and what actually scales without punishing production systems.
Sep 1, 2025
In many transformation projects, the question of how to move the data comes too late. Architectural choices are made, platforms are selected, and only then does the realization set in: without the right integration strategy, the entire initiative is at risk.
Whether the goal is to feed a cloud warehouse, enable AI initiatives, or reduce reliance on legacy infrastructure, data movement is the invisible backbone of modernization. It determines cost, risk, ownership, and political feasibility - and yet it is often overlooked until it becomes the bottleneck.
For highly regulated industries such as banking and insurance, this oversight is particularly damaging. Reliance on outdated methods like SFTP exports has left many data teams locked out of modernization altogether, unable to deliver the real-time capabilities their organizations demand.
Three approaches dominate enterprise practice today: file-based transfers (SFTP), batch ETL/ELT, and Change Data Capture (CDC). Each can deliver results, but each carries very different consequences in terms of technical feasibility, IT ownership, total cost of ownership (TCO), risk, and long-term architectural fit.
This article examines the outcomes and risks of each method, and what they mean for leaders responsible for modernization at scale.
File-based transfers (SFTP)
How it works:
The oldest and most common pattern: source systems export CSV or flat files on a schedule, often overnight. These files are deposited on an SFTP server and then picked up by downstream processes for ingestion.
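To make the pattern concrete, here is a minimal sketch of the downstream pickup job, written in Python with paramiko; the hostname, directory, credentials, and file naming are placeholders rather than a reference implementation:

```python
# Illustrative sketch of a nightly SFTP pickup job.
# Host, path, credentials, and file name are placeholders.
import csv
import paramiko

SFTP_HOST = "sftp.example.internal"   # assumed export server
REMOTE_DIR = "/exports/crm"           # assumed drop location for nightly CSVs

def pick_up_export(filename: str) -> list[dict]:
    """Download one exported CSV from the SFTP drop zone and parse it."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(SFTP_HOST, username="etl_user", password="***")  # placeholder credentials
    try:
        sftp = client.open_sftp()
        local_path = f"/tmp/{filename}"
        sftp.get(f"{REMOTE_DIR}/{filename}", local_path)  # fails silently downstream if the export never ran
        with open(local_path, newline="") as f:
            return list(csv.DictReader(f))
    finally:
        client.close()

rows = pick_up_export("customers_2025-09-01.csv")
# A separate step would bulk-load `rows` into the warehouse; if the file is late
# or its columns change, everything downstream simply waits or breaks.
```

Everything the rest of this section describes - late files, schema drift, manual replays - lands inside scripts like this one.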
Outcomes:
On paper, this approach solves the problem of access: data leaves the source system in a usable form, and downstream teams can load it into a warehouse or analytical environment. It is widely supported, easy to explain, and in some cases provides the only integration mechanism available for legacy applications.
Politics and ownership:
Behind its apparent simplicity, SFTP creates political friction. IT is responsible for provisioning, securing, and maintaining the SFTP server, as well as configuring exports, managing failures, and handling replays. Business or data teams remain dependent on IT for every operational issue - from lag to broken exports. Worse, because the data arrives through a slow, brittle, and siloed channel, all downstream processes built on top of SFTP are treated as second-class. They are tolerated but rarely prioritized, seen as less reliable than “official” systems of record. SFTP makes it easy to hand off accountability, but it leaves ownership weak on every side.
Total cost of ownership (TCO):
The upfront cost is negligible: an SFTP server and some scripts. But hidden costs accumulate quickly: manual monitoring, failed loads, one-off fixes, and repeated reprocessing. What begins as a “cheap” solution often becomes a drain on scarce IT resources.
Risk:
The risks are significant. Pipelines are brittle under schema drift. Auditability is weak: lineage often stops at “a file was dropped.” Recovery is manual and uncertain. SFTP exports also reinforce silos, since each downstream consumer receives their own copy without a shared governance model. Strategically, it is a dead end: no path leads from SFTP to modern data practices and architectures.
Bottom line:
SFTP pipelines persist because they are familiar and technically feasible, but they lock enterprises out of modernization. In industries where many processes still depend on overnight file drops, data teams remain cut off from timely insights and advanced analytics. What looks like a conservative, low-risk option in fact increases operational risk, entrenches dependency on IT, and condemns downstream processes to be treated as second-class citizens in the enterprise data landscape.
Batch ETL/ELT
How it works:
In batch ETL or ELT, data is extracted directly from transactional databases or SaaS applications at scheduled intervals. The extract may run heavy SQL queries, or call APIs repeatedly to pull records. With ETL, the data is transformed before loading into the target system; with ELT, the raw extract is loaded first, and transformations are handled downstream.
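As an illustration, a nightly ELT-style extract might look like the following Python sketch; the connection strings, table names, and one-day watermark are assumptions, not a prescribed setup:

```python
# Illustrative nightly batch extract (ELT flavour): pull yesterday's changed rows
# straight from the production database, then land the raw extract in the warehouse.
# Connection strings, table names, and the `updated_at` watermark are placeholders.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("postgresql://readonly:***@prod-db.example.internal/orders")
warehouse = create_engine("postgresql://loader:***@warehouse.example.internal/analytics")

# This query runs against the transactional system and competes with live traffic
# for CPU, I/O, and locks - the "direct load" that DBAs push back on.
extract_sql = """
    SELECT order_id, customer_id, status, amount, updated_at
    FROM orders
    WHERE updated_at >= NOW() - INTERVAL '1 day'
"""

df = pd.read_sql(extract_sql, source)

# ELT: load the raw extract as-is; transformations run later in the warehouse.
# ETL would instead reshape `df` here, before loading.
df.to_sql("raw_orders", warehouse, if_exists="append", index=False)
```

The scheduling window, the watermark logic, and the load the query places on the source are exactly where the tensions described below originate.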
Outcomes:
Batch ETL/ELT remains the backbone of many enterprise data warehouses. It provides structured, governed pipelines, with transformations centralized for consistency. Mature tooling and SaaS platforms make it relatively easy to adopt across a wide range of sources. For traditional BI and reporting use cases, it delivers dependable outcomes.
Politics and ownership:
Despite its maturity, batch ETL creates political friction. DBAs and IT operations teams often resist extraction jobs because they put direct load on production systems - consuming CPU, memory, and locking resources. Data teams want fresh data; IT teams want stable systems. Ownership is split: IT manages credentials, connectors, and schedules, while data teams try to define transformations, but neither side is fully satisfied. This tension frequently surfaces during audits or incidents, when the extraction process is scrutinized as a source of instability.
Total cost of ownership (TCO):
The economics of batch ETL look reasonable at first. SaaS ETL vendors make connectors easy to buy and deploy, and open-source frameworks reduce licensing costs. But TCO grows with volume and complexity: API rate limits require workarounds, transformations multiply, and scheduling windows become harder to manage. Over time, what starts as a straightforward pipeline evolves into a complex web of dependencies with mounting maintenance costs.
Risk:
Risks are significant at scale. Extraction impacts the performance of transactional systems, sometimes during critical business operations. Pipelines are fragile under schema changes or API updates. Latency is built-in: data is only as fresh as the last batch. Replay and lineage are limited, often requiring custom logging or vendor-specific features. From a compliance perspective, auditability is better than with SFTP but still partial, especially when queries or APIs fail silently.
Bottom line:
Batch ETL/ELT has served enterprises for decades, but it was designed for an era of static reporting, not real-time operations. It creates friction between IT stability and data agility, consumes resources on both sides, and increasingly struggles to keep pace with modern requirements. Persisting with batch ETL means accepting a structural trade-off: reliable nightly reports, but no foundation for monitoring, AI, or modern data services. What once felt like the default choice is now a bottleneck.
Change Data Capture (CDC)
How it works:
Change Data Capture (CDC) captures inserts, updates, and deletes directly from a database’s transaction logs or journals. Instead of querying the database, CDC reads the log of committed transactions, then streams those changes to downstream systems in near real time. This makes it fundamentally different from batch approaches: CDC follows the database’s own record of truth rather than competing with it.
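For illustration, the sketch below consumes log-derived change events, assuming they are published as Debezium-style JSON envelopes on a Kafka topic; the topic name, brokers, and envelope fields are assumptions about one common setup, not the only way to run CDC:

```python
# Illustrative consumer for log-based CDC events, assumed to arrive as
# Debezium-style JSON envelopes on a Kafka topic. Topic name, brokers,
# and field names are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "prod.public.orders",                          # assumed change topic for the orders table
    bootstrap_servers=["kafka.example.internal:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",                  # replay from a known position if needed
)

for message in consumer:
    event = message.value
    op = event.get("op")          # "c" = insert, "u" = update, "d" = delete
    before, after = event.get("before"), event.get("after")

    # Each event was read from the database's transaction log, not produced by a
    # query, so the source system does no extra work to serve this consumer.
    if op in ("c", "u"):
        print(f"upsert order {after['order_id']} -> status={after['status']}")
    elif op == "d":
        print(f"delete order {before['order_id']}")
```

The point of the sketch is the shape of the data: each event carries the operation plus before/after images taken from the log, so downstream consumers follow the database's own record of truth instead of querying it.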
Outcomes:
CDC provides what other methods cannot: fresh, transaction-level data with minimal delay and minimal impact on the source. It enables real-time fraud detection, instant risk monitoring, customer-facing digital services, and AI/ML pipelines that depend on up-to-the-second data. Beyond analytics, it also supports operational needs such as zero-downtime cloud migrations or maintaining live replicas for disaster recovery.
Politics and ownership:
CDC does require deeper collaboration with IT and DBAs, since it needs access to database logs or journals. That initial negotiation can be politically sensitive, but once access is granted, it often reduces friction. Unlike ETL, CDC does not compete with production workloads, which relieves DBAs from the constant pushback against extraction jobs. Ownership shifts: IT sets up the initial configuration, but data teams gain autonomy over consuming and transforming streams without interfering with source systems.
Total cost of ownership (TCO):
CDC can involve a higher upfront commitment: specialized tooling, licensing, and setup are required. But over time, TCO is substantially lower than with batch ETL. There are no batch windows to maintain, no nightly jobs to troubleshoot, and no heavy queries taxing production systems. Once running, CDC pipelines are largely automated, scaling with data volumes without proportional increases in maintenance effort.
Risk:
CDC is the most robust of the three approaches. It captures changes exactly as they occur, with full lineage back to the underlying transactions - a critical advantage for regulated industries. Because it relies on transaction logs, recovery is straightforward: replay from a specific log position. Schema evolution is still a challenge, but it is handled more systematically than in batch pipelines. From a compliance and auditability perspective, CDC offers the strongest guarantees.
Bottom line:
CDC is not just another integration method - it is the strategic enabler of modernization. For IT, it reduces load on transactional systems and eliminates the nightly tension between operations and data teams. For compliance, it provides auditable, transaction-level lineage. For data teams, it unlocks real-time use cases that were previously impossible. Enterprises that invest in CDC are not only solving today’s integration challenges - they are laying the foundation for architectures that will support the next decade of transformation.
Each of these approaches can deliver in isolation:
- SFTP provides access when no other option exists.
- Batch ETL/ELT centralizes data for BI and reporting.
- CDC streams changes as they happen for real-time use cases.
But the differences become decisive once you measure them across the dimensions that matter: technical feasibility, politics and ownership, TCO, modernity, and risk.
At a glance:
- SFTP is a tactical workaround that entrenches dependency and risk. Ownership is fragmented: IT manages exports, data teams own the pulls, yet no one owns the whole pipeline.
- Batch ETL/ELT may have been acceptable for BI, but its direct load on production systems makes it hard to scale safely in critical environments.
- CDC is the only method that reduces source load, improves auditability, and lays a foundation for modernization.
In many programs, data movement is treated as a detail to sort out after platform choices. Deciding late has outsized effects on steady-state cost, operational risk, ownership boundaries, and the ability to meet modern architectural expectations.
SFTP-based transfers preserve access where integration options are limited. They also split ownership in ways that are hard to sustain: IT provisions and operates exports, controls replay and lag handling, and remains the gatekeeper for fixes, while data teams depend on what arrives and when. Because files sit outside primary pipelines and standards, downstream work built on them is brittle and often unfit for production-grade use. The result is persistent friction, weak lineage, and a long tail of manual effort.
Batch ETL and ELT centralize data for analytics with mature tooling. Their defining characteristic is direct extraction against production systems. As volumes and freshness expectations rise, batch windows shrink and resource contention increases - CPU, I/O, and lock time on transactional databases become material and visible to operations. This model aligns with periodic reporting, but its impact on source systems makes it difficult to scale safely in production environments.
Change Data Capture reads database logs or journals. Source impact is typically low because it avoids query-based extraction; lineage ties back to transactions; and recurring operational work is reduced once pipelines are in place. Upfront setup and licensing costs can be higher, but nightly jobs and heavy extracts disappear once the pipelines are running, which shifts the long-run cost curve.
Bottom line: the three methods lead to different steady states. SFTP maintains access but embeds friction and broken ownership of ingestion. Batch ETL/ELT consolidates analytics while amplifying load on production systems as scale and freshness needs grow. CDC minimizes source impact and strengthens lineage, trading higher initial setup for a lower operational burden over time and a closer fit with contemporary architectural expectations.