Written by the Popsink team
Most Postgres-to-Snowflake pipelines are built the same way: extract on a schedule, stage to object storage, copy into Snowflake. It works until the moment someone asks why the inventory count is eight hours stale, or why the fraud model is scoring on yesterday's transactions.
The question is not whether you can move data from Postgres to Snowflake. Every team already does that. The real question is how fresh it needs to be, and what architecture can actually deliver that freshness in production without becoming a fragile, expensive, or hard-to-operate system.
Sub-second sync between Postgres and Snowflake is achievable. But it requires the right combination of WAL-based change data capture, a low-latency Snowflake ingestion path, and production-grade observability. This guide covers all three: the architectural tradeoffs, the setup mechanics, the monitoring signals that matter, and how to decide whether sub-second freshness is the right target for your workload.
Batch ETL was designed for a world where analytics ran overnight and no one expected Snowflake to reflect what happened in Postgres an hour ago. That world has largely disappeared. Operational analytics, customer-facing dashboards, AI feature stores, and fraud detection all require data that is current, not just recent.
The structural problems with batch ETL are not fixable by running the job more frequently:
Missed intermediate states. If a row is inserted and then deleted between two batch runs, it never appears in Snowflake at all. Short-lived records are invisible.
Missed deletes. Most batch pipelines perform full-table scans or timestamp-based extracts. Neither approach reliably captures hard deletes unless you add soft-delete logic to every source table.
Source load at scale. Frequent full-table scans add read pressure to the operational database. At high table volumes, this creates a direct tradeoff between freshness and source stability.
Scheduling blind spots. Any row written between the last extract and the current one is delayed by the full interval. At hourly cadence, the worst-case lag is 60 minutes. At daily, it is 24 hours.
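The first two failure modes are easy to demonstrate with a toy simulation (pure Python, no database involved — the table contents and event format are illustrative): a row inserted and deleted between two batch runs is invisible to a snapshot diff, while the change log records its full lifecycle.

```python
def batch_diff(prev, curr):
    """What a snapshot-based batch extract can see between two runs."""
    inserted = [k for k in curr if k not in prev]
    updated = [k for k in curr if k in prev and curr[k] != prev[k]]
    deleted = [k for k in prev if k not in curr]  # needs a full-key diff
    return inserted, updated, deleted

run1 = {1: "pending"}   # snapshot at batch run 1
run2 = {1: "shipped"}   # snapshot at batch run 2

# Between the two runs, the WAL recorded four changes, including a row
# (id=2) that was inserted AND deleted inside the interval:
wal_events = [("UPDATE", 1), ("INSERT", 2), ("UPDATE", 2), ("DELETE", 2)]

inserted, updated, deleted = batch_diff(run1, run2)
print(inserted, updated, deleted)   # [] [1] [] -- row 2 never existed, as far
                                    # as batch is concerned
print([e for e in wal_events if e[1] == 2])  # CDC saw row 2's full lifecycle
```

The snapshot diff reports a single update; the three events touching row 2 are simply gone. No batch frequency fixes this, because the row's entire lifetime fits inside one interval.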
The real problem: Batch ETL answers the wrong question. It asks "what changed since the last run?" CDC asks "what changed the moment it happened?"
The table below summarizes where each approach stands on the dimensions architects actually care about:
| Dimension | Batch ETL | CDC (Log-Based) | CDC via Kafka |
|---|---|---|---|
| End-to-end latency | Minutes to hours | Seconds to sub-second | Sub-second to seconds |
| Captures deletes | Unreliable | Yes, natively | Yes, natively |
| Source database load | High (full scans) | Minimal (WAL read) | Minimal (WAL read) |
| Intermediate row states | Missed | Captured | Captured |
| Operational complexity | Low | Low to medium | High |
| Schema evolution | Manual | Automatic (with right tool) | Automatic (with right tool) |
The decision is not batch versus real-time as a philosophy. It is a concrete question: what is the maximum acceptable lag between a Postgres commit and a queryable Snowflake row, and which architecture can meet that SLA reliably?
Sub-second latency is a system property, not a connector property. A tool can capture a WAL event in 50 milliseconds and still deliver data to Snowflake 30 seconds later if the ingestion path is wrong. Understanding the full pipeline is the only way to reason honestly about what latency is achievable.
Every Postgres-to-Snowflake sync passes through the same five stages. Latency accumulates at each one:
Postgres commit. The transaction is written to the WAL and acknowledged. This is your T=0. Nothing downstream can be faster than this.
CDC capture. The CDC connector reads the WAL via logical decoding, decodes the change event, and prepares it for transport. With a well-configured logical replication slot, capture latency is typically in the 10-50 millisecond range.
Transport and buffering. In a direct CDC-to-Snowflake architecture, this step is minimal. In a Kafka-based architecture, events pass through a broker and a sink connector before reaching Snowflake. Each hop adds latency and a potential failure point.
Snowflake ingestion. This is where most teams lose time. File-based Snowpipe stages data to object storage and polls for new files, adding 30-90 seconds of ingestion latency. Snowpipe Streaming uses the Snowflake Ingest SDK to write rows directly, bypassing the staging cycle and reducing ingestion latency to under 10 seconds.
Query availability. Data written via Snowpipe Streaming is queryable almost immediately after ingestion. There is no additional materialization delay for standard tables.
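As a back-of-envelope check, the five stages can be summed into a latency budget. The per-stage figures below are illustrative midpoints of the ranges quoted in this guide (an optimized, low-load configuration), not measurements:

```python
# Rough end-to-end latency budget across the five stages. Values are
# representative midpoints of the ranges cited in this guide.
STAGE_LATENCY_MS = {
    "postgres_commit": 0,       # T=0 reference point
    "cdc_capture": 30,          # logical decoding: ~10-50 ms
    "transport": 50,            # direct path; Kafka adds more per hop
    "snowpipe_streaming": 500,  # optimized; file-based is 30,000-90,000 ms
    "query_availability": 0,    # no extra materialization delay
}

total_ms = sum(STAGE_LATENCY_MS.values())
print(f"end-to-end budget: {total_ms} ms")  # sub-second is plausible

# Swap in file-based Snowpipe and the budget blows past any sub-second SLA:
file_based = {**STAGE_LATENCY_MS, "snowpipe_streaming": 60_000}
print(f"file-based budget: {sum(file_based.values())} ms")
```

The exercise makes the leverage obvious: four of the five stages together cost well under 100 ms, so the ingestion stage alone decides whether the pipeline lands under one second or over one minute.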
Key insight: Snowpipe Streaming is the single highest-leverage change you can make to reduce Snowflake-side latency. Switching from file-based Snowpipe to Snowpipe Streaming typically cuts ingestion lag from 30-90 seconds to under 10 seconds without any changes to the capture layer.
Strict sub-second end-to-end latency (Postgres commit to Snowflake query returning the row) is achievable in well-tuned systems, but it requires all five stages to be optimized simultaneously. In practice, the realistic latency targets look like this:
| Architecture | Typical End-to-End Latency |
|---|---|
| Batch ETL (hourly) | 1-60 minutes |
| CDC + file-based Snowpipe | 30-90 seconds |
| CDC + Snowpipe Streaming | 2-10 seconds |
| CDC + Snowpipe Streaming (optimized, low load) | Under 1 second |
The path to sub-second requires: WAL-based logical replication on the Postgres side, Snowpipe Streaming on the Snowflake side, minimal buffering in between, and a Snowflake virtual warehouse that is not suspended or queue-constrained. Snowflake's row timestamps feature, released in February 2026, makes it significantly easier to measure actual end-to-end latency by recording the precise commit time of each row, independent of client-side timestamps.
The capture layer is rarely the bottleneck. The ingestion path almost always is.
Choosing the right architecture is not about picking the most sophisticated option. It is about matching the design to the actual freshness requirement, the team's operational capacity, and the number of downstream consumers. Over-engineering toward sub-second when the business can tolerate five minutes is a real and common mistake.
Batch ETL remains the right choice for a narrower set of use cases than most teams realize. If the downstream consumers are weekly reports, monthly aggregates, or historical analyses where data from the previous business day is sufficient, batch is simpler to operate and cheaper to run. The problems start when teams try to stretch batch to cover operational use cases it was never designed for.
Direct CDC to Snowflake is the right default for most teams moving beyond batch. The CDC connector reads directly from the Postgres WAL and writes to Snowflake via Snowpipe Streaming. Fewer moving parts than Kafka-based architectures, lower operational overhead, and latency in the 2-10 second range under normal load. The tradeoff is that without a message broker in the middle, replay capability is limited to what the replication slot retains.
CDC via Kafka adds a message broker between the capture layer and Snowflake. This architecture is worth the complexity when you need fan-out to multiple destinations (Snowflake, Redshift, a real-time API, a feature store) from a single Postgres source, or when you need durable event replay with configurable retention. For teams whose only destination is Snowflake, the added operational surface area rarely justifies the latency and maintenance cost.
Use this table to match your requirements to the right architecture:
| Requirement | Batch ETL | Direct CDC | CDC via Kafka |
|---|---|---|---|
| Latency SLA: hours | Best fit | Overkill | Overkill |
| Latency SLA: minutes | Possible | Best fit | Acceptable |
| Latency SLA: seconds | Not viable | Best fit | Viable |
| Latency SLA: sub-second | Not viable | Viable (optimized) | Best fit (complex) |
| Must capture deletes | Requires workarounds | Native | Native |
| Single destination | Good fit | Good fit | Overkill |
| Fan-out to 3+ destinations | Separate pipelines | Separate pipelines | Best fit |
| Schema evolution needed | Manual | Automatic | Automatic |
| Small ops team | Best fit | Good fit | Avoid |
| High write throughput | Degrades | Handles well | Best fit |
The operational complexity column is where most architectural decisions go wrong. Teams evaluate latency in a demo environment and choose the lowest-latency option, then spend months maintaining a five-component streaming stack for a single Snowflake destination.
Direct CDC to Snowflake via Snowpipe Streaming is the right starting point for the majority of teams. It delivers seconds-level freshness, captures every change type natively, and does not require Kafka expertise to operate. Adding Kafka is a deliberate upgrade for specific requirements, not a default.
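The decision table can be collapsed into a rough rule of thumb. This sketch is illustrative only — the function name, inputs, and thresholds are ours, not a standard:

```python
# Illustrative decision helper mirroring the architecture-fit table above.
# Thresholds are rough cut-offs, not prescriptive values.
def recommend_architecture(latency_sla_s: float, destinations: int,
                           small_ops_team: bool) -> str:
    if destinations >= 3:
        return "cdc_via_kafka"   # fan-out is the strongest Kafka signal
    if latency_sla_s >= 3600:
        return "batch_etl"       # hours-level SLA: batch is simplest/cheapest
    if latency_sla_s < 1 and not small_ops_team:
        return "cdc_via_kafka"   # sub-second at scale, with ops capacity
    return "direct_cdc"          # the default for most teams

print(recommend_architecture(300, 1, True))    # minutes SLA, one destination
print(recommend_architecture(86400, 1, True))  # daily reports
print(recommend_architecture(60, 4, False))    # fan-out to four destinations
```

Note what the helper does not ask: which option has the lowest theoretical latency. Fan-out, SLA, and ops capacity decide the answer before raw latency ever enters the picture.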
A working demo and a production-ready pipeline are separated by three things: schema evolution handling, checkpointing, and fault recovery. Most tutorials cover the happy path. This section covers what actually needs to be true before you put a CDC pipeline in front of real workloads.
Postgres does not expose WAL changes to external consumers by default. You need to enable logical decoding, which requires four prerequisites:
```sql
-- 1. Confirm WAL level (requires restart to change)
SHOW wal_level;  -- must return 'logical'

-- 2. Create a dedicated replication user
CREATE ROLE replication_role REPLICATION LOGIN;
GRANT replication_role TO <your_cdc_user>;

-- 3. Set REPLICA IDENTITY to FULL on each table
-- (required to capture full before/after state on updates)
ALTER TABLE your_table REPLICA IDENTITY FULL;

-- 4. Create a publication scoped to the tables you want to replicate
CREATE PUBLICATION popsink_pub FOR TABLE orders, customers, products;
```
Also confirm that max_replication_slots and max_wal_senders are each set to at least the number of CDC connectors you plan to run. One slot per connector is the standard allocation. For managed Postgres services, the configuration path differs: AWS RDS requires setting rds.logical_replication = 1 via a parameter group; GCP Cloud SQL uses cloudsql.logical_decoding; most hosted Postgres providers expose these as first-class settings.
A production CDC connector does two things simultaneously on first connection: it takes an initial snapshot of existing table data (the backfill) and it opens the replication slot to begin capturing live changes. The critical requirement is that no events are missed during the transition from snapshot to streaming mode.
This is where many DIY implementations break. A naive implementation that completes the backfill first and then opens the slot will miss any changes that occurred during the snapshot window. A correct implementation opens the replication slot before the snapshot begins, buffers any changes that arrive during the snapshot, and replays them in order once the snapshot is complete.
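The correct ordering is easy to express as a toy simulation (no real replication slot involved; the function and event format are illustrative). The key is step 1: the buffer exists before the snapshot starts, so nothing committed during the snapshot window can be lost:

```python
# Toy simulation of the snapshot/streaming handoff: open the slot first,
# buffer live changes that arrive during the snapshot, replay them in order.
def sync_with_overlap(table, live_changes_during_snapshot):
    slot_buffer = []                  # 1. slot opened BEFORE the snapshot

    snapshot = dict(table)            # 2. consistent snapshot (the backfill)
    # changes committed while the snapshot runs land in the buffer:
    slot_buffer.extend(live_changes_during_snapshot)

    state = dict(snapshot)            # 3. replay buffered changes in order
    for op, key, value in slot_buffer:
        if op == "DELETE":
            state.pop(key, None)
        else:                         # INSERT or UPDATE
            state[key] = value
    return state

table = {1: "a", 2: "b"}
during = [("UPDATE", 1, "a2"), ("INSERT", 3, "c"), ("DELETE", 2, None)]
print(sync_with_overlap(table, during))  # {1: 'a2', 3: 'c'}
```

A naive implementation that opens the slot only after the snapshot completes would return the stale snapshot `{1: 'a', 2: 'b'}` and permanently miss all three changes.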
Schemas change. Developers add columns, rename fields, and occasionally change types. A CDC pipeline that crashes on a schema change is not a production pipeline.
The minimum viable schema evolution behavior is:
New column added upstream: The pipeline detects the schema change and propagates the new column to Snowflake automatically, with null backfill for historical rows.
Column type widened (e.g., INT to BIGINT): Handled automatically without pipeline restart.
Breaking change (e.g., column dropped, type narrowed): The pipeline should surface a clear error rather than silently corrupting data.
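That minimum viable behavior can be sketched as a classification function. The type names and the widening table here are illustrative assumptions, not an exhaustive policy:

```python
# Sketch of the minimum viable schema-evolution policy described above.
# The widening pairs are illustrative examples, not a complete list.
WIDENINGS = {("INT", "BIGINT"), ("FLOAT", "DOUBLE"), ("VARCHAR", "TEXT")}

def classify_schema_change(kind, old_type=None, new_type=None):
    """Return the pipeline's action for a detected schema change."""
    if kind == "add_column":
        return "propagate"        # add to Snowflake, null-backfill history
    if kind == "alter_type" and (old_type, new_type) in WIDENINGS:
        return "propagate"        # widening: no pipeline restart needed
    # drops, narrowings, renames: fail loudly rather than corrupt data
    return "halt_and_alert"

print(classify_schema_change("add_column"))                   # propagate
print(classify_schema_change("alter_type", "INT", "BIGINT"))  # propagate
print(classify_schema_change("drop_column"))                  # halt_and_alert
print(classify_schema_change("alter_type", "BIGINT", "INT"))  # halt_and_alert
```

The asymmetry is deliberate: additive and widening changes flow through silently, while anything lossy stops the pipeline and surfaces an error.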
The replication slot tracks the last consumed WAL position (the restart_lsn). A well-behaved CDC connector commits its offset after each successful write to Snowflake, not before. This ensures that in the event of a connector failure, it resumes from the last confirmed delivery rather than re-delivering from an earlier checkpoint and creating duplicates.
The risk to watch: If the connector goes offline and stops consuming the slot, Postgres retains WAL segments from that position forward. Left unchecked, this grows the disk footprint of the source database. Setting max_slot_wal_keep_size (Postgres 13+) caps this retention and prevents runaway disk usage.
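The difference between committing the offset before and after the write is easiest to see in a small replay simulation (illustrative only — not a real connector, and real systems also need downstream deduplication for the retried event):

```python
# Simulate a connector crash under two offset-commit orderings.
def run_connector(events, commit_after_write, fail_at_index):
    committed = -1        # last offset acknowledged back to the slot
    delivered = []
    for i, ev in enumerate(events):
        if not commit_after_write:
            committed = i            # commit BEFORE writing (wrong order)
        if i == fail_at_index:
            break                    # connector crashes mid-write
        delivered.append(ev)
        if commit_after_write:
            committed = i            # commit only after a confirmed write
    # on restart, the connector resumes from committed + 1
    resumed = events[committed + 1:]
    return delivered + resumed

events = ["e0", "e1", "e2", "e3"]
print(run_connector(events, commit_after_write=True, fail_at_index=2))
# all four events arrive (e2 is retried): ['e0', 'e1', 'e2', 'e3']
print(run_connector(events, commit_after_write=False, fail_at_index=2))
# 'e2' is lost forever: ['e0', 'e1', 'e3']
```

Commit-after-write trades a possible duplicate (which a MERGE or idempotent write absorbs) for a guarantee of no gaps; commit-before-write silently drops the in-flight event.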
For a detailed walkthrough of the Popsink Postgres source connector configuration, including publication setup and REPLICA IDENTITY requirements, the documentation covers the full prerequisites for each managed Postgres provider.
Most teams monitor whether the pipeline is running. Far fewer monitor whether it is delivering data on time. Those are different questions, and the second one is the one that matters for freshness SLAs.
1. Replication slot lag (Postgres side)
This is the first metric to instrument. A growing slot means the CDC consumer is falling behind, and Postgres is retaining WAL segments to compensate. Left unchecked, this can fill disk on the source database.
```sql
-- Check slot lag size
SELECT
  slot_name,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS lag_size
FROM pg_replication_slots;
```
Alert when slot lag exceeds 100 MB. At that threshold, something is wrong with consumption: either the connector is down, Snowflake is throttling writes, or the warehouse is suspended.
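The arithmetic behind pg_wal_lsn_diff is simple enough to replicate client-side, which is useful when your alerting code consumes raw LSNs rather than the pretty-printed size. A Postgres LSN is a 64-bit position written as two hex halves, `hi/lo`:

```python
# Client-side equivalent of pg_wal_lsn_diff for slot-lag alerting.
# An LSN like '0/16B3748' encodes (hi << 32) | lo.
def lsn_to_bytes(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slot_lag_bytes(current_wal_lsn: str, restart_lsn: str) -> int:
    return lsn_to_bytes(current_wal_lsn) - lsn_to_bytes(restart_lsn)

ALERT_THRESHOLD = 100 * 1024 * 1024   # the 100 MB threshold above

lag = slot_lag_bytes("1/30000000", "1/28000000")   # 128 MB of retained WAL
print(lag, lag > ALERT_THRESHOLD)
```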
2. Snowflake warehouse queue time (Snowflake side)
A suspended or undersized warehouse is the most common source of Snowflake-side latency. Snowpipe Streaming does not require a running warehouse for ingestion itself, but downstream queries and any MERGE operations do. If the warehouse is queuing work, data is queryable later than expected even if the ingestion completed on time.
3. End-to-end commit-to-query latency
This is the only metric that directly measures whether your freshness SLA is being met. The recommended approach is to insert a sentinel row with a known timestamp into a monitored table in Postgres, then query for it in Snowflake and measure the delta. Snowflake's row timestamps feature (GA February 2026) provides a native, server-side commit timestamp on each row, which eliminates the need to rely on application-side timestamps that may drift or be absent.
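A minimal harness for the sentinel check might look like the following. The two callables are injection points for your own Postgres insert and Snowflake lookup (both hypothetical stand-ins here, so the sketch stays free of driver-specific code); the stubs at the bottom exist only to make it self-contained:

```python
import time

def measure_commit_to_query(insert_sentinel, sentinel_visible,
                            timeout_s=60.0, poll_interval_s=0.25):
    """Insert a sentinel row in Postgres, poll Snowflake until it appears,
    and return the observed commit-to-query latency in seconds."""
    token = f"sentinel-{time.time_ns()}"
    insert_sentinel(token)             # e.g. INSERT into a monitored table
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if sentinel_visible(token):    # e.g. SELECT ... WHERE token = ...
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    raise TimeoutError(f"sentinel {token} not visible within {timeout_s}s")

# Stubbed demo: this fake 'pipeline' makes the sentinel visible immediately.
seen = set()
latency = measure_commit_to_query(seen.add, lambda t: t in seen,
                                  timeout_s=2.0, poll_interval_s=0.01)
print(f"observed latency: {latency:.3f}s")
```

Run on a schedule and exported as a gauge, this single number is your freshness SLA, measured end to end rather than inferred from component health.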
| Signal | Where to Check | Alert Threshold |
|---|---|---|
| Replication slot lag | `pg_replication_slots` | > 100 MB |
| WAL sender activity | `pg_stat_replication` | Sender lag > 30s |
| Snowflake ingestion latency | Snowflake ingestion history views | > your SLA |
| Warehouse queue depth | Snowflake query history / monitoring | Any sustained queuing |
| End-to-end latency (sentinel) | Custom query on monitored table | > your SLA |
| Error rate | Connector logs / observability platform | Any non-zero sustained rate |
| Row count drift | Source vs. destination row count comparison | > 0.01% divergence |
The most common failure pattern: A connector goes offline, the replication slot grows silently, and the team only notices when analysts report stale data hours later. Slot lag alerting catches this in minutes, not hours.
Integrating these metrics into your existing observability stack (Datadog, Grafana, CloudWatch) is straightforward via the Postgres system views and Snowflake's information schema. The key is setting alerts before you need them, not after the first incident.
The architecture described in this guide is achievable with open-source components: Debezium for capture, Kafka Connect for transport, a custom Snowpipe Streaming sink. Teams that have built this stack know the operational reality: connector version conflicts, schema registry maintenance, backfill restarts after failures, and a Kafka cluster that needs uptime guarantees of its own.
Managed CDC exists to remove that operational surface area. The question for architects is not whether managed is better in theory, it is whether the specific tool handles the production edge cases that matter.
Not all managed CDC platforms are equivalent. The capabilities that separate production-ready tools from demo-ready ones:
| Capability | Why It Matters in Production |
|---|---|
| Native Postgres logical decoding | Generic JDBC connectors miss deletes and intermediate states |
| Snowpipe Streaming as default ingest | File-based Snowpipe adds 30-90s of unnecessary latency |
| Concurrent backfill + live CDC | Naive sequential implementations miss events during snapshot |
| Automatic schema evolution | Manual schema handling is the most common source of pipeline downtime |
| Fault-tolerant checkpointing | Offset management determines whether failures cause duplicates or gaps |
| REPLICA IDENTITY FULL management | Without it, UPDATE events carry only the changed columns, not the full row |
| Materialized view on Snowflake | Preserves full change history while exposing a clean current-state view |
Popsink's Postgres source connector uses native PostgreSQL logical decoding rather than a generic JDBC driver. It handles REPLICA IDENTITY FULL, publication management, and schema evolution automatically, and it resumes from the last confirmed offset after failures without duplicating or dropping events.
On the Snowflake side, the Popsink Snowflake target connector uses Snowpipe Streaming as the default ingestion path, writes CDC updates as inserts to preserve the full change history, and constructs a materialized view that mirrors the current state of the source table. Snowflake's Time Travel is retained, so historical state is available for auditing and SCD2-style workflows.
The practical outcome for engineering teams: a Postgres-to-Snowflake CDC pipeline that handles production workloads without a Kafka cluster, without manual schema migrations, and without a dedicated engineer on call for connector failures. For teams evaluating the data platform ingestion use case, this translates directly to faster time-to-production and fewer failure modes at scale.
The managed CDC argument is not about outsourcing complexity. It is about not building and maintaining infrastructure that is not your core competency, so your engineering capacity goes toward the analytics and AI workloads that depend on fresh data, not the pipeline that delivers it.
Sub-second freshness is not the right target for every workload. Optimizing for it when the business does not require it adds cost (continuous Snowflake compute, higher ingestion rates) and operational complexity without a corresponding return.
Sub-second freshness earns its cost in a specific set of workloads:

Fraud and risk detection. Scoring models that operate on stale transaction data miss fraud patterns that resolve within seconds.
Customer-facing operational views. Order status, inventory availability, and support ticket state shown to end users should reflect what actually happened, not what happened an hour ago.
AI and ML feature stores. Embedding stores and retrieval-augmented generation (RAG) pipelines that draw context from Snowflake degrade in quality when the underlying data is stale.
Live financial and trading analytics. Position and exposure calculations that run on delayed data carry real risk.
Real-time personalization. Recommendation and pricing engines that adapt to user behavior need current state to be effective.
Batch or minutes-level freshness, by contrast, remains sufficient for:

Nightly or weekly business intelligence reports
Historical trend analysis and cohort studies
Data science model training on stable datasets
Compliance and audit logging where near-real-time is sufficient
Recommendation: Start by defining your freshness SLA before evaluating architecture. If the business can tolerate 5-minute lag, direct CDC with a modest flush interval is the right choice and costs significantly less to operate than a fully optimized sub-second pipeline. Reserve sub-second tuning for the specific tables and workloads that genuinely require it.
The right architecture is the one that meets your SLA reliably, not the one with the lowest theoretical latency. If you are ready to evaluate what a production-grade Postgres CDC to Snowflake pipeline looks like for your specific workload, book a demo with Popsink to see the full setup in a live environment.
One platform for real-time replication - SaaS, BYOC or on-prem. Fast time-to-value, enterprise controls and performance you can measure.