Exactly Once Semantics Myths Versus Practical Guarantees (Part 2)

Part 1 established the uncomfortable but useful truth: Kafka exactly-once semantics stops at Kafka’s boundary. Part 2 is what most real systems need next. If your processor writes to a database, cache, or external side-effect sink, you need an idempotent sink design that can absorb replay without corrupting state.

This is where the broader workflow becomes trustworthy. Not because Kafka suddenly expanded its guarantee, but because the external side is now designed to tolerate re-execution.

The Pattern We Need Outside Kafka

The usual shape is simple:

identify the logical event with a stable event id
check whether that id has already been applied
apply the business side effect and mark the id as processed in the same local transaction

That final clause matters. The dedupe marker and the side effect need to move together, or the sink can still end up inconsistent.

if (alreadyProcessed(eventId)) {
    return;
}

applyBusinessUpdate();
markProcessed(eventId);

The code is intentionally small, but the idea is strong: make replay cheap and safe.

Why the Dedupe Write Belongs Beside the Side Effect

Suppose you process PaymentSettled into a relational database.

If the service:

updates the payment row
crashes before recording eventId as processed

then replay can apply the same state transition again. The dedupe marker must live in the same local atomic unit as the business update.

That is why a processed-event table is so common in sink designs.

A More Honest Guarantee Statement

With this pattern in place, the system can say something more accurate:

Kafka provides transactional visibility for Kafka-to-Kafka work
the sink provides idempotent application of external side effects

That combination is what many teams actually mean when they say they want end-to-end correctness under replay.

It is still not magic. It is two explicit guarantees joined carefully.

Local Baseline

Prerequisites

Docker Desktop
Java 21
Kafka CLI tools

Local Stack

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.6.1
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

docker compose up -d

Dedupe Table

create table processed_event (
  event_id varchar(64) primary key,
  processed_at timestamp not null default now()
);

This table is not bookkeeping overhead. It is the evidence that the sink knows what it has already accepted.

The Right Failure Drill

Replay the same event batch twice and inspect final state, not just message counts.

psql -c "select count(*) from processed_event;"

A healthy result looks like:

business state unchanged on the second replay
no duplicate row effects
one durable processed marker per logical event

That is a much better test than “we did not notice duplicates in the logs.”

[!important] Idempotent sinks protect outcomes, not only executions. The real question is whether the final business state stays stable under replay.

Common Mistakes

Using unstable event identity

If the dedupe key changes between retries or replays, the sink loses the whole point of the pattern.

Recording the marker outside the business transaction

That reintroduces a crash window and weakens the guarantee.

Assuming every sink can use the same approach

Databases, HTTP APIs, and object stores may each need a slightly different form of idempotency, even if the concept is shared.

What This Part Should Leave You With

After Part 2, the team should understand:

why Kafka EOS still needs help once external side effects exist
how an idempotent sink closes that gap
why the dedupe marker and business side effect must be recorded together

That is the practical route from scoped Kafka guarantees to a workflow that can survive replay without losing integrity.

Find posts and pages