Schema Evolution with Avro and Protobuf Compatibility Contracts (Part 1)

Schema evolution is rarely where a Kafka program feels exciting, but it is where a lot of quiet production damage starts. A producer deploys a “small” payload change, one older consumer cannot deserialize it, and now the topic contract is broken in the middle of a rolling deployment.

Part 1 is about building the discipline before automation: a baseline contract, explicit compatibility rules, and mixed-version testing that proves the change is safe in a real deployment window.

The Design Problem Behind the Syntax

This is not fundamentally an Avro problem or a Protobuf problem. It is a coordination problem.

The real questions are:

which fields are safe to add
which changes break old readers or writers
how long old consumers are expected to coexist with new producers
whether the team treats the registry as a guardrail or as a box-checking step

flowchart LR
    A[Schema v1 in production] --> B[Producer proposes change]
    B --> C[Compatibility check]
    C --> D[Mixed-version test]
    D --> E[Safe rollout]

A contract is only trustworthy when both the registry and the running consumers agree.

A Safer Baseline Change

For Part 1, keep the change intentionally boring: add an optional field or one with a compatible default. That teaches the process without dragging the team into advanced compatibility edge cases too early.

For example:

message PaymentCreated {
  string payment_id = 1;
  int64 amount_minor = 2;
  string currency = 3;
  string merchant_id = 4;
}

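If merchant_id (field 4) is the new optional addition, older consumers built against the three-field schema can usually continue to read the record without crashing, because Protobuf parsers skip field numbers they do not recognize. That holds only when the compatibility mode and serializer behavior are aligned.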
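To make the "older consumers keep reading" claim concrete, here is a minimal sketch of how a Protobuf-style parser walks a payload and skips field numbers it does not know. It is a hand-rolled illustration, not the real Protobuf library: it handles only non-negative varints and length-delimited strings, and it ignores any framing bytes a registry-aware serializer would prepend. The helper names and sample values are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_string(field_no, value):
    data = value.encode("utf-8")
    # Wire type 2 = length-delimited
    return encode_varint((field_no << 3) | 2) + encode_varint(len(data)) + data


def encode_int(field_no, value):
    # Wire type 0 = varint
    return encode_varint(field_no << 3) + encode_varint(value)


def decode(buf, known_fields):
    """Decode a payload, keeping only field numbers listed in known_fields."""
    record, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_no, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_no in known_fields:
            record[known_fields[field_no]] = value
        # Unknown field numbers are consumed and skipped, not treated as errors.
    return record


# The old consumer only knows the first three fields.
V1_FIELDS = {1: "payment_id", 2: "amount_minor", 3: "currency"}

# A new producer writes all four fields (hypothetical sample values).
v2_payload = (
    encode_string(1, "p-1")
    + encode_int(2, 1250)
    + encode_string(3, "EUR")
    + encode_string(4, "m-9")
)

old_view = decode(v2_payload, V1_FIELDS)
```

In this sketch the old reader's view contains the three original fields and merchant_id is simply absent. Nothing crashes, which is exactly why additive changes make a good baseline.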

The Changes That Deserve More Fear

Teams get into trouble when they:

renumber fields
change meaning without changing names
remove fields still needed by older readers
switch a field from optional to effectively required during rollout

Those are not harmless refactors. They are contract changes with operational blast radius.

[!warning] A schema change that passes review because it “looks tiny” can still be the most dangerous change in the release if older consumers are still alive.
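Renumbering is a good example of why these changes deserve the fear. The sketch below uses a deliberately simplified model in which a record on the wire is just a mapping from field number to value, and each side translates numbers back to names with its own copy of the schema. The renumbered schema and the sample payment are hypothetical.

```python
# Simplified model: the wire carries {field_number: value}; names never travel.
V1 = {1: "payment_id", 2: "amount_minor", 3: "currency", 4: "merchant_id"}

# Hypothetical "cleanup" that swaps the numbers of two string fields.
V2_RENUMBERED = {1: "payment_id", 2: "amount_minor", 3: "merchant_id", 4: "currency"}


def write(record, schema):
    """Producer side: replace field names with the producer's field numbers."""
    number_of = {name: number for number, name in schema.items()}
    return {number_of[name]: value for name, value in record.items()}


def read(wire, schema):
    """Consumer side: map field numbers back to the consumer's field names."""
    return {schema[number]: value for number, value in wire.items() if number in schema}


payment = {
    "payment_id": "p-1",
    "amount_minor": 1250,
    "currency": "EUR",
    "merchant_id": "m-9",
}

wire = write(payment, V2_RENUMBERED)      # new producer
seen_by_old_consumer = read(wire, V1)     # consumer still on the old numbering
```

The old consumer raises no error here. It reads the merchant identifier as the currency and the currency as the merchant identifier, which is the kind of silent corruption that a crash would at least have made visible.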

Why Mixed-Version Testing Matters

A registry compatibility check is necessary, but it is not the whole proof.

You still want to test:

old consumer reading new producer data
new consumer reading historical data
at least one rolling deployment window where versions coexist

If you only test latest producer with latest consumer, you are testing a lab state that production rarely stays in.
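One way to keep those combinations honest is to enumerate them as a matrix instead of testing a single pair. The sketch below assumes Avro-style resolution rules in a heavily simplified form: fields are matched by name, and a reader field missing from the written record falls back to the reader's default or fails if there is none. The version labels, schemas, and sample records are hypothetical, and a real test would run the actual consumer builds rather than this model.

```python
from itertools import product

MISSING = object()  # sentinel: the reader declares no default for this field

# Reader schemas per consumer version: field name -> default value.
READERS = {
    "v1": {"payment_id": MISSING, "amount_minor": MISSING, "currency": MISSING},
    "v2": {
        "payment_id": MISSING,
        "amount_minor": MISSING,
        "currency": MISSING,
        "merchant_id": None,  # optional addition with a compatible default
    },
}

# Sample records as each producer version would write them.
WRITTEN = {
    "v1": {"payment_id": "p-1", "amount_minor": 1250, "currency": "EUR"},
    "v2": {"payment_id": "p-2", "amount_minor": 900, "currency": "USD", "merchant_id": "m-9"},
}


def resolve(record, reader_fields):
    """Project a written record onto a reader schema, applying defaults."""
    out = {}
    for name, default in reader_fields.items():
        if name in record:
            out[name] = record[name]
        elif default is not MISSING:
            out[name] = default
        else:
            raise ValueError(f"reader requires '{name}' but it was not written and has no default")
    return out  # fields the reader does not know are dropped


def run_matrix():
    """Try every writer/reader pairing and record the outcome."""
    results = {}
    for writer, reader in product(WRITTEN, READERS):
        try:
            resolve(WRITTEN[writer], READERS[reader])
            results[(writer, reader)] = "ok"
        except ValueError as exc:
            results[(writer, reader)] = f"FAIL: {exc}"
    return results


for pairing, outcome in run_matrix().items():
    print(pairing, outcome)
```

The pairings that matter most are the off-diagonal ones: old reader on new data, and new reader on historical data. If merchant_id had been added to the v2 reader without a default, the (v1 writer, v2 reader) cell would fail, which is the case a latest-with-latest test never exercises.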

Run It Locally

Prerequisites

Docker Desktop
Java 21
Kafka CLI tools

Local Stack

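services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.6.1
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # INTERNAL serves other containers (kafka:29092); EXTERNAL serves the host (localhost:9092)
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:29092,EXTERNAL://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:29092,EXTERNAL://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  # Needed for the registry checks on port 8081 in the verification flow
  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.1
    depends_on: [kafka]
    ports: ["8081:8081"]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:29092
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081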

docker compose up -d

Verification Flow

First verify the registry sees the expected latest version:

curl -s http://localhost:8081/subjects/payment-value/versions/latest

Then do the more important test: produce both versions and read them through the consumer version you actually intend to keep live during rollout.

That second check catches the real integration mistakes.
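The registry side of that flow can also be scripted, so a proposed schema is checked against the latest registered version before anything is deployed. The sketch below builds a request for the Schema Registry compatibility endpoint using only the Python standard library. It assumes the registry address and the payment-value subject from the curl command above; the function names are hypothetical, and error handling is left out (for example, the call fails with an HTTP error if the subject does not exist yet).

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumed: same address as the curl check
SUBJECT = "payment-value"               # assumed: same subject as the curl check


def build_compatibility_request(schema_text, schema_type="PROTOBUF",
                                base_url=REGISTRY_URL, subject=SUBJECT):
    """Build a POST that asks whether schema_text is compatible with the latest version."""
    body = json.dumps({"schema": schema_text, "schemaType": schema_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/compatibility/subjects/{subject}/versions/latest",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def is_compatible(request, timeout=5.0):
    """Send the request and return the registry's is_compatible verdict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response).get("is_compatible", False))
```

With the local stack running, calling is_compatible(build_compatibility_request(proposed_schema)) returns the registry's verdict for the proposed schema text. A true result still only covers the registry half of the proof; the produce-and-consume test with the consumer version you intend to keep live remains the more important one.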

Operational Guidance

Write down allowed versus forbidden changes

Do not leave compatibility rules as tribal knowledge. A short team policy is often enough:

adding optional fields is allowed
field renumbering is forbidden
semantic repurposing of existing fields is forbidden
removals require a migration plan
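A policy like this is easier to enforce when part of it is executable. The sketch below checks a proposed schema against the current one for the mechanical rules: removals, renumbering, and reuse of an existing field number under a new name. The schema representation (field name mapped to field number) is a hypothetical simplification, and treating number reuse as repurposing is an assumption of this sketch. Semantic repurposing in general cannot be detected mechanically and still needs a human reviewer.

```python
def check_policy(old, new):
    """Compare two schemas given as {field_name: field_number}.

    Returns a list of human-readable policy violations (empty if none).
    """
    violations = []

    for name, number in old.items():
        if name not in new:
            violations.append(f"removed field '{name}': requires a migration plan")
        elif new[name] != number:
            violations.append(f"renumbered field '{name}': {number} -> {new[name]}")

    # A new name on a number that already meant something is treated as repurposing.
    old_numbers = set(old.values())
    for name, number in new.items():
        if name not in old and number in old_numbers:
            violations.append(f"new field '{name}' reuses existing number {number}")

    return violations


current = {"payment_id": 1, "amount_minor": 2, "currency": 3}

additive = {**current, "merchant_id": 4}
renumbered = {"payment_id": 1, "amount_minor": 3, "currency": 2}

print(check_policy(current, additive))
print(check_policy(current, renumbered))
```

A check of this shape fits naturally into a pull-request pipeline next to the registry compatibility call, so the written policy and the enforced policy do not drift apart.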

Align subject naming and ownership

If nobody knows which subject belongs to which event stream, the registry becomes harder to trust during incidents.

Treat schema review like API review

Because that is what it is. A topic schema is not just serialization detail; it is an interface shared across teams and time.

What This Part Should Leave You With

After Part 1, the team should understand:

why additive, compatibility-preserving changes are the right baseline
why registry acceptance is necessary but insufficient
why mixed-version runtime tests are part of real contract safety

That baseline makes later schema governance and automation credible instead of ceremonial.
