Configure Kafka Schema Registry with Avro serialization for data processing

Intermediate · 45 min · May 17, 2026
Applies to: Ubuntu 24.04 · Debian 12 · AlmaLinux 9 · Rocky Linux 9

Set up Confluent Schema Registry with Avro serialization to manage schemas and ensure data compatibility in your Kafka streaming applications. This guide covers installation, schema management, and producer/consumer configuration.

Prerequisites

  • Apache Kafka running on port 9092
  • Java 8 or higher installed
  • Basic understanding of Kafka topics and producers/consumers

What this solves

Schema Registry provides centralized schema management for Kafka topics, preventing data format mismatches between producers and consumers. Avro serialization gives you compact binary encoding with schema evolution support, essential for production streaming applications where data formats change over time.
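To make schema evolution concrete, the sketch below (plain Avro, no Kafka or registry required; assumes the Avro library is on the classpath, and uses a trimmed-down hypothetical User schema rather than the full one from this guide) writes a record with an old schema and reads it back with a newer schema that adds a field with a default:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class EvolutionDemo {
    public static void main(String[] args) throws Exception {
        // v1: the "old" writer schema
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[" +
            "{\"name\":\"id\",\"type\":\"long\"}]}");
        // v2: the "new" reader schema adds a defaulted field (a BACKWARD-compatible change)
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[" +
            "{\"name\":\"id\",\"type\":\"long\"}," +
            "{\"name\":\"email\",\"type\":\"string\",\"default\":\"unknown\"}]}");

        // Serialize a record with the old schema
        GenericRecord oldRecord = new GenericData.Record(v1);
        oldRecord.put("id", 42L);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v1).write(oldRecord, enc);
        enc.flush();

        // Deserialize with the new schema: the missing field takes its default
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord newRecord =
            new GenericDatumReader<GenericRecord>(v1, v2).read(null, dec);
        System.out.println(newRecord); // {"id": 42, "email": "unknown"}
    }
}
```

This writer-schema/reader-schema resolution is exactly what a consumer does when it fetches the producer's schema from the registry and decodes into its own version.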

Step-by-step installation

Install Java and required dependencies

Schema Registry requires Java 8 or higher to run properly.

# Ubuntu / Debian
sudo apt update
sudo apt install -y openjdk-11-jdk wget curl

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y java-11-openjdk-devel wget curl

Create dedicated user and directories

Run Schema Registry as a non-root user for security and create the necessary directories.

sudo useradd --system --create-home --shell /bin/false kafka
sudo mkdir -p /opt/schema-registry /var/log/schema-registry
sudo chown kafka:kafka /var/log/schema-registry

Download and install Confluent Schema Registry

Download the latest Schema Registry from Confluent and extract it to the installation directory.

cd /tmp
wget https://packages.confluent.io/archive/7.5/confluent-community-7.5.0.tar.gz
tar -xzf confluent-community-7.5.0.tar.gz
sudo mv confluent-7.5.0 /opt/schema-registry
sudo chown -R kafka:kafka /opt/schema-registry

Configure Schema Registry properties

Create the main configuration file, /opt/schema-registry/etc/schema-registry/schema-registry.properties, with Kafka connection settings and the storage backend.

listeners=http://0.0.0.0:8081
kafkastore.bootstrap.servers=localhost:9092
kafkastore.topic=_schemas
kafkastore.topic.replication.factor=1
schema.registry.group.id=schema-registry

# Security settings
kafkastore.security.protocol=PLAINTEXT

# Schema compatibility settings
schema.compatibility.level=BACKWARD

# Logging
log4j.configuration=file:/opt/schema-registry/etc/schema-registry/log4j.properties

Configure logging

Set up the log4j configuration at /opt/schema-registry/etc/schema-registry/log4j.properties for proper log management and rotation.

log4j.rootLogger=INFO, stdout, file

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/schema-registry/schema-registry.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.logger.kafka=WARN
log4j.logger.org.apache.kafka=WARN

Create systemd service file

Create /etc/systemd/system/schema-registry.service so Schema Registry starts automatically on boot and is managed by systemd.

[Unit]
Description=Confluent Schema Registry
After=network.target kafka.service
Requires=kafka.service

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/schema-registry/bin/schema-registry-start /opt/schema-registry/etc/schema-registry/schema-registry.properties
ExecStop=/opt/schema-registry/bin/schema-registry-stop
Restart=on-failure
RestartSec=5
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Environment=SCHEMA_REGISTRY_HEAP_OPTS="-Xmx512M"
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Start and enable Schema Registry

Enable the service to start automatically on boot and start it now.

sudo systemctl daemon-reload
sudo systemctl enable schema-registry
sudo systemctl start schema-registry
sudo systemctl status schema-registry

Configure Avro serialization settings

Install Avro tools and libraries

Install the necessary Avro libraries for working with schemas and serialization.

cd /opt/schema-registry
sudo -u kafka wget https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.11.3/avro-tools-1.11.3.jar -P lib/

Create example Avro schema

Define a sample user schema (saved here as user.avsc) to demonstrate Avro serialization patterns.

{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    {
      "name": "id",
      "type": "long"
    },
    {
      "name": "username",
      "type": "string"
    },
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "created_at",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      }
    },
    {
      "name": "active",
      "type": "boolean",
      "default": true
    }
  ]
}
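Before registering the schema, it can be sanity-checked locally. A rough sketch, assuming the Avro jar is on the classpath and the schema was saved as user.avsc in the working directory:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;
import java.io.File;

public class SchemaCheck {
    public static void main(String[] args) throws Exception {
        // Parse the schema file; throws SchemaParseException if the Avro is invalid
        Schema schema = new Schema.Parser().parse(new File("user.avsc"));

        // Build a record that exercises every field in the schema
        GenericRecord user = new GenericData.Record(schema);
        user.put("id", 1001L);
        user.put("username", "john_doe");
        user.put("email", "john@example.com");
        user.put("created_at", System.currentTimeMillis());
        user.put("active", true);

        // Serialize to Avro binary to confirm the record matches the schema
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, enc);
        enc.flush();
        System.out.println("Serialized size: " + out.size() + " bytes");
    }
}
```

A record of this shape serializes to a few dozen bytes, which is the compact binary encoding Avro is used for; the same data as JSON would be several times larger.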

Register the schema

Register your Avro schema with the Schema Registry using the REST API.

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"record\", \"name\": \"User\", \"namespace\": \"com.example.avro\", \"fields\": [{\"name\": \"id\", \"type\": \"long\"}, {\"name\": \"username\", \"type\": \"string\"}, {\"name\": \"email\", \"type\": \"string\"}, {\"name\": \"created_at\", \"type\": {\"type\": \"long\", \"logicalType\": \"timestamp-millis\"}}, {\"name\": \"active\", \"type\": \"boolean\", \"default\": true}]}"}' \
  http://localhost:8081/subjects/user-value/versions

Create and manage schemas

List all registered schemas

View all subjects and their registered schemas in the registry.

curl -X GET http://localhost:8081/subjects
curl -X GET http://localhost:8081/subjects/user-value/versions

Set compatibility levels

Configure schema evolution compatibility to control how schemas can change over time.

# Set global compatibility level
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://localhost:8081/config

Set subject-specific compatibility

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL"}' \
  http://localhost:8081/config/user-value

Test schema compatibility

Validate schema changes before registering new versions to prevent breaking changes.

# Test compatibility of a new schema version
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"record\", \"name\": \"User\", \"namespace\": \"com.example.avro\", \"fields\": [{\"name\": \"id\", \"type\": \"long\"}, {\"name\": \"username\", \"type\": \"string\"}, {\"name\": \"email\", \"type\": \"string\"}, {\"name\": \"created_at\", \"type\": {\"type\": \"long\", \"logicalType\": \"timestamp-millis\"}}, {\"name\": \"active\", \"type\": \"boolean\", \"default\": true}, {\"name\": \"phone\", \"type\": [\"null\", \"string\"], \"default\": null}]}"}' \
  http://localhost:8081/compatibility/subjects/user-value/versions/latest

Producer and consumer configuration

Configure Java producer with Avro

Set up a Kafka producer to serialize messages using the registered Avro schema.

# Kafka cluster connection
bootstrap.servers=localhost:9092
acks=all
retries=3
max.in.flight.requests.per.connection=1

Avro serialization

key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081

Performance settings

batch.size=16384
linger.ms=5
buffer.memory=33554432
compression.type=snappy

Configure Java consumer with Avro

Set up a Kafka consumer to deserialize Avro messages using the schema registry.

# Kafka cluster connection
bootstrap.servers=localhost:9092
group.id=avro-consumer-group
auto.offset.reset=earliest
enable.auto.commit=false

Avro deserialization

key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://localhost:8081
specific.avro.reader=true

Performance settings

fetch.min.bytes=1024
fetch.max.wait.ms=500
max.poll.records=500
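The consumer properties above map directly to code. A minimal GenericRecord consumer sketch, assuming kafka-clients and the Confluent Avro serializer are on the classpath and using the user-events topic from later in this guide (it reads generically, so specific.avro.reader is left unset):

```java
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AvroConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "avro-consumer-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");

        try (Consumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("user-events"));
            while (true) {
                ConsumerRecords<String, GenericRecord> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, GenericRecord> record : records) {
                    GenericRecord user = record.value();
                    System.out.printf("key=%s id=%s username=%s%n",
                        record.key(), user.get("id"), user.get("username"));
                }
                consumer.commitSync(); // manual commit, since enable.auto.commit=false
            }
        }
    }
}
```

The deserializer fetches the writer's schema from the registry by the id embedded in each message, so the consumer never needs a local copy of the .avsc file.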

Create sample Java producer code

Example producer implementation that uses the registered User schema.

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericData;
import java.io.File;
import java.util.Properties;

public class AvroProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");
        
        // Load the User schema (adjust the path to where you saved the .avsc file)
        Schema userSchema = new Schema.Parser().parse(new File("user.avsc"));
        
        Producer<String, GenericRecord> producer = new KafkaProducer<>(props);
        
        // Create Avro record
        GenericRecord user = new GenericData.Record(userSchema);
        user.put("id", 1001L);
        user.put("username", "john_doe");
        user.put("email", "john@example.com");
        user.put("created_at", System.currentTimeMillis());
        user.put("active", true);
        
        ProducerRecord<String, GenericRecord> record = 
            new ProducerRecord<>("user-events", "user-1001", user);
        
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.printf("Sent record to topic=%s partition=%d offset=%d%n",
                    metadata.topic(), metadata.partition(), metadata.offset());
            } else {
                exception.printStackTrace();
            }
        });
        
        producer.close();
    }
}

Test with command line tools

Use Kafka's built-in console tools to test Avro message production and consumption.

# Create topic
kafka-topics.sh --create --topic user-events \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 1

Start Avro console consumer

kafka-avro-console-consumer --bootstrap-server localhost:9092 \
  --topic user-events \
  --property schema.registry.url=http://localhost:8081 \
  --from-beginning

Verify your setup

# Check Schema Registry status
sudo systemctl status schema-registry
curl -X GET http://localhost:8081/subjects

Verify schema registration

curl -X GET http://localhost:8081/subjects/user-value/versions/1

Check compatibility settings

curl -X GET http://localhost:8081/config
curl -X GET http://localhost:8081/config/user-value

Test connectivity

telnet localhost 8081

Common issues

Symptom | Cause | Fix
Schema Registry won't start | Kafka not running | Start Kafka first: sudo systemctl start kafka
"Subject not found" error | Schema not registered | Register the schema via the REST API or check the subject name
Serialization errors | Schema mismatch | Verify schema compatibility and field types
Connection refused on 8081 | Service not listening | Check logs: sudo journalctl -u schema-registry -f
OutOfMemoryError | Insufficient heap size | Increase SCHEMA_REGISTRY_HEAP_OPTS in the systemd service
Schema evolution errors | Incompatible changes | Use the compatibility test endpoint before registering

Next steps

Running this in production?

Want this handled for you? Setting this up once is straightforward. Keeping it patched, monitored, backed up and performant across environments is the harder part. See how we run infrastructure like this for European teams.


Need help?

Don't want to manage this yourself?

We handle managed devops services for businesses that depend on uptime. From initial setup to ongoing operations.