Oracle to ClickHouse® CDC with Debezium 3.5 and Kafka 4: A Complete Beginner's Guide

This is Part 3, the final part, of our series on streaming database changes into the ClickHouse® database with Debezium and Kafka. Part 1 covered PostgreSQL and Part 2 covered MySQL. Here we connect Oracle Database to ClickHouse.

Oracle is the most involved of the three, so set your expectations accordingly. The Kafka and ClickHouse half of the pipeline is identical to the previous parts, but Oracle requires several preparation steps before Debezium can read changes: enabling archive logging, enabling supplemental logging, creating a privileged user for LogMiner, and supplying the Oracle JDBC driver. We will walk through each one and explain why it is needed.

This guide is self-contained, so you can follow it without having read Parts 1 and 2. No prior experience with Debezium, Kafka, or ClickHouse is assumed.

What you will build

A Change Data Capture (CDC) pipeline that copies every insert, update, and delete from Oracle into ClickHouse in near real time.

The flow is: Oracle records every change in its redo log. Debezium uses a built-in Oracle tool called LogMiner to read that redo log and turn each change into an event. Apache Kafka stores those events durably. The ClickHouse Kafka Connect Sink reads them from Kafka and writes them into a ClickHouse table. Change a row in Oracle, and the same change appears in ClickHouse a moment later.

Everything runs locally in Docker.

What is Change Data Capture, in plain English

Say your application stores orders in Oracle. Customers place, edit, and cancel orders all day. Your analytics team wants to run heavy reports, but running them on the production Oracle database would compete with the application for resources.

Copying the whole database to an analytics system every night works, but the data is always stale and the full copy is wasteful. Change Data Capture is the smarter approach: it watches for changes and copies only what actually changed, as it happens, so the analytics copy stays seconds behind production rather than a full day.

CDC is light on the database because it does not run queries against your tables to find changes. Every database keeps a log of every change for crash recovery. In Oracle this is the redo log. Debezium reads it through LogMiner, so it adds little load to Oracle.

Why Oracle and ClickHouse are a great pair

Oracle is a powerful transactional database that runs many of the largest enterprise systems in the world. It is built for correctness under heavy concurrency, handling many small transactions reliably.

ClickHouse is built for analytics: scanning billions of rows and aggregating them in milliseconds. Running large analytical queries directly on Oracle is expensive, both in performance and often in licensing. Streaming changes into ClickHouse lets you keep Oracle as the system of record while doing your heavy reporting on a fast, cost-effective analytical database.

The tools and the exact versions

Pinning specific, compatible versions is what makes this actually run. Do not assume a different tag behaves the same way.

Component	Role	Image and version
Oracle Database Free	Source database	`gvenzl/oracle-free:23-slim` (Oracle 23ai Free)
Oracle JDBC driver	Required by the Oracle connector	`ojdbc11` (from Maven Central)
Apache Kafka	Event log / transport	`apache/kafka:4.1.0` (KRaft mode, no ZooKeeper)
Debezium	Oracle source connector	`quay.io/debezium/connect:3.5`
ClickHouse Kafka Connect Sink	Loads events into ClickHouse	`v1.3.7`
ClickHouse	Analytics database	`clickhouse/clickhouse-server:26.3` (LTS)

A few notes. Debezium 3.5 (specifically 3.5.2.Final, released 2026-06-02) is built and tested against Kafka 4.1. Debezium 3.5 supports Oracle 19c, 21c, 23ai, and 26ai, and for 23ai it uses LogMiner. We use the community gvenzl/oracle-free image because it is the easiest way to run a real Oracle database locally; its 23-slim tag currently provides Oracle 23ai Free, and the image is multi-architecture, so it also runs on Apple Silicon. ClickHouse 26.3 is the current Long Term Support release. Kafka 4 uses KRaft and has no ZooKeeper.

One important licensing point: Debezium cannot ship the Oracle JDBC driver inside its image because of Oracle's license terms, so we download the driver ourselves and mount it in. This is normal and expected for Oracle.

Prerequisites

You need Docker and Docker Compose, plus roughly 6 GB of free memory, since Oracle is heavier than PostgreSQL or MySQL. Allow extra time on first start: Oracle takes a few minutes to initialize.

How Oracle CDC differs from the others

If you followed Parts 1 and 2, here is what changes.

Oracle does not let Debezium read the redo log directly. Instead, Debezium uses LogMiner, an Oracle feature that reads the redo log on Debezium's behalf. For LogMiner to see enough detail, the database must be running in archive log mode and must have supplemental logging turned on, which records the full before-and-after of each changed row. Neither is on by default, so we enable both.

Oracle is also a multitenant database. The gvenzl/oracle-free image gives you a container database (a CDB) named FREE, with a pluggable database (a PDB) inside it named FREEPDB1. Your application tables live in the PDB, while the LogMiner user is a special common user that lives across the whole container database. Debezium needs to be told both names.

Finally, Oracle gives us a clean version number for free. Every change in Oracle has a System Change Number, or SCN, which always increases. That is exactly what ClickHouse's ReplacingMergeTree wants, so versioning is even simpler than it was for MySQL.

Everything else, the entire Kafka and ClickHouse side, is the same as the previous parts.

Step 1: Prepare a project folder

mkdir oracle-to-clickhouse-cdc
cd oracle-to-clickhouse-cdc
mkdir -p connect-plugins oracle-driver

Step 2: Download the ClickHouse Kafka Connect Sink

cd connect-plugins
curl -L -o clickhouse-kafka-connect.zip \
  https://github.com/ClickHouse/clickhouse-kafka-connect/releases/download/v1.3.7/clickhouse-kafka-connect-v1.3.7.zip
unzip clickhouse-kafka-connect.zip
rm clickhouse-kafka-connect.zip
cd ..

Step 3: Download the Oracle JDBC driver

The Debezium connect image includes the Oracle connector but not the Oracle JDBC driver. Download the latest ojdbc11 JAR from Maven Central into the oracle-driver folder. Browse the Maven directory for the current version and grab the JAR:

# List available versions at:
#   https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc11/
# then download the latest, for example:
cd oracle-driver
curl -L -O \
  https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc11/23.7.0.25.01/ojdbc11-23.7.0.25.01.jar
cd ..

If that exact version no longer exists, open the Maven directory link above and substitute the newest ojdbc11 version. We will mount this JAR into the Oracle connector's library folder so Debezium can connect.
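
Before moving on, it can help to confirm the JAR actually landed in the folder that the Compose file will mount. The following is a minimal Python sketch, not part of the pipeline itself. The helper name find_driver_jars is our own invention, and it assumes you run it from the project folder and that the driver file name starts with ojdbc.

```python
from pathlib import Path


def find_driver_jars(folder: str = "oracle-driver") -> list[str]:
    """Return the names of ojdbc JAR files found directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("ojdbc*.jar") if p.is_file())


if __name__ == "__main__":
    jars = find_driver_jars()
    if jars:
        print("Found driver:", ", ".join(jars))
    else:
        print("No ojdbc JAR in ./oracle-driver; repeat the download step.")
```

An empty result usually means the download wrote an error page under a different name or ran from the wrong directory.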

Step 4: The Docker Compose file

Create docker-compose.yml:

services:
  # Oracle Database Free (23ai). CDB is FREE, PDB is FREEPDB1.
  oracle:
    image: gvenzl/oracle-free:23-slim
    environment:
      ORACLE_PASSWORD: oraclepw      # password for SYS and SYSTEM
      APP_USER: appuser              # a normal user created in FREEPDB1
      APP_USER_PASSWORD: apppw
    ports:
      - "1521:1521"
 
  # A single-node Kafka 4 broker in KRaft mode (no ZooKeeper).
  kafka:
    image: apache/kafka:4.1.0
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
 
  # Kafka Connect (Debezium image) with the ClickHouse sink and Oracle driver mounted in.
  connect:
    image: quay.io/debezium/connect:3.5
    depends_on:
      - kafka
      - oracle
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: cdc-connect
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
      KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: "false"
      CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
    volumes:
      - ./connect-plugins/clickhouse-kafka-connect-v1.3.7:/kafka/connect/clickhouse-kafka-connect
      # Place the Oracle JDBC driver next to the Oracle connector.
      - ./oracle-driver:/kafka/connect/debezium-connector-oracle/oracle-driver
 
  # The analytics database.
  clickhouse:
    image: clickhouse/clickhouse-server:26.3
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: clickhouse
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

Kafka Connect loads the JARs it finds under a plugin's directory, so mounting the driver into a subfolder of debezium-connector-oracle makes it available to the Oracle connector on startup.

Step 5: Start the stack and wait for Oracle

docker compose up -d

Oracle initialization takes a few minutes the first time. Watch the log until it reports the database is ready:

docker compose logs -f oracle

When you see a message that the database is ready to use, press Ctrl+C to stop following the log.

Step 6: Enable archive logging and supplemental logging

This is the Oracle-specific preparation. Open a SYSDBA session inside the Oracle container:

docker compose exec oracle sqlplus / as sysdba

First, put the database in archive log mode. This requires a quick restart of the database, which happens entirely inside the container:

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Confirm it worked; "Database log mode" should say "Archive Mode".
ARCHIVE LOG LIST;

Next, enable minimal supplemental logging at the database level so LogMiner can identify rows:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Keep this SQL*Plus session open for the next step.

Step 7: Create the LogMiner user for Debezium

Debezium connects as a dedicated common user with permission to run LogMiner and read the necessary system views. In a container database, common user names must begin with c##. Still in your SYSDBA session, create a tablespace for the user and then the user itself:

-- A small tablespace for the connector, in both the CDB and the PDB.
CREATE TABLESPACE logminer_tbs DATAFILE
  '/opt/oracle/oradata/FREE/logminer_tbs.dbf'
  SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
 
ALTER SESSION SET CONTAINER = FREEPDB1;
CREATE TABLESPACE logminer_tbs DATAFILE
  '/opt/oracle/oradata/FREE/FREEPDB1/logminer_tbs.dbf'
  SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
ALTER SESSION SET CONTAINER = CDB$ROOT;
 
-- The common user, visible across the container database.
CREATE USER c##dbzuser IDENTIFIED BY dbz
  DEFAULT TABLESPACE logminer_tbs
  QUOTA UNLIMITED ON logminer_tbs
  CONTAINER = ALL;
 
GRANT CREATE SESSION                TO c##dbzuser CONTAINER = ALL;
GRANT SET CONTAINER                 TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$DATABASE         TO c##dbzuser CONTAINER = ALL;
GRANT FLASHBACK ANY TABLE           TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ANY TABLE              TO c##dbzuser CONTAINER = ALL;
GRANT SELECT_CATALOG_ROLE           TO c##dbzuser CONTAINER = ALL;
GRANT EXECUTE_CATALOG_ROLE          TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ANY TRANSACTION        TO c##dbzuser CONTAINER = ALL;
GRANT LOGMINING                     TO c##dbzuser CONTAINER = ALL;
GRANT CREATE TABLE                  TO c##dbzuser CONTAINER = ALL;
GRANT LOCK ANY TABLE                TO c##dbzuser CONTAINER = ALL;
GRANT CREATE SEQUENCE               TO c##dbzuser CONTAINER = ALL;
GRANT EXECUTE ON DBMS_LOGMNR        TO c##dbzuser CONTAINER = ALL;
GRANT EXECUTE ON DBMS_LOGMNR_D      TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$LOG              TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$LOG_HISTORY      TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$LOGMNR_LOGS      TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS  TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$LOGFILE          TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$ARCHIVED_LOG     TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$TRANSACTION      TO c##dbzuser CONTAINER = ALL;

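That is a long list, but each grant covers something Debezium needs: starting LogMiner sessions, reading the views that describe the redo and archive logs, querying the captured tables for the initial snapshot, or creating the small bookkeeping table the connector maintains. Type exit to leave SQL*Plus.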

This grant set is what Debezium 3.5 documents for an Oracle multitenant database. If a grant is rejected on your Oracle version, check the Oracle setup section of the Debezium documentation for that version rather than assuming.

Step 8: Create a table and enable supplemental logging on it

Connect to the pluggable database as the application user and create a table:

docker compose exec oracle sqlplus appuser/apppw@localhost:1521/FREEPDB1

CREATE TABLE customers (
  id         NUMBER(10) PRIMARY KEY,
  name       VARCHAR2(255) NOT NULL,
  email      VARCHAR2(255) NOT NULL,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
 
-- Record the full content of every changed row for this table.
ALTER TABLE customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
 
INSERT INTO customers (id, name, email) VALUES (1, 'Ada Lovelace', 'ada@example.com');
INSERT INTO customers (id, name, email) VALUES (2, 'Alan Turing',  'alan@example.com');
INSERT INTO customers (id, name, email) VALUES (3, 'Grace Hopper', 'grace@example.com');
COMMIT;

Remember that Oracle folds unquoted names to uppercase, so this table is known as APPUSER.CUSTOMERS. We will use that uppercase name in the connector configuration. Type exit when done.
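
The case-folding rule can be modeled in a few lines. This is an illustrative Python sketch with hypothetical helper names (oracle_name and qualified_name). It covers only the simple rule described above: a name is stored in uppercase unless it was written in double quotes.

```python
def oracle_name(identifier: str) -> str:
    """Model how Oracle stores an identifier written in a SQL statement.

    Unquoted identifiers are folded to uppercase; identifiers wrapped in
    double quotes keep their exact case.
    """
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    return identifier[1:-1] if quoted else identifier.upper()


def qualified_name(schema: str, table: str) -> str:
    """Build the SCHEMA.TABLE form used in table.include.list."""
    return f"{oracle_name(schema)}.{oracle_name(table)}"


print(qualified_name("appuser", "customers"))    # APPUSER.CUSTOMERS
print(qualified_name("appuser", '"Customers"'))  # APPUSER.Customers
```

The second call shows why quoted, mixed-case table names need extra care: the connector configuration must then use the exact case that Oracle stored.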

Step 9: Create the target table in ClickHouse

ClickHouse is append-only at heart, so to reflect updates and deletes we use a ReplacingMergeTree, which keeps multiple versions of a row and returns only the newest version per key when asked. We add a version column and a deleted flag.

docker compose exec clickhouse clickhouse-client --password clickhouse

CREATE DATABASE IF NOT EXISTS shop;
 
CREATE TABLE shop.customers
(
    ID         Int64,
    NAME       String,
    EMAIL      String,
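    -- Kept as text for simplicity; by default Debezium sends Oracle TIMESTAMP
    -- values as epoch-based numbers rather than formatted dates.
    UPDATED_AT String,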
    -- Filled by the transformation we configure in Step 10.
    _version   UInt64,
    _deleted   UInt8
)
ENGINE = ReplacingMergeTree(_version, _deleted)
ORDER BY ID;

Notice the uppercase column names. Because Oracle uppercases identifiers, Debezium emits fields named ID, NAME, and so on, and the ClickHouse sink matches fields to columns by name, so the column names must match. ORDER BY ID declares the unique key, which matches the Oracle primary key. ReplacingMergeTree(_version, _deleted) keeps the row with the highest _version per ID and treats _deleted = 1 rows as removed.
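
One detail to check in your own events: as far as we understand Debezium's default time handling, an Oracle TIMESTAMP column (six fractional digits by default) arrives as an integer count of microseconds since the Unix epoch, so UPDATED_AT holds digits, not a formatted date. The Python sketch below shows the conversion under that assumption. The function name is hypothetical, and treating the value as UTC is also an assumption, since a plain Oracle TIMESTAMP carries no time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(value) -> datetime:
    """Convert microseconds since the Unix epoch (int or digit string) to UTC."""
    return EPOCH + timedelta(microseconds=int(value))


print(micros_to_datetime("1700000000123456").isoformat())
# 2023-11-14T22:13:20.123456+00:00
```

If your events show a different scale (for example milliseconds), adjust the unit in the timedelta accordingly rather than trusting this sketch.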

Step 10: Register the Debezium Oracle source connector

Create a file named oracle-source.json:

{
  "name": "oracle-source",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "oracle",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "dbz",
    "database.dbname": "FREE",
    "database.pdb.name": "FREEPDB1",
    "topic.prefix": "shop",
    "table.include.list": "APPUSER.CUSTOMERS",
 
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.shop",
 
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.delete.tombstone.handling.mode": "rewrite",
    "transforms.unwrap.add.fields": "op,source.scn"
  }
}

Send it to Kafka Connect:

curl -X POST -H "Content-Type: application/json" \
  --data @oracle-source.json \
  http://localhost:8083/connectors

Let us unpack the Oracle-specific settings. database.dbname is the container database, FREE, and database.pdb.name is the pluggable database, FREEPDB1, where our table lives. table.include.list uses the uppercase schema and table name. The two schema.history.internal.kafka lines name the schema history topic, which Debezium manages for you (Oracle, like MySQL, needs this). topic.prefix of shop means events land in a Kafka topic named shop.APPUSER.CUSTOMERS.
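
As a quick sanity check of the naming rule, this small Python sketch derives the expected topic names from a connector configuration. The function expected_topics is hypothetical, and it assumes table.include.list holds plain SCHEMA.TABLE names. The real setting also accepts regular expressions, which this sketch does not expand.

```python
def expected_topics(config: dict) -> list[str]:
    """Return one topic name per table: <topic.prefix>.<SCHEMA>.<TABLE>."""
    prefix = config["topic.prefix"]
    tables = [t.strip() for t in config["table.include.list"].split(",")]
    return [f"{prefix}.{table}" for table in tables if table]


config = {
    "topic.prefix": "shop",
    "table.include.list": "APPUSER.CUSTOMERS",
}
print(expected_topics(config))  # ['shop.APPUSER.CUSTOMERS']
```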

The transforms block is the same idea as the previous parts. ExtractNewRecordState flattens Debezium's nested event into a plain row, delete.tombstone.handling.mode set to rewrite turns a delete into a row with an added __deleted field, and add.fields of op,source.scn attaches the operation type and the System Change Number. We use the SCN as our version, because it always increases, which is exactly what ReplacingMergeTree needs. Oracle gives us a clean ordering number for free, just like PostgreSQL's log sequence number.

These transformation option names are correct for Debezium 3.5. For other versions, check that version's ExtractNewRecordState documentation.
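
To make the flattening concrete, here is a simplified Python model of what the unwrap step does to one change event. It is a sketch of the behavior described above, not Debezium's code. The event layout is reduced to the fields we need, the SCN values are made up, and representing __deleted as the strings "true" and "false" is our understanding of rewrite mode, so verify it against a real message from your topic.

```python
def flatten(event: dict) -> dict:
    """Flatten a nested change event into one plain row."""
    op = event["op"]
    is_delete = op == "d"
    # A delete carries its values in "before"; other operations in "after".
    row = dict(event["before"] if is_delete else event["after"])
    row["__deleted"] = "true" if is_delete else "false"
    row["__op"] = op                            # from add.fields: op
    row["__source_scn"] = event["source"]["scn"]  # from add.fields: source.scn
    return row


update = {
    "op": "u",
    "before": {"ID": 1, "NAME": "Ada Lovelace", "EMAIL": "ada@example.com"},
    "after": {"ID": 1, "NAME": "Ada Lovelace", "EMAIL": "ada@newmail.com"},
    "source": {"scn": "2868546"},  # made-up SCN
}
delete = {
    "op": "d",
    "before": {"ID": 2, "NAME": "Alan Turing", "EMAIL": "alan@example.com"},
    "after": None,
    "source": {"scn": "2868551"},  # made-up SCN
}

print(flatten(update)["EMAIL"], flatten(update)["__deleted"])
print(flatten(delete)["ID"], flatten(delete)["__deleted"])
```

The point to take away is that a delete still produces a full row, which is what lets ClickHouse store it as a newer version of the same key.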

Confirm it is running:

curl -s http://localhost:8083/connectors/oracle-source/status

The state should read RUNNING. Debezium first snapshots the three existing rows, then streams new changes. The first LogMiner session can take a little longer to start than PostgreSQL or MySQL, so give it a moment.
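
The status response reports one state for the connector and a separate state for each task, and a task can fail while the connector itself still says RUNNING. The Python sketch below checks both. The helper names fetch_status and not_running are our own, and the base URL assumes the port mapping from the Compose file in Step 4.

```python
import json
from urllib.request import urlopen


def fetch_status(name: str, base_url: str = "http://localhost:8083") -> dict:
    """GET /connectors/<name>/status from the Kafka Connect REST API."""
    with urlopen(f"{base_url}/connectors/{name}/status", timeout=10) as resp:
        return json.load(resp)


def not_running(status: dict) -> list[str]:
    """List the parts of a status response that are not in state RUNNING."""
    problems = []
    if status.get("connector", {}).get("state") != "RUNNING":
        problems.append("connector")
    tasks = status.get("tasks", [])
    if not tasks:
        problems.append("no tasks started yet")
    for task in tasks:
        if task.get("state") != "RUNNING":
            problems.append(f"task {task.get('id')}")
    return problems


# Example response shape: the connector is up but its only task has failed.
sample = {
    "name": "oracle-source",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "FAILED"}],
}
print(not_running(sample))  # ['task 0']
```

Against the live stack you would call not_running(fetch_status("oracle-source")) and expect an empty list. A failed task also includes a trace field in the response, which is the first place to look for the cause.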

Step 11: Register the ClickHouse sink connector

Create a file named clickhouse-sink.json:

{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "topics": "shop.APPUSER.CUSTOMERS",
    "hostname": "clickhouse",
    "port": "8123",
    "database": "shop",
    "username": "default",
    "password": "clickhouse",
    "ssl": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
 
    "transforms": "renameFields",
    "transforms.renameFields.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.renameFields.renames": "__source_scn:_version,__deleted:_deleted"
  }
}

Send it:

curl -X POST -H "Content-Type: application/json" \
  --data @clickhouse-sink.json \
  http://localhost:8083/connectors

The sink reads from the same topic Debezium writes to, shop.APPUSER.CUSTOMERS, and inserts into shop.customers in ClickHouse. The ReplaceField transformation renames Debezium's metadata fields (__source_scn and __deleted) to the column names we created (_version and _deleted).
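
The renames setting is a comma-separated list of old:new pairs. As an illustration only, this Python sketch applies the same mapping to a flattened row. The helper names are hypothetical and the SCN value is made up.

```python
def parse_renames(spec: str) -> dict:
    """Turn "old:new,old2:new2" into {"old": "new", "old2": "new2"}."""
    return dict(pair.split(":", 1) for pair in spec.split(","))


def apply_renames(row: dict, renames: dict) -> dict:
    """Rename matching keys and leave every other field untouched."""
    return {renames.get(key, key): value for key, value in row.items()}


renames = parse_renames("__source_scn:_version,__deleted:_deleted")
row = {
    "ID": 1,
    "EMAIL": "ada@newmail.com",
    "__deleted": "false",
    "__op": "u",
    "__source_scn": "2868546",  # made-up SCN
}
print(sorted(apply_renames(row, renames)))
# ['EMAIL', 'ID', '__op', '_deleted', '_version']
```

Fields that are not listed, such as __op, pass through with their original names.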

Step 12: See it work

Query the three snapshotted rows:

docker compose exec clickhouse clickhouse-client --password clickhouse \
  --query "SELECT ID, NAME, EMAIL FROM shop.customers FINAL ORDER BY ID"

The FINAL keyword collapses all versions of each row down to the newest one. Without it you may see more than one version of a row, because ClickHouse merges versions in the background at a time of its own choosing. Always use FINAL (or filter to the latest version yourself) with a ReplacingMergeTree.

Now make some changes in Oracle:

docker compose exec oracle sqlplus appuser/apppw@localhost:1521/FREEPDB1

UPDATE customers SET email = 'ada@newmail.com' WHERE id = 1;
INSERT INTO customers (id, name, email) VALUES (4, 'Edsger Dijkstra', 'edsger@example.com');
DELETE FROM customers WHERE id = 2;
COMMIT;

The COMMIT matters: Debezium only sees committed changes. Wait a few seconds, then query ClickHouse again:

docker compose exec clickhouse clickhouse-client --password clickhouse \
  --query "SELECT ID, NAME, EMAIL FROM shop.customers FINAL WHERE _deleted = 0 ORDER BY ID"

You should see Ada with her new email, the new row for Edsger, and Alan gone. The WHERE _deleted = 0 hides the deleted row, which ReplacingMergeTree keeps internally (marked as deleted) so it can override the older live version.
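
To see why the COMMIT earlier in this step matters, here is a toy Python model of commit-gated capture: changes are held per transaction and only released downstream once that transaction commits. The class and its methods are hypothetical and only illustrate the idea; they are not Debezium's internals.

```python
class TransactionBuffer:
    """Hold changes per transaction and emit them only on commit."""

    def __init__(self):
        self._pending = {}  # transaction id -> list of buffered changes
        self.emitted = []   # changes released downstream, in commit order

    def change(self, txid: str, description: str) -> None:
        self._pending.setdefault(txid, []).append(description)

    def commit(self, txid: str) -> None:
        self.emitted.extend(self._pending.pop(txid, []))

    def rollback(self, txid: str) -> None:
        self._pending.pop(txid, None)  # discarded, never emitted


buf = TransactionBuffer()
buf.change("tx1", "UPDATE id=1")
buf.change("tx1", "DELETE id=2")
print(buf.emitted)  # []
buf.commit("tx1")
print(buf.emitted)  # ['UPDATE id=1', 'DELETE id=2']
```

If you forget the COMMIT in SQL*Plus, your session sits in the first state: the changes exist in Oracle but nothing reaches Kafka.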

You now have a working real-time CDC pipeline from Oracle to ClickHouse.

How updates and deletes really work

When you updated Ada's email, Oracle wrote the change to its redo log, Debezium read it through LogMiner and emitted a flattened row with the new email and a higher SCN, and the sink inserted it as a new row in ClickHouse. For a moment ClickHouse held two rows with ID = 1. Because the table is a ReplacingMergeTree keyed on ID with _version as the version, a FINAL query returns only the row with the highest SCN, which is the new email. Later, ClickHouse merges the parts in the background and discards the old row.

The delete works the same way: Debezium emitted a row for Alan with _deleted set to true and a higher SCN, that row wins over Alan's older live row, and your query filters out deleted rows, so Alan disappears.

The mental model is the same as the other databases: you never update or delete in place, you always append a newer version, and the table engine plus FINAL give you the correct current picture.
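
That mental model fits in a few lines of Python. The sketch below emulates what the tutorial's final query returns from a pile of appended row versions: newest version per ID, deleted rows filtered out. It is an illustration of the rule, not how ClickHouse is implemented. The version numbers are made up, and letting the later row win on equal versions is an assumption about tie-breaking.

```python
def final_view(rows: list[dict]) -> list[dict]:
    """Emulate SELECT ... FINAL WHERE _deleted = 0 ORDER BY ID."""
    newest = {}
    for row in rows:
        current = newest.get(row["ID"])
        # Keep the highest version; on a tie the later row wins.
        if current is None or row["_version"] >= current["_version"]:
            newest[row["ID"]] = row
    live = [r for r in newest.values() if r["_deleted"] == 0]
    return sorted(live, key=lambda r: r["ID"])


rows = [
    # Snapshot rows (made-up version numbers standing in for SCNs).
    {"ID": 1, "EMAIL": "ada@example.com", "_version": 100, "_deleted": 0},
    {"ID": 2, "EMAIL": "alan@example.com", "_version": 100, "_deleted": 0},
    {"ID": 3, "EMAIL": "grace@example.com", "_version": 100, "_deleted": 0},
    # Changes from Step 12, appended rather than applied in place.
    {"ID": 1, "EMAIL": "ada@newmail.com", "_version": 205, "_deleted": 0},
    {"ID": 4, "EMAIL": "edsger@example.com", "_version": 206, "_deleted": 0},
    {"ID": 2, "EMAIL": "alan@example.com", "_version": 207, "_deleted": 1},
]
for row in final_view(rows):
    print(row["ID"], row["EMAIL"])
# 1 ada@newmail.com
# 3 grace@example.com
# 4 edsger@example.com
```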

Production considerations

This tutorial runs a single node of everything, which is great for learning but not for production. The general points from Parts 1 and 2 all apply: keep events for a row in one partition, scale the sink's tasks with the topic's partition count (the Debezium Oracle source always runs as a single task, whatever tasks.max says), replace plaintext with TLS and managed secrets, use at least three Kafka brokers with replication factor three, and enable exactly-once delivery on the ClickHouse sink when correctness is critical.

Oracle adds a few of its own production concerns. Archive logs accumulate on disk, so you must have a retention and cleanup policy, or the database will eventually run out of space. LogMiner uses real resources on the Oracle server, so size the database accordingly and monitor it. Supplemental logging adds a little overhead to writes, which is the price of capturing complete change events. And for real Oracle deployments, be mindful of Oracle licensing, which is a serious consideration that does not apply to PostgreSQL or MySQL. Many teams choose to move analytical workloads to ClickHouse precisely to reduce that cost.

Troubleshooting

If the connector fails with a driver or class-not-found error, the Oracle JDBC driver is not being found. Confirm the ojdbc11 JAR is in the mounted oracle-driver folder and that the path inside the container is under the Oracle connector's directory.

If the connector reports that the database is not in archive log mode, Step 6 did not take effect. Reconnect as SYSDBA and run ARCHIVE LOG LIST to check.

If you see no change events after an update, confirm you ran COMMIT, and confirm supplemental logging is enabled both at the database level and on the table. Without table-level supplemental logging, LogMiner cannot produce complete rows.

If rows never reach ClickHouse, confirm the Kafka topic has messages:

docker compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka:9092 \
  --topic shop.APPUSER.CUSTOMERS --from-beginning --max-messages 1

Remember that the topic name is case-sensitive and uppercase for Oracle.

If columns do not populate in ClickHouse, the field names from Oracle (uppercase) do not match your ClickHouse column names. The sink maps by name, so they must match exactly.
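
A name mismatch is easy to spot mechanically. This Python sketch compares the field names in one event with the column names of the target table and points out near-misses that differ only by case. The function name is hypothetical. You would paste in the keys from a message printed by the console consumer and the columns from your CREATE TABLE statement.

```python
def unmatched_fields(event_fields: list[str], table_columns: list[str]) -> dict:
    """Map each event field with no exact column match to a case-insensitive
    near-match, or to None when nothing similar exists."""
    columns = set(table_columns)
    by_lower = {column.lower(): column for column in table_columns}
    return {
        field: by_lower.get(field.lower())
        for field in event_fields
        if field not in columns
    }


event_fields = ["ID", "NAME", "EMAIL", "_version", "_deleted"]
table_columns = ["id", "NAME", "EMAIL", "_version", "_deleted"]
print(unmatched_fields(event_fields, table_columns))  # {'ID': 'id'}
```

An empty dictionary means every field has a column with exactly the same name.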

Cleaning up

docker compose down -v

The -v flag removes the data volumes for a clean slate.

Series wrap-up

Across this three-part series you have built the same CDC architecture for PostgreSQL, MySQL, and Oracle. The pattern never changed: a source database's change log, Debezium turning log entries into events, Kafka carrying them, and the ClickHouse Kafka Connect Sink landing them into a ReplacingMergeTree that you query with FINAL.

What changed each time was only how Debezium reads the source. PostgreSQL exposes logical replication and a clean log sequence number. MySQL exposes its binary log and needs a schema history topic. Oracle needs archive logging, supplemental logging, LogMiner, a privileged user, and a JDBC driver, and it rewards you with a clean System Change Number for versioning. Once you understand the shared shape, adding a new source is mostly a matter of learning that source's log.

If you would like help designing and operating a production-grade CDC pipeline into ClickHouse, including partitioning strategy, schema evolution, exactly-once delivery, and monitoring, the engineers at Quantrail Data do exactly this. Reach out through our services page and we will be glad to help.