
Hands-On: Joining Kafka Topics in GlassFlow with ClickHouse

Mohamed Hussain S

In real-time data engineering, it’s common to work with events from multiple sources. Imagine one Kafka topic streaming user data and another streaming role assignments – we often need to join these streams in real time to build a complete picture.

This is where GlassFlow Join comes in. It lets you merge two Kafka topics into a single enriched stream, which you can then ingest into ClickHouse for analytics.

👉 Before diving in, you might want to check out my earlier blogs on GlassFlow Ingestion and GlassFlow Deduplication (linked at the end of this post).

Now, let’s walk through a hands-on example of joining two Kafka topics step by step.

Prerequisites

  • Docker & Docker Compose installed.
  • Kafka service running (with ability to create topics).
  • GlassFlow stack running (stable images).
  • A ClickHouse server running (to store the results).

1. Create Kafka Topics and Add Sample Data

We’ll create two topics – one for users and one for roles:

docker exec -it kafka kafka-topics --create \
  --topic users_join_topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1

docker exec -it kafka kafka-topics --create \
  --topic roles_join_topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1
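
As a quick sanity check, you can list the topics to confirm both were created (this assumes the same kafka container name used in the commands above):

```shell
# List all topics and filter for the two we just created
docker exec -it kafka kafka-topics --list \
  --bootstrap-server localhost:9092 | grep join_topic
```

You should see both users_join_topic and roles_join_topic in the output.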

Now push some events into each topic:

users_join_topic

{"id":1,"name":"Alice","email":"alice@example.com","created_at":"2025-09-02 12:00:00"}
{"id":2,"name":"Bob","email":"bob@example.com","created_at":"2025-09-02 12:05:00"}

roles_join_topic

{"id":1,"role":"Admin"}
{"id":2,"role":"User"}
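
One way to produce these events is with the console producer bundled in the Kafka image (assuming the same kafka container as above, with the Confluent-style CLI on its PATH):

```shell
# Feed the user events to the console producer via a heredoc (one JSON event per line)
docker exec -i kafka kafka-console-producer \
  --topic users_join_topic \
  --bootstrap-server localhost:9092 <<'EOF'
{"id":1,"name":"Alice","email":"alice@example.com","created_at":"2025-09-02 12:00:00"}
{"id":2,"name":"Bob","email":"bob@example.com","created_at":"2025-09-02 12:05:00"}
EOF

# Same for the role events
docker exec -i kafka kafka-console-producer \
  --topic roles_join_topic \
  --bootstrap-server localhost:9092 <<'EOF'
{"id":1,"role":"Admin"}
{"id":2,"role":"User"}
EOF
```

Note the `-i` flag (without `-t`): it keeps stdin open so the heredoc reaches the producer inside the container.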

2. Open GlassFlow and Choose the Join Function

3. Set Up the Kafka Connection

Select Left and Right Topics

  • Select users_join_topic as the left topic.
  • Select roles_join_topic as the right topic.

💡 Note: the left topic is the primary stream; the right topic supplies the supplementary data used to enrich it.

4. Define the Join Key

Next, define the key on which the join will happen. In our case, both topics share the id field.

  • Left Key: users_join_topic.id
  • Right Key: roles_join_topic.id

5. Configure ClickHouse Connection

We’ll store the joined output into a ClickHouse table.

First, create the table in ClickHouse:

CREATE TABLE users_roles (
    id UInt32,
    name String,
    email String,
    created_at String,
    role String
)
ENGINE = MergeTree
ORDER BY id;

Now configure the ClickHouse connection inside GlassFlow.

6. Select Destination Table

Choose the users_roles table as the destination. Map the fields from both topics to the table columns.

7. Deploy the Pipeline

Deploy the pipeline and wait for it to go Active.

Now insert new events into both Kafka topics – the joined data will appear in ClickHouse in real time.
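
For example, you could stream in a third user and role. The id 3 "Carol" record below is illustrative data I'm making up for this step, not part of the earlier samples:

```shell
# Produce one new event into each topic; GlassFlow joins them on the shared id field
echo '{"id":3,"name":"Carol","email":"carol@example.com","created_at":"2025-09-02 12:10:00"}' | \
  docker exec -i kafka kafka-console-producer \
    --topic users_join_topic --bootstrap-server localhost:9092

echo '{"id":3,"role":"Viewer"}' | \
  docker exec -i kafka kafka-console-producer \
    --topic roles_join_topic --bootstrap-server localhost:9092
```

Shortly after both events arrive, the enriched row for Carol should show up in the users_roles table.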

Here’s what the final joined table looks like: with the sample events above, users_roles contains one row per user, each enriched with its matching role (Alice → Admin, Bob → User).
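
You can inspect the joined rows directly with clickhouse-client. The container name clickhouse below is an assumption – adjust it to match your setup:

```shell
# Query the joined table; FORMAT PrettyCompact renders an aligned text table
docker exec -it clickhouse clickhouse-client --query \
  "SELECT id, name, email, role FROM users_roles ORDER BY id FORMAT PrettyCompact"
```

With the sample events from step 1, this should return two rows: (1, Alice, alice@example.com, Admin) and (2, Bob, bob@example.com, User).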

In this tutorial, we successfully:

  • Created two Kafka topics (users_join_topic and roles_join_topic).
  • Joined them in GlassFlow using the Join node.
  • Stored the results in ClickHouse.

This is a powerful pattern in real-time pipelines – combining different streams into a single enriched dataset.

👉 Don’t forget to check out the previous blogs in this series for more on GlassFlow Ingest and GlassFlow Deduplication.

Getting started with GlassFlow locally is a great first step. But scaling with ClickHouse in production requires more than setup – it means handling deployments, migrations, and performance tuning.

That’s exactly why Quantrail Data exists. We take care of the hard parts – from managed ClickHouse services to migration support – so your team can focus on analytics, not infrastructure.

Ready to scale with confidence? Contact Quantrail Data today.

Resources

  • GlassFlow GitHub Repository
  • GlassFlow Ingestion blog
  • GlassFlow Deduplication blog