
GlassFlow ETL with ClickHouse: Simplifying Real-Time Data Pipelines

Mohamed Hussain S

In modern data engineering, building and managing ETL pipelines can quickly become complex. Traditional setups often require extensive coding, custom orchestration, and constant monitoring of message brokers like Kafka.

This is where GlassFlow comes in – a tool designed to simplify the ClickHouse ETL process with a no-code UI, plug-and-play integration with Kafka, and lightweight deployment via Docker.

GlassFlow is an open-source ETL framework that helps teams easily build real-time data pipelines with ClickHouse. It abstracts away the complexity of ingestion, deduplication, aggregation, and transformation so that engineers can focus more on business logic and less on infrastructure.

At its core, GlassFlow:

  • Provides a UI-driven pipeline builder.
  • Connects seamlessly to Kafka for streaming data.
  • Handles transformations inside the pipeline without requiring external jobs.
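To make "aggregation" concrete, here is a minimal Python sketch of a tumbling-window count, the kind of work a streaming pipeline performs before writing rows to ClickHouse. It is an illustration only and not GlassFlow's implementation: the function name and the `ts` and `user_id` fields are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, user) using fixed, non-overlapping windows.

    `events` is an iterable of dicts with hypothetical fields:
    `ts` (seconds since some epoch) and `user_id`.
    """
    counts = defaultdict(int)
    for ev in events:
        # Round the timestamp down to the start of its window.
        window_start = ev["ts"] - (ev["ts"] % window_seconds)
        counts[(window_start, ev["user_id"])] += 1
    return dict(counts)


events = [
    {"ts": 0, "user_id": "a"},
    {"ts": 30, "user_id": "a"},
    {"ts": 61, "user_id": "a"},
    {"ts": 65, "user_id": "b"},
]

result = tumbling_window_counts(events)
print(result)  # {(0, 'a'): 2, (60, 'a'): 1, (60, 'b'): 1}
```

Each output entry corresponds to one aggregated row: a window start, a key, and a count.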

Traditional ETL solutions come with a few key challenges:

  1. High complexity – Writing custom Kafka consumers/producers, managing state, and scaling workers.
  2. Fragmented tool chains – Engineers need separate tools for ingestion, deduplication, and transformation.
  3. Difficult to maintain – Scaling and debugging streaming pipelines across distributed systems is painful.

GlassFlow addresses these challenges by providing a single platform where you can design, monitor, and run pipelines without writing boilerplate code. Its main features are:

  1. No-Code UI – Drag-and-drop style pipeline creation.
  2. Native Kafka Integration – Ingest directly from Kafka topics, apply deduplication rules, aggregate data, and transform before writing to ClickHouse.
  3. Lightweight Deployment – Run everything via Docker Compose with minimal dependencies.
  4. End-to-End Visibility – Logs and metrics are collected centrally for easy debugging.
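The deduplication rules mentioned in the list above can be pictured with a short Python sketch. It drops any event whose key was already seen within a time window. This is an assumption-laden illustration, not GlassFlow's code: the `event_id` and `ts` field names are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
def deduplicate(events, key_field="event_id", window_seconds=3600):
    """Yield only events whose key was not emitted within the last window.

    Assumes `events` is ordered by the hypothetical `ts` field (seconds).
    """
    last_seen = {}  # key -> timestamp of the last event emitted for that key
    for ev in events:
        key, ts = ev[key_field], ev["ts"]
        prev = last_seen.get(key)
        if prev is not None and ts - prev < window_seconds:
            continue  # duplicate inside the window: drop it
        last_seen[key] = ts
        yield ev


events = [
    {"event_id": "e1", "ts": 0},
    {"event_id": "e1", "ts": 10},    # duplicate of e1 within the window
    {"event_id": "e2", "ts": 20},
    {"event_id": "e1", "ts": 4000},  # same key, but outside the 1-hour window
]

kept = list(deduplicate(events))
print([ev["ts"] for ev in kept])  # [0, 20, 4000]
```

The `last_seen` dictionary grows with the number of distinct keys, so a real system would also evict entries older than the window. Handling that state is part of what a managed pipeline takes off your hands.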

The GlassFlow Docker Compose setup consists of the following components:

  • UI – Frontend for building and monitoring ETL pipelines.
  • App (Backend) – Executes pipelines, manages pipeline state, and coordinates tasks.
  • Nginx – Serves as a reverse proxy between UI and backend.
  • NATS – A lightweight messaging system used internally by GlassFlow for communication between services.

One detail that first-time users often notice is that GlassFlow involves two messaging systems. Pipelines need Kafka connections for ingestion and processing, while GlassFlow's own services use NATS internally for communication and event handling.

Think of it like this:

  • Kafka = external pipeline data source (where your events actually come from).
  • NATS = internal event bus that powers GlassFlow’s own coordination between UI ↔ backend ↔ workers.

This separation ensures GlassFlow remains lightweight and easy to deploy, while still integrating tightly with Kafka for real-world pipelines.

You can try GlassFlow locally with a simple Docker Compose setup:

git clone https://github.com/glassflow/clickhouse-etl.git
cd clickhouse-etl
docker compose up

Once the containers are running, open the UI at http://localhost:8080. From there, you can:

  • Connect to Kafka topics.
  • Apply deduplication rules.
  • Aggregate streaming data.
  • Write transformed output into ClickHouse.
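Before working through those steps, it can help to confirm that the UI is actually reachable after `docker compose up`. The following Python sketch uses only the standard library and checks the address given above. The function name is hypothetical, and the check only confirms that something answers HTTP on that port, not that every GlassFlow service is healthy.

```python
import urllib.error
import urllib.request


def ui_is_reachable(url="http://localhost:8080", timeout=3.0):
    """Return True if an HTTP GET to `url` succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen follows redirects and raises HTTPError for 4xx/5xx.
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    if ui_is_reachable():
        print("GlassFlow UI is reachable")
    else:
        print("GlassFlow UI is not reachable yet")
```

If the check fails right after startup, the containers may still be initializing, so retrying after a few seconds is reasonable.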

GlassFlow simplifies building and managing real-time ETL pipelines for ClickHouse. By combining Kafka integration with a no-code UI and a lightweight NATS-based architecture, it offers a practical option for teams that want speed and simplicity without sacrificing flexibility.

Whether you are a data engineer experimenting locally or deploying production-grade ETL, GlassFlow is a tool worth exploring.

GlassFlow GitHub Repository

ClickHouse Documentation

NATS.io

Apache Kafka