If you work with modern data systems, JSON eventually becomes unavoidable.
Imagine you’re collecting application events – page views, button clicks, or API logs. Each event arrives as a JSON object, and ClickHouse® happily accepts it. Everything works fine at first.
Then a few weeks later:
- Dashboards start slowing down
- Queries become harder to read
- Someone asks why extracting one field costs so much CPU
That’s usually when people realize:
JSON inside ClickHouse® behaves very differently than expected.
It’s the first part of a multi-part series on JSON in ClickHouse®, focused on building a beginner-friendly understanding of what JSON really means in ClickHouse® before talking about performance or optimization.
A Real-World Scenario: Event Tracking Data
Let’s ground this discussion with a simple example.
Assume your application emits events like this:
{
"event_type": "page_view",
"user_id": 12345,
"page": "/pricing",
"device": {
"os": "linux",
"browser": "firefox"
},
"timestamp": "2026-01-01T10:00:00Z"
}
The Most Common Starting Point: JSON as a String
Most teams start by storing the JSON as-is, without worrying about structure.
CREATE TABLE events
(
event_time DateTime,
event_data String
)
ENGINE = MergeTree
ORDER BY event_time;
The raw JSON above is inserted into event_data.
At this stage:
- ClickHouse® treats JSON as plain text
- It does not understand keys or types
- Any valid JSON shape is accepted
When you need a value, you extract it during query time:
SELECT
JSONExtractString(event_data, 'event_type') AS event_type,
JSONExtractInt(event_data, 'user_id') AS user_id
FROM events;
This feels natural and works well early on. Ingestion is effortless, and schema changes don’t require table changes.
The downside is subtle:
ClickHouse® must parse the JSON every time this query runs.
Why This Becomes a Problem at Scale
At small volumes, parsing JSON repeatedly doesn’t hurt much.
But as data grows:
- Dashboards run every few seconds
- Multiple teams query the same JSON fields
- Queries scan millions or billions of rows
At this point,ClickHouse® is doing the same work again and again – parsing the same JSON structure just to extract the same fields.
This is when JSON stops feeling lightweight and starts becoming expensive.
Moving from “Raw JSON” to “Modeled Data”
If you look closely at the JSON example, some things are obvious:
event_typealways existsuser_idis always a numbertimestampis always a timestamp
Instead of rediscovering this structure on every query, ClickHouse® allows you to model it explicitly.
For example:
CREATE TABLE events
(
event_time DateTime,
event_type String,
user_id UInt64,
page String,
device Map(String, String)
)
ENGINE = MergeTree
ORDER BY event_time;
Now:
- Frequently queried fields are real columns
- Nested but flexible data (
device) remains dynamic - ClickHouse® can store and query this efficiently
This approach requires more thought upfront, but it aligns much better with how ClickHouse® is designed to work.
A Middle Attempt: Automatic JSON Objects
To reduce the friction of modeling JSON, ClickHouse® introduced a type that tried to combine flexibility with structure.
data Object('json')
The idea was:
- Store JSON
- Let ClickHouse® automatically expose keys
- Query them like columns
This worked for some use cases but introduced limitations around schema evolution and control. Over time, it became less central to ClickHouse®’s long-term direction.
You may still encounter it in existing tables, which is why it’s important to recognize – even if it’s no longer the preferred approach.
Note : The experimental Object (or Object('json')) data type is deprecated and should be avoided in a production environment.
A New Direction: The Native JSON Type
More recently, ClickHouse® introduced a native JSON data type:
data JSON
This type:
- Stores JSON in a binary representation
- Parses data lazily
- Understands JSON structure internally
It allows ClickHouse® to work with semi-structured data more efficiently, without forcing you to fully flatten everything upfront.
This doesn’t eliminate the need for schema design – but it gives you more flexibility when working with evolving JSON data.
How JSON Usually Enters ClickHouse®
In real systems, JSON often flows into ClickHouse® through:
JSONEachRowinput format- Kafka engine tables
- Application logs and event streams
- API ingestion pipelines
Problems rarely appear during ingestion. They appear later, when querying becomes frequent and performance starts to matter.
That’s why understanding JSON early is important -decisions made at ingestion time shape how your system behaves months later.
Conclusion: JSON in ClickHouse® Is a Journey
There is no single “correct” way to handle JSON in ClickHouse®.
Most teams start with raw JSON strings.
Many gradually model frequently used fields.
Some adopt the native JSON type for balance and flexibility.
All of these approaches exist because JSON and analytical databases solve different problems.
This first blog focused on building intuition, not prescribing solutions.
In the next parts of this series, we’ll explore each JSON approach in detail, including performance characteristics, query patterns, and guidance on when to use which option.
References
ClickHouse® JSON Functions
https://clickhouse.com/docs/en/sql-reference/functions/json-functions
ClickHouse® Data Types
https://clickhouse.com/docs/en/sql-reference/data-types
JSON Type in ClickHouse®
https://clickhouse.com/blog/introducing-the-json-data-type-in-clickhouse
ClickHouse® Input Formats (JSONEachRow)
https://clickhouse.com/docs/en/interfaces/formats#jsoneachrow
