If Part 1 answered why JSON decisions matter in ClickHouse®, this part focuses on how ClickHouse® has traditionally handled JSON-like data before the native JSON type existed.
If you missed the part 1, you can read it here.
ClickHouse® did not start with a single “JSON solution”.
Instead, it offered multiple data types that could represent JSON indirectly, each with different trade-offs.
Understanding these approaches is essential, because:
- Many production systems still use them
- They explain why the native JSON type exists at all
- You’ll encounter them in real schemas
From Raw JSON to “Structured Enough”
In Part 1, we saw that storing JSON as a String increases the query time and CPU usage.
The natural next question is:
Can we give ClickHouse® some structure, without fully flattening everything?
Historically, ClickHouse® answered this question in three different ways:
MapTupleObject('json')
Each one represents a different compromise between flexibility, performance, and schema control.
Using Map: Dynamic Keys, Uniform Values
We can take the below JSON Data as our example in this Blog.
{
"event_type": "page_view",
"user_id": 12345,
"page": "/pricing",
"device": {
"os": "linux",
"browser": "firefox"
},
"timestamp": "2026-01-01T10:00:00Z"
}
A Map is often the first step teams take away from raw JSON strings.
Instead of storing JSON as text, you extract part of it into a Map column:
CREATE TABLE events
(
event_time DateTime,
event_type String,
user_id UInt64,
device Map(String, String)
)
ENGINE = MergeTree
ORDER BY event_time;
This works well when:
- You want to query by key without parsing JSON.
- Keys are not fixed.
- Values all share the same type.
Example query:
SELECT
device['os'] AS os,
device['browser'] AS browser
FROM events;
What improves
- No JSON parsing at query time
- Key lookups are efficient
- Data is stored in a structured, column-friendly format
The hidden limitations
- All values must share the same type
- Nested objects are not supported
- There is no per-key type enforcement
A Map optimizes access, not meaning.
If your JSON contains mixed types or deep nesting, Map quickly becomes insufficient.
Using Tuple: Fixed Structure Inside a Column
A Tuple represents the opposite extreme.
Instead of dynamic keys, you define a fixed schema inside a single column:
CREATE TABLE events
(
event_time DateTime,
device Tuple(
os String,
browser String
)
)
ENGINE = MergeTree
ORDER BY event_time;
Querying is straightforward:
SELECT
device.os,
device.browser
FROM events;
Why Tuple is fast
- Layout is known at compile time
- Types are strictly enforced
- No runtime discovery or parsing
From ClickHouse®’s perspective, a Tuple is not “semi-structured” at all.
It is fully structured data wrapped inside one column.
The trade-off
- Schema changes require table changes
- Adding or removing fields is painful
- Not suitable for evolving JSON shapes
A Tuple trades flexibility for speed.
It works best when the JSON structure is stable and well understood.
Object('json'): Automatic Structure Discovery
To reduce the friction of manual modeling, ClickHouse® introduced an experimental type:
data Object('json')
The idea was ambitious:
- Store JSON
- Let ClickHouse® discover keys automatically
- Expose them like columns
In practice, this felt convenient at first:
- No upfront schema work
- Fields appeared dynamically
- Queries looked clean
Why it didn’t last
Automatic schema discovery turns out to be the hardest problem:
- Schema evolution becomes unpredictable
- Type inference can change over time
- Users lose control over data layout
Most importantly:
Object('json') does not remove the need to think about schema – it only postpones it.
Because of these issues, Object('json') is now deprecated and should be avoided in production systems.
You may still encounter it in older tables, which is why understanding it matters.
What All These Approaches Have in Common
Despite their differences, String, Map, Tuple, and Object('json') share a key trait:
They all force you to choose between flexibility and structure upfront.
| Approach | Flexibility | Performance | Schema Control |
|---|---|---|---|
| String | Very high | Low | None |
| Map | Medium | Medium | Low |
| Tuple | Low | High | High |
| Object(‘json’) | High | Unpredictable | Low |
There is no free lunch.
Each option exists because real systems evolve differently.
Why This Led to the Native JSON Type
These traditional approaches worked – but none solved the core tension:
- JSON evolves
- Analytics demand structure
- Performance matters at scale
This is the gap the native JSON data type was designed to address.
Instead of choosing either flexibility or structure upfront, it introduces a new model:
- Binary storage
- Lazy parsing
- Internal structural awareness
That’s where Part 3 begins.
Conclusion: Traditional JSON Handling Was Always a Trade-Off
Before the native JSON type, ClickHouse® users had to make a choice:
- Parse everything at query time
- Or lock structure early and lose flexibility
Understanding these historical approaches is critical, because:
- Many systems still rely on them
- They explain why certain schemas behave well – and others don’t
- They provide context for evaluating the native JSON type realistically
In Part 3, we’ll deep dive into the native JSON data type:
And how it changes – but does not eliminate -schema decisions
How it stores data
What “lazy parsing” actually means
References
ClickHouse® Data Types
https://clickhouse.com/docs/en/sql-reference/data-types
JSON Type in ClickHouse®
https://clickhouse.com/blog/introducing-the-json-data-type-in-clickhouse
