,

Traditional JSON handling in ClickHouse®

Sanjeev Kumar G avatar
Traditional JSON handling in ClickHouse®

If Part 1 answered why JSON decisions matter in ClickHouse®, this part focuses on how ClickHouse® has traditionally handled JSON-like data before the native JSON type existed.

If you missed the part 1, you can read it here.

ClickHouse® did not start with a single “JSON solution”.
Instead, it offered multiple data types that could represent JSON indirectly, each with different trade-offs.

Understanding these approaches is essential, because:

  • Many production systems still use them
  • They explain why the native JSON type exists at all
  • You’ll encounter them in real schemas

From Raw JSON to “Structured Enough”

In Part 1, we saw that storing JSON as a String increases the query time and CPU usage.

The natural next question is:

Can we give ClickHouse® some structure, without fully flattening everything?

Historically, ClickHouse® answered this question in three different ways:

  • Map
  • Tuple
  • Object('json')

Each one represents a different compromise between flexibility, performance, and schema control.

Using Map: Dynamic Keys, Uniform Values

We can take the below JSON Data as our example in this Blog.

{
  "event_type": "page_view",
  "user_id": 12345,
  "page": "/pricing",
  "device": {
    "os": "linux",
    "browser": "firefox"
  },
  "timestamp": "2026-01-01T10:00:00Z"
}

A Map is often the first step teams take away from raw JSON strings.

Instead of storing JSON as text, you extract part of it into a Map column:

CREATE TABLE events
(
    event_time DateTime,
    event_type String,
    user_id UInt64,
    device Map(String, String)
)
ENGINE = MergeTree
ORDER BY event_time;

This works well when:

  • You want to query by key without parsing JSON.
  • Keys are not fixed.
  • Values all share the same type.

Example query:

SELECT
    device['os'] AS os,
    device['browser'] AS browser
FROM events;

What improves

  • No JSON parsing at query time
  • Key lookups are efficient
  • Data is stored in a structured, column-friendly format

The hidden limitations

  • All values must share the same type
  • Nested objects are not supported
  • There is no per-key type enforcement

A Map optimizes access, not meaning.

If your JSON contains mixed types or deep nesting, Map quickly becomes insufficient.

Using Tuple: Fixed Structure Inside a Column

A Tuple represents the opposite extreme.

Instead of dynamic keys, you define a fixed schema inside a single column:

CREATE TABLE events
(
    event_time DateTime,
    device Tuple(
        os String,
        browser String
    )
)
ENGINE = MergeTree
ORDER BY event_time;

Querying is straightforward:

SELECT
    device.os,
    device.browser
FROM events;

Why Tuple is fast

  • Layout is known at compile time
  • Types are strictly enforced
  • No runtime discovery or parsing

From ClickHouse®’s perspective, a Tuple is not “semi-structured” at all.
It is fully structured data wrapped inside one column.

The trade-off

  • Schema changes require table changes
  • Adding or removing fields is painful
  • Not suitable for evolving JSON shapes

A Tuple trades flexibility for speed.

It works best when the JSON structure is stable and well understood.

Object('json'): Automatic Structure Discovery

To reduce the friction of manual modeling, ClickHouse® introduced an experimental type:

data Object('json')

The idea was ambitious:

  • Store JSON
  • Let ClickHouse® discover keys automatically
  • Expose them like columns

In practice, this felt convenient at first:

  • No upfront schema work
  • Fields appeared dynamically
  • Queries looked clean

Why it didn’t last

Automatic schema discovery turns out to be the hardest problem:

  • Schema evolution becomes unpredictable
  • Type inference can change over time
  • Users lose control over data layout

Most importantly:

Object('json') does not remove the need to think about schema – it only postpones it.

Because of these issues, Object('json') is now deprecated and should be avoided in production systems.

You may still encounter it in older tables, which is why understanding it matters.

What All These Approaches Have in Common

Despite their differences, String, Map, Tuple, and Object('json') share a key trait:

They all force you to choose between flexibility and structure upfront.

ApproachFlexibilityPerformanceSchema Control
StringVery highLowNone
MapMediumMediumLow
TupleLowHighHigh
Object(‘json’)HighUnpredictableLow

There is no free lunch.

Each option exists because real systems evolve differently.

Why This Led to the Native JSON Type

These traditional approaches worked – but none solved the core tension:

  • JSON evolves
  • Analytics demand structure
  • Performance matters at scale

This is the gap the native JSON data type was designed to address.

Instead of choosing either flexibility or structure upfront, it introduces a new model:

  • Binary storage
  • Lazy parsing
  • Internal structural awareness

That’s where Part 3 begins.

Conclusion: Traditional JSON Handling Was Always a Trade-Off

Before the native JSON type, ClickHouse® users had to make a choice:

  • Parse everything at query time
  • Or lock structure early and lose flexibility

Understanding these historical approaches is critical, because:

  • Many systems still rely on them
  • They explain why certain schemas behave well – and others don’t
  • They provide context for evaluating the native JSON type realistically

In Part 3, we’ll deep dive into the native JSON data type:

And how it changes – but does not eliminate -schema decisions

How it stores data

What “lazy parsing” actually means

References

ClickHouse® Data Types
https://clickhouse.com/docs/en/sql-reference/data-types

JSON Type in ClickHouse®
https://clickhouse.com/blog/introducing-the-json-data-type-in-clickhouse