Jaeger Blog
Introducing Native ClickHouse Support in Jaeger
Jaeger v2.18.0 introduces native ClickHouse support as a new storage backend. ClickHouse has been one of the most frequently requested options from users running Jaeger at scale, and this release makes it possible to use it directly with Jaeger.
In this post, I’ll walk you through why ClickHouse is such a powerhouse for storing traces, how the schema is designed under the hood, and how you can start using it with Jaeger today.
Why ClickHouse?
Distributed tracing produces a massive volume of semi-structured event data. Storing that data is only half the problem. Users also need to search by service, operation, tags, duration, time range, and trace ID, often across large datasets. Existing Jaeger backends such as Cassandra and Elasticsearch/OpenSearch work well for many deployments, but they also come with operational tradeoffs around indexing, scaling, retention, and query cost.
High-Throughput Ingest + Low Latency Queries
ClickHouse is a column-oriented OLAP database designed for high-throughput ingestion, compression, and analytical queries. Those characteristics map naturally to trace workloads, where users often scan and filter large amounts of repetitive telemetry data.
Compression That Actually Matters
Trace data is especially compression-friendly because fields like service names, operation names, tag keys, status codes, and resource attributes repeat frequently. Storing those fields column-by-column rather than row-by-row reduces disk usage and I/O, which improves both storage efficiency and query performance.
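To make the effect of repetition concrete, here is a minimal sketch of dictionary encoding on a single low-cardinality column. It is an illustration only: the service names are made up, and ClickHouse's actual codecs and storage format work differently in detail.

```python
# Illustrative only: dictionary-encode one low-cardinality column.
# This is NOT how ClickHouse stores data internally; it just shows why
# repeated values in a single column are cheap to store.

def dictionary_encode(column):
    """Replace each string with a small integer ID plus a lookup dictionary."""
    dictionary = []      # unique values, in first-seen order
    ids_by_value = {}    # value -> position in `dictionary`
    encoded = []
    for value in column:
        if value not in ids_by_value:
            ids_by_value[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(ids_by_value[value])
    return dictionary, encoded


def dictionary_decode(dictionary, encoded):
    """Rebuild the original column from the dictionary and the IDs."""
    return [dictionary[i] for i in encoded]


# Hypothetical service_name column: 3 distinct values across 9,000 spans.
service_names = ["checkout", "payments", "frontend"] * 3000

dictionary, encoded = dictionary_encode(service_names)

# Payload bytes only (string lengths and other framing are ignored).
raw_bytes = sum(len(s) for s in service_names)
# One byte per ID is enough while there are fewer than 256 distinct values.
encoded_bytes = sum(len(s) for s in dictionary) + len(encoded)

print(f"raw: {raw_bytes} bytes, encoded: {encoded_bytes} bytes")
```

In a row-oriented layout the same values would be interleaved with every other field of the span, which makes this kind of per-column encoding much harder to apply.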
Real-Time Analytics
ClickHouse also opens the door to richer analytical queries over trace data. Because aggregations are efficient on columnar storage, Jaeger can support use cases such as service-level latency, error-rate, and throughput analysis directly from stored spans, depending on the capabilities exposed by the backend.
Designing The Schema
A major part of the work was designing a schema that matches Jaeger’s core query patterns: trace lookup by trace ID, search by service and operation, filtering by attributes, time-range queries, and aggregation for Jaeger’s Service Performance Monitoring (SPM) feature.
There’s an excellent earlier post by Ha Anh Vu that benchmarked ClickHouse schemas for jaeger-v1, and that work laid the foundation. However, jaeger-v2 adopts the OpenTelemetry data model, which forced us to revisit several decisions.
The full design space is in the Architecture Decision Record (ADR). The sections below walk through some of the decisions that are worth calling out.
Choosing a Primary Key
In ClickHouse, the primary key isn’t a uniqueness constraint. Instead, it defines the on-disk sort order and powers a sparse index (one entry per 8,192-row granule). Picking it is the single highest-leverage decision in the schema.
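The sparse index can be sketched in a few lines of Python. This is a simplified model, not ClickHouse's implementation: the granule size is shrunk to 4 rows for readability, the key is a single hypothetical service_name column, and the helper names are invented for illustration.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # tiny for readability; the post cites 8,192 rows per granule


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of each granule (one mark per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_read(marks, key):
    """Return the granule numbers that could contain `key`.

    The index only knows where each granule starts, so the answer is
    conservative: it may include a granule that turns out not to match.
    """
    # Last granule whose first key is <= key; -1 means key sorts before all rows.
    last = bisect_right(marks, key) - 1
    if last < 0:
        return []
    # Last granule whose first key is < key: rows equal to `key` can begin
    # in the tail of that granule.
    first = max(bisect_left(marks, key) - 1, 0)
    return list(range(first, last + 1))


# Hypothetical service_name column, already sorted as the primary key requires.
service_names = ["auth"] * 5 + ["cart"] * 3 + ["checkout"] * 6 + ["frontend"] * 2
marks = build_sparse_index(service_names)

print("marks:", marks)
print("granules for 'cart':", granules_to_read(marks, "cart"))
```

A filter on a column that is not part of the sort order gets no help from these marks, since matching rows can sit in any granule. That is why the choice of key matters so much.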
We had two candidates for the primary key:
- Optimize for trace retrieval: sort by trace_id. Every span of a trace lands in one contiguous block, so GetTrace is a single seek + sequential read. However, search queries pay for this optimization because service_name and operation_name filters cannot use the primary key index at all.
- Optimize for search (chosen): sort by (service_name, name, start_time). Search queries, which filter by service, operation, and a time window, become direct primary-key lookups.
The decision came down to an asymmetric trade-off. Sorting by trace_id makes search performance terrible, but sorting by (service_name, name, start_time) hurts trace retrieval much less, because we can recover most of the lost performance with two cheap mechanisms:
- A bloom_filter skip index on trace_id, which lets the engine prove a granule can’t contain a given ID without reading it.
- A trace_id_timestamps materialized view (described below) that tells the search path each matching trace’s time bounds, so the follow-up GetTraces call can prune partitions and granules.
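The first mechanism can be illustrated with a toy Bloom filter per granule. This is a sketch under assumptions: the bit count, hash count, hashing scheme, and trace IDs are all invented here, and ClickHouse's own bloom_filter index is configured and implemented differently. What carries over is the guarantee: a "no" answer is always correct, while a "maybe" can occasionally be a false positive.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hashing are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical granules, each holding the trace IDs of the rows it stores.
granules = [
    ["trace-a", "trace-b", "trace-c"],
    ["trace-d", "trace-e"],
    ["trace-f", "trace-g", "trace-h"],
]

filters = []
for ids in granules:
    bloom = BloomFilter()
    for trace_id in ids:
        bloom.add(trace_id)
    filters.append(bloom)


def candidate_granules(trace_id):
    """Granules that cannot be ruled out for this trace ID."""
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(trace_id)]


print(candidate_granules("trace-d"))
```

Every granule that is ruled out is a granule the engine does not have to read, which is what makes trace lookups workable even though the table is not sorted by trace_id.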
An earlier benchmark run with the schema sorted by trace_id showed the asymmetry concretely. Trace retrieval was about 27 ms, but a search query took ~880 ms. Re-sorting by (service_name, name, start_time) pushed trace retrieval to ~100 ms (slower, but still well under interactive thresholds) while bringing multi-filter search down to ~140 ms.
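Putting the quoted latencies side by side makes the asymmetry easier to see. The short calculation below uses only the numbers above and assumes the two search figures refer to comparable queries.

```python
# Latencies quoted in the post, in milliseconds.
sorted_by_trace_id = {"get_trace": 27, "search": 880}
sorted_by_service = {"get_trace": 100, "search": 140}

# What trace retrieval gives up by not sorting on trace_id.
retrieval_slowdown = sorted_by_service["get_trace"] / sorted_by_trace_id["get_trace"]
retrieval_cost_ms = sorted_by_service["get_trace"] - sorted_by_trace_id["get_trace"]

# What search gains from sorting on (service_name, name, start_time).
search_speedup = sorted_by_trace_id["search"] / sorted_by_service["search"]
search_gain_ms = sorted_by_trace_id["search"] - sorted_by_service["search"]

print(f"trace retrieval: {retrieval_slowdown:.1f}x slower (+{retrieval_cost_ms} ms)")
print(f"search: {search_speedup:.1f}x faster (-{search_gain_ms} ms)")
```

In absolute terms, trace retrieval gives up tens of milliseconds while search gains several hundred.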
Storing Typed Attributes
In jaeger-v1, tags were always strings. The v2 reader API accepts a typed map, where attributes can be Bool, Int64, Float64, String, or one of the complex types (Bytes, Slice, Map). We need to query across these types, so the storage layer can’t collapse everything to strings.
The schema uses one ClickHouse Nested column per primitive type, repeated at the span, event, link, resource, and scope levels. As the name implies, a Nested column behaves like a small sub-table inside each row, so attribute filters can use the same query semantics as a regular table.
However, attribute-only searches are inherently more expensive because they cannot fully leverage ClickHouse’s primary index. The table’s index is optimized around top-level structural fields: service, operation, and time. For the best query performance, and to avoid heavy column scans, users should always combine attribute filters with these fields to limit the data ClickHouse has to scan.
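As a rough sketch of what per-type storage means, the function below splits one attribute map into parallel key/value arrays, one pair per primitive type, which is how a Nested column is laid out. The names str_attributes and int_attributes appear later in this post; bool_attributes, double_attributes, and the key/value subfield names are assumptions made for illustration, and complex types are left out.

```python
def split_attributes(attributes):
    """Group a typed attribute map into per-type parallel key/value arrays."""
    columns = {
        name: {"key": [], "value": []}
        for name in (
            "bool_attributes",    # hypothetical name
            "int_attributes",
            "double_attributes",  # hypothetical name
            "str_attributes",
        )
    }
    for key, value in attributes.items():
        # bool must be tested before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            target = "bool_attributes"
        elif isinstance(value, int):
            target = "int_attributes"
        elif isinstance(value, float):
            target = "double_attributes"
        elif isinstance(value, str):
            target = "str_attributes"
        else:
            # Bytes, Slice, and Map need their own handling; out of scope here.
            raise TypeError(f"unsupported attribute type for {key!r}")
        columns[target]["key"].append(key)
        columns[target]["value"].append(value)
    return columns


# Hypothetical span attributes.
span_columns = split_attributes(
    {"http.status_code": 200, "error": True, "http.method": "GET", "sample.rate": 0.5}
)
print(span_columns["int_attributes"])
```

Because each type keeps its native representation, a filter such as "status code greater than or equal to 500" can compare integers rather than strings.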
Materialized Views
Some of Jaeger’s queries don’t fit the spans table’s sort order. For example, the Jaeger UI needs to quickly load the full list of known service names and operations, while trace searches often need efficient access to trace time ranges.
Rather than answering these with expensive table scans, we use materialized views to precompute the data. In ClickHouse, materialized views automatically transform inserts into a source table and write the results into optimized target tables.
This approach is used to speed up queries for service names, operations, and trace ID timestamp ranges.
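Conceptually, a materialized view acts like an insert trigger that sees each newly inserted block and updates a small derived table. The Python model below is a simplified sketch of that idea, not Jaeger's schema: the class, the field names on each span, and the eager in-memory merging are assumptions, whereas ClickHouse writes to target tables and merges them in the background.

```python
class SpanStore:
    """In-memory stand-in for a spans table with two derived 'views'."""

    def __init__(self):
        self.spans = []                # source table
        self.operations = set()        # derived: (service_name, name) pairs
        self.trace_id_timestamps = {}  # derived: trace_id -> (min_start, max_start)

    def insert(self, batch):
        self.spans.extend(batch)
        # Like a materialized view, this only looks at the inserted batch,
        # never at the whole source table.
        self._update_views(batch)

    def _update_views(self, batch):
        for span in batch:
            self.operations.add((span["service_name"], span["name"]))
            start = span["start_time"]
            low, high = self.trace_id_timestamps.get(span["trace_id"], (start, start))
            self.trace_id_timestamps[span["trace_id"]] = (min(low, start), max(high, start))

    def services(self):
        """Answer 'list all services' without scanning self.spans."""
        return sorted({service for service, _ in self.operations})


store = SpanStore()
store.insert([
    {"trace_id": "t1", "service_name": "frontend", "name": "GET /cart", "start_time": 100},
    {"trace_id": "t1", "service_name": "cart", "name": "load", "start_time": 105},
])
store.insert([
    {"trace_id": "t1", "service_name": "cart", "name": "load", "start_time": 90},
    {"trace_id": "t2", "service_name": "auth", "name": "login", "start_time": 200},
])

print(store.services())
print(store.trace_id_timestamps["t1"])
```

The derived structures stay small relative to the spans table, so the UI's "list services" and "list operations" calls never need to touch the raw spans.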
Resolving Typed Attributes
One technical challenge is not immediately obvious from the spans schema: how the storage layer interprets attribute lookups. For instance, when searching for http.status_code=200, the system cannot tell on its own whether “200” is a string or an integer, or whether the key is a span-level or resource-level attribute. Depending on the service, the same logical key could be stored under str_attributes or int_attributes, and it might exist at any of the five data levels: resource, scope, span, event, or link.
To solve this, we maintain a dedicated attribute_metadata table, populated by materialized views off the spans table. This allows the reader to look up the filter key at query time and only query the columns for the types and levels that were observed.
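The lookup can be sketched as follows. This is a hedged illustration rather than Jaeger's reader code: the metadata is modeled as a set of (key, type, level) tuples, and the function names, type labels, and sample entries are hypothetical.

```python
def parse_as(raw, type_name):
    """Read the raw filter string as `type_name`, or return None if it cannot be."""
    if type_name == "str":
        return raw
    if type_name == "bool":
        return {"true": True, "false": False}.get(raw.lower())
    parsers = {"int": int, "double": float}
    if type_name not in parsers:
        return None
    try:
        return parsers[type_name](raw)
    except ValueError:
        return None


def plan_lookups(key, raw_value, metadata):
    """Return (level, column, typed_value) for each observed type and level of `key`."""
    plan = []
    for observed_key, type_name, level in sorted(metadata):
        if observed_key != key:
            continue
        typed_value = parse_as(raw_value, type_name)
        if typed_value is None:
            continue  # e.g. "abc" can never match an int column
        plan.append((level, f"{type_name}_attributes", typed_value))
    return plan


# Hypothetical contents of the attribute_metadata table.
attribute_metadata = {
    ("http.status_code", "int", "span"),
    ("http.status_code", "str", "span"),
    ("service.version", "str", "resource"),
}

print(plan_lookups("http.status_code", "200", attribute_metadata))
```

Keys that were never observed produce an empty plan, so the reader can skip the attribute columns entirely instead of probing every type at every level.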
Benchmarks
We benchmarked the ClickHouse backend using 10 million spans across 1 million traces on a single-node ClickHouse deployment. The benchmark measured ingestion throughput, compression, trace retrieval, and search latency.
The backend sustained more than 50k spans/sec during ingestion, achieved an 8.6× compression ratio on the spans table, retrieved traces in around 100 ms, and kept most search queries under 50 ms. More complex filtered queries completed in about 140 ms.
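A few derived figures follow directly from the numbers above. The calculation treats 50k spans/sec as a lower bound on throughput, so the ingest time is an upper bound.

```python
total_spans = 10_000_000
total_traces = 1_000_000
min_ingest_rate = 50_000   # spans per second ("more than 50k")
compression_ratio = 8.6    # uncompressed size / compressed size

spans_per_trace = total_spans / total_traces
max_ingest_seconds = total_spans / min_ingest_rate
compressed_percent = 100 / compression_ratio

print(f"average spans per trace: {spans_per_trace:.0f}")
print(f"time to ingest the dataset: at most {max_ingest_seconds:.0f} s")
print(f"on-disk size: about {compressed_percent:.1f}% of the uncompressed data")
```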
These numbers are encouraging, but they should be read in the context of the benchmark environment and dataset. Full methodology, configuration, and query details are available in the benchmarking report.
Getting Started
ClickHouse support is available in alpha as a storage backend starting with Jaeger v2.18.0. You’ll need a running ClickHouse instance and the jaeger-v2 configuration for the ClickHouse backend. The full instructions are described in the setup guide.
Closing Thoughts
Being a Jaeger maintainer has been one of the most rewarding parts of my career so far. If you want to chat about this work, contribute, or report a problem, please open an issue on GitHub or find us in the CNCF #jaeger Slack.
Introducing Native ClickHouse Support in Jaeger was originally published in JaegerTracing on Medium.