
Time-Series query performance

After optimizing storage, the next step is improving query performance. This section explores two key techniques: optimizing ORDER BY keys and using materialized views. We'll see how these approaches can reduce query times from seconds to milliseconds.

Optimize ORDER BY keys

Before attempting other optimizations, you should optimize the table's ordering key to ensure ClickHouse produces the fastest possible results. Choosing the right key largely depends on the queries you're going to run. Suppose most of our queries filter by the project and subproject columns. In this case, it's a good idea to add them to the ordering key, along with the time column, since we also query on time.

Let's create another version of the table that has the same column types as wikistat, but is ordered by (project, subproject, time).
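A sketch of that table, assuming the wikistat dataset's usual columns (time, project, subproject, path, hits) and a table name of our choosing:

```sql
-- Same columns as wikistat, but sorted by (project, subproject, time).
-- Column names/types are assumed from the wikistat dataset;
-- the table name is illustrative.
CREATE TABLE wikistat_project_subproject
(
    time       DateTime,
    project    String,
    subproject String,
    path       String,
    hits       UInt64
)
ENGINE = MergeTree
ORDER BY (project, subproject, time);

-- Copy the existing data across.
INSERT INTO wikistat_project_subproject
SELECT * FROM wikistat;
```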

Let's now compare multiple queries to get an idea of how essential our ordering key expression is to performance. Note that we haven't applied our previous data type and codec optimizations, so any query performance differences are based only on the sort order.

| Query   | `(time)`  | `(project, subproject, time)` |
|---------|-----------|-------------------------------|
| Query 1 | 2.381 sec | 1.660 sec                     |
| Query 2 | 2.148 sec | 0.058 sec                     |
| Query 3 | 2.192 sec | 0.012 sec                     |
| Query 4 | 2.968 sec | 0.010 sec                     |

Materialized views

Another option is to use materialized views to aggregate and store the results of popular queries. These results can be queried instead of the original table. Suppose the following query is executed quite often in our case:
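The exact query isn't reproduced here; as an illustration, a popular aggregation over this dataset might be a top-pages query like the following (the hits-per-path shape is an assumption):

```sql
-- A frequently-run aggregation: top 10 pages by total hits.
SELECT
    path,
    sum(hits) AS total_hits
FROM wikistat
GROUP BY path
ORDER BY total_hits DESC
LIMIT 10;
```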

Create materialized view

We can create the following materialized view:
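A sketch of the view and its target table, assuming the top-pages aggregation above (the view name wikistat_mv is illustrative; wikistat_top is the destination table referenced later in this section):

```sql
-- Destination table for the aggregated results.
-- SummingMergeTree collapses rows with the same ORDER BY key
-- by summing the numeric columns (here: hits).
CREATE TABLE wikistat_top
(
    path String,
    hits UInt64
)
ENGINE = SummingMergeTree
ORDER BY path;

-- The materialized view pre-aggregates each block inserted
-- into wikistat and writes the partial result to wikistat_top.
CREATE MATERIALIZED VIEW wikistat_mv TO wikistat_top AS
SELECT
    path,
    sum(hits) AS hits
FROM wikistat
GROUP BY path;
```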

Backfilling destination table

This destination table will only be populated when new records are inserted into the wikistat table, so we need to do some backfilling.

The easiest way to do this is to use an INSERT INTO SELECT statement that inserts directly into the materialized view's target table, using the view's SELECT query (transformation):
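With the assumed schema above, the direct backfill looks like this:

```sql
-- Backfill the target table directly with the view's transformation.
-- Memory-intensive on large datasets: the whole GROUP BY state
-- is held in RAM for the duration of the query.
INSERT INTO wikistat_top
SELECT
    path,
    sum(hits) AS hits
FROM wikistat
GROUP BY path;
```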

Depending on the cardinality of the raw data set (we have 1 billion rows!), this can be a memory-intensive approach. Alternatively, you can use a variant that requires minimal memory:

  • Creating a temporary table with a Null table engine
  • Connecting a copy of the normally used materialized view to that temporary table
  • Using an INSERT INTO SELECT query, copying all data from the raw data set into that temporary table
  • Dropping the temporary table and the temporary materialized view.

With that approach, rows from the raw data set are copied block-wise into the temporary table (which doesn’t store any of these rows), and for each block of rows, a partial state is calculated and written to the target table, where these states are incrementally merged in the background.

Next, we'll create a materialized view to read from wikistat_backfill and write into wikistat_top.
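A sketch of both objects, continuing with the assumed schema (the view name wikistat_backfill_view is illustrative):

```sql
-- Null engine: accepts inserts but stores nothing.
-- Materialized views attached to it still see every inserted block.
CREATE TABLE wikistat_backfill
(
    time       DateTime,
    project    String,
    subproject String,
    path       String,
    hits       UInt64
)
ENGINE = Null;

-- Temporary copy of the view: same transformation,
-- reading from the Null table, writing into the same target.
CREATE MATERIALIZED VIEW wikistat_backfill_view TO wikistat_top AS
SELECT
    path,
    sum(hits) AS hits
FROM wikistat_backfill
GROUP BY path;
```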

And then finally, we'll populate wikistat_backfill from the initial wikistat table:
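Under the same assumptions, this is a plain block-wise copy:

```sql
-- Each block flows through the Null table (discarded) and is
-- pre-aggregated into wikistat_top by the temporary view.
INSERT INTO wikistat_backfill
SELECT * FROM wikistat;
```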

Once that query's finished, we can delete the backfill table and materialized view:
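Dropping the two temporary objects (names as assumed above):

```sql
DROP VIEW wikistat_backfill_view;
DROP TABLE wikistat_backfill;
```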

Now we can query the materialized view instead of the original table:
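With the assumed SummingMergeTree target, the query still aggregates, but over the small pre-aggregated table rather than a billion raw rows:

```sql
-- sum + GROUP BY is still needed: background merges of partial
-- states are incremental, so a key may span several parts.
SELECT
    path,
    sum(hits) AS total_hits
FROM wikistat_top
GROUP BY path
ORDER BY total_hits DESC
LIMIT 10;
```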

The performance improvement here is dramatic: before, it took just over 2 seconds to compute the answer to this query; now it takes only 4 milliseconds.