Time-series query performance
After optimizing storage, the next step is improving query performance.
This section explores two key techniques: optimizing `ORDER BY` keys and using materialized views.
We'll see how these approaches can reduce query times from seconds to milliseconds.
Optimize ORDER BY keys
Before attempting other optimizations, you should optimize the table's ordering key to ensure ClickHouse produces the fastest possible results.
Choosing the right key largely depends on the queries you're going to run. Suppose most of our queries filter by the `project` and `subproject` columns. In this case, it's a good idea to add them to the ordering key - as well as the `time` column, since we query on time as well.
Let's create another version of the table that has the same column types as `wikistat`, but is ordered by `(project, subproject, time)`.
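A minimal sketch of that table, assuming the `wikistat` schema consists of `time`, `project`, `subproject`, `path`, and `hits` columns (the new table's name is our choice):

```sql
CREATE TABLE wikistat_project_subproject
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
-- Ordering key chosen to match our most common filters.
ORDER BY (project, subproject, time);
```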
Let's now compare multiple queries to get an idea of how essential our ordering key expression is to performance. Note that we haven't applied our previous data type and codec optimizations, so any query performance differences are only based on the sort order.
| Query | `(time)` | `(project, subproject, time)` |
|---|---|---|
| | 2.381 sec | 1.660 sec |
| | 2.148 sec | 0.058 sec |
| | 2.192 sec | 0.012 sec |
| | 2.968 sec | 0.010 sec |
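To give a sense of the queries being compared, here is an illustrative filter on the ordering-key prefix (the table name and filter values are hypothetical, not the exact benchmark queries):

```sql
-- Filtering on the leading ordering-key columns lets ClickHouse skip
-- most granules instead of scanning the whole table.
SELECT sum(hits)
FROM wikistat_project_subproject
WHERE project = 'en' AND subproject = 'm';
```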
Materialized views
Another option is to use materialized views to aggregate and store the results of popular queries. These results can be queried instead of the original table. Suppose the following query is executed quite often in our case:
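Since the exact query isn't shown here, a plausible shape for it, assuming we want the most-visited paths:

```sql
-- Hypothetical "popular query": total hits per path, top 10.
SELECT
    path,
    sum(hits) AS total_hits
FROM wikistat
GROUP BY path
ORDER BY total_hits DESC
LIMIT 10;
```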
Create materialized view
We can create the following materialized view:
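A sketch of the view and its target table, assuming we pre-aggregate hits per path as in the query above (the `wikistat_top` name reappears in the backfill steps below; the engine choice and view name are our assumptions):

```sql
-- Target table for the pre-aggregated results. SummingMergeTree
-- merges rows with the same sorting key by summing their hits.
CREATE TABLE wikistat_top
(
    `path` String,
    `hits` UInt64
)
ENGINE = SummingMergeTree
ORDER BY path;

-- The materialized view aggregates each block of rows inserted into
-- wikistat and writes the partial sums into wikistat_top.
CREATE MATERIALIZED VIEW wikistat_top_mv
TO wikistat_top
AS SELECT
    path,
    sum(hits) AS hits
FROM wikistat
GROUP BY path;
```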
Backfilling the destination table
This destination table will only be populated when new records are inserted into the `wikistat` table, so we need to do some backfilling.
The easiest way to do this is using an `INSERT INTO SELECT` statement to insert directly into the materialized view's target table, using the view's SELECT query (transformation):
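Under the schema assumed above, that statement looks like:

```sql
-- Backfill the target table directly, reusing the view's transformation.
INSERT INTO wikistat_top
SELECT
    path,
    sum(hits) AS hits
FROM wikistat
GROUP BY path;
```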
Depending on the cardinality of the raw data set (we have 1 billion rows!), this can be a memory-intensive approach. Alternatively, you can use a variant that requires minimal memory:
- Creating a temporary table with a Null table engine
- Connecting a copy of the normally used materialized view to that temporary table
- Using an INSERT INTO SELECT query, copying all data from the raw data set into that temporary table
- Dropping the temporary table and the temporary materialized view.
With that approach, rows from the raw data set are copied block-wise into the temporary table (which doesn’t store any of these rows), and for each block of rows, a partial state is calculated and written to the target table, where these states are incrementally merged in the background.
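The temporary table itself is simple to create; a sketch assuming the same columns as `wikistat` (the `wikistat_backfill` name is used again in the steps below):

```sql
-- The Null engine discards inserted rows, but materialized views
-- attached to this table still see every inserted block.
CREATE TABLE wikistat_backfill
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = Null;
```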
Next, we'll create a materialized view to read from `wikistat_backfill` and write into `wikistat_top`:
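A sketch of that view, mirroring the transformation of the original one (the view name is our choice):

```sql
-- Same transformation as wikistat_top_mv, but reading from the
-- temporary Null-engine table instead of wikistat.
CREATE MATERIALIZED VIEW wikistat_backfill_top_mv
TO wikistat_top
AS SELECT
    path,
    sum(hits) AS hits
FROM wikistat_backfill
GROUP BY path;
```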
And then, finally, we'll populate `wikistat_backfill` from the initial `wikistat` table:
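Assuming the names above, that's a plain `INSERT INTO SELECT`:

```sql
-- Rows stream block-wise through the Null table; the attached view
-- aggregates each block into wikistat_top, keeping memory usage low.
INSERT INTO wikistat_backfill
SELECT *
FROM wikistat;
```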
Once that query's finished, we can delete the backfill table and materialized view:
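With our assumed names:

```sql
DROP VIEW wikistat_backfill_top_mv;
DROP TABLE wikistat_backfill;
```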
Now we can query the materialized view instead of the original table:
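A sketch under the schema assumed above. Note that with a `SummingMergeTree` target table we still aggregate at query time, since background merges of partial sums are eventual:

```sql
SELECT
    path,
    sum(hits) AS hits
FROM wikistat_top
GROUP BY path
ORDER BY hits DESC
LIMIT 10;
```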
Our performance improvement here is dramatic. Before, it took just over 2 seconds to compute the answer to this query; now it takes only 4 milliseconds.