Skip to main content
Skip to main content

Basic time-series operations

ClickHouse provides several methods for working with time series data, allowing you to aggregate, group, and analyze data points across different time periods. This section covers the fundamental operations commonly used when working with time-based data.

Common operations include grouping data by time intervals, handling gaps in time series data, and calculating changes between time periods. These operations can be performed using standard SQL syntax combined with ClickHouse's built-in time functions.

We’re going to explore ClickHouse time-series querying capabilities with the Wikistat (Wikipedia pageviews data) dataset:

Let’s populate this table with 1 billion records:

Aggregating by time bucket

The most popular requirement is to aggregate data based on periods, e.g. get the total amount of hits for each day:

We’ve used the toDate() function here, which converts the specified time to a date type. Alternatively, we can batch by an hour and filter on the specific date:

The toStartOfHour() function used here converts the given time to the start of the hour. You can also group by year, quarter, month, or day.

Custom grouping intervals

We can even arbitrary intervals, e.g., 5 minutes using the toStartOfInterval() function.

Let’s say we want to group by 4-hour intervals. We can specify the grouping interval using the INTERVAL clause:

Or we can use the toIntervalHour() function

Either way, we get the following results:

Filling empty groups

In a lot of cases we deal with sparse data with some absent intervals. This results in empty buckets. Let’s take the following example where we group data by 1-hour intervals. This will out the following stats with some hours missing values:

ClickHouse provides the WITH FILL modifier to address this. This will fill out all the empty hours with zeros, so we can better understand the distribution over time:

Rolling time windows

Sometimes, we don’t want to deal with the start of intervals (like the start of the day or an hour) but window intervals. Let’s say we want to understand the total hits for a window, not based on days but on a 24-hour period offset from 6 pm.

We can use the date_diff() function to calculate the difference between a reference time and each record’s time. In this case, the day column will represent the difference in days (e.g., 1 day ago, 2 days ago, etc.):