Skip to main content
Skip to main content

Time-Series analysis functions

Time series analysis in ClickHouse can be performed using standard SQL aggregation and window functions. When working with time series data, you'll typically encounter three main types of metrics:

  • Counter metrics that monotonically increase over time (like page views or total events)
  • Gauge metrics that represent point-in-time measurements that can go up and down (like CPU usage or temperature)
  • Histograms that sample observations and count them in buckets (like request durations or response sizes)

Common analysis patterns for these metrics include comparing values between periods, calculating cumulative totals, determining rates of change, and analyzing distributions. These can all be achieved through combinations of aggregations, window functions like sum() OVER, and specialized functions like histogram().

Period-over-period changes

When analyzing time series data, we often need to understand how values change between time periods. This is essential for both gauge and counter metrics. The lagInFrame window function lets us access the previous period's value to calculate these changes.

The following query demonstrates this by calculating day-over-day changes in views for "Weird Al" Yankovic's Wikipedia page. The trend column shows whether traffic increased (positive values) or decreased (negative values) compared to the previous day, helping identify unusual spikes or drops in activity.

Cumulative values

Counter metrics naturally accumulate over time. To analyze this cumulative growth, we can calculate running totals using window functions.

The following query demonstrates this by using the sum() OVER clause creates a running total, while the bar() function provides a visual representation of the growth.

Rate calculations

When analyzing time series data, it's often useful to understand the rate of events per unit of time. This query calculates the rate of page views per second by dividing hourly totals by the number of seconds in an hour (3600). The visual bar helps identify peak hours of activity.

Histograms

A popular use case for time series data is to build histograms based on tracked events. Suppose we wanted to understand the distribution of a number of pages based on their total hits, only including pages that have over 10,000 hits. We can use the histogram() function to automatically generate an adaptive histogram based on the number of bins:

We can then use arrayJoin() to massage the data and bar() to visualize it: