Erik Rigtorp

Efficient rolling statistics with NumPy

When working with time series data with NumPy I often find myself needing to compute rolling or moving statistics such as mean and standard deviation. The simplest way compute that is to use a for loop:

``````def rolling_apply(fun, a, w):
r = np.empty(a.shape)
r.fill(np.nan)
for i in range(w - 1, a.shape[0]):
r[i] = fun(a[(i-w+1):i+1])
return r
``````

A loop in Python are however very slow compared to a loop in C code. Fortunately there is a trick to make NumPy perform this looping internally in C code. This is achieved by adding an extra dimension with the same size as the window and an appropriate stride:

``````def rolling_window(a, window):
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
``````

Using this function it is easy to calculate for example a rolling mean without looping in Python:

``````>>> x=np.arange(10).reshape((2,5))
>>> rolling_window(x, 3)
array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
[[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

>>> np.mean(rolling_window(x, 3), -1)
array([[ 1.,  2.,  3.],
[ 6.,  7.,  8.]])
``````

Update 2021-04-21: NumPy now comes with a builtin function `sliding_window_view` that does exactly this. There’s also the Bottleneck library with optimized functions for rolling mean, standard deviation etc.

More about the “stride trick”: SegmentAxis, GameOfLifeStrides