Erik Rigtorp

Efficient rolling statistics with NumPy

When working with time series data in NumPy, I often need to compute rolling (moving) statistics such as the mean and standard deviation. The simplest way to compute them is with a for loop:

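import numpy as np

def rolling_apply(fun, a, w):
    # Positions without a full window of w samples stay NaN.
    r = np.full(a.shape, np.nan)
    for i in range(w - 1, a.shape[0]):
        # Apply fun to the w samples ending at index i.
        r[i] = fun(a[(i-w+1):i+1])
    return r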
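A small usage sketch of the loop version on a 1-D array (the array and window size are just illustrative). The first w - 1 entries stay NaN because no complete window ends there, and the output has the same length as the input.

```python
import numpy as np

def rolling_apply(fun, a, w):
    # Positions without a full window of w samples stay NaN.
    r = np.full(a.shape, np.nan)
    for i in range(w - 1, a.shape[0]):
        # Apply fun to the w samples ending at index i.
        r[i] = fun(a[(i - w + 1):i + 1])
    return r

a = np.arange(6, dtype=float)        # 0, 1, 2, 3, 4, 5
r_mean = rolling_apply(np.mean, a, 3)  # NaN, NaN, then means 1, 2, 3, 4
r_std = rolling_apply(np.std, a, 3)    # population std of each 3-sample window
```

Every call to fun runs through the Python interpreter, which is what makes this version slow for long series.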

Python loops are, however, very slow compared to loops in C. Fortunately, there is a trick that makes NumPy do this looping internally in C: add an extra dimension with the same size as the window, and give it the same stride as the last axis. Consecutive windows then start one element apart and overlap in memory, so no data is copied:

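import numpy as np

def rolling_window(a, window):
    # Append a window axis; its stride equals the last axis's stride, so rows overlap.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    # Read-only view: writing through overlapping windows would alias elements.
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)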
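To see what the "appropriate stride" means concretely, the sketch below inspects the view's shape and strides. It assumes an int64 array, so each element is 8 bytes and strides are reported in bytes as 8.

```python
import numpy as np

def rolling_window(a, window):
    # One row per window position, `window` elements per row.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # Both the position axis and the window axis step one element (8 bytes here).
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)

a = np.arange(6, dtype=np.int64)   # 8-byte elements
v = rolling_window(a, 3)
print(a.strides)                   # (8,)
print(v.shape, v.strides)          # (4, 3) (8, 8)
# No data is copied: the view reads the same buffer as a.
print(np.shares_memory(a, v))      # True
```

Because both axes advance by one element, row i of the view is just a[i:i+3], read in place.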

With this function it is easy to calculate, for example, a rolling mean without looping in Python: just reduce over the new last axis.

>>> x = np.arange(10).reshape((2, 5))
>>> rolling_window(x, 3)
array([[[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4]],

       [[5, 6, 7],
        [6, 7, 8],
        [7, 8, 9]]])
>>> np.mean(rolling_window(x, 3), -1)
array([[1., 2., 3.],
       [6., 7., 8.]])
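The same pattern gives other rolling statistics: any NumPy reduction along the last axis of the window view works. The sketch below, using the same 2x5 example array, computes a rolling standard deviation and maximum and cross-checks the std against an explicit loop. Note that np.std defaults to the population standard deviation (ddof=0).

```python
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)

x = np.arange(10, dtype=float).reshape((2, 5))
w = rolling_window(x, 3)               # shape (2, 3, 3)

# Any reduction along the last axis yields a rolling statistic.
rolling_std = np.std(w, axis=-1)       # population std (ddof=0)
rolling_max = np.max(w, axis=-1)

# Cross-check the first row against an explicit Python loop.
loop_std = [np.std(x[0, i:i + 3]) for i in range(x.shape[1] - 2)]
assert np.allclose(rolling_std[0], loop_std)
```

Unlike rolling_apply, the result is not NaN-padded: each row has n - window + 1 entries, one per complete window.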

Update 2021-04-21: NumPy now comes with a built-in function, numpy.lib.stride_tricks.sliding_window_view (added in NumPy 1.20), that does exactly this. There is also the Bottleneck library, with optimized functions for rolling mean, standard deviation, etc.
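A minimal sketch of the built-in equivalent, assuming NumPy 1.20 or later. Passing axis=-1 slides the window along the last axis, which reproduces the rolling_window(x, 3) layout from the example above; the returned view is read-only by default.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = np.arange(10).reshape((2, 5))

# Windows of length 3 along the last axis: shape (2, 3, 3).
v = sliding_window_view(x, 3, axis=-1)

# Rolling mean, as before: reduce over the window axis.
rolling_mean = np.mean(v, axis=-1)
```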

More about the “stride trick”: SegmentAxis, GameOfLifeStrides