Efficient rolling statistics with NumPy
When working with time series data with NumPy I often find myself needing to compute rolling or moving statistics such as mean and standard deviation. The simplest way compute that is to use a for loop:
def rolling_apply(fun, a, w):
r = np.empty(a.shape)
r.fill(np.nan)
for i in range(w - 1, a.shape[0]):
r[i] = fun(a[(i-w+1):i+1])
return r
A loop in Python are however very slow compared to a loop in C code. Fortunately there is a trick to make NumPy perform this looping internally in C code. This is achieved by adding an extra dimension with the same size as the window and an appropriate stride:
def rolling_window(a, window):
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
Using this function it is easy to calculate for example a rolling mean without looping in Python:
>>> x=np.arange(10).reshape((2,5))
>>> rolling_window(x, 3)
array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
[[5, 6, 7], [6, 7, 8], [7, 8, 9]]])
>>> np.mean(rolling_window(x, 3), -1)
array([[ 1., 2., 3.],
[ 6., 7., 8.]])
Update 2021-04-21: NumPy now comes with a builtin function
sliding_window_view
that does exactly this. There’s also the
Bottleneck library with optimized
functions for rolling mean, standard deviation etc.
More about the “stride trick”: SegmentAxis, GameOfLifeStrides