Designing a high performance market data feed handler

2012-11-22

The feed handler is one of the most important component of any algorithmic trading system. Without accurate market data any high-frequency or algorithmic strategy won’t be able to make correct decisions. This post describes the design of a simple feed handler for cash equities data.

Cash equities data is usually distributed as a series of events updating the status of an order in the system:

Add order (Reference, Symbol, Side, Price, Quantity)
Cancel order (Reference, Decremented quantity)
Replace order (Reference, New reference, New price, New quantity)
Order executed (Reference, Executed quantity)

During a normal trading day it’s common to receive hundreds of thousands of these events per second. The challenge is to process these events and aggregate the orders into a series of bid and offer prices and sizes with low latency. This involves several steps:

The data is received on the wire
Decode the wire format
Update the state of the order
Update the state of the aggregate order book
Inform consumers of the updated order book

The first step can be accelerated by using kernel bypass technology which transfers data directly from the NIC into userspace memory using DMA. Decoding the wire format is usually trivial. In order to track the state of each order the feed handler needs two mappings. One from symbol to order book and one from order reference to order. These mappings are unordered and implemented using hash tables. Each order book needs to keep track of two ordered sets of aggregated price levels, one for bids and one for offers. A typical data structure for this would be a binary search tree. Finally any consumers of the market data are informed. This setup results in O(log n) complexity in number of price levels of the feed handler.

I created a simple C++ implementation of a feed handler for NASDAQ ITCH 4.1 and benchmarked it using example data downloaded from NASDAQ:

Mean:   0.71 us/update
95%:    1.03 us/update
Max:    1007.14 us/update
Min:    0.01 us/update
0 
0.1 
0.2 
0.3 ******
0.4 ************
0.5 **************
0.6 ****************
0.7 *****************
0.8 ***************
0.9 ***************

These statistics were generated by timing batches of 100 updates in order to reduce the measurement error. The results looks decent, but the histogram shows a fairly wide distribution around the mean. So what’s going on here? Jitter can usually be attributed to memory allocations and cpu cache misses. There are memory allocations occurring in the hash tables and the search trees whenever something is added or removed. Each hash table look up pulls in two cache lines and each search tree operation pulls in many cache lines. How can we improve the cache efficiency of the feed handler? One important property of market data is that the updates are exponentially distributed over price. So by keeping the price levels in a sorted array most of the insertions and deletions would only cause a few elements to be moved around and no memory allocations. Furthermore in-order iteration will be cache efficient:

Mean:   0.57 us/update
95%:    0.79 us/update
Max:    992.53 us/update
Min:    0.01 us/update
0 
0.1 
0.2 **
0.3 ************
0.4 ********************
0.5 *************************
0.6 **********************
0.7 **********
0.8 ***
0.9 **

This looks much better and is very consistent even though complexity is O(n) in number of price levels. There are further micro optimizations possible and big gains can be achieved by only processing a small set of symbols.

The implementation is available at github.