1. The Econometrics of High Frequency Financial Data: Models and Microstructure
Abstract: Recent years have seen rapid growth in high frequency
financial data. This has opened the possibility of accurately
determining volatility in small time periods, such as one day. We
introduce the types of data, discuss what quantities can reasonably be
estimated in this setting, and review challenges for research.
A main issue in the analysis of such data is microstructure noise.
Recent work on volatility estimation indicates that it is necessary to
fit the data with a hidden semimartingale model, thereby incorporating
the noise. We
develop the methodology for analyzing such data, including two- and
multi-scale sampling. We shall see that the resulting estimators have
the best possible rates of convergence, and characterize the
statistical error in estimators. We also show how to make these
estimators robust to dependent noise. We shall see that, in a sense,
the estimators automatically "clean" the data. The ideas of two-scale
sampling also shed light on covariance estimation for non-synchronized
price series.
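
As an illustration of two-scale sampling, the sketch below computes
one common form of the two-scale estimator: realized variance averaged
over K sparse subgrids, minus a bias correction based on the realized
variance of all observations. The function names, the simulated data,
and the parameter values (n, K, the noise level) are assumptions
chosen for illustration and are not taken from the talk.

```python
import numpy as np

def realized_variance(log_prices, step=1):
    """Sum of squared `step`-lagged differences, divided by `step`.

    For step > 1 this is the average of the realized variances computed
    on the `step` offset subgrids.
    """
    log_prices = np.asarray(log_prices, dtype=float)
    diffs = log_prices[step:] - log_prices[:-step]
    return float(np.sum(diffs ** 2) / step)

def two_scale_rv(log_prices, K):
    """Two-scale estimator: slow-scale average minus a fast-scale bias term."""
    n = len(log_prices) - 1              # number of returns
    rv_all = realized_variance(log_prices, 1)
    rv_avg = realized_variance(log_prices, K)
    n_bar = (n - K + 1) / K              # average subgrid sample size
    return rv_avg - (n_bar / n) * rv_all

# Simulated example (all values below are assumed, not from the talk).
rng = np.random.default_rng(0)
n, true_iv, noise_sd = 23_400, 1e-4, 5e-4
efficient = np.concatenate(
    [[0.0], np.cumsum(rng.normal(0.0, np.sqrt(true_iv / n), n))]
)
observed = efficient + rng.normal(0.0, noise_sd, n + 1)  # i.i.d. noise

rv_all = realized_variance(observed)     # dominated by the noise
tsrv = two_scale_rv(observed, K=300)     # noise bias largely removed
```

In this simulated setting the raw realized variance is dominated by
the noise term, while the two-scale value stays close to the assumed
integrated variance.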
2. Between Data Cleaning and Inference: Pre-Averaging and other Robust
Estimators of the Efficient Price
Abstract: Pre-averaging is another popular strategy for mitigating
microstructure noise in high frequency financial data. As the term suggests,
transaction data (say) are averaged over short time periods ranging
from 30 seconds to five minutes, and the resulting averages
approximate the efficient price process much better than the raw data.
Apart from reducing the size of the microstructure noise, the methodology
also helps synchronize data from different securities. The procedure
is robust to short-term dependence in the noise.
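
The averaging step can be sketched as follows, assuming
non-overlapping blocks of equal size. The function name pre_average,
the one-second sampling with one-minute blocks, and the simulated
prices are assumptions for illustration only.

```python
import numpy as np

def pre_average(log_prices, block_size):
    """Average consecutive non-overlapping blocks of observations.

    Observations in an incomplete final block are dropped.
    """
    log_prices = np.asarray(log_prices, dtype=float)
    n_blocks = len(log_prices) // block_size
    trimmed = log_prices[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).mean(axis=1)

# Simulated example: 23,400 one-second observations, one-minute blocks
# (assumed values, not from the talk).
rng = np.random.default_rng(0)
n, block_size = 23_400, 60
efficient = np.cumsum(rng.normal(0.0, 0.01 / np.sqrt(n), n))
observed = efficient + rng.normal(0.0, 5e-4, n)   # i.i.d. noise, assumed

averaged = pre_average(observed, block_size)
averaged_efficient = pre_average(efficient, block_size)

# Noise left in the raw data versus noise left after averaging.
raw_error = float(np.std(observed - efficient))
averaged_error = float(np.std(averaged - averaged_efficient))
```

Here the averaged series is compared with the block-averaged efficient
price, which isolates how much of the noise the averaging removes.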
In this talk, we develop a general theory for pre-averaging-based
estimation. We show that, up to a contiguity adjustment, the
pre-averaged process behaves as if one sampled from a semimartingale
(with unchanged volatility) plus a Gaussian error. In fact, locally,
the return process becomes a Gaussian MA(1) process, thus enabling
some of the classical machinery.
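
The MA(1) structure can be checked numerically in a simple special
case. The sketch below assumes constant volatility, i.i.d. Gaussian
noise, and non-overlapping block means, which is narrower than the
general theory of the talk; all names and parameter values are
hypothetical. Under these assumptions the returns of the averaged
series should show a clearly nonzero lag-1 autocorrelation and a lag-2
autocorrelation close to zero.

```python
import numpy as np

def block_means(x, block_size):
    """Means of consecutive non-overlapping blocks (incomplete tail dropped)."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_size
    return x[: n_blocks * block_size].reshape(n_blocks, block_size).mean(axis=1)

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Simulated random walk plus noise (assumed values, for illustration).
rng = np.random.default_rng(1)
n, block_size = 200_000, 20
efficient = np.cumsum(rng.normal(0.0, 1e-4, n))
observed = efficient + rng.normal(0.0, 2e-4, n)

# Returns of the pre-averaged series.
returns = np.diff(block_means(observed, block_size))
rho1 = autocorr(returns, 1)   # expected to be clearly nonzero
rho2 = autocorr(returns, 2)   # expected to be near zero for an MA(1)
```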
Since averages can be subject to outliers, we have developed a broader
theory which also applies to cases where M-estimation is used to pin
down the efficient price in local neighborhoods. While the procedure
entails some information loss, we show that it is remarkably
efficient. Moreover, the methodology applies off-the-shelf to any
high frequency econometric problem. Estimating the efficient price is
a form of pre-processing of the data, and hence the methods in this
paper also serve the purpose of data cleaning.
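
As a minimal sketch of M-estimation in local neighborhoods, the code
below replaces each block average with a Huber location estimate,
computed by iteratively reweighted averaging. The choice of the Huber
estimator, the tuning constant c = 1.345, the MAD-based scale, and the
function names are assumptions for illustration; the talk does not
specify a particular M-estimator.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location with a fixed MAD-based scale."""
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))
    scale = 1.4826 * float(np.median(np.abs(x - mu)))
    if scale == 0.0:
        return mu                      # degenerate block: return the median
    for _ in range(max_iter):
        r = np.abs(x - mu) / scale     # standardized residuals
        w = c / np.maximum(r, c)       # weight 1 inside [-c, c], c/r outside
        new_mu = float(np.sum(w * x) / np.sum(w))
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu

def robust_pre_average(log_prices, block_size):
    """Huber location estimate on each non-overlapping block."""
    log_prices = np.asarray(log_prices, dtype=float)
    n_blocks = len(log_prices) // block_size
    blocks = log_prices[: n_blocks * block_size].reshape(n_blocks, block_size)
    return np.array([huber_location(b) for b in blocks])

# One block with a single outlier (hypothetical values).
block = np.array([10.0, 10.1, 9.9, 10.05, 9.95, 50.0])
plain_mean = float(np.mean(block))     # pulled far away by the outlier
robust_mean = huber_location(block)    # stays near the bulk of the data
```

The plain average is dragged toward the outlier, while the Huber
estimate remains near the bulk of the observations, at the cost of
some efficiency when the data contain no outliers.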