On lead-lag estimation of non-synchronously observed point processes

ArXiv ID: 2601.01871

Authors: Takaaki Shiotani, Takaki Hayashi, Yuta Koike

Abstract

This paper introduces a new theoretical framework for analyzing lead-lag relationships between point processes, with a special focus on applications to high-frequency financial data. In particular, we are interested in lead-lag relationships between two sequences of order arrival timestamps. The seminal work of Dobrev and Schaumburg proposed model-free measures of cross-market trading activity based on cross-counts of timestamps. While their method is known to yield reliable results, it faces limitations because its original formulation inherently relies on discrete-time observations, an issue we address in this study. Specifically, we formulate the problem of estimating lead-lag relationships in two point processes as that of estimating the shape of the cross-pair correlation function (CPCF) of a bivariate stationary point process, a quantity well-studied in the neuroscience and spatial statistics literature. Within this framework, the prevailing lead-lag time is defined as the location of the CPCF’s sharpest peak. Under this interpretation, the peak location in Dobrev and Schaumburg’s cross-market activity measure can be viewed as an estimator of the lead-lag time in the aforementioned sense. We further propose an alternative lead-lag time estimator based on kernel density estimation and show that it possesses desirable theoretical properties and delivers superior numerical performance. Empirical evidence from high-frequency financial data demonstrates the effectiveness of our proposed method.
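To make the CPCF idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' estimator. It smooths all pairwise timestamp differences with an Epanechnikov kernel, normalizes by the product of the empirical intensities, applies no edge correction, and takes the grid point where the estimate is largest as the lead-lag time. Using the plain maximum as a stand-in for the "sharpest peak" is a simplification, and every function name, the kernel choice, and the toy simulation in `simulate_pair` are hypothetical illustrations rather than anything specified in the paper.

```python
import numpy as np


def pairwise_lags(t1, t2, max_lag):
    """Return all differences t2[j] - t1[i] with absolute value <= max_lag."""
    t1 = np.sort(np.asarray(t1, dtype=float))
    t2 = np.sort(np.asarray(t2, dtype=float))
    # For each t1 point, locate the slice of t2 inside [t1 - max_lag, t1 + max_lag].
    lo = np.searchsorted(t2, t1 - max_lag, side="left")
    hi = np.searchsorted(t2, t1 + max_lag, side="right")
    chunks = [t2[a:b] - s for s, a, b in zip(t1, lo, hi)]
    return np.concatenate(chunks) if chunks else np.empty(0)


def cpcf_kernel(t1, t2, horizon, grid, bandwidth):
    """Kernel estimate of the CPCF on `grid` (assumption: no edge correction)."""
    grid = np.asarray(grid, dtype=float)
    lags = pairwise_lags(t1, t2, np.max(np.abs(grid)) + bandwidth)
    z = (grid[:, None] - lags[None, :]) / bandwidth
    # Epanechnikov kernel, supported on [-bandwidth, bandwidth] (assumed choice).
    weights = 0.75 * np.clip(1.0 - z**2, 0.0, None) / bandwidth
    lam1, lam2 = len(t1) / horizon, len(t2) / horizon  # empirical intensities
    return weights.sum(axis=1) / (horizon * lam1 * lam2)


def lead_lag_time(t1, t2, horizon, grid, bandwidth):
    """Grid point maximizing the estimated CPCF; positive means t1 leads t2."""
    grid = np.asarray(grid, dtype=float)
    return float(grid[np.argmax(cpcf_kernel(t1, t2, horizon, grid, bandwidth))])


def simulate_pair(horizon=2000.0, true_lag=0.5, seed=0):
    """Toy data (hypothetical): process 2 echoes about half of process 1 after `true_lag`."""
    rng = np.random.default_rng(seed)
    t1 = np.sort(rng.uniform(0.0, horizon, rng.poisson(horizon)))
    echoed = t1[rng.random(t1.size) < 0.5]
    followers = echoed + true_lag + rng.normal(0.0, 0.02, echoed.size)
    noise = rng.uniform(0.0, horizon, rng.poisson(horizon))
    t2 = np.sort(np.concatenate([followers, noise]))
    return t1, t2


if __name__ == "__main__":
    t1, t2 = simulate_pair()
    grid = np.linspace(-1.0, 1.0, 201)
    print("estimated lead-lag time:", lead_lag_time(t1, t2, 2000.0, grid, 0.05))
```

Because the estimate works directly on continuous timestamp differences, no time discretization is needed. In this sketch the bandwidth is fixed by hand; how to choose it, and what "sharpest" means formally, are questions the paper itself addresses and this example does not.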

Keywords: point processes, lead-lag relationships, cross-pair correlation function (CPCF), kernel density estimation, high-frequency financial data, equities

Complexity vs Empirical Score

  • Math Complexity: 9.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper is highly mathematical, introducing a novel point process framework with cross-pair correlation functions and kernel density estimators to solve a theoretical problem in lead-lag estimation, warranting a high math score. It also demonstrates empirical effectiveness using real high-frequency financial data and Monte Carlo simulations, though it lacks a full backtest-ready trading strategy, placing it in the Holy Grail quadrant.
  flowchart TD
    Goal["Research Goal: Estimate lead-lag between non-synchronous point processes<br>e.g., high-frequency financial order arrivals"] --> Method
    subgraph Method ["Key Methodology"]
        direction LR
        M1["Model Lead-Lag as<br>Cross-Pair Correlation Function<br>of Bivariate Stationary Point Process"] --> M2["Estimate CPCF Shape<br>using Kernel Density Estimation"] --> M3["Identify Lead-Lag Time<br>from CPCF's Sharpest Peak"]
    end
    Method --> Data
    Data["Data Inputs<br>High-Frequency Financial Data<br>Order Timestamps (Equities)"] --> Process
    Process["Computational Process"] --> Outcomes
    subgraph Outcomes ["Key Findings / Outcomes"]
        direction LR
        O1["Reinterpreted Dobrev & Schaumburg's<br>Cross-Count Peak as a<br>Lead-Lag Time Estimator"]
        O2["Proposed Kernel Density<br>Estimator (Superior Numerical Performance)"]
        O3["Delivered Reliable Lead-Lag<br>Estimation for Non-Synchronous Data"]
    end