Enhancing OHLC Data with Timing Features: A Machine Learning Evaluation

ArXiv ID: 2509.16137

Authors: Ruslan Tepelyan

Abstract

OHLC bar data is a widely used format for representing financial asset prices over time due to its balance of simplicity and informativeness. Bloomberg has recently introduced a new bar data product that includes additional timing information, specifically the timestamps of the open, high, low, and close prices within each bar. In this paper, we investigate the impact of incorporating this timing data into machine learning models for predicting volume-weighted average price (VWAP). Our experiments show that including these features consistently improves predictive performance across multiple ML architectures. We observe gains across several key metrics, including log-likelihood, mean squared error (MSE), $R^2$, conditional variance estimation, and directional accuracy.

Keywords: machine learning, OHLC data, VWAP prediction, feature engineering, predictive modeling, Equities
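
To make the timing information described in the abstract concrete, here is a minimal sketch of how the four intra-bar timestamps could be turned into model inputs. The `TimedBar` class, the field names, and the choice to encode each timestamp as a fraction of the bar elapsed are all assumptions made for illustration; this summary does not state how the paper or the Bloomberg product actually encodes these features.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TimedBar:
    """Hypothetical container for one bar's timing data (names are illustrative)."""
    start: datetime        # bar start time
    duration: timedelta    # bar length, e.g. 5 minutes
    open_time: datetime    # timestamp of the open price
    high_time: datetime    # timestamp of the high price
    low_time: datetime     # timestamp of the low price
    close_time: datetime   # timestamp of the close price


def timing_features(bar: TimedBar) -> dict[str, float]:
    """Encode each timestamp as the fraction of the bar elapsed (assumed encoding)."""
    total = bar.duration.total_seconds()
    if total <= 0:
        raise ValueError("bar duration must be positive")

    def frac(ts: datetime) -> float:
        return (ts - bar.start).total_seconds() / total

    return {
        "open_frac": frac(bar.open_time),
        "high_frac": frac(bar.high_time),
        "low_frac": frac(bar.low_time),
        "close_frac": frac(bar.close_time),
        # 1.0 if the high printed before the low, else 0.0
        "high_before_low": 1.0 if bar.high_time < bar.low_time else 0.0,
    }


# Made-up example bar, not real market data.
bar = TimedBar(
    start=datetime(2021, 1, 4, 9, 30, 0),
    duration=timedelta(minutes=5),
    open_time=datetime(2021, 1, 4, 9, 30, 3),
    high_time=datetime(2021, 1, 4, 9, 31, 30),
    low_time=datetime(2021, 1, 4, 9, 34, 0),
    close_time=datetime(2021, 1, 4, 9, 34, 57),
)
features = timing_features(bar)
```

Normalizing by bar length keeps the features comparable across bar sizes; a model could also be given the raw offsets in seconds instead.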

Complexity vs Empirical Score

  • Math Complexity: 4.5/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Street Traders
  • Why: The paper applies standard machine learning models and conventional statistical metrics (MSE, R², log-likelihood), so the math complexity is moderate. The dataset (Russell 3000, 2021), temporal splits, and detailed filtering criteria indicate strong empirical rigor and a backtest-ready implementation.
  flowchart TD
    A["Research Goal: Does OHLC<br>timing data improve<br>VWAP prediction?"] --> B{"Methodology"}
    B --> C["Data: Bloomberg OHLC Bar Data<br>with Timing Features"]
    B --> D["Preprocessing & Feature Engineering"]
    C --> D
    D --> E{"ML Training & Evaluation"}
    E --> F["Multiple Architectures<br>e.g., Regression, Trees, NN"]
    F --> G["Key Findings & Outcomes"]
    G --> H["Improved Predictive Performance<br>across all metrics"]
    G --> I["Metrics: Log-Likelihood, MSE,<br>R², Directional Accuracy"]
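
The metrics named above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the paper's evaluation code: it assumes the target is a signed quantity such as a VWAP return, defines directional accuracy as sign agreement, and uses a Gaussian likelihood with a predicted mean and variance. All function names are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def directional_accuracy(y_true, y_pred):
    """Share of predictions whose sign matches the target (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (t > 0) == (p > 0))
    return hits / len(y_true)


def gaussian_log_likelihood(y_true, mu, var):
    """Average Gaussian log-likelihood given predicted means and variances (assumed form)."""
    terms = [
        -0.5 * (math.log(2 * math.pi * v) + (t - m) ** 2 / v)
        for t, m, v in zip(y_true, mu, var)
    ]
    return sum(terms) / len(terms)


# Made-up numbers for illustration only.
y_true = [0.01, -0.02, 0.03, -0.01]
y_pred = [0.02, -0.01, 0.01, 0.01]
pred_var = [1e-4, 1e-4, 1e-4, 1e-4]

scores = {
    "mse": mse(y_true, y_pred),
    "r2": r2(y_true, y_pred),
    "directional_accuracy": directional_accuracy(y_true, y_pred),
    "log_likelihood": gaussian_log_likelihood(y_true, y_pred, pred_var),
}
```

Comparing these scores for a model trained with and without the timing features, on the same held-out period, is one way to reproduce the kind of comparison the summary describes.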