Why Regression? Binary Encoding Classification Brings Confidence to Stock Market Index Price Prediction

ArXiv ID: 2506.03153

Authors: Junzhe Jiang, Chang Yang, Xinrun Wang, Bo Li

Abstract

Stock market indices serve as fundamental market measurements that quantify systematic market dynamics. However, accurate index price prediction remains challenging, primarily because existing approaches treat indices as isolated time series and frame the prediction as a simple regression task. These methods fail to capture indices’ inherent nature as aggregations of constituent stocks with complex, time-varying interdependencies. To address these limitations, we propose Cubic, a novel end-to-end framework that explicitly models the adaptive fusion of constituent stocks for index price prediction. Our main contributions are threefold. i) Fusion in the latent space: we introduce a fusion mechanism over the latent embeddings of the stocks to extract information from the vast number of constituents. ii) Binary encoding classification: since regression is challenging due to continuous value estimation, we reformulate it as a classification task, where the target value is converted to a binary representation and we optimize the prediction of each digit with a cross-entropy loss. iii) Confidence-guided prediction and trading: we introduce a regularization loss to address market prediction uncertainty in index prediction and design rule-based trading policies based on the prediction confidence. Extensive experiments across multiple stock markets and indices demonstrate that Cubic consistently outperforms state-of-the-art baselines in stock index prediction tasks, achieving superior performance on both forecasting accuracy metrics and downstream trading profitability.
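
To make contribution ii) concrete, here is a minimal sketch of how a continuous target could be turned into binary digits and scored with a per-digit cross-entropy. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the bit width, the price range used for quantization, or the digit ordering, so `N_BITS`, `lo`, `hi`, and all function names below are hypothetical.

```python
import math

N_BITS = 16  # hypothetical bit width; not taken from the paper


def encode_price(price, lo, hi, n_bits=N_BITS):
    """Quantize a price in [lo, hi] to an integer level; return its bits, MSB first."""
    levels = (1 << n_bits) - 1
    frac = min(max((price - lo) / (hi - lo), 0.0), 1.0)  # clamp into [0, 1]
    level = round(frac * levels)
    return [(level >> i) & 1 for i in reversed(range(n_bits))]


def decode_bits(bits, lo, hi):
    """Invert encode_price up to quantization error."""
    levels = (1 << len(bits)) - 1
    level = 0
    for b in bits:
        level = (level << 1) | b
    return lo + (hi - lo) * level / levels


def bitwise_cross_entropy(probs, bits, eps=1e-12):
    """Mean binary cross-entropy over digits; probs[i] is the predicted P(bit_i = 1)."""
    total = 0.0
    for p, b in zip(probs, bits):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(bits)


# Example: encode a target, then score a hypothetical prediction against it.
target_bits = encode_price(4200.0, lo=3000.0, hi=5000.0)
predicted_probs = [0.9 if b else 0.1 for b in target_bits]
loss = bitwise_cross_entropy(predicted_probs, target_bits)
recovered = decode_bits(target_bits, lo=3000.0, hi=5000.0)
```

With this kind of encoding, the model outputs one probability per digit instead of a single real number, and the decoded value differs from the original target only by the quantization step `(hi - lo) / (2**N_BITS - 1)`.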

Keywords: stock index prediction, latent space fusion, binary encoding classification, confidence-guided trading, constituent stock modeling, Equities

Complexity vs Empirical Score

Math Complexity: 6.5/10
Empirical Rigor: 7.0/10
Quadrant: Holy Grail
Why: The paper introduces advanced deep learning concepts like latent space fusion and binary encoding classification with specific loss functions, indicating moderate-to-high mathematical complexity. It also reports extensive experiments across multiple markets and backtested trading profitability, demonstrating strong empirical rigor.

  flowchart TD
    subgraph Problem & Goal
        A["Research Question<br>Why Regression?"]
        B["Challenge: Existing methods treat indices<br>as isolated series, ignoring constituents"]
    end

    subgraph Methodology
        C["Cubic Framework<br>Adaptive Fusion Model"]
        D["Binary Encoding Classification<br>Replace regression with<br>binary digit classification"]
        E["Confidence-guided Trading<br>Regularization + Rule-based policy"]
    end

    subgraph Outcomes
        F["Superior Forecast Accuracy<br>vs SOTA baselines"]
        G["Enhanced Trading Profitability<br>via confidence-weighted decisions"]
    end

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    E --> G
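
The confidence-guided trading step in the flowchart can be pictured with a small rule-based sketch. The source only states that trading policies are rule-based and driven by confidence; the confidence definition (average of `max(p, 1 - p)` over the predicted digit probabilities), the `threshold` value, and the function names here are assumptions made for illustration, not the authors' policy.

```python
def bit_confidence(probs):
    """Average of max(p, 1 - p) across digits: 0.5 is a coin flip, 1.0 is certain."""
    return sum(max(p, 1.0 - p) for p in probs) / len(probs)


def trade_signal(predicted, current, confidence, threshold=0.8):
    """Return +1 (go long), -1 (go short), or 0 (stay out of the market)."""
    if confidence < threshold:
        return 0  # prediction too uncertain to act on
    if predicted > current:
        return 1
    if predicted < current:
        return -1
    return 0


# Example: act only when the per-digit probabilities are decisive.
conf = bit_confidence([0.9, 0.1, 0.8, 0.2])
signal = trade_signal(predicted=4210.0, current=4200.0, confidence=conf)
```

The point of the design is that low-confidence forecasts produce no position at all, so trading activity concentrates on days where the classifier's digit probabilities are far from 0.5.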
