Why Regression? Binary Encoding Classification Brings Confidence to Stock Market Index Price Prediction
ArXiv ID: 2506.03153
Authors: Junzhe Jiang, Chang Yang, Xinrun Wang, Bo Li
Abstract
Stock market indices serve as fundamental market measurements that quantify systematic market dynamics. However, accurate index price prediction remains challenging, primarily because existing approaches treat indices as isolated time series and frame prediction as a simple regression task. These methods fail to capture the inherent nature of indices as aggregations of constituent stocks with complex, time-varying interdependencies. To address these limitations, we propose Cubic, a novel end-to-end framework that explicitly models the adaptive fusion of constituent stocks for index price prediction. Our main contributions are threefold. i) Fusion in the latent space: we introduce a fusion mechanism over the latent embeddings of the stocks to extract information from the large number of constituent stocks. ii) Binary encoding classification: since regression is challenging due to continuous value estimation, we reformulate it as a classification task, in which the target value is converted to binary and the prediction of each binary digit is optimized with a cross-entropy loss. iii) Confidence-guided prediction and trading: we introduce a regularization loss to address market prediction uncertainty in index prediction and design rule-based trading policies based on the confidence. Extensive experiments across multiple stock markets and indices demonstrate that Cubic consistently outperforms state-of-the-art baselines in stock index prediction tasks, achieving superior performance on both forecasting accuracy metrics and downstream trading profitability.
Keywords: stock index prediction, latent space fusion, binary encoding classification, confidence-guided trading, constituent stock modeling, Equities
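The binary encoding idea in contribution ii) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how prices are quantized or how many bits are used, so the bit width `NUM_BITS`, the price range `low`/`high`, and the helper names `encode_price`, `decode_bits`, and `bitwise_cross_entropy` are all assumptions made for illustration. The sketch quantizes a price to an integer level, writes that level as bits, and scores per-bit probabilities with binary cross-entropy.

```python
import math

NUM_BITS = 16  # hypothetical bit width; not taken from the paper


def encode_price(price, low, high, num_bits=NUM_BITS):
    """Quantize a price in [low, high] and return its bits, most significant first."""
    levels = (1 << num_bits) - 1
    clipped = min(max(price, low), high)
    level = round((clipped - low) / (high - low) * levels)
    return [(level >> i) & 1 for i in reversed(range(num_bits))]


def decode_bits(bits, low, high):
    """Map a bit list (most significant first) back to a price in [low, high]."""
    levels = (1 << len(bits)) - 1
    level = 0
    for b in bits:
        level = (level << 1) | b
    return low + level / levels * (high - low)


def bitwise_cross_entropy(probs, bits, eps=1e-12):
    """Mean binary cross-entropy, where probs[i] is the predicted P(bit i = 1)."""
    total = 0.0
    for p, b in zip(probs, bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(bits)


# Assumed example values: an index level of 4200 within a 3000-5000 range.
bits = encode_price(4200.0, low=3000.0, high=5000.0)
recovered = decode_bits(bits, low=3000.0, high=5000.0)

# A model that outputs 0.5 for every bit incurs a loss of ln 2 per bit.
uninformed = [0.5] * NUM_BITS
loss = bitwise_cross_entropy(uninformed, bits)
```

Under this framing, a network would emit one probability per bit instead of a single real-valued price, and the decoded value is accurate to within one quantization step of the chosen range.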
Complexity vs Empirical Score
- Math Complexity: 6.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper introduces advanced deep learning concepts like latent space fusion and binary encoding classification with specific loss functions, indicating moderate-to-high mathematical complexity. It also reports extensive experiments across multiple markets and backtested trading profitability, demonstrating strong empirical rigor.
flowchart TD
subgraph Problem & Goal
A["Research Question<br>Why Regression?"]
B["Challenge: Indices treated as<br>isolated series, ignoring<br>constituent stocks"]
end
subgraph Methodology
C["Cubic Framework<br>Adaptive Fusion Model"]
D["Binary Encoding Classification<br>Replace regression with<br>binary digit classification"]
E["Confidence-guided Trading<br>Regularization + Rule-based policy"]
end
subgraph Outcomes
F["Superior Forecast Accuracy<br>vs SOTA baselines"]
G["Enhanced Trading Profitability<br>via confidence-guided rule-based policies"]
end
A --> B
B --> C
C --> D
D --> E
E --> F
E --> G
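The abstract describes rule-based trading policies driven by prediction confidence but does not give the rules themselves. The following is a minimal sketch of what such a policy might look like, under stated assumptions: confidence is taken to be the average certainty of the per-bit probabilities, and the `threshold` value, the `bit_confidence` and `trade_signal` helpers, and the buy/sell/hold actions are all hypothetical rather than taken from the paper.

```python
def bit_confidence(probs):
    """Average certainty across bits: 0.5 means a coin flip, 1.0 means fully certain."""
    return sum(max(p, 1.0 - p) for p in probs) / len(probs)


def trade_signal(predicted_price, current_price, probs, threshold=0.8):
    """Act on the predicted direction only when confidence clears the threshold."""
    if bit_confidence(probs) < threshold:
        return "hold"  # prediction too uncertain to act on
    if predicted_price > current_price:
        return "buy"
    if predicted_price < current_price:
        return "sell"
    return "hold"


# Assumed example: a confident upward prediction versus an uncertain one.
confident = trade_signal(105.0, 100.0, probs=[0.95, 0.05, 0.90])
uncertain = trade_signal(105.0, 100.0, probs=[0.55, 0.50, 0.60])
```

The design choice illustrated here is abstention: a low-confidence forecast produces no trade, so the policy only takes positions when the classifier is decisive about most bits.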