Prediction of high-frequency futures return directions based on the mean uncertainty classification methods: An application in China’s future market
ArXiv ID: 2508.06914
Authors: Ying Peng, Yifan Zhang, Xin Wang
Abstract
In this paper, we focus on predicting short-term average return directions in China’s high-frequency futures market. Because minor fluctuations with limited amplitude and short duration are typically regarded as random noise, only price movements of sufficient magnitude qualify as statistically significant signals. Data imbalance therefore emerges as a key problem in predictive modeling. From the perspective of data distribution imbalance, we employ the mean-uncertainty logistic regression (mean-uncertainty LR) classification method under the sublinear expectation (SLE) framework, and further propose the mean-uncertainty support vector machine (mean-uncertainty SVM) method for the prediction. Corresponding investment strategies are developed based on the prediction results. For data selection, we use trading data and limit order book data for the 15 most liquid products among the most active contracts in China’s futures market. Empirical results demonstrate that, compared with conventional LR-related and SVM-related imbalanced data classification methods, the two mean-uncertainty approaches yield significant advantages in both classification metrics and average returns per trade.
Keywords: High-Frequency Futures, Data Imbalance, Sublinear Expectation, Mean-Uncertainty SVM, Limit Order Book, Futures
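
The abstract's point that only sufficiently large moves count as signals, and that this makes the classes imbalanced, can be illustrated with a small sketch. The labeling rule below is an assumption for illustration, not the paper's stated procedure: the function name `label_directions`, the three-way labels, the `horizon` of 2 ticks, the `threshold` of 0.001, and the toy price series are all hypothetical.

```python
from collections import Counter


def label_directions(mid_prices, horizon, threshold):
    """Label each tick by the average return over the next `horizon` ticks.

    Hypothetical rule: +1 if the average return exceeds `threshold`,
    -1 if it falls below `-threshold`, and 0 (noise) otherwise.
    """
    labels = []
    for t in range(len(mid_prices) - horizon):
        future = mid_prices[t + 1 : t + 1 + horizon]
        avg_future_price = sum(future) / horizon
        avg_return = avg_future_price / mid_prices[t] - 1.0
        if avg_return > threshold:
            labels.append(1)
        elif avg_return < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Toy series: small oscillations around 100.00, one jump, then small oscillations.
prices = [
    100.00, 100.01, 100.00, 100.01, 100.00, 100.01, 100.00, 100.01,
    100.00, 100.30, 100.31, 100.30, 100.31, 100.30, 100.31, 100.30,
]
labels = label_directions(prices, horizon=2, threshold=0.001)
counts = Counter(labels)
print(counts)  # most ticks are labeled 0; only the ticks just before the jump are +1
```

Even in this toy series the signal class is a small minority of the labeled ticks. A classifier trained on such labels has to deal with that imbalance, which is the issue the paper targets.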
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 7.5/10
- Quadrant: Holy Grail
- Why: The paper employs advanced nonlinear expectation theory (sublinear expectation) and develops novel mean-uncertainty LR/SVM methods with theoretical proofs, indicating high math complexity. Empirically, it uses real high-frequency futures data (top 15 products) for backtesting investment strategies and reports performance metrics against benchmarks, showing strong data/implementation rigor.
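
For readers unfamiliar with the framework named above, the standard textbook notion of a sublinear expectation can be stated briefly. This is general background, not the paper's specific loss function, and the symbols $\Theta$, $P_\theta$, $\underline{\mu}$, and $\overline{\mu}$ are notation introduced here for illustration. A sublinear expectation $\hat{\mathbb{E}}$ is monotone, preserves constants, is sub-additive, and is positively homogeneous. It can typically be represented as an upper expectation over a family of candidate models, so the mean of $X$ is described by an interval rather than a single number:

$$
\hat{\mathbb{E}}[X] = \sup_{\theta \in \Theta} E_{P_\theta}[X],
\qquad
\underline{\mu} = -\hat{\mathbb{E}}[-X] \;\le\; \overline{\mu} = \hat{\mathbb{E}}[X].
$$

When $\underline{\mu} < \overline{\mu}$, the variable is said to have mean uncertainty. When the family $\{P_\theta\}$ reduces to a single distribution, the interval collapses to the ordinary mean.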
flowchart TD
A["Research Goal<br>Predict Short-Term Avg.<br>Return Directions in<br>China's Futures Market"] --> B["Data & Problem<br>High-Freq. Futures & Limit Order Book Data<br>Data Imbalance Issue"]
B --> C["Methodology<br>Mean-Uncertainty Logistic Regression<br>Under SLE Framework"]
B --> D["Methodology<br>Mean-Uncertainty Support Vector Machines<br>Under SLE Framework"]
C & D --> E["Investment Strategy<br>Based on Prediction Results"]
E --> F["Findings<br>Superior Classification Metrics &<br>Higher Avg. Returns vs. Conventional Methods"]
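
The last two steps of the flowchart, turning predictions into trades and comparing average returns per trade, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's strategy: the function name `average_return_per_trade`, the signal encoding (+1 long, -1 short, 0 no trade), the flat `cost_per_trade`, and the sample numbers are all hypothetical.

```python
def average_return_per_trade(signals, realized_returns, cost_per_trade=0.0):
    """Return (mean net return per executed trade, number of trades).

    Hypothetical convention: signal +1 goes long, -1 goes short, 0 stays flat.
    A trade's net return is signal * realized return minus a flat cost.
    """
    trade_returns = [
        s * r - cost_per_trade
        for s, r in zip(signals, realized_returns)
        if s != 0
    ]
    if not trade_returns:
        return 0.0, 0
    return sum(trade_returns) / len(trade_returns), len(trade_returns)


# Made-up predicted directions and subsequent realized average returns.
signals = [0, 1, 0, -1, 1, 0]
realized = [0.0002, 0.0015, -0.0001, -0.0020, -0.0005, 0.0003]

avg, n_trades = average_return_per_trade(signals, realized, cost_per_trade=0.0001)
print(f"trades={n_trades}, average net return per trade={avg:.4%}")
```

Averaging over executed trades only, rather than over all ticks, keeps the metric comparable between classifiers that trade at different frequencies.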