Machine Learning vs. Randomness: Challenges in Predicting Binary Options Movements
ArXiv ID: 2511.15960
Authors: Gabriel M. Arantes, Richard F. Pinto, Bruno L. Dalmazo, Eduardo N. Borges, Giancarlo Lucca, Viviane L. D. de Mattos, Fabian C. Cardoso, Rafael A. Berri
Abstract
Binary options trading is often marketed as a field where predictive models can generate consistent profits. However, the inherent randomness and stochastic nature of binary options make price movements highly unpredictable, posing significant challenges for any forecasting approach. This study demonstrates that machine learning algorithms struggle to outperform a simple baseline in predicting binary options movements. Using a dataset of EUR/USD currency pairs from 2021 to 2023, we tested multiple models, including Random Forest, Logistic Regression, Gradient Boosting, and k-Nearest Neighbors (kNN), both before and after hyperparameter optimization. Furthermore, several neural network architectures, including Multi-Layer Perceptrons (MLP) and a Long Short-Term Memory (LSTM) network, were evaluated under different training conditions. Despite these exhaustive efforts, none of the models surpassed the ZeroR baseline accuracy, highlighting the inherent randomness of binary options. These findings reinforce the notion that binary options lack predictable patterns, making them unsuitable for machine learning-based forecasting.
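The ZeroR baseline named in the abstract can be illustrated with a minimal sketch. ZeroR ignores all features and always predicts the most frequent class seen in training, so any useful model has to beat its accuracy. The label lists and the helper names `zero_r_fit` and `accuracy` below are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent training label (the entire ZeroR 'model')."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy labels: 1 = price closed higher, 0 = price closed lower.
train = [1, 0, 1, 1, 0, 1, 0, 1]
test = [1, 1, 0, 1, 0]

majority = zero_r_fit(train)  # class 1 appears 5 times out of 8
baseline_acc = accuracy([majority] * len(test), test)

print(f"ZeroR predicts class {majority}; test accuracy = {baseline_acc:.2f}")
```

On roughly balanced up/down labels, this baseline sits near 50%, which is why a model that only matches it offers no predictive edge.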
Keywords: Binary Options (Forex), Logistic Regression, Gradient Boosting, k-Nearest Neighbors (kNN), Multi-Layer Perceptrons (MLP)
Complexity vs Empirical Score
- Math Complexity: 2.5/10
- Empirical Rigor: 7.5/10
- Quadrant: Street Traders
- Why: The paper demonstrates a rigorous empirical approach with a substantial real-world dataset, extensive model testing, and optimization, though it lacks the advanced mathematical derivations typical of theoretical finance.
flowchart TD
A["Research Goal<br>Can ML predict<br>binary options movements?"] --> B["Data<br>EUR/USD 2021-2023"]
B --> C["Methodology<br>Train/Test Models"]
C --> D["Models Tested<br>Random Forest, LR, Gradient Boosting,<br>kNN, MLP, LSTM"]
D --> E["Baseline Comparison<br>ZeroR Accuracy"]
E --> F["Key Outcome<br>Models failed to beat<br>baseline; high randomness"]
F --> G["Conclusion<br>Binary options are<br>inherently unpredictable"]
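The workflow in the flowchart can be sketched end to end on synthetic data. This is not the authors' pipeline or dataset: the returns below are i.i.d. Gaussian noise standing in for EUR/USD prices, the three-lag feature set, the 600/200 chronological split, and the hand-written kNN with k = 5 are all assumptions made to keep the example small and dependency-free.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is reproducible

# Synthetic stand-in for price data: i.i.d. Gaussian returns (NOT the paper's data).
returns = [rng.gauss(0.0, 1.0) for _ in range(803)]

LAGS = 3  # assumed number of lagged returns used as features
# Features: the previous LAGS returns; label: 1 if the next return is positive, else 0.
X = [returns[i - LAGS:i] for i in range(LAGS, len(returns))]
y = [1 if returns[i] > 0 else 0 for i in range(LAGS, len(returns))]

# Chronological split (no shuffling), so the test set lies strictly after training.
split = 600
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]


def knn_predict(x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, X_train[j])),
    )[:k]
    votes = Counter(y_train[j] for j in nearest)
    return votes.most_common(1)[0][0]


def accuracy(pred, true):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# ZeroR baseline: always predict the majority class of the training labels.
majority = Counter(y_train).most_common(1)[0][0]
zero_r_acc = accuracy([majority] * len(y_test), y_test)
knn_acc = accuracy([knn_predict(x) for x in X_test], y_test)

print(f"ZeroR accuracy: {zero_r_acc:.3f}")
print(f"kNN accuracy:   {knn_acc:.3f}")
```

Because the synthetic labels are independent of the features by construction, any gap between the two accuracies here reflects sampling noise rather than learned structure, which is the situation the paper reports for real binary options data.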