
Tokenizing Stock Prices for Enhanced Multi-Step Forecast and Prediction

Tokenizing Stock Prices for Enhanced Multi-Step Forecast and Prediction ArXiv ID: 2504.17313 “View on arXiv” Authors: Zhuohang Zhu, Haodong Chen, Qiang Qu, Xiaoming Chen, Vera Chung Abstract Effective stock price forecasting (estimating future prices) and prediction (estimating future price changes) are pivotal for investors, regulatory agencies, and policymakers. These tasks enable informed decision-making, risk management, strategic planning, and superior portfolio returns. Despite their importance, forecasting and prediction are challenging due to the dynamic nature of stock price data, which exhibit significant temporal variations in distribution and statistical properties. Additionally, while both forecasting and prediction targets are derived from the same dataset, their statistical characteristics differ significantly. Forecasting targets typically follow a log-normal distribution, characterized by significant shifts in mean and variance over time, whereas prediction targets adhere to a normal distribution. Furthermore, although multi-step forecasting and prediction offer a broader perspective and richer information compared to single-step approaches, they are much more challenging due to factors such as cumulative errors and long-term temporal variance. As a result, many previous works have tackled either single-step stock price forecasting or prediction instead. To address these issues, we introduce a novel model, termed Patched Channel Integration Encoder (PCIE), to tackle both stock price forecasting and prediction. In this model, we utilize multiple stock channels that cover both historical prices and price changes, and design a novel tokenization method to effectively embed these channels in a cross-channel and temporally efficient manner. Specifically, the tokenization process involves univariate patching and temporal learning with a channel-mixing encoder to reduce cumulative errors. Comprehensive experiments validate that PCIE outperforms current state-of-the-art models in forecast and prediction tasks. ...
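As a rough illustration of what "univariate patching" over price and price-change channels could look like, here is a minimal sketch. The function name `patch_series`, the patch length, and the stride are hypothetical choices for illustration; the abstract does not specify PCIE's actual tokenization parameters, and the channel-mixing encoder is not shown.

```python
def patch_series(values, patch_len, stride):
    """Split one channel into fixed-length, possibly overlapping patches."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    return [
        values[start:start + patch_len]
        for start in range(0, len(values) - patch_len + 1, stride)
    ]


# Hypothetical closing prices (illustrative only).
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 104.0]
# Price changes form a second channel derived from the same data.
changes = [b - a for a, b in zip(prices, prices[1:])]

# Each channel is patched on its own (univariate patching).
price_patches = patch_series(prices, patch_len=4, stride=2)
change_patches = patch_series(changes, patch_len=4, stride=1)
```

Each patch would then be embedded as one token, so a model sees a handful of tokens per channel rather than every time step individually.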

April 24, 2025 · 2 min · Research Team

Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models

Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models ArXiv ID: 2504.16635 “View on arXiv” Authors: Fredy Pokou, Jules Sadefo Kamdem, François Benhmad Abstract In an environment of increasingly volatile financial markets, the accurate estimation of risk remains a major challenge. Traditional econometric models, such as GARCH and its variants, are based on assumptions that are often too rigid to adapt to the complexity of the current market dynamics. To overcome these limitations, we propose a hybrid framework for Value-at-Risk (VaR) estimation, combining GARCH volatility models with deep reinforcement learning. Our approach incorporates directional market forecasting using the Double Deep Q-Network (DDQN) model, treating the task as an imbalanced classification problem. This architecture enables the dynamic adjustment of risk-level forecasts according to market conditions. Empirical validation on daily Eurostoxx 50 data covering periods of crisis and high volatility shows a significant improvement in the accuracy of VaR estimates, as well as a reduction in the number of breaches and also in capital requirements, while respecting regulatory risk thresholds. The ability of the model to adjust risk levels in real time reinforces its relevance to modern and proactive risk management. ...
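For readers unfamiliar with the GARCH side of this hybrid, the sketch below shows a plain GARCH(1,1) variance recursion feeding a parametric VaR. It is only the econometric baseline under stated assumptions: the parameters `omega`, `alpha`, `beta` are hypothetical rather than estimated, returns are assumed zero-mean and conditionally normal, and the DDQN-based adjustment described in the abstract is not shown.

```python
from statistics import NormalDist


def garch_variance_forecast(returns, omega, alpha, beta):
    """One-step-ahead conditional variance from a GARCH(1,1) recursion."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for r in returns:
        variance = omega + alpha * r * r + beta * variance
    return variance


def parametric_var(variance, level=0.99):
    """VaR reported as a positive loss, assuming zero-mean normal returns."""
    return -NormalDist().inv_cdf(1.0 - level) * variance ** 0.5


# Hypothetical daily returns and parameters (illustrative only).
returns = [0.001, -0.012, 0.004, -0.020, 0.007]
sigma2 = garch_variance_forecast(returns, omega=1e-6, alpha=0.08, beta=0.90)
var_99 = parametric_var(sigma2, level=0.99)
```

A breach occurs on any day whose realized loss exceeds the VaR forecast for that day; counting breaches is the usual way such forecasts are backtested.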

April 23, 2025 · 2 min · Research Team

Breaking the Trend: How to Avoid Cherry-Picked Signals

Breaking the Trend: How to Avoid Cherry-Picked Signals ArXiv ID: 2504.10914 “View on arXiv” Authors: Unknown Abstract Our empirical results show an impressive fit with the pretty complex theoretical Sharpe formula of a trend-following strategy depending on the parameter of the signal, which was derived by Grebenkov and Serror (2014). That empirical fit convinces us that a mean-reversion process with only one time scale is enough to model, in a pretty precise way, the reality of the trend-following mechanism at the average scale of CTAs and that, as a consequence, using only one simple EMA appears optimal to capture the trend. As a consequence, using a complex basket of different complex indicators as a signal does not seem to be so rational or optimal and exposes one to the risk of cherry-picking. ...
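To make the "one simple EMA" signal concrete, here is a minimal sketch of an exponential moving average of past returns used as a trend signal. The `span` parameterization and the sign-based position rule are assumptions for illustration; the abstract does not state how the authors size positions.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    if span < 1:
        raise ValueError("span must be at least 1")
    alpha = 2.0 / (span + 1.0)
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x  # seed with the first observation
        smoothed.append(alpha * x + (1.0 - alpha) * prev)
    return smoothed


def trend_positions(returns, span):
    """Position of +1, -1, or 0 from the sign of the EMA of returns."""
    return [(s > 0) - (s < 0) for s in ema(returns, span)]
```

The signal has a single tunable parameter (`span`), which is the point the abstract makes: fewer free parameters leave less room for cherry-picking.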

April 15, 2025 · 2 min · Research Team

Predictive AI with External Knowledge Infusion for Stocks

Predictive AI with External Knowledge Infusion for Stocks ArXiv ID: 2504.20058 “View on arXiv” Authors: Unknown Abstract Fluctuations in stock prices are influenced by a complex interplay of factors that go beyond mere historical data. These factors, themselves influenced by external forces, encompass inter-stock dynamics, broader economic factors, various government policy decisions, outbreaks of wars, etc. Furthermore, all of these factors are dynamic and exhibit changes over time. In this paper, for the first time, we tackle the forecasting problem under external influence by proposing learning mechanisms that not only learn from historical trends but also incorporate external knowledge from temporal knowledge graphs. Since there are no such datasets or temporal knowledge graphs available, we study this problem with stock market data, and we construct comprehensive temporal knowledge graph datasets. In our proposed approach, we model relations on external temporal knowledge graphs as events of a Hawkes process on graphs. With extensive experiments, we show that learned dynamic representations effectively rank stocks based on returns across multiple holding periods, outperforming related baselines on relevant metrics. ...

April 14, 2025 · 2 min · Research Team

BASIR: Budget-Assisted Sectoral Impact Ranking – A Dataset for Sector Identification and Performance Prediction Using Language Models

BASIR: Budget-Assisted Sectoral Impact Ranking – A Dataset for Sector Identification and Performance Prediction Using Language Models ArXiv ID: 2504.13189 “View on arXiv” Authors: Unknown Abstract Government fiscal policies, particularly annual union budgets, exert significant influence on financial markets. However, real-time analysis of budgetary impacts on sector-specific equity performance remains methodologically challenging and largely unexplored. This study proposes a framework to systematically identify and rank sectors poised to benefit from India’s Union Budget announcements. The framework addresses two core tasks: (1) multi-label classification of excerpts from budget transcripts into 81 predefined economic sectors, and (2) performance ranking of these sectors. Leveraging a comprehensive corpus of Indian Union Budget transcripts from 1947 to 2025, we introduce BASIR (Budget-Assisted Sectoral Impact Ranking), an annotated dataset mapping excerpts from budgetary transcripts to sectoral impacts. Our architecture incorporates fine-tuned embeddings for sector identification, coupled with language models that rank sectors based on their predicted performances. Our results demonstrate 0.605 F1-score in sector classification, and 0.997 NDCG score in predicting ranks of sectors based on post-budget performances. The methodology enables investors and policymakers to quantify fiscal policy impacts through structured, data-driven insights, addressing critical gaps in manual analysis. The annotated dataset has been released under CC-BY-NC-SA-4.0 license to advance computational economics research. ...
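The ranking result above is reported as NDCG. As a reminder of how that metric is computed, here is a minimal sketch using linear gains and a log2 position discount. This is one common formulation; the abstract does not say which gain function or cutoff the authors use, so treat those details as assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances_in_predicted_order):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0
```

Here `relevances_in_predicted_order` holds the true relevance (for example, realized post-budget sector performance mapped to a non-negative score) of each item, listed in the order the model ranked them.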

April 2, 2025 · 2 min · Research Team

Asymmetry in Distributions of Accumulated Gains and Losses in Stock Returns

Asymmetry in Distributions of Accumulated Gains and Losses in Stock Returns ArXiv ID: 2503.24241 “View on arXiv” Authors: Unknown Abstract We study decades-long historic distributions of accumulated S&P500 returns, from daily returns to those over several weeks. The time series of the returns emphasize major upheavals in the markets – Black Monday, Tech Bubble, Financial Crisis and Covid Pandemic – which are reflected in the tail ends of the distributions. De-trending the overall gain, we concentrate on comparing distributions of gains and losses. Specifically, we compare the tails of the distributions, which are believed to exhibit power-law behavior and possibly contain outliers. To this end, we find confidence intervals of the linear fits of the tails of the complementary cumulative distribution functions on a log-log scale and conduct a statistical U-test to detect outliers. We also study probability density functions of the full distributions of the returns, with an emphasis on their asymmetry. The key empirical observations are that the mean of de-trended distributions increases near-linearly with the number of days of accumulation while the overall skew is negative – consistent with the heavier tails of losses – and depends little on the number of days of accumulation. At the same time, the variance of the distributions exhibits near-perfect linear dependence on the number of days of accumulation; that is, it remains constant if scaled to the latter. Finally, we discuss the theoretical framework for understanding accumulated returns. Our main conclusion is that the current state of theory, which predicts symmetric or near-symmetric distributions of returns, cannot explain the aggregate of empirical results. ...
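The scaling claims above (mean and variance growing roughly linearly with the accumulation window) can be checked on any return series with a few lines. The sketch below assumes accumulated returns are sums of daily log returns over overlapping windows; that convention, the function names, and the omission of the de-trending step are assumptions, not details taken from the abstract.

```python
from statistics import mean, pvariance


def accumulated_returns(daily_returns, n_days):
    """Sum of daily (log) returns over each overlapping window of n_days."""
    return [
        sum(daily_returns[i:i + n_days])
        for i in range(len(daily_returns) - n_days + 1)
    ]


def scaled_moments(daily_returns, n_days):
    """Mean and variance of accumulated returns, each divided by n_days."""
    acc = accumulated_returns(daily_returns, n_days)
    return mean(acc) / n_days, pvariance(acc) / n_days
```

If variance scales linearly with the window, the second value returned by `scaled_moments` should stay roughly constant as `n_days` changes.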

March 31, 2025 · 3 min · Research Team

An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model

An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model ArXiv ID: 2503.22192 “View on arXiv” Authors: Unknown Abstract This research proposes an ensemble deep learning framework for stock price prediction that combines three neural network architectures: Variational Autoencoder (VAE), Transformer, and Long Short-Term Memory (LSTM) networks. The framework aims to exploit the advantages of each model in order to identify both linear and non-linear relations in stock price movements. To improve the accuracy of its predictions, it uses a rich set of technical indicators and scales its predictors based on the current market situation. When the framework is tested on several stock data sets and benchmarked against single models and conventional forecasting, the ensemble method exhibits consistently high accuracy and reliability. The VAE is able to learn linear representations of high-dimensional data, while the Transformer excels at recognizing long-term patterns in stock price data. The LSTM, being a model designed to deal with sequences, brings additional improvements to the framework, especially regarding temporal dynamics and fluctuations. Combined, these components provide exceptional directional performance and a very small disparity in the predicted results. The proposed solution offers a plausible approach to the inherent problem of stock price prediction with high reliability and scalability. Compared with individual neural-network-based models, as well as classical methods, the proposed ensemble framework demonstrates the advantages of combining different architectures. It has important applications in algorithmic trading, risk analysis, and control and decision-making for finance professionals and scholars. ...

March 28, 2025 · 2 min · Research Team

A Causal Perspective of Stock Prediction Models

A Causal Perspective of Stock Prediction Models ArXiv ID: 2503.20987 “View on arXiv” Authors: Unknown Abstract In the realm of stock prediction, machine learning models encounter considerable obstacles due to the inherent low signal-to-noise ratio and the nonstationary nature of financial markets. These challenges often result in spurious correlations and unstable predictive relationships, leading to poor performance of models when applied to out-of-sample (OOS) domains. To address these issues, we investigate “Domain Generalization” techniques, with a particular focus on causal representation learning to improve a prediction model’s generalizability to OOS domains. By leveraging multi-factor models from econometrics, we introduce a novel error bound that explicitly incorporates causal relationships. In addition, we present the connection between the proposed error bound and market nonstationarity. We also develop a “Causal Discovery” technique to discover invariant feature representations, which effectively mitigates the proposed error bound, and the influence of spurious correlations on causal discovery is rigorously examined. Our theoretical findings are substantiated by numerical results, showcasing the effectiveness of our approach in enhancing the generalizability of stock prediction models. ...

March 26, 2025 · 2 min · Research Team

Equity Risk Premiums (ERP): Determinants, Estimation, and Implications – The 2025 Edition

Equity Risk Premiums (ERP): Determinants, Estimation, and Implications – The 2025 Edition ArXiv ID: ssrn-5168609 “View on arXiv” Authors: Unknown Abstract The equity risk premium is the price of risk in equity markets, and it is not only a key input in estimating costs of equity and capital in both corporate

Keywords: equity risk premium, cost of equity, valuation, corporate finance, risk and return, Equities

Complexity vs Empirical Score
Math Complexity: 4.0/10
Empirical Rigor: 6.0/10
Quadrant: Street Traders
Why: The paper focuses on practical estimation methods (historical, survey, implied) and uses empirical data from multiple markets, but relies on conceptual frameworks and regression analysis rather than advanced mathematical derivations.

flowchart TD
  A["Research Goal: Determine 2025 Equity Risk Premium"] --> B["Methodology & Data Inputs"]
  B --> C["Computational Processes"]
  C --> D["Key Findings & Implications"]
  subgraph B ["Methodology & Data Inputs"]
    B1["Historical Market Returns"]
    B2["Inflation & Treasury Yields"]
    B3["Valuation Multiples<br>P/E, Dividend Yields"]
  end
  subgraph C ["Computational Processes"]
    C1["Historical Averages"]
    C2["Build-Up Models<br>Cost of Equity = RiskFree + Equity Risk Compensation"]
    C3["Inverse P/E Implied ERP"]
  end
  subgraph D ["Key Findings & Implications"]
    D1["Updated Cost of Equity<br>Estimates"]
    D2["Valuation Adjustments<br>for 2025"]
    D3["Strategic Asset Allocation<br>Guidance"]
  end
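The estimation routes named in this entry (historical averages, an inverse-P/E implied premium, and the use of the premium in a cost of equity) reduce to short formulas. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, the earnings-yield version is a crude proxy that ignores growth and payout, and the cost of equity uses the standard CAPM form rather than anything specific to the paper.

```python
def historical_erp(market_returns, risk_free_rates):
    """Average excess return of equities over the risk-free rate."""
    excess = [m - rf for m, rf in zip(market_returns, risk_free_rates)]
    return sum(excess) / len(excess)


def earnings_yield_erp(pe_ratio, risk_free_rate):
    """Crude implied premium: earnings yield (1 / P/E) minus the risk-free rate."""
    return 1.0 / pe_ratio - risk_free_rate


def cost_of_equity(risk_free_rate, beta, erp):
    """CAPM cost of equity: risk-free rate plus beta times the premium."""
    return risk_free_rate + beta * erp
```

For example, with a hypothetical P/E of 20 and a 4% risk-free rate, `earnings_yield_erp(20.0, 0.04)` gives a premium of about 1%, which illustrates how sensitive the implied figure is to the simplifying assumptions.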

March 26, 2025 · 1 min · Research Team

A Note on the Asymptotic Properties of the GLS Estimator in Multivariate Regression with Heteroskedastic and Autocorrelated Errors

A Note on the Asymptotic Properties of the GLS Estimator in Multivariate Regression with Heteroskedastic and Autocorrelated Errors ArXiv ID: 2503.13950 “View on arXiv” Authors: Unknown Abstract We study the asymptotic properties of the GLS estimator in multivariate regression with heteroskedastic and autocorrelated errors. We derive Wald statistics for linear restrictions and assess their performance. The statistics remain robust to heteroskedasticity and autocorrelation. Keywords: Generalized Least Squares (GLS), Wald Statistics, Heteroskedasticity and Autocorrelation Consistency (HAC), Multivariate Regression, Linear Restrictions, Equities ...
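For orientation, the textbook single-equation forms of the GLS estimator and the Wald statistic for linear restrictions are given below. The notation is an assumption for illustration: $y = X\beta + u$ with known error covariance $\Omega$, and restrictions $R\beta = r$ with $q$ rows. The note's multivariate setting and its treatment of an estimated $\Omega$ may differ from this simplified version.

```latex
\hat{\beta}_{\mathrm{GLS}} = (X^{\top}\Omega^{-1}X)^{-1} X^{\top}\Omega^{-1} y,
\qquad
W = (R\hat{\beta}_{\mathrm{GLS}} - r)^{\top}
    \left[ R\,(X^{\top}\Omega^{-1}X)^{-1} R^{\top} \right]^{-1}
    (R\hat{\beta}_{\mathrm{GLS}} - r)
```

Under the null hypothesis and standard regularity conditions, $W$ is asymptotically $\chi^2$ distributed with $q$ degrees of freedom.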

March 18, 2025 · 1 min · Research Team