Algorithmic Trading

Entropy-Assisted Quality Pattern Identification in Finance

ArXiv ID: 2503.06251. Authors: Unknown. Abstract: Short-term patterns in financial time series form the cornerstone of many algorithmic trading strategies, yet extracting these patterns reliably from noisy market data remains a formidable challenge. In this paper, we propose an entropy-assisted framework for identifying high-quality, non-overlapping patterns that exhibit consistent behavior over time. We ground our approach in the premise that historical patterns, when accurately clustered and pruned, can yield substantial predictive power for short-term price movements. To achieve this, we incorporate an entropy-based measure as a proxy for information gain. Patterns that lead to high one-sided movements in historical data, yet retain low local entropy, are more informative in signaling future market direction. Compared to conventional clustering techniques such as K-means and Gaussian Mixture Models (GMM), which often yield biased or unbalanced groupings, our approach emphasizes balance over a forced visual boundary, ensuring that quality patterns are not lost due to over-segmentation. By emphasizing both predictive purity (low local entropy) and historical profitability, our method achieves a balanced representation of Buy and Sell patterns, making it better suited for short-term algorithmic trading strategies. ...
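To make the "low local entropy" idea concrete, here is a minimal sketch of Shannon entropy computed over the Buy/Sell outcomes that followed a pattern historically. This is not the paper's method: the function name `outcome_entropy` and the sample labels are hypothetical, and the abstract does not say how the paper defines or localizes its entropy measure.

```python
import math
from collections import Counter

def outcome_entropy(outcomes):
    """Shannon entropy (in bits) of the outcome labels observed after a pattern.

    Hypothetical helper: lower values mean the pattern was followed by
    more one-sided (purer) outcomes.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative, made-up outcome histories for two candidate patterns.
one_sided = ["buy"] * 9 + ["sell"] * 1   # mostly followed by upward moves
mixed = ["buy"] * 5 + ["sell"] * 5       # no directional information

print(outcome_entropy(one_sided))  # noticeably below 1 bit
print(outcome_entropy(mixed))      # the 1-bit maximum for two classes
```

Under this reading, a pattern would be kept only if its outcome entropy is low and its historical moves were large, which is the combination the abstract describes.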

Large language models in finance : what is financial sentiment?

ArXiv ID: 2503.03612. Authors: Unknown. Abstract: Financial sentiment has become a crucial yet complex concept in finance, increasingly used in market forecasting and investment strategies. Despite its growing importance, there remains a need to define and understand what financial sentiment truly represents and how it can be effectively measured. We explore the nature of financial sentiment and investigate how large language models (LLMs) contribute to its estimation. We trace the evolution of sentiment measurement in finance, from market-based and lexicon-based methods to advanced natural language processing techniques. The emergence of LLMs has significantly enhanced sentiment analysis, providing deeper contextual understanding and greater accuracy in extracting sentiment from financial text. We examine how BERT-based models, such as RoBERTa and FinBERT, are optimized for structured sentiment classification, while GPT-based models, including GPT-4, OPT, and LLaMA, excel in financial text generation and real-time sentiment interpretation. A comparative analysis of bidirectional and autoregressive transformer architectures highlights their respective roles in investor sentiment analysis, algorithmic trading, and financial decision-making. By exploring what financial sentiment is and how it is estimated within LLMs, we provide insights into the growing role of AI-driven sentiment analysis in finance. ...

The Market Maker's Dilemma: Navigating the Fill Probability vs. Post-Fill Returns Trade-Off

ArXiv ID: 2502.18625. Authors: Unknown. Abstract: Using data from a live trading experiment on the Binance Bitcoin perpetual, we examine the effects of (i) basic order book mechanics and (ii) the persistence of price changes from immediate to short timescales, revealing the interplay between returns, queue sizes, and orders’ queue positions. We document a fundamental trade-off: a negative correlation between maker fill likelihood and post-fill returns. This dictates that viable maker strategies often require a contrarian approach, counter-trading the prevailing order book imbalance. These dynamics render commonly-cited strategies highly unprofitable, leading us to model ‘Reversals’: situations where a contrarian maker strategy at the touch proves effective. ...
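As a rough illustration of "counter-trading the prevailing order book imbalance", the sketch below computes a commonly used top-of-book imbalance and picks the quoting side that opposes it. The imbalance definition, the function names, and the `threshold` value are assumptions for illustration; the abstract does not specify how the paper measures imbalance or triggers quotes.

```python
def book_imbalance(bid_qty, ask_qty):
    """Normalized top-of-book imbalance in [-1, 1].

    Positive values mean more resting quantity on the bid (buy pressure).
    This is one common definition, assumed here, not taken from the paper.
    """
    return (bid_qty - ask_qty) / (bid_qty + ask_qty)

def contrarian_side(bid_qty, ask_qty, threshold=0.2):
    """Return the side a contrarian maker would quote, or None if balanced.

    `threshold` is a hypothetical cutoff chosen only for this example.
    """
    imbalance = book_imbalance(bid_qty, ask_qty)
    if imbalance > threshold:
        return "sell"   # heavy bid side: quote against it on the ask
    if imbalance < -threshold:
        return "buy"    # heavy ask side: quote against it on the bid
    return None

print(contrarian_side(30, 10))  # bid-heavy book
print(contrarian_side(10, 30))  # ask-heavy book
```

The trade-off in the abstract shows up here directly: the contrarian side is the one that fills less often when the imbalance correctly predicts the next price move.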

A Novel Loss Function for Deep Learning Based Daily Stock Trading System

ArXiv ID: 2502.17493. Authors: Unknown. Abstract: Making consistently profitable financial decisions in a continuously evolving and volatile stock market has always been a difficult task. Professionals from different disciplines have developed foundational theories to anticipate price movement and evaluate securities such as the famed Capital Asset Pricing Model (CAPM). In recent years, the role of artificial intelligence (AI) in asset pricing has been growing. Although the black-box nature of deep learning models lacks interpretability, they have continued to solidify their position in the financial industry. We aim to further enhance AI’s potential and utility by introducing a return-weighted loss function that will drive top growth while providing the ML models a limited amount of information. Using only publicly accessible stock data (open/close/high/low, trading volume, sector information) and several technical indicators constructed from them, we propose an efficient daily trading system that detects top growth opportunities. Our best models achieve 61.73% annual return on daily rebalancing with an annualized Sharpe Ratio of 1.18 over 1340 testing days from 2019 to 2024, and 37.61% annual return with an annualized Sharpe Ratio of 0.97 over 1360 testing days from 2005 to 2010. The main drivers for success, especially independent of any domain knowledge, are the novel return-weighted loss function, the integration of categorical and continuous data, and the ML model architecture. We also demonstrate the superiority of our novel loss function over traditional loss functions via several performance metrics and statistical evidence. ...

Efficient Triangular Arbitrage Detection via Graph Neural Networks

ArXiv ID: 2502.03194. Authors: Unknown. Abstract: Triangular arbitrage is a profitable trading strategy in financial markets that exploits discrepancies in currency exchange rates. Traditional methods for detecting triangular arbitrage opportunities, such as exhaustive search algorithms and linear programming solvers, often suffer from high computational complexity and may miss potential opportunities in dynamic markets. In this paper, we propose a novel approach to triangular arbitrage detection using Graph Neural Networks (GNNs). By representing the currency exchange network as a graph, we leverage the powerful representation and learning capabilities of GNNs to identify profitable arbitrage opportunities more efficiently. Specifically, we formulate the triangular arbitrage problem as a graph-based optimization task and design a GNN architecture that captures the complex relationships between currencies and exchange rates. We introduce a relaxed loss function to enable more flexible learning and integrate Deep Q-Learning principles to optimize the expected returns. Our experiments on a synthetic dataset demonstrate that the proposed GNN-based method achieves a higher average yield with significantly reduced computational time compared to traditional methods. This work highlights the potential of using GNNs for solving optimization problems in finance and provides a promising approach for real-time arbitrage detection in dynamic financial markets. ...
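For context, the exhaustive-search baseline that the abstract contrasts with the GNN approach can be sketched in a few lines: a three-currency cycle is an arbitrage opportunity when the product of its exchange rates exceeds 1. This is the traditional method, not the paper's GNN; the function name, the rate dictionary layout, and the quoted rates are made up for illustration, and fees are ignored.

```python
from itertools import permutations

def find_triangular_arbitrage(rates, threshold=1.0):
    """Brute-force search over all three-currency cycles.

    rates[(a, b)] is the number of units of b received per unit of a.
    Returns (cycle, product) pairs whose rate product exceeds `threshold`.
    """
    currencies = sorted({c for pair in rates for c in pair})
    found = []
    for a, b, c in permutations(currencies, 3):
        if a != min(a, b, c):
            continue  # skip rotations of a cycle already considered
        try:
            product = rates[(a, b)] * rates[(b, c)] * rates[(c, a)]
        except KeyError:
            continue  # at least one leg of the cycle is not quoted
        if product > threshold:
            found.append(((a, b, c), product))
    return found

# Made-up quotes, chosen so that exactly one direction is profitable.
rates = {
    ("USD", "EUR"): 0.9, ("EUR", "USD"): 1.1,
    ("EUR", "GBP"): 0.9, ("GBP", "EUR"): 1.1,
    ("GBP", "USD"): 1.3, ("USD", "GBP"): 0.76,
}
opportunities = find_triangular_arbitrage(rates)
print(opportunities)
```

The loop visits every ordered triple, so its cost grows cubically with the number of currencies, which is the computational burden the abstract attributes to exhaustive search.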

FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data

ArXiv ID: 2502.18471. Authors: Unknown. Abstract: Large language models (LLMs) excel at generating human-like responses but often struggle with interactive tasks that require access to real-time information. This limitation poses challenges in finance, where models must access up-to-date information, such as recent news or price movements, to support decision-making. To address this, we introduce Financial Agent, a knowledge-grounding approach for LLMs to handle financial queries using real-time text and tabular data. Our contributions are threefold: First, we develop a Financial Context Dataset of over 50,000 financial queries paired with the required context. Second, we train FinBloom 7B, a custom 7 billion parameter LLM, on 14 million financial news articles from Reuters and Deutsche Presse-Agentur, alongside 12 million Securities and Exchange Commission (SEC) filings. Third, we fine-tune FinBloom 7B using the Financial Context Dataset to serve as a Financial Agent. This agent generates relevant financial context, enabling efficient real-time data retrieval to answer user queries. By reducing latency and eliminating the need for users to manually provide accurate data, our approach significantly enhances the capability of LLMs to handle dynamic financial tasks. Our proposed approach makes real-time financial decisions, algorithmic trading and other related tasks streamlined, and is valuable in contexts with high-velocity data flows. ...

Generalized Mean Absolute Directional Loss as a Solution to Overfitting and High Transaction Costs in Machine Learning Models Used in High-Frequency Algorithmic Investment Strategies

ArXiv ID: 2412.18405. Authors: Unknown. Abstract: Regardless of the selected asset class and the level of model complexity (Transformer versus LSTM versus Perceptron/RNN), the GMADL loss function produces better results than standard MSE-type loss functions and has better numerical properties in the context of optimization than MADL. Better results mean the possibility of achieving a higher risk-weighted return based on buy and sell signals built on forecasts generated by a given theoretical model estimated using the GMADL versus MSE or MADL function. In practice, GMADL solves the problem of selecting the most preferable feature in both classification and regression problems, improving the performance of each estimation. Importantly, through additional parameterization, GMADL also solves the problem of optimizing investment systems on high-frequency data in such a way that they focus on strategy variants that contain fewer transactions, so that transaction costs do not reduce the effectiveness of a given strategy to zero. Moreover, the implementation leverages state-of-the-art machine learning tools, including frameworks for hyperparameter tuning, architecture testing, and walk-forward optimization, ensuring robust and scalable solutions for real-world algorithmic trading. ...
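The abstract does not state the loss formulas, so the following is a sketch under an assumption: MADL is taken to be the mean of -sign(R * R_hat) * |R|, and GMADL to replace the sign with a sigmoid-based smooth surrogate with slope `a` and to raise |R| to a power `b`. These forms, the parameter names, and the default values should be checked against the paper before any use.

```python
import math

def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def madl(actual, predicted):
    """Assumed MADL form: mean of -sign(R * R_hat) * |R|."""
    return sum(-_sign(r * p) * abs(r) for r, p in zip(actual, predicted)) / len(actual)

def gmadl(actual, predicted, a=100.0, b=1.0):
    """Assumed GMADL form: mean of -(sigmoid(a * R * R_hat) - 0.5) * |R|**b.

    `a` controls how sharply the smooth term approximates the sign,
    `b` controls how strongly large returns are weighted; both defaults
    are arbitrary illustrative values.
    """
    total = 0.0
    for r, p in zip(actual, predicted):
        # sigmoid(x) - 0.5 equals 0.5 * tanh(x / 2); tanh avoids overflow.
        smooth_sign = 0.5 * math.tanh(a * r * p / 2.0)
        total += -smooth_sign * abs(r) ** b
    return total / len(actual)

# Made-up returns: one forecast set with correct directions, one with wrong ones.
actual = [0.01, -0.02]
right_direction = [0.005, -0.01]
wrong_direction = [-0.005, 0.01]

print(madl(actual, right_direction), gmadl(actual, right_direction))
print(madl(actual, wrong_direction), gmadl(actual, wrong_direction))
```

Under these assumed forms, the smooth term gives a non-zero gradient everywhere, which is consistent with the abstract's claim of better numerical properties than MADL during optimization.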

Volatility-Volume Order Slicing via Statistical Analysis

ArXiv ID: 2412.12482. Authors: Unknown. Abstract: This paper addresses the challenges faced in large-volume trading, where executing substantial orders can result in significant market impact and slippage. To mitigate these effects, this study proposes a volatility-volume-based order slicing strategy that leverages Exponential Weighted Moving Average and Markov Chain Monte Carlo simulations. These methods are used to dynamically estimate future trading volumes and price ranges, enabling traders to adapt their strategies by segmenting order execution sizes based on these predictions. Results show that the proposed approach improves trade execution efficiency, reduces market impact, and offers a more adaptive solution for volatile market conditions. The findings have practical implications for large-volume trading, providing a foundation for further research into adaptive execution strategies. ...
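A minimal sketch of the volume side of this idea: smooth past volumes with an exponentially weighted moving average, then split a parent order in proportion to the volume forecast for each interval. The Markov Chain Monte Carlo component and the price-range estimate are omitted, and the function names, the smoothing factor `alpha`, and the sample volumes are all hypothetical rather than taken from the paper.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; returns the final estimate.

    `alpha` is the weight on the newest observation (illustrative default).
    """
    estimate = values[0]
    for value in values[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate

def slice_order(total_qty, forecast_volumes):
    """Split `total_qty` across intervals in proportion to forecast volume."""
    total_volume = sum(forecast_volumes)
    return [total_qty * v / total_volume for v in forecast_volumes]

# Made-up volume history per intraday interval (one list per interval).
history = {
    "open": [90, 110, 100],
    "midday": [280, 320, 300],
    "close": [100, 100, 100],
}
forecasts = [ewma(volumes) for volumes in history.values()]
slices = slice_order(1000, forecasts)
print(dict(zip(history, slices)))
```

Sizing each child order to the expected volume keeps the order's share of trading roughly constant across intervals, which is one simple way to limit market impact.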

FinVision: A Multi-Agent Framework for Stock Market Prediction

ArXiv ID: 2411.08899. Authors: Unknown. Abstract: Financial trading has been a challenging task, as it requires the integration of vast amounts of data from various modalities. Traditional deep learning and reinforcement learning methods require large training data and often involve encoding various data types into numerical formats for model input, which limits the explainability of model behavior. Recently, LLM-based agents have demonstrated remarkable advancements in handling multi-modal data, enabling them to execute complex, multi-step decision-making tasks while providing insights into their thought processes. This research introduces a multi-modal multi-agent system designed specifically for financial trading tasks. Our framework employs a team of specialized LLM-based agents, each adept at processing and interpreting various forms of financial data, such as textual news reports, candlestick charts, and trading signal charts. A key feature of our approach is the integration of a reflection module, which conducts analyses of historical trading signals and their outcomes. This reflective process is instrumental in enhancing the decision-making capabilities of the system for future trading scenarios. Furthermore, the ablation studies indicate that the visual reflection module plays a crucial role in enhancing the decision-making capabilities of our framework. ...

Enhancing literature review with LLM and NLP methods. Algorithmic trading case

ArXiv ID: 2411.05013. Authors: Unknown. Abstract: This study utilizes machine learning algorithms to analyze and organize knowledge in the field of algorithmic trading. By filtering a dataset of 136 million research papers, we identified 14,342 relevant articles published between 1956 and Q1 2020. We compare traditional practices, such as keyword-based algorithms and embedding techniques, with state-of-the-art topic modeling methods that employ dimensionality reduction and clustering. This comparison allows us to assess the popularity and evolution of different approaches and themes within algorithmic trading. We demonstrate the usefulness of Natural Language Processing (NLP) in the automatic extraction of knowledge, highlighting the new possibilities created by the latest iterations of Large Language Models (LLMs) like ChatGPT. The rationale for focusing on this topic stems from our analysis, which reveals that research articles on algorithmic trading are increasing at a faster rate than the overall number of publications. While stocks and main indices comprise more than half of all assets considered, certain asset classes, such as cryptocurrencies, exhibit a much stronger growth trend. Machine learning models have become the most popular methods in recent years. The study demonstrates the efficacy of LLMs in refining datasets and addressing intricate questions about the analyzed articles, such as comparing the efficiency of different models. Our research shows that by decomposing tasks into smaller components and incorporating reasoning steps, we can effectively tackle complex questions supported by case analyses. This approach contributes to a deeper understanding of algorithmic trading methodologies and underscores the potential of advanced NLP techniques in literature reviews. ...