Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents

ArXiv ID: 2502.17967 · Authors: Unknown

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks, yet their performance in dynamic, real-world financial environments remains underexplored. Existing approaches are limited to historical backtesting, where trading actions cannot influence market prices and agents train only on static data. To address this limitation, we present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading and directly impact price dynamics. By simulating realistic bid-ask interactions, our platform enables training in scenarios that closely mirror live markets, thereby narrowing the gap between training and evaluation. Experiments reveal that LLMs struggle with numerical reasoning when given plain-text data, often overfitting to local patterns and recent values. In contrast, chart-based visualizations significantly enhance both numerical reasoning and trading performance. Furthermore, incorporating a reflection module yields additional improvements, especially with visual inputs. Evaluations on NASDAQ and CSI datasets demonstrate the superiority of our method, particularly under high volatility. All code and data are available at https://github.com/wekjsdvnm/Agent-Trading-Arena. ...
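The abstract's key mechanism is that agent orders cross against each other and move the price, unlike static backtests. Below is a minimal sketch of that kind of bid-ask matching in a zero-sum toy market; the class and method names are illustrative assumptions, not the paper's released code.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Order:
    sort_key: float                      # heap key: -price for bids, price for asks
    agent: str = field(compare=False)
    price: float = field(compare=False)
    qty: int = field(compare=False)

class ToyArena:
    """Zero-sum toy market: agent orders cross against each other and set the price."""
    def __init__(self, last_price: float):
        self.bids, self.asks = [], []    # max-heap (negated price) and min-heap
        self.last_price = last_price

    def submit(self, agent: str, side: str, price: float, qty: int):
        book = self.bids if side == "buy" else self.asks
        key = -price if side == "buy" else price
        heapq.heappush(book, Order(key, agent, price, qty))
        self._match()

    def _match(self):
        # Trades execute while the best bid crosses the best ask; every fill
        # moves last_price, so agents' actions feed back into the market.
        while self.bids and self.asks and self.bids[0].price >= self.asks[0].price:
            bid, ask = self.bids[0], self.asks[0]
            fill = min(bid.qty, ask.qty)
            self.last_price = ask.price
            bid.qty -= fill
            ask.qty -= fill
            if bid.qty == 0:
                heapq.heappop(self.bids)
            if ask.qty == 0:
                heapq.heappop(self.asks)

arena = ToyArena(last_price=100.0)
arena.submit("alice", "buy", 101.0, 5)
arena.submit("bob", "sell", 100.5, 3)
print(arena.last_price)  # 100.5: bob's ask filled against alice's standing bid
```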

February 25, 2025 · 2 min · Research Team

LLM Agents Do Not Replicate Human Market Traders: Evidence From Experimental Finance

ArXiv ID: 2502.15800 · Authors: Unknown

Abstract: This paper explores how Large Language Models (LLMs) behave in a classic experimental finance paradigm widely known for eliciting bubbles and crashes in human participants. We adapt an established trading design, where traders buy and sell a risky asset with a known fundamental value, and introduce several LLM-based agents, both in single-model markets (all traders are instances of the same LLM) and in mixed-model "battle royale" settings (multiple LLMs competing in the same market). Our findings reveal that LLMs generally exhibit a "textbook-rational" approach, pricing the asset near its fundamental value, and show only a muted tendency toward bubble formation. Further analyses indicate that LLM-based agents display less trading strategy variance in contrast to humans. Taken together, these results highlight the risk of relying on LLM-only data to replicate human-driven market phenomena, as key behavioral features, such as large emergent bubbles, were not robustly reproduced. While LLMs clearly possess the capacity for strategic decision-making, their relative consistency and rationality suggest that they do not accurately mimic human market dynamics. ...
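The underlying design is the classic experimental-finance bubble paradigm: a risky asset with a known, declining fundamental value that traders can price far above (a bubble) or near (rational pricing). A minimal sketch of that fundamental-value benchmark follows, assuming the standard per-period dividend structure of such experiments; the dividend values and horizon are illustrative, not the paper's parameters.

```python
# In an SSW-style bubble experiment the asset pays a random dividend each
# period, so with T periods remaining the risk-neutral fundamental value is
# E[dividend] * T. LLM agents "pricing near fundamentals" means tracking this
# declining line instead of inflating above it.
DIVIDENDS = [0.00, 0.08, 0.28, 0.60]            # equally likely per-period payouts
EXPECTED_DIV = sum(DIVIDENDS) / len(DIVIDENDS)  # 0.24
TOTAL_PERIODS = 15

def fundamental_value(period: int) -> float:
    remaining = TOTAL_PERIODS - period + 1
    return EXPECTED_DIV * remaining

for t in (1, 8, 15):
    print(f"period {t:2d}: FV = {fundamental_value(t):.2f}")
```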

February 18, 2025 · 2 min · Research Team

FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

ArXiv ID: 2502.11433 · Authors: Unknown

Abstract: Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements. ...
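A hedged sketch of the core idea: a language backbone kept mostly frozen while a small trainable head serves as the trading policy, updated by a REINFORCE-style stand-in for the paper's policy-gradient step on trading rewards. The embedding stand-in for the LLM, the module sizes, and the reward function are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

ACTIONS = ["buy", "hold", "sell"]

backbone = nn.Embedding(1000, 64)          # stands in for a pre-trained LLM encoder
backbone.weight.requires_grad_(False)      # "partially fine-tuned": freeze most params
policy_head = nn.Linear(64, len(ACTIONS))  # small trainable policy layer
opt = torch.optim.Adam(policy_head.parameters(), lr=1e-3)

def step(state_tokens: torch.Tensor, reward_fn):
    feats = backbone(state_tokens).mean(dim=0)  # pool token features into a state
    dist = torch.distributions.Categorical(logits=policy_head(feats))
    action = dist.sample()
    reward = reward_fn(ACTIONS[action.item()])  # trading P&L as the RL reward
    loss = -dist.log_prob(action) * reward      # REINFORCE gradient estimator
    opt.zero_grad()
    loss.backward()
    opt.step()
    return ACTIONS[action.item()], reward

# Toy usage: reward buying on this (hypothetical) state encoding.
action, r = step(torch.tensor([3, 14, 159]), lambda a: 1.0 if a == "buy" else 0.0)
print(action, r)
```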

February 17, 2025 · 2 min · Research Team

HedgeAgents: A Balanced-aware Multi-agent Financial Trading System

ArXiv ID: 2502.13165 · Authors: Unknown

Abstract: As automated trading gains traction in the financial market, algorithmic investment strategies are increasingly prominent. While Large Language Models (LLMs) and agent-based models show promising potential in real-time market analysis and trading decisions, they still suffer significant losses of around 20% when confronted with rapid declines or frequent fluctuations, impeding their practical application. Hence, a more robust and resilient framework is needed. This paper introduces an innovative multi-agent system, HedgeAgents, aimed at bolstering system robustness via "hedging" strategies. In this well-balanced system, HedgeAgents consists of a central fund manager and multiple tailored hedging experts specializing in various financial asset classes. These agents leverage LLMs' cognitive capabilities to make decisions and coordinate through three types of conferences. Benefiting from the powerful understanding of LLMs, HedgeAgents attained a 70% annualized return and a 400% total return over a period of 3 years. Notably, HedgeAgents can even formulate investment experience comparable to that of human experts (https://hedgeagents.github.io/). ...
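A structural sketch of what such a manager-plus-experts hierarchy can look like: a central fund manager polls asset-class experts and normalizes their signed proposals so offsetting positions stay balanced. The class names, the single "conference" round, and the aggregation rule are assumptions for illustration; the paper's three conference types and LLM-driven reasoning are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    asset: str
    weight: float   # signed target exposure; shorts hedge longs
    rationale: str

class HedgeExpert:
    def __init__(self, asset: str):
        self.asset = asset

    def propose(self, market_view: dict) -> Proposal:
        # In the real system an LLM reasons over news and prices; here, a stub.
        w = market_view.get(self.asset, 0.0)
        return Proposal(self.asset, w, f"view on {self.asset}")

class FundManager:
    def __init__(self, experts):
        self.experts = experts

    def conference(self, market_view: dict) -> dict:
        proposals = [e.propose(market_view) for e in self.experts]
        gross = sum(abs(p.weight) for p in proposals) or 1.0
        # Normalize by gross exposure so offsetting positions stay balanced
        # (the "hedging" step in this toy version).
        return {p.asset: p.weight / gross for p in proposals}

manager = FundManager([HedgeExpert(a) for a in ("equities", "bonds", "crypto")])
print(manager.conference({"equities": 0.6, "bonds": -0.3, "crypto": 0.1}))
```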

February 17, 2025 · 2 min · Research Team

FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents

ArXiv ID: 2502.07393 · Authors: Unknown

Abstract: This paper presents a novel risk-sensitive trading agent combining reinforcement learning and large language models (LLMs). We extend the Conditional Value-at-Risk Proximal Policy Optimization (CPPO) algorithm by adding risk assessment and trading recommendation signals generated by an LLM from financial news. Our approach is backtested on the Nasdaq-100 index benchmark, using financial news data from the FNSPID dataset and the DeepSeek V3, Qwen 2.5, and Llama 3.3 language models. The code, data, and trading agents are available at: https://github.com/benstaf/FinRL_DeepSeek ...
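One simple way to picture the LLM infusion: scale the RL advantage by news-derived risk and recommendation scores so the policy is discouraged in high-risk states. The exact CPPO formulation in the paper likely differs; the weighting rule below and the [0, 1] / [-1, 1] signal conventions are assumptions for illustration.

```python
import torch

def llm_adjusted_advantage(adv: torch.Tensor, risk: torch.Tensor,
                           rec: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    # Shrink the advantage where the LLM flags high risk, and nudge it toward
    # the LLM's trading recommendation; lam trades off RL signal vs. LLM signal.
    return adv * (1.0 - lam * risk) + lam * rec

adv = torch.tensor([0.8, -0.2, 1.1])   # advantages from the RL critic
risk = torch.tensor([0.1, 0.9, 0.4])   # LLM risk assessment per step, in [0, 1]
rec = torch.tensor([0.5, -0.8, 0.2])   # LLM trading recommendation, in [-1, 1]
print(llm_adjusted_advantage(adv, risk, rec))
```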

February 11, 2025 · 1 min · Research Team

AlphaSharpe: LLM-Driven Discovery of Robust Risk-Adjusted Metrics

ArXiv ID: 2502.00029 · Authors: Unknown

Abstract: Financial metrics like the Sharpe ratio are pivotal in evaluating investment performance by balancing risk and return. However, traditional metrics often struggle with robustness and generalization, particularly in dynamic and volatile market conditions. This paper introduces AlphaSharpe, a novel framework that leverages large language models (LLMs) to iteratively evolve and optimize financial metrics. By employing iterative crossover, mutation, and evaluation, it discovers risk-return metrics that outperform traditional approaches in robustness and in correlation with future performance. Key contributions of this work include: (1) a novel use of LLMs to generate and refine financial metrics with implicit domain-specific knowledge, (2) a scoring mechanism to ensure that evolved metrics generalize effectively to unseen data, and (3) an empirical demonstration of 3x the predictive power for future risk-return outcomes and 2x the portfolio performance. Experimental results on a real-world dataset highlight the superiority of the discovered metrics, making them highly relevant to portfolio managers and financial decision-makers. This framework not only addresses the limitations of existing metrics but also showcases the potential of LLMs in advancing financial analytics, paving the way for informed and robust investment strategies. ...
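The evolutionary loop the abstract describes (score, select, then LLM-driven crossover and mutation) can be sketched generically. Here the LLM call is stubbed, and the formula representation, prompt, and toy scoring function are all illustrative assumptions, not the paper's setup.

```python
import random

def llm_vary(parent_a: str, parent_b: str) -> str:
    # Placeholder for an LLM prompt such as "combine or mutate these two
    # risk-return metric formulas"; a real system would parse the reply.
    return random.choice([parent_a, parent_b]) + " * downside_adjustment"

def evolve(seed_metrics, score_fn, generations=5, population=8):
    pool = list(seed_metrics)
    for _ in range(generations):
        # Select the top half on held-out data (the generalization scoring step).
        survivors = sorted(pool, key=score_fn, reverse=True)[: population // 2]
        # Refill the population with LLM-proposed crossovers/mutations.
        children = [llm_vary(random.choice(survivors), random.choice(survivors))
                    for _ in range(population - len(survivors))]
        pool = survivors + children
    return max(pool, key=score_fn)

# Toy usage with formula length as a stand-in score for out-of-sample quality.
best = evolve(["mean(r)/std(r)"], score_fn=len)
print(best)
```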

January 23, 2025 · 2 min · Research Team

AI-Powered (Finance) Scholarship

SSRN ID: ssrn-5103553 · Authors: Unknown

Abstract: This paper describes a process for automatically generating academic finance papers using large language models (LLMs). It demonstrates the process's efficacy by ...

Keywords: Generative AI, Large Language Models (LLMs), Automated Research, Financial Modeling, NLP, Technology

Complexity vs Empirical Score: Math Complexity 1.0/10 · Empirical Rigor 0.5/10 · Quadrant: Philosophers
Why: The paper focuses on the process of using LLMs to generate academic content, lacking advanced mathematical derivations, while showing minimal evidence of backtesting or implementation-heavy data analysis.

```mermaid
flowchart TD
    A["Research Goal<br>Automate Finance Paper Generation"] --> B["Inputs<br>Financial Data + LLM Prompts"]
    B --> C{"Methodology<br>Multi-Step Chain-of-Thought"}
    C --> D["Computational Process<br>LLM Synthesis & Modeling"]
    D --> E{"Evaluation<br>Human Expert Review"}
    E --> F["Outcomes<br>High-Quality Finance Papers"]
    E --> G["Outcomes<br>Validation of LLM Efficacy"]
    F --> H["Final Result<br>AI-Powered Scholarship Pipeline"]
    G --> H
```

January 22, 2025 · 1 min · Research Team

LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading

ArXiv ID: 2501.09636 · Authors: Unknown

Abstract: Recent advances in deep learning and large language models (LLMs) have facilitated the deployment of the mixture-of-experts (MoE) mechanism in the stock investment domain. While these models have demonstrated promising trading performance, they are often unimodal, neglecting the wealth of information available in other modalities, such as textual data. Moreover, the traditional neural network-based router selection mechanism fails to consider contextual and real-world nuances, resulting in suboptimal expert selection. To address these limitations, we propose LLMoE, a novel framework that employs LLMs as the router within the MoE architecture. Specifically, we replace the conventional neural network-based router with LLMs, leveraging their extensive world knowledge and reasoning capabilities to select experts based on historical price data and stock news. This approach provides a more effective and interpretable selection mechanism. Our experiments on multimodal real-world stock datasets demonstrate that LLMoE outperforms state-of-the-art MoE models and other deep neural network approaches. Additionally, the flexible architecture of LLMoE allows for easy adaptation to various downstream tasks. ...
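A minimal sketch of the LLM-as-router idea: the router reads recent prices and headlines and names which expert should make the call. The prompt, the two market-view experts, and the ask_llm stub are assumptions for illustration, not the LLMoE implementation.

```python
# Each expert is a stand-in for a specialized trading model; the LLM router
# picks among them by name instead of a learned gating network.
EXPERTS = {
    "optimistic": lambda feats: "buy",
    "pessimistic": lambda feats: "sell",
}

def ask_llm(prompt: str) -> str:
    # Stand-in for a chat-completion call; a real router would query an LLM
    # and should validate that the reply names a known expert.
    return "optimistic"

def route_and_trade(prices, headlines):
    prompt = (
        f"Given recent prices {prices} and news {headlines}, answer with "
        f"exactly one word, {' or '.join(EXPERTS)}: which market-view expert "
        "should decide?"
    )
    expert = ask_llm(prompt).strip().lower()
    # Fall back to the cautious expert if the router's answer is malformed.
    return EXPERTS.get(expert, EXPERTS["pessimistic"])({"prices": prices})

print(route_and_trade([101.2, 102.8, 104.1], ["Earnings beat expectations"]))
```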

January 16, 2025 · 2 min · Research Team

AI-Powered (Finance) Scholarship

SSRN ID: ssrn-5060022 · Authors: Unknown

Keywords: Generative AI, Large Language Models (LLMs), Academic Research, Natural Language Processing, Automation, Technology

Complexity vs Empirical Score: Math Complexity 1.0/10 · Empirical Rigor 2.0/10 · Quadrant: Philosophers
Why: The paper focuses on the conceptual process of using LLMs to generate academic papers, rather than presenting complex mathematical models or empirical backtesting results.

```mermaid
flowchart TD
    A["Research Goal<br>Automate Academic Paper Generation"] --> B{"Methodology"}
    B --> C["Data/Input<br>LLM & Financial Datasets"]
    B --> D["Data/Input<br>Research Questions"]
    C --> E["Computational Process<br>LLM Content Generation"]
    D --> E
    E --> F["Key Findings<br>Successful Paper Automation"]
    E --> G["Key Findings<br>Validated Methodology"]
```

January 3, 2025 · 1 min · Research Team

LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management

ArXiv ID: 2501.00826 · Authors: Unknown

Abstract: Cryptocurrency investment is inherently difficult due to its shorter history compared to traditional assets, the need to integrate vast amounts of data from various modalities, and the requirement for complex reasoning. While deep learning approaches have been applied to address these challenges, their black-box nature raises concerns about trust and explainability. Recently, large language models (LLMs) have shown promise in financial applications due to their ability to understand multi-modal data and generate explainable decisions. However, a single LLM faces limitations in complex, comprehensive tasks such as asset investment. These limitations are even more pronounced in cryptocurrency investment, where LLMs have less domain-specific knowledge in their training corpora. To overcome these challenges, we propose an explainable, multi-modal, multi-agent framework for cryptocurrency investment. Our framework uses specialized agents that collaborate within and across teams to handle subtasks such as data analysis, literature integration, and investment decision-making for the top 30 cryptocurrencies by market capitalization. The expert training module fine-tunes agents using multi-modal historical data and professional investment literature, while the multi-agent investment module employs real-time data to make informed cryptocurrency investment decisions. Unique intrateam and interteam collaboration mechanisms enhance prediction accuracy by adjusting final predictions based on confidence levels within agent teams and facilitating information sharing between teams. Empirical evaluation using data from November 2023 to September 2024 demonstrates that our framework outperforms single-agent models and market benchmarks in classification, asset pricing, portfolio, and explainability performance. ...
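The intrateam mechanism, adjusting the final prediction by agents' confidence levels, can be illustrated with a confidence-weighted vote; the specific weighting rule below is an assumption for illustration, not the paper's formula.

```python
def team_prediction(agent_outputs):
    """agent_outputs: list of (direction in {+1, -1}, confidence in [0, 1]).

    Each agent votes a direction weighted by its stated confidence, so a
    highly confident minority can outweigh an unsure majority.
    """
    score = sum(d * c for d, c in agent_outputs)
    total = sum(c for _, c in agent_outputs) or 1.0
    return "buy" if score / total > 0 else "sell"

# Two bullish agents (0.9 and 0.6 confidence) outweigh one bearish agent (0.4).
print(team_prediction([(+1, 0.9), (-1, 0.4), (+1, 0.6)]))  # -> "buy"
```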

January 1, 2025 · 2 min · Research Team