FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data
ArXiv ID: 2502.18471
Authors: Unknown
Abstract
Large language models (LLMs) excel at generating human-like responses but often struggle with interactive tasks that require access to real-time information. This limitation poses challenges in finance, where models must access up-to-date information, such as recent news or price movements, to support decision-making. To address this, we introduce Financial Agent, a knowledge-grounding approach for LLMs to handle financial queries using real-time text and tabular data. Our contributions are threefold: First, we develop a Financial Context Dataset of over 50,000 financial queries paired with the required context. Second, we train FinBloom 7B, a custom 7 billion parameter LLM, on 14 million financial news articles from Reuters and Deutsche Presse-Agentur, alongside 12 million Securities and Exchange Commission (SEC) filings. Third, we fine-tune FinBloom 7B on the Financial Context Dataset so that it can serve as a Financial Agent. This agent generates the relevant financial context for a query, enabling efficient real-time data retrieval to answer it. By reducing latency and eliminating the need for users to manually supply accurate data, our approach significantly enhances the capability of LLMs to handle dynamic financial tasks. Our proposed approach streamlines real-time financial decision-making, algorithmic trading, and related tasks, and is valuable in contexts with high-velocity data flows.
Keywords: Large Language Models, Financial Agents, Real-time Data, SEC Filings, Algorithmic Trading
Complexity vs Empirical Score
- Math Complexity: 2.0/10
- Empirical Rigor: 8.0/10
- Quadrant: Street Traders
- Why: The paper focuses on implementing a practical LLM agent system using large-scale financial datasets and real-time data integration, with minimal advanced mathematical derivations.
```mermaid
flowchart TD
A["Research Goal: Enable LLMs to answer financial queries<br>using real-time data"] --> B{"Methodology"};
B --> C["Develop Financial Context Dataset<br>50k queries"];
B --> D["Train FinBloom 7B<br>on 14M news + 12M SEC filings"];
B --> E["Fine-tune as Financial Agent<br>to generate retrieval contexts"];
C --> F["Computational Process:<br>Agent retrieves real-time text & tabular data"];
D --> F;
E --> F;
F --> G["Key Findings: Efficient real-time data retrieval<br>Reduced latency for financial decision-making<br>Streamlined algorithmic trading workflows"];
```
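The pipeline in the flowchart can be sketched in code. The following is a minimal, illustrative sketch, not the paper's implementation: `generate_context` stands in for the fine-tuned Financial Agent (which in the paper is FinBloom 7B) and here uses trivial keyword rules, and `retrieve` stands in for the real-time data layer and here reads from a hard-coded mock feed. All names (`FinancialContext`, the ticker set, the mock values) are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FinancialContext:
    """Structured description of the data a query needs."""
    tickers: list  # instruments mentioned in the query
    fields: list   # data fields required to answer it


def generate_context(query: str) -> FinancialContext:
    # Stand-in for the Financial Agent: the paper fine-tunes FinBloom 7B
    # to emit the required context; here we use simple keyword matching.
    known_tickers = {"AAPL", "MSFT", "TSLA"}  # hypothetical universe
    words = query.replace("?", " ").replace(",", " ").split()
    tickers = [w.upper() for w in words if w.upper() in known_tickers]
    q = query.lower()
    fields = []
    if "price" in q or "close" in q:
        fields.append("last_price")
    if "news" in q:
        fields.append("recent_news")
    return FinancialContext(tickers=tickers, fields=fields)


def retrieve(ctx: FinancialContext) -> dict:
    # Stand-in for real-time retrieval of text and tabular data;
    # a real system would query market-data and news feeds here.
    mock_feed = {
        ("AAPL", "last_price"): 187.3,
        ("AAPL", "recent_news"): ["Hypothetical headline about AAPL"],
    }
    return {t: {f: mock_feed.get((t, f)) for f in ctx.fields}
            for t in ctx.tickers}


if __name__ == "__main__":
    ctx = generate_context("What is the latest price of AAPL?")
    print(ctx)
    print(retrieve(ctx))
```

The key design point mirrored here is the separation of concerns: the LLM never fabricates numbers; it only emits a retrieval specification, and the data layer fills it from live sources, which is what reduces latency and removes the need for users to paste in data manually.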