Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance
ArXiv ID: 2503.05185
Authors: Unknown
Abstract
Financial decision-making often relies on in-depth data analysis across various data sources, including financial tables, news articles, stock prices, etc. In this work, we introduce FinTMMBench, the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation (RAG) systems in finance. Built from heterogeneous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages. 1) Multi-modal Corpus: It encompasses a hybrid of financial tables, news articles, daily stock prices, and visual technical charts as the corpus. 2) Temporal-aware Questions: Each question requires the retrieval and interpretation of its relevant data over a specific time period, including daily, weekly, monthly, quarterly, and annual periods. 3) Diverse Financial Analysis Tasks: The questions involve 10 different financial analysis tasks designed by domain experts, including information extraction, trend analysis, sentiment analysis and event detection, etc. We further propose a novel TMMHybridRAG method, which first leverages LLMs to convert data from other modalities (e.g., tabular, visual and time-series data) into textual format and then incorporates temporal information in each node when constructing graphs and dense indexes. Its effectiveness has been validated in extensive experiments, but notable gaps remain, highlighting the challenges presented by our FinTMMBench.
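A minimal sketch of two ideas from the abstract: labeling a date with the five period granularities (daily through annual) used by the temporal-aware questions, and converting a daily stock-price row into text that carries those labels. The abstract states that TMMHybridRAG uses an LLM for the conversion; the fixed template below is a hypothetical stand-in, and `PriceRecord`, `period_keys`, `textualize`, and `to_node` are illustrative names, not the authors' code. ISO 8601 week numbering is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PriceRecord:
    """One daily stock-price observation (hypothetical schema)."""
    ticker: str
    day: date
    open_price: float
    close_price: float
    volume: int


def period_keys(day: date) -> dict[str, str]:
    """Label a date with daily, weekly, monthly, quarterly, and annual period keys."""
    iso_year, iso_week, _ = day.isocalendar()  # assumption: ISO 8601 weeks
    quarter = (day.month - 1) // 3 + 1
    return {
        "daily": day.isoformat(),
        "weekly": f"{iso_year}-W{iso_week:02d}",
        "monthly": f"{day.year}-{day.month:02d}",
        "quarterly": f"{day.year}-Q{quarter}",
        "annual": str(day.year),
    }


def textualize(rec: PriceRecord) -> str:
    """Convert a time-series row to text.

    Hypothetical fixed template; the paper uses an LLM for this step.
    """
    change = (rec.close_price - rec.open_price) / rec.open_price * 100
    direction = "rose" if change >= 0 else "fell"
    return (
        f"On {rec.day.isoformat()}, {rec.ticker} opened at {rec.open_price:.2f} "
        f"and closed at {rec.close_price:.2f}; the price {direction} "
        f"{abs(change):.2f}% from open to close on volume {rec.volume:,}."
    )


def to_node(rec: PriceRecord) -> dict[str, str]:
    """Textual node plus period labels, so each node carries its own temporal metadata."""
    return {"text": textualize(rec), **period_keys(rec.day)}


if __name__ == "__main__":
    # Illustrative values only, not data from FinTMMBench.
    rec = PriceRecord("XYZ", date(2024, 3, 15), 100.0, 102.5, 1_234_567)
    print(to_node(rec))
```

Using ISO weeks means a date such as 2024-12-30 falls in week 2025-W01 while its annual label stays 2024, an edge case any real implementation would need to settle deliberately.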
Keywords: multi-modal RAG, financial sentiment analysis, temporal-aware retrieval, large language models (LLMs), financial benchmarking
Complexity vs Empirical Score
- Math Complexity: 2.5/10
- Empirical Rigor: 8.5/10
- Quadrant: Street Traders
- Why: The paper introduces a new benchmark and a novel RAG method, but its methodology relies on LLM-based data conversion and retrieval techniques rather than on advanced mathematical modeling. It demonstrates high empirical rigor by constructing a comprehensive multi-modal dataset (FinTMMBench) with specified data sources and evaluation metrics, and by proposing a method with implementation details and reported results.
```mermaid
flowchart TD
A["Research Goal<br>Develop Temporal-Aware<br>Multi-Modal RAG for Finance"] --> B["Construct FinTMMBench Benchmark"]
B --> C{"Multi-modal Corpus<br>Temporal-aware Questions<br>Diverse Analysis Tasks"}
C --> D["TMMHybridRAG Methodology"]
D --> E["Modality Conversion<br>LLMs convert tables/charts<br>to text"]
E --> F["Temporal Graph Construction<br>Nodes encode time segments"]
E --> G["Dense Indexing<br>Embed temporal info"]
F --> H["Retrieval & Generation<br>Query relevant data<br>across time periods"]
G --> H
H --> I["Experimental Results"]
I --> J{"Key Findings/Outcomes"}
J --> K["Validated TMMHybridRAG Effectiveness"]
J --> L["Identified Temporal Challenges<br>in Multi-modal Finance"]
```
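The Dense Indexing and Retrieval steps of the flowchart can be sketched as follows, assuming each textualized node stores its date as metadata and that retrieval first restricts candidates to the question's time window before ranking them by similarity. The `embed` function is a toy, hypothetical stand-in (L2-normalized term counts) for a real dense encoder, and `Node`, `TemporalIndex`, and `search` are illustrative names rather than the paper's implementation. The graph branch is not shown.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Node:
    """A textualized corpus item that carries its own timestamp (hypothetical schema)."""
    text: str
    day: date
    vector: dict[str, float] = field(default_factory=dict)


def embed(text: str) -> dict[str, float]:
    """Toy sparse embedding: L2-normalized term counts (stand-in for a dense encoder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-normalized, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class TemporalIndex:
    def __init__(self) -> None:
        self.nodes: list[Node] = []

    def add(self, text: str, day: date) -> None:
        # The timestamp is stored on the node itself, so retrieval can filter on it.
        self.nodes.append(Node(text, day, embed(text)))

    def search(self, query: str, start: date, end: date, k: int = 3) -> list[Node]:
        """Rank only nodes dated within [start, end] by similarity to the query."""
        q = embed(query)
        in_window = [n for n in self.nodes if start <= n.day <= end]
        in_window.sort(key=lambda n: cosine(q, n.vector), reverse=True)
        return in_window[:k]


if __name__ == "__main__":
    # Illustrative sentences only, not FinTMMBench data.
    index = TemporalIndex()
    index.add("XYZ closed higher after strong quarterly earnings", date(2024, 4, 26))
    index.add("XYZ closed lower on weak guidance", date(2024, 1, 15))
    index.add("XYZ announced a new product line", date(2024, 4, 10))
    for hit in index.search("XYZ earnings", date(2024, 4, 1), date(2024, 6, 30)):
        print(hit.day, hit.text)
```

Filtering before ranking keeps a semantically similar but out-of-period node (the January item above) from crowding out in-period evidence; the trade-off is that the question's time window must be parsed correctly up front.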