KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
arXiv ID: 2409.13749
Authors: Unknown
Abstract
Although powerful, current cutting-edge LLMs may not fulfil the needs of highly specialised sectors. We introduce KodeXv0.1, a family of large language models that outclass GPT-4 in financial question answering. We use the base variants of Llama 3.1 8B and 70B and adapt them to the financial domain through a custom training regime. To this end, we collect and process a large number of publicly available financial documents, such as earnings calls and business reports. These are used to generate a high-quality synthetic dataset of Context-Question-Answer triplets that closely mirror real-world financial tasks. Using the train split of this dataset, we perform RAG-aware 4-bit LoRA instruction tuning runs on the Llama 3.1 base variants to produce KodeX-8Bv0.1 and KodeX-70Bv0.1. We then conduct extensive model evaluations using FinanceBench, FinQABench and the withheld test split of our dataset. Our results show that KodeX-8Bv0.1 is more reliable in financial contexts than cutting-edge instruct models in the same parameter regime, surpassing them by up to 9.24%. It even outperforms state-of-the-art proprietary models such as GPT-4, by up to 7.07%. KodeX-70Bv0.1 improves on this further, exceeding GPT-4’s performance on every tested benchmark.
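The abstract's core recipe is RAG-aware 4-bit LoRA instruction tuning of the Llama 3.1 base variants. Below is a minimal sketch of what such a setup could look like, assuming Hugging Face transformers, peft, and bitsandbytes; the paper does not specify its tooling, hyperparameters, or prompt template, so the adapter configuration and formatting function here are illustrative assumptions rather than the authors' actual training code.

```python
# Hedged sketch: RAG-aware 4-bit LoRA instruction tuning of a Llama 3.1 base model.
# Assumes transformers + peft + bitsandbytes; LoRA ranks, target modules, and the
# prompt template are assumptions, not details taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-3.1-8B"  # base (not instruct) variant, per the abstract

# Load the base model with 4-bit NF4 quantization to keep memory usage low.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections; only these small matrices train.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# "RAG-aware" is read here as: every training example already contains retrieved
# context, so the model learns to answer strictly from the supplied passage.
def format_example(context: str, question: str, answer: str) -> str:
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer: "
    )
    return prompt + answer
```

The formatted strings would then be tokenized and passed to a standard causal-LM trainer; only the LoRA adapter weights are updated while the quantized base model stays frozen.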
Keywords: large language models, financial question answering, LoRA instruction tuning, RAG, synthetic data generation, Multi-Asset
Complexity vs Empirical Score
- Math Complexity: 2.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Street Traders
- Why: The paper focuses primarily on data engineering and training infrastructure for LLMs, with little advanced mathematical derivation. It shows high empirical rigor through a multi-stage synthetic dataset pipeline, extensive evaluation on established financial QA benchmarks, and clear comparative performance metrics.
```mermaid
flowchart TD
    A["Research Goal: Adapt LLMs for Specialized Financial QA"] --> B["Data Collection & Processing"]
    B --> C["Synthetic Dataset Generation<br/>(Context-Q-A Triplets)"]
    B --> D["Base Model Selection<br/>Llama 3.1 8B & 70B"]
    C --> E["Fine-Tuning & Training<br/>RAG-aware 4-bit LoRA"]
    D --> E
    E --> F["Model Output: KodeX-8Bv0.1 & KodeX-70Bv0.1"]
    F --> G["Key Findings"]
    G --> G1["KodeX-8Bv0.1: Outperforms GPT-4 by up to 7.07%"]
    G --> G2["KodeX-70Bv0.1: Exceeds GPT-4 on every benchmark"]
    G --> G3["Domain adaptation yields SOTA results for specialized tasks"]
```
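For concreteness, the sketch below shows one plausible shape for the Context-Question-Answer triplets produced by the synthetic-data stage and a toy scoring loop over a held-out test split. Field names, the prompt template (mirroring the one above), and the exact-match scoring rule are assumptions; the paper's actual dataset schema and the grading used by FinanceBench and FinQABench are not reproduced here.

```python
# Hedged sketch: a Context-Question-Answer triplet and a naive evaluation loop.
# The dataclass fields and exact-match metric are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CQATriplet:
    context: str   # excerpt from an earnings call or business report
    question: str  # financial question grounded in the context
    answer: str    # reference answer used for training or evaluation

def build_prompt(example: CQATriplet) -> str:
    """Format a triplet into a RAG-style prompt (illustrative template)."""
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{example.context}\n\n"
        f"Question: {example.question}\nAnswer:"
    )

def exact_match_accuracy(test_split, generate_answer) -> float:
    """Score a model against a held-out test split.

    `generate_answer` is any prompt -> string callable, e.g. a wrapper around a
    KodeX or GPT-4 completion call; exact match stands in for whatever grading
    the benchmarks actually apply.
    """
    correct = 0
    for example in test_split:
        prediction = generate_answer(build_prompt(example))
        if prediction.strip().lower() == example.answer.strip().lower():
            correct += 1
    return correct / max(len(test_split), 1)
```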