The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging
arXiv ID: 2409.19854
Authors: Unknown
Abstract
This paper proposes a novel method for constructing instruction-tuned large language models (LLMs) for finance without instruction data. Traditionally, developing such domain-specific LLMs has been resource-intensive, requiring a large dataset and significant computational power for continual pretraining and instruction tuning. Our study proposes a simpler approach that combines domain-specific continual pretraining with model merging. Given that general-purpose pretrained LLMs and their instruction-tuned LLMs are often publicly available, they can be leveraged to obtain the necessary instruction task vector. By merging this with a domain-specific pretrained vector, we can effectively create instruction-tuned LLMs for finance without additional instruction data. Our process involves two steps: first, we perform continual pretraining on financial data; second, we merge the instruction-tuned vector with the domain-specific pretrained vector. Our experiments demonstrate the successful construction of instruction-tuned LLMs for finance. One major advantage of our method is that the instruction-tuned and domain-specific pretrained vectors are nearly independent. This independence makes our approach highly effective. The Japanese financial instruction-tuned LLMs we developed in this study are available at https://huggingface.co/pfnet/nekomata-14b-pfn-qfin-inst-merge.
Keywords: large language models, instruction tuning, model merging, continual pretraining, financial NLP, financial technology
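The two-step recipe described in the abstract amounts to simple task-vector arithmetic over model weights: take the difference between an instruction-tuned checkpoint and its base model, then add that difference to the domain-pretrained checkpoint. The sketch below illustrates this with Hugging Face transformers; it is a minimal sketch, not the authors' merging code, and the base and instruction checkpoint identifiers are assumptions (only the released merged model is named in the abstract).
```python
# Minimal sketch of instruction-vector merging, assuming all three checkpoints
# share the same architecture and parameter names. The base and instruction
# checkpoint names below are assumptions; only the merged model release is
# confirmed by the abstract.
import torch
from transformers import AutoModelForCausalLM

BASE = "rinna/nekomata-14b"                  # general-purpose pretrained LLM (assumed)
INSTRUCT = "rinna/nekomata-14b-instruction"  # its instruction-tuned variant (assumed)
DOMAIN = "pfnet/nekomata-14b-pfn-qfin"       # continually pretrained on financial text (assumed)

def load(name):
    # Load on CPU in bfloat16; merging 14B-parameter models needs substantial RAM.
    return AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, trust_remote_code=True
    )

base_sd = load(BASE).state_dict()
inst_sd = load(INSTRUCT).state_dict()
domain_model = load(DOMAIN)
domain_sd = domain_model.state_dict()

# merged = domain-pretrained weights + (instruction-tuned - base) task vector
merged_sd = {
    name: domain_sd[name] + (inst_sd[name] - base_sd[name])
    for name in base_sd
}

domain_model.load_state_dict(merged_sd)
domain_model.save_pretrained("nekomata-14b-qfin-inst-merge-sketch")
```
Because the instruction vector and the domain-pretrained weights are obtained independently, no financial instruction data is needed at any point, which is the paper's central claim.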
Complexity vs Empirical Score
- Math Complexity: 2.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Street Traders
- Why: The math complexity is low, as the paper focuses on a high-level methodology (model merging) without heavy mathematical derivations. The empirical rigor is high, evidenced by the construction of a specific Japanese financial corpus, execution of continual pretraining and model merging experiments, and the release of the resulting models on Hugging Face for reproducibility.
Method Overview
```mermaid
flowchart TD
    A["Research Goal<br>Instruction-tuned LLM for Finance<br>without Instruction Data"] --> B["Step 1: Continual Pretraining<br>Pretrain Base LLM on Financial Text"]
    B --> C["Step 2: Model Merging<br>Merge Domain LLM + Instruction Vector"]
    C --> D["Key Finding 1<br>Independence of Vectors<br>Effective without Joint Tuning"]
    C --> E["Key Finding 2<br>Successful Financial LLM<br>Released publicly on Hugging Face"]
```
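The merged Japanese financial model released by the authors can be loaded like any other causal LM from Hugging Face. The snippet below is a hedged usage sketch: the prompt wording, generation settings, and the use of trust_remote_code are assumptions, not taken from the paper or its model card.
```python
# Hedged usage sketch for the released merged model. Prompt wording and
# generation parameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pfnet/nekomata-14b-pfn-qfin-inst-merge"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example Japanese financial instruction ("What is the Nikkei 225?").
prompt = "質問: 日経平均株価とは何ですか?\n回答:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```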