FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs
ArXiv ID: 2508.17906
Authors: Abhinav Arun, Fabrizio Dimino, Tejas Prakash Agarwal, Bhaskarjit Sarmah, Stefano Pasquali
Abstract
The financial domain poses unique challenges for knowledge graph (KG) construction at scale due to the complexity and regulatory nature of financial documents. Despite the critical importance of structured financial knowledge, the field lacks large-scale, open-source datasets capturing rich semantic relationships from corporate disclosures. We introduce an open-source, large-scale financial knowledge graph dataset built from the latest annual SEC 10-K filings of all S&P 100 companies, a comprehensive resource designed to catalyze research in financial AI. We propose a robust and generalizable KG construction framework that integrates intelligent document parsing, table-aware chunking, and schema-guided iterative extraction with a reflection-driven feedback loop. Our system incorporates a comprehensive evaluation pipeline, combining rule-based checks, statistical validation, and LLM-as-a-Judge assessments to holistically measure extraction quality. We support three extraction modes (single-pass, multi-pass, and reflection-agent-based), allowing flexible trade-offs between efficiency, accuracy, and reliability based on user requirements. Empirical evaluations demonstrate that the reflection-agent-based mode consistently achieves the best balance, attaining a 64.8% compliance score against all rule-based policies (CheckRules) and outperforming baseline methods (single-pass and multi-pass) across key metrics such as precision, comprehensiveness, and relevance in LLM-guided evaluations.
Keywords: Financial knowledge graph, SEC 10-K filings, Schema-guided extraction, Reflection-agent, LLM-as-a-Judge, General Financial Data Infrastructure
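To make the "reflection-driven feedback loop" in the abstract more concrete, here is a minimal Python sketch of the general idea: extract triples, critique them against a schema, and feed the critique back into the next extraction round. This is an illustration under stated assumptions, not the paper's implementation. The relation names in `ALLOWED_RELATIONS`, the `critique` checks, the `toy_extract` stand-in for an LLM call, and the company "Acme Corp" are all hypothetical.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical schema: relation types the extractor is allowed to emit.
ALLOWED_RELATIONS = {"DISCLOSES", "OPERATES_IN", "HAS_RISK"}


def critique(triples: List[Triple]) -> List[str]:
    """Return one feedback message per schema violation (empty list if clean)."""
    feedback = []
    for head, relation, tail in triples:
        if relation not in ALLOWED_RELATIONS:
            feedback.append(f"relation '{relation}' is not in the schema")
        if not head.strip() or not tail.strip():
            feedback.append(f"empty entity in ({head!r}, {relation!r}, {tail!r})")
    return feedback


def reflective_extract(
    chunk: str,
    extract: Callable[[str, List[str]], List[Triple]],
    max_rounds: int = 3,
) -> List[Triple]:
    """Re-run extraction with critic feedback until it is clean or rounds run out."""
    feedback: List[str] = []
    triples: List[Triple] = []
    for _ in range(max_rounds):
        triples = extract(chunk, feedback)
        feedback = critique(triples)
        if not feedback:
            break
    return triples


def toy_extract(chunk: str, feedback: List[str]) -> List[Triple]:
    """Stand-in for an LLM call: corrects the relation name once it gets feedback."""
    relation = "HAS_RISK" if feedback else "risk_of"
    return [("Acme Corp", relation, "supply chain disruption")]


if __name__ == "__main__":
    print(reflective_extract("example 10-K text chunk", toy_extract))
```

In this sketch, setting `max_rounds=1` returns the first extraction unrevised, which is roughly what a single-pass mode would do. The trade-off the abstract describes shows up directly: each extra round costs another extraction call in exchange for a chance to repair violations.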
Complexity vs Empirical Score
- Math Complexity: 1.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Street Traders
- Why: The paper is highly empirical, presenting a specific framework for building and evaluating a financial knowledge graph dataset with multiple metrics (precision, compliance scores) and real-world data (SEC 10-K filings). The mathematics is primarily algorithmic and structural, with no dense theoretical derivations.
flowchart TD
A["Research Goal<br>Create Financial KG & Framework"] --> B["Data Input<br>SEC 10-K Filings"]
B --> C["Process<br>Schema-Guided Extraction"]
C --> D["Output<br>Agentic KG Construction"]
D --> E["Evaluation<br>Multi-Modal Validation"]
E --> F["Outcome<br>FinReflectKG Dataset"]
F --> G["Key Finding<br>Reflection-Agent achieves 64.8% compliance"]
G --> H["Impact<br>Open-Source Financial Data Infrastructure"]
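The compliance figure cited above is a score against rule-based policies (CheckRules). One plausible way to compute such a score is the percentage of extracted triples that pass every rule. The sketch below assumes that definition; both the definition and the three rules in `RULES` are hypothetical placeholders, since the actual CheckRules policies are not listed in this summary.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical rules; each returns True when the triple passes.
RULES: Dict[str, Callable[[Triple], bool]] = {
    "non_empty_entities": lambda t: bool(t[0].strip()) and bool(t[2].strip()),
    "relation_is_upper_case": lambda t: t[1].isupper() and " " not in t[1],
    "no_self_loop": lambda t: t[0] != t[2],
}


def compliance_score(triples: List[Triple]) -> float:
    """Percentage of triples that satisfy every rule in RULES."""
    if not triples:
        return 0.0
    passed = sum(all(rule(t) for rule in RULES.values()) for t in triples)
    return 100.0 * passed / len(triples)


if __name__ == "__main__":
    sample = [
        ("Acme Corp", "HAS_RISK", "supply chain disruption"),
        ("Acme Corp", "OPERATES_IN", "Europe"),
        ("Acme Corp", "risk_of", "inflation"),   # fails the upper-case rule
        ("Acme Corp", "DISCLOSES", "net revenue"),
    ]
    print(f"{compliance_score(sample):.1f}% compliant")
```

Requiring a triple to pass all rules is the strictest reading of "compliance against all rule-based policies". A per-rule pass rate would give a finer breakdown of which rules fail most often.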