Climate AI for Corporate Decarbonization Metrics Extraction
ArXiv ID: 2411.03402
Authors: Unknown
Abstract
Corporate Greenhouse Gas (GHG) emission targets are important metrics in sustainable investing [12, 16]. To provide a comprehensive view of company emission objectives, we propose an approach to source these metrics from company public disclosures. Without automation, curating these metrics manually is a labor-intensive process that requires combing through lengthy corporate sustainability disclosures that often do not follow a standard format. Furthermore, the resulting dataset needs to be validated thoroughly by Subject Matter Experts (SMEs), further lengthening the time-to-market. We introduce the Climate Artificial Intelligence for Corporate Decarbonization Metrics Extraction (CAI) model and pipeline, a novel approach utilizing Large Language Models (LLMs) to extract and validate linked metrics from corporate disclosures. We demonstrate that the process improves data collection efficiency and accuracy by automating data curation, validation, and metric scoring from public corporate disclosures. We further show that our results are agnostic to the choice of LLMs. This framework can be applied broadly to information extraction from textual data.
Keywords: Large Language Models (LLMs), Natural Language Processing (NLP), Greenhouse Gas Emissions, Sustainable Investing, Text Extraction, Equity (Sustainable/ESG)
Complexity vs Empirical Score
- Math Complexity: 3.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Street Traders
- Why: The paper focuses on practical NLP pipeline implementation and business process automation rather than theoretical finance, with some reported accuracy metrics and model comparison, placing it in the Street Traders quadrant.
flowchart TD
A["<b>Research Goal</b><br/>Automate extraction of corporate<br/>decarbonization metrics from disclosures"] --> B["<b>Data Input</b><br/>Corporate Sustainability<br/>Disclosures (Text)"]
B --> C["<b>LLM Pipeline</b><br/>Climate AI CAI Model"]
C --> D["<b>Extraction Process</b><br/>1. Locate emission targets<br/>2. Extract linked metrics<br/>3. Validate data"]
D --> E["<b>Validation</b><br/>Subject Matter Expert<br/>SME Review & Comparison"]
E --> F{"<b>Key Findings</b>"}
F --> G["<b>Outcome 1</b><br/>Improved data collection<br/>efficiency & accuracy"]
F --> H["<b>Outcome 2</b><br/>Model agnostic to<br/>choice of LLM"]
F --> I["<b>Outcome 3</b><br/>Scalable framework for<br/>textual information extraction"]
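
The three extraction stages in the flowchart (locate emission targets, extract linked metrics, validate data) can be illustrated with a minimal Python sketch. This is not the CAI pipeline from the paper: the paper uses LLMs, while this sketch substitutes a simple regular expression for the LLM call so it runs without any external service. All names (`EmissionTarget`, `locate_candidates`, `regex_extractor`, `validate`, `run_pipeline`) and the validation rules are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class EmissionTarget:
    reduction_pct: float   # e.g. 50.0 for "50%"
    base_year: int
    target_year: int
    source_sentence: str   # sentence the linked metrics were taken from


# Hypothetical stand-in for an LLM call: a regex that links a percentage,
# a target year, and a base year appearing in the same sentence.
_TARGET_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)\s*%.*?\bby\s+(?P<target>(?:19|20)\d{2})"
    r".*?\b(?:from|against|versus)\s+(?:a\s+)?(?P<base>(?:19|20)\d{2})",
    re.IGNORECASE,
)

Extractor = Callable[[str], Optional[EmissionTarget]]


def locate_candidates(text: str) -> List[str]:
    """Stage 1: keep sentences that mention emissions and a percentage."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if "emission" in s.lower() and "%" in s]


def regex_extractor(sentence: str) -> Optional[EmissionTarget]:
    """Stage 2: extract the linked metrics (assumed rule-based stand-in)."""
    m = _TARGET_RE.search(sentence)
    if m is None:
        return None
    return EmissionTarget(float(m["pct"]), int(m["base"]), int(m["target"]), sentence)


def validate(target: EmissionTarget, source_text: str) -> List[str]:
    """Stage 3: simple consistency checks; an empty list means no issues."""
    issues = []
    if not 0 < target.reduction_pct <= 100:
        issues.append("reduction percentage outside (0, 100]")
    if target.base_year >= target.target_year:
        issues.append("base year does not precede target year")
    if target.source_sentence not in source_text:
        issues.append("source sentence not found in disclosure")
    return issues


def run_pipeline(
    text: str, extractor: Extractor = regex_extractor
) -> List[Tuple[EmissionTarget, List[str]]]:
    """Run locate -> extract -> validate and return targets with their issues."""
    results = []
    for sentence in locate_candidates(text):
        target = extractor(sentence)
        if target is not None:
            results.append((target, validate(target, text)))
    return results
```

Because `run_pipeline` accepts any callable with the `Extractor` signature, an LLM-backed extractor could be swapped in for `regex_extractor` without touching the other stages, which is one plausible way to compare models on the same disclosures. Records that come back with a non-empty issue list would be natural candidates for SME review.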