BASIR: Budget-Assisted Sectoral Impact Ranking – A Dataset for Sector Identification and Performance Prediction Using Language Models

ArXiv ID: 2504.13189

Authors: Unknown

Abstract

Government fiscal policies, particularly annual union budgets, exert significant influence on financial markets. However, real-time analysis of budgetary impacts on sector-specific equity performance remains methodologically challenging and largely unexplored. This study proposes a framework to systematically identify and rank sectors poised to benefit from India’s Union Budget announcements. The framework addresses two core tasks: (1) multi-label classification of excerpts from budget transcripts into 81 predefined economic sectors, and (2) performance ranking of these sectors. Leveraging a comprehensive corpus of Indian Union Budget transcripts from 1947 to 2025, we introduce BASIR (Budget-Assisted Sectoral Impact Ranking), an annotated dataset mapping excerpts from budgetary transcripts to sectoral impacts. Our architecture combines fine-tuned embeddings for sector identification with language models that rank sectors by their predicted performance. Our results show an F1-score of 0.605 in sector classification and an NDCG score of 0.997 in ranking sectors by post-budget performance. The methodology enables investors and policymakers to quantify fiscal policy impacts through structured, data-driven insights, addressing critical gaps in manual analysis. The annotated dataset has been released under the CC-BY-NC-SA-4.0 license to advance computational economics research.
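
To make the classification metric concrete, the sketch below computes a micro-averaged F1-score for multi-label predictions, where each budget excerpt may map to several sectors. It is only an illustration: the abstract does not state which F1 averaging the paper uses, and the sector names and predictions here are hypothetical, not drawn from BASIR.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over a list of per-excerpt label sets.

    Assumption: micro averaging (pool all label decisions, then compute F1).
    The paper may use a different averaging scheme.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # sectors correctly predicted
        fp += len(pred - truth)   # sectors predicted but not annotated
        fn += len(truth - pred)   # annotated sectors that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical annotations for three budget excerpts (made-up sector names).
true_labels = [
    {"Railways", "Steel"},
    {"Banking"},
    {"Agriculture", "Fertilizers"},
]
predicted_labels = [
    {"Railways"},                    # misses "Steel"
    {"Banking", "Insurance"},        # one spurious sector
    {"Agriculture", "Fertilizers"},  # exact match
]

score = micro_f1(true_labels, predicted_labels)
print(f"micro-F1 = {score:.3f}")
```

In this toy case there are 4 true positives, 1 false positive, and 1 false negative, so the score is 2·4 / (2·4 + 1 + 1) = 0.8.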

Keywords: multi-label classification, sectoral impact ranking, fine-tuned embeddings, language models, fiscal policy analysis, equities

Complexity vs Empirical Score

  • Math Complexity: 4.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper applies advanced NLP techniques and standard evaluation metrics (F1-score, NDCG), but the math is applied ML rather than dense theoretical derivation. Empirical rigor is high: it reports a specific dataset size (1,671 entries), concrete metrics (F1 = 0.605, NDCG = 0.997), and clear data sources, which makes it practical to implement.
  flowchart TD
    A["Research Goal: Systematic<br>Fiscal Policy Impact Analysis"] --> B["Input: BASIR Dataset<br>Budget Transcripts 1947-2025"]
    B --> C["Core Methodology"]
    subgraph C ["Two-Tier Pipeline"]
        C1["Fine-tuned Embeddings<br>Multi-label Sector Classification"] --> C2["Language Models<br>Performance Ranking NDCG"]
    end
    C --> D["Key Outcomes"]
    subgraph D ["Quantified Results"]
        D1["Class. F1-Score: 0.605"]
        D2["Rank. NDCG: 0.997"]
        D3["Dataset Released<br>CC-BY-NC-SA-4.0"]
    end
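
The ranking tier in the flowchart above is scored with NDCG. As a rough sketch of how such a score can be computed, the code below uses the common formulation with a log2 position discount and linear gains. This is an assumption: the summary does not specify the NDCG variant or how post-budget performance is turned into relevance grades. The sector names, grades, and model scores are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: position i (0-based) is discounted by log2(i + 2)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(true_relevance, predicted_scores):
    """NDCG of the ordering induced by predicted_scores, judged against true_relevance."""
    # Rank items from highest to lowest predicted score.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    # The ideal ordering sorts items by their true relevance.
    ideal_dcg = dcg(sorted(true_relevance, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: four sectors with made-up relevance grades
# (higher grade = stronger post-budget performance) and made-up model scores.
sectors = ["Railways", "Defence", "Banking", "Textiles"]
true_relevance = [3, 2, 1, 0]
predicted_scores = [0.9, 0.4, 0.7, 0.1]  # swaps Defence and Banking

ndcg_score = ndcg(true_relevance, predicted_scores)
print(f"NDCG = {ndcg_score:.4f}")
```

A perfect ordering yields exactly 1.0; the single swap in this example lowers the score only slightly, because the top-ranked sector is still correct and lower positions are discounted.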