Quantifying A Firm’s AI Engagement: Constructing Objective, Data-Driven, AI Stock Indices Using 10-K Filings

ArXiv ID: 2501.01763

Authors: Unknown

Abstract

An analysis of existing AI-related exchange-traded funds (ETFs) shows that the selection criteria for determining which stocks qualify as AI-related are often opaque and rely on vague phrases and subjective judgments. This paper proposes a new, objective, data-driven approach that uses natural language processing (NLP) techniques to classify AI stocks by analyzing annual 10-K filings from 3,395 NASDAQ-listed firms between 2011 and 2023. The analysis quantifies each company’s engagement with AI through binary indicators and weighted AI scores based on the frequency and context of AI-related terms. Using these metrics, we construct four AI stock indices, each offering a different perspective on AI investment: the Equally Weighted AI Index (AII), the Size-Weighted AI Index (SAII), and two Time-Discounted AI Indices (TAII05 and TAII5X). We validate our methodology through an event study on the launch of OpenAI’s ChatGPT, which shows that companies with higher AI engagement saw significantly greater positive abnormal returns, with further analyses supporting the predictive power of our AI measures. Our indices perform on par with or surpass 14 existing AI-themed ETFs and the Nasdaq Composite Index in risk-return profiles, market responsiveness, and overall performance, achieving higher average daily returns and risk-adjusted metrics without increased volatility. These results suggest that our NLP-based approach offers a reliable, market-responsive, and cost-effective alternative to existing AI-related ETF products. The methodology can also guide investors, asset managers, and policymakers in using corporate data to construct other thematic portfolios, contributing to a more transparent, data-driven, and competitive approach.
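To make the scoring and weighting ideas in the abstract concrete, here is a minimal sketch, not the paper's implementation. The term list `AI_TERMS`, the per-1,000-words normalization, and the helper names `ai_measures` and `index_return` are all assumptions made for illustration; the abstract does not give the actual dictionary or scoring formula. The time-discounted indices are left out because their discounting scheme is not described here.

```python
import re

# Hypothetical term list; the paper's actual dictionary is not given in this summary.
AI_TERMS = ("artificial intelligence", "machine learning",
            "deep learning", "neural network")

def ai_measures(text):
    """Return a binary AI indicator and a frequency-based score for one filing."""
    lowered = text.lower()
    # Count whole-phrase matches, allowing a trailing plural "s".
    mentions = sum(
        len(re.findall(r"\b" + re.escape(term) + r"s?\b", lowered))
        for term in AI_TERMS
    )
    words = len(re.findall(r"\w+", text))
    # Assumed normalization: AI mentions per 1,000 words of filing text.
    score = 1000 * mentions / words if words else 0.0
    return {"is_ai": mentions > 0, "mentions": mentions, "score": score}

def index_return(returns, weights):
    """Weighted average of constituent returns for one period."""
    total = sum(weights.values())
    return sum(returns[firm] * w for firm, w in weights.items()) / total

filing = ("We use machine learning and neural networks. "
          "Machine learning drives our products.")
measures = ai_measures(filing)

daily_returns = {"A": 0.02, "B": -0.01}
equal_weighted = index_return(daily_returns, {"A": 1, "B": 1})  # AII-style weights
size_weighted = index_return(daily_returns, {"A": 3, "B": 1})   # SAII-style, weights = market cap
```

An equally weighted index treats every flagged firm the same, while a size-weighted index lets larger firms dominate the return; the same `index_return` helper covers both by changing only the weights.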

Keywords: natural language processing (NLP), index construction, thematic investing, 10-K filings, asset pricing, equities

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Street Traders
  • Why: The paper relies on established NLP and statistical methods like TF-IDF without introducing novel mathematics, but it is highly data-driven, analyzing thousands of filings, constructing indices, and validating results through event studies and risk-adjusted performance metrics.
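
Since TF-IDF is named above as a representative method, the following is a small sketch of one common textbook variant (term frequency times the log of inverse document frequency, with no smoothing). The function name `tf_idf` and the toy documents are hypothetical, and the paper may use a different weighting or smoothing scheme.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = share of the document's tokens; idf = log(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy "filings", already tokenized (illustrative data only).
docs = [
    ["ai", "revenue", "ai"],
    ["revenue", "retail"],
    ["revenue", "stores"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as "revenue" here, gets a weight of zero, while a term concentrated in one document is weighted up. That is why this kind of weighting can separate distinctive AI vocabulary from generic filing language.
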
  flowchart TD
    A["<b>Research Goal</b><br/>Quantify firm AI engagement using<br/>objective, data-driven methods"] --> B["<b>Methodology & Data</b><br/>NLP analysis of 10-K filings<br/>2011-2023, 3,395 NASDAQ firms"]
    B --> C["<b>AI Classification</b><br/>Binary indicators & weighted<br/>AI scores based on term frequency/context"]
    C --> D["<b>Index Construction</b><br/>AII, SAII, TAII05, TAII5X<br/>4 thematic indices created"]
    D --> E["<b>Validation & Testing</b><br/>Event study on ChatGPT launch<br/>Comparison vs. 14 AI ETFs & Nasdaq"]
    E --> F["<b>Key Findings</b><br/>Higher AI engagement = greater abnormal returns<br/>around the ChatGPT launch<br/>NLP indices match or outperform existing ETFs<br/>Cost-effective, transparent methodology"]
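
The validation step in the flowchart can be illustrated with a rough sketch under simplifying assumptions. It uses a market-adjusted model (stock return minus market return) for abnormal returns and an annualized Sharpe ratio with a zero risk-free rate as one example of a risk-adjusted metric. The paper may use a different benchmark model or metric, and all function names and numbers below are illustrative only.

```python
import math
import statistics

def abnormal_returns(stock, market):
    """Market-adjusted abnormal return for each day in the event window."""
    return [s - m for s, m in zip(stock, market)]

def cumulative_abnormal_return(stock, market):
    """Sum of abnormal returns over the event window (CAR)."""
    return sum(abnormal_returns(stock, market))

def annualized_sharpe(daily_returns, trading_days=252):
    """Mean over sample standard deviation of daily returns, annualized.

    Assumes a zero risk-free rate for simplicity.
    """
    mean = statistics.mean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(trading_days)

# Made-up daily returns around a hypothetical event date.
stock = [0.03, 0.01, -0.02]
market = [0.01, 0.00, -0.01]
car = cumulative_abnormal_return(stock, market)

sharpe = annualized_sharpe([0.01, 0.02, 0.03])
```

In an event study of this kind, the CAR would be computed per firm and then compared across firms with high and low AI scores to check whether AI engagement is associated with a stronger reaction.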