Benchmarking Large Language Model Volatility

ArXiv ID: 2311.15180

Authors: Unknown

Abstract

The impact of non-deterministic outputs from Large Language Models (LLMs) is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.
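The two remedies named in the abstract, lowering the temperature and ensembling several outputs, can be illustrated with a toy model. The sketch below is not the paper's experimental setup: it assumes a hypothetical three-way sentiment classifier whose label is sampled from a temperature-scaled softmax over fixed scores, standing in for one non-deterministic LLM call. In a real LLM, temperature is applied per token during decoding, so this only approximates the effect at the label level. The names `softmax_with_temperature`, `sample_label`, `disagreement_rate`, and `majority_vote`, as well as the example scores, are illustrative assumptions.

```python
import math
import random
from collections import Counter

LABELS = ["negative", "neutral", "positive"]


def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_label(logits, temperature, rng):
    """Draw one label, standing in for a single non-deterministic LLM call."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(LABELS, weights=probs, k=1)[0]


def disagreement_rate(labels):
    """Fraction of runs whose label differs from the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - most_common_count / len(labels)


def majority_vote(labels):
    """Ensemble several runs into one label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


rng = random.Random(0)
logits = [0.2, 1.0, 1.3]  # hypothetical scores for an ambiguous sentence
for temperature in (1.0, 0.1):
    runs = [sample_label(logits, temperature, rng) for _ in range(200)]
    print(temperature, round(disagreement_rate(runs), 2), majority_vote(runs))
```

In this toy setting, the lower temperature concentrates probability on the top-scoring label, so repeated runs disagree less often. Majority voting also stabilises the final label, but every vote is one more model call, which is the computational cost the abstract refers to.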

Keywords: large language models, sentiment classification, portfolio construction, uncertainty quantification, news analysis, equities

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Street Traders
  • Why: The paper is empirically heavy, built around a case study of US equity news sentiment analysis with real LLMs and backtestable portfolio construction, but mathematically light: it measures volatility directly from repeated model outputs rather than developing advanced theory.

  flowchart TD
    A["Research Goal:<br>Quantify LLM Volatility Impact<br>on Financial Decision-Making"] --> B["Methodology:<br>News Sentiment Analysis Case Study"]
    B --> C["Input:<br>US Equity Market News Data"]
    C --> D["Process:<br>LLM Inference with<br>Variable Temperature Parameters"]
    D --> E["Outcome 1:<br>High Variability in<br>Sentence-Level Sentiment"]
    E --> F["Downstream Effect:<br>Volatile Portfolio Construction"]
    F --> G["Key Finding:<br>Uncertainty Cascades to Returns"]
    G --> H["Mitigation Strategies:<br>Low Temperature vs.<br>Computational Ensembling"]
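The downstream step in the flowchart, where label variability turns into portfolio variability, can be made concrete with a small sketch. This summary does not describe the paper's actual portfolio rule, so the code assumes a simple hypothetical one: average the sentence labels per ticker, go long the top `k` tickers and short the bottom `k`. The tickers, the labels, and the helper names `ticker_scores`, `long_short`, and `jaccard` are all invented for illustration.

```python
SCORE = {"negative": -1, "neutral": 0, "positive": 1}


def ticker_scores(labels_by_ticker):
    """Average the per-sentence sentiment labels for each ticker."""
    return {
        ticker: sum(SCORE[label] for label in labels) / len(labels)
        for ticker, labels in labels_by_ticker.items()
    }


def long_short(scores, k):
    """Go long the k highest-scoring tickers and short the k lowest."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k]), set(ranked[-k:])


def jaccard(a, b):
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two hypothetical LLM runs over the same news; only one sentence label differs.
run_a = {
    "AAA": ["positive", "positive"],
    "BBB": ["positive", "neutral"],
    "CCC": ["neutral", "negative"],
    "DDD": ["negative", "negative"],
}
run_b = dict(run_a, AAA=["positive", "negative"])  # one label flips for AAA

long_a, short_a = long_short(ticker_scores(run_a), k=1)
long_b, short_b = long_short(ticker_scores(run_b), k=1)
print("long overlap:", jaccard(long_a, long_b))
print("short overlap:", jaccard(short_a, short_b))
```

Under this assumed rule, a single flipped sentence label is enough to swap the long leg of the portfolio while the short leg stays the same, which is the kind of amplification the flowchart labels as uncertainty cascading to returns.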