Distributional Reference Class Forecasting of Corporate Sales Growth With Multiple Reference Variables

ArXiv ID: 2405.03402

Authors: Unknown

Abstract

This paper introduces an approach to reference class selection in distributional forecasting, applied to corporate sales growth rates, that uses several covariates as reference variables; these act as implicit predictors. The method can be used to detect expert or model-based forecasts exposed to (behavioral) bias or to forecast distributions with reference classes. Reference classes are sets of similar entities, here firms, and rank-based algorithms for their selection are proposed, including optional preprocessing that reduces the data dimension via principal component analysis. Forecasts are optimal if they match the underlying distribution as closely as possible. Probability integral transform values rank the forecast capability of different reference variable sets and algorithms in a backtest on a data set of 21,808 US firms over the period 1950 to 2019. In particular, algorithms on dimension-reduced variables perform well using contemporaneous balance sheet and financial market parameters along with past sales growth rates and past changes in operating margins. Comparisons of actual analysts’ estimates to distributional forecasts, and of historic distributional forecasts to realized sales growth, illustrate the practical use of the method.
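To make the abstract's two core ideas concrete, the following is a minimal sketch of rank-based reference class selection and of a probability integral transform (PIT) value. It is not the paper's algorithm: the function names, the rank scaling, the Chebyshev distance in rank space, and the fixed class size `k` are all assumptions chosen for illustration.

```python
import numpy as np

def rank_transform(X):
    """Column-wise ranks scaled to (0, 1]; ties are broken arbitrarily."""
    X = np.asarray(X, dtype=float)
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    return ranks / X.shape[0]

def reference_class(X, target, k):
    """Indices of the k firms closest to firm `target` in rank space.

    Rows of X are firms, columns are reference variables.
    The distance (largest rank gap over all variables) is an assumption.
    """
    R = rank_transform(X)
    dist = np.abs(R - R[target]).max(axis=1)
    dist[target] = np.inf  # a firm is not part of its own reference class
    return np.argsort(dist)[:k]

def pit(reference_growth, realized):
    """Empirical CDF of the reference class growth rates at the realized value."""
    return float(np.mean(np.asarray(reference_growth, dtype=float) <= realized))
```

Under these assumptions, the sales growth rates of the firms returned by `reference_class` form the forecast distribution for the target firm, and `pit` places the realized growth rate within it. If the forecast distributions match the underlying ones, PIT values collected over many firms and years should look approximately uniform on [0, 1].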

Keywords: Distributional forecasting, Reference class forecasting, Corporate sales growth, Backtesting, Principal component analysis

Complexity vs Empirical Score

  • Math Complexity: 5.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced statistical concepts like distributional forecasting, principal component analysis, and probability integral transform values for evaluation, indicating moderate-to-high math complexity. It demonstrates high empirical rigor by backtesting on a massive dataset (21,808 US firms, 1950-2019) and comparing results against actual analyst estimates.
  flowchart TD
    A["Research Goal<br>Distributional Forecasting of<br>Corporate Sales Growth"] --> B["Data & Inputs<br>21,808 US Firms<br>1950-2019<br>Co-variates: Balance Sheet,<br>Market Parameters, History"]
    B --> C{"Methodology"}
    C --> D["Dimension Reduction<br>Principal Component Analysis<br>Optional Step"]
    C --> E["Reference Class Selection<br>Rank-based Algorithms"]
    D --> F
    E --> F["Distributional Forecasts<br>Ranking via Probability<br>Integral Transform PIT"]
    F --> G["Backtesting & Evaluation<br>Compare Forecasts vs.<br>Actual Realized Growth"]
    G --> H["Key Findings/Outcomes<br>Dimension Reduction +<br>Contemporaneous Data performs best.<br>Effective for bias detection &<br>analyst comparison"]
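
The optional dimension reduction step and the PIT-based evaluation in the flowchart can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the helper names are hypothetical, the reference variables are standardized before the decomposition, and the Kolmogorov-Smirnov distance from the uniform distribution is just one common way to summarize PIT values; the paper's exact ranking criterion is not given in this summary.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project standardized columns of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def ks_distance_from_uniform(pit_values):
    """Largest gap between the empirical CDF of PIT values and the U(0, 1) CDF."""
    u = np.sort(np.asarray(pit_values, dtype=float))
    n = u.size
    upper = np.arange(1, n + 1) / n - u  # ECDF just after each point minus u
    lower = u - np.arange(0, n) / n      # u minus ECDF just before each point
    return float(max(upper.max(), lower.max()))
```

In a backtest along these lines, the reference classes would be selected on the output of `pca_scores` instead of the raw variables, and a smaller `ks_distance_from_uniform` would indicate that a given combination of reference variables and algorithm produces better calibrated forecast distributions.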