Disentangling the sources of cyber risk premia
ArXiv ID: 2409.08728
Authors: Unknown
Abstract
We use a methodology based on a machine learning algorithm to quantify firms’ cyber risks based on their disclosures and a dedicated cyber corpus. The model can identify paragraphs related to specific cyber-threat types and accordingly attribute several related cyber scores to the firm. The cyber scores are unrelated to other firm characteristics. Stocks with high cyber scores significantly outperform other stocks. The long-short cyber risk factors have positive risk premia, are robust to all factor benchmarks, and help price returns. Furthermore, we suggest the market does not distinguish between different types of cyber risks but instead views them as a single, aggregate cyber risk.
Keywords: cyber risk, risk premia, machine learning, disclosure analysis, factor pricing, Equities
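The long-short cyber risk factor mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single period, equal weighting, and a 20% cutoff for each leg; none of these choices is confirmed by the summary above, and the firm identifiers, scores, and returns are made-up toy values.

```python
from statistics import mean

def long_short_return(scores, returns, quantile=0.2):
    """Return of a high-minus-low cyber-score portfolio for one period.

    scores, returns: dicts keyed by firm id (hypothetical inputs).
    quantile: fraction of firms in each leg (an assumption, not from the paper).
    """
    # Keep firms that have both a score and a return, sorted by ascending score.
    firms = sorted((f for f in scores if f in returns), key=lambda f: scores[f])
    n = max(1, int(len(firms) * quantile))
    low, high = firms[:n], firms[-n:]
    # Equal-weighted long leg (high scores) minus short leg (low scores).
    return mean(returns[f] for f in high) - mean(returns[f] for f in low)

# Toy data: ten hypothetical firms with made-up scores and returns.
toy_scores = {f"F{i}": i / 10 for i in range(10)}
toy_returns = {f"F{i}": 0.01 * i for i in range(10)}
factor_return = long_short_return(toy_scores, toy_returns)
```

Repeating this calculation each period would give a factor return series whose average is the estimated premium; a value-weighted variant would replace the two `mean` calls with weighted averages.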
Complexity vs Empirical Score
- Math Complexity: 7.5/10
- Empirical Rigor: 8.5/10
- Quadrant: Holy Grail
- Why: The paper employs advanced machine learning (doc2vec, clustering) and sophisticated asset pricing techniques (Fama-MacBeth, GRS tests, Bayesian factor selection), demonstrating high mathematical density. It is highly empirical, using a large dataset of 7,000 firms over 17 years with rigorous backtesting, robustness checks, and an event study, which makes it well suited to practical implementation.
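As an illustration of the Fama-MacBeth technique named above, the following is a simplified sketch of its cross-sectional pass. It assumes a single regressor and fixed exposures across periods, and it uses a plain t-statistic without autocorrelation corrections. The function names and toy numbers are hypothetical and do not reproduce the paper's specification.

```python
from math import sqrt
from statistics import mean, stdev

def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def fama_macbeth(exposures, returns_by_period):
    """Average the per-period cross-sectional slopes and compute a t-statistic."""
    # One cross-sectional regression per period: returns on exposures.
    slopes = [ols_slope(exposures, r) for r in returns_by_period]
    premium = mean(slopes)
    # Standard error of the mean of the slope series.
    t_stat = premium / (stdev(slopes) / sqrt(len(slopes)))
    return premium, t_stat

# Toy data: four hypothetical firms, three periods with known slopes.
exposures = [0.0, 1.0, 2.0, 3.0]
returns_by_period = [[0.001 + lam * x for x in exposures]
                     for lam in (0.01, 0.03, 0.02)]
premium, t_stat = fama_macbeth(exposures, returns_by_period)
```

In a fuller version the exposures would themselves be estimated in a first-pass time-series regression and would vary by period.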
flowchart TD
G["Research Goal:<br>Quantify & attribute cyber risk premia"] --> M["Key Methodology:<br>ML on disclosure corpus (doc2vec)"]
M --> D["Data Inputs:<br>10-K filings & cyber lexicon"]
D --> C["Computational Process:<br>Paragraph scoring & firm attribution"]
C --> O1["Outcome 1: Unique Cyber Scores<br>(Uncorrelated w/ firm chars)"]
C --> O2["Outcome 2: Risk Premia<br>High-score stocks outperform"]
C --> O3["Outcome 3: Market Perception<br>Viewed as single aggregate risk"]
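The paragraph scoring and firm attribution step in the flowchart can be sketched roughly as follows. This assumes paragraph and threat-type embeddings have already been computed by some embedding model, and it takes the maximum cosine similarity over a firm's paragraphs as the firm's score for each threat type. The aggregation rule, the threat labels, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def firm_scores(paragraph_vecs, threat_vecs):
    """One score per threat type: best-matching paragraph similarity.

    Using max as the aggregation is an assumption made for this sketch.
    """
    return {
        threat: max(cosine(p, ref) for p in paragraph_vecs)
        for threat, ref in threat_vecs.items()
    }

# Toy embeddings: two disclosure paragraphs and two hypothetical threat types.
paragraphs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
threats = {"ransomware": [1.0, 0.0, 0.0], "phishing": [0.0, 0.0, 1.0]}
scores = firm_scores(paragraphs, threats)
```

Here the first paragraph aligns exactly with the hypothetical "ransomware" reference vector, while no paragraph resembles the "phishing" one, so the firm receives a high score for the former and a zero score for the latter.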