Kronos: A Foundation Model for the Language of Financial Markets

arXiv ID: 2508.02739

Authors: Yu Shi, Zongliang Fu, Shuo Chen, Bohan Zhao, Wei Xu, Changshui Zhang, Jian Li

Abstract

The success of the large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, often underperforming non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos boosts price series forecasting RankIC by 93% over the leading TSFM and 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis. Our pre-trained model is publicly available at https://github.com/shiyu-coder/Kronos.
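The core idea of the abstract, turning continuous K-line (OHLCV) bars into a token sequence an autoregressive model can consume, can be illustrated with a toy discretizer. Note this is a minimal sketch only: the paper's tokenizer is a specialized learned component, whereas the quantile binning, feature choices (log returns and log volume), and the `tokenize_kline` function below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def tokenize_kline(ohlcv, n_bins=256):
    """Illustrative discretization of K-line bars into token IDs.

    ohlcv: array of shape (T, 5) with columns open, high, low, close, volume.
    Returns an integer array of shape (T, 2): one price token and one
    activity token per bar, each in [0, n_bins).
    """
    ohlcv = np.asarray(ohlcv, dtype=float)
    close, vol = ohlcv[:, 3], ohlcv[:, 4]
    # Price dynamics feature: close-to-close log return.
    log_ret = np.diff(np.log(close), prepend=np.log(close[0]))
    # Trade activity feature: log volume (log1p guards against zero volume).
    log_vol = np.log1p(vol)
    tokens = np.empty((len(ohlcv), 2), dtype=int)
    for j, feat in enumerate((log_ret, log_vol)):
        # Quantile bin edges give roughly uniform token usage across the vocab.
        edges = np.quantile(feat, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        tokens[:, j] = np.searchsorted(edges, feat)
    return tokens
```

The resulting token sequence could then be fed to any next-token predictor; the paper's contribution is doing this at scale with a learned quantizer and a large autoregressive backbone.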

Keywords: Time Series Foundation Models (TSFMs), Pre-training, Candlestick Data (K-line), Volatility Prediction, Autoregressive Modeling, Equities / Multi-Asset

Complexity vs Empirical Score

  • Math Complexity: 8.5/10
  • Empirical Rigor: 9.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced neural network architectures with specialized quantization and autoregressive modeling (high math), while backing it up with massive-scale pre-training on 12 billion records across 45 exchanges and extensive benchmarking across multiple downstream tasks including forecasting, volatility, and generation (high empirical rigor).
```mermaid
flowchart TD
  A["Research Goal:<br>Develop a robust Time Series Foundation Model<br>for financial K-line data"] --> B["Methodology:<br>Specialized Tokenizer + Autoregressive Pre-training"]
  B --> C["Input Data:<br>12B K-line records from 45 global exchanges"]
  C --> D["Process:<br>Pre-train Kronos Model"]
  D --> E["Key Findings (Zero-Shot)"]
  E --> F["+93% RankIC<br>Price Forecasting"]
  E --> G["-9% MAE<br>Volatility Prediction"]
  E --> H["+22% Fidelity<br>Synthetic Data Generation"]
```