Unified continuous-time q-learning for mean-field game and mean-field control problems

ArXiv ID: 2407.04521

Authors: Unknown

Abstract

This paper studies continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent’s perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider the learning procedure in which the representative agent updates the population distribution based on their own state values. Depending on whether the task is to solve the MFG or the MFC problem, the decoupled Iq-function can be employed differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain exact parameterizations of the decoupled Iq-functions and the value functions, and demonstrate the satisfactory performance of our q-learning algorithm.

Keywords: q-learning, mean-field games, jump-diffusion models, martingale characterization, stochastic control, General Finance / Multi-Asset
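The averaged martingale orthogonality condition mentioned in the abstract can be illustrated with a minimal numerical sketch. This is not the paper's algorithm: the dynamics, the quadratic value-function surrogate, and the linear q-function weights below are all hypothetical choices made purely for illustration. The idea shown is that if dM_t(θ) = dV_t + (r_t − q_θ(x_t, a_t)) dt must be a martingale increment, then requiring Σ_t ξ_t dM_t(θ) ≈ 0 for test functions ξ_t = ∂q_θ/∂θ identifies the q-function parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 0.01, 20_000
theta_star = np.array([1.5, -0.7])  # "true" q-function weights (hypothetical)

# Toy 1-D state driven by random exploratory actions (illustrative dynamics,
# not the paper's jump-diffusion model).
a = rng.normal(size=n)
x = np.zeros(n + 1)
x[1:] = np.cumsum(a * dt + np.sqrt(dt) * rng.normal(size=n))

V = lambda s: 0.5 * s**2                    # fixed value-function surrogate
xi = np.stack([x[:-1], a], axis=1)          # test functions: dq_theta/dtheta
q_star = xi @ theta_star
dV = V(x[1:]) - V(x[:-1])
# Rewards constructed so the martingale condition holds at theta_star,
# plus zero-mean observation noise:
r = q_star - dV / dt + rng.normal(scale=0.1, size=n)

# Averaged martingale orthogonality: sum_t xi_t * dM_t(theta) = 0, where
# dM_t(theta) = dV_t + (r_t - q_theta(x_t, a_t)) dt. Since q_theta is linear
# in theta, this is a linear system A theta = b.
A = dt * xi.T @ xi
b = xi.T @ (dV + r * dt)
theta_hat = np.linalg.solve(A, b)
print(theta_hat)  # close to theta_star
```

Because the parameterization is linear, the orthogonality condition reduces to a least-squares-type linear solve; with nonlinear parameterizations one would instead run a stochastic-approximation update in θ.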

Complexity vs Empirical Score

  • Math Complexity: 9.2/10
  • Empirical Rigor: 3.0/10
  • Quadrant: Lab Rats
  • Why: The paper is highly theoretical, centered on advanced stochastic calculus, martingale characterizations, and McKean-Vlasov jump-diffusion processes, indicating very high mathematical density. Empirically, it presents a unified algorithm framework but relies on simulated experiments with ‘satisfactory performance’ rather than providing backtest-ready code, datasets, or statistical metrics for real financial applications.
```mermaid
flowchart TD
    A["Research Goal:<br>Unified continuous-time Q-learning<br>for MFG & MFC with unknown<br>distributions in jump-diffusion models"]
    B["Core Methodology:<br>Decoupled Integrated Q-function<br>(Decoupled Iq-function)"]
    C["Data/Input:<br>Jump-diffusion model dynamics<br>Representative agent states"]
    D["Computational Process:<br>Unified Q-learning algorithm<br>using test policies &<br>averaged martingale orthogonality"]
    E["Outcome 1:<br>MFG Equilibrium Policy<br>(Decoupled Iq-function characterization)"]
    F["Outcome 2:<br>MFC Optimal Policy<br>(Decoupled Iq-function characterization)"]
    F1["Performance: Satisfactory results in<br>financial applications<br>Exact parameterization obtained"]

    A --> B
    B --> C
    C --> D
    D --> E
    D --> F
    E --> F1
    F --> F1
```