Continuous-time q-learning for mean-field control problems
ArXiv ID: 2306.16208
Authors: Unknown
Abstract
This paper studies q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2023), for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed method for searching test policies, we devise some model-free learning algorithms. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and the q-functions and illustrate our algorithms with simulation experiments.
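To make the two objects concrete, the following is a schematic, hedged rendering of the integral representation and of the Gibbs-form policy improvement; the argument lists, the test-policy symbol $h$, and the temperature $\gamma$ are notational assumptions rather than the paper's exact statement.

```latex
% Schematic relation between the integrated q-function q and the essential
% q-function q_e under a test policy h (notation assumed, not verbatim from the paper):
\[
  q(t, \mu, h) \;=\; \int_{\mathbb{R}^d} \int_{\mathcal{A}}
    q_e(t, x, \mu, a)\, h(a \mid x, \mu)\, \mathrm{d}a\, \mu(\mathrm{d}x).
\]
% Policy improvement uses q_e in the Gibbs (softmax) form familiar from
% entropy-regularized continuous-time q-learning, with temperature \gamma:
\[
  \pi^{\mathrm{new}}(a \mid x, \mu) \;\propto\; \exp\!\bigl( q_e(t, x, \mu, a) / \gamma \bigr).
\]
```

The first identity expresses how the integrated q-function aggregates the essential one over agents and actions under a test policy; the second is the policy-improvement update in which $q_e$ is employed, in the spirit of Jia and Zhou (2023).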
Keywords: Mean-Field Control, Reinforcement Learning, Entropy-Regularization, Q-Learning, McKean-Vlasov
Complexity vs Empirical Score
- Math Complexity: 9.2/10
- Empirical Rigor: 6.8/10
- Quadrant: Holy Grail
- Why: The paper is highly theoretical, involving advanced continuous-time stochastic calculus, McKean-Vlasov dynamics, entropy regularization, and weak martingale characterizations, leading to a high math complexity score. Empirical rigor is moderate, as the paper includes simulation experiments and algorithm design for two financial examples (mean-variance portfolio and R&D investment), but it lacks real-world backtesting data or implementation-heavy details like code repositories.
```mermaid
flowchart TD
A["Research Goal:<br>Model-Free Learning<br>for Mean-Field Control"] --> B["Define Dual Q-Functions<br>Integrated q and Essential q<sub>e</sub>"]
B --> C["Data: Entropy-Regularized<br>McKean-Vlasov System"]
C --> D["Core Algorithm:<br>Weak Martingale Condition<br>+ Test Policy Search"]
D --> E["Computational Process:<br>Policy Improvement<br>via q<sub>e</sub> updates"]
E --> F["Key Finding: Model-Free<br>Optimal Control Solution"]
F --> G["Validation:<br>LQ & Non-LQ Examples"]
```
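The flowchart's core loop can be illustrated with a minimal, hedged sketch: an N-particle approximation of the McKean-Vlasov population, linear parameterizations of the value function and the essential q-function, and a one-step surrogate of the weak martingale condition used as the learning signal. The dynamics, reward, feature maps, and step sizes below are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch of the learning loop suggested by the flowchart: an N-particle
# approximation of the McKean-Vlasov population, linear parameterizations of the
# value function J_theta and essential q-function q_psi, and a one-step surrogate
# of the weak martingale condition as the learning signal.  Dynamics, reward,
# features, and step sizes are illustrative assumptions, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
N, T, dt, gamma = 500, 50, 0.02, 0.1      # particles, time steps, step size, entropy temperature

theta = np.zeros(3)                        # J_theta features: [1, mean(mu), var(mu)]
psi = np.zeros(4)                          # q_psi features:   [1, x, mean(mu), a]

def features_J(xs):
    """Population-level features standing in for the measure argument mu."""
    return np.array([1.0, xs.mean(), xs.var()])

def features_q(x, xs, a):
    """Per-agent features for the essential q-function q_e(x, mu, a)."""
    return np.array([1.0, x, xs.mean(), a])

def test_policy(x, xs):
    """Gaussian behaviour/test policy (assumed form), exploration scaled by gamma."""
    return -0.5 * x + 0.1 * xs.mean() + np.sqrt(gamma) * rng.standard_normal()

def reward(x, xs, a):
    """Toy LQ-style running reward depending on the population mean (assumption)."""
    return -((x - xs.mean()) ** 2) - a ** 2

for it in range(200):
    xs = rng.standard_normal(N)            # initial population sample
    g_theta = np.zeros_like(theta)
    g_psi = np.zeros_like(psi)
    for k in range(T):
        a = np.array([test_policy(x, xs) for x in xs])
        r = np.array([reward(x, xs, ai) for x, ai in zip(xs, a)])
        xs_next = xs + a * dt + np.sqrt(dt) * rng.standard_normal(N)   # particle dynamics
        # Integrated quantities: average per-agent reward and q-features over the population.
        J_now = theta @ features_J(xs)
        J_next = theta @ features_J(xs_next)
        phi_q = np.mean([features_q(x, xs, ai) for x, ai in zip(xs, a)], axis=0)
        q_bar = psi @ phi_q
        # One-step surrogate of the weak martingale condition: under a test policy,
        # dJ + (average reward - integrated q) dt should have zero conditional mean.
        delta = (J_next - J_now) + (r.mean() - q_bar) * dt
        g_theta += delta * (features_J(xs_next) - features_J(xs))
        g_psi += delta * (-dt) * phi_q
        xs = xs_next
    theta -= 0.05 * g_theta                # stochastic-approximation style updates
    psi -= 0.05 * g_psi
```

A policy-improvement step in the spirit of the flowchart would then replace the test policy with a Gibbs policy built from the learned q_psi, as in the LaTeX sketch after the abstract; the simple feature maps are motivated by the fact that, in the paper's LQ example, the optimal value function and q-functions admit exact parameterizations.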