Reinforcement Learning for Corporate Bond Trading: A Sell Side Perspective

ArXiv ID: 2406.12983

Authors: Unknown

Abstract

A corporate bond trader at a typical sell-side institution such as a bank provides liquidity to market participants by buying/selling securities and maintaining an inventory. Upon receiving a request for a buy/sell price quote (RFQ), the trader provides a quote by adding a spread over a *prevalent market price*. For illiquid bonds, the market price is harder to observe, and traders often resort to available benchmark bond prices (from sources such as MarketAxess and Bloomberg). In Bergault et al. (2023), the concept of a *Fair Transfer Price* for an illiquid corporate bond was introduced, derived from an infinite-horizon stochastic optimal control problem (maximizing the trader's expected P&L, regularized by its quadratic variation). In this paper, we consider the same optimization objective; however, we approach the estimation of an optimal bid-ask spread quoting strategy in a data-driven manner and show that it can be learned using reinforcement learning. Furthermore, we perform extensive outcome analysis to examine the reasonableness of the trained agent's behavior.
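The abstract's objective, expected P&L regularized by inventory risk, can be illustrated with a minimal one-RFQ reward sketch. Everything here is a stylized assumption for illustration, not the paper's model: the exponential fill probability `exp(-k * spread)` is an Avellaneda–Stoikov-style convention, and the quadratic inventory penalty stands in for the quadratic-variation regularizer.

```python
import math
import random

def rfq_reward(spread, inventory_after, gamma=0.1, sigma=0.02, k=5.0, rng=None):
    """One-step reward sketch for quoting a half-spread on an incoming RFQ.

    Assumptions (not from the paper): the RFQ fills with probability
    exp(-k * spread), and inventory risk is penalized quadratically,
    a common proxy for a quadratic-variation regularizer.
    """
    rng = rng or random.Random(0)
    filled = rng.random() < math.exp(-k * spread)
    pnl = spread if filled else 0.0            # spread earned only on a fill
    penalty = gamma * (sigma * inventory_after) ** 2  # inventory-risk charge
    return pnl - penalty, filled
```

A tighter spread fills more often but earns less per trade, while a larger post-trade inventory drags the reward down, which is the trade-off the RL agent must balance.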

Keywords: Fair Transfer Price, Stochastic Optimal Control, Bid-Ask Spread, Reinforcement Learning, Inventory Management, Fixed Income (Corporate Bonds)

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 6.0/10
  • Quadrant: Holy Grail
  • Why: The paper relies on advanced stochastic optimal control and Hamilton-Jacobi-Bellman equations (high math), while also detailing a data-driven RL approach with outcome analysis, though specifics on backtest metrics and implementation details are limited in the provided excerpt.
```mermaid
flowchart TD
    A["Research Goal: Learn optimal bid-ask<br>spread quoting for illiquid bonds"] --> B["Methodology: Reinforcement Learning<br>vs Stochastic Optimal Control"]

    B --> C{"Data/Inputs"}
    C --> C1["Corporate Bond RFQs"]
    C --> C2["Trade Execution Records"]
    C --> C3["Market Reference Prices"]

    C --> D["Computational Process"]
    D --> D1["RL Agent Training<br>Deep Q-Network"]
    D --> D2["Compute Fair Transfer Price<br>Stochastic Control Solution"]
    D --> D3["Compare Quote Strategies"]

    D --> E["Key Findings/Outcomes"]
    E --> E1["RL learns spread<br>driven by inventory risk"]
    E --> E2["Optimal spread inversely<br>correlated with liquidity"]
    E --> E3["RL agents adapt to market<br>regimes without manual rules"]
```
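The finding that the learned spread is driven by inventory risk can be sketched with a toy agent. The flowchart mentions a Deep Q-Network; the sketch below substitutes a tabular Q-learner over inventory buckets purely for illustration, and all dynamics (symmetric RFQ arrival, exponential fill probability, the risk penalty) are assumptions, not the paper's setup.

```python
import math
import random

def train_spread_agent(episodes=2000, inv_max=5, spreads=(0.01, 0.05, 0.1, 0.2),
                       gamma_risk=0.5, sigma=0.1, k=10.0,
                       alpha=0.2, eps=0.2, discount=0.9, seed=0):
    """Toy tabular Q-learning sketch of spread quoting under inventory risk.

    State: signed inventory in [-inv_max, inv_max].
    Action: half-spread quoted on an incoming one-sided RFQ.
    Reward: spread earned on a fill minus a quadratic inventory penalty.
    All environment dynamics are stylized assumptions for illustration.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0
         for s in range(-inv_max, inv_max + 1)
         for a in range(len(spreads))}
    for _ in range(episodes):
        inv = 0
        for _ in range(50):  # RFQs per episode
            # epsilon-greedy action selection over the spread grid
            if rng.random() < eps:
                a = rng.randrange(len(spreads))
            else:
                a = max(range(len(spreads)), key=lambda i: q[(inv, i)])
            side = rng.choice((-1, 1))  # client buys (+1) or sells (-1)
            # RFQ fills with prob exp(-k*spread); reject fills past the cap
            filled = (rng.random() < math.exp(-k * spreads[a])
                      and abs(inv - side) <= inv_max)
            inv_next = inv - side if filled else inv
            reward = ((spreads[a] if filled else 0.0)
                      - gamma_risk * (sigma * inv_next) ** 2)
            best_next = max(q[(inv_next, i)] for i in range(len(spreads)))
            q[(inv, a)] += alpha * (reward + discount * best_next - q[(inv, a)])
            inv = inv_next
    return q, spreads
```

Inspecting the greedy action per inventory state after training shows how the quoted spread shifts with inventory, a tabular analogue of the "spread driven by inventory risk" outcome.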