Reinforcement Learning Methods for the Stochastic Optimal Control of an Industrial Power-to-Heat System

ArXiv ID: 2411.02211

Authors: Unknown

Abstract

The optimal control of sustainable energy supply systems, including renewable energies and energy storage, plays a central role in the decarbonization of industrial systems. However, the use of fluctuating renewable energies leads to fluctuations in energy generation and requires a suitable control strategy for such complex systems in order to ensure a reliable energy supply. In this paper, we consider an electrified power-to-heat system which is designed to supply heat in the form of superheated steam for industrial processes. The system consists of a high-temperature heat pump for heat supply, a wind turbine for power generation, a sensible thermal energy storage for storing excess heat, and a steam generator for providing steam. If the system’s energy demand cannot be covered by electricity from the wind turbine, additional electricity must be purchased from the power grid. For this system, we investigate the cost-optimal operation, aiming to minimize the cost of electricity purchased from the grid through a suitable system control depending on the available wind power and the amount of stored thermal energy. This is a decision-making problem under uncertainty about future grid electricity prices and future wind power generation. The resulting stochastic optimal control problem is treated as a finite-horizon Markov decision process for a multi-dimensional controlled state process. We first consider the classical backward recursion technique for solving the associated dynamic programming equation for the value function and computing the optimal decision rule. Since that approach suffers from the curse of dimensionality, we also apply reinforcement learning techniques, namely Q-learning, which are able to provide a good approximate solution to the optimization problem within reasonable time.
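
The backward recursion referred to above solves the dynamic programming equation V_t(s) = min_a { c_t(s, a) + E[ V_{t+1}(S_{t+1}) | S_t = s, A_t = a ] } from the terminal time backwards. The following is a minimal sketch of that recursion on a small, purely hypothetical discretization; the horizon, grid size, stage costs, and transition probabilities are placeholders and do not reproduce the paper's calibrated model of wind power, electricity prices, and storage level.

```python
import numpy as np

# Minimal backward-recursion sketch for a finite-horizon MDP.
# All quantities are placeholders; the paper's calibrated dynamics for
# wind power, electricity prices, and storage level are NOT reproduced here.
T = 24                       # planning horizon, e.g. hourly decisions over one day
n_states, n_actions = 50, 3  # discretized state grid and discrete control actions
rng = np.random.default_rng(0)

cost = rng.uniform(0.0, 1.0, size=(T, n_states, n_actions))       # stage costs c_t(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s'] transition probs

V = np.zeros((T + 1, n_states))             # value function, terminal condition V_T = 0
policy = np.zeros((T, n_states), dtype=int)

# Dynamic programming equation:
#   V_t(s) = min_a [ c_t(s, a) + sum_{s'} P(s' | s, a) * V_{t+1}(s') ]
for t in reversed(range(T)):
    expected_next = np.einsum("asn,n->sa", P, V[t + 1])  # E[V_{t+1} | s, a]
    Q_t = cost[t] + expected_next                        # shape (n_states, n_actions)
    V[t] = Q_t.min(axis=1)
    policy[t] = Q_t.argmin(axis=1)
```

The loop over time combined with the (n_actions × n_states × n_states) transition tensor makes the cost of this exact approach grow rapidly with the state dimension, which is the curse of dimensionality that motivates the Q-learning alternative.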

Keywords: Stochastic Optimal Control, Reinforcement Learning, Q-learning, Dynamic Programming, Markov Decision Process

Complexity vs Empirical Score

  • Math Complexity: 8.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Holy Grail
  • Why: The paper uses advanced stochastic optimal control and Markov decision processes with significant mathematical derivation, but also includes calibration to real-world data and numerical experiments with Q-learning, making it both mathematically dense and empirically grounded.

```mermaid
flowchart TD
    A["Research Goal:<br>Cost-optimal control of<br>Power-to-Heat System"] --> B["Methodology: Dynamic Programming & Reinforcement Learning"]

    B --> C{"Solve Stochastic<br>Optimal Control Problem"}
    C --> D["Approach 1: Classical Dynamic Programming<br>Backward Recursion"]
    C --> E["Approach 2: Reinforcement Learning<br>Q-learning"]

    D --> F["Curse of Dimensionality:<br>Computationally Expensive"]
    E --> G["Approximate Solution:<br>Efficient & Scalable"]

    F & G --> H["Outcome: Q-learning yields a<br>near-optimal policy within<br>reasonable time"]

    H --> I["Key Finding: RL remains tractable<br>for complex industrial systems<br>where exact DP is intractable"]
```
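
In contrast to the exact recursion, tabular Q-learning estimates time-dependent action values from simulated transitions, so it never has to enumerate the full transition tensor. The sketch below is a generic finite-horizon Q-learning loop, assuming a hypothetical one-step simulator env_step that stands in for the paper's wind-power, price, and storage dynamics; the learning rate, exploration rate, and episode count are illustrative values only.

```python
import numpy as np

# Minimal tabular Q-learning sketch for a finite-horizon, cost-minimizing MDP.
# env_step is a stand-in simulator; the paper's actual wind-power, price, and
# storage dynamics are NOT reproduced here.
T, n_states, n_actions = 24, 50, 3
rng = np.random.default_rng(1)

def env_step(t, s, a):
    """Hypothetical one-step simulator: returns (next_state, stage_cost)."""
    s_next = rng.integers(n_states)
    c = rng.uniform(0.0, 1.0)
    return s_next, c

Q = np.zeros((T, n_states, n_actions))   # time-dependent Q-table (finite horizon)
alpha, epsilon, n_episodes = 0.1, 0.1, 5000

for _ in range(n_episodes):
    s = rng.integers(n_states)
    for t in range(T):
        # epsilon-greedy exploration; costs are minimized, so the greedy action is argmin
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[t, s].argmin())
        s_next, c = env_step(t, s, a)
        target = c if t == T - 1 else c + Q[t + 1, s_next].min()
        Q[t, s, a] += alpha * (target - Q[t, s, a])
        s = s_next
```

The greedy policy argmin_a Q_t(s, a) extracted from the learned table plays the role of the decision rule that the backward recursion computes exactly.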