On Evaluating Loss Functions for Stock Ranking: An Empirical Analysis With Transformer Model

ArXiv ID: 2510.14156

Authors: Jan Kwiatkowski, Jarosław A. Chudziak

Abstract

Quantitative trading strategies rely on accurately ranking stocks to identify profitable investments. Effective portfolio management requires models that can reliably order future stock returns. Transformer models are promising for understanding financial time series, but how different training loss functions affect their ability to rank stocks well is not yet fully understood. Financial markets are challenging due to their changing nature and the complex relationships between stocks. Standard loss functions, which aim for simple prediction accuracy, are often not enough, because they do not directly teach models the correct order of stock returns. While many advanced ranking losses exist in fields such as information retrieval, there has been no thorough comparison of how well they work for ranking financial returns, especially when used with modern Transformer models for stock selection. This paper addresses this gap by systematically evaluating a diverse set of advanced loss functions (pointwise, pairwise, and listwise) for daily stock return forecasting, with the goal of facilitating rank-based portfolio selection on S&P 500 data. We focus on assessing how each loss function influences the model’s ability to discern profitable relative orderings among assets. Our research contributes a comprehensive benchmark revealing how different loss functions impact a model’s ability to learn the cross-sectional and temporal patterns crucial for portfolio selection, thereby offering practical guidance for optimizing ranking-based trading strategies.

Keywords: Transformer Models, Ranking Loss Functions, Stock Return Forecasting, Portfolio Selection, Cross-Sectional Analysis, Equities
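
To make the distinction between the three loss families in the abstract concrete, here is a minimal NumPy sketch for a single trading day's cross-section of stocks. It is an illustration under assumptions, not the paper's implementation: the function names are hypothetical, and MSE, a RankNet-style logistic loss, and ListMLE are used only as textbook representatives of the pointwise, pairwise, and listwise families. The exact formulations evaluated in the paper may differ.

```python
import numpy as np

def mse_loss(scores, returns):
    """Pointwise: each prediction is penalized on its own, ignoring order."""
    return float(np.mean((scores - returns) ** 2))

def pairwise_logistic_loss(scores, returns):
    """Pairwise (RankNet-style): for every pair where stock i truly
    outperformed stock j, penalize the model if score_i is not above score_j.
    Assumes at least one pair of stocks has different realized returns."""
    score_diff = scores[:, None] - scores[None, :]   # s_i - s_j
    i_beats_j = returns[:, None] > returns[None, :]  # true preference mask
    # log(1 + exp(-(s_i - s_j))), computed stably
    pair_losses = np.logaddexp(0.0, -score_diff)
    return float(pair_losses[i_beats_j].mean())

def listmle_loss(scores, returns):
    """Listwise (ListMLE): negative Plackett-Luce log-likelihood of the
    true ordering (best realized return first) under the model scores."""
    order = np.argsort(-returns)
    s = scores[order]
    # For each position i, log(sum_{k >= i} exp(s_k)) via a reversed
    # cumulative log-sum-exp.
    tail_lse = np.logaddexp.accumulate(s[::-1])[::-1]
    return float(np.sum(tail_lse - s))

# Hypothetical one-day cross-section of four stocks.
realized = np.array([0.03, -0.01, 0.02, 0.00])
aligned = np.array([2.0, -1.0, 1.0, 0.0])     # same ordering as realized
reversed_ = np.array([-2.0, 1.0, -1.0, 0.0])  # opposite ordering

for name, s in [("aligned", aligned), ("reversed", reversed_)]:
    print(name, pairwise_logistic_loss(s, realized), listmle_loss(s, realized))
```

Note that the pairwise and listwise losses depend only on how the scores order the stocks relative to the realized returns, whereas the pointwise loss depends on the numeric distance between each score and its target.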

Complexity vs Empirical Score

Math Complexity: 6.0/10
Empirical Rigor: 7.5/10
Quadrant: Holy Grail
Why: The paper employs advanced deep learning (Transformers) and systematically evaluates multiple complex ranking loss functions, requiring significant mathematical formulation, yet it is grounded in a concrete empirical benchmark using S&P 500 data with a clear backtesting protocol for portfolio selection.

  flowchart TD
    A["Research Goal<br>How do ranking loss functions impact<br>Transformer model performance for<br>stock return forecasting & portfolio selection?"] --> B["Dataset & Inputs"]
    B --> C["Model Architecture & Methodology"]
    C --> D["Training & Evaluation"]
    D --> E["Key Findings & Outcomes"]

    subgraph B ["Data"]
        B1["S&P 500 Historical Data<br>Price, Volume, Indicators"]
        B2["Target Variable<br>Future Stock Returns"]
    end

    subgraph C ["Methodology"]
        C1["Transformer Model<br>Time-series encoder"]
        C2["Loss Functions Tested"]
        C3["Pointwise: MSE/MAE"]
        C4["Pairwise: RankNet"]
        C5["Listwise: ListMLE/SoftRank"]
    end

    subgraph D ["Process"]
        D1["Cross-Sectional Training<br>Learn patterns across assets"]
        D2["Temporal Validation<br>Out-of-sample testing"]
        D3["Portfolio Construction<br>Rank-based top-K selection"]
        D4["Performance Metrics<br>Sharpe Ratio, NDCG, Accuracy"]
    end

    subgraph E ["Outcomes"]
        E1["Listwise losses outperform<br>others for ranking tasks"]
        E2["ListMLE yields highest<br>portfolio Sharpe ratio"]
        E3["Listwise captures cross-sectional<br>patterns better than pointwise"]
        E4["Practical guidance for<br>loss function selection in finance"]
    end

    B1 --> C1
    B2 --> C3
    B2 --> C4
    B2 --> C5
    
    C1 --> C2
    C2 --> C3 & C4 & C5
    
    C3 & C4 & C5 --> D1
    D1 --> D2
    D2 --> D3
    D3 --> D4
    
    D4 --> E1
    D4 --> E2
    D4 --> E3
    D4 --> E4
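
The "Rank-based top-K selection" and "Sharpe Ratio" steps in the flowchart can be sketched as follows. This is a rough illustration only: it assumes an equal-weight, long-only portfolio rebalanced daily, a zero risk-free rate, 252 trading days per year, and no transaction costs, none of which are confirmed by this summary. The function names and the toy data are hypothetical.

```python
import numpy as np

def top_k_portfolio_returns(scores, realized, k):
    """Daily returns of an equal-weight portfolio of the k highest-scored stocks.

    scores, realized: arrays of shape (n_days, n_stocks).
    """
    # Column indices of the k largest scores on each day.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    picked = np.take_along_axis(realized, top_idx, axis=1)
    return picked.mean(axis=1)

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    std = daily_returns.std(ddof=1)
    return float(np.sqrt(periods_per_year) * daily_returns.mean() / std)

# Toy data: two days, three stocks; the "model" scores equal realized returns,
# so the portfolio always holds the two best performers.
realized = np.array([[0.01, 0.03, -0.02],
                     [0.00, -0.01, 0.02]])
portfolio = top_k_portfolio_returns(realized, realized, k=2)
print(portfolio, annualized_sharpe(portfolio))
```

In a real backtest the scores would come from the trained model on out-of-sample dates, and each day's scores must be computed only from information available before that day's returns are realized.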
