Aligning Multilingual News for Stock Return Prediction

ArXiv ID: 2510.19203

Authors: Yuntao Wu, Lynn Tao, Ing-Haw Cheng, Charles Martineau, Yoshio Nozawa, John Hull, Andreas Veneris

Abstract

News spreads rapidly across languages and regions, but translations may lose subtle nuances. We propose a method to align sentences in multilingual news articles using optimal transport, identifying semantically similar content across languages. We apply this method to align more than 140,000 pairs of Bloomberg English and Japanese news articles covering around 3,500 stocks on the Tokyo exchange over 2012-2024. Aligned sentences are sparser, more interpretable, and exhibit higher semantic similarity. Return scores constructed from aligned sentences show stronger correlations with realized stock returns, and long-short trading strategies based on these alignments achieve 10% higher Sharpe ratios than strategies that analyze the full text sample.
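To make the alignment idea concrete, here is a minimal sketch of sentence alignment with entropy-regularized optimal transport (Sinkhorn iterations). It is an illustration under stated assumptions, not the authors' implementation: the toy embeddings, the uniform sentence weights, the cosine cost, the regularization strength `reg`, and the `threshold` used to keep aligned pairs are all hypothetical choices.

```python
import math

def cosine_cost(u, v):
    """Transport cost between two sentence embeddings: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan via Sinkhorn scaling.

    Assumes uniform mass on each sentence (an illustrative choice).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on English sentences
    b = [1.0 / m] * m  # mass on Japanese sentences
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def aligned_pairs(plan, threshold):
    """Keep (english_index, japanese_index) pairs that carry enough mass."""
    return [(i, j) for i, row in enumerate(plan)
            for j, mass in enumerate(row) if mass >= threshold]

# Toy 3-dimensional "embeddings" standing in for real multilingual sentence vectors.
en = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
jp = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9], [0.0, 0.95, 0.05]]

cost = [[cosine_cost(e, j) for j in jp] for e in en]
plan = sinkhorn_plan(cost)
pairs = aligned_pairs(plan, threshold=0.5 / len(en))
print(pairs)
```

In this toy case nearly all transported mass concentrates on one Japanese sentence per English sentence, which is the kind of sparse, readable alignment the abstract describes. With real articles, the embeddings would come from a multilingual sentence encoder, and sentences with no counterpart would ideally receive little or no mass.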

Keywords: optimal transport, multilingual alignment, news sentiment, trading strategy, Sharpe ratio, Equity

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper combines advanced mathematics (optimal transport, Sinkhorn algorithms) with extensive empirical backtesting on more than 140,000 news article pairs, including implemented trading strategies and reported Sharpe ratios.
  flowchart TD
    A["Research Goal: Multilingual News Alignment<br>for Stock Return Prediction"] --> B["Data Input: 140k+ Bloomberg EN/JP<br>Articles (2012-2024, ~3500 Tokyo Stocks)"]
    B --> C["Methodology: Optimal Transport<br>Sentence Alignment"]
    C --> D["Computational Process: Map EN/JP<br>Sentences via Semantic Similarity"]
    D --> E{"Outcomes"}
    E --> F["Sparsity & Interpretability:<br>Aligned content is cleaner"]
    E --> G["Correlation Analysis:<br>Aligned scores vs. Realized Returns"]
    E --> H["Trading Strategy:<br>Long-Short based on alignments"]
    H --> I["Final Result: 10% Higher Sharpe Ratio<br>vs. Full Text Analysis"]
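
The last two steps of the flowchart, forming a long-short portfolio from news-based return scores and summarizing it with a Sharpe ratio, can be sketched as follows. This is a hedged illustration only: the tickers, scores, returns, the one-stock-per-leg portfolio, the zero risk-free rate, and the 252-day annualization are assumptions made for the example, not details taken from the paper.

```python
import math
import statistics

def long_short_return(scores, returns, n_per_leg=1):
    """Equal-weighted return of long top-scored minus short bottom-scored stocks."""
    ranked = sorted(scores, key=scores.get)  # tickers, lowest score first
    shorts, longs = ranked[:n_per_leg], ranked[-n_per_leg:]
    long_leg = sum(returns[t] for t in longs) / n_per_leg
    short_leg = sum(returns[t] for t in shorts) / n_per_leg
    return long_leg - short_leg

def annualized_sharpe(period_returns, periods_per_year=252):
    """Mean over sample standard deviation, annualized; risk-free rate assumed zero."""
    mean = statistics.mean(period_returns)
    std = statistics.stdev(period_returns)
    return mean / std * math.sqrt(periods_per_year)

# Hypothetical one-day cross-section: news-based return scores and realized returns.
scores = {"A": 0.8, "B": 0.1, "C": -0.5, "D": 0.3, "E": -0.9}
realized = {"A": 0.02, "B": 0.00, "C": -0.01, "D": 0.01, "E": -0.03}
day_return = long_short_return(scores, realized)

# Hypothetical series of daily long-short returns.
daily_returns = [0.01, -0.005, 0.02, 0.0, 0.005]
sharpe = annualized_sharpe(daily_returns)
print(day_return, sharpe)
```

Running the same two functions once with scores built from aligned sentences and once with scores built from the full text would give the two Sharpe ratios that the final flowchart node compares.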