
CreditARF: A Framework for Corporate Credit Rating with Annual Report and Financial Feature Integration

CreditARF: A Framework for Corporate Credit Rating with Annual Report and Financial Feature Integration ArXiv ID: 2508.02738 “View on arXiv” Authors: Yumeng Shi, Zhongliang Yang, DiYang Lu, Yisi Wang, Yiting Zhou, Linna Zhou Abstract Corporate credit rating serves as a crucial intermediary service in the market economy, playing a key role in maintaining economic order. Existing credit rating models rely on financial metrics and deep learning, but they often overlook insights from non-financial data such as corporate annual reports. To address this gap, the paper introduces a corporate credit rating framework that integrates financial data with features extracted from annual reports using FinBERT, aiming to fully leverage the potential value of unstructured text data. In addition, we have developed a large-scale dataset, the Comprehensive Corporate Rating Dataset (CCRD), which combines traditional financial data with textual data from annual reports. The experimental results show that the proposed method improves rating prediction accuracy by 8-12%, significantly improving the effectiveness and reliability of corporate credit ratings. ...
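
The fusion step the abstract describes can be sketched as simple feature concatenation: a FinBERT document embedding for the annual report is joined with structured financial ratios before classification. This is a minimal illustration, not the paper's implementation; the 768-dimension embedding, the three ratio features, the 10 rating buckets, and the random linear head are all assumptions.

```python
import numpy as np

# Hypothetical sketch: concatenate a FinBERT [CLS]-style report embedding
# with structured financial ratios, then score with a linear head.
# All dimensions, feature names, and weights below are illustrative.
rng = np.random.default_rng(0)

report_embedding = rng.normal(size=768)          # stand-in for FinBERT output
financial_ratios = np.array([0.42, 1.8, 0.07])   # e.g. leverage, current ratio, ROA

fused = np.concatenate([report_embedding, financial_ratios])

num_classes = 10                                  # e.g. rating buckets AAA ... D
W = rng.normal(size=(num_classes, fused.shape[0]))  # would be learned in practice
logits = W @ fused
predicted_rating = int(np.argmax(logits))
```

In a trained system the linear head would be replaced by a learned classifier, but the fused vector's shape (text dims + financial dims) is the core of the integration idea.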

August 2, 2025 · 2 min · Research Team

Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500

Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500 ArXiv ID: 2507.09739 “View on arXiv” Authors: Haojie Liu, Zihan Lin, Randall R. Rojas Abstract This study integrates real-time sentiment analysis of financial news, using GPT-2 and FinBERT, with technical indicators and time-series models such as ARIMA and ETS to optimize S&P 500 trading strategies. A strategy that merges sentiment data with momentum- and trend-based metrics is evaluated against benchmark buy-and-hold and sentiment-only approaches in terms of asset values and returns. Results show that combining sentiment-driven insights with traditional models improves trading performance, offering a more dynamic approach to stock trading that adapts to market changes in volatile environments. ...
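
A merged sentiment-plus-momentum rule of the kind evaluated here can be sketched in a few lines. The thresholds, lookback window, and signal names below are hypothetical, not taken from the paper.

```python
def momentum(prices, window=5):
    """Simple momentum: percent change over the lookback window."""
    return (prices[-1] - prices[-window]) / prices[-window]

def trade_signal(sentiment_score, prices, s_thresh=0.2, m_thresh=0.01):
    """Illustrative rule combining a [-1, 1] sentiment score with momentum:
    take a position only when sentiment and trend agree; stay flat otherwise.
    Thresholds are arbitrary placeholders."""
    m = momentum(prices)
    if sentiment_score > s_thresh and m > m_thresh:
        return "long"
    if sentiment_score < -s_thresh and m < -m_thresh:
        return "short"
    return "flat"

prices = [100, 101, 102, 103, 105, 107]
signal = trade_signal(0.6, prices)  # positive sentiment, rising prices
```

The point of requiring agreement between the two signals is exactly the adaptivity the abstract claims: in volatile regimes, sentiment alone or momentum alone can whipsaw, while their conjunction trades less often.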

July 13, 2025 · 2 min · Research Team

NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance

NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance ArXiv ID: 2507.09601 “View on arXiv” Authors: Hanwool Lee, Sara Yu, Yewon Hwang, Jonghyun Choi, Heejae Ahn, Sungbum Jung, Youngjae Yu Abstract General-purpose sentence embedding models often struggle to capture specialized financial semantics, especially in low-resource languages like Korean, due to domain-specific jargon, temporal meaning shifts, and misaligned bilingual vocabularies. To address these gaps, we introduce NMIXX (Neural eMbeddings for Cross-lingual eXploration of Finance), a suite of cross-lingual embedding models fine-tuned with 18.8K high-confidence triplets that pair in-domain paraphrases, hard negatives derived from a semantic-shift typology, and exact Korean-English translations. Concurrently, we release KorFinSTS, a 1,921-pair Korean financial STS benchmark spanning news, disclosures, research reports, and regulations, designed to expose nuances that general benchmarks miss. When evaluated against seven open-license baselines, NMIXX’s multilingual bge-m3 variant achieves Spearman’s rho gains of +0.10 on English FinSTS and +0.22 on KorFinSTS, outperforming its pre-adaptation checkpoint and surpassing other models by the largest margin, while revealing a modest trade-off in general STS performance. Our analysis further shows that models with richer Korean token coverage adapt more effectively, underscoring the importance of tokenizer design in low-resource, cross-lingual settings. By making both models and the benchmark publicly available, we provide the community with robust tools for domain-adapted, multilingual representation learning in finance. ...
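
The triplet fine-tuning the abstract describes (in-domain paraphrase as positive, semantic-shift example as hard negative) is commonly implemented with a margin loss over embedding similarities. The sketch below is a generic cosine-margin triplet loss under that assumption, with toy 4-d vectors; it is not NMIXX's actual objective or dimensionality.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin loss on cosine similarity: pull the in-domain paraphrase
    (positive) toward the anchor, push the hard negative away.
    Loss is zero once the positive beats the negative by the margin."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

# Toy vectors: the positive is nearly parallel to the anchor,
# the hard negative is orthogonal (a "semantic shift").
a   = np.array([1.0, 0.0, 0.0, 0.0])
pos = np.array([0.9, 0.1, 0.0, 0.0])
neg = np.array([0.0, 1.0, 0.0, 0.0])

loss = triplet_loss(a, pos, neg)  # well-separated triplet -> zero loss
```

In actual fine-tuning this loss would be summed over the 18.8K triplets and backpropagated through the encoder; the toy example only shows the geometry the objective enforces.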

July 13, 2025 · 2 min · Research Team

Interpretable Machine Learning for Macro Alpha: A News Sentiment Case Study

Interpretable Machine Learning for Macro Alpha: A News Sentiment Case Study ArXiv ID: 2505.16136 “View on arXiv” Authors: Yuke Zhang Abstract This study introduces an interpretable machine learning (ML) framework to extract macroeconomic alpha from global news sentiment. We process the Global Database of Events, Language, and Tone (GDELT) Project’s worldwide news feed using FinBERT – a Bidirectional Encoder Representations from Transformers (BERT) based model pretrained on finance-specific language – to construct daily sentiment indices incorporating mean tone, dispersion, and event impact. These indices drive an XGBoost classifier, benchmarked against logistic regression, to predict next-day returns for EUR/USD, USD/JPY, and 10-year U.S. Treasury futures (ZN). Rigorous out-of-sample (OOS) backtesting (5-fold expanding-window cross-validation, OOS period: c. 2017-April 2025) demonstrates exceptional, cost-adjusted performance for the XGBoost strategy: Sharpe ratios achieve 5.87 (EUR/USD), 4.65 (USD/JPY), and 4.65 (Treasuries), with respective compound annual growth rates (CAGRs) exceeding 50% in Foreign Exchange (FX) and 22% in bonds. Shapley Additive Explanations (SHAP) affirm that sentiment dispersion and article impact are key predictive features. Our findings establish that integrating domain-specific Natural Language Processing (NLP) with interpretable ML offers a potent and explainable source of macro alpha. ...
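
The three daily index components named in the abstract (mean tone, dispersion, event impact) can be aggregated from article-level scores roughly as follows. The field names and the impact-weighting scheme are illustrative assumptions, not the paper's exact construction.

```python
from statistics import mean, pstdev

def daily_sentiment_index(article_scores, article_impacts):
    """Aggregate article-level tone scores (assumed in [-1, 1]) into three
    daily features: mean tone, dispersion, and impact-weighted tone.
    Weighting by article impact is an illustrative choice."""
    total_impact = sum(article_impacts)
    weighted = sum(s * w for s, w in zip(article_scores, article_impacts)) / total_impact
    return {
        "mean_tone": mean(article_scores),
        "dispersion": pstdev(article_scores),
        "impact_weighted_tone": weighted,
    }

# Three articles on one day: tone scores and hypothetical impact counts.
idx = daily_sentiment_index([0.4, -0.2, 0.1], [10, 5, 1])
```

Rows of such daily features, one per trading day, would then form the design matrix fed to the XGBoost classifier; SHAP attributions over those columns are what identify dispersion and impact as the key predictors.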

May 22, 2025 · 2 min · Research Team

Large language models in finance: what is financial sentiment?

Large language models in finance: what is financial sentiment? ArXiv ID: 2503.03612 “View on arXiv” Authors: Unknown Abstract Financial sentiment has become a crucial yet complex concept in finance, increasingly used in market forecasting and investment strategies. Despite its growing importance, there remains a need to define and understand what financial sentiment truly represents and how it can be effectively measured. We explore the nature of financial sentiment and investigate how large language models (LLMs) contribute to its estimation. We trace the evolution of sentiment measurement in finance, from market-based and lexicon-based methods to advanced natural language processing techniques. The emergence of LLMs has significantly enhanced sentiment analysis, providing deeper contextual understanding and greater accuracy in extracting sentiment from financial text. We examine how BERT-based models, such as RoBERTa and FinBERT, are optimized for structured sentiment classification, while GPT-based models, including GPT-4, OPT, and LLaMA, excel in financial text generation and real-time sentiment interpretation. A comparative analysis of bidirectional and autoregressive transformer architectures highlights their respective roles in investor sentiment analysis, algorithmic trading, and financial decision-making. By exploring what financial sentiment is and how it is estimated within LLMs, we provide insights into the growing role of AI-driven sentiment analysis in finance. ...

March 5, 2025 · 2 min · Research Team

Innovative Sentiment Analysis and Prediction of Stock Price Using FinBERT, GPT-4 and Logistic Regression: A Data-Driven Approach

Innovative Sentiment Analysis and Prediction of Stock Price Using FinBERT, GPT-4 and Logistic Regression: A Data-Driven Approach ArXiv ID: 2412.06837 “View on arXiv” Authors: Unknown Abstract This study explores the comparative performance of cutting-edge AI models, i.e., Finance Bidirectional Encoder Representations from Transformers (FinBERT), Generative Pre-trained Transformer 4 (GPT-4), and Logistic Regression, for sentiment analysis and stock index prediction using financial news and the NGX All-Share Index data label. By leveraging advanced natural language processing models like GPT-4 and FinBERT, alongside a traditional machine learning model, Logistic Regression, we aim to classify market sentiment, generate sentiment scores, and predict market price movements. This research highlights global AI advancements in stock markets, showcasing how state-of-the-art language models can contribute to understanding complex financial data. The models were assessed using metrics such as accuracy, precision, recall, F1 score, and ROC AUC. Results indicate that Logistic Regression outperformed the more computationally intensive FinBERT and the predefined-prompt approach of the versatile GPT-4, with an accuracy of 81.83% and a ROC AUC of 89.76%. The GPT-4 predefined approach exhibited a lower accuracy of 54.19% but demonstrated strong potential in handling complex data. FinBERT, while offering more sophisticated analysis, was resource-demanding and yielded moderate performance. Hyperparameter optimization using Optuna and cross-validation techniques ensured the robustness of the models. This study highlights the strengths and limitations of practical applications of AI approaches in stock market prediction and presents Logistic Regression as the most efficient model for this task, with FinBERT and GPT-4 representing emerging tools with potential for future exploration and innovation in AI-driven financial analytics. ...
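
The evaluation metrics named in the abstract (accuracy, precision, recall, F1) reduce to confusion-matrix counts and can be computed without any library. The labels below are toy data, not the study's NGX results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy ground truth vs predictions (1 = "up", 0 = "down").
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

prec, rec, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

ROC AUC additionally needs the models' continuous scores rather than hard labels, which is why the study reports it separately from accuracy.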

December 7, 2024 · 2 min · Research Team

FinBERT-BiLSTM: A Deep Learning Model for Predicting Volatile Cryptocurrency Market Prices Using Market Sentiment Dynamics

FinBERT-BiLSTM: A Deep Learning Model for Predicting Volatile Cryptocurrency Market Prices Using Market Sentiment Dynamics ArXiv ID: 2411.12748 “View on arXiv” Authors: Unknown Abstract Time series forecasting is a key tool in financial markets, helping to predict asset prices and guide investment decisions. In highly volatile markets, such as cryptocurrencies like Bitcoin (BTC) and Ethereum (ETH), forecasting becomes more difficult due to extreme price fluctuations driven by market sentiment, technological changes, and regulatory shifts. Traditionally, forecasting relied on statistical methods, but as markets became more complex, deep learning models like LSTM, Bi-LSTM, and the newer FinBERT-LSTM emerged to capture intricate patterns. Building upon recent advancements and addressing the volatility inherent in cryptocurrency markets, we propose a hybrid model that combines Bidirectional Long Short-Term Memory (Bi-LSTM) networks with FinBERT to enhance forecasting accuracy for these assets. This approach fills a key gap in forecasting volatile financial markets by blending advanced time series models with sentiment analysis, offering valuable insights for investors and analysts navigating unpredictable markets. ...

November 2, 2024 · 2 min · Research Team

Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning

Optimizing Performance: How Compact Models Match or Exceed GPT’s Classification Capabilities through Fine-Tuning ArXiv ID: 2409.11408 “View on arXiv” Authors: Unknown Abstract In this paper, we demonstrate that non-generative, small-sized models such as FinBERT and FinDRoBERTa, when fine-tuned, can outperform GPT-3.5 and GPT-4 models in zero-shot learning settings in sentiment analysis for financial news. These fine-tuned models show comparable results to GPT-3.5 when it is fine-tuned on the task of determining market sentiment from daily financial news summaries sourced from Bloomberg. To fine-tune and compare these models, we created a novel database, which assigns a market score to each piece of news without human interpretation bias, systematically identifying the mentioned companies and analyzing whether their stocks have gone up, down, or remained neutral. Furthermore, the paper shows that the assumptions of Condorcet’s Jury Theorem do not hold, suggesting that the fine-tuned small models are not independent of the fine-tuned GPT models, indicating behavioural similarities. Lastly, the resulting fine-tuned models are made publicly available on HuggingFace, providing a resource for further research in financial sentiment analysis and text classification. ...
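
The Condorcet's Jury Theorem argument is worth making concrete: if n classifiers were independent and each correct with probability p > 0.5, a majority vote would beat any single one. The sketch below computes that binomial majority probability; the paper's finding is that the independence premise fails for these fine-tuned models, so the bound need not hold.

```python
from math import comb

def majority_correct_prob(n, p):
    """Condorcet's Jury Theorem quantity: probability that a strict majority
    of n independent voters, each correct with probability p, is correct.
    Assumes odd n so ties cannot occur."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Five independent classifiers at 60% accuracy: the ensemble reaches ~68%.
ensemble = majority_correct_prob(5, 0.6)
```

When the voters are correlated, as the paper observes for the fine-tuned small models and GPT models, the effective number of independent votes shrinks and the ensemble gain over 0.6 can vanish.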

August 22, 2024 · 2 min · Research Team

Financial sentiment analysis using FinBERT with application in predicting stock movement

Financial sentiment analysis using FinBERT with application in predicting stock movement ArXiv ID: 2306.02136 “View on arXiv” Authors: Unknown Abstract In this study, we integrate sentiment analysis within a financial framework by leveraging FinBERT, a fine-tuned BERT model specialized for financial text, to construct an advanced deep learning model based on Long Short-Term Memory (LSTM) networks. Our objective is to forecast financial market trends with greater accuracy. To evaluate our model’s predictive capabilities, we apply it to a comprehensive dataset of stock market news and perform a comparative analysis against standard BERT, standalone LSTM, and the traditional ARIMA models. Our findings indicate that incorporating sentiment analysis significantly enhances the model’s ability to anticipate market fluctuations. Furthermore, we propose a suite of optimization techniques aimed at refining the model’s performance, paving the way for more robust and reliable market prediction tools in the field of AI-driven finance. ...
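
One plausible way to wire FinBERT sentiment into an LSTM, consistent with the abstract's description, is to pair each timestep's price feature with that day's sentiment score so the network sees a (batch, window, features) tensor. The window length, the normalization, and the stand-in sentiment values below are all assumptions for illustration.

```python
import numpy as np

# Sketch of joint input construction for a sentiment-augmented LSTM.
# Each timestep carries a crude price return plus that day's FinBERT-style
# sentiment score; shapes and values are illustrative only.
window = 5
closes = np.array([101.0, 102.5, 101.8, 103.2, 104.0])
sentiment = np.array([0.3, 0.1, -0.2, 0.4, 0.5])  # stand-in FinBERT scores

returns = np.diff(closes, prepend=closes[0]) / closes[0]  # first-day return is 0
x = np.stack([returns, sentiment], axis=-1)               # (window, 2)
batch = x[np.newaxis, ...]                                # (1, window, 2)
```

An LSTM layer consuming this tensor would then produce a hidden state summarizing both the recent price path and the sentiment trajectory, which is the fusion the model family relies on.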

June 3, 2023 · 2 min · Research Team

FinBERT - A Large Language Model for Extracting Information from Financial Text

FinBERT - A Large Language Model for Extracting Information from Financial Text ArXiv ID: ssrn-3910214 “View on arXiv” Authors: Unknown Abstract We develop FinBERT, a state-of-the-art large language model that adapts to the finance domain. We show that FinBERT incorporates finance knowledge and can better ...

Keywords: FinBERT, Natural Language Processing, Large Language Models, Financial Text Analysis, Technology/AI

Complexity vs Empirical Score: Math Complexity 2.0/10 · Empirical Rigor 8.0/10 · Quadrant: Street Traders. Why: The paper focuses on fine-tuning a pre-existing transformer model (FinBERT) with specific financial datasets, which is primarily an empirical, implementation-heavy task with significant data preparation and evaluation metrics, while the underlying mathematics is standard deep learning rather than novel or dense derivations.

```mermaid
flowchart TD
    A["Research Goal:<br>Create domain-adapted LLM for finance"] --> B["Data:<br>Financial Documents & Corpora"]
    B --> C["Preprocessing:<br>Tokenization & Formatting"]
    C --> D["Core Methodology:<br>BERT Architecture Adaptation"]
    D --> E["Training:<br>Domain-specific Fine-tuning"]
    E --> F["Evaluation:<br>Benchmark Testing"]
    F --> G["Outcome:<br>FinBERT Model"]
    F --> H["Outcome:<br>Improved Performance vs. General LLMs"]
    G --> I["Final Result:<br>State-of-the-art Financial NLP"]
    H --> I
```

August 27, 2021 · 1 min · Research Team