BioFinBERT: Finetuning Large Language Models (LLMs) to Analyze Sentiment of Press Releases and Financial Text Around Inflection Points of Biotech Stocks
ArXiv ID: 2401.11011
Authors: Unknown
Abstract
Large language models (LLMs) are deep learning algorithms used to perform natural language processing tasks in fields ranging from the social sciences to finance and the biomedical sciences. Because developing and training a new LLM can be very computationally expensive, it is becoming common practice to take existing LLMs and finetune them on carefully curated datasets for specific applications. Here, we present BioFinBERT, a finetuned LLM that performs financial sentiment analysis of public text associated with the stocks of companies in the biotechnology sector. Stocks of biotech companies developing highly innovative and risky therapeutic drugs tend to rise sharply after a successful clinical readout or regulatory approval of their drug, and to fall sharply after a failed one. These clinical or regulatory results are disclosed by the companies via press releases, which in many cases are followed by a significant stock response. In our attempt to design an LLM capable of analyzing the sentiment of these press releases, we first finetuned BioBERT, a biomedical language representation model designed for biomedical text mining, using financial textual databases. Our finetuned model, termed BioFinBERT, was then used to perform financial sentiment analysis of various biotech-related press releases and financial text around inflection points that significantly affected the price of biotech stocks.
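A finetuned BERT-style sentiment classifier typically emits one vector of class logits per input sentence, while a press release contains many sentences. The sketch below shows one possible way to turn per-sentence logits into a single document-level score. It is not the paper's method: the three-class label order (`negative`, `neutral`, `positive`), the averaging rule, and the score P(positive) - P(negative) are all assumptions made for illustration.

```python
import math

# Assumed label order for a three-class financial sentiment head
# (hypothetical; the exact label set used by BioFinBERT is not stated here).
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def document_sentiment(sentence_logits):
    """Average per-sentence class probabilities for one document.

    Returns (mean probability per label, score), where the hypothetical
    score P(positive) - P(negative) lies in [-1, 1].
    """
    if not sentence_logits:
        raise ValueError("need at least one sentence")
    probs = [softmax(row) for row in sentence_logits]
    n = len(probs)
    mean = [sum(p[i] for p in probs) / n for i in range(len(LABELS))]
    score = mean[LABELS.index("positive")] - mean[LABELS.index("negative")]
    return dict(zip(LABELS, mean)), score


# Two made-up sentences whose logits lean strongly positive.
mean_probs, score = document_sentiment([[0.0, 0.0, 5.0], [0.0, 1.0, 4.0]])
print(mean_probs, score)
```

In practice the logits would come from the finetuned model (for a Hugging Face sequence-classification model, the `logits` field of the forward output); averaging probabilities rather than raw logits keeps each sentence's contribution bounded.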
Keywords: Large Language Models (LLMs), BioFinBERT, financial sentiment analysis, biotechnology sector, press release analysis, Equities (Biotechnology Sector)
Complexity vs Empirical Score
- Math Complexity: 4.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Street Traders
- Why: The paper uses established deep learning techniques (fine-tuning BERT models) with minimal novel mathematical derivations, but demonstrates strong empirical rigor through detailed data collection, a concrete backtested trading strategy, and code availability.
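The rationale above mentions a backtested trading strategy, but its rules are not described in this summary. The following is a toy event-driven backtest, offered only as an illustration of the general idea under stated assumptions: each press release carries a sentiment score in [-1, 1], the strategy goes long above a hypothetical `threshold` and short below its negative, trades are compounded sequentially, and costs, slippage, and borrow fees are ignored. The `Event` record, the tickers, and the returns are all made up.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One press-release event (hypothetical record layout)."""
    ticker: str
    sentiment: float    # assumed document score in [-1, 1]
    next_return: float  # simple return over the holding window after release


def backtest(events, threshold=0.5):
    """Toy strategy: long if sentiment > threshold, short if
    sentiment < -threshold, otherwise skip the event.

    Each trade uses the full current equity and is compounded in order;
    a short earns -next_return. Returns (total return, number of trades).
    """
    equity = 1.0
    trades = 0
    for ev in events:
        if ev.sentiment > threshold:
            position = 1
        elif ev.sentiment < -threshold:
            position = -1
        else:
            continue  # neutral-ish sentiment: stay flat
        equity *= 1.0 + position * ev.next_return
        trades += 1
    return equity - 1.0, trades


events = [
    Event("AAA", 0.9, 0.30),   # positive release, stock rises 30%
    Event("BBB", -0.8, -0.50), # negative release, stock falls 50%
    Event("CCC", 0.1, 0.40),   # below threshold, no trade
]
print(backtest(events))
```

A real evaluation would also need out-of-sample event dates, realistic execution assumptions, and a benchmark; the sketch only shows how a sentiment score could be mapped to positions.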
flowchart TD
A["Research Goal:<br/>Develop LLM for Sentiment Analysis<br/>of Biotech Press Releases"] --> B["Input Data:<br/>Financial Databases &<br/>Biotech Press Releases"]
B --> C["Methodology:<br/>Finetune BioBERT using<br/>Financial Text Databases"]
C --> D["Computational Process:<br/>Train BioFinBERT LLM<br/>on Financial Sentiment"]
D --> E["Outcome 1:<br/>BioFinBERT Model Validated"]
D --> F["Outcome 2:<br/>Analysis of Sentiment Around<br/>Biotech Inflection Points"]
E --> G["Key Finding:<br/>Effective LLM for Predicting<br/>Stock Response to Clinical Readouts"]
F --> G
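The abstract refers to "inflection points that significantly affected the price of biotech stocks" without giving a quantitative definition here. One simple, hypothetical way to flag such days is to mark any daily return that deviates from its trailing mean by more than `k` trailing standard deviations. The `window` and `k` defaults below are illustrative choices, not values from the paper.

```python
import statistics


def find_inflection_points(prices, window=20, k=3.0):
    """Return price indices t where the simple return from t-1 to t
    deviates from the mean of the previous `window` returns by more
    than k sample standard deviations (hypothetical definition).

    Prices must be positive; requires window >= 2.
    """
    returns = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    flagged = []
    for j in range(window, len(returns)):
        history = returns[j - window:j]
        mu = statistics.fmean(history)
        sigma = statistics.stdev(history)
        if sigma > 0 and abs(returns[j] - mu) > k * sigma:
            flagged.append(j + 1)  # returns[j] ends at price index j + 1
    return flagged


# Made-up series: small alternating moves, then a 50% one-day drop.
base = [100.0 if i % 2 == 0 else 101.0 for i in range(30)]
prices = base + [base[-1] * 0.5]
print(find_inflection_points(prices))
```

Press releases dated on or just before a flagged index could then be scored with the sentiment model; a trailing window is used so that each day is judged only against information available before it.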