Deep Learning and NLP in Cryptocurrency Forecasting: Integrating Financial, Blockchain, and Social Media Data
ArXiv ID: 2311.14759
Authors: Unknown
Abstract
We introduce novel approaches to cryptocurrency price forecasting, leveraging Machine Learning (ML) and Natural Language Processing (NLP) techniques, with a focus on Bitcoin and Ethereum. By analysing news and social media content, primarily from Twitter and Reddit, we assess the impact of public sentiment on cryptocurrency markets. A distinctive feature of our methodology is the application of the BART MNLI zero-shot classification model to detect bullish and bearish trends, significantly advancing beyond traditional sentiment analysis. Additionally, we systematically compare a range of pre-trained and fine-tuned deep learning NLP models against conventional dictionary-based sentiment analysis methods. Another key contribution of our work is the adoption of local extrema alongside daily price movements as predictive targets, reducing trading frequency and portfolio volatility. Our findings demonstrate that integrating textual data into cryptocurrency price forecasting not only improves forecasting accuracy but also consistently enhances the profitability and Sharpe ratio across various validation scenarios, particularly when applying deep learning NLP techniques. The entire codebase of our experiments is made available via an online repository: https://anonymous.4open.science/r/crypto-forecasting-public
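The abstract does not say exactly how the BART MNLI model is turned into a bullish/bearish detector. A common way to use an NLI model for zero-shot classification is to pair each post (the premise) with one hypothesis per candidate label and normalize the resulting entailment logits across labels. The minimal sketch below assumes that scheme. The label strings, the hypothesis template, the hard-coded logits standing in for real model output, and the daily bull-bear index built from them are hypothetical illustrations, not the authors' code. In Hugging Face `transformers`, `pipeline("zero-shot-classification", model="facebook/bart-large-mnli")` performs this normalization internally when `multi_label=False`.

```python
import math
from collections import defaultdict

# Hypothetical label strings and template; the paper's exact choices are not stated here.
CANDIDATE_LABELS = ["bullish", "bearish"]
HYPOTHESIS_TEMPLATE = "This example is {}."  # one hypothesis per label, e.g. "This example is bullish."


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(entailment_logits):
    """Single-label zero-shot scoring: softmax over each label's entailment logit.

    entailment_logits maps label -> the entailment logit an NLI model would assign
    to (post, HYPOTHESIS_TEMPLATE.format(label)). Here the logits are supplied by
    hand as stand-ins for real BART MNLI output.
    """
    labels = list(entailment_logits)
    probs = softmax([entailment_logits[label] for label in labels])
    return dict(zip(labels, probs))


def daily_bull_bear_index(scored_posts):
    """Hypothetical daily feature: mean of P(bullish) - P(bearish) over a day's posts.

    scored_posts is an iterable of (date_string, {label: probability}) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, scores in scored_posts:
        sums[day] += scores["bullish"] - scores["bearish"]
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}


# Toy usage with made-up logits for three posts over two days.
posts = [
    ("2023-01-01", zero_shot_scores({"bullish": 2.0, "bearish": 0.0})),
    ("2023-01-01", zero_shot_scores({"bullish": 0.0, "bearish": 0.0})),
    ("2023-01-02", zero_shot_scores({"bullish": -1.0, "bearish": 1.0})),
]
print(daily_bull_bear_index(posts))
```

A daily aggregate like this is one plausible way to align many posts with one price observation per day; other aggregations (counts, volume-weighted means) would work the same way.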
Keywords: Cryptocurrency, Bitcoin, Ethereum, Natural Language Processing (NLP), Sentiment Analysis
Complexity vs Empirical Score
- Math Complexity: 4.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Street Traders
- Why: The paper is highly empirical with a public codebase, extensive backtesting, and multiple datasets, while using advanced but established deep learning models rather than novel mathematical derivations.
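The abstract reports gains in profitability and Sharpe ratio across backtests. As a rough sketch of how such a comparison can be scored, the code below turns daily long/flat positions into net returns and computes an annualized Sharpe ratio. The proportional fee, the zero risk-free rate, the 365-day annualization, and the toy prices are assumptions for illustration. The fee term also shows why fewer position changes, as with the local-extrema targets, can matter for net performance.

```python
import math
import statistics


def strategy_returns(prices, positions, fee=0.001):
    """Daily net returns of a long/flat strategy.

    positions[t] is 1 (long) or 0 (flat) and is held from day t to day t+1,
    so len(positions) must equal len(prices) - 1. `fee` is a hypothetical
    proportional cost charged whenever the position changes.
    """
    if len(positions) != len(prices) - 1:
        raise ValueError("need one position per price interval")
    rets, prev = [], 0  # start from a flat book
    for t, pos in enumerate(positions):
        asset_ret = prices[t + 1] / prices[t] - 1.0
        cost = fee * abs(pos - prev)  # pay only when the position changes
        rets.append(pos * asset_ret - cost)
        prev = pos
    return rets


def annualized_sharpe(daily_returns, periods_per_year=365):
    """Mean over sample standard deviation of daily returns, scaled by sqrt(periods_per_year).

    Assumes a zero risk-free rate; 365 reflects round-the-clock crypto trading.
    Returns NaN when the returns have zero variance.
    """
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(daily_returns) / sd * math.sqrt(periods_per_year)


# Toy usage: positions would come from a (possibly sentiment-aware) classifier.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0]
positions = [1, 1, 0, 1, 1]
rets = strategy_returns(prices, positions)
print(rets, annualized_sharpe(rets))
```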
```mermaid
flowchart TD
A["Research Goal: Forecast Crypto Prices"] --> B["Data Acquisition & Preparation"]
B --> C["Sentiment Analysis with NLP"]
B --> D["Price Target Definition"]
C --> E["Model Training & Integration"]
D --> E
E --> F["Model Evaluation & Trading Simulation"]
F --> G{"Key Findings"}
G --> H["NLP Integration Improves Accuracy"]
G --> I["DL Models Outperform Baselines"]
G --> J["Targets Reduce Volatility & Boost Profitability"]
subgraph "Data Sources"
B1["Financial Data"]
B2["Blockchain Data"]
B3["Social Media Text"]
end
subgraph "Key Methodologies"
C1["BART MNLI Zero-Shot"]
C2["Fine-tuned DL Models"]
C3["Dictionary Methods"]
D1["Daily Price Moves"]
D2["Local Extrema"]
end
B --> B1
B --> B2
B --> B3
C --> C1
C --> C2
C --> C3
D --> D1
D --> D2
```
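The two target definitions in the diagram (daily price moves and local extrema) can be illustrated with a small labeling sketch. The paper's exact definition of local extrema is not given here, so this assumes a hypothetical centered window of `k` days on each side: a day is labeled a local minimum (+1) or maximum (-1) if its price is strictly below or above every other price in that window. Because the window looks ahead, these values are usable only as training labels, never as features. The daily target is simply whether the next close is higher. The helper names and toy prices are illustrative.

```python
def daily_direction_targets(prices):
    """Target for day t: 1 if the price rises from t to t+1, else 0 (last day has none)."""
    return [1 if prices[t + 1] > prices[t] else 0 for t in range(len(prices) - 1)]


def local_extrema_targets(prices, k=2):
    """Label each day +1 (local minimum), -1 (local maximum) or 0.

    Hypothetical rule: day t is an extremum if prices[t] is strictly below (or above)
    every other price in the centered window prices[t-k : t+k+1]. The first and last
    k days cannot be labeled. Uses future prices, so it is a label, not a feature.
    """
    n = len(prices)
    labels = [0] * n
    for t in range(k, n - k):
        window = prices[t - k : t + k + 1]
        others = window[:k] + window[k + 1 :]  # window without day t itself
        if prices[t] < min(others):
            labels[t] = 1
        elif prices[t] > max(others):
            labels[t] = -1
    return labels


def positions_from_extrema(labels):
    """Go long at a local minimum, go flat at a local maximum, otherwise hold."""
    pos, out = 0, []
    for lab in labels:
        if lab == 1:
            pos = 1
        elif lab == -1:
            pos = 0
        out.append(pos)
    return out


def count_trades(positions, start=0):
    """Number of position changes, starting from a flat book."""
    trades, prev = 0, start
    for p in positions:
        trades += p != prev
        prev = p
    return trades


prices = [10, 9, 8, 9, 11, 12, 11, 10, 9, 10]
labels = local_extrema_targets(prices, k=2)
extrema_pos = positions_from_extrema(labels)
daily = daily_direction_targets(prices)
print(labels, count_trades(extrema_pos), count_trades(daily))
```

On this toy series the extrema-based position changes twice while the daily direction labels flip three times, a small-scale version of the reduced trading frequency the abstract attributes to local-extrema targets.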