false

Deep Learning and NLP in Cryptocurrency Forecasting: Integrating Financial, Blockchain, and Social Media Data

Deep Learning and NLP in Cryptocurrency Forecasting: Integrating Financial, Blockchain, and Social Media Data ArXiv ID: 2311.14759 “View on arXiv” Authors: Unknown Abstract We introduce novel approaches to cryptocurrency price forecasting, leveraging Machine Learning (ML) and Natural Language Processing (NLP) techniques, with a focus on Bitcoin and Ethereum. By analysing news and social media content, primarily from Twitter and Reddit, we assess the impact of public sentiment on cryptocurrency markets. A distinctive feature of our methodology is the application of the BART MNLI zero-shot classification model to detect bullish and bearish trends, significantly advancing beyond traditional sentiment analysis. Additionally, we systematically compare a range of pre-trained and fine-tuned deep learning NLP models against conventional dictionary-based sentiment analysis methods. Another key contribution of our work is the adoption of local extrema alongside daily price movements as predictive targets, reducing trading frequency and portfolio volatility. Our findings demonstrate that integrating textual data into cryptocurrency price forecasting not only improves forecasting accuracy but also consistently enhances the profitability and Sharpe ratio across various validation scenarios, particularly when applying deep learning NLP techniques. The entire codebase of our experiments is made available via an online repository: https://anonymous.4open.science/r/crypto-forecasting-public ...

November 23, 2023 · 2 min · Research Team

Natural Language Processing for Financial Regulation

Natural Language Processing for Financial Regulation ArXiv ID: 2311.08533 “View on arXiv” Authors: Unknown Abstract This article provides an understanding of Natural Language Processing techniques in the framework of financial regulation, more specifically in order to perform semantic matching search between rules and policy when no dataset is available for supervised learning. We outline how to outperform simple pre-trained sentences-transformer models using freely available resources and explain the mathematical concepts behind the key building blocks of Natural Language Processing. ...

November 14, 2023 · 1 min · Research Team

Multi-Label Topic Model for Financial Textual Data

Multi-Label Topic Model for Financial Textual Data ArXiv ID: 2311.07598 “View on arXiv” Authors: Unknown Abstract This paper presents a multi-label topic model for financial texts like ad-hoc announcements, 8-K filings, finance related news or annual reports. I train the model on a new financial multi-label database consisting of 3,044 German ad-hoc announcements that are labeled manually using 20 predefined, economically motivated topics. The best model achieves a macro F1 score of more than 85%. Translating the data results in an English version of the model with similar performance. As application of the model, I investigate differences in stock market reactions across topics. I find evidence for strong positive or negative market reactions for some topics, like announcements of new Large Scale Projects or Bankruptcy Filings, while I do not observe significant price effects for some other topics. Furthermore, in contrast to previous studies, the multi-label structure of the model allows to analyze the effects of co-occurring topics on stock market reactions. For many cases, the reaction to a specific topic depends heavily on the co-occurrence with other topics. For example, if allocated capital from a Seasoned Equity Offering (SEO) is used for restructuring a company in the course of a Bankruptcy Proceeding, the market reacts positively on average. However, if that capital is used for covering unexpected, additional costs from the development of new drugs, the SEO implies negative reactions on average. ...

November 10, 2023 · 2 min · Research Team

Multi-Industry Simplex : A Probabilistic Extension of GICS

Multi-Industry Simplex : A Probabilistic Extension of GICS ArXiv ID: 2310.04280 “View on arXiv” Authors: Unknown Abstract Accurate industry classification is a critical tool for many asset management applications. While the current industry gold-standard GICS (Global Industry Classification Standard) has proven to be reliable and robust in many settings, it has limitations that cannot be ignored. Fundamentally, GICS is a single-industry model, in which every firm is assigned to exactly one group - regardless of how diversified that firm may be. This approach breaks down for large conglomerates like Amazon, which have risk exposure spread out across multiple sectors. We attempt to overcome these limitations by developing MIS (Multi-Industry Simplex), a probabilistic model that can flexibly assign a firm to as many industries as can be supported by the data. In particular, we utilize topic modeling, an natural language processing approach that utilizes business descriptions to extract and identify corresponding industries. Each identified industry comes with a relevance probability, allowing for high interpretability and easy auditing, circumventing the black-box nature of alternative machine learning approaches. We describe this model in detail and provide two use-cases that are relevant to asset management - thematic portfolios and nearest neighbor identification. While our approach has limitations of its own, we demonstrate the viability of probabilistic industry classification and hope to inspire future research in this field. ...

October 6, 2023 · 2 min · Research Team

Predicting Financial Market Trends using Time Series Analysis and Natural Language Processing

Predicting Financial Market Trends using Time Series Analysis and Natural Language Processing ArXiv ID: 2309.00136 “View on arXiv” Authors: Unknown Abstract Forecasting financial market trends through time series analysis and natural language processing poses a complex and demanding undertaking, owing to the numerous variables that can influence stock prices. These variables encompass a spectrum of economic and political occurrences, as well as prevailing public attitudes. Recent research has indicated that the expression of public sentiments on social media platforms such as Twitter may have a noteworthy impact on the determination of stock prices. The objective of this study was to assess the viability of Twitter sentiments as a tool for predicting stock prices of major corporations such as Tesla, Apple. Our study has revealed a robust association between the emotions conveyed in tweets and fluctuations in stock prices. Our findings indicate that positivity, negativity, and subjectivity are the primary determinants of fluctuations in stock prices. The data was analyzed utilizing the Long-Short Term Memory neural network (LSTM) model, which is currently recognized as the leading methodology for predicting stock prices by incorporating Twitter sentiments and historical stock prices data. The models utilized in our study demonstrated a high degree of reliability and yielded precise outcomes for the designated corporations. In summary, this research emphasizes the significance of incorporating public opinions into the prediction of stock prices. The application of Time Series Analysis and Natural Language Processing methodologies can yield significant scientific findings regarding financial market patterns, thereby facilitating informed decision-making among investors. The results of our study indicate that the utilization of Twitter sentiments can serve as a potent instrument for forecasting stock prices, and ought to be factored in when formulating investment strategies. ...

August 31, 2023 · 2 min · Research Team

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey ArXiv ID: 2308.04947 “View on arXiv” Authors: Unknown Abstract Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain. ...

August 9, 2023 · 2 min · Research Team

Multimodal Document Analytics for Banking Process Automation

Multimodal Document Analytics for Banking Process Automation ArXiv ID: 2307.11845 “View on arXiv” Authors: Unknown Abstract Traditional banks face increasing competition from FinTechs in the rapidly evolving financial ecosystem. Raising operational efficiency is vital to address this challenge. Our study aims to improve the efficiency of document-intensive business processes in banking. To that end, we first review the landscape of business documents in the retail segment. Banking documents often contain text, layout, and visuals, suggesting that document analytics and process automation require more than plain natural language processing (NLP). To verify this and assess the incremental value of visual cues when processing business documents, we compare a recently proposed multimodal model called LayoutXLM to powerful text classifiers (e.g., BERT) and large language models (e.g., GPT) in a case study related to processing company register extracts. The results confirm that incorporating layout information in a model substantially increases its performance. Interestingly, we also observed that more than 75% of the best model performance (in terms of the F1 score) can be achieved with as little as 30% of the training data. This shows that the demand for data labeled data to set up a multi-modal model can be moderate, which simplifies real-world applications of multimodal document analytics. Our study also sheds light on more specific practices in the scope of calibrating a multimodal banking document classifier, including the need for fine-tuning. In sum, the paper contributes original empirical evidence on the effectiveness and efficiency of multi-model models for document processing in the banking business and offers practical guidance on how to unlock this potential in day-to-day operations. ...

July 21, 2023 · 2 min · Research Team

Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models

Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models ArXiv ID: 2306.12659 “View on arXiv” Authors: Unknown Abstract Sentiment analysis is a vital tool for uncovering insights from financial articles, news, and social media, shaping our understanding of market movements. Despite the impressive capabilities of large language models (LLMs) in financial natural language processing (NLP), they still struggle with accurately interpreting numerical values and grasping financial context, limiting their effectiveness in predicting financial sentiment. In this paper, we introduce a simple yet effective instruction tuning approach to address these issues. By transforming a small portion of supervised financial sentiment analysis data into instruction data and fine-tuning a general-purpose LLM with this method, we achieve remarkable advancements in financial sentiment analysis. In the experiment, our approach outperforms state-of-the-art supervised sentiment analysis models, as well as widely used LLMs like ChatGPT and LLaMAs, particularly in scenarios where numerical understanding and contextual comprehension are vital. ...

June 22, 2023 · 2 min · Research Team

Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis

Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis ArXiv ID: 2305.07972 “View on arXiv” Authors: Unknown Abstract Monetary policy pronouncements by Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under CC BY-NC 4.0 license. ...

May 13, 2023 · 2 min · Research Team

ChatGPT: Unlocking the Future of NLP inFinance

ChatGPT: Unlocking the Future of NLP inFinance ArXiv ID: ssrn-4323643 “View on arXiv” Authors: Unknown Abstract This paper reviews the current state of ChatGPT technology in finance and its potential to improve existing NLP-based financial applications. We discuss the eth Keywords: ChatGPT, Natural Language Processing (NLP), Financial Technology (FinTech), Machine Learning, Ethics in AI, General Finance Complexity vs Empirical Score Math Complexity: 1.5/10 Empirical Rigor: 1.0/10 Quadrant: Philosophers Why: This paper is a literature review discussing the capabilities and applications of ChatGPT in finance, featuring no mathematical derivations, formulas, or empirical backtesting. It focuses on conceptual discussion, ethical considerations, and future research directions, resulting in low scores for both math complexity and empirical rigor. flowchart TD A["Research Goal:<br/>Evaluate ChatGPT in Finance NLP"] --> B["Key Inputs:<br/>Financial Texts, NLP Benchmarks"] B --> C["Methodology:<br/>Review, Compare, Analyze Ethics"] C --> D{"Computational Process"} D --> E["Application:<br/>Sentiment/Forecasting Models"] D --> F["Constraint:<br/>Hallucinations/Data Privacy"] E & F --> G["Outcomes:<br/>Enhanced NLP Capabilities"] G --> H["Outcomes:<br/>Ethical & Bias Considerations"]

January 13, 2023 · 1 min · Research Team