Random Forest

Can an unsupervised clustering algorithm reproduce a categorization system?

ArXiv ID: 2408.10340. Authors: Unknown. Abstract: Peer analysis is a critical component of investment management, often relying on expert-provided categorization systems. These systems’ consistency is questioned when they do not align with cohorts from unsupervised clustering algorithms optimized for various metrics. We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset, showing that success depends on feature selection and the chosen distance metric. Using toy datasets and fund categorization as real-world examples, we demonstrate that accurately reproducing ground truth classes is challenging. We also highlight the limitations of standard clustering evaluation metrics in identifying the optimal number of clusters relative to the ground truth classes. We then show that if appropriate features are available in the dataset, and a proper distance metric is known (e.g., using a supervised Random Forest-based distance metric learning method), then an unsupervised clustering can indeed reproduce the ground truth classes as distinct clusters. ...
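
The abstract does not spell out its Random Forest-based distance. A common construction, sketched here as an assumption rather than as the authors' method, treats two samples as close when they land in the same leaf in many trees: proximity is the fraction of shared leaves, and distance is one minus proximity. The leaf-index table `leaves` below is hypothetical toy data (in practice it could come from a fitted forest, for example scikit-learn's `RandomForestClassifier.apply`), and `agglomerate` is a deliberately simple average-linkage clusterer written for illustration.

```python
from itertools import combinations

def proximity_matrix(leaves):
    """leaves[i][t] is the leaf that sample i reaches in tree t.
    Proximity of two samples = fraction of trees where they share a leaf."""
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]  # diagonal stays 1.0
    for i, j in combinations(range(n), 2):
        shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox

def rf_distance(prox):
    """Turn proximities into distances: d = 1 - proximity."""
    return [[1.0 - p for p in row] for row in prox]

def agglomerate(dist, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters

# Hypothetical leaf indices: 6 samples, 4 trees.
leaves = [
    [1, 2, 1, 3],
    [1, 2, 1, 3],
    [1, 2, 2, 3],
    [4, 5, 6, 7],
    [4, 5, 6, 7],
    [4, 5, 6, 8],
]
prox = proximity_matrix(leaves)
clusters = agglomerate(rf_distance(prox), n_clusters=2)
```

Because the forest is trained on the labels, the resulting distance already encodes which features matter for the categorization, which is why an otherwise unsupervised clusterer can recover the classes.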

High-Frequency Trading Liquidity Analysis | Application of Machine Learning Classification

ArXiv ID: 2408.10016. Authors: Unknown. Abstract: This research presents a comprehensive framework for analyzing liquidity in financial markets, particularly in the context of high-frequency trading. By leveraging advanced machine learning classification techniques, including Logistic Regression, Support Vector Machine, and Random Forest, the study aims to predict minute-level price movements using an extensive set of liquidity metrics derived from the Trade and Quote (TAQ) data. The findings reveal that employing a broad spectrum of liquidity measures yields higher predictive accuracy compared to models utilizing a reduced subset of features. Key liquidity metrics, such as Liquidity Ratio, Flow Ratio, and Turnover, consistently emerged as significant predictors across all models, with the Random Forest algorithm demonstrating superior accuracy. This study not only underscores the critical role of liquidity in market stability and transaction costs but also highlights the complexities involved in short-interval market predictions. The research suggests that a comprehensive set of liquidity measures is essential for accurate prediction, and proposes future work to validate these findings across different stock datasets to assess their generalizability. ...

A Comprehensive Analysis of Machine Learning Models for Algorithmic Trading of Bitcoin

ArXiv ID: 2407.18334. Authors: Unknown. Abstract: This study evaluates the performance of 41 machine learning models, including 21 classifiers and 20 regressors, in predicting Bitcoin prices for algorithmic trading. By examining these models under various market conditions, we highlight their accuracy, robustness, and adaptability to the volatile cryptocurrency market. Our comprehensive analysis reveals the strengths and limitations of each model, providing critical insights for developing effective trading strategies. We employ both machine learning metrics (e.g., Mean Absolute Error, Root Mean Squared Error) and trading metrics (e.g., Profit and Loss percentage, Sharpe Ratio) to assess model performance. Our evaluation includes backtesting on historical data, forward testing on recent unseen data, and real-world trading scenarios, ensuring the robustness and practical applicability of our models. Key findings demonstrate that certain models, such as Random Forest and Stochastic Gradient Descent, outperform others in terms of profit and risk management. These insights offer valuable guidance for traders and researchers aiming to leverage machine learning for cryptocurrency trading. ...
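
The four metrics named in the abstract have standard textbook definitions, sketched below. The function names, the zero risk-free rate, and the 365-period annualization (daily returns on a market that trades every day) are assumptions for illustration; the paper's exact conventions are not given in the abstract.

```python
import math
import statistics

def mae(y_true, y_pred):
    """Mean Absolute Error between actual and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalizes large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pnl_percent(equity):
    """Profit and Loss as a percentage of starting equity."""
    return 100.0 * (equity[-1] - equity[0]) / equity[0]

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period returns.
    Uses the sample standard deviation, so it needs at least two returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)
```

The two families answer different questions: a model with a low RMSE can still lose money if its errors cluster around turning points, which is one reason to report trading metrics alongside forecast errors.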

The Effect of Data Types' on the Performance of Machine Learning Algorithms for Financial Prediction

ArXiv ID: 2404.19324. Authors: Unknown. Abstract: Forecasting cryptocurrencies as a financial issue is crucial as it provides investors with possible financial benefits. A small improvement in forecasting performance can lead to increased profitability; therefore, obtaining a realistic forecast is very important for investors. Successful forecasting provides traders with effective buy-or-hold strategies, allowing them to make more profits. The most important thing in this process is to produce accurate forecasts suitable for real-life applications. Bitcoin, frequently mentioned recently due to its volatility and chaotic behavior, has begun to attract great attention and has become an investment tool, especially during and after the COVID-19 pandemic. This study provides a comprehensive methodology, including constructing continuous and trend data using one- and seven-year periods of data as inputs and applying machine learning (ML) algorithms to forecast Bitcoin price movement. A binarization procedure was applied to the continuous data to construct the trend data representing the trend of each input feature. Following the related literature, the input features are technical indicators, Google Trends, and the number of tweets. Random Forest (RF), K-Nearest Neighbor (KNN), Extreme Gradient Boosting (XGBoost, XGB), Support Vector Machine (SVM), Naive Bayes (NB), Artificial Neural Networks (ANN), and Long Short-Term Memory (LSTM) networks were applied to the selected features for prediction purposes. This work investigates two main research questions: (i) How does the sample size affect the prediction performance of ML algorithms? (ii) How does the data type affect the prediction performance of ML algorithms? Accuracy and area under the ROC curve (AUC) values were used to compare model performance. A t-test was performed to test the statistical significance of the prediction results. ...
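
The abstract does not give the binarization rule. One simple possibility, shown here purely as an assumed illustration, marks each observation 1 when the feature rose relative to the previous period and 0 otherwise. The function name `to_trend` and the sample values are hypothetical.

```python
def to_trend(series):
    """Binarize a continuous series into up/not-up moves.
    Returns one fewer element than the input, since the first
    observation has no predecessor to compare against."""
    return [1 if cur > prev else 0 for prev, cur in zip(series, series[1:])]

# Hypothetical continuous features, one list per input feature.
continuous = {
    "rsi": [48.0, 52.5, 52.5, 47.1, 55.3],
    "tweet_count": [1200, 1150, 1400, 1600, 1580],
}
trend = {name: to_trend(values) for name, values in continuous.items()}
```

Under this rule a flat period counts as 0, and every feature loses its first observation, so the target series would need the same one-step trim to stay aligned.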

The Random Forest Model for Analyzing and Forecasting the US Stock Market in the Context of Smart Finance

ArXiv ID: 2402.17194. Authors: Unknown. Abstract: The stock market is a crucial component of the financial market, playing a vital role in wealth accumulation for investors, financing costs for listed companies, and the stable development of the national macroeconomy. Significant fluctuations in the stock market can damage the interests of stock investors and cause an imbalance in the industrial structure, which can interfere with the macro-level development of the national economy. The prediction of stock price trends is a popular research topic in academia. Predicting the three trends of stock prices (rising, sideways, and falling) can assist investors in making informed decisions about buying, holding, or selling stocks. Establishing an effective forecasting model for predicting these trends is of substantial practical importance. This paper evaluates the predictive performance of random forest models combined with artificial intelligence on a test set of four stocks using optimal parameters. The evaluation considers both predictive accuracy and time efficiency. ...

CRISIS ALERT: Forecasting Stock Market Crisis Events Using Machine Learning Methods

ArXiv ID: 2401.06172. Authors: Unknown. Abstract: Historically, economic recessions have often come abruptly and disastrously. For instance, during the 2008 financial crisis, the S&P 500 fell 46 percent from October 2007 to March 2009. If we could detect the signals of a crisis earlier, we could take preventive measures. Driven by this motivation, we use advanced machine learning techniques, including Random Forest and Extreme Gradient Boosting, to predict potential market crashes, mainly in the US market. We also compare the performance of these methods and examine which model is better for forecasting US stock market crashes. We apply our models to daily financial market data, which tend to be more responsive with higher reporting frequencies. We consider 75 explanatory variables, including general US stock market indexes, S&P 500 sector indexes, as well as market indicators that can be used for the purpose of crisis prediction. Finally, we conclude, with selected classification metrics, that the Extreme Gradient Boosting method performs the best in predicting US stock market crisis events. ...

Maximizing Portfolio Predictability with Machine Learning

ArXiv ID: 2311.01985. Authors: Unknown. Abstract: We construct the maximally predictable portfolio (MPP) of stocks using machine learning. Solving for the optimal constrained weights in the multi-asset MPP gives portfolios with a high monthly coefficient of determination, given the sample covariance matrix of predicted return errors from a machine learning model. Various models for the covariance matrix are tested. The MPPs of S&P 500 index constituents with estimated returns from Elastic Net, Random Forest, and Support Vector Regression models can outperform or underperform the index depending on the time period. Portfolios that take advantage of the high predictability of the MPP’s returns and employ a Kelly criterion style strategy consistently outperform the benchmark. ...
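
The objective behind the abstract can be written compactly in the style of the classical maximally predictable portfolio. The notation below is an assumption introduced for illustration and is not taken from the paper: $r_t$ is the vector of asset returns, $\hat{r}_t$ the model's prediction, $\varepsilon_t = r_t - \hat{r}_t$ the prediction error, and $\Sigma_r$, $\Sigma_\varepsilon$ their sample covariance matrices.

```latex
% Coefficient of determination of the portfolio return w^\top r_t
R^2(w) \;=\; 1 - \frac{\operatorname{Var}(w^\top \varepsilon_t)}{\operatorname{Var}(w^\top r_t)}
       \;=\; 1 - \frac{w^\top \Sigma_\varepsilon\, w}{w^\top \Sigma_r\, w}

% Maximally predictable portfolio under a constraint set \mathcal{W}
% (e.g. weights summing to one, possibly with bounds)
w^\ast \;=\; \arg\max_{w \in \mathcal{W}} R^2(w)
```

With no constraints beyond a normalization, maximizing this ratio reduces to a generalized eigenvalue problem, $\Sigma_\varepsilon w = \lambda\, \Sigma_r w$, solved by the eigenvector with the smallest $\lambda$. The constrained weights mentioned in the abstract would generally require a numerical optimizer instead.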

Stock Market Directional Bias Prediction Using ML Algorithms

ArXiv ID: 2310.16855. Authors: Unknown. Abstract: The stock market has been established since the 13th century, but in the current epoch of time, it is substantially more practicable to anticipate the stock market than it was at any other point in time due to the tools and data that are available for both traditional and algorithmic trading. There are many different machine learning models that can do time-series forecasting in the context of machine learning. These models can be used to anticipate the future prices of assets and/or the directional bias of assets. In this study, we examine and contrast the effectiveness of three different machine learning algorithms, namely, logistic regression, decision tree, and random forest to forecast the movement of the assets traded on the Japanese stock market. In addition, the models are compared to a feed forward deep neural network, and it is found that all of the models consistently reach above 50% in directional bias forecasting for the stock market. The results of our study contribute to a better understanding of the complexity involved in stock market forecasting and give insight on the possible role that machine learning could play in this context. ...

Quantifying Outlierness of Funds from their Categories using Supervised Similarity

ArXiv ID: 2308.06882. Authors: Unknown. Abstract: Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, an (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds utilizing a machine-learning-based approach. We formulate the problem of miscategorization of funds as a distance-based outlier detection problem, where the outliers are the data points that are far from the rest of the data points in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data point to identify outliers in the data. We test our implementation on various publicly available data sets, and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings. ...
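
The abstract does not define its class-wise outlier measure. The sketch below follows the spirit of the outlier score commonly associated with random-forest proximities: a sample is unusual for its class when its squared proximities to classmates are small, and raw scores are standardized within each class by the median and the median absolute deviation. Excluding a sample's proximity to itself, returning 0.0 when the deviation is zero, and the toy `prox` matrix and `labels` are all assumptions made for illustration.

```python
import statistics

def classwise_outlier_scores(prox, labels):
    """Outlier score of each sample relative to its own class.
    prox is a symmetric proximity matrix with values in [0, 1]."""
    n = len(labels)
    raw = []
    for i in range(n):
        # Sum of squared proximities to the other members of the same class.
        s = sum(prox[i][j] ** 2 for j in range(n)
                if j != i and labels[j] == labels[i])
        raw.append(n / s if s > 0 else float("inf"))
    scores = [0.0] * n
    for c in set(labels):
        idx = [i for i in range(n) if labels[i] == c]
        vals = [raw[i] for i in idx]
        med = statistics.median(vals)
        mad = statistics.median([abs(v - med) for v in vals])
        for i in idx:
            scores[i] = (raw[i] - med) / mad if mad > 0 else 0.0
    return scores

# Hypothetical proximities: sample 3 is labeled "A" but resembles class "B".
prox = [
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 1.0, 0.85, 0.1, 0.0, 0.0],
    [0.8, 0.85, 1.0, 0.2, 0.1, 0.0],
    [0.1, 0.1, 0.2, 1.0, 0.6, 0.7],
    [0.0, 0.0, 0.1, 0.6, 1.0, 0.9],
    [0.0, 0.0, 0.0, 0.7, 0.9, 1.0],
]
labels = ["A", "A", "A", "A", "B", "B"]
scores = classwise_outlier_scores(prox, labels)
```

In this toy setup the fourth sample receives by far the largest score, which is the pattern a miscategorized fund would be expected to show.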

Maximally Machine-Learnable Portfolios

ArXiv ID: 2306.05568. Authors: Unknown. Abstract: When it comes to stock returns, any form of predictability can bolster risk-adjusted profitability. We develop a collaborative machine learning algorithm that optimizes portfolio weights so that the resulting synthetic security is maximally predictable. Precisely, we introduce MACE, a multivariate extension of Alternating Conditional Expectations that achieves the aforementioned goal by wielding a Random Forest on one side of the equation, and a constrained Ridge Regression on the other. There are two key improvements with respect to Lo and MacKinlay’s original maximally predictable portfolio approach. First, it accommodates any (nonlinear) forecasting algorithm and predictor set. Second, it handles large portfolios. We conduct exercises at the daily and monthly frequency and report significant increases in predictability and profitability using very little conditioning information. Interestingly, predictability is found in bad as well as good times, and MACE successfully navigates the debacle of 2022. ...