Cyber risk and the cross-section of stock returns

Cyber risk and the cross-section of stock returns ArXiv ID: 2402.04775 “View on arXiv” Authors: Unknown Abstract We extract firms’ cyber risk with a machine learning algorithm measuring the proximity between their disclosures and a dedicated cyber corpus. Our approach outperforms dictionary methods, uses the full disclosure rather than only dedicated sections, and generates a cyber risk measure uncorrelated with other firm characteristics. We find that a portfolio of US-listed stocks in the high cyber risk quantile generates an excess return of 18.72% p.a. Moreover, a long-short cyber risk portfolio has a significant and positive risk premium of 6.93% p.a., robust to all factor benchmarks. Finally, using a Bayesian asset pricing method, we show that our cyber risk factor is the essential feature that allows any multi-factor model to price the cross-section of stock returns. ...
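A minimal sketch of the proximity idea, using TF-IDF vectors and cosine similarity against the centroid of a cyber corpus. The paper's actual embedding model and corpus are not reproduced here, so all texts and names below are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical cyber corpus and firm disclosures (placeholders).
cyber_corpus = [
    "data breach exposed customer records after a ransomware attack",
    "phishing campaign compromised credentials and network security",
]
disclosures = {
    "FIRM_A": "our systems may be subject to cyber attacks and data breaches",
    "FIRM_B": "raw material costs and supply chain delays affect our margins",
}

vec = TfidfVectorizer(stop_words="english")
corpus_mat = vec.fit_transform(cyber_corpus)
centroid = np.asarray(corpus_mat.mean(axis=0))  # centroid of the cyber corpus

# Cyber risk score = cosine proximity of each disclosure to the centroid.
for firm, text in disclosures.items():
    score = cosine_similarity(vec.transform([text]), centroid)[0, 0]
    print(firm, round(score, 3))
```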

February 7, 2024 · 2 min · Research Team

Cross-Domain Behavioral Credit Modeling: transferability from private to central data

Cross-Domain Behavioral Credit Modeling: transferability from private to central data ArXiv ID: 2401.09778 “View on arXiv” Authors: Unknown Abstract This paper introduces a credit risk rating model for credit risk assessment in quantitative finance, aiming to categorize borrowers based on their behavioral data. The model is trained on data from Experian, a widely recognized credit bureau, to effectively identify instances of loan defaults among bank customers. It employs state-of-the-art statistical and machine learning techniques to ensure predictive accuracy. Furthermore, we assess the model’s transferability by testing it on behavioral data from the Bank of Italy, demonstrating its potential applicability to diverse datasets at prediction time. This study highlights the benefits of incorporating external behavioral data to improve credit risk assessment in financial institutions. ...
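A hedged sketch of the transfer test: fit a classifier on one institution's behavioural data and evaluate it, unchanged, on another's. The file names and the "default" label column are assumptions, not the paper's actual schema.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical files; the real Experian / Bank of Italy datasets are not public.
train = pd.read_csv("experian_behavioural.csv")
transfer = pd.read_csv("bank_of_italy_behavioural.csv")

features = [c for c in train.columns if c != "default"]
model = GradientBoostingClassifier().fit(train[features], train["default"])

# Transferability check: score the target domain without any refitting.
auc = roc_auc_score(transfer["default"], model.predict_proba(transfer[features])[:, 1])
print("transfer AUC:", round(auc, 3))
```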

January 18, 2024 · 2 min · Research Team

Leveraging Sample Entropy for Enhanced Volatility Measurement and Prediction in International Oil Price Returns

Leveraging Sample Entropy for Enhanced Volatility Measurement and Prediction in International Oil Price Returns ArXiv ID: 2312.12788 “View on arXiv” Authors: Unknown Abstract This paper explores the application of Sample Entropy (SampEn) as a sophisticated tool for quantifying and predicting volatility in international oil price returns. SampEn, known for its ability to capture underlying patterns and predict periods of heightened volatility, is compared with traditional measures like standard deviation. The study utilizes a comprehensive dataset spanning 27 years (1986-2023) and employs both time series regression and machine learning methods. Results indicate SampEn’s efficacy in predicting traditional volatility measures, with machine learning algorithms outperforming standard regression techniques during financial crises. The findings underscore SampEn’s potential as a valuable tool for risk assessment and decision-making in the realm of oil price investments. ...
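Sample entropy has a standard definition, so a compact implementation can be sketched directly: SampEn(m, r) = -ln(A/B), where B counts template pairs of length m within Chebyshev tolerance r (self-matches excluded) and A counts the same for length m+1. Applying it as in the paper would mean passing a series of oil price returns for x.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) of a 1-D series; r defaults to the common 0.2 * std."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()
    n = len(x)

    def pair_count(length):
        # use n - m templates for both lengths so the A/B ratio is comparable
        t = np.array([x[i:i + length] for i in range(n - m)])
        c = 0
        for i in range(len(t) - 1):
            # Chebyshev distance between template i and all later templates
            c += np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) <= r)
        return c

    b, a = pair_count(m), pair_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```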

December 20, 2023 · 2 min · Research Team

Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing ArXiv ID: 2309.04557 “View on arXiv” Authors: Unknown Abstract We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets $\mathcal{D}_1,\dots,\mathcal{D}_N$ for the same learning model $f_\theta$. Our objective is to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $\theta^\star_1,\ldots,\theta^\star_N$ obtained for each dataset, while respecting the loss function for the model $f_{\theta(T)}$ produced by the algorithm upon halting. We only allow for continual communication between each of the specialized models (nodes/agents) and the central planner (server) at each iteration (round). For the case where the model $f_\theta$ is a finite-rank kernel regression, we derive explicit updates for the regret-optimal algorithm. By leveraging symmetries within the regret-optimal algorithm, we further develop a nearly regret-optimal heuristic that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space. Additionally, we investigate the adversarial robustness of the regret-optimal algorithm, showing that an adversary which perturbs $q$ training pairs by at most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm’s regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. To validate our theoretical findings, we conduct numerical experiments in the context of American option pricing, utilizing a randomly generated finite-rank kernel. ...
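The communication pattern can be illustrated with a toy baseline: at each round every node takes a local step toward its specialized parameters $\theta^\star_i$ and the server averages the results. This is plain federated averaging, not the paper's regret-optimal update, and all sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, T = 5, 4, 50                    # nodes, parameter dimension, rounds
theta_star = rng.normal(size=(N, p))  # specialized parameters per node (toy)
theta = np.zeros(p)                   # server parameter theta(0)

for t in range(T):
    # each node pulls the broadcast parameter toward its own optimum ...
    local = theta + 0.5 * (theta_star - theta)   # shape (N, p)
    # ... and the server aggregates the N local updates
    theta = local.mean(axis=0)

print("deviation from each theta_i*:", np.linalg.norm(theta - theta_star, axis=1))
```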

September 8, 2023 · 2 min · Research Team

GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models

GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models ArXiv ID: 2309.03079 “View on arXiv” Authors: Unknown Abstract Annual reports of publicly listed companies contain vital information about their financial health, which can help assess the potential impact on the firm’s stock price. These reports are comprehensive, often running to, and sometimes exceeding, 100 pages. Analysing these reports is cumbersome even for a single firm, let alone the whole universe of firms that exist. Over the years, financial experts have become proficient in extracting valuable information from these documents relatively quickly. However, this requires years of practice and experience. This paper aims to simplify the process of assessing annual reports of all firms by leveraging the capabilities of Large Language Models (LLMs). The insights generated by the LLM are compiled in a quant-style dataset and augmented with historical stock price data. A machine learning model is then trained with the LLM outputs as features. The walk-forward test results show promising outperformance with respect to S&P 500 returns. This paper intends to provide a framework for future work in this direction. To facilitate this, the code has been released as open source. ...
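A sketch of the pipeline shape: LLM-derived scores per filing become features for a downstream model evaluated walk-forward, training only on years strictly before each test year. Column and file names are placeholders; the paper's released code is the authoritative reference.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical inputs: LLM scores per (firm, year) and forward stock returns.
feats = pd.read_csv("llm_scores.csv")       # firm, year, plus numeric LLM features
rets = pd.read_csv("forward_returns.csv")   # firm, year, fwd_return

data = feats.merge(rets, on=["firm", "year"])
X = data.drop(columns=["firm", "year", "fwd_return"])
y = data["fwd_return"]

# Walk-forward: for each test year, train only on strictly earlier years.
for year in sorted(data["year"].unique())[1:]:
    tr, te = data["year"] < year, data["year"] == year
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[tr], y[tr])
    pred = pd.Series(model.predict(X[te]), index=y[te].index)
    print(year, "rank corr:", round(pred.corr(y[te], method="spearman"), 3))
```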

September 6, 2023 · 2 min · Research Team

Improving the Accuracy of Transaction-Based Ponzi Detection on Ethereum

Improving the Accuracy of Transaction-Based Ponzi Detection on Ethereum ArXiv ID: 2308.16391 “View on arXiv” Authors: Unknown Abstract The Ponzi scheme, an old-fashioned fraud, is now popular on the Ethereum blockchain, causing considerable financial losses to many crypto investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code. This contract-code-based approach, while achieving very high accuracy, is not robust because a Ponzi developer can fool a detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected. In contrast, a transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to manipulate. However, current transaction-based detection models achieve fairly low accuracy. In this paper, we aim to improve the accuracy of transaction-based models by employing time-series features, which turn out to be crucial in capturing the lifetime behaviour of a Ponzi application but were completely overlooked in previous works. We propose a new set of 85 features (22 known account-based and 63 new time-series features), which allows off-the-shelf machine learning algorithms to achieve up to 30% higher F1-scores compared to existing works. ...
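The kind of time-series feature the abstract refers to can be sketched from a contract's transaction log; the schema (timestamp, value) and the four features below are illustrative, not the paper's 63.

```python
import pandas as pd

def time_series_features(tx: pd.DataFrame) -> dict:
    """Toy lifetime features from a contract's transactions (timestamp, value)."""
    tx = tx.sort_values("timestamp")
    gaps = tx["timestamp"].diff().dt.total_seconds().dropna()
    daily = tx.set_index("timestamp")["value"].resample("D").sum()
    # linear trend proxy: correlation of daily volume with the day index
    trend = daily.reset_index(drop=True).corr(pd.Series(range(len(daily)), dtype=float))
    return {
        "lifetime_days": (tx["timestamp"].iloc[-1] - tx["timestamp"].iloc[0]).days,
        "mean_gap_seconds": gaps.mean(),
        "gap_std_seconds": gaps.std(),
        "daily_volume_trend": trend,  # decaying inflows are a typical Ponzi tell
    }
```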

August 31, 2023 · 2 min · Research Team

Retail Demand Forecasting: A Comparative Study for Multivariate Time Series

Retail Demand Forecasting: A Comparative Study for Multivariate Time Series ArXiv ID: 2308.11939 “View on arXiv” Authors: Unknown Abstract Accurate demand forecasting in the retail industry is a critical determinant of financial performance and supply chain efficiency. As global markets become increasingly interconnected, businesses are turning towards advanced prediction models to gain a competitive edge. However, existing literature mostly focuses on historical sales data and ignores the vital influence of macroeconomic conditions on consumer spending behavior. In this study, we bridge this gap by enriching time series data of customer demand with macroeconomic variables, such as the Consumer Price Index (CPI), Index of Consumer Sentiment (ICS), and unemployment rates. Leveraging this comprehensive dataset, we develop and compare various regression and machine learning models to predict retail demand accurately. ...
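A minimal sketch of the enrichment step: join the demand series with macro covariates (CPI, ICS, unemployment) before fitting any model, then evaluate on a chronological split. All file and column names are assumed.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error

sales = pd.read_csv("weekly_demand.csv", parse_dates=["week"])  # week, demand
macro = pd.read_csv("macro.csv", parse_dates=["week"])          # week, cpi, ics, unemployment

df = sales.merge(macro, on="week").sort_values("week")
df["demand_lag1"] = df["demand"].shift(1)   # one autoregressive feature
df = df.dropna()

X, y = df[["demand_lag1", "cpi", "ics", "unemployment"]], df["demand"]
split = int(len(df) * 0.8)                  # chronological split, no shuffling
model = RandomForestRegressor(random_state=0).fit(X[:split], y[:split])
print("MAPE:", mean_absolute_percentage_error(y[split:], model.predict(X[split:])))
```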

August 23, 2023 · 2 min · Research Team

American options in time-dependent one-factor models: Semi-analytic pricing, numerical methods and ML support

American options in time-dependent one-factor models: Semi-analytic pricing, numerical methods and ML support ArXiv ID: 2307.13870 “View on arXiv” Authors: Unknown Abstract Semi-analytical pricing of American options in a time-dependent Ornstein-Uhlenbeck model was presented in [Carr, Itkin, 2020]. It was shown that to obtain these prices one needs to solve (numerically) a nonlinear Volterra integral equation of the second kind to find the exercise boundary (which is a function of time only). Once this is done, the option prices follow. It was also shown that computationally this method is as efficient as the forward finite difference solver while providing better accuracy and stability. Later, this approach, called the “Generalized Integral Transform” method, was significantly extended by the authors (also in cooperation with Peter Carr and Alex Lipton) to various time-dependent one-factor and stochastic volatility models, as applied to pricing barrier options. However, for American options, although possible, this was not explicitly reported anywhere. In this paper our goal is to fill this gap and also discuss which numerical methods (including machine learning ones) could be efficient for solving the corresponding Volterra integral equations. ...
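As a generic illustration of the numerical core, here is a trapezoidal forward-marching solver for a linear Volterra equation of the second kind, $f(t) = g(t) + \int_0^t K(t,s)\,f(s)\,ds$. The paper's exercise-boundary equation is nonlinear and model-specific, so this is only the quadrature skeleton.

```python
import numpy as np

def solve_volterra(g, K, t):
    """March f(t_i) = g(t_i) + int_0^{t_i} K(t_i, s) f(s) ds on a uniform grid t."""
    h = t[1] - t[0]
    f = np.empty_like(t)
    f[0] = g(t[0])
    for i in range(1, len(t)):
        # trapezoid over the already-known values s = t_0 .. t_{i-1}
        acc = 0.5 * K(t[i], t[0]) * f[0]
        acc += sum(K(t[i], t[j]) * f[j] for j in range(1, i))
        # the implicit 0.5*h*K(t_i,t_i)*f_i term is moved to the left-hand side
        f[i] = (g(t[i]) + h * acc) / (1.0 - 0.5 * h * K(t[i], t[i]))
    return f

# Toy check: f(t) = exp(t) solves f(t) = 1 + int_0^t f(s) ds.
t = np.linspace(0.0, 1.0, 201)
f = solve_volterra(lambda s: 1.0, lambda ti, s: 1.0, t)
print("max error vs exp(t):", np.max(np.abs(f - np.exp(t))))
```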

July 26, 2023 · 2 min · Research Team

Comparative Analysis of Machine Learning, Hybrid, and Deep Learning Forecasting Models Evidence from European Financial Markets and Bitcoins

Comparative Analysis of Machine Learning, Hybrid, and Deep Learning Forecasting Models Evidence from European Financial Markets and Bitcoins ArXiv ID: 2307.08853 “View on arXiv” Authors: Unknown Abstract This study analyzes the transmission of market uncertainty to key European financial markets and the cryptocurrency market over an extended period, encompassing the pre-, during-, and post-pandemic periods. Daily financial market indices and price observations are used to assess the forecasting models. We compare statistical, machine learning, and deep learning forecasting models, namely ARIMA, hybrid ETS-ANN, and kNN, to evaluate the financial markets. The study results indicate that predicting financial market fluctuations is challenging, and the accuracy levels are generally low in several instances. ARIMA and hybrid ETS-ANN models perform better over extended periods compared to the kNN model, with ARIMA being the best-performing model in 2018-2021 and the hybrid ETS-ANN model being the best-performing model in most of the other subperiods. Still, the kNN model outperforms the others in several periods, depending on the observed accuracy measure. Researchers have advocated combining parametric and non-parametric modeling to generate better results. In this study, the results suggest that the hybrid ETS-ANN model is the best-performing model despite its moderate level of accuracy. Thus, the hybrid ETS-ANN model is a promising financial time series forecasting approach. The findings offer financial analysts an additional source of valuable insights for investment decisions. ...
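A hedged sketch of the comparison protocol with two of the three models (ARIMA and a lagged-window kNN); the hybrid ETS-ANN member is omitted for brevity, and the data file is a placeholder.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from statsmodels.tsa.arima.model import ARIMA

y = np.loadtxt("index_series.csv")        # hypothetical daily index values
train, test = y[:-100], y[-100:]

# ARIMA: fit once, forecast the whole out-of-sample horizon.
arima_fc = ARIMA(train, order=(1, 1, 1)).fit().forecast(steps=len(test))

# kNN: regress the next value on the previous m observations, rolling origin.
m = 5
Xtr = np.array([train[i:i + m] for i in range(len(train) - m)])
knn = KNeighborsRegressor(n_neighbors=10).fit(Xtr, train[m:])
hist = np.concatenate([train, test])
Xte = np.array([hist[len(train) + i - m:len(train) + i] for i in range(len(test))])
knn_fc = knn.predict(Xte)

for name, fc in [("ARIMA", arima_fc), ("kNN", knn_fc)]:
    print(name, "RMSE:", np.sqrt(np.mean((fc - test) ** 2)))
```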

July 17, 2023 · 2 min · Research Team

Mapping Global Value Chains at the Product Level

Mapping Global Value Chains at the Product Level ArXiv ID: 2308.02491 “View on arXiv” Authors: Unknown Abstract Value chain data is crucial to navigate economic disruptions, such as those caused by the COVID-19 pandemic and the war in Ukraine. Yet, despite its importance, publicly available value chain datasets, such as the “World Input-Output Database”, “Inter-Country Input-Output Tables”, “EXIOBASE”, or the “EORA”, lack detailed information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and rely instead on more aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method based on machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 300+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) and 1200+ products to infer value chain information implicit in their trade patterns. Furthermore, we use proportional allocation to assign the trade flow between regions and countries. This work provides an approximate method to map value chain data at the product level with the associated trade flows, which should be of interest to people working in logistics, trade, and sustainable development. ...
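The proportional-allocation step the abstract mentions can be sketched directly: an inferred product-to-product link is split across regions in proportion to their share of trade in the input product. All numbers and names below are toy values.

```python
import pandas as pd

# Toy inferred value chain link: capacitors feed into radio receivers.
vc = pd.DataFrame({"input": ["capacitor"], "output": ["radio"], "intensity": [0.8]})

# Toy bilateral exports of the input product by region.
trade = pd.DataFrame({
    "exporter": ["JP", "KR"],
    "product": ["capacitor", "capacitor"],
    "value": [300.0, 100.0],
})

# Each exporter receives a share of the link proportional to its export value.
trade["share"] = trade["value"] / trade.groupby("product")["value"].transform("sum")
alloc = vc.merge(trade, left_on="input", right_on="product")
alloc["flow"] = alloc["intensity"] * alloc["share"]
print(alloc[["exporter", "input", "output", "flow"]])  # JP: 0.6, KR: 0.2
```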

June 12, 2023 · 2 min · Research Team