Improving the Accuracy of Transaction-Based Ponzi Detection on Ethereum
ArXiv ID: 2308.16391
Authors: Unknown
Abstract
The Ponzi scheme, an old-fashioned fraud, is now popular on the Ethereum blockchain, causing considerable financial losses to many crypto investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code. This contract-code-based approach, while achieving very high accuracy, is not robust because a Ponzi developer can fool a detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected. In contrast, a transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to manipulate. However, current transaction-based detection models achieve fairly low accuracy. In this paper, we aim to improve the accuracy of transaction-based models by employing time-series features, which turn out to be crucial in capturing the lifetime behaviour of a Ponzi application but were completely overlooked in previous works. We propose a new set of 85 features (22 known account-based and 63 new time-series features), which allows off-the-shelf machine learning algorithms to achieve up to 30% higher F1-scores compared to existing works.
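To make the idea of time-series features concrete, the following is a minimal sketch of how a contract's transaction history might be turned into a daily series and then summarised. It is not the paper's feature set: the input format (`(timestamp, amount, direction)` tuples), the function names `daily_series` and `time_series_features`, and the eight statistics computed are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

SECONDS_PER_DAY = 86_400


def daily_series(transactions):
    """Aggregate transactions into per-day (inflow, outflow) totals.

    Each transaction is a hypothetical (timestamp, amount, direction) tuple,
    where direction is "in" (paid to the contract) or "out" (paid by it).
    Days without activity are filled with zeros so the series is contiguous.
    """
    if not transactions:
        return []
    buckets = defaultdict(lambda: [0.0, 0.0])
    for timestamp, amount, direction in transactions:
        day = timestamp // SECONDS_PER_DAY
        buckets[day][0 if direction == "in" else 1] += amount
    first, last = min(buckets), max(buckets)
    return [
        tuple(buckets[day]) if day in buckets else (0.0, 0.0)
        for day in range(first, last + 1)
    ]


def time_series_features(transactions):
    """Summarise the daily series with a few illustrative statistics.

    These are example features only, not the 63 features from the paper.
    """
    series = daily_series(transactions)
    if not series:
        return {}
    inflow = [i for i, _ in series]
    outflow = [o for _, o in series]

    # Running contract balance implied by the daily net flow.
    balance, running = [], 0.0
    for i, o in series:
        running += i - o
        balance.append(running)

    return {
        "lifetime_days": len(series),
        "inflow_mean": mean(inflow),
        "inflow_std": pstdev(inflow),
        "outflow_mean": mean(outflow),
        "outflow_std": pstdev(outflow),
        "balance_max": max(balance),
        "balance_final": balance[-1],
        "active_day_ratio": sum(1 for i, o in series if i or o) / len(series),
    }
```

A feature dictionary like this could be computed per contract and concatenated with account-based features before being passed to an off-the-shelf classifier. Unlike a single account-level aggregate, the daily series keeps information about how activity evolves over the application's lifetime.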
Keywords: Ponzi scheme, Ethereum, smart contract, transaction analysis, machine learning, cryptocurrency
Complexity vs Empirical Score
- Math Complexity: 3.0/10
- Empirical Rigor: 7.5/10
- Quadrant: Street Traders
- Why: The paper relies on standard machine learning algorithms and feature engineering rather than advanced mathematical derivations, but demonstrates high empirical rigor through extensive experimentation with real blockchain transaction data, feature selection, and comparison against multiple baselines.
```mermaid
flowchart TD
A["Research Goal:<br>Improve accuracy of<br>transaction-based Ponzi detection"] --> B{"Methodology"};
B --> C["Feature Engineering<br>85 total features<br>• 22 known account-based<br>• 63 new time-series"];
B --> D["Dataset<br>Ethereum Transactions"];
C --> E["Model Training<br>Off-the-shelf ML Algorithms"];
D --> E;
E --> F["Model Evaluation<br>F1-score metric"];
F --> G["Key Outcome<br>Up to 30% higher F1-scores<br>vs. existing works"];
```
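For reference, the F1-score used as the evaluation metric is the harmonic mean of precision and recall. The sketch below computes it from scratch for a binary Ponzi / non-Ponzi labelling. The function name `f1_score`, the label encoding (1 for Ponzi, 0 otherwise), and the toy labels in the usage example are assumptions for illustration, not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score of the positive class for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 Ponzi contracts (label 1) among 8; the model flags 3, 2 correctly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3
```

Because F1 ignores true negatives, it rewards a detector only for correctly flagging Ponzi contracts without raising too many false alarms.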