Insights into Tail-Based and Order Statistics

ArXiv ID: 2511.04784. Authors: Hamidreza Maleki Almani. Abstract: Heavy-tailed phenomena appear across diverse domains, from wealth and firm sizes in economics to network traffic, biological systems, and physical processes, and are characterized by the disproportionate influence of extreme values. These distributions challenge classical statistical models, as their tails decay too slowly for conventional approximations to hold. Among their key descriptive measures are quantile contributions, which quantify the proportion of a total quantity (such as income, energy, or risk) attributed to observations above a given quantile threshold. This paper presents a theoretical study of the quantile contribution statistic and its relationship with order statistics. We derive a closed-form expression for the joint cumulative distribution function (CDF) of order statistics and, based on it, obtain an explicit CDF for quantile contributions applicable to small samples. We then investigate the asymptotic behavior of these contributions as the sample size increases, establishing the asymptotic normality of the numerator and characterizing the limiting distribution of the quantile contribution. Finally, simulation studies illustrate the convergence properties and empirical accuracy of the theoretical results, providing a foundation for applying quantile contributions in the analysis of heavy-tailed data. ...

November 6, 2025 · 2 min · Research Team
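The quantile contribution described in the abstract above (the share of a total carried by observations above a quantile threshold) can be illustrated with a small empirical estimator. This is a minimal sketch, not the paper's estimator: the function name `quantile_contribution`, the choice of `ceil(q * n)` as the cut-off index, and the Pareto shape parameter in the usage example are all assumptions made for illustration.

```python
import math
import random


def quantile_contribution(values, q):
    """Share of the total sum carried by observations above the empirical q-quantile.

    Assumption: the top group is everything after the first ceil(q * n)
    order statistics; the paper may use a different convention.
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    xs = sorted(values)  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("values must be non-empty")
    total = sum(xs)
    if total == 0:
        raise ValueError("total must be non-zero")
    k = math.ceil(q * n)  # number of observations at or below the threshold
    return sum(xs[k:]) / total


if __name__ == "__main__":
    # Hypothetical heavy-tailed sample: Pareto with shape 1.5 (assumed value).
    rng = random.Random(0)
    sample = [rng.paretovariate(1.5) for _ in range(10_000)]
    print("top 1% share:", quantile_contribution(sample, 0.99))
```

For heavy-tailed samples the estimate can vary noticeably between runs and sample sizes, which is the small-sample behavior the abstract says the paper analyzes.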

FinReflectKG - EvalBench: Benchmarking Financial KG with Multi-Dimensional Evaluation

ArXiv ID: 2510.05710. Authors: Fabrizio Dimino, Abhinav Arun, Bhaskarjit Sarmah, Stefano Pasquali. Abstract: Large language models (LLMs) are increasingly being used to extract structured knowledge from unstructured financial text. Although prior studies have explored various extraction methods, there is no universal benchmark or unified evaluation framework for the construction of financial knowledge graphs (KGs). We introduce FinReflectKG - EvalBench, a benchmark and evaluation framework for KG extraction from SEC 10-K filings. EvalBench builds on the agentic and holistic evaluation principles of FinReflectKG, a financial KG that links audited triples to source chunks from S&P 100 filings and supports single-pass, multi-pass, and reflection-agent-based extraction modes. It implements a deterministic commit-then-justify judging protocol with explicit bias controls, mitigating position effects, leniency, verbosity, and world-knowledge reliance. Each candidate triple is evaluated with binary judgments of faithfulness, precision, and relevance, while comprehensiveness is assessed on a three-level ordinal scale (good, partial, bad) at the chunk level. Our findings suggest that, when equipped with explicit bias controls, LLM-as-Judge protocols provide a reliable and cost-efficient alternative to human annotation, while also enabling structured error analysis. Reflection-based extraction emerges as the superior approach, achieving the best performance in comprehensiveness, precision, and relevance, while single-pass extraction maintains the highest faithfulness. By aggregating these complementary dimensions, FinReflectKG - EvalBench enables fine-grained benchmarking and bias-aware evaluation, advancing transparency and governance in financial AI applications. ...

October 7, 2025 · 2 min · Research Team

Six Levels of Privacy: A Framework for Financial Synthetic Data

ArXiv ID: 2403.14724. Authors: Unknown. Abstract: Synthetic Data is increasingly important in financial applications. In addition to the benefits it provides, such as improved financial modeling and better testing procedures, it poses privacy risks as well. Such data may arise from client information, business information, or other proprietary sources that must be protected. Even though the process by which Synthetic Data is generated serves to obscure the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of “levels” of privacy that are useful for categorizing Synthetic Data generation methods and the progressively improved protections they offer. While the six levels were devised in the context of financial applications, they may also be appropriate for other industries. Our paper includes a brief overview of Financial Synthetic Data, how it can be used, how its value can be assessed, privacy risks, and privacy attacks. We close with details of the “Six Levels” that include defenses against those attacks. ...

March 20, 2024 · 2 min · Research Team