MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection

Updated Feb 19, 2026

High-quality benchmarks are fundamental to scientific progress: they enable fair comparison across methods, improve reproducibility, and provide practitioners with reliable guidance for methodological choices. We introduce macrOData, a large-scale benchmark suite for tabular outlier detection (OD) comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvRBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes.

On this leaderboard, we report aggregated and individual results for the benchmarked methods on the three benchmarks. The main performance metrics reported are AUROC (Area Under the Receiver Operating Characteristic Curve), AUPRC (Area Under the Precision-Recall Curve), and F1-score. We compute group ranking-based statistics and permutation tests with respect to AUROC and AUPRC (please refer to the Ranking-based Metrics tab for details). For details on the benchmarked models and the computation of the evaluation metrics, please refer to the GitHub repo.

Ranking-based Metrics

For each dataset $i$, methods are ranked by performance (higher is better). Let $N$ be the number of datasets and $M$ the number of methods.

Average Rank: $$ \mathrm{Avg.\ Rank}^{(m)}=\frac{1}{N}\sum_{i=1}^{N} r_i^{(m)} $$ where $r_i^{(m)} \in \{1,\dots,M\}$ is the rank of method $m$ on dataset $i$.
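As a minimal sketch (not the benchmark's exact implementation; see the GitHub repo), the average rank can be computed from an $N \times M$ score table where higher scores are better and ties receive the average of the tied ranks:

```python
import numpy as np
from scipy.stats import rankdata

def average_rank(scores):
    """scores: (N datasets, M methods) array, higher is better.
    Returns each method's average rank across datasets (1 = best)."""
    # Negate so the highest score gets rank 1; ties are averaged.
    ranks = rankdata(-scores, axis=1)   # shape (N, M)
    return ranks.mean(axis=0)           # shape (M,)
```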

ELO: Pairwise rating updated from wins/losses between methods across datasets.
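A minimal sketch of the standard logistic Elo update applied to pairwise win/loss outcomes; the `k` factor and base rating of 1500 are conventional defaults and illustrative assumptions, not the benchmark's settings:

```python
def elo_ratings(outcomes, k=32, base=1500.0):
    """outcomes: iterable of (winner_idx, loser_idx) pairs, one per
    pairwise comparison on a dataset. Returns {method_idx: rating}."""
    ratings = {}
    for w, l in outcomes:
        rw = ratings.setdefault(w, base)
        rl = ratings.setdefault(l, base)
        # Expected score of the winner under the logistic Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[w] = rw + k * (1.0 - expected_w)
        ratings[l] = rl - k * (1.0 - expected_w)
    return ratings
```

Note that Elo is order-dependent: processing the same comparisons in a different order can yield slightly different ratings.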

Winrate: Define pairwise win indicator $w_{i,mn} \in \{0,0.5,1\}$ (loss/tie/win of $m$ vs $n$ on dataset $i$): $$ \mathrm{Winrate}^{(m)}=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{n\neq m} w_{i,mn} $$
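The winrate formula above can be sketched with a vectorized pairwise comparison (again an illustrative sketch, not the repo's implementation): each (dataset, opponent) pair contributes 1 for a win, 0.5 for a tie, and 0 for a loss, and self-comparisons are excluded.

```python
import numpy as np

def winrate(scores):
    """scores: (N, M) array, higher is better.
    Returns each method's winrate over all (dataset, opponent) pairs."""
    N, M = scores.shape
    s = scores[:, :, None]   # (N, M, 1)
    t = scores[:, None, :]   # (N, 1, M)
    w = np.where(s > t, 1.0, np.where(s == t, 0.5, 0.0))  # (N, M, M)
    mask = ~np.eye(M, dtype=bool)   # drop self-comparisons
    return w[:, mask].reshape(N, M, M - 1).sum(axis=(0, 2)) / (N * (M - 1))
```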

rAUC (rank-based AUC): Let $W_m(k)$ be cumulative winrate when opponents are included up to rank cutoff $k$: $$ \mathrm{rAUC}^{(m)}=\frac{1}{M}\sum_{k=1}^{M} W_m(k) $$ This summarizes how consistently method $m$ wins across the opponent-rank spectrum.
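A hedged sketch of rAUC, under the assumption that opponents are ordered by overall average rank and $W_m(k)$ is method $m$'s mean winrate against the top-$k$ opponents in that ordering (with $m$ itself excluded); the exact definition is in Section C.2 of the paper, so treat this only as an illustration of the idea:

```python
import numpy as np
from scipy.stats import rankdata

def rauc(scores):
    """scores: (N, M) performance table, higher is better.
    Sketch: average of cumulative winrates W_m(k) over cutoffs k = 1..M,
    with opponents included strongest-first by overall average rank."""
    N, M = scores.shape
    ranks = rankdata(-scores, axis=1)            # per-dataset ranks
    order = np.argsort(ranks.mean(axis=0))       # strongest method first
    s, t = scores[:, :, None], scores[:, None, :]
    # Mean pairwise win indicator over datasets, shape (M, M).
    w = np.where(s > t, 1.0, np.where(s == t, 0.5, 0.0)).mean(axis=0)
    out = np.zeros(M)
    for m in range(M):
        cum, opponents = 0.0, []
        for k in range(M):
            if order[k] != m:
                opponents.append(order[k])
            if opponents:
                cum += w[m, opponents].mean()    # W_m(k)
        out[m] = cum / M
    return out
```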

$C_\delta$ (pairwise advantage margin): Using win/loss indicators $w_{i,mn}$ and $\ell_{i,mn}$: $$ C_\delta^{(m)}=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{n\neq m}\bigl(w_{i,mn}-\ell_{i,mn}\bigr) $$ Higher $C_\delta$ means stronger net pairwise advantage.
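Since $w_{i,mn}-\ell_{i,mn}$ is $+1$ for a win, $-1$ for a loss, and $0$ for a tie, $C_\delta$ reduces to a mean of signed pairwise differences. A minimal sketch:

```python
import numpy as np

def c_delta(scores):
    """scores: (N, M) array, higher is better.
    Net pairwise advantage: mean of (wins - losses) over all
    (dataset, opponent) pairs; ties contribute zero."""
    N, M = scores.shape
    s, t = scores[:, :, None], scores[:, None, :]
    net = np.sign(s - t)            # +1 win, -1 loss, 0 tie; (N, M, M)
    mask = ~np.eye(M, dtype=bool)   # drop self-comparisons
    return net[:, mask].reshape(N, M, M - 1).sum(axis=(0, 2)) / (N * (M - 1))
```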

Interpretation:

  • Lower Avg. Rank is better.
  • Higher ELO, Winrate, rAUC, and $C_\delta$ are better.

For detailed calculations, see Section C.2 of the paper and the evaluation metric implementation.

Ranking-based results under AUPRC

Ranking-based results under AUROC

If you would like to use our datasets, please cite:

@misc{ding2026macrodatanewbenchmarksthousands,
      title={MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection},
      author={Xueying Ding and Simon Klüttermann and Haomin Wen and Yilong Chen and Leman Akoglu},
      year={2026},
      eprint={2602.09329},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.09329},
}