MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection

Updated Feb 19, 2026

High-quality benchmarks are fundamental to scientific progress: they enable fair comparison across methods, improve reproducibility, and provide practitioners with reliable guidance for methodological choices. We introduce macrOData, a large-scale benchmark suite for tabular outlier detection (OD) comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvRBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes.

On this leaderboard, we report aggregated and individual results for the benchmarked methods on the three benchmarks. The main performance metrics reported are AUROC (Area Under the Receiver Operating Characteristic Curve), AUPRC (Area Under the Precision-Recall Curve), and F1-score. We compute group ranking-based statistics and permutation tests with respect to AUROC and AUPRC (please refer to the Ranking-based Metrics tab for details). For details on the benchmarked models and the computation of the evaluation metrics, please refer to the GitHub repo.

Ranking-based Metrics

For each dataset $i$, methods are ranked by performance (higher is better). Let $N$ be the number of datasets and $M$ the number of methods.

Average Rank: $$ \mathrm{Avg.\ Rank}^{(m)}=\frac{1}{N}\sum_{i=1}^{N} r_i^{(m)} $$ where $r_i^{(m)} \in \{1,\dots,M\}$ is the rank of method $m$ on dataset $i$.
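As a minimal sketch (not the benchmark's exact implementation; see the GitHub repo), the average rank can be computed from an $N \times M$ score table where higher scores are better and ties receive the average of the tied ranks:

```python
import numpy as np
from scipy.stats import rankdata

def average_rank(scores):
    """scores: (N datasets, M methods) array, higher is better.
    Returns each method's average rank across datasets (1 = best)."""
    # Negate so the highest score gets rank 1; ties are averaged.
    ranks = rankdata(-scores, axis=1)   # shape (N, M)
    return ranks.mean(axis=0)           # shape (M,)
```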

ELO: Pairwise rating updated from wins/losses between methods across datasets.
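A minimal sketch of the standard logistic Elo update applied to pairwise win/loss outcomes; the `k` factor and base rating of 1500 are conventional defaults and illustrative assumptions, not the benchmark's settings:

```python
def elo_ratings(outcomes, k=32, base=1500.0):
    """outcomes: iterable of (winner_idx, loser_idx) pairs, one per
    pairwise comparison on a dataset. Returns {method_idx: rating}."""
    ratings = {}
    for w, l in outcomes:
        rw = ratings.setdefault(w, base)
        rl = ratings.setdefault(l, base)
        # Expected score of the winner under the logistic Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[w] = rw + k * (1.0 - expected_w)
        ratings[l] = rl - k * (1.0 - expected_w)
    return ratings
```

Note that Elo is order-dependent: processing the same comparisons in a different order can yield slightly different ratings.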

Winrate: Define pairwise win indicator $w_{i,mn} \in \{0,0.5,1\}$ (loss/tie/win of $m$ vs $n$ on dataset $i$): $$ \mathrm{Winrate}^{(m)}=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{n\neq m} w_{i,mn} $$
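The winrate formula above can be sketched with a vectorized pairwise comparison (again an illustrative sketch, not the repo's implementation): each (dataset, opponent) pair contributes 1 for a win, 0.5 for a tie, and 0 for a loss, and self-comparisons are excluded.

```python
import numpy as np

def winrate(scores):
    """scores: (N, M) array, higher is better.
    Returns each method's winrate over all (dataset, opponent) pairs."""
    N, M = scores.shape
    s = scores[:, :, None]   # (N, M, 1)
    t = scores[:, None, :]   # (N, 1, M)
    w = np.where(s > t, 1.0, np.where(s == t, 0.5, 0.0))  # (N, M, M)
    mask = ~np.eye(M, dtype=bool)   # drop self-comparisons
    return w[:, mask].reshape(N, M, M - 1).sum(axis=(0, 2)) / (N * (M - 1))
```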

rAUC (rank-based AUC): Let $W_m(k)$ be cumulative winrate when opponents are included up to rank cutoff $k$: $$ \mathrm{rAUC}^{(m)}=\frac{1}{M}\sum_{k=1}^{M} W_m(k) $$ This summarizes how consistently method $m$ wins across the opponent-rank spectrum.
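A hedged sketch of rAUC, under the assumption that opponents are ordered by overall average rank and $W_m(k)$ is method $m$'s mean winrate against the top-$k$ opponents in that ordering (with $m$ itself excluded); the exact definition is in Section C.2 of the paper, so treat this only as an illustration of the idea:

```python
import numpy as np
from scipy.stats import rankdata

def rauc(scores):
    """scores: (N, M) performance table, higher is better.
    Sketch: average of cumulative winrates W_m(k) over cutoffs k = 1..M,
    with opponents included strongest-first by overall average rank."""
    N, M = scores.shape
    ranks = rankdata(-scores, axis=1)            # per-dataset ranks
    order = np.argsort(ranks.mean(axis=0))       # strongest method first
    s, t = scores[:, :, None], scores[:, None, :]
    # Mean pairwise win indicator over datasets, shape (M, M).
    w = np.where(s > t, 1.0, np.where(s == t, 0.5, 0.0)).mean(axis=0)
    out = np.zeros(M)
    for m in range(M):
        cum, opponents = 0.0, []
        for k in range(M):
            if order[k] != m:
                opponents.append(order[k])
            if opponents:
                cum += w[m, opponents].mean()    # W_m(k)
        out[m] = cum / M
    return out
```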

$C_\delta$ (pairwise advantage margin): Using win/loss indicators $w_{i,mn}$ and $\ell_{i,mn}$: $$ C_\delta^{(m)}=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{n\neq m}\bigl(w_{i,mn}-\ell_{i,mn}\bigr) $$ Higher $C_\delta$ means stronger net pairwise advantage.
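Since $w_{i,mn}-\ell_{i,mn}$ is $+1$ for a win, $-1$ for a loss, and $0$ for a tie, $C_\delta$ reduces to a mean of signed pairwise differences. A minimal sketch:

```python
import numpy as np

def c_delta(scores):
    """scores: (N, M) array, higher is better.
    Net pairwise advantage: mean of (wins - losses) over all
    (dataset, opponent) pairs; ties contribute zero."""
    N, M = scores.shape
    s, t = scores[:, :, None], scores[:, None, :]
    net = np.sign(s - t)            # +1 win, -1 loss, 0 tie; (N, M, M)
    mask = ~np.eye(M, dtype=bool)   # drop self-comparisons
    return net[:, mask].reshape(N, M, M - 1).sum(axis=(0, 2)) / (N * (M - 1))
```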

Interpretation:

  • Lower Avg. Rank is better.
  • Higher ELO, Winrate, rAUC, and $C_\delta$ are better.

For detailed calculations, see Section C.2 of the paper and the evaluation metric implementation.

Ranking-based results under AUPRC

Ranking-based results under AUROC

If you would like to use our datasets, please cite:

@misc{ding2026macrodatanewbenchmarksthousands,
      title={MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection},
      author={Xueying Ding and Simon Klüttermann and Haomin Wen and Yilong Chen and Leman Akoglu},
      year={2026},
      eprint={2602.09329},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.09329},
}