General Purpose Datasets

Classic datasets used across ML education, benchmarking, and prototyping.

Tabular Classics

| Dataset | Rows | Cols | Size | Format | License | Browser? | Link |
|---|---|---|---|---|---|---|---|
| Iris | 150 | 5 | 4 KB | CSV | CC0 | Yes | UCI |
| Titanic | 891 | 12 | 60 KB | CSV | CC0 | Yes | Kaggle |
| Wine Quality | 6,497 | 13 | 260 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Boston Housing | 506 | 14 | 50 KB | CSV | CC0 | Yes | Kaggle |
| Adult Income | 48,842 | 14 | 5.5 MB | CSV | CC-BY-4.0 | Yes | UCI |
| California Housing | 20,640 | 9 | 1.4 MB | CSV | CC0 | Yes | Kaggle |
| Diamonds | 53,940 | 10 | 3.2 MB | CSV | CC0 | Yes | Kaggle |
| Heart Disease | 303 | 14 | 11 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Breast Cancer Wisconsin | 569 | 32 | 124 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Penguins | 344 | 7 | 15 KB | CSV | CC0 | Yes | GitHub |
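
After downloading one of the tabular files, it can be worth checking that its shape matches the row and column counts listed above. The sketch below is one way to do that with only the Python standard library. The function name `csv_shape` and the in-memory sample are illustrative assumptions, not part of any tool mentioned on this page, and the function assumes the file starts with a header line.

```python
import csv
import io
from typing import TextIO, Tuple


def csv_shape(handle: TextIO) -> Tuple[int, int]:
    """Return (data_rows, columns) for a CSV stream that starts with a header line."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        # Empty input: no header, no data
        return 0, 0
    # Count the non-empty records that follow the header
    rows = sum(1 for record in reader if record)
    return rows, len(header)


# Tiny in-memory sample standing in for a downloaded file (hypothetical data layout)
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
)
sample_shape = csv_shape(sample)
print(sample_shape)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `csv_shape`. Some distributions of these datasets ship without a header line, in which case the row count from this sketch would be one short.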

Image Benchmarks

| Dataset | Images | Classes | Size | Format | License | Browser? | Link |
|---|---|---|---|---|---|---|---|
| MNIST | 70,000 | 10 | 11 MB | IDX/PNG | CC-BY-SA-3.0 | Partial | Yann LeCun |
| Fashion-MNIST | 70,000 | 10 | 30 MB | IDX/PNG | MIT | Partial | GitHub |
| CIFAR-10 | 60,000 | 10 | 163 MB | Binary/PNG | MIT | No (too large) | Toronto |
| SVHN | 600,000+ | 10 | 400 MB | MAT/PNG | Non-commercial | No | Stanford |
| STL-10 | 113,000 | 10 | 2.6 GB | Binary | Non-commercial | No | Stanford |
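
The IDX format listed for MNIST and Fashion-MNIST is a simple binary layout: two zero bytes, a one-byte element type code, a one-byte dimension count, then one big-endian 32-bit size per dimension, followed by the raw data. As a rough sketch, the header can be read with the standard `struct` module. The function name `read_idx_header` is an illustrative assumption, and the sample bytes are synthetic rather than taken from a real download.

```python
import struct
from typing import Tuple

# IDX element type codes: unsigned byte, signed byte, short, int, float, double
_KNOWN_TYPE_CODES = {0x08, 0x09, 0x0B, 0x0C, 0x0D, 0x0E}


def read_idx_header(data: bytes) -> Tuple[int, Tuple[int, ...]]:
    """Return (type_code, dimensions) parsed from the first bytes of an IDX file."""
    if len(data) < 4:
        raise ValueError("too short to be an IDX file")
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0 or type_code not in _KNOWN_TYPE_CODES:
        raise ValueError("not an IDX file")
    # Each dimension size is a 4-byte big-endian unsigned integer
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return type_code, dims


# Synthetic header shaped like an image file with 60,000 images of 28x28 bytes
fake_header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, 60000, 28, 28)
idx_info = read_idx_header(fake_header)
print(idx_info)
```

The files are commonly distributed gzip-compressed, so in practice the bytes would likely come from `gzip.open(path, "rb").read(16)` rather than a plain `open`.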

Text Benchmarks

| Dataset | Samples | Size | Format | License | Browser? | Link |
|---|---|---|---|---|---|---|
| IMDb Reviews | 50,000 | 80 MB | Text | Open | Yes (as CSV) | Stanford |
| AG News | 127,600 | 29 MB | CSV | Open | Yes | Papers |
| 20 Newsgroups | 18,846 | 14 MB | Text | Open | Yes (as CSV) | scikit-learn |
| SMS Spam | 5,574 | 477 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Yelp Reviews | 6.99M | 5.3 GB | JSON | Open | No (too large) | Yelp |
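
Entries marked "Yes (as CSV)" are distributed as plain text files and need converting first. A minimal sketch of that conversion follows, assuming a layout with one subfolder per label and one `.txt` file per document (the IMDb archive is organized roughly this way, with `pos` and `neg` folders). The function name `folder_to_csv`, the output column names, and the demo files are all illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path
from typing import Sequence


def folder_to_csv(root: Path, labels: Sequence[str], out_path: Path) -> int:
    """Write one (label, text) row per .txt file under root/<label>/; return the row count."""
    count = 0
    with out_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["label", "text"])
        for label in labels:
            # Sort for a stable row order across runs
            for txt in sorted((root / label).glob("*.txt")):
                writer.writerow([label, txt.read_text(encoding="utf-8")])
                count += 1
    return count


# Demo on a throwaway directory with two made-up reviews
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for label, text in [("pos", "Great film."), ("neg", "Dull, slow plot.")]:
        (root / label).mkdir()
        (root / label / "0.txt").write_text(text, encoding="utf-8")
    out_file = root / "reviews.csv"
    rows_written = folder_to_csv(root, ["pos", "neg"], out_file)
    with out_file.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows_written, rows[0])
```

The `csv` module quotes commas and newlines inside review text automatically, which is the main reason to prefer it over joining strings by hand.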

Browser-compatible datasets

Datasets marked "Yes" in the Browser column can be loaded directly into our Data Profiler or Dataset Explorer for instant analysis, with no Python needed.