General Purpose Datasets
Classic datasets used across ML education, benchmarking, and prototyping.
Tabular Classics
| Dataset | Rows | Cols | Size | Format | License | Browser? | Link |
|---|---|---|---|---|---|---|---|
| Iris | 150 | 5 | 4 KB | CSV | CC0 | Yes | UCI |
| Titanic | 891 | 12 | 60 KB | CSV | CC0 | Yes | Kaggle |
| Wine Quality | 6,497 | 13 | 260 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Boston Housing | 506 | 14 | 50 KB | CSV | CC0 | Yes | Kaggle |
| Adult Income | 48,842 | 14 | 5.5 MB | CSV | CC-BY-4.0 | Yes | UCI |
| California Housing | 20,640 | 9 | 1.4 MB | CSV | CC0 | Yes | Kaggle |
| Diamonds | 53,940 | 10 | 3.2 MB | CSV | CC0 | Yes | Kaggle |
| Heart Disease | 303 | 14 | 11 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Breast Cancer Wisconsin | 569 | 32 | 124 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Penguins | 344 | 7 | 15 KB | CSV | CC0 | Yes | GitHub |
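
The Rows and Cols figures above can be sanity-checked on a downloaded copy before further use. The following is a minimal sketch using only the Python standard library. The `profile_csv` helper and the column names in the sample are hypothetical, and the three-row sample merely stands in for a real file with a header row.

```python
import csv
import io


def profile_csv(text):
    """Return (row_count, column_count) for CSV text that has a header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                    # first line holds the column names
    rows = sum(1 for row in reader if row)   # ignore blank lines
    return rows, len(header)


# Illustrative sample in the shape of Iris: four measurements plus a label.
# The column names are assumptions, not taken from the catalog above.
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)

rows, cols = profile_csv(sample)
print(rows, cols)  # prints: 3 5
```

For a file on disk, the same helper could be fed `open(path, newline="").read()`. On the full Iris CSV it should report 150 rows and 5 columns if the copy matches the table.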
Image Benchmarks
| Dataset | Images | Classes | Size | Format | License | Browser? | Link |
|---|---|---|---|---|---|---|---|
| MNIST | 70,000 | 10 | 11 MB | IDX/PNG | CC-BY-SA-3.0 | Partial | Yann LeCun |
| Fashion-MNIST | 70,000 | 10 | 30 MB | IDX/PNG | MIT | Partial | GitHub |
| CIFAR-10 | 60,000 | 10 | 163 MB | Binary/PNG | MIT | No (too large) | Toronto |
| SVHN | 600,000+ | 10 | 400 MB | MAT/PNG | Non-commercial | No | Stanford |
| STL-10 | 113,000 | 10 | 2.6 GB | Binary | Non-commercial | No | Stanford |
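
MNIST and Fashion-MNIST are listed in IDX format, which is a small binary header followed by raw array data. The header is two zero bytes, one type-code byte, one byte giving the number of dimensions, and then one big-endian 32-bit size per dimension. The sketch below reads that header with the standard library. `read_idx_header` is a hypothetical helper, and the buffer is synthetic rather than real MNIST data.

```python
import struct

# IDX type codes mapped to a readable name
IDX_DTYPES = {
    0x08: "uint8",
    0x09: "int8",
    0x0B: "int16",
    0x0C: "int32",
    0x0D: "float32",
    0x0E: "float64",
}


def read_idx_header(buf):
    """Parse an IDX header and return (dtype_code, dims)."""
    # Bytes 0-1 must be zero, byte 2 is the type code, byte 3 the dimension count.
    zero, dtype_code, ndim = struct.unpack_from(">HBB", buf, 0)
    if zero != 0 or dtype_code not in IDX_DTYPES:
        raise ValueError("buffer does not look like an IDX file")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    dims = struct.unpack_from(">" + "I" * ndim, buf, 4)
    return dtype_code, dims


# Synthetic stand-in for an image file: 2 images of 28x28 unsigned bytes.
buf = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + bytes(2 * 28 * 28)

dtype_code, dims = read_idx_header(buf)
print(IDX_DTYPES[dtype_code], dims)
```

The published files are usually gzip-compressed, so in practice the bytes would likely come from something like `gzip.open(path, "rb").read()` rather than a hand-built buffer.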
Text Benchmarks
| Dataset | Samples | Size | Format | License | Browser? | Link |
|---|---|---|---|---|---|---|
| IMDb Reviews | 50,000 | 80 MB | Text | Open | Yes (as CSV) | Stanford |
| AG News | 127,600 | 29 MB | CSV | Open | Yes | Papers |
| 20 Newsgroups | 18,846 | 14 MB | Text | Open | Yes (as CSV) | scikit-learn |
| SMS Spam | 5,574 | 477 KB | CSV | CC-BY-4.0 | Yes | UCI |
| Yelp Reviews | 6.99M | 5.3 GB | JSON | Open | No (too large) | Yelp |
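
Entries marked "Yes (as CSV)" are distributed as plain text and need flattening into a single CSV first. One possible approach is sketched below, assuming the corpus has already been read into `(label, text)` pairs. The `to_csv` helper, the `label` and `text` column names, and the two sample records are all made up for illustration.

```python
import csv
import io


def to_csv(records):
    """Serialize (label, text) pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["label", "text"])
    for label, text in records:
        # Collapse newlines and repeated whitespace so each record stays on one line.
        writer.writerow([label, " ".join(text.split())])
    return out.getvalue()


# Made-up records; real ones would be read from the dataset's text files.
records = [
    ("pos", "Great film,\nwould watch again."),
    ("neg", 'Dull plot and "wooden" acting.'),
]

csv_text = to_csv(records)
print(csv_text)
```

The `csv` writer quotes fields that contain commas or quotation marks, which matters for free-form review text. Joining fields by hand with `","` would produce rows that no longer parse into two columns.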
Browser-compatible datasets
Datasets marked "Yes" in the "Browser?" column can be loaded directly into our Data Profiler or Dataset Explorer for instant analysis, with no Python needed.