Free Datasets

A curated catalog of high-quality, freely available datasets for machine learning, data science, and analytics.

Categories

| Category | Examples | Datasets |
| --- | --- | --- |
| General Purpose | Iris, Titanic, MNIST, CIFAR-10, ImageNet subsets | 15+ |
| Computer Vision | COCO, Open Images, PASCAL VOC, CelebA, LFW | 12+ |
| Natural Language | Wikipedia dumps, Common Crawl, BookCorpus, SQuAD | 15+ |
| Tabular & Structured | UCI ML Repository, Kaggle datasets, Census data | 12+ |
| Audio & Speech | LibriSpeech, Common Voice, AudioSet, VoxCeleb | 10+ |
| Time Series | Stock prices, weather, energy, IoT sensor data | 10+ |
| Geospatial | OpenStreetMap, satellite imagery, climate data | 8+ |
| Healthcare & Bio | MIMIC (credentialed access required), PhysioNet, PubMed, protein structures | 10+ |
| Government & Public | US Census, EU Open Data, World Bank, UN data | 12+ |

Dataset Registries

These are platforms where you can discover thousands more datasets:

| Platform | URL | Notes |
| --- | --- | --- |
| Hugging Face Datasets | huggingface.co/datasets | 100k+ datasets; easy download via the datasets Python library |
| Kaggle | kaggle.com/datasets | 50k+ datasets; downloading requires a free account |
| Google Dataset Search | datasetsearch.research.google.com | Search engine for datasets across the web |
| UCI ML Repository | archive.ics.uci.edu | Classic ML datasets, well documented |
| Papers With Code | paperswithcode.com/datasets | Datasets linked to research papers |
| AWS Open Data | registry.opendata.aws | Large-scale datasets hosted on S3 |
| GitHub Awesome Lists | github.com/awesomedata/awesome-public-datasets | Community-curated list |
| data.gov | data.gov | US government open data |
| EU Open Data | data.europa.eu | European Union open data |

Metadata Standard

Every dataset in our catalog includes:

  • Name and description
  • Size (rows, columns, file size)
  • Format (CSV, JSON, Parquet, images, etc.)
  • License (CC0, CC-BY, MIT, etc.)
  • Direct link to download
  • Browser-compatible flag (can it be loaded in our tools?)
  • Citation (BibTeX where available)
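As a minimal sketch, assuming a Python tooling layer, a catalog entry carrying the fields above could be modeled as a dataclass. The class name DatasetRecord, its field names, and the csv_size helper are hypothetical illustrations, not the catalog's actual schema. The helper shows one way the size fields could be filled in for a CSV file that has a header row.

```python
import csv
import os
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical schema: field names are illustrative, not the catalog's real format.
@dataclass
class DatasetRecord:
    name: str
    description: str
    rows: Optional[int]
    columns: Optional[int]
    file_size_bytes: Optional[int]
    format: str              # e.g. "CSV", "JSON", "Parquet", "images"
    license: str             # e.g. "CC0", "CC-BY", "MIT"
    download_url: str        # direct download link
    browser_compatible: bool  # can the dataset be loaded in the browser tools?
    bibtex: Optional[str] = None  # citation, where available

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that are empty."""
        required = ["name", "description", "format", "license", "download_url"]
        return [f for f in required if not getattr(self, f)]


def csv_size(path: str) -> Tuple[int, int, int]:
    """Return (data rows, columns, file size in bytes) for a CSV with a header row.

    Blank lines are not counted as rows; the column count comes from the header.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        rows = sum(1 for row in reader if row)  # csv.reader yields [] for blank lines
    return rows, len(header), os.path.getsize(path)
```

For a complete entry, missing_fields() returns an empty list; anything it does return names metadata that still needs to be filled in before the entry is published.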