Healthcare & Bio Datasets

Free datasets for medical AI, bioinformatics, drug discovery, and clinical research.

Ethical use

Healthcare datasets may contain sensitive patient information. Always read the data use agreement before downloading, and follow your institution's ethical guidelines.

Clinical

| Dataset | Records | Size | License | Link |
|---|---|---|---|---|
| MIMIC-IV | 300K+ ICU stays | 7 GB | PhysioNet credentialed | physionet.org |
| eICU | 200K+ ICU stays | 3 GB | PhysioNet credentialed | physionet.org |
| PhysioNet | 80+ datasets | Varies | Various open | physionet.org |
| NHANES | 150K+ participants | 500 MB | Open | cdc.gov |
| UK Biobank | 500K participants | Petabytes | Application required | ukbiobank.ac.uk |
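
Several clinical datasets are gated behind credentialing or an application, so it can help to flag them before planning a project. The following is a minimal sketch, not an official tool: the rows are copied from the Clinical table, while the `Dataset` class, the `GATED_KEYWORDS` list, and the `needs_approval` helper are hypothetical names introduced for illustration. The keyword match is a rough heuristic and is no substitute for reading each data use agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    records: str
    size: str
    license: str


# Rows copied from the Clinical table.
CLINICAL = [
    Dataset("MIMIC-IV", "300K+ ICU stays", "7 GB", "PhysioNet credentialed"),
    Dataset("eICU", "200K+ ICU stays", "3 GB", "PhysioNet credentialed"),
    Dataset("PhysioNet", "80+ datasets", "Varies", "Various open"),
    Dataset("NHANES", "150K+ participants", "500 MB", "Open"),
    Dataset("UK Biobank", "500K participants", "Petabytes", "Application required"),
]

# Assumption: a license string containing one of these words means
# access must be approved before download.
GATED_KEYWORDS = ("credentialed", "application")


def needs_approval(ds: Dataset) -> bool:
    """Return True if the license text suggests gated access."""
    text = ds.license.lower()
    return any(keyword in text for keyword in GATED_KEYWORDS)


gated = [ds.name for ds in CLINICAL if needs_approval(ds)]
print(gated)  # ['MIMIC-IV', 'eICU', 'UK Biobank']
```

Datasets that are not flagged here may still carry usage terms, so treat the result as a first-pass filter only.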

Genomics & Proteomics

| Dataset | Records | Size | License | Link |
|---|---|---|---|---|
| UniProt | 250M+ proteins | 120 GB | CC-BY-4.0 | uniprot.org |
| PDB | 200K+ structures | 50 GB | CC0 | rcsb.org |
| NCBI GenBank | 230M+ sequences | Terabytes | Open | ncbi.nlm.nih.gov |
| 1000 Genomes | 3,202 genomes | 800 TB | Fort Lauderdale | internationalgenome.org |
| AlphaFold DB | 200M+ predictions | 23 TB | CC-BY-4.0 | alphafold.ebi.ac.uk |

Drug Discovery

| Dataset | Compounds | Size | License | Link |
|---|---|---|---|---|
| ChEMBL | 2.4M compounds | 4 GB | CC-BY-SA-3.0 | ebi.ac.uk/chembl |
| PubChem | 116M+ compounds | 50 GB | Open | pubchem.ncbi.nlm.nih.gov |
| ZINC | 230M+ compounds | 300 GB | Free academic | zinc.docking.org |
| DrugBank | 14K+ drugs | 500 MB | CC-BY-NC-4.0 | drugbank.com |

Medical Literature

| Dataset | Records | Size | License | Link |
|---|---|---|---|---|
| PubMed | 36M+ articles | Metadata: 50 GB | Open | pubmed.ncbi.nlm.nih.gov |
| PMC Open Access | 8.5M+ full-text | 400 GB | CC variants | ncbi.nlm.nih.gov/pmc |
| CORD-19 | 1M+ COVID papers | 12 GB | Various | semanticscholar.org |