Audio & Speech Datasets
Free datasets for speech recognition, music analysis, sound classification, and audio processing.
Speech Recognition
| Dataset | Hours | Size | Language | License | Link |
|---|---|---|---|---|---|
| LibriSpeech | 1,000 | 60 GB | English | CC-BY-4.0 | openslr.org |
| Common Voice | 19,000+ | 70+ GB | 100+ languages | CC0 | commonvoice.mozilla.org |
| VoxPopuli | 400K+ (unlabeled) | 1.8 TB | 23 EU languages | CC0 | HuggingFace |
| GigaSpeech | 10,000 | 700 GB | English | Apache-2.0 | github.com/SpeechColab |
| TED-LIUM | 452 | 35 GB | English | CC-BY-NC-ND | openslr.org |
| FLEURS | 12+ per language | 350 GB | 102 languages | CC-BY-4.0 | HuggingFace |
Sound Classification
| Dataset | Clips | Classes | Size | License | Link |
|---|---|---|---|---|---|
| AudioSet | 2M+ | 632 | n/a (distributed as YouTube IDs) | CC-BY-4.0 | research.google.com |
| ESC-50 | 2,000 | 50 | 600 MB | CC-BY-NC | github.com/karolpiczak |
| UrbanSound8K | 8,732 | 10 | 6.8 GB | CC-BY-NC | urbansounddataset.weebly.com |
| FSD50K | 51,197 | 200 | 27 GB | CC-BY-4.0 | zenodo.org |
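The clip and class counts above give a quick sense of how much data each class gets on average. The following is a minimal sketch of that arithmetic, assuming the table figures are accurate; the names `SOUND_DATASETS` and `mean_clips_per_class` are hypothetical and chosen only for illustration. AudioSet is left out because its clip count is listed only as "2M+".

```python
# Clip and class counts copied from the Sound Classification table.
# AudioSet is omitted because its clip count is only given as "2M+".
SOUND_DATASETS = {
    "ESC-50": (2_000, 50),
    "UrbanSound8K": (8_732, 10),
    "FSD50K": (51_197, 200),
}


def mean_clips_per_class(clips: int, classes: int) -> float:
    """Return the average number of clips per class.

    This is a plain ratio: it ignores class imbalance and, for
    multi-label datasets, the fact that one clip can carry several labels.
    """
    return clips / classes


for name, (clips, classes) in SOUND_DATASETS.items():
    average = mean_clips_per_class(clips, classes)
    print(f"{name}: {average:.1f} clips per class on average")
```

The ratio is only a rough guide. It says nothing about how evenly the clips are spread across classes, and for a multi-label dataset such as FSD50K a single clip can count toward more than one class.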
Music
| Dataset | Tracks | Size | License | Link |
|---|---|---|---|---|
| GTZAN | 1,000 | 1.2 GB | Research | marsyas.info |
| MagnaTagATune | 25,863 | 3 GB | Research | mirg.city.ac.uk |
| Free Music Archive | 106,574 | 879 GB | CC variants | github.com/mdeff/fma |
| MusicNet | 330 | 11 GB | CC-BY-4.0 | zenodo.org |
Speaker Recognition
| Dataset | Speakers | Hours | License | Link |
|---|---|---|---|---|
| VoxCeleb1 | 1,251 | 352 | CC-BY-4.0 | robots.ox.ac.uk |
| VoxCeleb2 | 6,112 | 2,442 | CC-BY-4.0 | robots.ox.ac.uk |
| VCTK | 110 | 44 | CC-BY-4.0 | datashare.ed.ac.uk |
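The License columns in the tables above mix permissive, non-commercial, and loosely specified terms, so it can help to group the datasets before choosing one. The following is a minimal sketch of such a grouping, using only the license labels as they appear on this page. The category names, the `PERMISSIVE` set, and the `classify` helper are assumptions made for illustration, not a reading of the actual license texts.

```python
# (name, license label) pairs copied from the tables on this page.
DATASET_LICENSES = [
    ("LibriSpeech", "CC-BY-4.0"),
    ("Common Voice", "CC0"),
    ("VoxPopuli", "CC0"),
    ("GigaSpeech", "Apache-2.0"),
    ("TED-LIUM", "CC-BY-NC-ND"),
    ("FLEURS", "CC-BY-4.0"),
    ("AudioSet", "CC-BY-4.0"),
    ("ESC-50", "CC-BY-NC"),
    ("UrbanSound8K", "CC-BY-NC"),
    ("FSD50K", "CC-BY-4.0"),
    ("GTZAN", "Research"),
    ("MagnaTagATune", "Research"),
    ("Free Music Archive", "CC variants"),
    ("MusicNet", "CC-BY-4.0"),
    ("VoxCeleb1", "CC-BY-4.0"),
    ("VoxCeleb2", "CC-BY-4.0"),
    ("VCTK", "CC-BY-4.0"),
]

# Assumption: labels treated as permissive for this sketch only.
PERMISSIVE = {"CC0", "CC-BY-4.0", "Apache-2.0"}


def classify(license_label: str) -> str:
    """Bucket a license label from the tables into a coarse category."""
    if license_label in PERMISSIVE:
        return "permissive"
    if "-NC" in license_label:
        return "non-commercial"
    # Labels such as "Research" or "CC variants" need a manual check.
    return "check terms"


def group_by_category(rows):
    """Map each category to the dataset names that fall into it."""
    groups = {}
    for name, label in rows:
        groups.setdefault(classify(label), []).append(name)
    return groups


groups = group_by_category(DATASET_LICENSES)
for category, names in groups.items():
    print(f"{category}: {', '.join(names)}")
```

The label match is a heuristic over the strings in the tables. Datasets in the "check terms" group, and any dataset whose audio comes from third-party sources, still need their terms read on the linked site.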