Vision Models
Free pre-trained models for computer vision tasks.
Object Detection
| Model | Objects | Speed | Size | License | Link |
|---|---|---|---|---|---|
| YOLOv8n | 80 COCO | Very fast | 6 MB | AGPL-3.0 | ultralytics.com |
| YOLOv8s | 80 COCO | Fast | 22 MB | AGPL-3.0 | ultralytics.com |
| DETR ResNet-50 | 91 COCO | Medium | 160 MB | Apache-2.0 | HF |
| RT-DETR | 80 COCO | Fast | 65 MB | Apache-2.0 | HF |
| OWL-ViT v2 | Open vocabulary | Slow | 600 MB | Apache-2.0 | HF |
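
As a rough illustration of how one of the HF-hosted detectors above might be used, here is a minimal Python sketch built on the `transformers` object-detection pipeline. The model id `facebook/detr-resnet-50`, the helper name `detect_objects`, and the 0.5 score threshold are assumptions chosen for the example, not something this list prescribes. It also assumes `transformers`, `torch`, and `Pillow` are installed (DETR may additionally need `timm`).

```python
def detect_objects(image, detector=None, threshold=0.5):
    """Return (label, score) pairs for detections at or above `threshold`.

    `image` can be a file path, URL, or PIL image accepted by the pipeline.
    `detector` is any callable returning a list of dicts with "label" and
    "score" keys; when omitted, a DETR pipeline is built (assumed model id).
    """
    if detector is None:
        # Imported lazily so the helper can be used with a custom detector
        # even when transformers is not installed.
        from transformers import pipeline

        # Assumption: the DETR ResNet-50 checkpoint hosted on the HF Hub.
        detector = pipeline("object-detection", model="facebook/detr-resnet-50")

    results = detector(image)
    # Keep only confident detections and round scores for readability.
    return [
        (r["label"], round(r["score"], 2))
        for r in results
        if r["score"] >= threshold
    ]
```

Calling `detect_objects("street.jpg")` (a hypothetical local file) would download the checkpoint on first use. Passing a different `detector` callable lets the same filtering logic be reused with another model from the table.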
Image Segmentation
| Model | Type | Size | License | Link |
|---|---|---|---|---|
| SAM (Segment Anything) | Interactive/automatic | 2.4 GB | Apache-2.0 | HF |
| SAM-HQ | High-quality SAM | 2.6 GB | Apache-2.0 | HF |
| MobileSAM | Lightweight SAM | 40 MB | Apache-2.0 | HF |
| SegFormer-B5 | Semantic (ADE20K) | 330 MB | Apache-2.0 | HF |
Depth Estimation
| Model | Type | Size | License | Link |
|---|---|---|---|---|
| Depth Anything v2 | Monocular depth | 97 MB (small) | Apache-2.0 | HF |
| MiDaS v3 | Monocular depth | 400 MB | MIT | HF |
| ZoeDepth | Metric depth | 340 MB | MIT | HF |
OCR & Document
| Model | Task | Size | License | Link |
|---|---|---|---|---|
| Tesseract.js | OCR (100+ languages) | 15 MB + lang | Apache-2.0 | tesseract.projectnaptha.com |
| TrOCR | Handwriting recognition | 330 MB | MIT | HF |
| LayoutLM v3 | Document understanding | 350 MB | CC-BY-NC-SA | HF |
| Donut | Doc parsing (no OCR) | 690 MB | MIT | HF |
Face & Body