Vision Models

Free pre-trained models for computer vision tasks.

Object Detection

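| Model | Classes | Speed | Size | License | Link |
|---|---|---|---|---|---|
| YOLOv8n | 80 (COCO) | Very fast | 6 MB | AGPL-3.0 | ultralytics.com |
| YOLOv8s | 80 (COCO) | Fast | 22 MB | AGPL-3.0 | ultralytics.com |
| DETR ResNet-50 | 91 (COCO) | Medium | 160 MB | Apache-2.0 | Hugging Face |
| RT-DETR | 80 (COCO) | Fast | 65 MB | Apache-2.0 | Hugging Face |
| OWL-ViT v2 | Open vocabulary | Slow | 600 MB | Apache-2.0 | Hugging Face |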
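
Size and license are often the deciding columns when picking a detector, since AGPL-3.0 is a copyleft license that may not suit closed-source products. The sketch below shortlists rows from the table above by a size budget and a set of acceptable licenses. The `Detector` class and the `shortlist` helper are hypothetical names introduced for illustration; only the figures come from the table.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Detector:
    """One row of the Object Detection table (illustrative type, not a real API)."""
    name: str
    size_mb: int
    license: str


# Figures copied from the Object Detection table.
DETECTORS = [
    Detector("YOLOv8n", 6, "AGPL-3.0"),
    Detector("YOLOv8s", 22, "AGPL-3.0"),
    Detector("DETR ResNet-50", 160, "Apache-2.0"),
    Detector("RT-DETR", 65, "Apache-2.0"),
    Detector("OWL-ViT v2", 600, "Apache-2.0"),
]


def shortlist(models: Iterable[Detector],
              max_size_mb: int,
              allowed_licenses: Set[str]) -> List[Detector]:
    """Keep models within the size budget and license set, smallest first."""
    keep = [
        m for m in models
        if m.size_mb <= max_size_mb and m.license in allowed_licenses
    ]
    return sorted(keep, key=lambda m: m.size_mb)
```

For example, `shortlist(DETECTORS, 200, {"Apache-2.0"})` returns RT-DETR followed by DETR ResNet-50, while a 10 MB budget with the same license set returns an empty list, because both models that small are AGPL-3.0.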

Image Segmentation

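| Model | Type | Size | License | Link |
|---|---|---|---|---|
| SAM (Segment Anything) | Interactive/automatic | 2.4 GB | Apache-2.0 | Hugging Face |
| SAM-HQ | High-quality SAM | 2.6 GB | Apache-2.0 | Hugging Face |
| MobileSAM | Lightweight SAM | 40 MB | Apache-2.0 | Hugging Face |
| SegFormer-B5 | Semantic (ADE20K) | 330 MB | NVIDIA Source Code License (non-commercial) | Hugging Face |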

Depth Estimation

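| Model | Type | Size | License | Link |
|---|---|---|---|---|
| Depth Anything v2 | Monocular depth | 97 MB (small) | Apache-2.0 | Hugging Face |
| MiDaS v3 | Monocular depth | 400 MB | MIT | Hugging Face |
| ZoeDepth | Metric depth | 340 MB | MIT | Hugging Face |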
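
The table separates monocular (relative) depth from metric depth. A relative depth map has no physical unit, so it is usually rescaled before display. The sketch below assumes the Hugging Face `transformers` library and the checkpoint id `depth-anything/Depth-Anything-V2-Small-hf`, neither of which is named on this page. `estimate_depth` and `normalize_depth` are hypothetical helper names.

```python
from typing import List


def normalize_depth(depth: List[List[float]]) -> List[List[int]]:
    """Linearly rescale a relative depth map to the 0..255 range for display."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no depth contrast; render it as all zeros.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth]


def estimate_depth(image_path: str) -> List[List[float]]:
    """Run a depth-estimation pipeline on one image (downloads weights on first use)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    # Assumption: this checkpoint id corresponds to the "small" row of the table.
    estimator = pipeline(
        "depth-estimation",
        model="depth-anything/Depth-Anything-V2-Small-hf",
    )
    result = estimator(image_path)
    # "predicted_depth" is a tensor of raw values; convert it to nested lists.
    return result["predicted_depth"].squeeze().tolist()
```

A call such as `normalize_depth(estimate_depth("photo.jpg"))`, where `photo.jpg` is a placeholder path, would yield a grayscale-ready map. The rescaling is only for visualization and does not turn relative depth into metric distances.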

OCR & Document

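| Model | Task | Size | License | Link |
|---|---|---|---|---|
| Tesseract.js | OCR (100+ languages) | 15 MB + language data | Apache-2.0 | tesseract.projectnaptha.com |
| TrOCR | Handwriting recognition | 330 MB | MIT | Hugging Face |
| LayoutLM v3 | Document understanding | 350 MB | CC-BY-NC-SA | Hugging Face |
| Donut | Document parsing (OCR-free) | 690 MB | MIT | Hugging Face |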

Face & Body

| Model | Task | Runs in browser | Size | Link |
|---|---|---|---|---|
| MediaPipe Face | Face detection + mesh | Yes | 2 MB | ai.google.dev |
| MediaPipe Pose | Body pose estimation | Yes | 5 MB | ai.google.dev |
| MediaPipe Hands | Hand tracking | Yes | 4 MB | ai.google.dev |
| BlazeFace | Fast face detection | Yes | 400 KB | github.com/nicedoc |