New to AI? Start here.
Never trained an AI model before? No problem. This page explains the whole idea in plain language — what labeling is, what training is, and exactly how to start. No jargon.
STEP ONE — THE IDEA
What is labeling?
Think of teaching a child.
Imagine showing a child photos and saying “this is a car,” “this is a dog.” After enough examples, the child just knows. Labeling is exactly that: you show the computer images and point out what's in them.
In practice: you draw a box around each object in an image and give it a name (“car,” “person,” “helmet”). Do this for a batch of images and you’ve built a labeled dataset.
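Curious what a label looks like as data? Here is a small Python sketch. It assumes the text format widely used for YOLO datasets, where each box becomes one line: a class number, then the box center and size as fractions of the image. Whether Annotation Studio exports exactly this format is an assumption, and the names `BoxLabel` and `to_yolo_line` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BoxLabel:
    """One drawn box: a class name plus pixel coordinates of its corners."""
    name: str
    x_min: float  # left edge, in pixels
    y_min: float  # top edge, in pixels
    x_max: float  # right edge, in pixels
    y_max: float  # bottom edge, in pixels


def to_yolo_line(label, class_names, image_width, image_height):
    """Convert one box to a 'class x_center y_center width height' line.

    All four box numbers are divided by the image size, so they fall
    between 0 and 1 regardless of the image resolution.
    """
    class_id = class_names.index(label.name)
    x_center = (label.x_min + label.x_max) / 2 / image_width
    y_center = (label.y_min + label.y_max) / 2 / image_height
    width = (label.x_max - label.x_min) / image_width
    height = (label.y_max - label.y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a "person" box drawn on a 640x480 photo.
class_names = ["car", "person", "helmet"]
label = BoxLabel("person", 100, 50, 300, 450)
line = to_yolo_line(label, class_names, image_width=640, image_height=480)
print(line)  # 1 0.312500 0.520833 0.312500 0.833333
```

You never have to write these lines yourself; the point is only that a "labeled dataset" is images plus small text records like this one.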
The tool for this: Annotation Studio
STEP TWO — THE IDEA
What is training?
Now the child learns on its own.
After the child has seen thousands of labeled examples, it can recognize a car it has never seen before. Training is when the computer studies your labeled images and learns the patterns — so it can find those objects in brand-new images by itself.
In practice: you feed your labeled dataset to the trainer, it runs for a while, and out comes a “model” — a file that can detect your objects automatically.
The tool for this: Model Training
THE BIG PICTURE
The whole journey, in 4 steps
Collect images
Gather photos of the things you want the AI to recognize.
Label them
Draw boxes and name the objects. Use Annotation Studio.
Train a model
Let the computer learn from your labels. Use Model Training.
Use your model
Your trained model now detects objects on its own. Deploy it anywhere.
GOING DEEPER
The settings that actually matter
When you train, a few numbers control how the learning happens. You don't have to set them — Model Training picks smart defaults — but here's what they mean so the screen makes sense.
Epochs
how many times
One epoch = the model looks at your entire set of images once. It needs several passes to really learn, like re-reading a book. Too few epochs and it barely learns; too many and it just memorizes your exact images and gets worse on new ones (that's called overfitting).
Batch size
how many at once
How many images the model looks at before it updates what it has learned. Bigger batches train more smoothly but need more GPU memory (VRAM). Model Training caps this to fit your card automatically.
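To see how epochs and batch size work together, here is a rough Python sketch of the arithmetic. The numbers (1,000 images, batch size 16, 50 epochs) are invented for illustration, not Model Training's defaults, and the function name `training_updates` is hypothetical.

```python
import math


def training_updates(num_images, batch_size, epochs):
    """Return (updates per epoch, total updates) for a training run.

    The model updates once per batch, so one pass over the data takes
    ceil(num_images / batch_size) updates; the last batch may be smaller.
    """
    updates_per_epoch = math.ceil(num_images / batch_size)
    return updates_per_epoch, updates_per_epoch * epochs


per_epoch, total = training_updates(num_images=1000, batch_size=16, epochs=50)
print(per_epoch, total)  # 63 3150
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason the two settings are usually considered together.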
Image size
how much detail
Every image is resized to a square (commonly 640×640) before training. Bigger means the model sees finer detail, but it trains slower and needs more memory.
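One common way to fit a photo into a square is to shrink it so the longer side matches the target, then pad the shorter side. The Python sketch below works through that arithmetic; that Model Training resizes this way is an assumption, and `fit_to_square` is a made-up name.

```python
def fit_to_square(width, height, target=640):
    """Scale an image to fit inside a target x target square.

    Keeps the original shape (aspect ratio) and reports how many pixels
    of padding are needed on each axis to fill the square.
    """
    scale = target / max(width, height)
    new_width = round(width * scale)
    new_height = round(height * scale)
    return new_width, new_height, target - new_width, target - new_height


# A 1920x1080 photo fitted into 640x640: shrinks to 640x360, padded by 280 rows.
print(fit_to_square(1920, 1080))  # (640, 360, 0, 280)

# Doubling the image size quadruples the number of pixels per image.
pixel_ratio = (1280 * 1280) / (640 * 640)
print(pixel_ratio)  # 4.0
```

That fourfold jump in pixels is why a larger image size costs noticeably more memory and time.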
Don't want to think about any of this? You don't have to. Hit Start with the defaults — they're tuned to just work.
CHOOSING YOUR MODEL
Which model should you train?
Why only YOLO?
Model Training uses YOLO — and only YOLO — because it's one of the best object-detection model families ever made: fast, accurate, and proven on real projects. We include every YOLO version so you can pick the one that fits you.
The versions
YOLO keeps improving. Each newer version is generally a better balance of speed and accuracy than the one before it.
The stable classic — widely used and reliable.
Newer refinements — better accuracy, faster detection.
A refined all-rounder — efficient and accurate.
The newest generation — our top pick for the best results today.
Our recommendation: go with v11 or YOLO26 — they give the best results for most people.
The sizes — Nano to Extra-Large
Each version also comes in sizes. Same brain, different capacity: bigger sizes are more accurate but need more powerful hardware and train slower.
| Size | Speed | Accuracy | Best for |
|---|---|---|---|
| Nano (n) | Fastest | Lowest | Real-time speed · small images (~320px) |
| Small (s) | Fast | Good | Fast detection · ~320–640px |
| Medium (m) | Balanced | High | Balanced all-rounder · ~640px |
| Large (l) | Slower | Higher | Small objects & detail · ~640–1280px |
| Extra-Large (x) | Slowest | Highest | Max accuracy · fine detail · ~1280px |
How do I pick the size for my dataset?
There's no single right answer — it depends on four things. The biggest one is how much data you actually have.
How many images do you have?
Hundreds → Nano or Small. A few thousand → Medium. Tens of thousands → Large or Extra-Large. A big model trained on a tiny dataset just memorizes it (overfitting), so small data wants a small model.
How hard is the task?
A few clear, different classes (car vs. person) → a small size handles it fine. Many similar classes, or tiny objects in the frame → a bigger size sees more.
How strong is your GPU?
Low VRAM → smaller model + smaller image size. Strong GPU → you can afford a bigger model and higher resolution.
Do you need speed?
Real-time or many images per second → Nano / Small. Running offline where accuracy matters more → size up.
Rule of thumb: more data + harder task + stronger GPU → go bigger. When in doubt, start at Medium + v11, test it, and only move up a size if you need more accuracy.
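The dataset-size and speed guidance above can be sketched as a tiny Python helper. The exact cutoffs (1,000 and 10,000 images) are one reading of "hundreds," "a few thousand," and "tens of thousands," not official thresholds, and `suggest_sizes` is a hypothetical name. Task difficulty and GPU strength are left out to keep it short.

```python
def suggest_sizes(num_images, need_realtime=False):
    """Suggest YOLO size letters from dataset size and speed needs.

    Returns a list of size codes: n (Nano), s (Small), m (Medium),
    l (Large), x (Extra-Large). Cutoffs are illustrative assumptions.
    """
    if need_realtime:
        return ["n", "s"]  # speed comes first: stay small
    if num_images < 1_000:
        return ["n", "s"]  # hundreds of images
    if num_images < 10_000:
        return ["m"]       # a few thousand images
    return ["l", "x"]      # tens of thousands of images


print(suggest_sizes(500))                        # ['n', 's']
print(suggest_sizes(3_000))                      # ['m']
print(suggest_sizes(50_000))                     # ['l', 'x']
print(suggest_sizes(50_000, need_realtime=True)) # ['n', 's']
```

Treat the result as a starting point: train, test on new images, and move up a size only if accuracy falls short.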
READY?
How to start
Two tools, used in order. Both run on your own machine.
Annotation Studio
Label your images. Draw boxes, name objects, export a clean dataset.
Open Annotation Studio
Model Training
Feed your labeled data in and train a working model — no terminal, no Python.
Open Model Training
That’s the whole idea.
Label → Train → Use. Start with the first tool and follow the steps inside it.