There are now more than a hundred capable open AI models. Most of the time you only need to know four. Here's what each is actually good at — and how to pick.
If you've browsed the Ollama model library or Hugging Face recently, you already know the problem: there are too many models, most of them sound alike, and nobody explains which one you should actually use.
The honest answer is that for 95% of tasks you only need to know four families. This is a quick guide to when each one shines.
Think in two dimensions first: size, then specialty
Model sizes are usually written in billions of parameters — 1B, 3B, 7B, 14B, 70B and so on. Bigger models are smarter but slower and heavier. The useful tiers for most people:
- 1–3B: tiny, near-instant, fine for simple summarisation and extraction. Runs on almost anything.
- 7–9B: the sweet spot. Genuinely capable. Runs comfortably on 16 GB laptops.
- 13–14B: smarter, noticeably slower. Needs a decent machine.
- 70B+: approaching frontier quality but needs a GPU or a beefy Mac Studio.
Pick your size first based on hardware, then pick a family based on what you're doing.
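A quick way to sanity-check the hardware side: a model's memory footprint is roughly parameters times bits per weight, plus some runtime overhead. A back-of-the-envelope sketch — the 20% overhead figure is a rule of thumb, not a spec:

```python
def approx_ram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough RAM needed to run a model: parameters x bits per weight,
    plus ~20% for the KV cache and runtime (rule of thumb, not exact)."""
    return params_billion * 1e9 * bits / 8 / 1e9 * overhead

# A 7B model at 4-bit fits comfortably in 16 GB; a 70B model does not.
print(f"{approx_ram_gb(7):.1f} GB")   # ~4.2 GB
print(f"{approx_ram_gb(70):.1f} GB")  # ~42.0 GB
```

This is why the 7–9B tier is the laptop sweet spot: even with overhead, it leaves plenty of headroom on a 16 GB machine.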
Llama 3.3 — the reliable generalist
Meta's Llama family is the default for a reason. The current generation (3.3 at 70B, 3.1 at 8B, 3.2 at 1–3B) is well-balanced across reasoning, writing, summarisation, and instruction-following. It's a safe first choice when you don't know what you need — it's rarely the single best at anything, but it's rarely bad at anything either.
Best for: general chat, summarisation, writing, most business tasks.
Qwen 2.5 — multilingual and code-strong
Alibaba's Qwen family has quietly become the one to beat on a lot of benchmarks. It handles non-English languages noticeably better than Llama and ships in specialised variants — Qwen 2.5-Coder is one of the strongest open code models available, and Qwen 2.5-VL handles images as well as text.
Best for: code, multilingual work, structured output, image understanding.
Mistral — efficient and focused
Mistral's models (Mistral 7B, Mixtral, and the newer Ministral family) punch above their weight on reasoning tasks and are unusually token-efficient: they tend to answer concisely rather than padding their responses.
Best for: reasoning, analysis, tasks where you want fast, concise answers.
Phi — tiny but surprisingly capable
Microsoft's Phi models are small (often under 4B parameters) but trained on extremely high-quality data. The result is a model that runs on modest hardware while handling complex instructions better than you'd expect from its size.
Best for: edge devices, background tasks, situations where you need good-enough intelligence with minimal resources.
Honourable mentions
- Gemma (Google) — clean, safety-tuned, pairs well with Google infrastructure.
- DeepSeek — strong reasoning variants, particularly good at maths and code.
- Command R (Cohere) — built specifically for retrieval-augmented workflows.
- LLaVA and Qwen-VL — when you need a model that can read images.
Picking by use case
"I want to chat with a local AI about documents and general questions"
Llama 3.1 8B (the 3.3 generation ships only at 70B). It's the most forgiving starting point.
"I want AI help with code"
Qwen 2.5-Coder 7B or 14B. It outperforms most general models on programming tasks.
"I need reliable JSON output from the model"
Qwen 2.5 or Mistral. Both handle structured output more predictably than Llama.
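Whichever model you pick, don't trust the output blindly — validate it. A minimal sketch of defensive parsing; the fence-stripping heuristic is an assumption about common chat-model behaviour, not any library's API:

```python
import json

def extract_json(model_output: str):
    """Parse JSON from a model reply, tolerating the markdown code
    fences that chat models often wrap structured output in."""
    text = model_output.strip()
    if text.startswith("```"):
        # Drop the opening fence (with its optional "json" tag) and the closing fence.
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)  # raises ValueError on malformed output

reply = '```json\n{"name": "Ada", "score": 9}\n```'
print(extract_json(reply))  # {'name': 'Ada', 'score': 9}
```

Failing loudly on malformed output is the point: retrying the prompt beats silently processing garbage.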
"I want something that works on weak hardware"
Phi-3.5 Mini or Llama 3.2 3B. You'll be surprised how much they can do.
"I want the most capable thing I can run on a good Mac"
Llama 3.3 70B (quantised to 4-bit) or Qwen 2.5 72B. Genuine frontier-adjacent quality on a local machine.
A note on quantisation
When you see labels like Q4, Q5, or Q8 on a model download, that refers to quantisation — how compressed the weights are. Q4 takes a quarter of the space of the full-precision model with a minor quality drop; Q8 is near-identical quality but larger. For most people, Q4 is the right default: dramatically less RAM used, barely any quality loss.
The honest recommendation
Start with Llama 3.1 8B (Llama 3.3 is 70B-only). Try one task. If something feels off, try Qwen. You'll rarely need to look further.
The ecosystem moves fast enough that this list will be out of date by year's end — but the decision framework won't. Size first, family second, quantisation third. Everything else is details.