Local models are AI language models that run entirely on your own hardware (a laptop, desktop, or server) rather than through cloud APIs. They offer privacy, offline access, and zero per-token costs, though they require capable hardware and typically trail the largest cloud models in quality.

Running locally means no API keys, no internet connection, and no data leaving your computer.
| Cloud APIs | Local Models |
|---|---|
| Best quality models | Good but smaller models |
| Pay per token | Free after download |
| Requires internet | Works offline |
| Data sent to servers | Data stays on your machine |
| No hardware requirements | Needs capable GPU/CPU |
| Tool | Platform | Notes |
|---|---|---|
| Ollama | Mac, Linux, Windows | Simplest to start with |
| LM Studio | Mac, Windows | Visual interface |
| llama.cpp | All | Maximum performance |
```shell
ollama pull llama3
ollama run llama3
```
That's it. You're running AI locally.
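Beyond the interactive chat, Ollama also serves a local HTTP API (on port 11434 by default), so other programs on your machine can use the model. A minimal sketch in Python, assuming the Ollama server is running and `llama3` has been pulled as above:

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Construct a POST request for Ollama's local /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON reply instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local model and return its text response."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(ask("llama3", "Summarize why local models matter."))
```

Because everything happens over localhost, the prompt and response never leave your machine.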
| Model Size | RAM Needed | GPU |
|---|---|---|
| 7B parameters | 8GB+ | Optional |
| 13B parameters | 16GB+ | Recommended |
| 70B parameters | 64GB+ | Required |
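The RAM figures above mostly track the size of the model weights. As a rough rule of thumb (an assumption for illustration, not an official formula), a quantized model needs about parameters × bits-per-weight ÷ 8 bytes for the weights alone, with extra headroom for the KV cache and the operating system:

```python
def model_memory_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough weight-only footprint of a quantized model, in GB.

    Real usage runs higher (KV cache, activations, OS overhead),
    which is why the table above builds in a margin.
    """
    return params_billion * bits_per_weight / 8

# A 7B model at common 4-bit quantization: ~3.5 GB of weights,
# fitting comfortably inside the table's 8GB+ row.
print(model_memory_gb(7))      # 3.5
# The same model at full 16-bit precision: ~14 GB.
print(model_memory_gb(7, 16))  # 14.0
```

This also shows why quantization matters so much for local use: dropping from 16-bit to 4-bit weights cuts the footprint by roughly 4×.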
Use local models when:

- Privacy matters and data must stay on your machine
- You need to work offline
- You're prototyping or learning and want zero per-token costs

Use cloud APIs when:

- You need the best-quality models
- You don't have a capable GPU or enough RAM
- The work is serious enough that output quality outweighs cost
Local models work well for quick prototyping, learning, and tasks where privacy matters. For serious development work, cloud models still lead in quality — but the gap is closing rapidly.