Local Models

Local models are AI language models that run entirely on your own hardware — your laptop, desktop, or server — rather than through cloud APIs. They offer privacy, offline access, and zero API costs, though they require capable hardware and typically perform below the largest cloud models.

Example

You install Ollama on your MacBook and download Llama 3. Now you can generate code, ask questions, and experiment with AI without sending any data to external servers — and without paying per-token API fees.

Local models put AI directly on your machine. No API keys, no internet required, no data leaving your computer.

Why Run Models Locally?

Cloud APIs                  Local Models
Best quality models         Good but smaller models
Pay per token               Free after download
Requires internet           Works offline
Data sent to servers        Data stays on your machine
No hardware requirements    Needs capable GPU/CPU
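The "pay per token" row is easy to quantify. A rough sketch of the ongoing cost of a metered API versus a one-time local download (the price and usage figures are illustrative assumptions, not real vendor rates):

```python
# Rough comparison of cloud per-token cost vs. a local model.
# The price below is an illustrative assumption, not a real vendor rate.

def cloud_cost(tokens: int, price_per_million: float) -> float:
    """Ongoing cost of generating `tokens` tokens via a metered API."""
    return tokens / 1_000_000 * price_per_million

# Assume a hypothetical $10 per million tokens and heavy daily use:
tokens_per_day = 200_000
daily = cloud_cost(tokens_per_day, price_per_million=10.0)
yearly = daily * 365

print(f"Cloud: ${daily:.2f}/day, ${yearly:.2f}/year")
# A local model costs nothing per token once downloaded; the "cost"
# is hardware you already own plus electricity.
```

At steady daily use, the metered cost compounds while the local cost stays flat, which is why frequent experimentation favors local models.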

Getting Started

Tools for Running Local Models

Tool        Platform               Notes
Ollama      Mac, Linux, Windows    Simplest to start with
LM Studio   Mac, Windows           Visual interface
llama.cpp   All                    Maximum performance

Quick Start with Ollama

ollama pull llama3
ollama run llama3

That's it. You're running AI locally.
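Beyond the interactive prompt, Ollama also serves a local HTTP API on port 11434, so you can call the model from your own code. A minimal sketch, assuming `ollama serve` is running and llama3 has been pulled:

```python
import json
import urllib.request

# Ollama's local HTTP API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Requires a running Ollama server with llama3 pulled:
# with urllib.request.urlopen(build_request("llama3", "Explain recursion briefly.")) as resp:
#     print(json.loads(resp.read())["response"])
```

Nothing in the request leaves your machine: the "API call" is a loopback connection to a server you run yourself.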

Hardware Requirements

Model Size       RAM Needed   GPU
7B parameters    8GB+         Optional
13B parameters   16GB+        Recommended
70B parameters   64GB+        Required
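These figures follow roughly from the parameter count times the bytes per parameter at a given quantization level. A back-of-the-envelope estimate (the overhead factor for runtime buffers is an assumption):

```python
def estimated_ram_gb(params_billion: float,
                     bytes_per_param: float = 0.5,
                     overhead: float = 1.2) -> float:
    """Rough RAM needed to hold a model's weights in memory.

    bytes_per_param: ~0.5 for 4-bit quantization, 2.0 for 16-bit weights.
    overhead: fudge factor for KV cache and runtime buffers (an assumption).
    """
    return params_billion * bytes_per_param * overhead

# A 4-bit-quantized 7B model: about 7 * 0.5 * 1.2 ≈ 4.2 GB,
# which is why it fits comfortably in 8GB of RAM.
print(round(estimated_ram_gb(7), 1))
print(round(estimated_ram_gb(70), 1))
```

The same arithmetic explains the 70B row: even aggressively quantized, the weights alone run to tens of gigabytes.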

When to Use Local vs Cloud

Use local models when:

  • Privacy is critical (sensitive code, private data)
  • You want zero ongoing costs
  • You're experimenting frequently
  • Internet is unreliable

Use cloud APIs when:

  • You need the best quality output
  • You're tackling complex reasoning tasks
  • You're building production applications
  • You want the latest model capabilities
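The two checklists above can be folded into a simple rule of thumb. A toy sketch (the function name and weighting are illustrative, not a prescribed policy):

```python
def choose_backend(private_data: bool,
                   needs_best_quality: bool,
                   offline: bool) -> str:
    """Toy decision rule mirroring the checklists above (illustrative only)."""
    # Privacy and connectivity are hard constraints: cloud is ruled out.
    if private_data or offline:
        return "local"
    # With no hard constraint, quality requirements decide.
    if needs_best_quality:
        return "cloud"
    # Otherwise default to free local experimentation.
    return "local"

print(choose_backend(private_data=True, needs_best_quality=True, offline=False))
```

Note that privacy wins even when quality matters: if the data can't leave your machine, model quality is moot.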

Local Models for Vibe Coding

Local models work well for quick prototyping, learning, and tasks where privacy matters. For serious development work, cloud models still lead in quality — but the gap is closing rapidly.