Inference is the process by which an AI model generates output from input: the moment when a trained model actually produces code, text, or predictions. When you send a prompt to Claude or ChatGPT, inference is what happens on the server to produce the response.
In other words, inference is the production phase of AI, when a trained model does actual work. Understanding it helps explain why AI responses take time, cost money, and vary in speed.
Training: the up-front phase where the model learns, adjusting its weights across a large dataset. It is compute-intensive and done once (or occasionally), typically on large GPU clusters.
Inference: the serving phase where the learned weights are frozen and the model simply runs forward on each new input. It happens on every single request.
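The contrast can be sketched with a toy one-parameter model. Everything here is illustrative: a single weight stands in for the billions of parameters in a real model, and gradient descent on squared error stands in for real training.

```python
# Toy contrast between training and inference.
# Illustrative only: one weight standing in for billions of parameters.

def train(pairs, lr=0.1, steps=100):
    """Training: repeatedly adjust the weight to reduce error (expensive, done once)."""
    w = 0.0
    for _ in range(steps):
        for x, y in pairs:
            w -= lr * 2 * (w * x - y) * x  # gradient step on squared error
    return w

def infer(w, x):
    """Inference: a single forward pass with frozen weights (cheap, done per request)."""
    return w * x

w = train([(1.0, 2.0), (2.0, 4.0)])  # learns roughly y = 2x
print(round(infer(w, 3.0), 2))       # prints 6.0
```

Training loops over the data many times to find the weight; inference is one multiplication. The same asymmetry holds at scale, which is why training a frontier model costs millions while a single inference request costs a fraction of a cent.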
Speed: response time depends on model size, output length (models generate one token at a time), and how loaded the provider's servers are.
Cost: cloud providers typically bill per token, for both the input you send and the output the model generates, so long prompts and long responses cost more.
Quality: larger models generally produce better output, but they are slower and more expensive to run, so providers and tools trade these factors off.
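The per-token billing and token-by-token generation described above lend themselves to back-of-envelope math. The prices and throughput below are invented placeholders, not any provider's actual rates:

```python
# Back-of-envelope inference economics. All numbers are hypothetical
# placeholders, not real pricing or measured throughput.

def cost_usd(input_tokens, output_tokens, in_price_per_m=3.0, out_price_per_m=15.0):
    """Per-token billing: input and output tokens priced per million."""
    return input_tokens / 1e6 * in_price_per_m + output_tokens / 1e6 * out_price_per_m

def latency_seconds(output_tokens, tokens_per_second=50):
    """Generation time scales with output length, one token at a time."""
    return output_tokens / tokens_per_second

print(cost_usd(2000, 800))       # one request at the assumed rates: 0.018 (USD)
print(latency_seconds(800))      # time to stream 800 tokens: 16.0 (seconds)
```

This is why a long coding session in an AI tool adds up: hundreds of requests, each carrying a large context as input tokens, each billed and each taking seconds to stream.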
Cloud inference (most common): the model runs on the provider's servers; you send a request over the internet and pay per use. You get access to the largest models with no hardware to manage.
Local inference: the model runs on your own machine, often via tools like Ollama. It is private and free per request, but limited to smaller models by your hardware's memory and GPU.
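In both setups, a request is typically a small JSON payload sent to a server over HTTP. The sketch below shows the general shape only; the model name and field names are illustrative assumptions, not any specific provider's documented API:

```python
import json

# Hedged sketch: the rough JSON shape of a chat-style inference request.
# Field names and model name are illustrative, not a real provider's schema.
request = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Write a haiku about inference."}],
    "max_tokens": 256,
}
body = json.dumps(request)

# Cloud inference: this body would be POSTed to the provider's HTTPS endpoint.
# Local inference: the same kind of body goes to a server on your own machine
# (e.g. a localhost port), and the model runs on your hardware instead.
print(body)
```

The only real difference between cloud and local, from the caller's point of view, is where that request is sent and who owns the hardware that answers it.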
Most vibe coders use cloud inference through tools like Cursor, benefiting from powerful models without managing infrastructure.