How AI Models Are Trained: 3 Phases Explained
Building an LLM is less like programming a computer and more like educating a student. It happens in three broad phases, each serving a different purpose.
Phase 1: Read Everything (Pre-training)
The model starts by reading an enormous amount of text — books, articles, websites, code, and more. During this phase, it doesn’t learn to answer questions. It learns language itself: grammar, facts, relationships between ideas, writing styles, and even some reasoning patterns.
How? Through the same next-word prediction covered in the previous snack. The model reads a sentence, predicts what word comes next, checks whether it was right, and adjusts its parameters. Repeat this trillions of times and it develops a rich understanding of how language works.
Training example:
Input: "Water freezes at zero degrees"
Model predicts next word: "Celsius" ✓ (adjusts parameters to reinforce this)
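The predict-check-adjust loop can be sketched with a toy example. This is not a real neural network — just a count-based predictor (a hypothetical `ToyPredictor` class) that "reinforces" whichever word actually followed, to make the mechanism concrete:

```python
from collections import defaultdict, Counter

class ToyPredictor:
    """Toy stand-in for a language model: tracks which word
    tends to follow each word, instead of learned parameters."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, prev_word):
        # Guess the most frequently observed next word, if any.
        following = self.counts[prev_word]
        return following.most_common(1)[0][0] if following else None

    def train_step(self, prev_word, actual_next):
        guess = self.predict(prev_word)
        # "Adjust parameters": reinforce the pair that actually occurred.
        self.counts[prev_word][actual_next] += 1
        return guess == actual_next

model = ToyPredictor()
text = "water freezes at zero degrees celsius".split()
for _ in range(3):  # real training repeats this trillions of times
    for prev, nxt in zip(text, text[1:]):
        model.train_step(prev, nxt)

print(model.predict("degrees"))  # → celsius
```

A real model adjusts billions of numeric weights via gradient descent rather than counting word pairs, but the loop — predict, compare against the actual next word, nudge toward the right answer — is the same shape.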
After pre-training, the model is impressive but unrefined — like a student who has read the entire library but hasn’t learned how to have a conversation.
Phase 2: Learn to Follow Instructions (Fine-tuning)
Next, humans create examples of good conversations: questions paired with high-quality answers. The model trains on these examples, learning to respond in a helpful, structured way rather than simply continuing text.
This is what transforms a text predictor into something that feels like an assistant. The model learns formats (“here’s a numbered list”), boundaries (“I shouldn’t help with that”), and style (“be concise and clear”).
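A minimal sketch of what this training data can look like. The template and field names here are assumptions for illustration — real labs use their own formats — but the idea is the same: each example pairs a prompt with a demonstration of a good answer, and the model learns to produce the "Assistant" part:

```python
# Hypothetical instruction-tuning examples: prompts paired with
# high-quality answers written by humans.
examples = [
    {"prompt": "List two primary colors.",
     "response": "1. Red\n2. Blue"},
    {"prompt": "Explain photosynthesis in one sentence.",
     "response": "Plants convert sunlight, water, and CO2 into sugar and oxygen."},
]

def format_example(ex):
    # Wrap each pair in a conversational template. The model is
    # trained to continue the text after "Assistant:", absorbing
    # the format, boundaries, and style of the demonstrations.
    return f"User: {ex['prompt']}\nAssistant: {ex['response']}"

training_texts = [format_example(ex) for ex in examples]
print(training_texts[0])
```

Notice that this is still next-word prediction — only now the text being predicted is a well-formed conversation instead of raw web pages.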
Phase 3: Learn from Feedback
Finally, human reviewers rate the model’s responses — thumbs up for helpful answers, thumbs down for unhelpful or harmful ones. The model uses these ratings to adjust its behavior, gradually learning to prefer responses that humans find genuinely useful.
This phase is what separates modern AI assistants from raw text generators. It aligns the model with human preferences rather than just statistical patterns.
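The feedback loop can be caricatured in a few lines. This is a deliberately simplified sketch, not the actual math behind techniques like RLHF: thumbs-up and thumbs-down ratings nudge a preference score up or down, and the model ends up favoring the behavior humans rated highly:

```python
# Toy preference scores for two styles of response
# (names are illustrative, not real model internals).
preferences = {"helpful_answer": 0.0, "evasive_answer": 0.0}

# Human ratings: +1 for thumbs up, -1 for thumbs down.
feedback = [
    ("helpful_answer", +1),
    ("evasive_answer", -1),
    ("helpful_answer", +1),
    ("helpful_answer", +1),
]

LEARNING_RATE = 0.5
for response, rating in feedback:
    # Each rating nudges the score toward or away from that behavior.
    preferences[response] += LEARNING_RATE * rating

best = max(preferences, key=preferences.get)
print(best)  # → helpful_answer
```

In practice the "score" lives inside the model's weights and is updated with reinforcement-learning algorithms, but the direction of the update is exactly this: rated-up behavior becomes more likely, rated-down behavior less.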
A New Frontier: Learning to Think
Recent models go beyond these three phases. Starting in late 2024, models like OpenAI’s o-series and Anthropic’s Claude were trained to reason step by step before answering — “thinking” internally and considering multiple approaches before committing to a response.
This shift from “predict faster” to “think deeper” has produced dramatic improvements in math, science, and complex problem-solving.
Now that you know how models are built, let’s look at the raw material they work with: tokens.