What Are AI Tokens? How LLMs Read Your Words


AI models don’t read words the way you do. Before processing your message, the model breaks your text into smaller pieces called tokens. Understanding tokens helps you understand costs, limits, and some of the quirks you’ll notice when using AI.

What Is a Token?

A token is a chunk of text — sometimes a whole word, sometimes part of a word, sometimes just a punctuation mark. The model’s tokenizer splits your input into these pieces before processing begins.

"I don't like flying."

Tokens: ["I", " don", "'t", " like", " flying", "."]

Notice that “don’t” gets split into two tokens. Common words like “the” or “hello” are usually single tokens, while longer or rarer words get broken into pieces: “unbelievable” might become [“un”, “believ”, “able”].
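To make the splitting concrete, here is a minimal sketch of greedy longest-match tokenization. The vocabulary below is hand-picked for this one sentence; real tokenizers learn tens of thousands of pieces from data (usually with algorithms like byte-pair encoding), so actual splits will differ.

```python
# Toy vocabulary, chosen by hand to illustrate the idea.
TOY_VOCAB = {"I", " don", "'t", " like", " flying", ".", "un", "believ", "able"}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against TOY_VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Character not covered by the vocabulary: emit it on its own.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("I don't like flying."))
# ['I', ' don', "'t", ' like', ' flying', '.']
print(tokenize("unbelievable"))
# ['un', 'believ', 'able']
```

Note how the sub-word pieces "un", "believ", and "able" cover a rare word the vocabulary never stored whole.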

Why Not Just Use Whole Words?

If models used complete words, they’d need a vocabulary entry for every word in every language — plus every typo, technical term, and made-up word. That vocabulary would be impossibly large.

Tokenization solves this by working with sub-word pieces. A typical tokenizer's vocabulary holds roughly 30,000 to 100,000 pieces that can be combined to represent any word. It's like how 26 letters can spell any English word, except tokens are larger chunks that capture meaningful patterns.

Why Tokens Matter to You

Cost. AI providers charge per token — both for the tokens you send (input) and the tokens the model generates (output). Here’s a quick reference:

Words     Approximate Tokens    Example
100       ~130                  A short email
500       ~650                  A blog post intro
1,000     ~1,300                A full article
5,000     ~6,500                A long report
75,000    ~100,000              A novel

A rough rule of thumb for English: one token is about three-quarters of a word.
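That rule of thumb is easy to turn into a back-of-the-envelope estimator. This is only the English-language approximation from the table above, not a real tokenizer:

```python
def estimate_tokens(word_count: int) -> int:
    """Rough English-only estimate: one token per ~0.75 words."""
    return round(word_count / 0.75)

print(estimate_tokens(100))    # 133 -- close to the ~130 in the table
print(estimate_tokens(1000))   # 1333
```

For budgeting API costs, an estimate like this is fine for order-of-magnitude planning; for exact billing, count with the provider's actual tokenizer.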

Limits. Models have a maximum number of tokens they can process at once (more on that in the next snack). Knowing your token count helps you stay within those limits.

Quirks. Tokenization explains some odd AI behaviors. Models can struggle with counting letters in words (because they see tokens, not individual characters) or simple arithmetic with digit-heavy numbers (because large numbers get split into multiple tokens).
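The letter-counting quirk follows directly from the token view. Suppose a tokenizer splits "strawberry" into two pieces (the split below is hypothetical; real splits vary by tokenizer). The model receives two opaque token IDs, so the individual characters are simply not part of its input:

```python
# Hypothetical token split for "strawberry" -- actual splits vary by tokenizer.
tokens = ["str", "awberry"]

# The model sees 2 token IDs, not 10 characters, so questions like
# "how many r's are in strawberry?" require recalling spellings
# rather than reading them off the input.
print(len(tokens))                   # 2 tokens
print(sum(len(t) for t in tokens))   # 10 characters, invisible at the token level
```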

A Note on Languages

The “three-quarters of a word” estimate applies to English. Other languages — especially those with non-Latin scripts like Chinese, Arabic, or Korean — often require more tokens per word, because most tokenizers were trained on English-heavy datasets.

Now that you know what tokens are, let’s explore the space where those tokens live: the context window.

Quick Quiz


What is a token in the context of AI models?