January 29, 2026
How Claude Actually Works
When you type something into Claude, what actually happens? The short answer is: your words get converted into numbers, those numbers pass through dozens of processing stages that transform them, and at the end the model predicts what word should come next. Then it repeats, one word at a time, until it's done responding.
The longer answer is more interesting. Let's walk through it.
From Words to Numbers and Back
The first thing that happens is tokenization. Your input gets split into tokens—usually whole words, but sometimes word fragments. "The flowers are finally blooming" becomes something like: ["The", " flowers", " are", " finally", " bloom", "ing"]. Each token gets assigned a number from the model's vocabulary.
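To make the idea concrete, here's a toy greedy longest-match tokenizer over a made-up six-entry vocabulary. Real vocabularies have tens of thousands of entries and are learned from data, and production tokenizers (like BPE variants) are more sophisticated, but the input/output shape is the same:

```python
# Tiny hypothetical vocabulary: token string -> token id.
VOCAB = {"The": 0, " flowers": 1, " are": 2, " finally": 3, " bloom": 4, "ing": 5}

def tokenize(text):
    """Greedy longest-match tokenization: repeatedly take the longest
    vocabulary entry that matches the start of the remaining text."""
    tokens = []
    while text:
        match = max((t for t in VOCAB if text.startswith(t)), key=len)
        tokens.append(match)
        text = text[len(match):]
    return tokens, [VOCAB[t] for t in tokens]

tokens, ids = tokenize("The flowers are finally blooming")
# tokens -> ["The", " flowers", " are", " finally", " bloom", "ing"]
# ids    -> [0, 1, 2, 3, 4, 5]
```

Note how "blooming" splits into " bloom" + "ing": the word isn't in the vocabulary, so the tokenizer falls back to fragments it does know.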
Those token numbers then get converted into vectors, which are lists of numbers that represent each token's meaning. Think of a vector as coordinates in a vast space where similar concepts are near each other. "flower" and "plant" would be closer together than "flower" and "concrete."
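That "nearness" is usually measured with cosine similarity. Here's a minimal sketch with hand-picked 4-dimensional vectors; real embeddings have thousands of dimensions and are learned during training, not chosen by hand:

```python
import math

# Hypothetical embeddings, invented for illustration.
embeddings = {
    "flower":   [0.9, 0.8, 0.1, 0.0],
    "plant":    [0.8, 0.9, 0.2, 0.1],
    "concrete": [0.0, 0.1, 0.9, 0.8],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "flower" sits closer to "plant" than to "concrete" in this space.
assert cosine(embeddings["flower"], embeddings["plant"]) > \
       cosine(embeddings["flower"], embeddings["concrete"])
```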
These vectors then pass through the model's layers. Claude has dozens of them stacked on top of each other. Each layer transforms the vectors, refining the model's understanding. Early layers might recognize basic grammar; middle layers might understand that "flowers" is the subject and "blooming" describes what they're doing; later layers build up to the full meaning and figure out how to respond.
After the final layer, the model takes the transformed vector for the last position and projects it onto its entire vocabulary—essentially asking "given everything I've processed, what word should come next?" It outputs a probability for every possible token, picks one, and that becomes the first word of the response. Then it feeds that word back in and repeats the whole process to generate the next word.
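The whole loop fits in a few lines. This sketch swaps the real stack of layers for a trivial stand-in function, but the structure (score every token, convert scores to probabilities, pick one, append, repeat) is the actual autoregressive recipe:

```python
import math, random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_model(ids):
    # Hypothetical stand-in for the full stack of layers: it strongly
    # favors the token whose id follows the last one, mod vocab size.
    logits = [0.0] * 6
    logits[(ids[-1] + 1) % 6] = 10.0
    return logits

def generate(prompt_ids, model, vocab_size, max_new_tokens=10):
    """The autoregressive loop: run the model over everything so far,
    turn the last position's scores into probabilities, sample a token,
    append it, and repeat with the longer sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(model(ids))
        next_id = random.choices(range(vocab_size), weights=probs)[0]
        ids.append(next_id)
    return ids

out = generate([0, 1], toy_model, vocab_size=6, max_new_tokens=5)
```

A real implementation also stops on an end-of-sequence token and caches earlier computation, but those are optimizations on top of this same loop.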
Inside a Layer: Attention and Processing
Each layer has two main parts that do different jobs.
The first is attention. This is how tokens share information with each other. When processing "blooming," the attention mechanism lets the model look back at "flowers" and recognize they're connected—the flowers are the thing that's blooming. It computes attention weights that determine how much each previous token influences the current one.
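Those weights come from scaled dot-product attention: score the current token's query vector against each earlier token's key vector, then softmax the scores. A minimal sketch with made-up 2-dimensional vectors (real models learn these and run many attention heads in parallel):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention for one position: dot the query with
    each earlier key, scale by sqrt(dimension), softmax into weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Hypothetical query for "blooming" and keys for the earlier tokens.
# The " flowers" key aligns with the query, so it gets most of the weight.
q = [1.0, 0.0]
keys = [[0.1, 0.9],   # "The"
        [3.0, 0.0],   # " flowers"
        [0.0, 0.2]]   # " are"
w = attention_weights(q, keys)
```

The weights sum to 1, and the model mixes the earlier tokens' value vectors in those proportions, which is how "blooming" ends up carrying information about "flowers".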
The model learns what to pay attention to through training. Over millions of examples, it figures out patterns like "verbs should attend to their subjects" or "pronouns should attend to what they refer to."
The second part is the feed-forward network (also called the MLP). After attention has let tokens share information, the feed-forward network processes each token's vector individually: transforming it, extracting patterns, doing the "thinking" work. It expands the vector into a much higher dimension, applies some math, then compresses it back down. This is where most of the model's knowledge lives.
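Here's that expand-then-compress step as code, with tiny hand-picked weights. Real models use learned weights, dimensions in the thousands, and smoother nonlinearities like GELU; plain ReLU stands in here:

```python
def feed_forward(x, W_up, b_up, W_down, b_down):
    """MLP sketch: project up to a higher dimension, apply a
    nonlinearity (ReLU here), then project back down."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W_up, b_up)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W_down, b_down)]

x = [1.0, 2.0]                                 # a 2-dim token vector
W_up = [[1, 0], [0, 1], [1, 1], [-1, 0]]       # expand 2 -> 4 dims
W_down = [[1, 0, 0, 0], [0, 1, 0, 0]]          # compress 4 -> 2 dims
out = feed_forward(x, W_up, [0.0] * 4, W_down, [0.0, 0.0])
```

In a production transformer the "up" dimension is typically about four times the model dimension, which is why most of the parameter count (and, plausibly, most of the stored knowledge) sits in these layers.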
Neurons, Features, and Superposition
Inside those feed-forward networks are neurons. Millions of them. Each neuron produces a single number (its activation) that varies depending on the input. You might imagine that "neuron 4,582 in layer 23" fires when it sees references to flowers, making it easy to interpret.
Unfortunately, it's not that clean.
The model has more concepts it needs to represent than it has neurons. So it cheats: it stores multiple features (meaningful concepts like "plant terminology" or "past tense verbs" or "questions about science") as overlapping patterns across the same neurons. This is called superposition.
Think of it like this: if you have 4 neurons, you might think you can only represent 4 things. But if you use combinations (e.g., neurons 1+2, neurons 2+3, neurons 1+3+4) you can represent far more. It's similar to how binary uses combinations of 0s and 1s to represent any number.
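Here's that counting trick as code. The feature names and neuron combinations are invented for illustration; the point is simply that distinct combinations let 4 neurons carry more than 4 features:

```python
# Sketch of superposition: each feature is a *combination* of neurons
# (a direction in neuron space), not a single dedicated neuron.
features = {
    "plant terminology": [1, 1, 0, 0],   # neurons 1+2
    "past tense verbs":  [0, 1, 1, 0],   # neurons 2+3
    "science questions": [1, 0, 1, 1],   # neurons 1+3+4
    "colors":            [0, 0, 1, 1],   # neurons 3+4
    "numbers":           [1, 1, 1, 0],   # neurons 1+2+3
}

def activation_pattern(active_features):
    """Sum the directions of the active features into one
    4-neuron activation pattern."""
    pattern = [0, 0, 0, 0]
    for name in active_features:
        for i, v in enumerate(features[name]):
            pattern[i] += v
    return pattern

# Five distinct features packed into only four neurons.
assert len(features) > 4
assert len({tuple(v) for v in features.values()}) == len(features)
```

The cost of the trick is exactly the interference described below: any one neuron now participates in several unrelated features at once.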
The downside: this makes interpretation hard. Neuron 4,582 doesn't mean "flowers"—it fires for some inscrutable combination of unrelated concepts that happen to share that direction in the model's internal space.
Untangling the Mess: Dictionary Learning
Researchers are working on tools to reverse-engineer superposition. The main approach is called dictionary learning, using something called a sparse autoencoder (SAE).
The idea: take the model's messy internal activations and decompose them into a larger set of cleaner features. You train the SAE to compress activations into a sparse code (mostly zeros, a few active elements) and then reconstruct the original. The "dictionary" it learns (the set of directions it uses) ideally corresponds to interpretable concepts.
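A minimal sketch of an SAE's forward pass and loss, with random (untrained) weights and made-up dimensions; real SAEs are trained on millions of activations, and the dictionary is often tens of times wider than the model dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_dict = 8, 32          # dictionary wider than the activation space
W_enc = rng.normal(0, 0.1, (d_model, d_dict))
W_dec = rng.normal(0, 0.1, (d_dict, d_model))
b_enc = np.zeros(d_dict)

def sae_forward(x):
    """Encode activations into a non-negative code, then reconstruct.
    During training, an L1 penalty pushes most code entries to zero,
    so each input activates only a few dictionary features."""
    code = np.maximum(0.0, x @ W_enc + b_enc)   # sparse feature activations
    recon = code @ W_dec                        # rebuilt from the dictionary
    return code, recon

def sae_loss(x, code, recon, l1_coeff=1e-3):
    # Reconstruction error plus a sparsity penalty on the code.
    return np.mean((recon - x) ** 2) + l1_coeff * np.mean(np.abs(code))

x = rng.normal(0, 1, (4, d_model))   # a fake batch of model activations
code, recon = sae_forward(x)
loss = sae_loss(x, code, recon)
```

Training minimizes that loss; the rows of `W_dec` are the learned "dictionary" directions, which researchers then inspect and label.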
Instead of "neuron 4,582 fired," you get "feature 12,847 (references to flowers) fired." That's a much more useful unit for understanding what the model is doing.
Circuits: How Components Wire Together
Beyond individual features, researchers study circuits, which are specific pathways through the model that implement particular behaviors. The idea is that neural networks aren't just inscrutable blobs; they contain identifiable subgraphs of connected components that perform coherent computations.
A famous example is the induction circuit, whose key component is an attention head called an induction head, and whose job is pattern completion. If you write "The flowers are blooming. The flowers are..." the model predicts "blooming" because:
- An attention head in an early layer marks each token with what came before it
- An attention head in a later layer searches for previous instances of the current pattern ("The flowers are"), finds what followed ("blooming"), and boosts that as the prediction
Two components, working together across layers, implementing a simple but powerful copying algorithm. That's a circuit.
Picking the Next Word: Sampling
After all that processing, the model outputs a probability distribution over its entire vocabulary: maybe 50,000+ possible next tokens. How does it pick one?
Greedy decoding just picks the single most probable token every time. It's deterministic but can be repetitive and boring.
Temperature reshapes the distribution before sampling. Higher temperature flattens it (more random, more creative), lower temperature sharpens it (more focused, more predictable).
Top-k sampling only considers the k most probable tokens (e.g., top 40) and samples from those.
Top-p (nucleus) sampling includes tokens until their cumulative probability reaches some threshold (e.g., 0.9), so the pool size varies dynamically based on how confident the model is.
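All four strategies are short enough to sketch in one place. The logits below are made up; everything else follows the standard definitions:

```python
import math, random

def softmax(logits, temperature=1.0):
    """Scores -> probabilities; temperature reshapes before normalizing."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Always take the single most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def top_k(probs, k):
    """Keep only the k most probable tokens, renormalize, sample."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return random.choices(order, weights=[probs[i] / total for i in order])[0]

def top_p(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (the 'nucleus'), renormalize, sample."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    pool, cum = [], 0.0
    for i in order:
        pool.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in pool)
    return random.choices(pool, weights=[probs[i] / total for i in pool])[0]

logits = [2.0, 1.0, 0.5, -1.0]          # hypothetical scores for 4 tokens
probs = softmax(logits, temperature=0.7)
```

Notice that top-k's pool is a fixed size while top-p's pool shrinks when the model is confident and grows when it isn't.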
Different sampling strategies produce different vibes in the output—this is part of why the same model can feel more or less creative depending on settings.
What the Model Actually Is: Weights
Everything we've looked at (the transformations, the attention patterns, the neuron activations) is determined by the model's weights. These are the billions of numbers that define every operation the network performs. In a meaningful sense, the weights are the model. Two models with identical architectures but different weights behave completely differently.
Training is just the process of adjusting these weights to make better predictions. The model starts with random weights, processes enormous amounts of text, and gradually tunes the weights until the transformations they define produce good outputs.
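The smallest possible version of "adjust the weights to make better predictions" is one weight, one example, and repeated gradient steps. Real training does this simultaneously for billions of weights over enormous text corpora, but the nudge-downhill loop is the same:

```python
# Toy gradient descent: we'd like w * x to equal the target,
# i.e. w should end up at 3.0.
w = 0.1                      # start from a (near-)random weight
x, target = 2.0, 6.0
lr = 0.05                    # learning rate: how big each nudge is

for _ in range(100):
    pred = w * x
    grad = 2 * (pred - target) * x   # d/dw of the squared error (w*x - target)^2
    w -= lr * grad                   # nudge the weight downhill

assert abs(w - 3.0) < 1e-3           # converged close to the ideal weight
```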
The architecture (how many layers, how attention works, etc.) is just scaffolding. The intelligence—such as it is—lives in the specific weight values that emerged from training.
That's the basic picture. There's much more to explore, but this covers the core mechanics of how a prompt becomes a response, one token at a time.