What large language models are

A large language model (LLM) is a neural network trained on enormous amounts of text to predict the next token in a sequence, where a token is a word or a piece of a word. By training on corpora that run to hundreds of billions of tokens or more, the model learns statistical patterns that allow it to generate coherent, contextually appropriate text.
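To make next-token prediction concrete, here is a minimal sketch using a bigram counter rather than a neural network. It is a deliberate simplification: real LLMs condition on long contexts and learn billions of parameters, whereas this toy only counts which word follows which. The corpus, the whitespace tokenization, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each token follows each other token."""
    tokens = text.lower().split()  # crude whitespace tokenization (assumption)
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}


corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
print(probs)  # "cat" follows "the" in 2 of 4 cases, so it gets probability 0.5
```

The model has no notion of cats or mats; it reproduces the frequencies it saw. An LLM does the same kind of thing at vastly greater scale and with a far more flexible function than a lookup table.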

How training works

LLMs are trained in stages. The first is pre-training: the model learns from vast text corpora (books, websites, code, scientific papers) by repeatedly predicting the next token, or in some architectures a masked token, and adjusting its parameters to reduce the prediction error. Next come fine-tuning and reinforcement learning from human feedback (RLHF): human raters evaluate or rank outputs, and the model is adjusted to produce responses that are more helpful, harmless and honest.
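The "predict, measure the error, adjust" loop of pre-training can be sketched in a few lines. This is a toy under stated assumptions: the three-word vocabulary is made up, the trainable parameters are just three scores (logits) instead of a neural network, and the learning rate is arbitrary. The loss is the standard cross-entropy, and the update is plain gradient descent.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(logits, target, lr=0.5):
    """One gradient-descent step on the cross-entropy loss.

    Returns (loss before the update, updated logits).
    """
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target)
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    new_logits = [x - lr * g for x, g in zip(logits, grad)]
    return loss, new_logits


vocab = ["cat", "mat", "fish"]   # hypothetical toy vocabulary
logits = [0.0, 0.0, 0.0]         # start with no preference
target = vocab.index("cat")      # the token that actually came next

losses = []
for _ in range(5):
    loss, logits = train_step(logits, target)
    losses.append(loss)

print(losses)  # the loss shrinks as the score for "cat" rises
```

Pre-training repeats this kind of step across an entire corpus, with the gradient flowing back through every layer of the network. Fine-tuning and RLHF use different training signals and are not shown here.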

What they are not

Despite extraordinary capabilities, LLMs do not understand language in the way humans do. They manipulate statistical patterns. This is why they can write authoritative-sounding text on topics about which they have no reliable information, producing confident errors, a phenomenon called hallucination. On their own they have no persistent memory across conversations, no access to real-time information (without tools), and no reliable way to verify their own outputs.
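The lack of persistent memory can be illustrated with a stand-in function. `toy_model` below is hypothetical and is not a real LLM or any vendor's API; it only mimics the one property that matters here, namely that each call sees nothing except the prompt it is given.

```python
def toy_model(prompt):
    """Hypothetical stand-in for an LLM call: it sees only `prompt`."""
    marker = "my name is "
    text = prompt.lower()
    if "what is my name" in text and marker in text:
        start = text.index(marker) + len(marker)
        return text[start:].split()[0].strip(".?!").capitalize()
    return "I don't know."


# Two independent calls: the second has no access to the first.
toy_model("My name is Ada.")
forgotten = toy_model("What is my name?")

# Re-sending the earlier turn inside the prompt restores the "memory".
history = "My name is Ada. What is my name?"
remembered = toy_model(history)

print(forgotten)   # I don't know.
print(remembered)  # Ada
```

Chat applications that appear to remember earlier turns typically work this way: the application resends the conversation so far with each request. One respect in which the stand-in is unrealistic is that it admits ignorance, whereas a real model may produce a fluent but wrong answer instead.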

The current landscape

GPT-4 (OpenAI), Claude (Anthropic), Gemini (Google) and LLaMA (Meta) are among the most capable current models. They differ in training data, alignment approach, context window size and benchmark performance. The capabilities of frontier models are advancing rapidly, while costs are falling fast as inference becomes more efficient.