Scaling Laws

Scaling laws describe how AI systems improve as you spend more compute, data, and time. They’re broad, empirical patterns—not strict formulas—that help you plan capability gains and budget trade-offs. In modern AI, these patterns show up across the whole lifecycle: when models study (pre-training), practice (post-training), and think at answer time (inference).

Big ideas:

  • More resources → better quality, with diminishing returns.
  • Data quality and signal quality matter as much as quantity.
  • You balance three dials—study, practice, thinking—to hit your accuracy, latency, and cost targets.
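
To make the first bullet concrete, here is a minimal sketch of what diminishing returns look like: quality (here, loss) follows a power law in compute and flattens toward a floor. The functional form and every constant are illustrative assumptions, not values fitted to any real model.

```python
def loss(compute: float, l_inf: float = 1.7, a: float = 8.0, alpha: float = 0.35) -> float:
    """Illustrative scaling curve: loss falls as a power law in compute and
    approaches an irreducible floor l_inf. All constants are made up."""
    return l_inf + a * compute ** (-alpha)

# Each 10x increase in compute buys a smaller absolute improvement.
for c in [1e3, 1e4, 1e5, 1e6]:
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
```

The same shape applies whichever dial you turn: tokens, parameters, feedback rounds, or samples at answer time.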

1) Pre-Training (Study Time)

The model learns general skills from vast amounts of data.

  • What you scale: training tokens, model size (parameters), training steps/compute.
  • What improves: broad competence (language, coding, vision), sample efficiency downstream.
  • Trade-offs: bigger isn’t always better if under-trained or fed noisy/duplicate data. Aim to match data to model size so you don’t “starve” the model.
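
One way to "match data to model size" is a compute-optimal rule of thumb: the Chinchilla-style heuristic of roughly 20 training tokens per parameter, together with the common estimate that training FLOPs ≈ 6 * N * D. The sketch below is back-of-the-envelope budgeting under those assumptions, not a prescription.

```python
def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a training FLOPs budget between model size N (parameters) and
    dataset size D (tokens), assuming FLOPs ~= 6 * N * D and
    D ~= tokens_per_param * N. The 20 tokens/param ratio is a rough
    heuristic, not a law."""
    n = (flops_budget / (6.0 * tokens_per_param)) ** 0.5  # parameters
    d = tokens_per_param * n                               # training tokens
    return n, d

n, d = compute_optimal_split(1e23)  # hypothetical 1e23 FLOPs budget
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e9:.0f}B")
```

Training far fewer tokens than this ratio suggests is the "starved" case above; training far more means a smaller model could likely have used the same compute better.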

2) Post-Training (Practice Time)

You shape the base model for specific tasks and preferences.

  • What you scale: supervised examples, feedback rounds (RLHF/RLAIF), distillation passes, evaluator/verifier quality.
  • What improves: task accuracy, alignment, safety, instruction-following.
  • Trade-offs: risk of reward hacking or overfitting; keep diverse, high-signal feedback and honest evals.
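
Scaling-law thinking applies here too: before paying for more labels or feedback rounds, fit a simple power law to held-out error versus dataset size and extrapolate. The numbers below are synthetic placeholders; the log-log fit is the only real technique being shown.

```python
import numpy as np

# Held-out error rate measured at a few fine-tuning set sizes (synthetic data).
n_examples = np.array([1_000, 2_000, 4_000, 8_000, 16_000])
error_rate = np.array([0.30, 0.24, 0.20, 0.17, 0.145])

# A power law, error ~= b * n^(-alpha), is a straight line in log-log space.
slope, intercept = np.polyfit(np.log(n_examples), np.log(error_rate), 1)
alpha, b = -slope, np.exp(intercept)

def predicted_error(n: float) -> float:
    return b * n ** (-alpha)

print(f"alpha = {alpha:.2f}")
print(f"predicted error at 32k examples: {predicted_error(32_000):.3f}")
# If doubling the data buys only a marginal drop, spend the budget on better
# feedback or evals instead of more of the same labels.
```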

3) Inference (Thinking Time)

At answer time, let the model reason and use tools before responding.

  • What you scale: steps of reasoning (Chain/Tree/Graph-of-Thought), number of samples (self-consistency), tool calls (search, code, calculators).
  • What improves: reliability on hard problems, grounded answers, fewer mistakes.
  • Trade-offs: higher latency and cost per query; returns taper, so cap “thinking” to your SLA.
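
A common way to scale thinking time is self-consistency: sample several independent reasoning paths and take a majority vote over the final answers, with the sample count capped so cost and latency stay inside your SLA. In this sketch, generate_answer is a hypothetical stand-in for whatever sampled model call you actually use.

```python
from collections import Counter
import random

def generate_answer(question: str) -> str:
    """Hypothetical stand-in for one sampled model response (each sample
    would normally run its own chain of thought); here it just returns a
    noisy canned answer so the example is runnable."""
    return random.choice(["42", "42", "42", "41", "43"])

def self_consistency(question: str, max_samples: int = 8) -> str:
    """Sample up to max_samples answers and return the majority vote.
    max_samples is the cap on per-query thinking-time cost and latency."""
    votes = Counter(generate_answer(question) for _ in range(max_samples))
    answer, _ = votes.most_common(1)[0]
    return answer

print(self_consistency("What is 6 * 7?"))
```

More samples usually mean higher reliability, but with diminishing returns, which is why a hard per-query cap is the practical move.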