Whoever Controls Energy Controls Compute; Whoever Controls Compute Controls AI: Jonathan Ross’s Playbook for the AI Decade
October 1st, 2025
Groq founder & CEO Jonathan Ross returned to 20VC with a blunt thesis: the next leg of AI will be won by those who can deliver fast, abundant inference—and the power to run it. Training headlines grab attention, but latency, capacity, and energy will determine who captures the dollars. OpenAI and Anthropic will likely co-design chips to secure supply; NVIDIA’s dominance persists as more inference begets more training. Europe can still compete—if it puts compute where the electrons are cheapest.
TL;DR
Inference is the bottleneck. Double a lab’s inference capacity and revenue “almost doubles” because tokens sold, speed, and engagement scale together.
Verticalization is coming. Expect top labs and hyperscalers to build or co-design silicon—not to beat NVIDIA outright, but to control allocation and timelines.
Energy is strategy. Nations (and companies) that site compute on cheap, reliable power will set the pace.
NVIDIA still wins. More inference → more training → sustained GPU demand and premium pricing.
The macro thesis: speed, capacity, and the end of “good enough”
Ross’s core assertion is simple and uncomfortable: we are compute-limited, and the market is underestimating how much inference capacity the world can productively consume. Two dynamics drive this:
Speed compounds value. Lower latency doesn’t just feel better—it converts. Borrowing from early web and CPG playbooks, Ross argues that faster response loops tighten the dopamine cycle, lift engagement, and, over time, build brand preference. In AI products, a 100–300 ms difference is a moat.
Tokens are revenue. Labs are capacity-gated. Remove rate limits and throughput caps and paid usage climbs—not linearly for every product, but meaningfully across the ecosystem. This is especially true as teams route more “smart spend” to higher-value prompts (e.g., self-consistency, tool use, reranking).
Net: The winner isn’t the team with one point more on a benchmark; it’s whoever can deliver low-latency, low-cost tokens at scale—today, not two years from now.
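To make the capacity-gating arithmetic concrete, here is a minimal Python sketch; the prices, fleet sizes, and sampling counts are illustrative assumptions, not figures from the episode. While demand exceeds supply, every servable token is sold, so doubling capacity roughly doubles revenue, and "smart spend" patterns such as self-consistency multiply the tokens consumed per answer.

```python
# Minimal sketch (illustrative numbers only): a capacity-gated provider sells
# essentially every token it can serve, so revenue tracks capacity; "smart
# spend" techniques like self-consistency multiply tokens used per answer.

PRICE_PER_M_TOKENS = 2.00          # assumed blended price, $ per 1M tokens
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_revenue(capacity_tok_per_s: float, utilization: float = 0.9) -> float:
    """Revenue when demand exceeds supply: every servable token is sold."""
    tokens = capacity_tok_per_s * utilization * SECONDS_PER_MONTH
    return tokens / 1e6 * PRICE_PER_M_TOKENS

def smart_spend_tokens(base_tokens: int, samples: int = 5) -> int:
    """Self-consistency style routing: k sampled answers per high-value query."""
    return base_tokens * samples

if __name__ == "__main__":
    base = monthly_revenue(1_000_000)        # 1M tokens/sec of fleet capacity
    doubled = monthly_revenue(2_000_000)     # capacity doubled
    print(f"base fleet:    ${base:,.0f}/month")
    print(f"doubled fleet: ${doubled:,.0f}/month (~2x while demand is unmet)")
    print("tokens per 'smart spend' answer:", smart_spend_tokens(800))
```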
Why labs will build (or co-design) chips—even if NVIDIA keeps soaring
Building silicon is hard; keeping software and compilers current is harder. Ross still expects OpenAI, Anthropic, and hyperscalers to push into chips for one overriding reason: allocation control. With HBM and advanced packaging as gating factors, getting to the front of the line matters more than squeezing a few percent on FLOPS.
Supply chain clockspeeds: Traditional GPU roadmaps demand 18–24 month commitments. Any stack that can add meaningful capacity on a ~6-month cadence becomes strategically irresistible for buyers staring at waitlists and lost revenue.
Control beats perfection: A “good enough” in-house part that’s available often beats the world-class part that isn’t.
Important nuance: Increased inference pulls forward training. Better model serving spurs more fine-tuning, larger pretrains, and new modalities, reinforcing NVIDIA’s demand. In Ross’s five-year view, NVIDIA could retain >50% of revenue share even if its unit share falls, buoyed by brand, ecosystem, and training gravity.
Energy is policy: place compute where electrons are cheap
Ross’s refrain: “The countries that control compute will control AI—and you cannot have compute without energy.”
Practical implications
Siting beats slogans. Data-sovereignty rules don’t create watts. Europe can compete by placing data centers near abundant wind, hydro balancing, or friendly nuclear (domestic or allied).
Partnerships over purity. “Data embassies” in energy-rich regions (think Gulf states) can square sovereignty with power availability.
Permitting is a moat. The soft costs—delays, paperwork, uncertainty—now rival hard infrastructure costs. Jurisdictions that permit quickly will attract hyperscaler capex.
Bottom line: In the AI economy, grid pragmatism > model purism. Electrons decide.
The chip economics everyone forgets
Ross separates two phases of hardware value:
Deployment phase (capex-bound): New chips must clear payback vs. purchase + build cost.
Run phase (opex-bound): Once deployed, older parts keep earning if they beat power + rack costs—even after they’re no longer “state-of-the-art.”
Because compute is chronically short, even “older” accelerators can be fully utilized at healthy prices. That scarcity props up margins and encourages multiple hardware lines to thrive simultaneously.
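A back-of-envelope sketch of that two-phase test, with assumed capex, revenue, and opex figures rather than real ones: a newly deployed part has to pay back purchase plus build cost out of its operating margin, while an already-deployed part only has to out-earn power and rack costs to stay in service.

```python
# Back-of-envelope version of the two-phase test (all figures are assumed):
# deployment phase is capex-bound, run phase is opex-bound.

def payback_months(capex: float, monthly_revenue: float, monthly_opex: float) -> float:
    """Months for a newly deployed part to recover purchase + build cost."""
    margin = monthly_revenue - monthly_opex
    return float("inf") if margin <= 0 else capex / margin

def worth_keeping(monthly_revenue: float, power_cost: float, rack_cost: float) -> bool:
    """A deployed 'older' part keeps earning if it beats power + rack costs."""
    return monthly_revenue > power_cost + rack_cost

if __name__ == "__main__":
    # New accelerator: $30k capex, $9k/mo token revenue, $2k/mo opex (assumed)
    print("new part payback:", round(payback_months(30_000, 9_000, 2_000), 1), "months")
    # Older accelerator earning only $3k/mo -- but its capex is already sunk
    print("keep old part running:", worth_keeping(3_000, power_cost=1_200, rack_cost=600))
```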
China’s “home game” vs “away game”
Home game: With subsidies and aggressive nuclear buildouts, China can ensure domestic supply even if some model families are costlier to run.
Away game: Serving allies with constrained grids favors more energy-efficient inference and flexible, non-GPU supply chains. Ross expects the U.S. and partners to retain an advantage here for 2–3 years—if they move quickly.
Pricing philosophy: low margins, infinite volume
Compute follows the Jevons paradox: lower cost per token expands total usage. Ross argues for intentionally thin margins (thin enough to maximize volume, yet consistent with a stable business) to build trust and accelerate the flywheel. High margins reduce volatility risk but invite competition; low margins align with customers and compound brand equity.
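One way to see the Jevons-style argument is a toy constant-elasticity demand curve; the elasticity, base volume, and unit cost below are assumptions chosen for illustration, not measured values. If token demand is elastic enough, cutting price grows volume faster than it erodes per-token margin, so a thin-margin price can earn more in absolute terms while compounding volume.

```python
# Toy constant-elasticity demand curve (elasticity, base volume, and unit
# cost are assumptions for illustration): lower price per token expands
# volume, and with elastic demand total gross profit can rise as margins thin.

def demand(price: float, base_price: float = 2.0,
           base_volume: float = 1e12, elasticity: float = 3.0) -> float:
    """Tokens demanded per month at a given price per 1M tokens."""
    return base_volume * (base_price / price) ** elasticity

def gross_profit(price_per_m: float, cost_per_m: float = 0.80) -> float:
    """Monthly gross profit: per-1M-token margin times millions of tokens sold."""
    return (price_per_m - cost_per_m) * demand(price_per_m) / 1e6

if __name__ == "__main__":
    for p in (2.00, 1.50, 1.20, 1.00):
        print(f"${p:.2f}/1M tokens -> {demand(p) / 1e12:4.1f}T tokens/mo, "
              f"gross profit ${gross_profit(p):,.0f}/mo")
```

With a lower assumed elasticity the profit-maximizing price would be higher, which is why Ross frames thin margins as a trust-and-volume choice rather than pure profit maximization.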
Labor, deflation, and the rise of “vibe coding”
Ross flips the common anxiety on its head:
Deflationary pressure: As AI and robotics permeate supply chains—from crop yields to logistics—unit costs fall.
New job markets: Expect labor shortages in emerging categories rather than mass unemployment.
Vibe coding goes mainstream: Natural-language software authoring turns “coding” into a baseline competency across roles, much like reading/writing after the printing press.
Forecast: five things to watch (next 12–24 months)
Token policy changes at the labs: Relaxed rate limits or new high-throughput tiers will signal confidence in added capacity.
Custom silicon announcements: Not just chips, but packaging + HBM deals that secure allocation two years out.
Permitting reforms and siting wins: Jurisdictions that fast-track power + cooling will collect hyperscaler MOUs.
Latency ladders in enterprise: Competitive wins driven explicitly by response-time SLAs (not just accuracy).
Training–inference flywheel evidence: More specialized pretrains and fine-tunes that exist solely because cheaper inference made the business case work.
Why this matters
AI’s next surge won’t be decided only by clever architectures. It will be decided by who delivers tokens faster, cheaper, and more reliably, and who finds the electrons to do it. If Ross is right, strategy in 2026–2030 is a four-way optimization: chips, compilers, energy, and latency. Miss any one, and you’ll feel compute-short in a compute-short world.
“The countries that control compute will control AI—and you cannot have compute without energy.”