Whoever Controls Energy Controls Compute; Whoever Controls Compute Controls AI: Jonathan Ross’s Playbook for the AI Decade
October 1st, 2025
Groq founder & CEO Jonathan Ross returned to 20VC with a blunt thesis: the next leg of AI will be won by those who can deliver fast, abundant inference—and the power to run it. Training headlines grab attention, but latency, capacity, and energy will determine who captures the dollars. OpenAI and Anthropic will likely co-design chips to secure supply; NVIDIA’s dominance persists as more inference begets more training. Europe can still compete—if it puts compute where the electrons are cheapest.
TL;DR
Inference is the bottleneck. Double a lab’s inference capacity and revenue “almost doubles” because tokens sold, speed, and engagement scale together.
Verticalization is coming. Expect top labs and hyperscalers to build or co-design silicon—not to beat NVIDIA outright, but to control allocation and timelines.
Energy is strategy. Nations (and companies) that site compute on cheap, reliable power will set the pace.
NVIDIA still wins. More inference → more training → sustained GPU demand and premium pricing.
The macro thesis: speed, capacity, and the end of “good enough”
Ross’s core assertion is simple and uncomfortable: we are compute-limited, and the market is underestimating how much inference capacity the world can productively consume. Two dynamics drive this:
Speed compounds value. Lower latency doesn’t just feel better—it converts. Borrowing from early web and CPG playbooks, Ross argues that faster response loops tighten the dopamine cycle, lift engagement, and, over time, build brand preference. In AI products, a 100–300 ms difference is a moat.
Tokens are revenue. Labs are capacity-gated. Remove rate limits and throughput caps and paid usage climbs—not linearly for every product, but meaningfully across the ecosystem. This is especially true as teams route more “smart spend” to higher-value prompts (e.g., self-consistency, tool use, reranking).
Net: The winner isn’t the team with one point more on a benchmark; it’s whoever can deliver low-latency, low-cost tokens at scale—today, not two years from now.
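To make the capacity-gating arithmetic concrete, here is a minimal Python sketch; the prices, fleet sizes, and sampling counts are illustrative assumptions, not figures from the episode. While demand exceeds supply, every servable token is sold, so doubling capacity roughly doubles revenue, and "smart spend" patterns such as self-consistency multiply the tokens consumed per answer.

```python
# Minimal sketch (illustrative numbers only): a capacity-gated provider sells
# essentially every token it can serve, so revenue tracks capacity; "smart
# spend" techniques like self-consistency multiply tokens used per answer.

PRICE_PER_M_TOKENS = 2.00          # assumed blended price, $ per 1M tokens
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_revenue(capacity_tok_per_s: float, utilization: float = 0.9) -> float:
    """Revenue when demand exceeds supply: every servable token is sold."""
    tokens = capacity_tok_per_s * utilization * SECONDS_PER_MONTH
    return tokens / 1e6 * PRICE_PER_M_TOKENS

def smart_spend_tokens(base_tokens: int, samples: int = 5) -> int:
    """Self-consistency style routing: k sampled answers per high-value query."""
    return base_tokens * samples

if __name__ == "__main__":
    base = monthly_revenue(1_000_000)        # 1M tokens/sec of fleet capacity
    doubled = monthly_revenue(2_000_000)     # capacity doubled
    print(f"base fleet:    ${base:,.0f}/month")
    print(f"doubled fleet: ${doubled:,.0f}/month (~2x while demand is unmet)")
    print("tokens per 'smart spend' answer:", smart_spend_tokens(800))
```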
Why labs will build (or co-design) chips—even if NVIDIA keeps soaring
Building silicon is hard; keeping software and compilers current is harder. Ross still expects OpenAI, Anthropic, and hyperscalers to push into chips for one overriding reason: allocation control. With HBM and advanced packaging as gating factors, getting to the front of the line matters more than squeezing a few percent on FLOPS.
Supply chain clockspeeds: Traditional GPU roadmaps demand 18–24 month commitments. Any stack that can add meaningful capacity on a ~6-month cadence becomes strategically irresistible for buyers staring at waitlists and lost revenue.
Control beats perfection: A “good enough” in-house part that’s available often beats the world-class part that isn’t.
Important nuance: Increased inference pulls forward training. Better model serving spurs more fine-tuning, larger pretrains, and new modalities, reinforcing NVIDIA’s demand. In Ross’s five-year view, NVIDIA could retain >50% of revenue share even if its unit share falls, buoyed by brand, ecosystem, and training gravity.
Energy is policy: place compute where electrons are cheap
Ross’s refrain: “The countries that control compute will control AI—and you cannot have compute without energy.”
Practical implications
Siting beats slogans. Data-sovereignty rules don’t create watts. Europe can compete by placing data centers near abundant wind, hydro balancing, or friendly nuclear (domestic or allied).
Partnerships over purity. “Data embassies” in energy-rich regions (think Gulf states) can square sovereignty with power availability.
Permitting is a moat. The soft costs—delays, paperwork, uncertainty—now rival hard infrastructure costs. Jurisdictions that permit quickly will attract hyperscaler capex.
Bottom line: In the AI economy, grid pragmatism > model purism. Electrons decide.
The chip economics everyone forgets
Ross separates two phases of hardware value:
Deployment phase (capex-bound): New chips must clear payback vs. purchase + build cost.
Run phase (opex-bound): Once deployed, older parts keep earning if they beat power + rack costs—even after they’re no longer “state-of-the-art.”
Because compute is chronically short, even “older” accelerators can be fully utilized at healthy prices. That scarcity props up margins and encourages multiple hardware lines to thrive simultaneously.
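A back-of-envelope sketch of that two-phase test, with assumed capex, revenue, and opex figures rather than real ones: a newly deployed part has to pay back purchase plus build cost out of its operating margin, while an already-deployed part only has to out-earn power and rack costs to stay in service.

```python
# Back-of-envelope version of the two-phase test (all figures are assumed):
# deployment phase is capex-bound, run phase is opex-bound.

def payback_months(capex: float, monthly_revenue: float, monthly_opex: float) -> float:
    """Months for a newly deployed part to recover purchase + build cost."""
    margin = monthly_revenue - monthly_opex
    return float("inf") if margin <= 0 else capex / margin

def worth_keeping(monthly_revenue: float, power_cost: float, rack_cost: float) -> bool:
    """A deployed 'older' part keeps earning if it beats power + rack costs."""
    return monthly_revenue > power_cost + rack_cost

if __name__ == "__main__":
    # New accelerator: $30k capex, $9k/mo token revenue, $2k/mo opex (assumed)
    print("new part payback:", round(payback_months(30_000, 9_000, 2_000), 1), "months")
    # Older accelerator earning only $3k/mo -- but its capex is already sunk
    print("keep old part running:", worth_keeping(3_000, power_cost=1_200, rack_cost=600))
```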
China’s “home game” vs “away game”
Home game: With subsidies and aggressive nuclear buildouts, China can ensure domestic supply even if some model families are costlier to run.
Away game: Serving allies with constrained grids favors more energy-efficient inference and flexible, non-GPU supply chains. Ross expects the U.S. and partners to retain an advantage here for 2–3 years—if they move quickly.
Pricing philosophy: low margins, infinite volume
Compute follows the Jevons paradox: lower cost per token expands total usage. Ross argues for intentionally thin margins (thin enough to maximize volume, yet consistent with a stable business) to build trust and accelerate the flywheel. High margins reduce volatility risk but invite competition; low margins align with customers and compound brand equity.
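One way to see the Jevons-style argument is a toy constant-elasticity demand curve; the elasticity, base volume, and unit cost below are assumptions chosen for illustration, not measured values. If token demand is elastic enough, cutting price grows volume faster than it erodes per-token margin, so a thin-margin price can earn more in absolute terms while compounding volume.

```python
# Toy constant-elasticity demand curve (elasticity, base volume, and unit
# cost are assumptions for illustration): lower price per token expands
# volume, and with elastic demand total gross profit can rise as margins thin.

def demand(price: float, base_price: float = 2.0,
           base_volume: float = 1e12, elasticity: float = 3.0) -> float:
    """Tokens demanded per month at a given price per 1M tokens."""
    return base_volume * (base_price / price) ** elasticity

def gross_profit(price_per_m: float, cost_per_m: float = 0.80) -> float:
    """Monthly gross profit: per-1M-token margin times millions of tokens sold."""
    return (price_per_m - cost_per_m) * demand(price_per_m) / 1e6

if __name__ == "__main__":
    for p in (2.00, 1.50, 1.20, 1.00):
        print(f"${p:.2f}/1M tokens -> {demand(p) / 1e12:4.1f}T tokens/mo, "
              f"gross profit ${gross_profit(p):,.0f}/mo")
```

With a lower assumed elasticity the profit-maximizing price would be higher, which is why Ross frames thin margins as a trust-and-volume choice rather than pure profit maximization.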
Labor, deflation, and the rise of “vibe coding”
Ross flips the common anxiety on its head:
Deflationary pressure: As AI and robotics permeate supply chains—from crop yields to logistics—unit costs fall.
New job markets: Expect labor shortages in emerging categories rather than mass unemployment.
Vibe coding goes mainstream: Natural-language software authoring turns “coding” into a baseline competency across roles, much like reading/writing after the printing press.
Forecast: five things to watch (next 12–24 months)
Token policy changes at the labs: Relaxed rate limits or new high-throughput tiers will signal confidence in added capacity.
Custom silicon announcements: Not just chips, but packaging + HBM deals that secure allocation two years out.
Permitting reforms and siting wins: Jurisdictions that fast-track power + cooling will collect hyperscaler MOUs.
Latency ladders in enterprise: Competitive wins driven explicitly by response-time SLAs (not just accuracy).
Training–inference flywheel evidence: More specialized pretrains and fine-tunes that exist solely because cheaper inference made the business case work.
Why this matters
AI’s next surge won’t be decided only by clever architectures. It will be decided by who delivers tokens faster, cheaper, and more reliably, and who finds the electrons to do it. If Ross is right, strategy in 2026–2030 is a four-way optimization: chips, compilers, energy, and latency. Miss any one, and you’ll feel compute-short in a compute-short world.
“The countries that control compute will control AI—and you cannot have compute without energy.”