The Platform and Curation will be the Key to Consumer AI

The consensus is that ChatGPT has won consumer AI. After this year's WWDC and I/O, the scoreboard looks wrong. The real contest is two races at once, and the platform owners are positioned to win the one that matters.

There is a comfortable story about consumer AI that goes like this. OpenAI shipped first, ChatGPT became a verb, it now has more than nine hundred million people using it every week, and so the race is effectively over. The incumbents were caught flat-footed, a startup ran past them, and the only question left is how large the lead becomes.

The scale and pace of ChatGPT’s user acquisition were genuinely unprecedented. However, scoring this race on weekly visits to a destination only measures habit. Capture is the question that decides a platform shift. ChatGPT has owned the feature set but has failed to own the ground it lives on.

I want to be honest about how I got here, because I actually started with a different conclusion. I set out to make the case that Google would win consumer AI. Google had what felt like a strong hand: DeepMind’s research depth, control of Android, a deepening relationship with Apple, and the long history of distribution beating better technology. The argument kept surviving, but it kept resolving into a claim narrower and stranger than the one I started with. Working out why is the subject of this piece, and the answer changed my mind about what the question even is.

The wrong scoreboard

Treating ChatGPT’s active users as proof that OpenAI has won makes the same mistake as judging a race by who leads after the first lap. The first phase of a platform transition rewards whoever was ready at the gun. ChatGPT was a clean, fast, single-purpose product that arrived when nothing else like it existed, and it captured the people curious enough to seek out a new tool, type into a blank box, and pay a subscription.

Look more closely at Google’s assistant reach and the comparison gets difficult. Its 750m figure is reported monthly where ChatGPT’s is reported weekly, much of Google’s figure is bundled into products people did not choose for their AI, and a clean like-for-like comparison does not exist in public. What this awkward comparison reveals is that there are two different ways a company can win. There is the model, the raw capability, and there is the platform, the place the capability reaches a person. The company with the best model and the company with the most surface don’t need to be the same company. For most of the history of consumer technology, they have not been.

What the two keynotes gave away

If you watched Apple’s WWDC and Google’s I/O back to back, you heard no mention of building a better chatbox.

Apple wove intelligence into the camera, into mail, into photos, into the browser, into messages, all at the OS level. The rebuilt assistant was presented as connective tissue running underneath the things you already do. Siri’s AI was demoed answering a request for directions to a landmark that had only appeared in an Instagram post. Google did the same across Workspace, Android, and Search, dissolving its model into tools that hundreds of millions of people open every day.

The most revealing single announcement was Apple making its foundation models available to most developers at no cost: free access to models running on Private Cloud Compute for any developer with fewer than two million first-time downloads, with heavier inference folded into subscriptions people already pay for. It is a very telling strategic decision, which I will come back to.

The platform owners are trying to make sure you never have to go anywhere, because the intelligence is already inside whatever you opened.

Why embedded usually beats the destination

The primary advantage of being embedded is friction. A destination app has to win a new behaviour. Every day it asks the user to stop what they are doing, switch context, open a separate thing, and sustain a habit that did not exist a few years ago. An embedded feature inherits a habit that already exists. It meets intent where the intent already lives. You were already in the photo editor, so the edit is just there. You were already in the inbox, so the summary is just there. Nobody had to be persuaded to change what they open in the morning.

This is why, across most platform transitions of the last few decades, distribution and ecosystem control beat superior technology more often than not. The better browser did not win; the bundled one did, and then the one attached to the dominant search engine. The mobile operating system that took most of the world was not the better one; it was the one given away to every manufacturer.

There is one notable exception, and it is the exception OpenAI is built on. Google was the fifth or sixth serious search engine to market, and it won on a better product joined to a better business model, against incumbents who already had the distribution. Social was similar: the network with the best product and the right timing took the market away from entrenched players. The pattern is not “distribution always wins.” It is “distribution usually wins, except when a newcomer’s product has an inbuilt flywheel that compounds and manufactures its own distribution before the incumbents can respond.”

That exceptionalism is likely what drove the initial ChatGPT product story. OpenAI tried to build the product advantage but failed to establish a consumer platform surface with GPTs, Sora, and a late-to-market ad product, and it lost the enterprise agentic coding platform to Anthropic’s Claude Code. My read is that the window in which a product advantage outruns incumbent distribution is the early window, before the incumbents organise, and that we are now past it, because the incumbents visibly organised on stage this year.

With the advantage fumbled by OpenAI, the likely end state is a number of winners on two tracks. There is a destination layer, where a genuinely better product can still hold high-intent audiences for the deep work. Think of deep work as the long research task, the document drafted from nothing, the problem you want to think through with something that has no other job in that moment. And there is an embedded or entertainment layer, the ambient majority of moments, which belongs to whoever owns the surface.

Picking a winner after the first lap

In consumer, where the demand for deep productivity is low, the embedded layer is where most of the value will be captured. The advantage goes to whoever owns the surface the feature is embedded in. On Android and across Workspace, that is Google. On iOS, the largest premium consumer surface on earth and the one where people actually pay for things, the surface belongs to Apple. With consumer social, time will tell if it is a meaningful surface. Sora showed that pure AI-driven content does not yet have a feasible cost structure, whatever the demand for it.

The most striking detail of Apple’s product is that it chose to build its intelligence layer on a model developed in collaboration with Google’s Gemini, reportedly paying somewhere around a billion dollars a year for it under a multi-year deal, while likely building its own model to replace it.

Apple’s bet is interesting and best aligned with its strengths. Rather than pushing the frontier, it is distilling Google’s models and leveraging its privacy advantage to build around your private context. Privacy is not without cost. Routing queries through Private Cloud Compute adds latency, which was noticeable in the live demos and in my testing of the beta. The reason I do not think this is a major concern long term is that Apple controls the full stack. It owns the silicon, the operating system, the runtime, and the application layer. That lets it optimise multiple aspects of the platform and hardware to improve hybrid inference over time. It is also only the long-running agentic tasks that require a large amount of resources today. Whether Apple compresses that latency fast enough is a live question.

For Google, it is a strong endorsement that they have the better model, good enough that the most demanding integrator in the industry licensed it rather than shipping something weaker of its own. Apple’s top-tier cloud model is described as matching Gemini Frontier quality and running on Nvidia GPUs inside Google’s cloud. It runs inside Apple’s privacy boundary, and Apple has made it clear the shipped models are its own code, technology, and data, with Gemini used only in distillation and training. Apple keeps the brand, the relationship, the billing, and the option to swap the supplier out later.

Many have compared this to the “Intel Inside” position Apple took when moving from PowerPC in 2006. The need is clear for Apple, but why is Google supplying the intelligence underneath a rival’s platform? Google is essential but anonymous, and the day a cheaper or in-house equivalent is good enough, it is likely to be replaced. However, in mobile, Google had no presence inside iOS except as the paid-for search default. In AI, Google owns its own surface entirely and sits inside the rival’s surface as the supplier of record. That is a stronger structural position than Google ever held in mobile. The company is hedged across both platforms in a way it never managed before. Google’s foundation model deal is a perfect platform play. If you must be a component, it is far better to be the indispensable one inside everyone’s stack than to be locked out. This is a lesson that platform companies should all learn. It’s always better to be integrated into a competitor’s surface than to be disintermediated entirely.

I have argued before, years ago, that Apple’s strategy has always been to own the customer relationship through curation and integration rather than to win on raw capability or data:

Apple does not prioritise the collection of data, because it would not be effective at utilising it. It offers curated feeds, because of its belief that it holds the insight to make the best decisions for customers.

That argument has aged well, and this is the same mechanism one layer up. Apple does not need the best AI. It needs an AI good enough to keep you inside its ecosystem, and it is happy to rent that from a rival while the rental is cheaper than building.

So the honest version of my thesis is not that Google wins consumer AI. It is that the platform owners win the consumer relationship, Google is one of two that do, and Google is additionally the supplier inside the other one’s house. Dominant where it owns the surface, indispensable where it does not.

Pure AI companies are being squeezed off the consumer floor, back to the enterprise

Let’s return to that quiet announcement about free models, because it is actually Apple’s sharpest move and it is aimed at the layer below the platforms.

By making foundation models free to most developers and bundling the heavier inference into existing subscriptions, Apple is commoditising the thing its pure model competitors need to charge for. For a whole class of developers, the per-token economics the independent labs depend on at the low end simply stop applying, because the platform now gives away a model good enough for most app-level tasks as a cost of being on the platform. Apple even built the framework so a developer can call Claude or Gemini through the same Swift API and swap providers without changing code, which turns the model into an interchangeable part by design.
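To make the “interchangeable part” point concrete, here is a minimal sketch in Python rather than Swift. It is not Apple’s framework or any vendor’s real API: `TextModel`, `OnDeviceModel`, `HostedModel`, and `summarise_inbox` are hypothetical names invented for illustration, and the price is a made-up number. The only thing it shows is the shape of the design: when app code depends on an interface rather than a vendor, swapping the model is a one-argument change.

```python
from dataclasses import dataclass
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class OnDeviceModel:
    """Stand-in for a free, platform-supplied baseline model."""
    name: str = "platform-baseline"

    def generate(self, prompt: str) -> str:
        # A real model would run inference; this stub just echoes the prompt.
        return f"[{self.name}] {prompt[:40]}"


@dataclass
class HostedModel:
    """Stand-in for a paid, per-token model from an independent lab."""
    name: str
    price_per_1k_tokens: float  # illustrative figure, not a real price

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


def summarise_inbox(model: TextModel, emails: list[str]) -> str:
    """App code depends only on the interface, never on a specific vendor."""
    return model.generate("Summarise: " + " | ".join(emails))


emails = ["Lunch at 1?", "Invoice attached"]
free = summarise_inbox(OnDeviceModel(), emails)
paid = summarise_inbox(HostedModel("lab-frontier", 0.01), emails)
```

The app function never changes between the two calls. If the free baseline is good enough for the task, nothing in the code gives the developer a reason to reach for the paid provider, which is the squeeze in miniature.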

[Figure: a four-step flow showing the commoditisation squeeze. The platform gives away baseline AI, per-token economics collapse, AI labs are pushed up-market, and only the frontier and enterprise remain.]

Commoditisation is happening at the floor. The everyday model good enough for most consumer use cases is becoming free plumbing. The frontier models will continue to push ahead and absorb most of the cost. OpenAI is reportedly running at around two billion dollars a month in revenue with fifty million paying subscribers, and it is not earning that by being the cheapest summariser inside someone’s inbox. It is earning it from the people who want the best available reasoning. The mistake was assuming that the vast majority of that would be consumers rather than enterprises, where job replacement and efficiency are worth real money.

The consumer baseline is being commoditised by the platforms, which caps how much of the everyday consumer layer a pure-play AI lab can monetise directly, while the frontier stays differentiated and expensive. The labs are being pushed up-market, out of the ambient layer and into the high-intent destination and the enterprise, which is a real business but a meaningfully different one from the consumer market.

Where the technical bet actually sits

So far this has been the business-theory version of the argument, but I want to go one layer deeper, to the engineering. The engineering matters in one specific place, and it is a place that converges with the distribution argument.

Embedded surfaces are inherently multimodal. Think of a camera that understands what it is looking at, an assistant that acts on what is on your screen, a home video being changed to remove the light flash in the background. These use cases demand models that represent the world richly enough to reason about images, video, and space. This is why multimodality is essential for the embedded layer rather than a nice-to-have, and it is where Google’s data advantage of owning the world’s largest video library is decisive.

It also makes the case for world models. My belief is that systems need a latent understanding of the world, the representation-space prediction that approaches like JEPA reach for. The focus of these models has been on robots that need to manipulate physical objects. I believe the approach is directly relevant to predicting and generating video and images efficiently, and efficiency at that frontier is worth a great deal.[1] If world models turn out to matter, they reward whoever has the most multimodal data to train them and the most surfaces to deploy them on.
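For readers who want the distinction in symbols, here is a simplified sketch of the two objectives. The notation is my own assumption for illustration: $x$ is the visible context, $y$ the target (for example the next frame), $f_\theta$ a context encoder, $d_\phi$ a decoder, $g_\phi$ a predictor, $f_{\bar\theta}$ a separate target encoder, and $\operatorname{sg}$ a stop-gradient. Published JEPA variants differ in details such as masking and the choice of norm.

```latex
% Pixel-space prediction: the model must reconstruct the target y itself.
\mathcal{L}_{\text{pixel}}
  = \bigl\lVert d_\phi\bigl(f_\theta(x)\bigr) - y \bigr\rVert_2^2

% Representation-space prediction (JEPA-style): the model predicts the
% embedding of the target, not the target's pixels.
\mathcal{L}_{\text{rep}}
  = \bigl\lVert g_\phi\bigl(f_\theta(x)\bigr)
      - \operatorname{sg}\bigl(f_{\bar\theta}(y)\bigr) \bigr\rVert_2^2
```

The first loss penalises every pixel the model gets wrong, including detail that carries little meaning. The second only asks the model to match whatever the target encoder chose to keep, which is where the claimed efficiency comes from.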

The agentic risk

The biggest risk to my argument is that the contest isn’t “embedded feature versus destination app.” The third possibility is that the agent is the surface that wins, not a chatbox you visit or a feature sprinkled inside each app. In the agentic world you delegate to a single orchestrating layer, which then reaches across every app and surface on your behalf. If that is where consumer behaviour lands, then owning the camera or the inbox matters less than owning the agent that drives them. At that point it is not obvious the surface owner wins.

The agent layer could be captured by the company with the most trusted reasoning rather than the most user context. Imagine an agentic orchestration platform that consumers use to perform all the actions they need.

Overall, I am sceptical of this risk. I am struggling to think of real consumer flows that lend themselves to full delegation. You would need jobs to be done with a clear success criterion and no meaningful human preference in the loop. For enterprises, it’s clear they want productivity and automation. Shopping is the example Silicon Valley loves to use. A fully “agentic” purchasing experience has existed for two decades… it’s called personal shoppers, yet the dominant model is still a curated storefront with a human making the final call. I believe consumers want to remain in the loop.

Despite this, Apple has still made bets on agentic flows. Apple deepened Siri Shortcuts with AI voice-driven automations that run tasks described by the user. It also added an “agentic” Safari password reset flow where the user simply tells Apple to reset a password and the agent executes it. It removes a specific friction from a specific moment without replacing the user’s agency in the broader task. That pattern, the contextual micro-agent embedded in an existing interface, is probably where consumer agentics actually lands, and it is a pattern that rewards the surface owner more than it rewards the general-purpose AI lab.

Even if a new agentic platform is created, I believe the surface owners are best positioned to win, because an agent still has to run somewhere and the operating system is the most privileged place to run it. With my agents, my biggest challenge has been managing the agents running on different devices with different contexts all needing access to my personal data. If I am wrong about consumer AI over the next decade, the most likely reason is that the interface moved one level up, to the agent.

Two layers, two races

The clearer picture is two races at once, scored on two questions. Does this company own a consumer surface people open every day? Does it have a model at or near the frontier? A few companies are contesting both. Most are strong in one column and trying to climb the other.

| Company | Owns a consumer surface (Consumer Advantage) | Model at/near the frontier (Frontier Advantage) |
| --- | --- | --- |
| Google | ✅ Yes: Android, Search, Workspace | ✅ Yes |
| Apple | ✅ Yes: iOS, the premium surface | 🟥 Not yet, renting Gemini, building its own |
| Meta | ✅ Yes: the social apps, glasses | 🟨 Mid-tier, improving |
| Microsoft | 🟨 Partially: enterprise and the desktop | 🟨 Via partners, now launching its own models not at the frontier |
| Amazon | 🟨 Partially: the home and commerce | 🟨 Via partners |
| OpenAI | 🟥 No: Chat is only the destination it built | ✅ Yes |
| Anthropic | 🟥 No: enterprise and developer tools | ✅ Yes |
| A company that does not yet exist | Unless a new platform launches | Capital heavy, may be a JEPA or world model focused competitor |

The table is a map of where strength sits. These are the positions that will define the structure of the market for the next decade. Google is the only company that can run in both races convincingly, which is what drove my initial bullish read and is the core of any bullish case for it owning a large share of the consumer market. The AI labs answer yes on the model and no on the surface, which leaves them structurally disadvantaged by Apple’s free model push. Apple answers yes on the surface and not-yet on the model, which is why its in-house model programme is the most consequential project in the table. Microsoft is in a weak position despite having had a chance to own OpenAI entirely. Windows is not a meaningful consumer surface, and Copilot is threatened by Claude Code. It’s hard not to see this as a potentially dominant position squandered. The bottom row is there because every prior transition produced at least one winner nobody had on the board, and it would be arrogant to assume this one will not.

What this means if you are building or allocating

For anyone building, do not build a destination if you can be a feature inside one people already open. For new entrants to beat incumbents, your product needs to be able to manufacture its own network and platform to offset distribution, allowing you to build the destination and move before the incumbents organise. Longer term, my assumption is that the raw cost of baseline model intelligence trends towards zero, because the platforms are actively driving it there. The moats that remain will be in the workflow, in proprietary data, in distribution you can own, or at the frontier where capability is still scarce.

For allocators, stop scoring this race on active users. For consumer, the core metric is platform lock-in: how many workflows are embedded into the platform, and what are the rates of free monetisation and churn? For enterprise, it’s workflow touch points and frontier model leadership. These are two separate bets with different risks and a low correlation between them. Who will own the consumer surfaces is a bet on Google, Apple, or a new surface replacing the smartphone. Who holds the enterprise frontier is a bet on enterprise job replacement and the cloud.

Google is the company that started the transformer race, and the strongest single factor in its favour is that it is the only name that sits on both sides of the trade.

Footnotes

  1. The point is that prediction in representation space rather than pixel space is more compute-efficient and yields embeddings with better semantic structure, which is what makes the approach interesting for video understanding and generation specifically, not only for embodied or robotic systems. Consider a model asked to produce the next frame of a scene with a car driving on the motorway. A model that predicts in pixel space spends effort equally across the whole scene, while a JEPA-based model captures the context and spends more effort on the driving car than on the leaves on the trees. This suggests that world model approaches can focus on the right elements because they hold an understanding of the world. My belief is that this efficiency will become a durable consumer advantage, rewarding the holders of multimodal data and surface.