AI 13 June 2026 16 min read

Platforms and the winners of Consumer AI

It had become consensus that OpenAI was ahead in consumer AI. The argument is simple: it is winning on every obvious metric. ChatGPT has become synonymous with AI and has the most subscribers. After this year’s WWDC and I/O, the obvious scoreboard looks wrong. There are actually two races running at once, and the platform owners are positioned to win where ChatGPT had its largest lead.

Damilola Payne

There is a comfortable story about consumer AI that goes like this. OpenAI shipped first, ChatGPT became a verb, it now has more than 900 million people using it every week, and so the race is effectively over. The incumbents were caught flat-footed, a startup ran past them, and the only question left is how large the lead will become.

The pace of user acquisition for ChatGPT was genuinely unprecedented. But however impressive the active-user metrics are, what they actually measure is habit. To predict a platform shift, we need to measure capture.

The wrong scoreboard

Treating ChatGPT’s active users as the scoreboard makes the same mistake as judging a race by who leads after the first lap. The first phase of a platform transition rewards whoever is ready at the gun. ChatGPT was a clean, fast, single-purpose product that arrived when nothing like it existed, and it captured the people curious enough to seek out a new tool, type into a blank box, and pay a subscription.

Despite OpenAI’s apparent lead, Google has all the ingredients to win: DeepMind’s research depth, control of Android, a deepening relationship with Apple, a strong rollout of AI into search, a strong ad product, and the long history of distribution beating better technology.

Look more closely at the numbers and the comparison gets harder. Google’s assistant reach is reported as 750 million monthly users, whereas ChatGPT’s figure is reported weekly, so the two are not like for like. Much of Google’s figure is also bundled into products that people did not choose for their AI.
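The reporting mismatch matters because weekly and monthly actives are different windows over the same activity log. A minimal sketch of the two counts follows, using a made-up event log; the user names, dates, and the `active_users` helper are illustrative assumptions, not real usage data or any vendor's methodology.

```python
from datetime import date, timedelta

# Hypothetical activity log: (user_id, day the user was active).
# Illustrative data only; not real figures from any product.
events = [
    ("ana", date(2026, 6, 1)),
    ("ana", date(2026, 6, 8)),
    ("ben", date(2026, 6, 3)),
    ("cy", date(2026, 6, 20)),
    ("dee", date(2026, 6, 27)),
    ("ana", date(2026, 6, 28)),
]


def active_users(log, end, window_days):
    """Distinct users seen in the `window_days` days ending on `end`, inclusive."""
    start = end - timedelta(days=window_days - 1)
    return {user for user, day in log if start <= day <= end}


end = date(2026, 6, 30)
wau = active_users(events, end, 7)    # weekly actives
mau = active_users(events, end, 30)   # monthly actives

# Everyone active in the last week was also active in the last month.
print(len(wau), len(mau), wau <= mau)
```

Because the weekly set is always a subset of the monthly set, a weekly figure can only understate what the same product would report monthly. A weekly number set beside a rival's monthly number therefore flatters the rival.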

What this awkward comparison reveals is that there are two different ways a company can win.

One is the model, the raw capability. The other is the platform, the place where the capability reaches a user. Across the history of consumer technology, embedding capability into the surface has been the most important factor.

Why embedded usually beats the destination

The primary advantage of being embedded is low friction. A destination app has to win a new behaviour. Every day, it asks the user to stop what they are doing, switch context, open a separate thing, and sustain a habit that did not exist a few years ago. An embedded feature inherits a habit that already exists. It meets intent where the intent already lives. You were already in the photo editor, so the edit is just there. You were already in the inbox, so the summary is just there. Nobody had to be persuaded to change what they open in the morning.

This is why, across most platform transitions over the last few decades, distribution and ecosystem control have beaten superior technology more often than not. The better browser did not win, the bundled one did, and then it was unseated by the one attached to the dominant search engine. The mobile operating system that won was the one given away to every manufacturer in the world.

There is one notable exception. Google was the fifth or sixth serious search engine to market, and it won on a better product against incumbents who already had the distribution. This runs against conventional wisdom, but my view is that Google won because its product advantage and business model created strong network effects. Social was similar. The better social product created a much stronger network and, paired with the right timing, took the market away from entrenched players. The pattern is not “distribution always wins.” It is “distribution usually wins, except when a newcomer’s product has an inbuilt flywheel that compounds and manufactures its own distribution before the incumbents can respond.”

The Google and Meta dream is likely what drove the initial ChatGPT product story. OpenAI tried and failed to establish a sticky flywheel surface with GPTs and Sora, was late to market with an ad product, and, lastly, fell behind Anthropic’s Claude Code, its enterprise agentic coding platform. My read is that the window in which a product advantage outruns incumbent distribution comes early and rewards focus, before the incumbents organise. This year, the incumbents visibly organised on stage.

The most likely end state now is several winners on two tracks. There is a destination layer, where a genuinely better product can still hold high-intent audiences for the deep work. Think of deep work as the long research task, the document drafted from nothing, the problem you want to think through with something that has no other job in that moment. And there is an embedded or entertainment layer, the ambient majority of moments, which now belongs to whoever owns the surface.

What the two keynotes gave away

If you watched Apple’s WWDC and Google’s I/O back to back, you would have heard no mention of building a better chatbox.

Apple wove intelligence into the camera, mail, photos, the browser, and messages at the OS level. The rebuilt assistant was presented as connective tissue running underneath the things you already do. Siri’s AI was demoed answering a request for directions to a landmark that had only appeared in an Instagram post. Google did the same across Workspace, Android, and Search, dissolving its model into tools that hundreds of millions of people open every day.

The most fascinating wrinkle of Apple’s new Siri AI is that Apple chose to build its intelligence layer on a model developed in collaboration with Google’s Gemini, reportedly paying somewhere around a billion dollars a year for it under a multi-year deal, while likely building its own model to replace it.

Apple’s bet is well aligned with its strengths. Rather than pushing the frontier, it is distilling Google’s models and leveraging its privacy advantage to build around your private context. Privacy is not without cost. Routing queries through Private Cloud Compute adds latency, which was noticeable in the live demos and in my testing of the beta. I do not think this is a major concern in the long term because Apple controls the full stack. It owns the silicon, the operating system, the runtime, and the application layer. That lets it optimise multiple aspects of the platform and hardware to improve hybrid inference over time.

For Google, it is a strong endorsement: its model is good enough that the most demanding integrator in the industry would rather license it than ship something weaker of its own. Apple’s top-tier cloud model is described as matching Gemini Frontier quality and running on Nvidia GPUs inside Google’s cloud. It runs within Apple’s privacy boundary, and Apple has made it clear that the shipped models use its own code, technology, and data, with Gemini used only for distillation and training. Apple keeps the brand, the relationship, the billing, and the option to swap suppliers later.

Many have compared this to the “Intel Inside” position Apple took when it moved from PowerPC in 2006. The need is clear for Apple, but why is Google supplying the intelligence underneath a rival’s platform? The risk is that Google is essential but anonymous, and the day a cheaper or in-house equivalent is good enough, it is likely to be replaced. In mobile, however, Google had no presence inside iOS except as the paid-for search default. In AI, Google owns its own surface entirely and also sits inside the rival’s surface as the supplier of record. That gives Google a stronger structural position than it ever held in mobile. Google’s foundation model deal is a perfect platform play.

If you must be a component, it is far better to be the indispensable one inside everyone’s stack than to be locked out. This is a lesson that all platform companies should learn. It is always better to be integrated into a competitor’s surface than to be disintermediated entirely.

I have argued in a previous article that Apple’s strategy has always been to own the customer relationship through curation and integration rather than to win on raw capability or data:

Apple does not prioritise the collection of data, because culturally it would not be effective at utilising it.

That argument has aged well, and this is the same mechanism one layer up. Apple does not need the best AI. It needs an AI good enough to keep you inside its ecosystem, and it is happy to rent that from a rival, as long as the rental is cheaper than building.

Pure AI companies are being squeezed off the consumer floor, back to the enterprise

Amidst an unusually short keynote, there was an important announcement about free models. Apple is making its foundation models available to most developers at no cost. There is free access to models running on Private Cloud Compute for developers with fewer than 2 million first-time downloads, with heavier inference folded into subscriptions people already pay for.

Apple is commoditising app inference, which its pure-play competitors need to charge for. For a whole class of developers, the per-token economics the independent labs depend on at the low end stop applying, because the platform now gives away a model good enough for most app-level tasks as a cost of being on the platform. Apple even built the framework so a developer can call Claude or Gemini through the same Swift API and swap providers without changing code, which turns the model into an interchangeable part by design.
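The interchangeable-part idea can be sketched as a provider-agnostic interface. The sketch below is written in Python rather than Swift and is not Apple's framework. `TextModel`, `OnDeviceModel`, `HostedModel`, and `summarise` are hypothetical names, and the stub classes return canned strings instead of calling any real service.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical interface: anything that can answer a prompt."""

    def respond(self, prompt: str) -> str: ...


class OnDeviceModel:
    """Stub standing in for a free, platform-supplied model."""

    def respond(self, prompt: str) -> str:
        return f"[on-device] {prompt}"


class HostedModel:
    """Stub standing in for a paid third-party model."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor

    def respond(self, prompt: str) -> str:
        return f"[{self.vendor}] {prompt}"


def summarise(model: TextModel, text: str) -> str:
    """App code depends only on the interface, never on a vendor."""
    return model.respond(f"Summarise: {text}")


note = "Quarterly numbers are attached."
print(summarise(OnDeviceModel(), note))
print(summarise(HostedModel("vendor-a"), note))
```

The application function never names a vendor, so switching suppliers is a one-argument change at the call site. That is the sense in which a shared API makes the model substitutable, and a substitutable supplier has little pricing power.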

A four-step flow showing the commoditisation squeeze: the platform gives away baseline AI, per-token economics collapse, AI labs are pushed up-market, and only the frontier and enterprise remain.

Commoditisation is happening at the floor. The everyday model, good enough for most consumer use cases, is becoming free plumbing. The frontier models will continue to push ahead and absorb most of the cost. OpenAI is reportedly running at around two billion dollars a month in revenue, with fifty million paying subscribers, and it is not earning that by being the cheapest summariser inside someone’s inbox. It is earning it from the people who want the best available reasoning. The mistake was assuming that the vast majority of those buyers would be consumers rather than enterprises, where job replacement and efficiency are worth real money.

The consumer baseline is being commoditised by the platforms, which caps how much of the everyday consumer layer a pure-play AI lab can monetise directly. At the same time, the frontier stays differentiated and expensive. The labs are being pushed up-market, out of the ambient layer and into the high-intent destination and the enterprise, which is a real business but meaningfully different from the consumer market.

Where the technical bet actually sits

So far, I have made the business theory version of this argument, but I want to go one layer deeper into the engineering. The engineering matters in one specific place, and that place converges with the distribution argument.

Embedded surfaces are often multimodal. Think of a camera that understands what it is looking at, an assistant that acts on what is on your screen, and a home video edited to remove a light flash in the background. These use cases demand models that represent the world richly enough to reason about images, video, and space.

The multimodal nature of these surfaces makes the case for world models. I believe that efficient systems need a latent understanding of the world, the representation-space prediction that approaches like JEPA reach for. So far, the focus of these models has been on robotics, which requires manipulating physical objects. I believe these approaches are directly relevant to predicting and generating video and images efficiently, and that efficiency in these multimodal use cases is worth significantly more to consumers.¹ World models will reward whoever has the most multimodal data to train them and the most surfaces to deploy them on, and on both counts Google has an enormous lead.
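The difference between scoring a prediction in pixel space and in representation space can be shown with a toy example. This is an illustration of the idea only, not an implementation of JEPA. In JEPA the encoder and predictor are learned, whereas the `encode` function here is hand-written, and every name and number below is a made-up assumption.

```python
# Toy 1-D "frames": background texture values plus one bright "car" pixel.
CAR = 9.0


def make_frame(car_pos, background):
    """Copy the background and paint the car at `car_pos`."""
    frame = list(background)
    frame[car_pos] = CAR
    return frame


def encode(frame):
    """Hand-written stand-in for a learned encoder: keeps only the car position."""
    return [float(frame.index(max(frame)))]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


target = make_frame(4, [0.1, 0.5, 0.2, 0.4, 0.0, 0.3, 0.1, 0.5])

# Prediction A: car in the right place, background texture wrong.
pred_a = make_frame(4, [0.5, 0.1, 0.4, 0.2, 0.0, 0.1, 0.5, 0.3])
# Prediction B: background texture right, car one position off.
pred_b = make_frame(5, [0.1, 0.5, 0.2, 0.4, 0.0, 0.3, 0.1, 0.5])

pixel_loss_a = mse(pred_a, target)                    # > 0: charged for texture
latent_loss_a = mse(encode(pred_a), encode(target))   # 0.0: car position is right
latent_loss_b = mse(encode(pred_b), encode(target))   # 1.0: car position is wrong

print(pixel_loss_a > 0, latent_loss_a, latent_loss_b)
```

The pixel-space score charges prediction A for texture that does not matter, while the representation-space score charges only for getting the car wrong. The efficiency claim rests on that intuition: effort goes to the structure of the scene and not to every surface detail.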

The agentic risk

The biggest risk to my argument is that the contest isn’t “embedded feature versus destination app.” A third possibility is that the agent is the surface that wins, not a chatbox you visit or a feature sprinkled inside each app. In the agentic world, you delegate to a single orchestrating layer, which then reaches across every app and surface on your behalf. If that is where consumer behaviour lands, then owning the camera or the inbox matters less than owning the agent that drives them. At that point, it is not obvious that the surface owner wins.

The agent layer could be captured by a company with the most trusted reasoning rather than the most user context. Imagine an agentic orchestration platform that consumers use to perform all the actions they need.

Overall, I am sceptical of this risk. I am struggling to think of real consumer flows that lend themselves to full delegation. You would need jobs to be done with a clear success criterion and no meaningful human preference in the loop. For enterprises, it is clear that they want productivity and automation. Shopping is the example Silicon Valley loves to use. A fully “agentic” purchasing experience has existed for two decades… It is called a personal shopper, yet the dominant model is still a curated storefront with a human making the final call. I believe consumers want to remain in the loop.

Despite this, Apple has still made bets on agentic flows. Apple deepened Siri Shortcuts with AI voice-driven automations that run tasks described by the user. It also added an “agentic” Safari password reset flow where the user tells Apple to reset a password, and the agent executes it. It removes a specific friction from a specific moment without replacing the user’s agency in the broader task. That pattern, the contextual micro-agent embedded in an existing interface, is probably where consumer agentics actually lands, and it is a pattern that rewards the surface owner more than it rewards the general-purpose AI lab.

Even if a new agentic platform is created, I believe the surface owners are best positioned to win, because an agent still has to run somewhere. The operating system is the most privileged place to run it. With my agents, my biggest challenge has been managing agents running on different devices with different contexts, all of which need access to my personal data. If I am wrong about consumer AI over the next decade, the most likely reason is that the interface moved one level up, to the agent.

Two layers, two races

Let’s score the space on two questions. Does this company own a consumer surface that people open every day? Does it have a model at or near the frontier?

Company	Owns a consumer surface (Consumer Advantage)	Model at/near the frontier (Frontier Advantage)
Google	✅ Yes: Android, Search, Workspace	✅ Yes
Apple	✅ Yes: iOS, the premium surface	🟥 Not yet, renting Gemini, building its own
Meta	✅ Yes: the social apps, glasses	🟨 Mid-tier, improving
Microsoft	🟨 Partially: enterprise and the desktop	🟨 Via partners; now launching its own models, which are not at the frontier
Amazon	🟨 Partially: the home and commerce	🟨 Via partners
OpenAI	🟥 No: ChatGPT is the only surface it built, and it is a destination	✅ Yes
Anthropic	🟥 No: enterprise and developer tools	✅ Yes
A company that does not yet exist	Unless a new platform launches	Capital heavy, may be a JEPA or world model focused competitor

The table is a map of where strength sits. These are the positions that will define the market structure for the next decade.

Google is the only company that can run in both races convincingly, which is the core of any bullish case for them to own a large share of the consumer market.
The AI labs answer yes on the model and no on the surface, which leaves them structurally disadvantaged by Apple’s free model push.
Apple answers yes on the surface and not-yet on the model, which is why its in-house model programme is the most consequential project in the table.
Microsoft is in a weak position despite having had a chance to own OpenAI entirely. Windows is not a meaningful consumer surface, and Claude Code threatens Copilot. It’s hard not to see this as a potentially dominant position squandered.
The empty bottom row is there because every prior transition produced at least one winner nobody had on the board, and it would be arrogant to assume this one will not.
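The two-question scoring can also be written down as a small data structure, which makes the groupings above mechanical. The sketch below transcribes the table. The three-level `Score` enum is an assumption made for illustration: it flattens “Not yet” to NO and “Mid-tier” or “Via partners” to PARTIAL, and it omits the hypothetical final row.

```python
from enum import Enum


class Score(Enum):
    YES = "yes"
    PARTIAL = "partial"
    NO = "no"


# (owns a consumer surface, model at or near the frontier) per company.
# Simplified transcription of the table; the flattening is an assumption.
scores = {
    "Google": (Score.YES, Score.YES),
    "Apple": (Score.YES, Score.NO),
    "Meta": (Score.YES, Score.PARTIAL),
    "Microsoft": (Score.PARTIAL, Score.PARTIAL),
    "Amazon": (Score.PARTIAL, Score.PARTIAL),
    "OpenAI": (Score.NO, Score.YES),
    "Anthropic": (Score.NO, Score.YES),
}


def runs_both_races(table):
    """Companies answering yes to both questions."""
    return [name for name, (surface, model) in table.items()
            if surface is Score.YES and model is Score.YES]


def model_without_surface(table):
    """Companies with a frontier model but no consumer surface."""
    return [name for name, (surface, model) in table.items()
            if surface is Score.NO and model is Score.YES]


print(runs_both_races(scores))
print(model_without_surface(scores))
```

Under this simplified encoding, only one name passes both filters, and the two pure-play labs fall into the model-without-surface group.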

What this means if you are building or allocating

For anyone building, do not build a destination if you can be a feature inside one that people already open. If you are a new entrant trying to beat incumbents, your product needs to build its own network and platform to offset distribution costs, so that you can build the destination and move in before the incumbents organise. Longer term, I assume that the raw cost of baseline model intelligence will trend towards zero, as the platforms actively drive it there. The moats that remain will be in the workflow, in proprietary data, in distribution you can own, or at the frontier where capability is still scarce.

For allocators, stop scoring this race on active users. For consumer, the core metric is platform lock-in. How many workflows are embedded into the platform, and what are the rates of free-to-paid conversion and churn? For enterprise, it is workflow touch points and frontier model leadership. These are two separate bets with different risks and a low correlation between them. Who will own the consumer surfaces is a bet on Google, Apple, or a new surface replacing the smartphone. Who holds the enterprise frontier is a bet on enterprise job replacement and the cloud.

Google is the company that started the transformer race, and the strongest single factor in its favour is that it is the only name that sits on both sides of the trade.

¹ The point is that prediction in representation space rather than pixel space is more compute-efficient and yields embeddings with better semantic structure, which is what makes the approach interesting for video understanding and generation specifically, not only for embodied or robotic systems. Take the example of a model asked to produce the next frame of a scene with a car driving on a motorway. Token reconstruction-based models attend equally to the whole scene, including irrelevant details such as the leaves on trees and the clouds in the sky, whereas JEPA-based models spend most of their effort on the moving car, because they have a contextual understanding that the car is the focus and that it travels on a road. JEPA-style training asks the model to predict abstract representations of missing or future parts of a scene. That encourages the representation to capture the semantic and dynamic structure of the world (objects, motion, spatial relationships, and likely state transitions) rather than every surface-level visual detail. I believe that this efficiency will become a durable consumer advantage, rewarding the holders of multimodal data and surfaces.