31 August 2025
News
Pilots and employment
Two papers about AI got a lot of attention this week. First, a group at MIT claimed that it had done a survey showing that 95% of corporate AI pilots fail. Second, the HAI lab at Stanford used a detailed dataset of US employment over the last three years to argue that entry-level programming jobs have declined sharply. See this week’s column. EMPLOYMENT, PILOTS
Google Nano Banana
Yes, that’s really the product name. Google released a new image generation model that shot to the top of most benchmarks. I’m old enough to remember when Google was supposedly unable to do AI (Bard, anyone?) - there remain lots of questions about Google’s role in the new world (how do ads work?) but no-one is comparing Sundar Pichai to Steve Ballmer this week. LINK
Microsoft’s independence
Microsoft is slowly working its way towards having credible LLMs of its own, instead of just relying on OpenAI (a lucrative but painful relationship), and this week the AI group, led by Mustafa Suleyman (cofounder of DeepMind), released its first foundation model, MAI-1. It came in at number 15 on the LMArena leaderboard (one of many imperfect ways to evaluate), which in crude terms puts it at par with the SOTA of six months ago. LINK
The week in Anthropic
Anthropic released an abuse report on ways that it found bad people doing bad things with its products. Welcome to the internet. LINK
It also changed its T&Cs to allow it to include user input as training data unless you opt out. LINK
And, Anthropic launched a Chrome extension. The trend to try to add LLM assistants to web browsers reminds me a lot of Internet Explorer toolbars 25 years ago (an important part of Google’s early growth story), except that these days most use is on mobile, where this is irrelevant. LINK
Chaos and the end of de minimis
I’ve written a few times about de minimis, the customs rule that, in the USA, exempted packages worth under $800 from inspection and duties. This got a lot of attention in the last couple of years because it enabled the direct shipment model used by Shein and Temu, and the on-demand manufacturing that lets Shein sell an order of magnitude more SKUs than conventional fast fashion retailers. It was clear that this had got out of hand: most countries use $200 (the US increased from that to $800 in 2016), and the US was going to end the rule in summer 2027. However, Donald Trump ended the exemption entirely for China and HK in May and then on 30 July extended that to all countries, giving only one month’s notice, meaning today.
This has created chaos. Narrowly, there were 1.4bn of these shipments last year and the US does not have the customs systems to inspect or charge duties on all of them (which is a major reason for having the rule in the first place) and can’t build them in a month. So, a range of post offices around the world have halted package shipments to the USA entirely, since they don’t know where to send them.
But more generally, in the last decade a lot of companies built their ecommerce logistics chains around this. For example, Tapestry (which owns Coach and Kate Spade) has 13-14% of sales affected, since it was fulfilling D2C orders direct from outside the USA, and this was also the final straw for Ssense, a luxury reseller in Canada that just went bankrupt. Making things worse, no-one knows how the new rules work yet: if you buy a watch from Germany, do you pay the ‘EU Import’ tariff, or do you also have to pay the ‘copper imports’ tariff on the copper content? LINK, POST OFFICES, COACH, SSENSE
Musk suing Apple
Elon Musk says he’s going to sue Apple because his ‘Grok’ LLM app (the one that called itself ‘MechaHitler’) isn’t at the top of the App Store charts (it isn’t at the top of the AI benchmarks either, nor Google Trends, nor web traffic). Of course, anyone who’s been paying attention for the last decade will recall that there is very little predictive value in things that Elon Musk says he will do. LINK
Pattern aggregation
Pattern is an aggregation platform that sells products from over 200 partner brands across all the marketplaces you can think of. Revenue last year was $1.8bn, and it just filed for IPO. LINK
Ideas
China’s new AI strategy paper. LINK
There’s a lot of debate about how much search traffic is really declining, due to LLM use on one hand (PSA: penetration is still low at maybe 10-15% DAU) and Google’s AI Overviews on the other. LINK
India banned online gambling, which is apparently worth $23bn there. That will be interesting to watch. LINK
China tried to recruit (or compromise) this Stanford student as a future spy. LINK
India’s billion-dollar e-waste industry. LINK
Models are trying to work out how to think about cloning themselves with AI. LINK
Google Cloud’s blockchain for finance. The nonsense, ‘visionaries’ and con-artists have moved on (mostly to AI), but people deep in the plumbing of the financial world are quietly building things. LINK
Outside interests
“Would you show us a little more of your room, please?” The email to Sotheby’s that led to a €3.5m sale. LINK
This F35 pilot spent 50 minutes on a conference call with Lockheed engineers before giving up and ejecting (safely). ‘Zoom and boom’ - but who amongst us has never felt the same urge? LINK
TypePad is shutting down. A pioneer of blogging and at one time a rival of Wordpress, but long-since irrelevant (Wordpress survived by pivoting to corporate, abandoning ‘blogging’ to social and managed platforms.) LINK
A huge auction of vintage computers, typewriters, adding machines and cameras. LINK
The family who own an empty beach plot in the Hamptons. LINK
Data
OnlyFans paid a $701m dividend, and paid $5.8bn to creators last year, on $7.2bn revenue, up from $6.6bn in the previous year. As Yorkshiremen say, “Where there’s muck there’s brass.” LINK
A16Z released its annual data dump from Similarweb of popular consumer AI apps. LINK
An academic survey that says use of generative AI at work is now at about 15% DAU in the USA. LINK
Gallup, on the other hand, says 8% DAU (but its 2024 number was also half of what other surveys said). LINK
Column
AI metrics
I’m working on a new presentation for the autumn, mostly (of course) about AI, and naturally, I’m making charts. There’s the capex chart, and some benchmark charts, and plenty of others, but one of the really basic, early ones to use is something along the lines of, well, how many people are using this? And that’s a problem.
When the iPhone was taking off, Apple, RIM, and Nokia published a number for unit sales every quarter, and Google gave enough numbers for activations that you could get a good picture of the entire market. WhatsApp used to announce messages sent per day, and Skype showed logged-in users. Meta still releases daily active users today.
But for generative AI, three years after the race began, we’re pretty much in the dark. OpenAI releases weekly active users from time to time - 700m at the beginning of this month - but that’s a weird number. If chatbots are taking over the world, and becoming part of everyone’s computing, and can do, if not yet everything, then certainly a lot of useful things every day, then what does it mean if someone is only a ‘weekly active user’? Surely daily is what matters? After all, if someone understands ChatGPT, and has an account, and knows how to use it… but can only think of a reason to use it once or twice a week, then they’re not really a user at all.
At least that’s a real number, though - after that things get really fuzzy. Google gives numbers of AI Overviews, but that’s a number it controls itself, not a measure of user adoption. Google and Microsoft have recently given numbers of tokens generated, which seems very similar to giving a number of ‘bits sent’ for YouTube 20 years ago. If you told me that YouTube data traffic has doubled, that sounds good, but does it mean that twice as many people are using it, or that completion rates doubled, or that there’s a lot more HD video? Does it tell us India moved to 4G? And meanwhile, did you move to a more efficient compression algorithm and so really your usage quadrupled? Then, we can try to get to something from the outside, with panel data from Similarweb, but that doesn’t tell us much about use at work and even less about mobile use.
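To make the ambiguity of a 'tokens generated' number concrete, here is a minimal sketch in Python. Every figure in it is invented for illustration, and the three-way split into users, sessions and tokens per session is my own simplifying assumption, not anything Google or Microsoft has published. It only shows that very different stories produce exactly the same headline.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical decomposition of a daily 'tokens generated' total."""
    users: int               # people using the product each day
    sessions_per_user: int   # sessions per user per day
    tokens_per_session: int  # tokens generated in each session

    def total_tokens(self) -> int:
        return self.users * self.sessions_per_user * self.tokens_per_session


# Invented baseline: 1m users, 2 sessions a day, 500 tokens per session.
baseline = Usage(users=1_000_000, sessions_per_user=2, tokens_per_session=500)

# Three different stories behind the same 'tokens doubled' announcement.
scenarios = {
    "twice as many users": Usage(2_000_000, 2, 500),
    "same users, twice as many sessions": Usage(1_000_000, 4, 500),
    "same users, answers twice as long": Usage(1_000_000, 2, 1_000),
}

for name, usage in scenarios.items():
    ratio = usage.total_tokens() / baseline.total_tokens()
    print(f"{name}: {ratio:.1f}x tokens")
```

All three scenarios print the same 2.0x, which is the point: the total alone can't tell you whether adoption, engagement or verbosity changed.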
That leaves surveys, and these come with problems of their own. Too many organisations are still running surveys that ask questions like ‘have you ever used generative AI’ and present the answer as though it shows adoption - when most of the people who answer ‘yes’ have used it once, last year. (Worse, some of them make no attempt even to define ‘AI’.) This is professional malpractice. The equivalent worthless question for the enterprise, which I’ve seen in data from the US Census, is to ask if a company ‘uses’ AI. What does this mean? Have they doubled code output with Claude or does someone in Marketing have a Midjourney account? Alternatively, as we saw in the much-discussed MIT study this week, they ask you if you’ve had lots of pilots that failed - to which the only correct answer is, well, that’s why you do pilots.
Some of this is a cri de cœur from a frustrated analyst, to be sure. We know LLMs are growing fast, and we know that OpenAI is way out in the lead for consumer mind-share. Does a chart to show that matter? It’s also very early and we’re on an exponential curve - a lot of the issues I touched on sound familiar from the 90s, when people talked about equally meaningless metrics like ‘hits’ or ‘registered users’. We don’t know what to measure.
However, if the core science question is how long and how much these models will scale (another measurement problem), the core product question is how much the LLMs themselves - the raw chatbot - are the product that ‘everyone’ will use ‘all the time’ for everything, and how much they need to disappear inside products and APIs. We can’t see that chart. Going back to smartphones, was open going to win? Would Apple be crushed by Android the way it was by Windows? A lot of very clever people thought so, and didn’t understand that Apple was winning where it mattered. That was what the charts told you. That’s why metrics matter - what’s going on?