Video is the new HTML

Jonah Peretti made these numbers public late last year. To me, they're interesting because they show two distinct trends - proliferation of distribution platforms and proliferation of content models.


Each of these segments is a platform with a different model for acquiring users, and each is also a platform with a different content format. The way you get views, the kinds of content that work, and the kinds of content that are possible are different. (Buzzfeed, of course, is amongst other things a machine for understanding and optimising for this environment.)

Within this proliferation, distribution models are moving towards both algorithmic feeds (Facebook, Twitter, Instagram) and manual curation (Snapchat Discover), and content models have on the one hand richer and more immersive formats (often delivered using video files) and on the other lighter-weight bandwidth-optimised text-based formats (AMP, Facebook Instant Articles). And though AMP and Articles are pitched on speed of loading, they're also, like video on Facebook or Snapchat, under the control of the underlying platform owner.

Meanwhile, half of the point of both Google's AMP and Facebook's Instant Articles (whether implicit or explicit) is that you get the bandwidth saving and the faster rendering by taking out all the ad tech and analytics JavaScript and instead using solutions from Google and Facebook. But equally, at the other end of the bandwidth scale, Snapchat Discover also makes you rely on the platform to tell you what's going on. In most of these cases, and especially Facebook or Snapchat, the host platform promises better information about use and user (and hence in theory better economics) than you can get from all that JavaScript. Maybe. And in parallel, you get not just new content and metrics but a new ad format, on Snapchat in particular, that can feel far more native and naturally part of the general content experience than any web banner.

That is, these models change how you get audience, what the audience sees, what you know about the audience and how you can make money from it. (And then, just to make life simple, something around a third of mobile web use actually happens as in-app views within Facebook.)

Next, while Facebook has Instant Articles, Google now has Instant Apps. You tap on a link, and 'native' (at any rate, not HTML) code instantly (hopefully) appears and runs. You could see this as the return of Java (and Android in a sense *is* Java), or the return of Flash. I think the Flash parallel works much more broadly, too. Snapchat Discover certainly looks like Flash - though technically the delivery format might be h264 video, the actual content looks a lot like what people were doing with Flash 10 years ago - rich, engaging, moving content blending sound, motion, animation and, sometimes, actual live-action footage. We've gone from delivering video with Flash to delivering Flash with video. That is, video is a new HTML - a new content delivery format, and not necessarily about live action at all. Instant Apps do the same but with the Android runtime instead of Snapchat's, and though the Instant Apps demo at Google IO showed things that look like apps rather than content, the principle is the same - richer than HTML, but better than going to the app store. But even AMP or Instant Articles bear the same interpretation - we move away from plain old HTML and JavaScript to get better experiences.

One could also suggest that this means video (including GIFs, or whatever format you want to add) acts as a new card format - a way to encapsulate any kind of content and let it travel, shareably, across the Internet. Embedding a GIF or video into a social network feed is, again, an alternative to HTML as a content delivery format, and again, you can embed anything you want, including ads.

This also points to another proliferation - metrics. When Snapchat says it has '10bn daily video views', what does that mean and what can one compare it with? How does one think about auto-playing video? What if the user doesn't hear the sound, or if there is no sound? One certainly can't compare it to TV viewing - or at least, only in terms of overall time spent on exactly the same basis as Facebook or any other piece of content. YouTube is at least conceptually the same form as TV, but Snapchat really isn't. And of course, it's the platform owners themselves who invent and report the metrics.

Extending this issue, if one cannot compare time spent, it's also tough to compare ad spend. Should time spent on a 'video' platform whose views are mostly silent, mostly scrolled past and mostly skipped count the same as time watching a hit show on a TV? How about time spent on a TV show playing on a screen on the wall while the family look at rich, engaging and interactive content (delivered using h264) on their smartphones? (And what, skipping forward a bit, should one think about the value and engagement of ads in VR?)

This also makes me feel that mobile ad blocking is going to become even more problematic. Facebook has been the world's biggest ad-blocker for a long time, just as it's one of the world's biggest mobile web browsers. But if a platform sends me encrypted data from a single IP, that may be just a single h264 stream, that happens to have an ad somewhere within it, that's then rendered in a proprietary runtime on the device, how on earth can anyone strip that out? The biggest impact of any ad-blocking that does happen may be to drive content owners further away from the open web. 

One of my frameworks for thinking about mobile is that we're looking for another runtime - somewhere to build experiences on mobile that comes after the web and mobile apps - and that this new runtime will probably come with new engagement and discovery models and possibly new revenue models too. It's pretty obvious that this is a useful way to look at Google Assistant or Facebook's Bots platform, but it applies to content as much as it does to code per se: Snapchat is just as much a development platform as WeChat, you just have to look at it from the right angle. The screen itself is the runtime, and the richer and more native you can be to that the better.