Smart home, machine learning and discovery

My grandparents could have told you how many electric motors they owned. There was one in the car, one in the fridge, one in the vacuum cleaner, and they probably owned a dozen in total. Today we have no idea and it’s not a meaningful question, but we probably do know how many devices we own with a network connection. Again, our children and grandchildren will have no idea, and it won’t matter.

In both of these cases, a wave of commodity components enabled a wave of product creation. The electrification of the home was enabled by cheap DC motors, heating elements and so on, and the current wave of ‘smart home’ devices is enabled by cheap and low power cameras, wifi chips, microphones and so on (mostly coming out of the smartphone supply chain).

Equally, in both of these cases there’s a discovery phase: we may have all of these components but we still have to work out the right ways to combine them. Hence, people proposed all sorts of electric devices for the home, and we collectively worked out which made sense and where - everyone in Britain has a kettle, most people in America have a blender, and no-one has an electric can opener. The same is happening with ‘smart home’ now. Lots of ideas for products are being tried - some will be the kettles and some will be the can openers, and it will only be obvious which in hindsight. Part of this process is also working out where the company value goes - which things are commodities from the existing manufacturers (oven companies, lock companies etc), which are commodities from Shenzhen, and which are opportunities for new company creation.

All of this overlaps very directly with a parallel process of creation and discovery happening in machine learning, especially in consumer products. Again, we have an ever-growing set of components in things like computer vision, speech and NLP, as well as broader and less visible kinds of ML-based pattern recognition. Again, many of these components are now commodities, or are quickly becoming commodities. And, again, we are working out how to combine them, build them into products, add them to other products, and surface that to the user.

Hence:

What can we build with motors and heating elements?
What can we build with wifi chips, cameras and microphones?
What can we build with image recognition, natural language processing and pattern recognition?

Meanwhile, these machine learning components are also themselves components for smart home (ML makes the connected camera or the smart thermostat useful) and vice versa (a smart speaker is often just an end-point for a voice assistant).

Part of the discovery challenge for ‘smart home’, of course, is how much these devices should actually be connected to each other - is this ‘internet of things’ or just ‘things connected to the internet’?

Voice assistants (based on machine learning) are obviously one part of that puzzle - do I use voice to control everything? Maybe. I tend to think about this in terms of Venn diagrams - it would be good to use voice to tell the oven to preheat to 350 degrees, and it would be good if the smart door lock could talk to the burglar alarm without my having to say anything, but the door lock doesn’t need to connect to the oven. That is, we have a series of expanding point solutions (if only because some of these devices last for a decade and you’re unlikely to replace an oven just to get voice assistant support).

Equally, one could wonder how much ML might become more than a series of point solutions - whether there should be more connective tissue.

The prevalent Silicon Valley view is that asking for a single ‘AI’ layer is like asking for a single ‘database’ layer. We don’t expect our photos, email, text messages and Instagram updates to all live in a unified ‘database layer’ - those are all just different pieces of software, even if they use the same underlying technologies. Equally, machine learning will be in all sorts of different places doing totally different things. The battery optimisation uses ML, and Google’s night mode uses ML, but those are clearly completely different pieces of code and the user never needs to hear ‘AI’ when they use either. Even when something is obviously ‘AI’, and even uses the same core tech, the product might be totally different. If I give Google Photos a picture of a dog at a beach, the use case is “show me photos of my dog at the beach”. But if I give that image to Google Lens, telling me it’s a dog at the beach is not useful. We didn’t buy electric motors - we bought drills. We don’t buy wifi chipsets, and we won’t buy ‘AI’.

On the other hand, part of the challenge of machine learning is not just working out what problems to solve but working out how to surface that to the user. When your iPhone detects a flight confirmation in your email and adds it to your calendar, it says that ‘Siri found flights’ - Siri is not a single piece of software (in this case it probably doesn’t even use any machine learning), but rather Apple is using it as a brand for ‘the phone is watching and making suggestions’. There are lots of messaging and user communication issues here (not least privacy). Very often the system might not be certain what it’s seen - so do you say ‘maybe X’? In particular, for a voice assistant or a knowledge graph like Google Lens, how do you communicate what you can and can’t do, and how do you communicate uncertainty? Perhaps the ‘AI layer’ is all about messaging to the user that this has ML characteristics, and about setting the right kinds of expectations for what will work (I wrote more about this here, looking at Google Lens). In other words, we may say it’s AI to lower expectations, not raise them. And maybe some of the AI stuff needs to be branded ‘AI’ for people to understand it, and perhaps that ‘AI’ brand will cover things that aren’t really AI at all.

Voice, IoT, Artificial Intelligence | Benedict Evans | 4 April 2019