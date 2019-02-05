I've talked quite a lot about the impact of machine learning and computer vision in general on everything from e-commerce recommendation to social to all kinds of cool industrial applications, but it's also interesting just to look at the effect that machine learning is having on actual cameras.

For both Apple and Google, most of the advances in smartphone cameras now happen in software. The marketing term for this is ‘computational photography’, which really just means that as well as trying to make a better lens and sensor, which are subject to the rules of physics and the size of the phone, we use software (now, mostly, machine learning or ‘AI’) to try to get a better picture out of the raw data coming from the hardware. Hence, Apple launched ‘portrait mode’ on a phone with a dual-lens system but uses software to assemble that data into a single refocused image, and it now offers a version of this on a single-lens phone (as did Google when it copied this feature). In the same way, Google’s new Pixel phone has a ‘night sight’ capability that is all about software, not radically different hardware. The technical quality of the picture you see gets better because of new software as much as because of new hardware.

Most of how this is done will be invisible to the user. HDR went from a garish novelty to a setting in the camera that sometimes worked to, now, something automatic that you never need to know about. I expect the separate ‘portrait mode’ or ‘night sight’ options will disappear, just like the ‘HDR’ button did.

This will probably also go several levels further in, as the camera gets better at working out what you’re actually taking a picture of. When you take a photo on a ski slope it will come out perfectly exposed and colour-balanced because the camera knows this is snow and adjusts correctly. Today, portrait mode is doing face detection as well as depth mapping to work out what to focus on; in the future, it will know which of the faces in the frame is your child and set the focus on them.

So, we are clearly well on the way to the point at which any photograph a normal consumer takes will be technically perfect. However, there’s a second step here - not just “what is this picture and how should we focus it?” but “why did you take the picture?”

One of the desire paths of the smartphone camera is that since we have it with us all the time and we can take unlimited pictures for free, and have them instantly, we don’t just take more pictures of our children and dogs but also pictures of things that we’d never have taken pictures of before. We take pictures of posters and books and things we might want to buy - we take pictures of recipes, catalogues, conference schedules, train timetables (Americans, ask a foreigner) and fliers. The smartphone image sensor has become a notebook. (Something similar has happened with smartphone screenshots, another desire path that no-one thought would become a normal consumer behaviour.)

Machine learning means that the computer will be able to unlock a lot of this. If there's a date in this picture, what might that mean? Does this look like a recipe? Is there a book in this photo and can we match it to an Amazon listing? Can we match the handbag to Net a Porter? And so you can imagine a suggestion from your phone: “do you want to add the date in this photo to your diary?” in much the same way that today email programs extract flights or meetings or contact details from emails.
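
To make the ‘date in this photo’ idea a little more concrete, here is a minimal sketch in Python. It assumes the text has already been pulled out of the image by some OCR step (not shown), and it only recognises dates written like ‘14 March 2019’; the function names `find_dates` and `suggest_diary_entries` are invented for illustration and do not describe how any real phone does this.

```python
import re
from datetime import date

# Month names mapped to month numbers (English only, an assumption of this sketch).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june",
         "july", "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches dates written as "<day> <month name> <4-digit year>", e.g. "14 March 2019".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def find_dates(ocr_text):
    """Return the valid calendar dates found in text extracted from a photo."""
    found = []
    for day, month, year in DATE_PATTERN.findall(ocr_text):
        try:
            found.append(date(int(year), MONTHS[month.lower()], int(day)))
        except ValueError:
            # Looks like a date but is not one (e.g. "31 February 2019"); skip it.
            continue
    return found


def suggest_diary_entries(ocr_text):
    """Turn each date found into a question the phone could put to the user."""
    return [f"Add {d.isoformat()} to your diary?" for d in find_dates(ocr_text)]
```

Even this toy version shows where the hard part lies: finding the date is easy, but knowing whether it is an event you care about, or just the print date on a flier, is not.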

This is an interesting product design challenge. Some of this can be passive, as with automatically detecting flights in email - you wait until you know you have something. Machine learning means we now have this with face recognition and object classification: every image on your phone is indexed by default, and you can ask for ‘all pictures of my son at the beach’ or ‘every picture of a dog’. But you can do many more analyses than this, and we take lots of photos, and there will be something you could analyse in all of them. You can perhaps index or translate all of the text in all the photos you take (presuming that isn’t resource-prohibitive), but should you do a product search on every object in every picture on the phone? At some point, you probably need some sort of ‘tell me about this’ mode, where you explicitly ask the computer to do ‘magic’.

Asking a computer to ‘tell me about this picture’ poses other problems, though. We do not have HAL 9000, nor any path to it, and we cannot recognise any arbitrary object, but we can make a guess, of varying quality, in quite a lot of categories. So how should the user know what would work, and how does the system know what kind of guess to make? Should this all happen in one app with a general promise, or many apps with specific promises? Should you have a poster mode, a ‘solve this equation’ mode, a date mode, a books mode and a product search mode? Or should you just have a mode for ‘wave the phone’s camera at things and something good will probably happen’?
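
One way to picture the ‘one app with a general promise’ option is as a router sitting on top of several specialised guessers. The sketch below is purely illustrative: the `Guess` type, the mode names and the 0.8 threshold are all assumptions, and the confidence scores would have to come from real classifiers that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Guess:
    mode: str          # hypothetical mode name, e.g. "book", "date", "product"
    confidence: float  # score between 0 and 1 from some classifier (assumed)


def choose_action(guesses, threshold=0.8):
    """Pick the most confident mode, or fall back to asking the user.

    If no guess clears the threshold, it is probably better to ask
    than to do the wrong kind of 'magic'.
    """
    if not guesses:
        return "ask_user"
    best = max(guesses, key=lambda g: g.confidence)
    return best.mode if best.confidence >= threshold else "ask_user"
```

With guesses of `Guess("book", 0.91)` and `Guess("product", 0.40)` this picks the book mode; if nothing scores above the threshold, it asks the user instead, which is the ‘many apps with specific promises’ option coming back in through the side door.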

This last is the approach Google is taking with ‘Lens’, which is integrated into the Android camera app next to ‘Portrait’ - point it at things and magic happens. Mostly.