In the introduction to her new book, Hannah Fry points out something interesting about the phrase “Hello World.” It’s never been quite clear, she says, whether the phrase—which is frequently the entire output of a student’s first computer program—is supposed to be attributed to the program, awakening for the first time, or to the programmer, announcing their triumphant first creation.
Perhaps for this reason, “Hello World” calls to mind a dialogue between human and machine, one which has never been more relevant than it is today. Her book, Hello World, published in September, walks us through a rapidly computerizing world. Fry is both optimistic and excited—along with her Ph.D. students at University College London, she has worked on many algorithms herself—and cautious. In conversation and in her book, she issues a call to arms: We need to make algorithms transparent, regulated, and forgiving of the flawed creatures that converse with them.
I reached her by telephone while she was on a book tour in New York City.
Why do we need an FDA for algorithms?
It used to be the case that you could just put any old colored liquid in a glass bottle and sell it as medicine and make an absolute fortune. And then not worry about whether or not it’s poisonous. We stopped that from happening because, well, for starters it’s kind of morally repugnant. But also, it harms people. We’re in that position right now with data and algorithms. You can harvest any data that you want, on anybody. You can infer any data that you like, and you can use it to manipulate them in any way that you choose. And you can roll out an algorithm that genuinely makes massive differences to people’s lives, both good and bad, without any checks and balances. To me that seems completely bonkers. So I think we need something like the FDA for algorithms. A regulatory body that can protect the intellectual property of algorithms, but at the same time ensure that the benefits to society outweigh the harms.
Why is the regulation of medicine an appropriate comparison?
If you swallow a bottle of colored liquid and then you keel over the next day, then you know for sure it was poisonous. But there are much more subtle things in pharmaceuticals that require expert analysis to be able to weigh up the benefits and the harms. To study the chemical profile of these drugs that are being sold and make sure that they actually are doing what they say they’re doing. With algorithms it’s the same thing. You can’t expect the average person in the street to study Bayesian inference or be totally well read in random forests, and have the kind of computing prowess to look at code and analyze whether it’s doing something fairly. That’s not realistic. Simultaneously, you can’t have some code of conduct that every data science person signs up to, and agrees that they won’t tread over some lines. It has to be a government, really, that does this. It has to be government that analyzes this stuff on our behalf and makes sure that it is doing what it says it does, and in a way that doesn’t end up harming people.
How did you come to write a book about algorithms?
Back in 2011, we had these really bad riots in London. I’d been working on a project with the Metropolitan Police, trying mathematically to look at how these riots had spread and to use algorithms to ask how the police could have done better. I went to go and give a talk in Berlin about this paper we’d published about our work, and they completely tore me apart. They were asking questions like, “Hang on a second, you’re creating this algorithm that has the potential to be used to suppress peaceful demonstrations in the future. How can you morally justify the work that you’re doing?” I’m kind of ashamed to say that it just hadn’t occurred to me at that point in time. Ever since, I have really thought a lot about the point that they made. And started to notice around me that other researchers in the area weren’t necessarily treating the data that they were working with, and the algorithms that they were creating, with the ethical concern they really warranted. We have this imbalance where the people who are making algorithms aren’t talking to the people who are using them. And the people who are using them aren’t talking to the people who are having decisions made about their lives by them. I wanted to write something that united those three groups.
What is your favorite algorithm?
I think my favorite one, both because of how powerful it is and how much good it can do, and because it’s very clever, is called geoprofiling. It’s an algorithm that’s used to try and track down serial killers or people who commit multiple crimes. Most serial killers won’t tend to commit crimes on their own doorstep. Generally not something that you would do if you want to avoid police sniffing around your neighborhood. But then also, on the flip side, most serial killers don’t tend to travel really long distances to commit crime. There is some level of convenience in the way that people commit their crimes, as weird as it sounds to say that. If you combine those two pieces of information into your clever little algorithm, what you can do is, you can take a map of loads of crimes and you can use the algorithm to highlight where the serial killer is most likely to have come from. This kind of algorithm has also been used to track down where malaria-carrying insects are breeding in Egypt, to work out who Banksy is from where his paintings have been found, and to locate bomb factories.
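The two pieces of information Fry combines can be sketched in a few lines of code. This is a toy illustration only, loosely modeled on Rossmo’s geographic-profiling formula; the function name, parameter values, and grid are all illustrative assumptions, not the algorithm any police force actually runs.

```python
def anchor_score(x, y, crime_sites, buffer=1.0, f=1.2, g=1.2):
    """Score how plausible (x, y) is as the offender's home base.

    Each crime site contributes more the closer it is to (x, y),
    EXCEPT inside a small 'buffer zone' around the site, where the
    score is suppressed -- capturing the idea that offenders avoid
    their own doorstep but also don't travel very far.
    """
    total = 0.0
    for (cx, cy) in crime_sites:
        d = abs(x - cx) + abs(y - cy)  # Manhattan distance between points
        if d > buffer:
            # Outside the buffer: distance decay, nearer crimes weigh more.
            total += 1.0 / (d ** f)
        else:
            # Inside the buffer: likelihood drops off near the crime itself.
            total += (buffer ** (g - f)) / ((2 * buffer - d) ** g)
    return total

# Score every cell of a small grid and flag the most likely anchor point.
crimes = [(2, 3), (3, 5), (6, 4), (5, 2)]
grid = [(x, y) for x in range(10) for y in range(10)]
best = max(grid, key=lambda p: anchor_score(p[0], p[1], crimes))
```

Run over a map of real incident locations, the highest-scoring cells form the “hot zone” the algorithm highlights.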
What is your least favorite algorithm?
I don’t like the algorithms that don’t do what they claim they do. When I was researching my book, I spoke to the CEO of a company, and I mean ... It was a joke. These people are complete charlatans. They claim they can take any movie script and feed it into a neural network that will highlight as little as one word in the script that can be changed to make the movie more profitable at the box office. When I pushed this CEO for evidence that this algorithm actually worked, he launched into this story about how a big movie star was let go from this movie franchise purely on the basis of his algorithm’s say so. And when I pointed out that that wasn’t actually proof that his algorithm worked, it was proof that people believed his algorithm, he kind of shrugged it off and was like, “Oh, well, you know, I’m not performing an academic exercise.” Those are the algorithms that I hate the most. Those and others like it, where people are just exploiting the public’s enthusiasm for artificial intelligence, and using it to their own advantage. With total junk, technically.
What is the most dangerous algorithm?
In terms of wide-reaching impact, the stuff that’s happened with Facebook’s News Feed is really, really concerning. Fifteen years ago, let’s say, all of us were watching the same TV programs, were reading the same newspapers. The places we would get our news, and especially our politics, tended to be universal. And what that meant was that when you had a national conversation about an issue, everyone was coming to that conversation with the same information. But as soon as Facebook decided that they wanted to become purveyors of news, suddenly you have these highly personalized newsfeeds where everything is based on what your friends like, what you like, things that you’ve read in the past. And that’s become so finely cut up into tiny little chunks, that suddenly when you try and have a national conversation, people are missing each other. They’re talking about different things, even though they think they’re talking about the same thing. Even before all of this Cambridge Analytica stuff, which is a whole other level, I think there is a really serious implication on democracy and on politics. But it’s something that can happen without anybody ever being malicious or having ill intent. It’s just a totally unintended consequence of barging in somewhere without thinking through what the long-term implications of being in that space were.
What digital ecosystem do you personally live in? Apple, Google, Microsoft?
I tend to be more Apple because I think that they take privacy a bit more seriously. This is actually something that’s quite interesting, because it illustrates how people are not engaged by their own privacy. There’s a big difference between the way that Google deals with their photos and the way that Apple deals with their photos. Apple photos are your photos. You keep them. When they do facial and image recognition on them, they use meta-level features of your data. The way that they’re communicating is in the most private way that they possibly can while still collecting the data they need. Google, on the other hand, totally owns your pictures. They can use them to track with their facial recognition software, they can use them for their own experiments. So that is one of the reasons why I live slightly more in the Apple universe. Safari is also slightly more private in terms of the information that can be collected about you compared to Chrome. But I don’t actually think it’s really changed the way that people are choosing their products. I think that’s quite interesting, actually. We talk about privacy in a certain way: we don’t want to be creeped out, and we don’t want really personal stuff to come out and be used against us. But I think all of us in some way do feel quite comfortable just taking someone’s word for it. We feel that there’s anonymity in amongst all of the other people who are doing it, too.
Should we own our own data?
I think the most persuasive proposals that I’ve seen are the ones where you have data bankers, intermediaries in charge of sorting all of this out for you. So just in the same way that you would go to a high street bank to deposit your money, and they take care of it for you and they invest it wisely, and you see some kind of return on that investment, the data banker does that same sort of thing, but for data. It would be someone who operates on behalf of the consumer rather than on behalf of the company.
Right now other people are making lots of money on our data.
So much money. I think the one that stands out for me is a company called Palantir, founded by Peter Thiel in 2003. It’s actually one of Silicon Valley’s biggest success stories, and is worth more than Twitter. Most people have never heard of it because it’s all operating completely behind the scenes. This company and companies like it have databases that contain every possible thing you can ever imagine, on you, and who you are, and what you’re interested in. It’s got things like your declared sexuality as well as your true sexuality, things like whether you’ve had a miscarriage, whether you’ve had an abortion. Your feelings on guns, whether you’ve used drugs, like, all of these things are being packaged up, inferred, and sold on for huge profit.
Would you ever do 23andme?
Not if I could avoid it. I’d be curious, but I feel uncomfortable about my DNA being on a database. When you consent to giving up your DNA, you’re not consenting just for yourself. You’re also consenting for everyone in your family, and for all of your offspring and future generations of your offspring. You’re giving away something that can’t be changed, ever, and can’t be denied, ever. It’s the most personal piece of data that you have. I don’t think that whatever curiosity I might have about my origins is worth the potential downsides.
Should we try to make algorithms perfect?
We’re thinking about this the wrong way. We should stop thinking about how accurate can you make an algorithm, and how few outliers can you end up having. Instead we should start accepting that algorithms are never going to be perfect. Stop over-relying on them, and make it so that the very human habit of over-trusting machines is considered at every possible step of the process. It’s a design question. Algorithms need to be designed for redress when they inevitably make mistakes.
COMPAS software, which is used in courts, has forced us to quantify our biases. Is this a silver lining of algorithms?
These algorithms that are in courtrooms have massive problems with them that really need to be sorted out. But I also am broadly in favor of them for exactly that reason. If you have an algorithm that is deciding who gets bail and who doesn’t, and that has problems, at least you can tweak the algorithm and improve it. You can’t ask human judges to edge their decision-making toward a different kind of process, because honestly, humans can’t tell you accurately how they’re arriving at their decisions anyway. Humans are really sloppy and messy and irrational. We can’t make consistent decisions about coffee. Exactly as you say, sitting down and defining what’s important, and what we value as a society, is an incredibly beneficial thing.
What do you mean when you say that the best algorithms are the ones that take the human into account at every stage?
I think the best example of this is how IBM’s Watson won at Jeopardy! The really clever thing about the way that that machine was designed is that when it gave an answer, it didn’t just say, “Here’s what the answer is.” Instead, it gave the three top answers that it had considered, along with a confidence rating on each of those answers. Essentially, the machine was wearing its uncertainty proudly at all stages. That makes a massive difference. Say you’ve got a sat nav that says to you, “Okay, this is the route you’re going,” and you just follow that route blindly. You’re far more likely to end up in a situation where you get in some crazy pickle, than if the sat nav, as most of them do now, says, “Okay, here are three options, and the pros and cons of these three options, now you decide from that information that I’ve given you.”
How do driverless cars illustrate this point?
There are two ways that driverless cars are being built at the moment. One where the algorithm is in control of the driverless car, and does most of the driving, except in an emergency where a human is required to step in and take over. The problem with that is that it’s just not in our human skill set to do that. We’re not very good at paying attention, at having no warning and having to do something complicated. And yet those are things that actually a computer is really good at. A computer can monitor something in the background without any trouble whatsoever, step in when required, and perfectly execute whatever it’s been trained to do. That’s the other way that these driverless cars are being designed: You’re the one in control, but if something happens and it notices a crash is about to occur, it steps in and it takes over. That’s what I really mean about thinking about the human at every step of the process. It’s not just, “Oh, what would you like the algorithm to do?” Or, “How do you want it to work?” The algorithm fully acknowledges human flaws and biases, and our own human issues of over-trusting things and falling asleep at the wheel. It’s designed to not just exist on its own, but within and around our human failings.
Do we need to develop a brand-new intuition about how to interact with algorithms?
It’s not on us to change that as the users. It’s on the people who are designing the algorithms to make their algorithms fit into existing human intuition. I always think about when the iPhone was developed. The thing that was really remarkable about it was that it wasn’t just about the technology anymore, it was about the user. It was about how people really worked. Not just what they said they wanted, but how they really operated, right front and center at every stage of the design process. That’s what we need for algorithms. It’s not about releasing these really clunky Nokias with weird keyboards and stuff and expecting people to be able to learn how to use them better. It’s about starting from scratch and saying, “How do people work? What are our flaws as humans? And how do you build something that is aware of that?”