Beating Back the Creeping Tyranny of Big Data
The data scientists writing the algorithms that drive giants like Alphabet Inc. (Google) and Facebook Inc. are today’s technology wizards, and companies and governments increasingly use their creations — often in secret and with little oversight — to do everything from hiring and firing employees to identifying likely suspects for police monitoring. But there’s a dark side — and computer scientists warn that we’ll need a lot more transparency if the big-data revolution is really to work for all of us.
In her recent book “Weapons of Math Destruction,” mathematician Cathy O’Neil tells the story of Sarah Wysocki, a teacher fired from her job at MacFarland Middle School in Washington, after a computer algorithm churning through numbers on student performance judged her to be a poor teacher. Both students and parents consistently ranked Wysocki as an excellent teacher, yet she couldn’t fairly challenge the decision because the company that developed the algorithm claimed a right to proprietary secrecy. Her firing stood despite near certainty that the algorithm, with the limited data it analyzed, couldn’t have reached any statistically meaningful conclusion.
Wysocki was soon hired by a better-funded school system that relied on people to make decisions. Many others haven’t been so fortunate. O’Neil’s book presents an alarming picture of a race to profit from the explosion of data on human behavior, often taking place with little concern for basic norms of fairness. Companies sometimes deny credit to individuals if they’ve shopped in stores frequented by others with poor credit histories. Automated analysis of data is widely used to make decisions on university admissions, on hiring, even policing strategy, and the practice, as O’Neil shows, often reinforces racial discrimination, despite its seeming objectivity.
What can be done? That’s far from clear, but computer scientists increasingly recognize the seriousness of the problem. In a new paper, computer scientist Bruno Lepri and colleagues explore some key ideas on how technology itself might help.
A first step is to find ways to control how data can be used. Researchers at the Massachusetts Institute of Technology Media Lab are developing a cloud-computing platform called Enigma to let individuals share their data while controlling how it can be used. Suppose an insurance company wants to use people’s mobile phone data to assess risks more accurately, which could reduce client premiums. Both company and clients could benefit, but if individuals just hand over their data, the company might sell it or use it for other purposes. The system being developed, based on bitcoin technology, would let individuals retain ownership of their data, follow how it is used and instantly opt out.
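To make the idea of following how data is used and opting out more concrete, here is a minimal Python sketch. It is not Enigma's actual design: the `DataLedger` class, its method names and the company name "InsureCo" are hypothetical, and the only thing borrowed from bitcoin-style systems is the hash chain, in which each log entry stores the hash of the one before it so that later tampering can be detected.

```python
import hashlib
import json


class DataLedger:
    """Toy append-only usage log for one person's data (illustrative only)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self, owner):
        self.owner = owner
        self.entries = []
        self.opted_out = False

    @staticmethod
    def _digest(event, prev_hash):
        # Hash the event together with the previous entry's hash.
        payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def _append(self, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"event": event, "prev": prev_hash, "hash": self._digest(event, prev_hash)}
        )

    def request_use(self, company, purpose):
        """Record a request to use the data; it is refused after an opt-out."""
        allowed = not self.opted_out
        self._append(
            {"type": "use", "company": company, "purpose": purpose, "allowed": allowed}
        )
        return allowed

    def opt_out(self):
        """The owner withdraws consent; later requests are logged but refused."""
        self.opted_out = True
        self._append({"type": "opt_out"})

    def verify(self):
        """Return True if no logged entry has been altered after the fact."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(entry["event"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True
```

In this sketch the owner can read `entries` to see every request made against the data, and a single call to `opt_out()` causes later requests to be refused. A real system would also need to enforce the refusal on the company's side, which a log alone cannot do.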
Another need is to make more widely available the vast quantities of data held privately by companies such as Amazon.com Inc. or Facebook. The Open Algorithms project, a collaborative effort among a number of telecommunication companies and universities, aims to make private data open to broader use, but in a safe way. The idea is to share computing code rather than data. Government agencies, scientists or other public-policy groups would be able to run their algorithms on the servers of partner companies, using the data those companies have collected, but for scientific purposes or to evaluate policies on anything from economics to public health. If people’s behavior creates the data, everyone should be able to benefit from it, not only individual businesses.
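The principle of sending code to the data, rather than data to the researcher, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Open Algorithms project's real interface: the record fields, the `VETTED_ALGORITHMS` allow-list and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical raw records that stay on the data holder's server.
_PRIVATE_RECORDS = [
    {"region": "north", "daily_calls": 12},
    {"region": "north", "daily_calls": 8},
    {"region": "south", "daily_calls": 20},
    {"region": "south", "daily_calls": 16},
]


def average_calls_by_region(records):
    """A vetted algorithm: returns only per-region averages, never raw rows."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["region"], []).append(record["daily_calls"])
    return {region: mean(values) for region, values in grouped.items()}


# Only algorithms reviewed in advance may run against the private data.
VETTED_ALGORITHMS = {"average_calls_by_region": average_calls_by_region}


def run_on_private_data(algorithm_name):
    """Run a vetted algorithm next to the data and return its aggregate result."""
    if algorithm_name not in VETTED_ALGORITHMS:
        raise PermissionError(f"algorithm {algorithm_name!r} has not been vetted")
    return VETTED_ALGORITHMS[algorithm_name](_PRIVATE_RECORDS)
```

An outside researcher would call `run_on_private_data("average_calls_by_region")` and receive only the regional averages. A request naming an unvetted algorithm is rejected, so arbitrary code never touches the individual records.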
A final problem is to rebalance power between companies and individuals. As algorithms become more complex and powerful, the patterns they detect in data may not be understood by anyone, including program developers, leaving individuals no way to challenge decisions made on their basis. The Defense Advanced Research Projects Agency, the U.S. agency largely responsible for the creation of the internet, has an important project aimed at making artificial intelligence more understandable. If an algorithm makes some decision (say, assigning more police to a particular neighborhood), it should also generate results that explain how it made that choice and make the uncertainties clear. Authorities could then use the algorithm more intelligently.
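As a rough illustration of a decision that carries its own explanation and uncertainty, consider the Python sketch below. It is not drawn from the DARPA project. It assumes a deliberately simple rule (compare an observed rate with a threshold) and a textbook normal-approximation 95 percent interval. The function name, its parameters and the three decision labels are hypothetical.

```python
import math


def decide_with_explanation(events, observations, threshold, z=1.96):
    """Compare an observed rate with a threshold and report the uncertainty.

    Uses the normal approximation for a proportion, which is itself an
    assumption and is unreliable for very small samples.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")

    rate = events / observations
    margin = z * math.sqrt(rate * (1 - rate) / observations)
    low, high = max(0.0, rate - margin), min(1.0, rate + margin)

    # Only commit to a decision when the whole interval is on one side.
    if low > threshold:
        decision = "act"
    elif high < threshold:
        decision = "do not act"
    else:
        decision = "inconclusive"

    explanation = (
        f"Observed rate {rate:.3f} from {observations} observations; "
        f"approximate 95% interval {low:.3f} to {high:.3f}; "
        f"threshold {threshold:.3f}."
    )
    return {
        "decision": decision,
        "estimated_rate": rate,
        "interval": (low, high),
        "explanation": explanation,
    }
```

With 6 events in 20 observations and a threshold of 0.2, the interval straddles the threshold and the sketch returns "inconclusive". With the same rate measured over 2,000 observations, the interval sits above the threshold and it returns "act". The same observed rate can justify a decision or not, depending on how much data lies behind it.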
These are only a few of the encouraging initiatives beginning to emerge. There is obviously huge potential for the big-data revolution to achieve good things. But technology can be used for good or bad; there’s always a moral side to its use. We need to push for more technology that protects privacy, and makes data work for individuals as well as private companies.