The Data Science Movement: Reflections for Policymakers

By Anneliese Luck

The rapid growth of information technologies has changed the structure of our world, shifting the landscape of communication, behavior, identity, and decision-making. This change precipitated an explosion in the field of data science, along with an overwhelming tangle of technical buzzwords (“big data,” “AI,” “machine learning”), each term more promising and nebulous than the last.

UC Berkeley has been an institutional leader in this data science movement. When the undergraduate Data Science major was introduced last fall, 780 students filed pre-declarations, quickly putting it on track to become one of the most popular majors on campus. Now the movement has found its way to the Goldman School of Public Policy, where applications for the Graduate Certificate in Applied Data Science opened this past September.

Beyond the boundaries of campus, the relentless momentum of this movement is evident. It has become increasingly difficult to navigate the professional world without coming into contact with a data scientist, a role that Harvard Business Review called “the sexiest job of the 21st Century” in 2012.

The policy world is no exception. Over the past few decades, a wealth of research has focused on how data can reduce uncertainty in decision-making and shape more effective policies. This has sparked conversation around whether data science is the “next frontier” of evidence-based policymaking. In 2015, President Barack Obama appointed DJ Patil, the former Head of Products and Chief Scientist at LinkedIn, as the first-ever Chief Data Scientist of the United States Office of Science and Technology Policy. Since then, we have seen new data science techniques expand to the states, appearing across the landscape of California public policy, from predictive policing in Los Angeles to traffic reduction in San Francisco.

But in the field of public policy, the technical is intrinsically linked to the social and ethical. As responsible policymakers and analysts, we must thoroughly evaluate the question: if the rapid explosion of data technology has already transformed the structure of our world, how will it impact public policy?

We can approach the question through the perspective of allocative and representational harms, borrowing from the algorithmic impact assessment framework for public agencies presented by Reisman et al. (2018). Allocative harms are economically oriented: they concern the ways in which data analysis can affect the distribution of resources to a person or group. Representational harms describe the ways in which data analysis can shape cultural and societal identities.

It is unsurprising that allocative effects have historically been prioritized in analysis, since these are more immediate and more easily quantifiable (e.g., loans, jail time, health services). The promises of data science applications in policy are rooted in the goal of improving allocation: efficiently and effectively distributing limited resources to the populations with the highest need. Machine learning models employed in housing, education, and environmental justice come with promises to better predict tenants at risk of harassment, students at risk of falling behind, or city infrastructure at risk of endangering local economic development, so that effective interventions can be tailored proactively.

However, these same identification and machine learning techniques present heightened risks of allocative harm. Inherently racist algorithms used in criminal justice sentencing and facial recognition technology used in identifying undocumented immigrants are illustrative examples of the ways that data science can serve as a tool that reproduces structural inequalities already present in our systems.

Additionally, the representational effects of these technologies have largely remained in the shadows. These harms, which concern the ways that data science applications can encroach on people’s sense of self, are often neglected in traditional analyses because they tend to be long-term and difficult to formalize. While one can imagine these technologies’ empowering potential, more often, as Crawford points out, they reveal tendencies to “reinforce the subordination of some groups along the lines of identity,” from racist labeling in Google Photos to the erasure of indigenous identities and labor behind machine learning algorithms.

In “Catching Our Breath: Critical Race STS and the Carceral Imagination,” Ruha Benjamin illustrates how these representational harms operate in the shadows. She examines the practice of medical hot spotting, in which data science techniques are used to identify medically vulnerable populations so as to target them with at-home care and lower health care costs. At the same time, the act of identifying these populations as vulnerable only serves to “reproduce the very forms of classification stigma that restricts life chances in the first place,” highlighting the representational harms that form the foundation of these allocative systems.

In a society where technological innovation and unintended consequences often come hand in hand, we are quick to trumpet human agency when benefits are realized and to disclaim responsibility when harms are felt. Despite the blind faith that we place in the ‘neutrality’ of numbers, data science is not as objective as it seems. At every step of the data life cycle, from collection to classification to processing to communication, a series of microdecisions is made that crafts the meaning and truth behind the data.

There seems to be a mismatch between the authority, legitimacy, and power we confer on the ‘inherent objectivity’ of data and the subjective process behind creating it. When these processes have discriminatory outcomes, this mismatch of perceived objectivity and concealed bias, which Benjamin calls the New Jim Code, will serve to enable “social containment while appearing fairer than discriminatory practices of a previous era.”

While data can be a useful and powerful tool for shaping policy and progress, it also concretely and tangibly affects people’s lives and, consequently, reshapes social identities and structures. In a world that increasingly thinks of people as data, there is a pressing need and responsibility, particularly for policymakers, to remember the humans behind these numbers.

Anneliese Luck is a Master of Public Policy candidate at the Goldman School of Public Policy and a Graduate Student Instructor of Human Contexts and the Ethics of Data.

The views expressed in this article do not necessarily represent those of the Berkeley Public Policy Journal, the Goldman School of Public Policy, or UC Berkeley.