Data Feminism (2020) by Catherine D’Ignazio and Lauren F. Klein offers an important perspective on data and power. Those in power use data, intentionally or unintentionally, to maintain the status quo and oppress marginalized people. Yet data is a double-edged sword: those without power can also use it to challenge the status quo.
While inspired by feminist theory, the book is not limited to discussions of women. It is inclusive and covers marginalized people who are discriminated against by the use of data, regardless of gender, sex, age, race, class, ability, religion, or geography.
People in power may not realize that they discriminate against others, but those who are marginalized are constantly aware of the unbalanced relationship. The authors open with the story of Christine Mann Darden, a Black woman scientist at NASA’s Langley Research Center in the 1960s and 1970s. She was the technical expert who analyzed the physics of rocket reentry that allowed Apollo 11 to return successfully.
At the time, she was classified, along with many other women mathematicians, as a “computer” (a person who performed calculations) rather than an engineer. Identically qualified male mathematicians were generally classified as engineers, a position with greater promotion potential. Working with the Equal Opportunity Office and using data, Darden was able to demonstrate the discrimination against women. Her white male supervisor was “shocked” when presented with the evidence, and she was quickly promoted. Her use of data to challenge discrimination succeeded, and she eventually became a senior NASA leader.
Data Feminism is filled with rich, informative stories about power and data that challenge a status quo dominated by white men in positions of privilege. The discussions are careful and nuanced, observing that well-meaning actions may have negative consequences, and that non-discriminatory choices can be complicated and difficult.
The authors organize the book around the seven principles of Data Feminism:
- Examine power
- Challenge power
- Elevate emotion and embodiment
- Rethink binaries and hierarchies
- Embrace pluralism
- Consider context
- Make labor visible
There is simply too much content to adequately describe in a short review, so we will focus on the themes of the use of open data, counting and classification, missing data, the use of training data in artificial intelligence, and the shortfalls of data for good.
While open data has the noble goal of making data widely accessible, it often lacks the context necessary to make it usable.
As the authors warn: “Until we invest as much in providing (and maintaining) context as we do in publishing data, we will end up with public information resources that are subpar at best and dangerous at worst.”
A lack of adequate context resulted in an embarrassing retraction of a FiveThirtyEight news article on the number of Boko Haram kidnappings in Nigeria. The error stemmed from the poorly documented Global Database of Events, Language, and Tone (GDELT) project and the reporter’s misunderstanding of the data’s context: the dataset did not clearly distinguish between individual events and multiple reports of a single event.
The authors assert that the concept of “raw data” is an oxymoron: all data reflects a number of social decisions and the limitations of the data collection resources. Nowhere is this more evident than in the challenge of establishing categories. The authors highlight the complexity of categorizing gender, explaining how Facebook has at different times used a binary system (male, female), a system with over 50 choices, and a system that left the field blank and let users self-identify. Despite offering these choices, research has shown that internally Facebook still uses the male-female binary. The authors also discuss similar issues with the Census Bureau’s handling of race, which has been fluid throughout its history.
Missing data is another hindrance to recognizing the issues of marginalized people. The authors use the term “missing data” to apply to whole classes of data that are not available, not just to missing observations in an existing dataset. Their examples include the deaths of Black children struck by white commuters’ cars in Detroit and the femicides (gender-related killings) of women and girls in Mexico. In both cases, the data needed to assess the issue was simply not available. In today’s data-rich society, one would expect data to be available on almost any topic; it is not. Such datasets have often become available only through the initiatives of individuals.
The shocking case of Joy Buolamwini, a Ghanaian-American graduate student at MIT, is used as an example of discrimination in data science. Buolamwini discovered that facial-recognition software was not able to identify her face, while it was able to recognize her lighter-skinned colleagues. Only when she put on a white mask was the system able to detect her. The problem was that the software had been trained primarily on images of white, male faces, and so had no basis for recognizing a dark-skinned woman. As a result of her research, more recent efforts, such as IBM’s Diversity in Faces dataset, have made an effort to be more inclusive.
Finally, the authors comment on the limitations of “data for good” efforts, especially those which do not include marginalized people in their management and direction. These well-intentioned, often paternalistic efforts may actually cause additional harm to marginalized people by further stigmatizing them, rather than recognizing their strengths. The authors propose the concept of co-liberation, where the marginalized and others work together, building authentic, long-term relationships where both learn and take accountability for their efforts.
With its focus on data, power structures, and oppression, Data Feminism is a valuable contribution to data science. This review has only scratched the surface of issues related to data and discrimination. If you are not oppressed or marginalized, reading the book will make you aware of these issues. If you are oppressed or marginalized, you will recognize familiar issues and see a broad range of potential solutions.
Data Feminism. The open-access book, available online from MIT Press at no cost.
Data Feminism. The printed book, available from Amazon.
Catherine D’Ignazio. Director of the Data + Feminism Lab and an Assistant Professor of Urban Science and Planning at MIT’s Department of Urban Studies and Planning.
Lauren F. Klein. Associate Professor in the Departments of English and Quantitative Theory and Methods at Emory University, and Director of the Digital Humanities Lab.
Race After Technology: Abolitionist Tools for the New Jim Code, by Ruha Benjamin
Algorithms of Oppression: How Search Engines Reinforce Racism, by Safiya Umoja Noble
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, by Virginia Eubanks