I highly recommend this Guardian article featuring MIT Professor Catherine D’Ignazio, co-author of the book “Data Feminism,” on the bias built into data. D’Ignazio also directs MIT’s new Data and Feminism Lab, which seeks to use data and computation to counter oppression.
She describes the essence of the problem:
We find some people are winning and some people are losing. The benefits and harms are not being equally distributed. And those who are losing are disproportionately women, people of colour, and other marginalised groups. One way they are losing is that data most of us would think is important isn’t being collected. We have detailed datasets on things like the length of guinea pig teeth and items lost on the New York City subway. But, in the US, missing data includes maternal mortality data, which has only started being collected recently, and sexual harassment data. And so much of our health and medical understanding is based on research that has been done exclusively on the male body.
She also discusses what she calls “Big Dick Data,” which is all too common:
We coined it to denote big data projects that are characterised by masculine fantasies of world domination. Big Dick Data projects fetishise large size and prioritise it, along with speed, over quality, ignore context and inflate their technical capabilities. They also tend to have little consideration for inequalities or inclusion in the process. Mark Zuckerberg aiming to supersede human senses with AI might be considered one such project, along with software company Palantir’s claims about massive-scale datasets. Big Dick Data projects aren’t necessarily wholly invalid, but they suck up resources that could be given to smaller, more inclusive projects.