Training data that is meant to make predictive policing less biased is still racist
In their defense, many developers of predictive policing tools say that they have started using victim reports to get a more accurate picture of crime rates in different neighborhoods. In theory, victim reports should be less biased because they aren’t affected by police prejudice or feedback loops.
But Nil-Jana Akpinar and Alexandra Chouldechova at Carnegie Mellon University show that the view provided by victim reports is also skewed. The pair built their own predictive algorithm using the same model found in several popular tools, including PredPol, the most widely used system in the US. They trained the model on victim report data for Bogotá, Colombia, one of very few cities for which independent crime reporting data is available at a district-by-district level.
When they compared their tool’s predictions against actual crime data for each district, they found that it made significant errors. For example, in a district where few crimes were reported, the tool predicted around 20% of the actual hot spots—locations with a high rate of crime. On the other hand, in a district with a high number of reports, the tool predicted 20% more hot spots than there really were.
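The kind of comparison the researchers describe can be pictured roughly as follows: rank locations by crime count, call the top k "hot spots," and measure how many of the true hot spots a report-based ranking recovers. This is a minimal sketch with entirely made-up counts and hypothetical location names, not the study's data or methodology.

```python
# Hypothetical per-location crime counts (made-up numbers for illustration).
actual = {"a": 30, "b": 28, "c": 26, "d": 24, "e": 22,
          "f": 10, "g": 9, "h": 8, "i": 7, "j": 6}

# Victim reports are skewed: under-reporting suppresses the counts for
# the true hot spots, while other locations are over-represented.
reported = {"a": 3, "b": 2, "c": 4, "d": 1, "e": 20,
            "f": 25, "g": 24, "h": 23, "i": 22, "j": 5}

def top_k(counts, k):
    """Return the k locations with the highest counts."""
    return set(sorted(counts, key=counts.get, reverse=True)[:k])

true_hot = top_k(actual, 5)     # hot spots by actual crime
pred_hot = top_k(reported, 5)   # hot spots a report-trained tool would flag

# Fraction of true hot spots the report-based ranking recovers.
recovered = len(true_hot & pred_hot) / len(true_hot)  # 1 of 5 -> 0.2
```

With these invented numbers the report-based ranking recovers only one of the five true hot spots (20%), mirroring the shape of the error the study describes for under-reporting districts.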
For Rashida Richardson, a lawyer and researcher who studies algorithmic bias at the AI Now Institute in New York, these results reinforce existing work that highlights problems with data sets used in predictive policing. “They lead to biased outcomes that do not improve public safety,” she says. “I think many predictive policing vendors like PredPol fundamentally do not understand how structural and social conditions bias or skew many forms of crime data.”
So why did the algorithm get it so wrong? The problem with victim reports is that Black people are more likely to be reported for a crime than white people. Richer white people are more likely to report a poorer Black person than the other way around, and Black people are also more likely to report other Black people. As with arrest data, this leads to Black neighborhoods being flagged as crime hot spots more often than they should be.
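The mechanism can be made concrete with a toy simulation: give two neighborhoods the same true crime rate but different victim-reporting rates, and a tool trained on reports alone will see one as far more crime-ridden than the other. The neighborhoods, crime rate, and reporting rates below are all assumed numbers for illustration, not figures from the study.

```python
import random

random.seed(0)

# Assumption for illustration: both neighborhoods have the SAME true
# crime rate, but crimes in "B" are reported less often (e.g. because
# of distrust of police).
TRUE_CRIMES_PER_WEEK = 10
REPORTING_RATE = {"A": 0.9, "B": 0.4}

def simulate_reports(weeks=52):
    """Count victim reports per neighborhood over a year."""
    reports = {"A": 0, "B": 0}
    for _ in range(weeks):
        for hood, rate in REPORTING_RATE.items():
            for _ in range(TRUE_CRIMES_PER_WEEK):
                if random.random() < rate:  # crime is reported
                    reports[hood] += 1
    return reports

reports = simulate_reports()
# A predictor trained only on reports "sees" roughly twice as much
# crime in A, even though true crime is identical in both places.
```

The same logic runs in both directions: differential reporting inflates apparent crime where reporting is high and hides it where reporting is low, which is exactly the distortion described above.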
Other factors distort the picture too. “Victim reporting is also related to community trust or distrust of police,” says Richardson. “So if you are in a community with a historically corrupt or notoriously racially biased police department, that will affect how and whether people report crime.” In this case, a predictive tool might underestimate the level of crime in an area, so it will not get the policing it needs.