I’m not sure which was bigger: the size of my eyes or the decibel reading of my eight-year-old squeal.
It shimmered in the distance. The oasis in the desert. We were almost there.
My grandfather had hardly pulled off the road when I tumbled across the back bench seat of his hulking 1980-something GMC crew cab, threw the heavy door open, and ran through the settling dust cloud to stand rapturously in the shadow of the oasis.
I was here, standing amongst history. Basking in the connection to lives lived before me. And it was even more glorious than it seemed in Pee-wee’s Big Adventure.
The Cabazon Dinosaurs.
After getting my fill of these stupendous works of art (remember: eight years old), the agenda called for clambering back into the truck and continuing on to the Mojave Desert. The purpose of this journey (besides giving a little kid the thrill of a lifetime) lay in the hobby of my grandfather and my father: fossil hunting. I criss-crossed the southwestern deserts during my early-childhood winter and spring breaks from school. Sometimes on foot, sometimes on the back of a quad, but always combing the ground for signs of life. Past life. Thousands, millions, bazillions (at least to my childhood grasp of time) of years past life, but life nonetheless.
During a winter afternoon, when the shadows were long and the crisp air heavy with sage, I found a section of jaw bone fossilized in rock. I did not know what I had in my hand until my grandfather explained it to me. But I stood there turning it over, looking at it, and wondering about all that had walked those sandy flats before me.
From fossils to datasets
My amazement with lives lived before me has not waned (nor has my appreciation of the Cabazon Dinosaurs, though that has devolved into a highly campy love). But it has changed.
Instead of marveling at the fossilized evidence of life, I now get lost in the photos, diaries, and records of life. I wonder what these people were doing, how their lives were, and find comfort in how little we have changed. As I am wont to say, people been peoplin’ for a looooooong time.
But these things I now get lost in, the extant records of events and lived experiences, raise enormous questions. Why were these records kept? Why were they made? What was not kept? Who made them? You know, the usual questions that a good, healthily skeptical student is trained to ask.
In a data-driven world, these questions are not just the “usual ones” that should be asked; they are incredibly urgent issues that need to be addressed. Issues of bias and archival annihilation are compounded exponentially by the volume of data, the speed of data, the shrinking of time scales (those data points don’t need to ossify, we just need to use an API!), and the mechanized biases that modes of data creation and curation are rife with.
Data, those numerical, empirical, and unsullied little nuggets of value-free objectivity, are anything but.
Catherine D’Ignazio and Lauren Klein explain the cascade effects of bias on the data that now make the world go round. In “The Numbers Don’t Speak for Themselves”, a chapter in their book Data Feminism, the authors dismantle the Silicon Valley “tech-bro” belief “that the age of Big Data will soon permit data scientists to do analysis at the scale of the population.” Theory could finally be cast aside because the wealth of data points would mean that analysis could be fool-proof and statistical correlations, rather than interpretation, would rule because numbers were unbiased, unfeeling, and untainted. Data can truly usher in the “end of history”!
This is dismantled by context. Numbers do not and cannot take context into account. People who use datasets compiled by others do not fully know the context of their curation (unless the curators abide by the code of ethics outlined in the Santa Barbara Statement on Collections as Data). D’Ignazio and Klein back up their arguments with convincing evidence and contextualized data. Reading them immediately brought to life, in a more sophisticated form, something my community college stats teacher once illustrated: how distributions can be wildly manipulated using the same numbers and data sets.
The combined dismissal of (or lack of care for) the inherent importance of context and the championing of data is grounded in what D’Ignazio and Klein call “Big Dick Data”. This academic term coined by the authors is pretty self-explanatory (regardless of identification, I’m sure everyone can think of someone behaving in such a cocksure way that you want to say, “We get it! You’re the biggest, the best! Whoopty-freaking-do!”). But they do elaborate that it is “big data projects that have masculinist, totalizing fantasies of world domination through data capture and analysis” by “ignor[ing] context, fetishiz[ing] size, and overstat[ing] and inflat[ing] their technical and scientific capabilities.”
No arguments here.
But I do feel that this catchy term eclipses some of the more subtle issues of data presented by D’Ignazio and Klein. Racism, sexism, homophobia, and many other biases that can exacerbate existing inequalities and further feed the perpetuation of hateful stereotypes can come out of even the most well-meaning data-centric work.
Daniel Greene’s The Promise of Access comes to mind. Greene conducts an ethnographic study of Washington, DC-area public libraries and charter schools that have embraced the supremacy of numbers and thrown in their lot entirely with modeling themselves on Big Data companies. This results in reaffirmations of neoliberalism that continue to harm the already vulnerable, widen the gulf between the tech “haves” and the “have nots”, value hitting metrics over learning outcomes, and in the end do not accomplish much other than remaking the world to look like an Apple Store.
They say the road to hell is paved with good intentions, and a belief that data is the stairway to heaven can turn that road into a super-highway.
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner’s exposé on the recidivism and “risk assessment” metrics used to assess the “risk” a person presents when they enter the judicial system is an especially egregious example of this. Black Americans who entered the justice system were rated as presenting an extremely high risk of recidivism, while white Americans were overwhelmingly rated much lower. It did not matter if a Black American only had a misdemeanor and a white American had multiple counts of robbery and assault; the numbers lied and judicial system racism was reinforced.
That data assessments like this were even attempted in such an incredibly racially and gender-biased system is shocking. What did people expect? But D’Ignazio and Klein’s scream of “CONTEXT MATTERS, NUMBERS LIE!” into the hallowed halls of tech campuses is extremely necessary. There are still many, many (often wealthy) people who march to the drum of “savior data” and expect the world to dance to the beat as well.
But there are people who are championing the humanistic use of data, those who use it as a medium for creation and nuanced inquiry. In The Terrorist Album, Jacob Dlamini combines data from the South African Truth and Reconciliation Commission and existing South African Police records from the formal apartheid era to contextualize the jail photos of South Africans who were imprisoned as terrorists. His use of data is heavily contextualized and used to complicate history and human existence, not to provide an absolute answer. Dlamini is not trying to say that numbers don’t lie, but rather that the apartheid police state did, and the biased data their actions created supports this.
An even more data-intensive and tech-forward example is Kate Bagnall and Tim Sherratt’s The Real Face of White Australia. This project breaks the early 20th-century claim that Australia was for white Australians, that it was a white country! When they looked at the records, this claim proved patently untrue. Extensive photos, passports, and citizenship paperwork existed that proved the country’s true mosaic nature and the oppression people faced for being living proof that Australia was not a white country. Like Dlamini, Bagnall and Sherratt turned to government data and used it to reveal how flawed it is.
But Bagnall and Sherratt took it a step further. The photos, passports, immigration papers, census data, and other information from their research have been fully digitized. On the site, the data sets (complete with contextualization of where the data came from and statements about their methods for interpreting it) are available. This is a hopeful example that even though data will not be our collective Superman, it can be our collective Lois Lane. With context, research, and healthy humanistic skepticism, it can help us better understand events and lives lived before us.
When data is contextualized correctly, it can facilitate the same wonder I felt as a kid in the southwestern desert. But instead of imagining what creature stepped on the same sand flats as me, I can look into the eyes of someone who navigated this world not so long before me.