a review of Erez Aiden and Jean-Baptiste Michel, Uncharted: Big Data as a Lens on Human Culture (Riverhead Books, reprint edition, 2014)
by Benjamin Haber
On a recent visit to San Francisco, I found myself trying to purchase groceries when my credit card was declined. As the cashier is telling me this news, and before I really had time to feel any particular way about it, my leg vibrates. I’ve received a text: “Chase Fraud-Did you use card ending in 1234 for $100.40 at a grocery store on 07/01/2015? If YES reply 1, NO reply 2.” After replying “yes” (which was recognized even though I failed to follow instructions), I swiped my card again and was out the door with my food. Many have probably had a similar experience: most if not all credit card companies automatically track purchases for a variety of reasons, including fraud prevention, the tracking of illegal activity, and to offer tailored financial products and services. As I walked out of the store, for a moment, I felt the power of “big data,” how real-time consumer information can be read as be a predictor of a stolen card in less time than I had to consider why my card had been declined. It was a too rare moment of reflection on those networks of activity that modulate our life chances and capacities, mostly below and above our conscious awareness.
And then I remembered: didn’t I buy my plane ticket with the points from that very credit card? And in fact, hadn’t I used that card on multiple occasions in San Francisco for purchases not much less than the amount my groceries cost. While the near-instantaneous text provided reassurance before I could consciously recognize my anxiety, the automatic card decline was likely not a sophisticated real-time data-enabled prescience, but a rather blunt instrument, flagging the transaction on the basis of two data points: distance from home and amount of purchase. In fact, there is plenty of evidence to suggest that the gap between data collection and processing, between metadata and content and between current reality of data and its speculative future is still quite large. While Target’s pregnancy predicting algorithm was a journalistic sensation, the more mundane computational confusion that has Gmail constantly serving me advertisements for trade and business schools shows the striking gap between the possibilities of what is collected and the current landscape of computationally prodded behavior. The text from Chase, your Klout score, the vibration of your FitBit, or the probabilistic genetic information from 23 and me are all primarily affective investments in mobilizing a desire for data’s future promise. These companies and others are opening of new ground for discourse via affect, creating networked infrastructures for modulating the body and social life.
I was thinking about this while reading Uncharted: Big Data as a Lens on Human Culture, a love letter to the power and utility of algorithmic processing of the words in books. Though ostensibly about the Google Ngram Viewer, a neat if one-dimensional tool to visualize the word frequency of a portion of the books scanned by Google, Uncharted is also unquestionably involved in the mobilization of desire for quantification. Though about the academy rather than financialization, medicine, sports or any other field being “revolutionized” by big data, its breathless boosterism and obligatory cautions are emblematic of the emergent datafied spirit of capitalism, a celebratory “coming out” of the quantifying systems that constitute the emergent infrastructures of sociality.
While published fairly recently, in 2013, Uncharted already feels dated in its strangely muted engagement with the variety of serious objections to sprawling corporate and state run data systems in the post-Snowden, post-Target, post-Ashley Madison era (a list that will always be in need of update). There is still the dazzlement about the sheer magnificent size of this potential new suitor—“If you wrote out all five zettabytes that humans produce every year by hand, you would reach the core of the Milky Way” (11)—all the more impressive when explicitly compared to the dusty old technologies of ink and paper. Authors Erez Aiden and Jean-Baptiste Michel are floating in a world of “simple and beautiful” formulas (45), “strange, fascinating and addictive” methods (22), producing “intriguing, perplexing and even fun” conclusions (119) in their drive to colonize the “uncharted continent” (76) that is the English language. The almost erotic desire for this bounty is made more explicit in their tongue-in-cheek characterization of their meetings with Google employees as an “irresistible… mating dance” (22):
Scholars and scientists approach engineers, product managers, and even high-level executives about getting access to their companies’ data. Sometimes the initial conversation goes well. They go out for coffee. One thing leads to another, and a year later, a brand-new person enters the picture. Unfortunately this person is usually a lawyer. (22)
There is a lot to unpack in these metaphors, the recasting of academic dependence on data systems designed and controlled by corporate entities as a sexy new opportunity for scholars and scientists. There are important conversations to be had about these circulations of quantified desire; about who gets access to this kind of data, the ethics of working with companies who have an existential interest in profit and shareholder return and the cultural significance of wrapping business transactions in the language of heterosexual coupling. Here however I am mostly interested in the real allure that this passage and others speaks to, and the attendant fear that mostly whispers, at least in a book written by Harvard PhDs with Ted talks to give.
For most academics in the social sciences and the humanities “big data” is a term more likely to get caught in the throat than inspire butterflies in the stomach. While Aiden and Michel certainly acknowledge that old-fashion textual analysis (50) and theory (20) will have a place in this brave new world of charts and numbers, they provide a number of contrasts to suggest the relative poverty of even the most brilliant scholar in the face of big data. One hypothetical in particular, that is not directly answered but is strongly implied, spoke to my discipline specifically:
Consider the following question: Which would help you more if your quest was to learn about contemporary human society—unfettered access to a leading university’s department of sociology, packed with experts on how societies function, or unfettered access to Facebook, a company whose goal is to help mediate human social relationships online? (12)
The existential threat at the heart of this question was catalyzed for many people in Roger Burrows and Mike Savage’s 2007 “The Coming Crisis of Empirical Sociology,” an early canary singing the worry of what Nigel Thrift has called “knowing capitalism” (2005). Knowing capitalism speaks to the ways that capitalism has begun to take seriously the task of “thinking the everyday” (1) by embedding information technologies within “circuits of practice” (5). For Burrows and Savage these practices can and should be seen as a largely unrecognized world of sophisticated and profit-minded sociology that makes the quantitative tools of academics look like “a very poor instrument” in comparison (2007: 891).
Indeed, as Burrows and Savage note, the now ubiquitous social survey is a technology invented by social scientists, folks who were once seen as strikingly innovative methodologists (888). Despite ever more sophisticated statistical treatments however, the now over 40 year old social survey remains the heart of social scientific quantitative methodology in a radically changed context. And while declining response rates, a constraining nation-based framing and competition from privately-funded surveys have all decreased the efficacy of academic survey research (890), nothing has threatened the discipline like the embedded and “passive” collecting technologies that fuel big data. And with these methodological changes come profound epistemological ones: questions of how, when, why and what we know of the world. These methods are inspiring changing ideas of generalizability and new expectations around the temporality of research. Does it matter, for example, that studies have questioned the accuracy of the FitBit? The growing popularity of these devices suggests at the very least that sociologists should not count on empirical rigor to save them from irrelevance.
As academia reorganizes around the speculative potential of digital technologies, there is an increasing pile of capital available to those academics able to translate between the discourses of data capitalism and a variety of disciplinary traditions. And the lure of this capital is perhaps strongest in the humanities, whose scholars have been disproportionately affected by state economic retrenchment on education spending that has increasingly prioritized quantitative, instrumental, and skill-based majors. The increasing urgency in the humanities to use bigger and faster tools is reflected in the surprisingly minimal hand wringing over the politics of working with companies like Facebook, Twitter and Google. If there is trepidation in the N-grams project recounted in Uncharted, it is mostly coming from Google, whose lawyers and engineers have little incentive to bother themselves with the politically fraught, theory-driven, Institutional Review Board slow lane of academic production. The power imbalance of this courtship leaves those academics who decide to partner with these companies at the mercy of their epistemological priorities and, as Uncharted demonstrates, the cultural aesthetics of corporate tech.
This is a vision of the public humanities refracted through the language of public relations and the “measurable outcomes” culture of the American technology industry. Uncharted has taken to heart the power of (re)branding to change the valence of your work: Aiden and Michel would like you to call their big data inflected historical research “culturomics” (22). In addition to a hopeful attempt to coin a buzzy new work about the digital, culturomics linguistically brings the humanities closer to the supposed precision, determination and quantifiability of economics. And lest you think this multivalent bringing of culture to capital—or rather the renegotiation of “the relationship between commerce and the ivory tower” (8)—is unseemly, Aiden and Michel provide an origin story to show how futile this separation has been.
But the desire for written records has always accompanied economic activity, since transactions are meaningless unless you can clearly keep track of who owns what. As such, early human writing is dominated by wheeling and dealing: a menagerie of bets, chits, and contracts. Long before we had the writings of prophets, we had the writing of profits. (9)
And no doubt this is true: culture is always already bound up with economy. But the full-throated embrace of culturomics is not a vision of interrogating and reimagining the relationship between economic systems, culture and everyday life;  rather it signals the acceptance of the idea of culture as transactional business model. While Google has long imagined itself as a company with a social mission, they are a publicly held company who will be punished by investors if they neglect their bottom line of increasing the engagement of eyeballs on advertisements. The N-gram Viewer does not make Google money, but it perhaps increases public support for their larger book-scanning initiative, which Google clearly sees as a valuable enough project to invest many years of labor and millions of dollars to defend in court.
This vision of the humanities is transactionary in another way as well. While much of Uncharted is an attempt to demonstrate the profound, game-changing implications of the N-gram viewer, there is a distinctly small-questions, cocktail-party-conversation feel to this type of inquiry that seems ironically most useful in preparing ABD humanities and social science PhDs for jobs in the service industry than in training them for the future of academia. It might be more precise to say that the N-gram viewer is architecturally designed for small answers rather than small questions. All is resolved through linear projection, a winner and a loser or stasis. This is a vision of research where the precise nature of the mediation (what books have been excluded? what is the effect of treating all books as equally revealing of human culture? what about those humans whose voices have been systematically excluded from the written record?) is ignored, and where the actual analysis of books, and indeed the books themselves, are black-boxed from the researcher.
Uncharted speaks to perils of doing research under the cloud of existential erasure and to the failure of academics to lead with a different vision of the possibilities of quantification. Collaborating with the wealthy corporate titans of data collection requires an acceptance of these companies own existential mandate: make tons of money by monetizing a dizzying array of human activities while speculatively reimagining the future to attempt to maintain that cash flow. For Google, this is a vision where all activities, not just “googling” are collected and analyzed in a seamlessly updating centralized system. Cars, thermostats, video games, photos, businesses are integrated not for the public benefit but because of the power of scale to sell or rent or advertise products. Data is promised as a deterministic balm for the unknowability of life and Google’s participation in academic research gives them the credibility to be your corporate (sen.se) mother. What, might we imagine, are the speculative possibilities of networked data not beholden to shareholder value?
Benjamin Haber is a PhD candidate in Sociology at CUNY Graduate Center and a Digital Fellow at The Center for the Humanities. His current research is a cultural and material exploration of emergent infrastructures of corporeal data through a queer theoretical framework. He is organizing a conference called “Queer Circuits in Archival Times: Experimentation and Critique of Networked Data” to be held in New York City in May 2016.
 A project desperately needed in academia, where terms like “neoliberalism,” “biopolitics” and “late capitalism” more often than not are used briefly at end of a short section on implications rather than being given the critical attention and nuanced intentionality that they deserve.
Savage, Mike, and Roger Burrows. 2007. “The Coming Crisis of Empirical Sociology.” Sociology 41 (5): 885–99.
Thrift, Nigel. 2005. Knowing Capitalism. London: SAGE.