Data is not the new oil

Here’s a thought experiment. One day, I sit on the Tube platform and see a billboard advert for Uber. Uber has targeted that location for advertising because it knows Tube users might be inclined to demand its services. A few days later, I go home, and I see a targeted advert for Uber on Facebook which, hypothetically, has obtained data about my use of public transport in London.

There are obvious differences between these two cases. One is digital and the other isn’t. Facebook might be able to target me much more specifically and accurately, so the ad might cost more, potentially feeding into prices. The Facebook ad also implies a likely transaction, or process, that has allowed the data to be obtained.

But there is also a similarity. Both adverts rely on information about their target. It is just that the fact I use the Tube, in the case of the billboard, is not quantified. If it were, you might express it as some kind of dummy node: automatically mark “yes” in the spreadsheet column for “do they use the Tube?” But, as with most information that exists in the world, there is no spreadsheet. It is simply built into the fabric of things.

This is why data is not the new oil. It is not the new gold, or the new inflation-linked bond, or the new anything. It is a codification of things that are already there, mostly for the purposes of advertising or sales, and there is no obvious limit to it. Data is interpretatively manufactured, in the way writing is manufactured. It is also infinitely divisible. You could list as much data about one individual as you want. Achilles never catches up with the tortoise because there are infinitely many data points between each updated version of the Wikipedia article on Zeno’s paradox.

Despite all this, we are now faced with a series of peculiar ideas that draw heavily on misleading uses of the term data. They call for the monetisation of data, stating that it is valuable, and that customers should be compensated for providing it. These ideas presuppose that data is some kind of commodity, and even the refutations of these positions engage with inherent differences between, say, data, which can be reused, and oil, which can’t. But the conversation doesn’t even need to reach this point.

If user data has a non-zero value, then it is liable to almost immediate hyperinflation, given we could all go on printing data about ourselves endlessly (in fact, we could print data about a single transaction endlessly). There must therefore be constraints on the notion of data. But there are hidden values for advertisers and salespeople in all kinds of behavioural practices and rituals. There is no obvious limit either to these values or to the unpredictable combinations in which they emerge; in fact, the entire point is that they continuously emerge. Only a small portion of them emerge on the internet.

The focus on data-as-value exposes an unnecessary dichotomy between the digital and the non-digital. Politicians might demand that Facebook pay me something for data about my preference for Mars bars, expressed via the performative labour of liking the Mars Bars Appreciation Page. So, when a consumer goes into a corner shop, and the Mars bars are placed four feet above the ground next to the till, right in the line of sight of their infant child, thereby co-opting her otherwise unmatchable lobbying skills into the purchase, what else are they owed? That many children are four feet high, and that they have a great ability to influence their parents, are both pieces of information about the world. They are not codified data in a spreadsheet, but it would be strange to claim that they therefore don’t count, as part of this discussion, when the Facebook like does.

By focusing on data, the discussions skirt along the edges of the real story, which in turn explains why non-digital instances of weaponised information have not attracted political opprobrium. Social media companies, along with Google, now make up an oligopoly in the digital advertising industry (though not in the advertising industry in general). It is true that their use of data is a good explanation for their dominance. The use of data is not new, but its particular rapid accumulation on the internet, via search and socialising, is. Data produced by the internet, however, is still an infinitesimal proportion of relevant information that could be codified, such as how tall children are, or whether the billboard is placed in the Tube. Nor is it easily categorised.

This is where the data sleight of hand enters the conversation. The great data controversies are not over whether Asos trainers are too expensive compared to Sports Direct, or how people should be remunerated for the fact they typed Asos into Google and then saw an ad which led to the purchase. They are manifestations of political controversies in recent elections. Not coincidentally, the new advertising oligopoly has weakened sectors which also rely on advertising: the print and broadcast media. Those sectors exercise influence over political power, and their weakness has come alongside a destabilisation of the political systems of many countries.

Most discussion of data is a coded call to weaken this new, weakly regulated publishing power, which relies on the internet-fuelled dominance of the advertising industry, and its spillover effects on politics. Within a historical context, the tensions are evocative of any number of publishing revolutions. The connections with actual data are truisms, in the same way that the Bible was translated into the vernacular and then transported (via the data on maps) to regions (data) with high numbers (data) of native speakers (data) of the vernacular (a kind of collectively known data).

So, I search for TfL five times a week on Google, and artificial intelligence deduces that I have, on aggregate, a 30 per cent higher susceptibility to Uber ads. How does this compare to advertising on the Tube itself? It’s hard to say. The former, being data of the codified variety, is naturally amenable to analysis. The latter is a different proposition. It involves a judgment about what someone sitting on a Tube platform might be thinking when they look at an advert. If you wanted to, you could insist that judgment has also compressed thousands of data points; it just didn’t require a spreadsheet.

Copyright The Financial Times Limited 2019. All rights reserved. You may share using our article tools. Please don't cut articles from and redistribute by email or post to the web.
