Two new Constellation Research reports of mine cover data protection, looking at the heightened need for privacy in the face of artificial intelligence and Big Data, and the sorts of systemic infrastructure needed to safeguard data supply in future.

Big Data and AI are infamously providing corporations and governments with the means to know us "better than we know ourselves". Businesses no longer need to survey their customers to work out their product preferences, lifestyles, or even their state of health; instead, data analytics and machine learning algorithms, fueled by vast amounts of the "digital exhaust" we leave behind wherever we go online, are uncovering ever deeper insights about us. Businesses get to know us now automatically, without ever asking explicit questions.

What are the privacy implications? The good news for consumers and privacy advocates is that general data protection and privacy laws are technology neutral; they extend essentially the same protections to Personal Data that is automatically generated as they do to data collected manually. Long-established privacy laws have been applied to curb the excesses of digital companies on the cutting edge of data processing. My report Big Privacy Rises to the Challenges of Big Data and AI examines the strengths and weaknesses of classical privacy laws.

The future of the digital economy depends on reasonable and equitable use of data as a resource. The early years of the Internet Age have seen significant exploitation of individuals by digital entrepreneurs, and stark imbalances in the riches that can be made from mining and refining information. But privacy laws are being reinforced in Europe (with the EU's General Data Protection Regulation, GDPR) and extended to places like California. This is surely a sign of the law-and-order to come.

The early oil rush is instructive for how the digital economy should probably evolve from here. To bring oil safely to market, the petroleum industry organised itself into complex new supply chains, for moving and processing petrochemicals. Technical standards, enforceable rules, and even social norms (like good habits for handling gasoline) developed to help keep the new supply chains orderly.

Obviously data is quite different from oil, and the comparison isn't meant to be taken too far. So what practical lessons are there from the petrochemical experience for the future organisation of the digital economy? What would data supply chains actually look like? It seems likely to me that new laws and jurisprudence will emerge to deal with data as an intangible asset class, but that's another story. For now, I start to tease out the more technological aspects of data protection in How Data Supply Chains Must Be Safeguarded in the Digital Economy.

I start with the challenge of how to be sure about the people and entities we try to deal with in the digital environment. The Digital Identity industry has grappled with this for two decades, and its successes can be leveraged.

In the so-called "real world", commerce and government services revolve around established facts and figures about people (account numbers, customer reference numbers, employee numbers, professional qualifications, memberships, social security entitlements, driver licenses, and personal attributes like age, residency, health conditions and so on). But all these critical pieces of information lose their reliability and provenance online: we cannot tell where the information is supposed to have come from, much less can we distinguish clones and counterfeits from "originals". Nor can we be sure that data presented online truly belongs to particular individuals.

But in another setting, this is a solved problem. The susceptibility of Digital Identity data to fraud is very similar to that of credit card numbers, and we've secured them with integrated circuits and cryptography.

A credit card is nothing more than a data carrier used to present an account holder's bona fides to a merchant (within the context of an overarching scheme). Over time, the payment card industry has steadily adopted more robust forms of data carrier:

  1. The original paper charge cards of the 1950s were transcribed by hand by merchants.
  2. Embossed plastic cards were "read" by carbon-paper click-clack machines.
  3. Magnetic stripe cards were read automatically by electronic terminals, which scanned data encoded in analogue magnetized patterns.
  4. Chip cards are now also read automatically, but using digital memory and mutual authentication between card and terminal.
  5. Smartphones embody chips which can mimic smartcards and bring added functionality, like a mobile wallet which can manage multiple accounts.

Magnetic stripe cards persisted for decades until criminal skimming and carding became unbearable. Magnetic stripe fraud is enabled because a card terminal cannot tell the difference between an original analogue stripe and a copy; the data encoded in the magnetic medium has no provenance.

The whole point of a chip or smart payment card is to protect the presentation of cardholder data, to prevent interception, tampering, illicit replay, cloning and/or counterfeiting. The cardholder data in a chip card is exactly the same as that in a magnetic stripe (or on the surface of either type of card for that matter) but the data transfer protocol from chip card to terminal is special.

A chip card holds the cardholder details within an embedded microprocessor, along with one or more private keys which are unique to each cardholder. Data is not passively transferred to the terminal as it is with a mag stripe card; instead, for each transaction, there is a handshake. First the terminal sends the purchase details into the card's microprocessor, which combines them with the cardholder data and digitally signs the combination before sending it back to the terminal. This operation renders each encoded transaction unique to the card and cardholder, and prevents substitution of stolen data or tampering with the transaction.
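The handshake described above can be sketched in a few lines of Python. This is a toy illustration only, not the actual payment card protocol: it uses HMAC-SHA256 from the standard library as a stand-in for the card's digital signature (a real chip card would sign with a private key held inside its microprocessor), and the names `ChipCard`, `sign_transaction` and `verify`, along with the sample card number and the `nonce` field, are all hypothetical.

```python
import hashlib
import hmac
import json
import secrets


class ChipCard:
    """Toy chip card: holds cardholder data and a key that never leaves the card."""

    def __init__(self, cardholder_data: dict, card_key: bytes):
        self._cardholder_data = cardholder_data
        self._card_key = card_key  # assumption: one secret key per card

    def sign_transaction(self, purchase: dict) -> dict:
        # Combine the terminal's purchase details with the cardholder data,
        # then authenticate the combination with the card's key.
        message = {"cardholder": self._cardholder_data, "purchase": purchase}
        payload = json.dumps(message, sort_keys=True).encode()
        tag = hmac.new(self._card_key, payload, hashlib.sha256).hexdigest()
        return {"message": message, "tag": tag}


def verify(response: dict, card_key: bytes, expected_purchase: dict) -> bool:
    """Accept only if the tag is valid AND it covers this exact purchase."""
    payload = json.dumps(response["message"], sort_keys=True).encode()
    expected_tag = hmac.new(card_key, payload, hashlib.sha256).hexdigest()
    tag_ok = hmac.compare_digest(expected_tag, response["tag"])
    return tag_ok and response["message"]["purchase"] == expected_purchase


# Example handshake: the terminal sends purchase details (with a fresh
# nonce so that each transaction is unique) and checks what comes back.
key = secrets.token_bytes(32)
card = ChipCard({"pan": "4000000000000002", "name": "A CARDHOLDER"}, key)
purchase = {"amount": "25.00", "currency": "AUD", "nonce": secrets.token_hex(8)}
response = card.sign_transaction(purchase)
accepted = verify(response, key, purchase)
```

Because the tag covers both the cardholder data and the purchase details, a response captured from one transaction fails verification when replayed against another, and altering the amount invalidates the tag. Copied cardholder data alone, without the card's key, cannot produce an acceptable response.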

As the older technology is phased out, an overall systemic improvement is that raw data becomes useless, and valueless to thieves. Best practice is that no raw card details are relied upon but instead we expect transactions to employ chips and digital signatures, to ensure provenance.

The experience of progressively tackling plastic card fraud offers lessons for economy-scale digital identity, and data management in general. The core technique is digital signatures (applied and processed automatically by smart devices) underpinned by seamless key management (where cryptographic keys are registered to users for different applications). The provenance of all data could be safeguarded in the same way as credit card numbers are protected in the payments system.
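As a rough sketch of what "keys registered to users for different applications" might mean in practice, the following Python example keeps a separate key for each (user, application) pair and checks the provenance of a piece of data against it. Everything here is an assumption made for illustration: the `KeyRegistry` class, the function names and the sample data are hypothetical, and HMAC-SHA256 again stands in for a true digital signature, so this registry holds secret keys where a real scheme would more likely register public keys or certificates.

```python
import hashlib
import hmac
import secrets


class KeyRegistry:
    """Toy registry mapping (user, application) pairs to keys."""

    def __init__(self):
        self._keys = {}

    def register(self, user: str, application: str) -> bytes:
        # Each user gets a distinct key for each application.
        key = secrets.token_bytes(32)
        self._keys[(user, application)] = key
        return key

    def lookup(self, user: str, application: str):
        return self._keys.get((user, application))


def sign_data(key: bytes, data: bytes) -> str:
    # Stand-in for a digital signature applied by the user's device.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def has_provenance(registry: KeyRegistry, user: str, application: str,
                   data: bytes, tag: str) -> bool:
    """True only if the data was signed with the key registered for this
    user in this application."""
    key = registry.lookup(user, application)
    if key is None:
        return False
    return hmac.compare_digest(sign_data(key, data), tag)


registry = KeyRegistry()
health_key = registry.register("alice", "health")
registry.register("alice", "payments")

claim = b"member-number=12345"
tag = sign_data(health_key, claim)
in_context = has_provenance(registry, "alice", "health", claim, tag)
out_of_context = has_provenance(registry, "alice", "payments", claim, tag)
```

The design choice worth noting is that keys are scoped to an application: data signed in the health context does not verify in the payments context, and an altered or unsigned copy of the data does not verify anywhere.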

In my two new reports I try to balance a positive view of classical privacy regulations with a realistic, evidence-based vision of how standard cryptographic technology can protect the provenance of all data and systematise how it flows through the digital economy.

Research Unlimited and Executive Network members may access the reports here: 

Big Privacy Rises to the Challenges of Big Data and AI

How Data Supply Chains Must be Safeguarded in the Digital Economy