Ignorance of AI is no excuse

November 5, 2023

Understanding and explaining the workings of artificial brains—particularly deep neural networks—has been a problem for a decade or so. Some AI entrepreneurs seem almost to boast that they don’t know how their creations work, as if mysteriousness were proof of real intelligence. But algorithmic transparency is being mandated in new European legislation so that individuals have better recourse when they are adversely affected by robots miscalculating their credit or health insurance risks.

I want to discuss another reason regulators have for getting inside the black box of AI: accountability under data privacy regimes.

The power of conventional privacy laws

Large language models (LLMs) and generative AI are now making it hard to tell fact from fiction. Some commentators, with great care, call this an existential threat to social institutions and social order. Naturally there are calls for new regulation. Such reforms could take many years.

But I see untapped power to regulate AI in the existing principles-based privacy laws that prevail worldwide, a famous example being Europe’s General Data Protection Regulation (GDPR).

I have written elsewhere about the “superpower” of orthodox data privacy laws. These are based on the idea of personal data, broadly defined as essentially any information which may be associated with an identifiable natural person. Data privacy laws such as the GDPR (not to mention 162 national statutes) seek to restrain the collection, use and disclosure of personal data.

Generally speaking, these laws are technology neutral; they are blind to the manner in which personal data is collected.

This means that when algorithms produce data that is personally identifiable, those algorithms and their operators are in scope for privacy laws in most places around the world.

Surprise!

Time and time again, technologists are taken by surprise by the privacy obligations of automated personal data flows:

  • In 2011, German regulators found that Facebook’s photo tag suggestions violated privacy law. The company was ordered to cease facial recognition and delete its biometric data sets. Facebook prudently went further, suspending tag suggestions worldwide for many years. See also this previous analysis of tag suggestions as a form of personal data collection.  
  • The counter-intuitive Right to be Forgotten (RTBF) first emerged as such in the 2014 European Court of Justice case Google Spain v AEPD and Mario Costeja González.  Often misunderstood, the case was not about “forgetting” anything in general but specifically about de-indexing web search results. The narrow scope serves to highlight that personal data generated by algorithms (for that’s what search results are) is covered by privacy law. In my view, search results are not simple replicas of objective facts found in the public domain; they are the outcomes of complex Big Data processes.

What’s next?

The legal reality is straightforward. If personal data comes, by any means, to be held in an information system, then the organisation in charge of that system may be deemed to have collected that personal data and thus is subject to applicable data privacy laws.

As we have seen, privacy commissioners have thrown the book at analytics and Big Data.

AI may be next.

Being responsible for personal data, no matter what

If a large language model acquires knowledge about identifiable people—whether by deep learning or the gossip of simulacra—then that knowledge is personal data and the model’s operators may be accountable for it under data privacy rules.

Neural networks represent knowledge in weird and wonderful ways, quite unlike regular file storage and computer memory. It is notoriously hard to pinpoint where these AIs store their data.

But here’s the thing: privacy law probably doesn’t care about that design detail, because the effect still amounts to collection of personal data.

If a computer running a deep learning algorithm has inferred or extracted or uncovered or interpolated fresh personal data about individuals, then its operator has legal obligations to describe the data collection in a privacy policy, justify the collection, limit the collection to a specific purpose, and limit reuse of the collected personal data. In the privacy laws I have read, there is nothing to indicate that an information system based on neural networks will be treated any differently from one written in COBOL and running on a mainframe.
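To make that technology-neutral point concrete, here is a minimal, hypothetical sketch in Python. All names and functions are illustrative only (there is no real database or model behind them); the point is simply that whether information about an identifiable person is read from a record store or interpolated by a model, the operator ends up holding personal data either way.

```python
from dataclasses import dataclass

@dataclass
class PersonalData:
    subject: str      # an identifiable natural person
    attribute: str    # e.g. "credit risk"
    value: str
    provenance: str   # how the system came to hold it

def lookup_from_database(subject: str) -> PersonalData:
    # Conventional collection: a record explicitly stored about the subject.
    return PersonalData(subject, "credit risk", "low", provenance="database record")

def infer_with_model(subject: str) -> PersonalData:
    # Algorithmic collection: the same kind of data, produced by inference.
    # (A stand-in for a deep learning model; nothing is actually trained here.)
    return PersonalData(subject, "credit risk", "low", provenance="model inference")

# Both results associate information with an identifiable person, so both
# would count as personal data held by the operator, regardless of provenance.
for record in (lookup_from_database("Jane Citizen"), infer_with_model("Jane Citizen")):
    print(record)
```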

Privacy law usually gives individuals the right to request a copy of all personal data that a company holds about them.  In some jurisdictions, individuals have a qualified right to have personal data erased.

I am not a lawyer but I can’t see that owners of deep learning systems holding personal data can excuse themselves from technology-neutral privacy law just because they don’t know exactly how the data got there.  Nor can they logically get around the right to erasure by appealing to the sheer difficulty of selectively removing knowledge that is distributed throughout a neural network. Such difficulty may be seen as the result of their own design and decision-making.
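A small hypothetical sketch of why that difficulty arises: in a conventional record store, erasing one person’s data is a targeted operation, whereas inside a trained network there is no equivalent row to delete. The `weights` list below is a toy stand-in for real model parameters, which do not map one-to-one to individual data subjects.

```python
import sqlite3

# In a conventional record store, erasure is a targeted, routine operation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, credit_risk TEXT)")
conn.execute("INSERT INTO customers VALUES ('Jane Citizen', 'low')")
conn.execute("DELETE FROM customers WHERE name = ?", ("Jane Citizen",))  # done

# In a neural network there is no row to delete: whatever the model "knows"
# about Jane Citizen is spread across many shared parameters.
weights = [0.12, -0.98, 0.33, 0.07]  # toy stand-in for model parameters
# There is no weights.delete(subject="Jane Citizen") operation; removing one
# person's influence generally means retraining or other costly
# machine-unlearning techniques.
```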

And if selective erasure of specific personal data is impossible with these black boxes, then the worst case scenario for the field of AI may be that data protection regulators rule the whole class of technology to be non-compliant with standard privacy principles.

Are you getting prepared for AI? 

Constellation is developing new AI preparedness tools to help organisations evaluate the regulatory and safety implications of machine learning. Get in touch if you'd like to know more about this research, or to exchange views.