Yet another Facebook ‘People You May Know’ scandal broke recently when a sex worker found that the social network was linking her clients to her “real identity”. Kashmir Hill reported the episode for Gizmodo.

This type of thing has happened before.  In 2012, a bigamist was uncovered when his two wives were sent friend suggestions. In 2016, Facebook introduced a psychiatrist’s patients to each other (Kash Hill again). I foresaw that very scenario in a 2010 letter to the British Medical Journal.

Facebook’s self-serving philosophy that there should be no friction and no secrets online has created this slippery slope, where the most tenuous links between people are presumed by the company to give it license to join things up. But note carefully that what ‘People You May Know’ (PYMK) exposes is only the tip of the iceberg; the chilling thing is that Facebook’s Big Data algorithms will be making myriad connections behind the scenes, long before it gets around to making introductions. Facebook is dedicated to covertly refining all the things it knows about us, in an undying effort to add value to its information assets.

It has long been understood that Facebook has no consent to make these linkages. Fellow Australian privacy adviser Anna Johnston and I wrote about this problem in a chapter of the 2013 Encyclopedia of Social Network Analysis and Mining (recently updated): “The import of a user’s contacts and use for suggesting friends represent a secondary use of Personal Information of third parties who may not even be Facebook members themselves and are not given any notice much less the opportunity to expressly consent to the collection.” Relatedly, Facebook also goes too far when it makes photo tag suggestions, by running its biometric face recognition algorithms in the background, a practice outlawed by European privacy authorities.

We can generalise this issue, from the simple mining of contact lists, to the much more subtle collection of synthetic personal data.  If Facebook determines through its secret Big Data algorithms that a person X is somehow connected to member Y, then it breaches X’s privacy to “out” them.  There can be enormous harm, as we’ve seen in the case of the sex worker, if someone’s secrets are needlessly exposed, especially without warning.  Furthermore, note that the technical privacy breach is deeper and probably more widespread: under most privacy laws worldwide, merely making a new connection in a database synthesizes personal information about people, without cause and without consent. I’ve called this “algorithmic collection”, and it runs counter to the Collection Limitation principle.

This latest episode serves another purpose: it exposes the lie that people online are fully aware of what they’re getting themselves into.  

There’s a bargain at the heart of the social Internet, where digital companies provide fabulous and ostensibly free services in return for our personal information.  When challenged about the fairness of this trade, the data barons typically claim that savvy netizens know there is no such thing as a free lunch, and are fully aware of how the data economy works.

But that’s patently not the case.  The data supply chain is utterly opaque.  In Kash Hill’s article, she can’t figure out how Facebook has made the connection between a user’s carefully anonymous persona and her “real life” account (and Facebook isn’t willing to explain the “more than 100 signals that go into PYMK”). If this is a mystery to Hill, then it’s way beyond the comprehension of 99% of the population.

The asymmetry in the digital economy is obvious, when the cleverest data scientists in the world are concentrated not in universities but in digital businesses (where they work on new ways to sell ads).  Data is collected, synthesized, refined, traded and integrated, all behind our backs, in ever more complex, proprietary and invisible ways. If data is “the new crude oil”, then we’re surely approaching crunch time, when this vital yet explosive raw material needs better regulating.