The US General Services Administration defines Personally Identifiable Information as information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added). This means that items of data can constitute PII if other data can be combined to identify the person concerned. The fragments are regarded as PII even if it is the whole that does the identifying. And the definition means that a piece of data may be classed as PII before it is identified, rather than after.  This is only prudent. 

I am frequently asked - semi-rhetorically - by IT and security professionals if the definition means that even an IP or MAC address nowadays could count as PII? And I've said that in short, yes, it appears so!

Some security people are uncomfortable with this, but why? When it comes down to it, what is so worrying about having to take care of Personal Information? In Australia and in 100-odd other jurisdictions with OECD based data protection laws, it means that data custodians are required to handle their information assets in accordance with certain sets of Privacy Principles. This is not trivial but neither is it necessarily onerous. If the obligations in the Privacy Principles are examined in a timely manner, alongside compliance, security and information management, then they can be accommodated as just another facet of organisational hygiene.

So for instance, consider a large data base of 'anonymous' records indexed by MAC address. This is just the sort of data that's being collected by retailers with in-store cell phone tracking systems, and used to study how customers move through the facility and interact with stores and merchandise. Strictly speaking, if the records are not identifiable then they are not PII and data protection laws do not apply. But the new definition of Personal Information in Australia means IT designers need to consider the prospect of the records becoming identifiable in the event that another data set comes into play. And why not? If anonymous data becomes identified then the data custodian will suddenly find themselves in scope for privacy laws, so it's prudent to plan for that scenario now. Depending on the custodian's risk appetite, any large potentially identifiable data set should be managed with regard to Privacy Principles. These would dictate that the collection of records should be limited to what's required for clear business purposes; that records collected for one purpose not be casually used for other unrelated purposes; and that the organisation be open about what data it collects and why. These sorts of measures are really pretty sensible.

Security practitioners I've spoken with about PII and identifiability are also upset about the ambiguity in the definition of Personally Identifiable Information. They complain that the identifiability of a piece of data is relative and fluid, and they don't want to have to interpret the legal definition. But I'm struck here by an inconsistency, because security management is all about uncertainty.

Yes, identifiability changes over time and in response to organisational developments. But security professionals and ICT architects should treat the future identification of a piece of unnamed data as just another sort of threat. The probability that data becomes identifiable depends on a range of variables that are a lot like other factors (like the emergence of other data, changes of circumstance, or developments in data analysis) that are routinely evaluated during risk assessment.

To deal with identifiability and the classification of data as PII or not, you should look at the following:

  • consider the broad context of your data assets, how they are used, and how they are linked to other data sets
  • think about how your data assets might grow and evolve over time
  • look at business pressures and plans to expand the detail and value of data, and the resulting potential for new linkages
  • make assumptions, and document them, as you do with any business analysis, and
  • plan to review periodically.

Many organisations maintain a formal Information Assets Inventory and/or an Information Classification regime, and these happen to be ideal management mechanisms in which to classify data as PII or not PII. That decision should be made against the backdrop of the organisation's risk appetite. How conservative or adventurous are in you respect of other risks? If you happen to mis-classify Personal Information, what could be the consequences, and how would the organisation respond? Do some scenario planning, and involve legal, risk and compliance. While you're at it, take the chance to raise awareness outside IT of how information is managed. Be prepared to review and change your classifications from non-PII to PII over time. Remember that security managers should always be prepared for change. Embrace the uncertainty in Personal Information!

Truly, privacy can be tackled by IT professionals in much the same way as security. There are no certainties in security and it's the same in privacy. We will never have perfect privacy; rather, privacy management is really about putting reasonable arrangements in place for controlling the flow of Personal Information.

So, if something that's anonymous today, might be identified later, you're going have to deal with that eventually. Why not start the planning now, treat identifiability as just another threat, and roll your privacy and security management together?

 

See also the excellent survey of identifiability by William B. Baker and Anthony Matyjaszewski The changing meaning of 'personal data' (2010). The essay looks specifically at the question of whether IP addresses can be PII, and highlights a trend in the US towards conceding that IP addresses combined with other data can identify, and may therefore count as PII:

Privacy regulators in the European Union regard dynamic IP addresses as personal information. Even though dynamic IP addresses change over time, and cannot be directly used to identify an individual, the EU Article 29 Working Party believes that a copyright holder using "reasonable means" can obtain a user's identity from an IP address when pursuing abusers of intellectual property rights. More recently, other European privacy regulators have voiced similar views regarding permanent IP addresses, noting that they can be used to track and, eventually, identify individuals.
This contrasts sharply to the approach taken in the United States under laws such as COPPA where, a decade ago, the FTC considered whether to classify even static IP addresses as personal information but ultimately rejected the idea out of concern that it would unnecessarily increase the scope of the law. In the past few years, however, the FTC has begun to suggest that IP addresses should be considered PII for much the same reasons as their European counterparts. Indeed, in a recent consent decree, the FTC included within the definition of "nonpublic, individually-identifiable information" an “IP address (or other "persistent identifier")." And the HIPAA Privacy Rule treats IP addresses as a form of "protected health information" by listing them as a type of data that must be removed from PHI for deidentification purposes.