Anonymisation is like Encryption with a destroyed decryption key

Following the discussions around the DPA 2021, there appears to be some confusion about the term “Anonymisation” and its effect on Personal Data. It is strange that after so much discussion on the GDPR and data protection laws, we are back to the basic question of what “Personal Data” is.

Personal Data is data that can, directly or indirectly, identify a living natural person. On this definition, a string of characters such as “Chandrashekar” might seem to be an element that can identify a living natural person. But the string “Chandrashekar” alone does not point to any one living individual, since several persons may have that name. Further, whether it is recognised as a name at all depends on the knowledge of the recipient of the data. An Indian would recognise it as a name.

Would a person from the interior of Africa recognise it as a name, even if he knows the English alphabet? Would a person in China who does not know the English alphabet recognise it as a name?

If not, why should we consider “Chandrashekar” to be “Personal Data”? Is it not just a stream of binary digits which one piece of software renders as the English text “Chandrashekar”? In another rendering it may look different and may not appear to be a name at all.

The fundamental principle this suggests is that “Data” is neither personal nor non-personal per se. In a given context, it may be perceived as “Personal” by some and not by others. (Please refer to Naavi’s Theory of Data for a more detailed discussion.)

Can any data that is perceived as “Personal” by somebody somewhere in the world be treated as “Personal Data” by everyone under law? Certainly not.

Hence, just because we sit in India and feel that “Chandrashekar” is the name of a person does not mean that “Chandrashekar” should be considered “Personal Data”.

Another example: what does a string such as “Bhajji” or “Submarine” represent? Is it the name of a dish in South India, or the name of a naval contraption?

For a cricket follower in India, Bhajji may be the nickname of Harbhajan Singh, and Submarine may be the nickname of Mr Subramanyam (former Test cricketer from Mysore).

Hence “Chandrashekar” by itself should not be considered “Personal Information” any more than Bhajji or Submarine. This is part of the “Theory of Data”, whose hypothesis is that “Data is in the beholder’s eyes”.

Recently, a German court, in an order related to the GDPR, held that an IP address is “Personal Data” and that if any American company touches the IP address, it would amount to a disclosure of personal data to a US entity, which is not permitted under the cross-border data transfer restrictions of the GDPR. (See this article.)

In this instance, the IP address is related to an action by an individual (such as visiting a website). But if the data is merely the “IP address”, it is not sufficient to identify a living natural individual. Hence it should not be treated as “Personal Information” but classified as “Non Personal Information”. However, if the recipient of the data (the IP address) possesses more information, such as the individual’s full particulars, then it may be considered personal information, just like profile information.

This distinction should be recognised as a matter of Privacy Jurisprudence.

In India, even the JPC members seem to have an unresolved doubt about what “Anonymised Data” is and how it relates to “Personal Data”.

Personal data, by definition, contains elements that lead to an identifiable individual. Identity parameters such as the name, PAN number, e-mail address, IP address and cookie information, in combination, render a piece of information “Personal Information”, to which the data protection law becomes applicable.

In comparison, there is data, such as data on the weather or the environment, which everybody understands as “Non Personal Data”. Then there is information about a “Company”, which is not a “Living Natural Person” and is therefore also easy to identify as “Non Personal Data”.

However, there could be doubt about personal-looking data of a person who is no longer living. There is no doubt that such information may be considered “Personal Information”, but there is no need to provide privacy protection, through data protection, for a deceased individual. Hence the compliance requirements of a data protection law may not apply to the personal data of a “deceased data principal”.

In the context of compliance, therefore, an organisation can classify the personal data of a deceased individual separately from the personal data to which the obligations and rights apply (unless the law specifically makes it applicable to the personal data of deceased persons, as the Singapore law does).

Yet another category of personal data that creates a problem is “Anonymised Data”, where the identity parameters of the individual contained in a personal data set are removed and irrevocably destroyed, so that even the person who created the anonymised data from identifiable data cannot re-identify it.

Some people consider “Anonymisation” to be reversible and hence hold that anonymised data should also be treated as “Protected Personal Data”. But if the law sets a standard for anonymisation which requires that the identity parameters separated from the identified information be forensically destroyed, then there is no way of reversing the process of anonymisation.

In the case of “Encryption”, there is a “Key” with which the encrypted data can be decrypted. This is similar to “De-identification” or “Pseudonymisation”, where identifiable data is rendered unidentifiable by removing identity parameters and/or substituting them with proxy parameters. The person who holds the “Key” to the de-identification or pseudonymisation can re-identify the data. Hence these processes are reversible.
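As a rough illustration of why pseudonymisation is reversible, the Python sketch below replaces a set of assumed identity fields (`name`, `pan`, `email`, `ip_address`, picked for illustration from the identity parameters mentioned earlier) with random tokens and keeps a separate key table. The function names, field names and sample records are hypothetical; the only point is that whoever holds the key table can restore the original records.

```python
import secrets

# Hypothetical identity parameters, drawn from the examples in the text.
IDENTITY_FIELDS = ("name", "pan", "email", "ip_address")


def pseudonymise(records):
    """Replace identity parameters with a random token per record.

    Returns the pseudonymised records and the key table that maps each
    token back to the removed identity parameters.
    """
    key_table = {}
    pseudonymised = []
    for record in records:
        token = secrets.token_hex(8)
        key_table[token] = {f: record[f] for f in IDENTITY_FIELDS if f in record}
        rest = {f: v for f, v in record.items() if f not in IDENTITY_FIELDS}
        rest["token"] = token  # proxy parameter standing in for the identity
        pseudonymised.append(rest)
    return pseudonymised, key_table


def reidentify(pseudonymised, key_table):
    """Anyone holding the key table can reverse the process."""
    restored = []
    for record in pseudonymised:
        original = {f: v for f, v in record.items() if f != "token"}
        original.update(key_table[record["token"]])
        restored.append(original)
    return restored


records = [
    {"name": "Chandrashekar", "email": "c@example.com", "city": "Mysore", "amount": 1200},
    {"name": "Asha", "ip_address": "203.0.113.7", "city": "Pune", "amount": 450},
]
data, key = pseudonymise(records)
print(data)                              # tokens and non-identity fields only
print(reidentify(data, key) == records)  # True: the process is reversible
```

The key table plays the role of the decryption key: the pseudonymised records alone carry no identity parameters, but the two together give back the identifiable data exactly.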

If, however, we have very strong encryption and the holder of the encrypted data does not have the decryption key, then such data is considered “Confidential” even though it is in the hands of an unauthorised person. Data breach notification requirements under the HIPAA/HITECH Act do not treat the loss of such data as a breach of unsecured PHI requiring notification. If, however, the encrypted data is lost along with the key stored in the same data store, the breach is recognised.

In the case of anonymisation, the anonymisation process is known to the anonymiser. However, just as an encrypting person may deliberately throw away the decryption key, the anonymiser forensically deletes the anonymisation key, so that de-anonymisation is not possible even in theory, provided the proper standard has been followed.
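Extending the same kind of hypothetical sketch, anonymisation in the sense used here can be modelled as pseudonymisation followed by destruction of the key table and removal of the tokens that pointed into it. The function names, fields and records are illustrative assumptions. Note also that clearing a Python dictionary only removes the mapping from the running program; it is not forensic deletion, which would have to be carried out on the underlying storage as per the applicable standard.

```python
import secrets

# Hypothetical identity parameters, drawn from the examples in the text.
IDENTITY_FIELDS = ("name", "pan", "email", "ip_address")


def pseudonymise(records):
    """Replace identity parameters with tokens; return data and key table."""
    key_table = {}
    out = []
    for record in records:
        token = secrets.token_hex(8)
        key_table[token] = {f: record[f] for f in IDENTITY_FIELDS if f in record}
        rest = {f: v for f, v in record.items() if f not in IDENTITY_FIELDS}
        rest["token"] = token
        out.append(rest)
    return out, key_table


def anonymise(pseudonymised, key_table):
    """Destroy the key table and drop the tokens that pointed into it."""
    # In-memory stand-in for forensic deletion: the mapping is emptied in
    # place, so every reference to this key table now sees an empty dict.
    key_table.clear()
    return [{f: v for f, v in r.items() if f != "token"} for r in pseudonymised]


records = [
    {"name": "Chandrashekar", "email": "c@example.com", "city": "Mysore", "amount": 1200},
    {"name": "Asha", "ip_address": "203.0.113.7", "city": "Pune", "amount": 450},
]
data, key = pseudonymise(records)
anonymised = anonymise(data, key)
print(anonymised)  # [{'city': 'Mysore', 'amount': 1200}, {'city': 'Pune', 'amount': 450}]
print(key)         # {}: nothing is left to reverse the process with
```

In a real deployment the original identifiable records would also have to be destroyed; `records` is kept here only to keep the example readable.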

Hence it is correct to consider that “Anonymised Personal Data” is not “Personal Data”. This was the position in the PDPB 2019. However, in the PDPB 2021, the JPC has been sufficiently confused by some experts who hold the view that, just as a data encryptor holding the decryption key can decrypt the encrypted data, an anonymiser of data can de-anonymise it as a matter of routine. This is an incorrect perception of the process of anonymisation. An anonymisation process inherently includes the forensic deletion of all the identity parameters. Otherwise, it is only a de-identification process and not an anonymisation process.

Some experts claim that data analysts can apply sophisticated algorithms and read meaning into Big Data, which enables them to de-anonymise. This is a false premise, since if the anonymisation process follows a proper standard, the de-anonymiser can only make a guess, like creating a “Profile” out of the data, which is just a “View” and not a “Fact”.

Beyond this, if somebody can decrypt encrypted data without the key by means of a brute-force attack or social engineering, it is called a “Crime” and is not a problem of the encryption system. Similarly, if anonymised data can be de-anonymised to a reliable extent using some technology, then either the standard of anonymisation was not good enough, or the de-anonymiser was a criminal who, through persistent hacking of the data, was able to extract personal information from the anonymised information. Such acts should be considered crimes, and the PDPB 2019/2021 does treat them as punishable offences with up to 3 years’ imprisonment.

If we are not confident that our Data Protection Authority can set an anonymisation standard that is absolutely unbreakable, it can at least set one that cannot be broken by an attack of reasonable sophistication. Anyone who then uses an unreasonable level of sophistication to break the anonymisation should be considered a “Motivated Criminal”, and the punishment should be raised from 3 years to at least 10 years or more to provide sufficient deterrence.

Unfortunately, without understanding this aspect, the PDPB 2021 tries to bring “Anonymised Data” within the scope of the regulation, creating an overlap between the ITA 2000 and the PDPA 2021.

Technically, there is no difficulty in segregating data into “Personal” and “Non Personal” using “Anonymisation” as the separator. Just as strongly encrypted data cannot be recovered once the key has been destroyed, properly anonymised data cannot be de-anonymised.

I hope the JPC gives serious thought to correcting this situation when the Bill is taken up in Parliament for discussion, provided there is no ego issue in making changes.

Naavi
