This post continues our discussion on the Theory of Data, which tries to explain “What Data is” to different communities such as technologists, lawyers and business managers. We have stated that there are three hypotheses we want to explore, and the first of these was the thought that
“Data is a congregation of fundamental data particles which come together to form a stable and meaningful pattern for software/hardware to enable a human being to make meaning out of the pattern.”
If we take the expression ‘10’ and ask several people to read it as ‘data’, most of them will probably read it as the number ten. But ask a “Binary Reader”, who knows the language of binary the way you and I know English, and he will say that ‘10’ is the decimal number two.
[This is not much different from asking people to read “Ceci est une pomme”. Not many would be able to understand it, but those who know French would read it as “This is an apple”.]
Can ‘10’ be ‘Ten’ and ‘Two’ at the same time for different people? The answer is a definite yes, since the human-understandable meaning of the data ‘10’ depends on the cognizable power of the human. “What Data is” therefore cannot be expressed in “absolute” terms; it is relative to the language which a human uses to “Experience” the data. We use the word “Experience” here since data can be read as text, seen as an image or heard as a sound, depending on the capability of the converter that turns the binary notation into a human experience of reading, seeing or hearing.
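The point about ‘10’ can be tried out in a few lines of Python. This is only a minimal sketch: the helper name `read_as` is our own invention for illustration, and the “language” of each reader is reduced to a number base.

```python
def read_as(symbols: str, base: int) -> int:
    """Interpret the same written symbols in the 'language' (base) of the reader."""
    return int(symbols, base)

data = "10"

# A reader who only knows decimal sees ten.
print(read_as(data, 10))

# A "Binary Reader" sees two.
print(read_as(data, 2))
```

The symbols passed in are identical in both calls; only the reader's base differs.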
If we go one step deeper, the binary data ‘10’ does not exist on a hard disk as an etching of ‘1’ and ‘0’. It represents a “State” of a region of the storage medium: whether it carries a current or not, whether it is magnetized in one direction or the other, whether a light is on or off, and so on.
The fundamental data particles, which we often call binary bits, do not have a specific form. If the data interpreter is capable of seeing the ‘lights on and off’ and converting them into human-understandable text, that is fine. If the interpreter can sense the magnetic state, that is also fine. If the data is defined as the “Spin state” of an electron or a nucleus, as in Quantum Computing, and the data interpreter can identify the spin states, then that type of data representation is also acceptable.
But in all these cases, “Data” is not “Data” unless there is a pattern to the data particles coming together and staying together until they are ‘observed by the interpreter’. If the data is unstable and in a chaotic condition, the data particles may be there but they do not represent any meaningful data.
Fundamental data particles in a chaotic state and in a stable pattern are two states comparable to a human foetus before life enters and after life enters. This is the concept of “Data Birth”.
Once a “Data Set” which is a congregation of a stable pattern of fundamental data particles is formed, it can grow bigger and bigger by adding more data bits or more units of data sets. This is the horizontal and vertical aggregation of fundamental data particles.
Horizontally, when ‘10’ becomes ‘10111000’, it becomes the number one hundred and eighty-four.
Similarly, when a stream of binary such as ‘01000001 01001110 01000100’ is read through a binary-to-ASCII converter, it reads as ‘AND’. The same pattern reads as 4279876 in a binary-to-decimal converter.
Thus ‘1’ can grow into ‘10’ and further into ‘10111000’ and so on in a horizontal direction.
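The conversions quoted above can be checked with a small sketch. The two converter functions are hypothetical stand-ins for the “binary-ASCII” and “binary-decimal” converters mentioned in the text, not any particular tool.

```python
def bits_to_int(bits: str) -> int:
    """Read a bit pattern (spaces ignored) as one unsigned binary number."""
    return int(bits.replace(" ", ""), 2)

def bits_to_ascii(bits: str) -> str:
    """Read a space-separated stream of 8-bit groups as ASCII characters."""
    return "".join(chr(int(octet, 2)) for octet in bits.split())

stream = "01000001 01001110 01000100"

print(bits_to_int("10"))        # 2
print(bits_to_int("10111000"))  # 184
print(bits_to_ascii(stream))    # AND
print(bits_to_int(stream))      # 4279876
```

The last two lines read the same 24 bits through two different converters, which is why one stream can be both a word and a number.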
When there is a text ‘vijay’ and this is combined with another data element which reads as ‘firstname.lastname@example.org’, then we have a composite data set which a human may recognize as name and e-mail address. This composite data set is considered as “Personal Information”.
Thus, an alphabet grows into a name horizontally and combines with an e-mail address vertically to become “Personal information”.
Thus “Personal information” is a grown-up state of the data, which started as a single data cell of 1 or 0 and added other cells, just as a human cell grows into a foetus, acquires life on the way, gets delivered as a baby, and grows into a child, an adult and so on.
A similar “Life Cycle” can be identified in the manner in which “Data” gets born within a control environment (say within the corporate data environment) and then changes its nature from a foetus without life to a foetus with life, a delivered baby, a child, an adult etc.
Somewhere during the journey, the personal data may become sensitive personal data, or lose some of its characters and become anonymized data, or wear a mask and become pseudonymized data. Finally, it may get so dismembered that the data set disintegrates from a “Composite data set” into “Individual data sets” and further into “fundamental data particles”, losing the “stable pattern” which gave it a “Meaning”. This is like the ‘death’ of a human being.
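The later stages of this journey can be sketched on the ‘vijay’ record used earlier. Everything here is an illustrative assumption: the `city` field is hypothetical, the three function names are ours, and the keyed hash is only one simple way of putting a “mask” on a record. In particular, dropping identifiers is a crude stand-in for anonymization, which in practice needs more care.

```python
import hashlib
import hmac

# Composite data set from the text, plus one hypothetical non-identifying field.
record = {"name": "vijay", "email": "firstname.lastname@example.org", "city": "Mumbai"}

def pseudonymize(rec: dict, key: bytes) -> dict:
    """Wear a mask: swap the direct identifiers for a keyed token."""
    token = hmac.new(key, rec["email"].encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return {"token": token, "city": rec["city"]}

def anonymize(rec: dict) -> dict:
    """Lose the identifying characters: keep only the non-identifying field."""
    return {"city": rec["city"]}

def dismember(rec: dict) -> list:
    """Break the composite set into individual values with no link between them."""
    return sorted(rec.values())

print(pseudonymize(record, b"secret held by the data controller"))
print(anonymize(record))
print(dismember(record))
```

Whoever holds the key can recompute the token and re-link the pseudonymized record, whereas the anonymized and dismembered forms carry nothing that points back to the person.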
Thus the “life cycle” of data is comparable to the life cycle of a living being.
Just as the law for an individual who is a minor differs from the law for an adult, there is one law for information which is “Personal” and another for information which is “Not personal”. Just as the law for married women may differ from the law for married men, there could be different laws for data which is just ‘personal’ and data which is ‘sensitive personal’.
This “Life Cycle hypothesis” of data therefore can explain how the technical view of “Data” as binary bits can co-exist with the legal view of “Data” being “Personal data”, “Sensitive personal data”, “Corporate data”, “Anonymized data”, “pseudonymized data” etc.
Just as we understand that it is the same “Core Human” who was once a foetus without life and thereafter a foetus with life, who became a baby, a child, an adult, a senior citizen and a corpse, and who was finally burnt to dust and joined the five elements from which the foetus was first formed, we must understand that “Data” is “Dynamic” and changes its form from time to time.
Just as a human is “an identified person” in his family but an “Anonymized person” in a Mumbai local, the recognition of data as personal or non-personal may have nothing to do with the data itself and everything to do with the knowledge of the people around it.
Just as an anonymous person in a crowd may behave as a beast but turn tender when he sees known people around, anonymized data contributes differently to the society from the identified data.
Data starts its journey as “Data Dust” and returns to the same state after its death. This “Dust” to “Dust” concept is also similar to human life as interpreted by philosophers in India from time immemorial. At the same time, the “Soul” in a human is indestructible and enters and leaves the body at different points of time. Similarly, in the Data life cycle, the soul is the “Knowledge and Cognizable ability of the observer”, and it remains with the observer even after the data itself has been ground to dust by a “Forensic Deletion”. Nobody can destroy the knowledge already set in the observer’s knowledge base, and out of his memory he may even be able to re-create a clone data set.
The essence of this “Life Cycle Hypothesis” is that “Data” does not exist as “Non Personal Data” or “Personal Data” etc. It is what it is. It is we, the people with knowledge about the data, who make it look “Identified” or “Anonymous”. Our ability or inability to link a data set to a living natural person changes the utility of the data set without the data set needing to do anything on its own.
The “Data Environment” is therefore what gives data its character. In other words, the tag we attach to data as “Personal” or “Non Personal” is more a contribution of the environment than of the “Data” itself. No doubt the identity has a genetic character of its own, but the final identity is given by the environment. Consider a mall where a CCTV camera captures a man approximately 6 feet tall, well built and bald, teasing a fair-looking young girl. In this data capture, the identity of the man or the girl is not known to everyone. But if we equip the data environment with face recognition software and a relevant database, the data which was anonymous becomes identifiable. This conversion did not happen because the data was different; it happened because the “Cognizable Ability” of the observer was different.
If therefore the confidentiality of the people has to be maintained, the responsibility for it lies with the “Face recognition software” and the background database rather than the “CCTV camera”. The law should factor this in and not blindly say “CCTV violates Privacy”.
If the background database which identifies the face is incorrect, or the AI which does the recognition has not been properly built, the face recognition may go wrong. The law should then recognize that “Data” is benign and that its character is contributed by the software, hardware and so on. If there is an error resulting in, say, “Defamation”, it is the manufacturers of the interpreting software who should be held liable as an “Intermediary”.
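The CCTV example can be reduced to a toy sketch in which one and the same capture is shown to two observers. The field names, the `face_signature` value and the `observe` function are all hypothetical. Real face recognition involves probabilistic matching and can misidentify, whereas this lookup is exact.

```python
from typing import Optional

# One CCTV capture, described only by what the camera sees.
capture = {"face_signature": "sig-42", "height_ft": 6, "build": "well built", "hair": "bald"}

def observe(capture: dict, face_database: dict) -> Optional[str]:
    """Return an identity only if the observer's knowledge base contains a match."""
    return face_database.get(capture["face_signature"])

camera_only = {}                              # observer with no background database
camera_with_face_db = {"sig-42": "Person A"}  # observer equipped with a face database

print(observe(capture, camera_only))          # None
print(observe(capture, camera_with_face_db))  # Person A
```

The `capture` dictionary is never modified; the only thing that changes between the two calls is what the observer already knows.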
The Life Cycle hypothesis of data therefore extends the earlier hypothesis that “Data is constructed by technology and interpreted by humans”.
This life cycle concept of data has one interesting outcome. In “Data Portability” and “Data Erasure” or the “Right to be Forgotten”, we have a problem when the raw data supplied by the data subject has been converted by the data processor into value-added data and a profile of the data subject. When the data subject requests data portability or data erasure in such instances, the dilemma is whether the entire data in profile form has to be ported or destroyed, or only the raw data supplied by the data subject needs to be returned or destroyed.
In the case of a human being, if a person adopts a baby who grows into an adult and the erstwhile parents want the baby back, it is not possible to return the baby, because the human cycle of growth cannot be reversed (at least by the technology we know today).
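A value-added profile, in contrast, can in principle be stripped back to the raw data originally supplied. We may therefore qualify the “Data Life Cycle Hypothesis” by adding that the data life cycle is “Reversible”, unlike a human life cycle.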
I am sure that this is only a core thought and the readers can expand on this thought further… Whenever an argument ensues between a technologist and a lawyer on what is data, what is personal data, why there is a certain regulation etc., then we may subject the argument to this life cycle hypothesis test and see if the view of both persons can be satisfactorily explained.
Watch for more….