Additive Value hypothesis of ownership of data

Out of the three hypotheses which we took up for discussion in constituting Naavi’s Theory of Data, we have so far discussed the “Definition hypothesis” and the “Reversible life cycle hypothesis”. We shall now take up the third hypothesis, through which we shall discuss how we can interpret the “Ownership of Data”.

In all the regulatory discussions on “Data Protection”, there is a concept of the “Data Subject” (or the “Data Principal”) providing “Consent” to another person as an expression of the data subject’s choice of how his personal data can be used by the recipient. Certain laws specify some basic data protection principles to be followed, recognize certain basic rights of the data subject and impose certain obligations on the data recipient in terms of security, disclosure etc.

The “Consent” is almost always recognized as a “Contract”.

Certain regulations are clear in defining that “Personal Data” is a “Property” of the data subject and the “Consent Contract” is transferring some part of or all of the property to the recipient.

It is interesting to note that even in regulations which consider that personal data is a personal property which can be “Sold” for a consideration, there is no mention of whether the sale is “Exclusive” or “Non Exclusive”. We know that unlike the physical property, “Data Property” can be transferred to another person even while a copy of the same remains with the transferor. In fact, if there is a legal challenge, the copy with the transferor may be considered as “Original” and the copy with the recipient as a “Secondary copy”.

Though not specifically mentioned, some laws imply that “Personal Data” or “Elements of Personal Data” are “Transferable” as a “Right to Use”. So what happens in the transfer of personal data from the data subject to the data recipient is that there is a disclosure with a transfer of the right to use, process, share or otherwise dispose of the transferred elements of data, either together or independently.

But even the best Consent form drafted by the best GDPR lawyer in the world has never properly indicated that the

“Personal Data now being disclosed consists of several data elements and this consent is deemed as an offer/acceptance to transfer the rights of individual data elements contained here in like the name, email address, etc., to the limited extent of it being used for the purpose for which this consent is being provided as understood by me, namely ………. and that the right is collective/applicable to each data element individually”

If the transfer of right is for “Collective” data elements, then use of the data for aggregation even after de-identification or anonymization becomes a violation of the terms of the contract.

Thus the ownership of data, as presently understood as a “Property”, has a problem: it is not possible to identify whether the consent extends to the whole set given at a particular point of time or whether it can be considered as multiple consents, one for each of the data elements.

The legal instrument by which such a transfer can be effected is also difficult to recognize, since “Data” as a property cannot be classified as movable or immovable property, as an actionable right, or as an Intellectual Property Right such as Copyright, Trade mark or Patent, for which separate laws and definitions exist.

Even if the instrument of transfer is a “Contract”, there is a need to define what is the “Property” being transferred. If there is ambiguity on the same, then the Contract may fail due to lack of “Meeting of the Minds” between the contracting parties.

In India we also have an issue that “Click Wrap” contracts only have the status of an implied contract, and the onerous clauses that may be included therein may become voidable, as in a standard form contract/dotted line contract.

The Theory of Data should therefore provide such explanations as necessary to ensure that this transfer is properly explained.

The Processing Value

One of the other areas where the existing explanations of the ownership of personal data fail miserably is when the processor takes in the personal data as provided by the data subject and, through his own contributions, creates a new product out of it such as a “Profile”, “Community Data” or “Aggregated Anonymized Data”.

Under the current regulations like GDPR, it is interpreted that all the value added versions that emanate from the original set of personal data belong to the personal data owner, and when he withdraws his consent or wants to exercise his right to forget, all the derivatives have to be destroyed (except perhaps the anonymized data elements). If the data subject wants the data to be ported back to him or to another processor, then the entire derivative set, including the profile created by one processor, perhaps has to be transferred to another processor who may actually be a business competitor of the first processor.

This interpretation will seriously conflict with the law of intellectual property rights where it is already an established view that a value added data base creation is an intellectual property of the creator and is different from the value of the raw data.

We have also in the past given the following two examples that challenge the theory that all super structures built on the personal data become the property of the data subject, as if the data were land on lease on which buildings are constructed and which has to be returned to the land owner after the expiry of the lease contract.

Example 1:

A person gives a piece of lemon to the processor. The processor crushes it, adds water and sugar and creates “Lemon Juice”. Then the lemon owner withdraws his consent on the use of the lemon and wants the lemon or the juice back. Obviously the processor cannot give back the lemon. But will he be required to return the lemon juice, which is more valuable than the original lemon since additional cost inputs have gone in?

Example 2:

A person gives a piece of Coal to the processor. The Processor uses a compression technology to compress the coal and convert it into its allotropic form of a “Diamond”.

(P.S: Allotropy is the property by virtue of which an element exists in more than one form, each form having different physical properties but identical chemical properties. These different forms are called allotropes. The two common allotropic forms of carbon are diamond and graphite.)

Now if the owner of “Coal” wants the “Diamond” back, how fair is the demand?

In the Indian draft legislation of “DISHA”, an attempt has been made to define that the medical diagnostic reports of a patient developed by a diagnostic center or a hospital are the property of the patient. Unless the law clarifies that the medical report has a value and that the patient is entitled to get a copy of the report only if he pays that value in terms of the fees charged for the diagnosis, there may be a legal conflict when the patient demands that the information be returned whether or not he has paid the fees.

We can therefore conclude that there is a shortcoming in the present theories of data ownership either as a “Property” or as a “Right”. We need a better explanation of the “Data Property Ownership”.

Under this new Theory of Data being propounded, I therefore propose a hypothesis as follows:

“The ownership of Data is applicable to individual data elements and belongs to the person/entity who creates the said element of data that enhances the value of the associated  set of data elements”.

What this means is that as the “Data” is born and then grows as explained in the life cycle hypothesis, the value of the data set undergoes a change. Different persons are responsible for the change. The data subject is of course one party who may be involved in most of these value changes, but there are others who contribute to the value addition. The ownership of the data has to be recognized by segregating the data in its current form into different value units and attributing the ownership of each unit to the person responsible for that value addition.

For example, let us say Company C floats a service and Mr P opts to become a member. P provides some set of personal data of which he is the owner. Company C creates a “Profile”. The “Profile” data, if attached to the original raw data provided by the data subject, is more valuable and marketable. Company C can realize say Rs 100 for this profile data, whereas Mr P had zero value for his name, address etc which he shared in the first place with Company C. Let us say that Company C has been fair to Mr P, has paid Rs 10 for the collection of the raw personal data and has also agreed that 10% of any further value realization that may be attributed to the personal data would be paid to him like a “Royalty”. Then there is a fair distribution of the “Value Addition”.

In such a concept, P will be an owner to the extent of 10% of the value in the hands of C, that is Rs 10 out of the Rs 100 realized, and C will be the owner to the extent of the remaining Rs 90.
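The split in this example can be worked out in a few lines. This is only an illustrative sketch: the function name `split_value` is hypothetical, the royalty is expressed as a whole percentage, and the Rs 10 upfront payment for collection is kept separate from the royalty on the Rs 100 realized later.

```python
def split_value(realized_value, royalty_percent):
    """Split a realized value between the data subject (P) and the processor (C)."""
    subject_share = realized_value * royalty_percent / 100
    processor_share = realized_value - subject_share
    return subject_share, processor_share

upfront_payment = 10                      # Rs paid to P when the raw data is collected
p_share, c_share = split_value(100, 10)   # Rs 100 realized, 10% royalty
print(p_share, c_share)                   # 10.0 90.0
```

The same function can be applied to a notional value, for instance a transfer price between two divisions of the same company.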

This “Additive Ownership” concept co-exists with “Additive Value Realization” of the data as it matures in its life cycle. The Value realization can be “Notional” in the sense that C may not sell the data to a third party but transfer it to another division of its own in which case the “opportunity benefit” has to be recognized as a “Transfer Price”.

Therefore, when personal data is collected for processing, the processor may, after defining the purpose, also indicate the notional value of the processed personal data and the price he is willing to recognize as payable to P, either immediately or after a certain event or time.

This “Additive Value Ownership” will have a cascading effect and there could be multiple owners of the Data, one for each recognizable value addition. The composite data set at any point of time will therefore be considered as being made up of multiple sub sets, each of which is attributable to one contributor, and the ownership of each value addition remains with the entity that contributes that value.

This also means that each such part owner of the value has a right to transfer his property to another person unless his contract of value creation prohibits the same. The law should however treat contract terms that restrict such transfers as an “Unfair Contract” and thereby enable the free flow of value addition for any given data set so that society at large benefits.
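One way to picture this cascading ownership is as a ledger of value additions. The sketch below is an assumption made for illustration, not a prescribed accounting method: the function `ownership_shares`, the second processor “D” and all the figures are hypothetical.

```python
from collections import defaultdict

def ownership_shares(ledger):
    """Return each contributor's fraction of the total value added so far.

    ledger is a list of (contributor, value_added) entries, one per
    recognizable value addition to the composite data set.
    """
    totals = defaultdict(float)
    for contributor, value_added in ledger:
        totals[contributor] += value_added
    total_value = sum(totals.values())
    return {name: value / total_value for name, value in totals.items()}

# P supplies raw data, C builds a profile, a later processor D enriches it further.
ledger = [("P", 10), ("C", 90), ("D", 100)]
shares = ownership_shares(ledger)
print(shares)   # {'P': 0.05, 'C': 0.45, 'D': 0.5}
```

Each entry stays attributable to the entity that contributed it, so a part owner could transfer his own entry without disturbing the others.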

This discussion will continue…

Naavi

Posted in Cyber Law | 1 Comment

Reversible Life Cycle hypothesis of the Theory of Data

This is in continuation of our discussion on the Theory of Data to explain “What Data is” for different communities like technologists, lawyers and business managers. In this direction we have stated that there are three hypotheses that we want to explore, and the first such hypothesis was the thought that

“Data is a congregation of fundamental data particles which come together to form a stable and meaningful pattern for a software/hardware to enable a human being to make a meaning out of the pattern.”

If we take an expression ’10’ and ask several people to read it as ‘data’, then perhaps most of them will read it as the number ten. But ask a “Binary Reader” who knows the language of “binary” as you and I know English, and he will say ’10’ is the decimal number two.

[This is not much different from asking people to read “Ceci est une pomme”. Not many would be able to understand this. But those who know French will understand it as equivalent to “This is an apple”.]

Can ’10’ be ‘Ten’ and ‘Two’ at the same time for different people? The answer is a definite yes since the human understandable meaning of  data ’10’ depends on the cognizable power of the human. “What Data is”,  therefore cannot be expressed in “absolute” terms. But it is relative to the language which a human uses to “Experience” the data. We use the word “Experience” here since data can be read as a text or seen as an image or heard as a sound depending on the capability of the converter of the binary notation to a human experience of reading, seeing or hearing.
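The point can be checked with Python's built-in `int`, which takes the base as its second argument. The variable names are only illustrative.

```python
symbols = "10"

as_decimal_reader = int(symbols, 10)   # read by someone who knows decimal
as_binary_reader = int(symbols, 2)     # read by a "Binary Reader"

print(as_decimal_reader)   # 10
print(as_binary_reader)    # 2
```

The stored symbols are identical in both calls; only the reader's “language” (the base) changes the meaning.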

If we go a step deeper, the binary data ’10’ does not exist on a hard disk as an etching of ‘1’ and ‘0’. It represents a “State” of a region of the hard disk: whether it carries a current or not, whether it is magnetized in one direction or the other, whether a light is in an on or off state etc.

The fundamental data particles which we often call binary bits do not have a specific form. If the data interpreter is capable of seeing the ‘lights on and off’ and converting them into human understandable text, it is fine. If the interpreter can sense the magnetic state, then also it is fine. If the data is defined as the “Spin state” of an electron or a nucleus as in Quantum Computing and the data interpreter can identify the spin states, then that type of data representation is also acceptable.

But in all these cases, “Data” is not “Data” unless there is a pattern to the data particles coming together and staying together until they are ‘observed by the interpreter’. If the data is unstable and is in a chaotic condition, the data particles may be there but they do not represent any meaningful data.

The fundamental data particles existing in a chaotic state and existing in a stable pattern are two states which are like a human foetus  before life enters and after life enters. This is the concept of “Data Birth“.

Once a “Data Set” which is a congregation of a stable pattern of fundamental data particles is formed, it can grow bigger and bigger by adding more data bits or more units of data sets. This is the horizontal and vertical aggregation of fundamental data particles.

Horizontally, when ’10’ becomes ‘10111000’ it reads as the number one hundred and eighty four.

Similarly when a stream of binary such as ‘01000001 01001110 01000100’ is read through a binary-ascii converter, it reads as ‘AND’. The same pattern reads as 4279876 in a binary-decimal converter.

Thus ‘1’ can grow into ’10’ and further to ‘10111000’ etc in a horizontal direction.
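The conversions quoted above can be reproduced with standard Python built-ins (`int`, `chr`, `str.split`). Treating each group of 8 bits as one ASCII character is the assumption behind the “binary-ascii converter”; the variable names are illustrative.

```python
bits = "10111000"
print(int(bits, 2))                    # 184

stream = "01000001 01001110 01000100"
octets = stream.split()

# Binary-ASCII reading: each group of 8 bits is one character.
text = "".join(chr(int(octet, 2)) for octet in octets)
print(text)                            # AND

# Binary-decimal reading: the same 24 bits taken as one number.
number = int("".join(octets), 2)
print(number)                          # 4279876
```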

When there is a text ‘vijay’ and this is combined with another data element which reads as ‘vijay@naavi.org’, then we have a composite data set which a human may recognize as name and e-mail address. This composite data set is considered as “Personal Information”.

Thus, an alphabet grows into a name horizontally and combines with an e-mail address vertically to become “Personal information”.

Thus “Personal information” is a grown up state of the data which started as a single data cell of 1 or 0 and added other cells, just as a human cell grows into a foetus, acquires life on the way, gets delivered as a baby, grows into a child, an adult and so on.

A similar “Life Cycle” can be identified in the manner in which “Data” gets born within a control environment (say within the corporate data environment) and then changes its nature from a foetus without life to a foetus with life, a delivered baby, a child, an adult etc.

Somewhere during the journey, the personal data may become sensitive personal data, or lose some of its characteristics and become anonymized data, or wear a mask and become pseudonymized data. Finally it may get so dismembered that the data set disintegrates from a “Composite data set” into “Individual data sets” and further into “fundamental data particles”, losing the “stable pattern” which gave it a “Meaning”. This is like the ‘death’ of a human being.

Thus the “life cycle” of data is comparable to the life cycle of a living being.

Just as the law for an individual who is a minor is different from the law for an adult, there is a law for information which is “Personal” and a law for information which is “Not personal”. Just as there can be a law for married women different from the law for married men, there could be different laws for data which is just ‘personal’ and data which is ‘sensitive personal’.

This  “Life Cycle hypothesis” of data therefore can explain how the technical view of “Data” as binary bits can co-exist with the legal view of “Data” being “Personal data”, “Sensitive personal data”, “Corporate data”, “Anonymized data”, “pseudonymized data” etc.

As long as we understand that it is the same “Core Human” who was once a foetus without life and thereafter a foetus with life, who became a baby, a child, an adult, a senior citizen and a corpse, and who was finally burnt to dust and joined the five elements from which the foetus was first formed, we must understand that “Data” is “Dynamic” and changes its form from time to time.

Just as a human in his family is “an identified person” but in a Mumbai local he is an “Anonymized person”, the recognition of data as personal or non personal may have nothing to do with the data itself but may depend on the knowledge of the people around.

Just as an anonymous person in a crowd may behave as a beast but turn tender when he sees known people around, anonymized data contributes differently to the society from the identified data.

Data starts its journey as “Data Dust” and returns to the same state after its death. This “Dust” to “Dust” concept is also similar to human life as interpreted by philosophers in India from time immemorial. At the same time the “Soul” in a human is indestructible and enters and leaves the body at different points of time. Similarly, in the Data life cycle, the soul is the “Knowledge and Cognizable ability of the observer”, and it remains with the observer even after the data itself has been ground to dust by a “Forensic Deletion”. Nobody can destroy the knowledge already set in the observer’s knowledge base, and out of his memory he may even be able to re-create a clone data set.

The essence of this “Life Cycle Hypothesis” is that “Data” does not exist as “Non Personal Data” or “Personal Data” etc. It is what it is. But we, the people with knowledge about the data, make it look “Identified” or “Anonymous”. By our ability to identify or not identify a data set with a living natural person, the utility of the data set is changed without the data set needing to do anything on its own.

The “Data Environment” is therefore what gives a character to the data. In other words, the tag that we provide a data as “Personal” and “non Personal” is more a contribution of the environment than the “Data” itself. No doubt the identity has a genetic character of its own. But the final identity is given by the environment. This is like in a mall where a CCTV can identify a person approximately 6 feet, well built, with bald head teasing a fair looking young girl. In this data capture, the identity of the person or the lady is not known to all. But if we equip the data environment with a face recognition software and a relevant data base, then the data which was anonymous becomes data which is identifiable. This conversion did not happen because the data was different. It was because the “Cognizable Ability” of the observer was different.
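The CCTV illustration can be sketched as the same record passed to two observers with different knowledge bases. Everything here is hypothetical (the function `interpret`, the `face_id` key, the name “Mr X”); a dictionary lookup merely stands in for face recognition software and its background data base.

```python
def interpret(observation, known_faces):
    """Tag an observation as identified or anonymous for a given observer.

    known_faces is the observer's knowledge base: face_id -> name.
    The observation itself is never modified.
    """
    name = known_faces.get(observation["face_id"])
    if name is None:
        return {**observation, "status": "anonymous"}
    return {**observation, "status": "identified", "name": name}

cctv_record = {"face_id": "face-001", "description": "approx. 6 feet, well built, bald"}

plain_observer = {}                         # CCTV alone, no background data base
equipped_observer = {"face-001": "Mr X"}    # with face recognition and a data base

print(interpret(cctv_record, plain_observer)["status"])      # anonymous
print(interpret(cctv_record, equipped_observer)["status"])   # identified
```

The record is identical in both calls; what changes is the observer's knowledge base.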

If therefore the confidentiality of the people has to be maintained, then the responsibility for the same is with the “Face recognition software” and the background data base rather than the “CCTV camera”. The law should therefore factor this and not be blind to say “CCTV violates Privacy”.

If the background data base which identifies the face is either incorrect or the AI which does the recognition has not been properly built, the face recognition may go wrong. Then law should recognize that “Data” is benign and its character is what is contributed by the software, hardware etc and if there is an error resulting in say “Defamation”, it is the interpreting software manufacturers who should be held liable as an “Intermediary”.

Life Cycle hypothesis of data therefore extends the earlier hypothesis of “Data is constructed by technology and interpreted by humans”.

This lifecycle concept of data has one interesting outcome. In “Data Portability” and “Data Erasure” or “Right to Forget”, we have a problem when the raw data supplied by the data subject has been converted into a value added data and a profile of the data subject by the data processor. When the data subject requests for data portability or data erasure in such instances, the dilemma is whether the entire data in profile form has to be ported or destroyed or it is only the raw data supplied by the data subject which needs to be returned or destroyed.

In the case of a human being, if a person adopts a baby who grows into an adult and the erstwhile parents want the baby back, it is not possible to return the baby, because the human cycle of growth cannot be reversed (at least by the technology we know today).

We may therefore qualify the “Data Life Cycle Hypothesis” by adding that, unlike a human life cycle, the data life cycle is “Reversible”.

I am sure that this is only a core thought and the readers can expand on this thought further… Whenever an argument ensues between a technologist and a lawyer on what is data, what is personal data, why there is a certain regulation etc., then we may subject the argument to this life cycle hypothesis test and see if the view of both persons can be satisfactorily explained.

Watch for more….

Naavi


Theory of Data and Definition Hypothesis

Out of the three main challenges that we are trying to address in this Theory of Data, the first and most fundamental is a proper definition of “Data” which is acceptable to the Technology persons, the Legal persons as well as the Management persons.

The hypothesis we propose is that

“Data is an aggregation of fundamental data particles which combine together horizontally and vertically to derive simple and composite data sets which have further use to humans based on the pattern in which the fundamental data particles get organized”.

Horizontally, the fundamental data particles, when grouped into sets of 8, form a “byte”. Depending on the preference of technologists, the number of data particles in a standard set can be varied. Vertically, bytes can be added together to constitute larger composite data sets.
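A small sketch of this grouping follows. The helper `group_bits` is hypothetical, and a string of '0' and '1' characters stands in for the fundamental data particles.

```python
def group_bits(bits, size=8):
    """Split a string of 0/1 characters into consecutive groups of `size`."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]

particles = "010000010100111001000100"

octets = group_bits(particles)        # standard sets of 8: bytes
nibbles = group_bits(particles, 4)    # a different set size chosen by the technologist

# Bytes added together form a larger composite data set.
composite = bytes(int(octet, 2) for octet in octets)

print(octets)      # ['01000001', '01001110', '01000100']
print(composite)   # b'AND'
```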

At the first level when fundamental data particles come together randomly, the data has no cognizable meaning to a human being.  As the data particles come together and stay together, a pattern develops. Certain patterns formed in such congregation become cognizable by interpreters  (software and hardware) created for converting the congregation of fundamental data particles into what humans recognize as text, image or sound when they become data at the human usage level.

This human understandable form of data is subject to regulations and other interpretations. Humans cannot ascribe meanings to data particles unless they are organized in a specific pattern. Such unorganized fundamental data particles are “gibberish” for the human user.

The human interpretation of a given composite data set is “Relative” to the cognizable ability of the user. Hence data which is understood by a human is always “person dependent”. Its interpretation is “Relative” to the person’s ability. Where the person does not have the ability to understand the presented data pattern because he may not have the right reader (Software or hardware) he will still see only  “gibberish”.

When the compatible readers are used, the human can view the data as “Text” or “Sound” or “Image”.

The categories of data which we normally recognize as “Personal data”, “Non Personal data” etc are all interpretations of humans based on their own perceptions and not an “Absolute Truth”. No data is “personal” or “non personal” per se. It is interpreted so by the human because he follows a certain school of thought.

Data therefore does not have an absolute existence at the level of human recognition but is relative to the interpretive ability of the data user.

The principle we should recognize here is “Data is in the beholder’s eyes”. Data is constructed by technology but interpreted by humans.

If some call “Data” “Original Data” and produce a hard disk to a Court as “Evidence”, it is to be recognized that there are certain data patterns on the hard disk which some (maybe a majority of people) recognize as some kind of text, image or sound, and that this is the evidence presented. This principle is already being used in Indian law in the form of Section 65B of the Indian Evidence Act.

Watch for more….

(P.S: In subsequent discussions in 2020, this hypothesis has been renamed as the “Interpretation hypothesis”)

Naavi


New Data Theory of Naavi built on three hypotheses

In searching for a proper expression and articulation of the Theory of Data, Naavi has decided to adopt a set of three hypotheses which are the pillars of this New Theory of Data.

The three hypotheses are

a) A hypothesis that tries to explain the definition of data

b) A hypothesis that tries to explain the life cycle of data

c) A hypothesis that tries to explain the ownership of data

The three hypotheses combine together in developing a comprehensive theory of data.

Watch out for more …

Naavi


Six amendments proposed to California Consumer Privacy Act

The California Consumer Privacy Act (CCPA) which is applicable to the collection and processing of the personal data of Californian residents is set to become effective from 1st January 2020.

CCPA has already distinguished itself by its honest approach to privacy protection  by specifically admitting the possibility of “Sale of Personal Data”. Unlike GDPR which does not provide clarity on whether personal data may be “Sold” even when there is an “Explicit Consent” and leaves the data processing companies in doubt, CCPA is clear in its prescriptions.

CCPA also recognizes a “Financial Value” for personal data and recognizes the data subject’s right of ownership to deal with it even in commercial terms. While Privacy activists may debate the ethics of “Trading of Personal Data”, the fact is that this provision gives some breathing space to data dependent businesses.

Now, before the Act becomes effective, some amendments have been proposed and are likely to be discussed and probably passed before the January 1, 2020 deadline for implementation.

The Six amendments are as follows.

1. Reasonable Authentication

CCPA shall allow a consumer to submit requests through a “Consumer Account”, if the customer maintains an account with the business.

The employee information collected in the course of a natural person acting as a job applicant, employee, owner, director, officer, medical staff member or contractor is exempted from the definition of personal information for one year (until January 2021).

The exemption also covers employee emergency contact information and information used to administer benefits, but it does not apply to a business’s obligation to provide notice to employees about its collection practices or employees’ eligibility for the data breach provision’s private right of action.

2. Classification of Personal Information

This amendment adds the phrase “reasonably capable of being associated with . . . a particular consumer or household” to the definition that determines when data is identified as personal data.

The bill also clarifies that any information made available by federal, state or local government is “publicly available” and is not personal information.

The amendment also eliminates the provision of the CCPA stating that publicly available information that a company uses in a manner incompatible with the purpose for which it was originally collected by the government is considered covered personal information.

It also clarifies that personal information does not include de-identified or aggregate information.

3. Right to Forget

The amendment adds a new exception to a consumer deletion request that allows a business to deny the request if the information is needed to “fulfill the terms of a written warranty or product recall conducted in accordance with federal law.”

It also creates an industry-specific exemption from the right to opt out of the sale of personal information for vehicle or ownership information maintained or shared between an automobile dealer and a manufacturer if it is maintained or shared for certain purposes.

4. Data Brokers

This amendment requires “data brokers” – defined as a “business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship” – to register with the California attorney general.

5. Miscellaneous amendments

a) A one-year exemption is to be provided for personal information exchanged in certain business-to-business communications.

b) A covered business does not have to collect or retain consumer information for CCPA purposes that it would not otherwise collect or retain in its ordinary course of business.

c) Businesses must disclose to consumers their right to request specific pieces of information a business has collected about them. The amendment also includes some changes to the CCPA’s exception for consumer-credit information covered by the Fair Credit Reporting Act (FCRA).

6. Exemption from Toll free phone number

An exclusively online business with a direct relationship with a consumer need not provide a toll-free phone number to which consumers can submit a request for disclosure of information. It need only provide consumers with an email address.

Additional clarification in the form of draft regulations is expected from the California attorney general in late October or early November.

It is also expected that California may pass a State Privacy Legislation soon. Since many other states (16 by last count) are following in the steps of CCPA, the changes in CCPA are likely to have a wide impact on the Privacy protection regime in the USA.

Indian businesses need to closely watch the developments in the Privacy regime taking shape in the USA in order to structure their compliance measures.

Naavi


Data Is Always Evolving

One of the myths that is being perpetuated by Data Protection Regulations is that there is something called “Personal Data” and something called “Sensitive personal data” which companies collect and which needs to be protected.

The regulators however forget that in a corporate environment several kinds of data keep flowing in and out. A “Data Set” like a Name, Address, E Mail, Mobile, Health Data, Financial Data etc does not always arrive at one single point of time so that it can be immediately tagged and protected as required. That happens only in cases where a company puts out a Web Form and collects some designated information from a source. In such cases a “Consent” can be obtained and data protection compliance can be achieved.

However, in most cases data flows in in different contexts and through different channels, often in an unstructured format. A company could have received the name and E Mail address a year ago, and today the same person’s further data may just land within the Data environment of the organization. When the new information is fused with the earlier information, the simple data grows into a bigger and more sensitive form.

Similarly, it is possible for a set of available data to be disintegrated, so that sensitive data is converted into non sensitive data or into anonymized data.

The fact that personal data is always a “Set” of elements, one of which is the core identifier of a living natural person, and that there is an organic growth of the data into different forms, is not adequately captured in the data protection regulations. Some of the data protection regulations define individual identifiers themselves as “Personal Data”. They miss the point that an identifier which cannot be linked to another “identity” of a living individual cannot be called “Personal Information”.

As an example, we often hear that an IP address is “Personal Information” or that a physical address is “Personal Information”. Though data protection practitioners try to enable their processes to identify the conversion of the status of data from one state to the other, through manual intervention or with the use of AI, this remains a lacuna in the regulatory definition of data.

The New Theory of Data has to therefore capture in its Data Definition that “Data is Dynamic”, “It evolves over time” and “Consent” obtained when the data is in its Zero day status fails when a new data element comes within the radar of the Company.

An example could be that a Company may have a group photo of people, many of whom are not known to it. Suddenly, one of the persons becomes identifiable because he sends in a job application with a photo. Now the group photo, which has been in the data system since a past date, becomes “Identifiable” data. This dynamic nature also affects the Data Portability and Data Erasure requests.
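The way a “Zero day” consent falls short once a new data element arrives can be sketched as a simple set comparison. The helper `uncovered_elements`, the field names and the values below are hypothetical; this is not a compliance tool.

```python
def uncovered_elements(record, consented_elements):
    """Return the data elements in the record that the stored consent does not cover."""
    return sorted(set(record) - set(consented_elements))

# Consent obtained when the data was first collected.
zero_day_consent = {"name", "email"}
record = {"name": "vijay", "email": "vijay@naavi.org"}
before = uncovered_elements(record, zero_day_consent)

# A new element lands in the data environment later and is fused with the record.
record["health_data"] = "hypothetical entry"
after = uncovered_elements(record, zero_day_consent)

print(before)   # []
print(after)    # ['health_data']
```

A non-empty result indicates that the data set has grown beyond what the original consent described.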

The New Theory of Data needs to recognize these anomalies and ensure that there is a valid explanation of these special instances of data within the theory of data.

Similarly “Data as a Property” of the Data Subject or Data as a productive asset of an organization is not properly captured by the present technical or legal approach to data.

Thus the current system of understanding data from the perspective of technology and law appears to be posing contradictions, because each domain of stake holders has, at different points in time, tried to describe the term “Data” for its own convenience. If these differences are not amicably resolved, corporate managements will find it difficult to balance the differing demands of the technologists, lawyers and business managers.

The need for a new approach to understanding data is therefore critical and this new theory should be capable of creating a proper definition for the term data so that all seemingly contradictory views converge under the new theory.

Watch out for more…

Naavi
