Theory of Data and Definition Hypothesis

Of the three main challenges that this Theory of Data tries to address, the first and most fundamental is a proper definition of “Data” that is acceptable to technologists, lawyers and managers alike.

The hypothesis we propose is that

“Data is an aggregation of fundamental data particles which combine together horizontally and vertically to derive simple and composite data sets which have further use to humans based on the pattern in which the fundamental data particles get organized”.

Horizontally, the fundamental data particles, when grouped into sets of 8, form a “byte”. The number of data particles in a standard set can be varied according to the preference of technologists. Vertically, bytes can be added together to constitute larger composite data sets.
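The horizontal and vertical combination described above can be sketched in a few lines of Python. This is only an illustration and not part of the hypothesis itself: it assumes that a “fundamental data particle” can be modelled as a binary digit (0 or 1), and the names group_particles and set_to_int are hypothetical helpers introduced for this example.

```python
def group_particles(particles, set_size=8):
    """Horizontal step: split a flat sequence of 0/1 particles into fixed-size sets."""
    if len(particles) % set_size != 0:
        raise ValueError("number of particles must be a multiple of set_size")
    return [particles[i:i + set_size] for i in range(0, len(particles), set_size)]


def set_to_int(particle_set):
    """Read one set of particles as an unsigned integer, first particle most significant."""
    value = 0
    for particle in particle_set:
        value = (value << 1) | particle
    return value


# Sixteen particles (assumption: each particle is a binary digit).
particles = [0, 1, 0, 0, 1, 0, 0, 0,
             0, 1, 1, 0, 1, 0, 0, 1]

# Horizontal: sets of 8 particles form bytes.
sets_of_8 = group_particles(particles)

# Vertical: the bytes are added together into one composite data set.
composite = bytes(set_to_int(s) for s in sets_of_8)

# The set size is a convention; the same particles can also be read in sets of 4.
sets_of_4 = [set_to_int(s) for s in group_particles(particles, set_size=4)]
```

With sets of 8, the sixteen particles form two bytes; with sets of 4, the same particles form four smaller units. Nothing about the particles changes, only the convention used to group them.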

At the first level, when fundamental data particles come together randomly, the data has no cognizable meaning to a human being. As the data particles come together and stay together, a pattern develops. Certain patterns formed in this way can be recognized by interpreters (software and hardware) built to convert the aggregation of fundamental data particles into what humans recognize as text, image or sound. At that point they become data at the human usage level.

This human-understandable form of data is subject to regulations and other interpretations. Humans cannot ascribe meaning to data particles unless they are organized in a specific pattern. Unorganized fundamental data particles are “gibberish” to the human user.

The human interpretation of a given composite data set is “Relative” to the cognitive ability of the user. Hence data that is understood by a human is always “person dependent”: its interpretation is “Relative” to the person’s ability. Where a person cannot understand the presented data pattern, for example because he does not have the right reader (software or hardware), he will still see only “gibberish”.

When compatible readers are used, the human can view the data as “Text” or “Sound” or “Image”.
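The dependence on the reader can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: the three “readers” below (an ASCII text reader, a 32-bit big-endian number reader and a grey-level pixel reader) are hypothetical stand-ins for real software and hardware, and the function name read_as_text is introduced only for this example.

```python
import struct


def read_as_text(data):
    """A text 'reader': returns a string, or None when the pattern is not valid ASCII."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None  # to this reader the pattern is "gibberish"


# One and the same composite data set of four bytes.
data = bytes([72, 101, 108, 108])

as_text = read_as_text(data)                 # read as ASCII characters
as_number = struct.unpack(">I", data)[0]     # read as one unsigned 32-bit big-endian integer
as_pixels = list(data)                       # read as four grey levels (0 to 255)

# A pattern that the text reader cannot interpret at all.
unreadable = read_as_text(bytes([0xFF, 0xFE, 0x80]))

print(as_text)      # Hell
print(as_number)    # 1214606444
print(as_pixels)    # [72, 101, 108, 108]
print(unreadable)   # None
```

The four bytes are identical in every case. Whether they appear as a word, a number, a strip of pixels or nothing intelligible depends entirely on the reader that is applied to them.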

The categories of data that we normally recognize as “Personal data”, “Non Personal data” etc. are all interpretations made by humans based on their own perceptions, and not an “Absolute Truth”. No data is “personal” or “non personal” per se. It is interpreted so by a human because he follows a certain school of thought.

Data therefore does not have an absolute existence at the level of human recognition but is relative to the interpretive ability of the data user.

The principle we should recognize here is “Data is in the beholder’s eyes”. Data is constructed by technology but interpreted by humans.

If some call “Data” as “Original Data” and produce a hard disk to a Court as “Evidence”, it is to be recognized that there are certain data patterns in the hard disk which some (maybe a majority of people) recognize as some kind of text, image or sound, and it is this that is presented as evidence. This principle is already used in Indian law in the form of Section 65B of the Indian Evidence Act.

Watch for more….

(P.S.: In subsequent discussions in 2020, this hypothesis was renamed the “Interpretation hypothesis”.)

Naavi