Data Science is an important area of study today, when “Data” is regarded as an economic asset that can be harnessed like “Oil” or mined like “Gold”.
According to Wikipedia,
“Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.”
The view of most data scientists, however, is limited to disciplines such as “Statistics”, “Mathematics”, “Computer Science” and “Information Science”.
Data science is considered as a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. The term “Data Science” is often used interchangeably with concepts like business analytics, business intelligence, predictive modeling, and statistics.
The business of “Big Data”, “Machine Learning”, “AI Algorithms” all depend on “Data Scientists”.
For those of us who have watched the growth of “Information Security” as a professional domain, the evolution from the “Technical Aspects of Information Security” to the “Techno-Legal Aspects of Information Security” was clear. Today, the legal aspects of Information Security have taken a firm grip on the Information Security domain. With “Information Security” migrating to “Data Security” and the emergence of stringent laws such as GDPR, the future of “Information Security” has slipped out of the hands of the CISOs into those of the DPOs (Data Protection Officers).
While the IS domain soon realized the importance of “People” along with “Processes”, the transition was fully extended into the third dimension of “Behavioural Science” by practitioners like Naavi.
At present the Data Security domain has taken a further step ahead towards “Data Governance” bringing in the “Business Management” professionals closer to the group of Data Security Professionals.
A similar graduated evolution is now also required in the field of Data Science. The various theories of Data that support the Data Science field today revolve around the technical aspects of Data. Statistical tools to segregate data and detect correlations in a heap of data, drawing predictive conclusions, and creating self-learning and self-improving algorithms are all based on a technical perspective of data.
From the technical perspective, data is a “Stream of Binary Notations” that can be read by a “Binary Reading Device”. The “Binary Stream” rests on some surface such as the platter of a hard disk, a Compact Disk or a Memory Card. The reader reads the binary stream, passes it through an application that assigns meaning to the data stream, and thereafter sends it on to another data processing step or to a data delivery device like the “Monitor” or a “Speaker” for the human being to “Experience” the data.
To enable this conversion of data from a binary stream to a human experience, computer engineers have developed protocols such as splitting a data stream into finite bit packs, separating them with delimiters, and adding metadata to instruct the devices on processing. How efficiently data can be read, how multiple data sources can be aggregated, how profiling can be achieved, etc., are the problems that Data Scientists try to examine. But in this entire process they deal with “data as a binary stream”.
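As a minimal sketch of this idea, the following Python snippet shows a toy framing scheme (purely illustrative, not any real protocol or standard) in which each bit pack is prefixed with metadata, here a two-byte length, so that a reader can split the stream back into its parts:

```python
# Illustrative only: a toy framing scheme, not any real protocol.
# Each "pack" is prefixed with metadata (a 2-byte big-endian length)
# so a reader can split the combined stream back into its parts.

import struct

def frame(parts):
    """Join byte chunks into one stream, length-prefixing each chunk."""
    stream = b""
    for part in parts:
        stream += struct.pack(">H", len(part)) + part
    return stream

def unframe(stream):
    """Split the stream back into chunks using the length metadata."""
    parts, i = [], 0
    while i < len(stream):
        (length,) = struct.unpack(">H", stream[i:i + 2])
        parts.append(stream[i + 2:i + 2 + length])
        i += 2 + length
    return parts

stream = frame([b"hello", b"world"])
assert unframe(stream) == [b"hello", b"world"]
```

Without the metadata, the receiver sees only one undifferentiated binary stream; the framing convention is what lets the devices recover the intended pieces.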
From the “Technical Perspective”, encrypted data is also a binary stream, though it is different from the binary stream of the parent data itself.
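This can be sketched with a toy transformation (a simple XOR, which is not real encryption and is used here only to make the point): the ciphertext is a different binary stream, yet the parent data is recoverable from it.

```python
# Illustrative only: a toy XOR transformation, NOT real encryption.
def xor_bytes(data: bytes, key: int) -> bytes:
    """Flip bits of every byte using a single-byte key."""
    return bytes(b ^ key for b in data)

plain = b"DATA"
cipher = xor_bytes(plain, 0x5A)

assert cipher != plain                    # a different binary stream...
assert xor_bytes(cipher, 0x5A) == plain   # ...from which the parent data is recoverable
```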
The moment we start recognizing the binary stream as a word, sentence, picture, sound etc., we are adding human interpretations of the binary stream. Then Data is no longer in the technical domain only. It has crossed over to the human domain.
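The point that interpretation is layered on top of the binary stream can be illustrated in Python by reading one and the same sequence of bytes in different ways:

```python
raw = bytes([0x48, 0x69, 0x21])  # one and the same binary stream

as_text = raw.decode("ascii")    # "Hi!"           -- if read as ASCII text
as_ints = list(raw)              # [72, 105, 33]   -- if read as numbers
as_hex  = raw.hex()              # "486921"        -- if read as hexadecimal

print(as_text, as_ints, as_hex)
```

The bits on the disk do not change; what changes is the convention the application uses to assign them meaning.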
If the human observing the data is blind, he will not see any data. If he is colour blind, he may see some data but miss other parts of it. If he is deaf, he may miss some sound. If his ear or brain cannot respond to certain frequencies, then he will hear sounds different from those his neighbour hears from the same speaker. Even with text, if the person does not know the language, he will not understand the data.
Thus “Data” is not what the “Binary Stream” suggests. Data is what the human perceives. It is for this reason we say “Data is in the beholder’s eyes”.
Do the Data Scientists of the day factor in this possibility that Data may be different for different people?
Similarly, for law enforcement, when they look at “Data” as “Evidence”, the same issue confronts them. What data a person sees is dependent on the technology that converts the binary stream into a human experience. If the devices (hardware and software) used for the purpose do not do their job as expected, then even a person who is neither colour blind nor deaf will not see the same data that somebody else with another device may see.
For example, a Word document created in MS Word can also be read in LibreOffice or some other word processor. But whether the reproduction will be exactly the same as what one sees in MS Word is doubtful. Similarly, a web document may look different in different browsers and with different configurations. Hence the data as seen by a human, even one with no disabilities, is still dependent on several factors and is not a faithful rendition of the binary bits which the data scientists recognize as the data.
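A small Python example makes the same point at the level of character encodings: the identical binary stream, passed through two different “rendering” conventions, produces two different human experiences of the data.

```python
# One binary stream, two decoders, two different renditions.
raw = "café".encode("utf-8")   # the bytes b'caf\xc3\xa9' on the disk

print(raw.decode("utf-8"))     # café   -- one device's rendition
print(raw.decode("latin-1"))   # cafÃ©  -- another device's rendition of the same bytes
```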
The principle is similar to Einstein’s theory of relativity. Data is not absolute. It is relative to the devices used to convert the binary stream into text, sound or image and, further, to the ability of the observer to observe faithfully what is rendered.
An ideal theory of data therefore cannot stop at studying data only from the perspective of technology without fully absorbing how other factors affect the human experience of data.
Perhaps there is a need to think differently and develop a “New Theory of Data”.
Watch out for more…