Data Science Has to Evolve From Technical Perspective…

Data Science is an important area of study today, when “Data” is considered an important economic asset that can be harnessed like “Oil” or mined like “Gold”.

According to Wikipedia,

“Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.”

The view of most data scientists, however, is limited to disciplines such as “Statistics”, “Mathematics”, “Computer Science” and “Information Science”.

Data science is considered a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. The term “Data Science” is often used interchangeably with concepts like business analytics, business intelligence, predictive modeling, and statistics.

The businesses of “Big Data”, “Machine Learning” and “AI Algorithms” all depend on “Data Scientists”.

For those of us who have watched the growth of “Information Security” as a professional domain, the evolution from “Technical Aspects of Information Security” to “Techno Legal Aspects of Information Security” was clear. Today, the legal aspects of Information Security have taken a firm grip on the domain. With “Information Security” migrating to “Data Security” and the emergence of stringent laws such as GDPR, the future of “Information Security” has slipped out of the hands of the CISOs and into those of the DPOs (Data Protection Officers).

While the IS domain soon realized the importance of “People” along with “Processes”, the transition was fully extended into the third dimension of “Behavioural Science” by practitioners like Naavi.

At present, the Data Security domain has taken a further step towards “Data Governance”, bringing “Business Management” professionals closer to the group of Data Security professionals.

A similar graduated evolution is now also required in the field of Data Science. The various theories of Data that support the Data Science field today revolve around the technical aspects of Data. Statistical tools to segregate data and detect correlations in a heap of it, drawing predictive conclusions, and creating self-learning and self-improving algorithms are all based on the technical perspective of data.

The technical perspective treats data as a “Stream of Binary Notations” that can be read by a “Binary Reading Device”. The “Binary Stream” rests on some surface like the platter of a hard disk, a Compact Disk or a Memory Card. The reader reads the binary stream and passes it through an application that assigns meaning to the data stream. The application then sends it on to another data processing step or to a data delivery device like a “Monitor” or a “Speaker” for the human being to “Experience” the data.

To enable this conversion of data from a binary stream to a human experience, computer engineers have developed protocols that split a data stream into finite bit packs, separate the packs with delimiters, and add metadata to instruct the devices on processing. How efficiently data can be read, how multiple data sets can be aggregated, how profiling can be achieved and so on are the problems that Data Scientists try to examine. But in this entire process they deal with “data as a binary stream”.
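The idea of bit packs plus metadata can be illustrated with a minimal Python sketch. It is not any real protocol: the function names `frame` and `unframe`, the 4-byte length prefix (used here in place of a delimiter) and the JSON metadata header are all assumptions made purely for illustration.

```python
import json
import struct

def frame(payload: bytes, meta: dict, chunk_size: int = 4) -> bytes:
    """Wrap metadata and fixed-size chunks of payload in length-prefixed frames."""
    out = bytearray()
    header = json.dumps(meta).encode("utf-8")
    # First frame carries the metadata: 4-byte big-endian length, then JSON.
    out += struct.pack(">I", len(header)) + header
    # Remaining frames carry the payload, chunk_size bytes at a time.
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        out += struct.pack(">I", len(chunk)) + chunk
    return bytes(out)

def unframe(stream: bytes):
    """Read the frames back and return (metadata, reassembled payload)."""
    frames = []
    pos = 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        frames.append(stream[pos:pos + length])
        pos += length
    meta = json.loads(frames[0].decode("utf-8"))
    return meta, b"".join(frames[1:])

meta, payload = unframe(frame(b"hello world", {"type": "text", "encoding": "utf-8"}))
```

The reading device in this sketch recovers the bytes and the instruction "treat this as UTF-8 text", but nothing in the stream itself says what a human will make of it.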

From the “Technical Perspective”, encrypted data is also a binary stream, though it differs from the binary stream of the parent data.
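A toy Python sketch can make this concrete. The XOR scheme below is an assumption chosen for brevity and is not real encryption; the helper name `xor_bytes` and the key are hypothetical. It only shows that the transformed stream is a different run of bytes from the parent data, and that the parent can be recovered with the key.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy transformation: XOR each byte with a repeating key (NOT secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plain = "Data".encode("utf-8")      # the parent binary stream
key = b"\x5a\xc3"                   # hypothetical key, for illustration only
cipher = xor_bytes(plain, key)      # a different binary stream of the same length
restored = xor_bytes(cipher, key)   # applying the same key again recovers the parent

print(plain.hex(), cipher.hex(), restored.hex())
```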

The moment we start recognizing the binary stream as a word, sentence, picture, sound etc., we are adding human interpretations of the binary stream. Then Data is no longer in the technical domain only. It has crossed over to the human domain.
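As a small illustration of this crossover, the Python sketch below takes one fixed run of bytes and assigns it four different meanings. The choice of the word “café” and the variable names are assumptions for the example. The stream stays the same and only the interpretation changes.

```python
# One fixed binary stream: the UTF-8 bytes of the word "café".
raw = "caf\u00e9".encode("utf-8")

as_utf8 = raw.decode("utf-8")            # read as UTF-8 text: 'café'
as_latin1 = raw.decode("latin-1")        # read as Latin-1 text: 'cafÃ©'
as_number = int.from_bytes(raw, "big")   # read as one large unsigned integer
as_levels = list(raw)                    # read as five 0-255 intensity values

print(as_utf8, as_latin1, as_number, as_levels)
```

A reader whose software assumes Latin-1 sees a different word from a reader whose software assumes UTF-8, although both received identical bits.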

If the human observing the data is blind, he will not see any data. If he is colour blind, he may see some data but miss some other parts of the data. If he is deaf, he may miss some sound. If his ear/brain cannot respond to some frequencies then he will hear sounds which are different from the sound which his neighbor is hearing from the same speaker. Even in texts, if the person does not know the language of the text, he will not understand the data.

Thus “Data” is not what the “Binary Stream” suggests. Data is what the human perceives. It is for this reason we say “Data is in the beholder’s eyes”.

Do the Data Scientists of the day factor in this possibility that Data may be different for different people?

Similarly, when law enforcement looks at “Data” as “Evidence”, the same issue confronts them. The data that a person sees is dependent on the technology that converts the binary stream into a human experience. If the devices (hardware and software) used for the purpose do not do their job as expected, then even a person who is not colour blind or deaf will not see the same data that somebody else with another device may see.

For example, a Word document created in MS Word can also be read in Libre Office or some other word processor. But whether the reproduction will be exactly the same as what one sees in MS Word is doubtful. Similarly, a web document may look different in different browsers and with different configurations. Hence the data as seen by a human, even one with no disabilities, is still dependent on several factors and is not a faithful rendition of the binary bits which data scientists recognize as the data.

The principle is similar to Einstein’s theory of relativity. Data is not absolute. It is relative to the devices used to convert the binary stream into text, sound or image, and further to the ability of the observer to observe faithfully what is rendered.

An ideal theory of data therefore cannot stop at studying data only from the perspective of technology, without fully absorbing how other factors affect the human experience of data.

Perhaps there is a need to think differently and develop a “New Theory of Data”.

Watch out for more…

Naavi

Posted in Cyber Law | Leave a comment

What is the life cycle of Data?

“Life Cycle” starts from birth and ends with death. In the case of humans, the life form comes from the environment (in the form of the Pancha bhootas) and goes back to the environment. Human life has two components, namely the physical body and the “Life within it”. The human body is constantly aging and growing and is never static. According to wise men, even after a person dies, some activities of the human body such as hair growth and nail growth continue for some time. So, there is a specific distinction between “Body” and “Soul”.

In non-living bodies, the structure may remain static unless otherwise affected by an external agency. A stone may remain static unless rain water flowing over it or dust accumulating over it keeps it either eroding or growing over a long time frame. This however is different from human growth which occurs from within with the intervention of what we identify as “life”.

It is interesting to ponder when a “Stone” takes birth, when it dies and what happens to it during its life cycle.

A “Stone” is born as “Soil”. It is compressed under the earth until it becomes hard. During this process fine particles may come together as a larger group of particles bound closely together. This may then come out of the earth and get exposed as stones and rocks. Sometimes it may come out as lava and thereafter solidify into rock on cooling.

The death of a rock lies in its being powdered back into soil. Hence the life cycle of a rock is from the soil back to the soil. In between, it can assume different forms such as sedimentary, metamorphic or igneous rock. Some may be carved into “Statues”, “Idols”, “Slabs” etc.

The “Data” life cycle has some similarity. It is debatable whether data has “Life”. It is also interesting to ask: When does “Data” take birth, and when does “Data” die? What forms it can take in between is difficult to answer. During its life cycle, where does “Data” reside? Is it in a hard disk? Is it in a tape? Is it in a Memory Card? How does it flow from one data holding device to another? Will it copy, or will it move? These are questions to which we believe we have an answer on a contextual basis. We say it can be this and also that, most of the time.

If Data can be in different forms at different times, that is fine. But can somebody explain this? Physicists often say Light is a stream of Photons, but Light also sometimes behaves like a wave. Somebody has to explain how it can be both, and under what circumstances the Light Photons behave like a wave. That explanation is what we may call the “Light Wave Theory”. Another set of physicists went further and proposed a “Matter Wave Theory”, according to which all “Particles” have “Wave like behaviour”.

Now, if we have doubts about “What is Data?” and say that “Data is many things to many people based on the context”, we cannot simply accept this “Context Based Excuse” as a “Definition”. Based on such notions, law makers create laws, Judges interpret them according to their own whims and fancies, and the industry is left to deal with the issue on an ad hoc basis.

Sometimes the Supreme Court thinks “Data Disclosure” is a matter of freedom of expression and sometimes it thinks it is a “Privacy Right”. The same kind of uncertainty exists when Quantum Computing says a “Data State can be both zero and one at the same time”.

If the differences between “Data” in classical computing and “Data” in Quantum Computing have to be resolved, and the Classical/Quantum computing perspective of “Data” has to converge with the Judicial perspective, then we need a “New” “Theory of Data”.

Naavi is trying to explore this interesting theoretical concept of “Data” and trying to find a model of description which should fit into the different perceptions of “what data is”.

This is the “New Theory of Data” which will be revealed bit by bit here. Watch for more…

(P.S: This discussion is purely academic)

Naavi

 


The New Theory of Data

Can there be a single answer to “What is Data” which resolves the dilemma of the common man?

Watch these columns for more ….

Naavi

P.S: This concept was expanded through a series of articles culminating in a discussion in the book on the Personal Data Protection Act of India (PDPA 2020). Though this is an academic discussion, it is required to develop Data Protection Jurisprudence for the future.

The other articles can be found through a search on Theory of Data. Some of the key articles explaining the theory are also given below.

October 8, 2019: New Data Theory of Naavi built on three hypotheses

October 8, 2019: Theory of Data and Definition Hypothesis

October 10, 2019: Reversible Life Cycle hypothesis of the theory of Data

October 11, 2019: Additive value hypothesis of ownership of data

November 20, 2019: Will Personal Data Protection Act be compatible to the Theory of Data?

March 31, 2018: Theory of Dynamic Personal Data


Data Governance Framework as it exists in India now

With the formation of an expert committee titled “Data Governance Framework Committee”, Data Professionals in India are now wondering what is in store.

Some of the questions in the minds of Data Regulation Observers are:

Will this committee modify the Personal Data Protection Bill (PDPB)?

Will it give an excuse to the Government to push the PDPB to a standing committee so that the implementation can be indefinitely delayed?

Will the Data Localization requirement of PDPB be circumvented by re-defining “Non Personal Data” to include part of the “Personal Data”?

The answers to the above will depend on the integrity of the Committee, which consists mostly of representatives of business interests that had produced a dissenting note to the Srikrishna committee report.

As is the tradition of Naavi.org, we will closely watch the developments and report our views, whether or not they are palatable to others.

In the meantime, it is essential to reflect on what is the current “Data Governance Framework” in our legislation, if any.

If we look back at ITA 2000, in the 2000 version of the Act, the emphasis was mostly on E Commerce and it introduced the important element of the use of “Digital Signature based authentication” as part of the data governance.

Additionally, some sections such as Section 43 of the Act and the mention of liability under Section 79 in the absence of “Due Diligence” gave some direction to the Corporate world on how data has to be governed in their environment to avoid liabilities.

This was the concept of “Cyber Law Compliance”, first discussed by Naavi in December 2000 in a CII seminar in Chennai. The book “Cyber Law Compliance the Corporate Mantra for the Digital Era”, published at that time, was a first attempt to draw the attention of Corporates handling data to a recommended data governance framework under ITA 2000.

Industry, however, looked at ITA 2000 as a law which mattered only to the Police and Lawyers and paid scant attention to ITA 2000 compliance. The stakes became higher with the amendments of 2008, and the need for ITA 2008 compliance grew stronger. ITA 2008 also introduced the concept of Personal and Sensitive personal information along with “Intermediary guidelines under Section 79” and “Reasonable Security Guidelines under Section 43A”. (Amendment Act notified on 27th October 2009 and Rules notified on 11th April 2011.)

Further, Sections 67C, 69, 69A, 69B, 70B, 72A etc. all covered different aspects of Data Governance.

Most industry observers failed to recognize the data governance elements contained in ITA 2008 and its notifications but did make efforts to comply with Section 43A. The concept of ITA 2000/8 compliance was recognized to some extent after 2011, and some Techno Legal professionals emerged advising Companies on how to remain compliant with ITA 2000/8, mainly from the perspective of Section 43A.

Naavi was in the forefront of this Compliance brigade and highlighted the compliance requirements under ITA 2000/8 through the following Risk identification model.

A Comprehensive Information Security Framework IISF 309 was also recommended indicating the following responsibilities.

As a quick glance at this framework indicates, out of the 30 different requirements listed here, 23 referred to Non IT Governance. In a way this was a “Data Governance Framework” recommended under ITA 2000/8.

The focus however was on “Meeting Due Diligence” to avoid vicarious liabilities under Section 79 and Section 85 of ITA 2000/8. To that extent, it was not projected as a “Data Governance Framework”.

However, after the PDPA came into broader discussion, Naavi introduced the “PDPSI” (Personal Data Protection Standard of India), in which more aspects of Data Governance were added. In particular, the Data Classification system indicating 16 different types of data and the suggested system of Personal Data Keepers, Internal data controllers etc. indicated the Governance requirements, though this was in the context of “Personal Data”. The discussion on DPSI (Data Protection Standard of India) was deferred since it was not a priority at that time.

These discussions, extended by ideas like the DTS, laid the groundwork for a Data Governance Model. Though these efforts were focussed more on “Data Protection”, they also created the early framework in India for Data Governance.

I therefore consider that a “Data Governance Framework”  does exist in India as a reference and the Data Governance Framework committee can take some ideas from these suggestions scattered through this website. Probably when I am able to collate these ideas in the New theory of Data being developed, there will be a better reference book on how to develop the Data Governance Framework.

Let us see if a working draft of the Theory would be available in time to be presented to the Committee before it arrives at its final recommendations.

Naavi

 

 

 


The Journey to the development of a New “Theory of Data” begins

Yesterday, I made an announcement that I will be working on a “Theory of Data”. I consider www.naavi.org a global publication platform. All my work, most of which is the product of my own research, has been published here. Hence the objective of codifying a “Theory of Data” is also being elaborated here.

Naavi


Why this exercise?

“Data” has become a topic of wide interest in our world. From Mark Zuckerberg to Mr Narendra Modi, from Justice Srikrishna to Mr Mukesh Ambani, everybody is speaking about Data.

But different people speak of Data from different perspectives. Mr Modi says “Data is cheap” in India and that international business should therefore find it attractive to do business in India using Data as a raw material. Elsewhere, Data Protection professionals say “Data” is “Gold” and very valuable. Industry 4.0 says “Data is the New Oil” and can be harnessed for prosperity.

On a previous occasion, Naavi used the “Samudra Manthana” story in the Indian Puranas to describe how Data can be churned with the right tools to extract useful outputs, as long as there is a “Visha Kanta” to gulp the poison that may come out and a “Mohini” to ensure that the “Amrutha” or “Nectar” does not fall into the wrong hands.

The Data Scientists and the Big Data Industry are concerned that, with so much interest being shown in what is their raw material, the day is not far off when their business will become a playground for people of every kind. This may actually land the industry in trouble sooner or later, since different people come with different perspectives, and unless a common understanding emerges, industry cannot have a turbulence-free ecosystem to operate in.

Data Protection Vs Governance

For some, “Data” is the key to Privacy Protection. For others, “Data” is the key to enrichment. For some others, “Data” is like the air we breathe, access to which should be a fundamental right. For some, “Data” is an asset which can be converted into cash either lawfully or unlawfully through “Data harnessing”, “Data Exploitation” or “Data Laundering”.

Amid the different perspectives that exist about “Data”, law makers are trying to make laws to regulate the Life Cycle of Data: the collection, use, storage, disclosure and destruction of data. These regulations are to be implemented by organizations and are expected to be followed by the entire global population at all times. Failure to comply with the laws results in heavy penalties for business managers, often leaving them facing the prospect of imprisonment.

Those who want to govern business in a lawful manner are exploring the ways of “Data Governance” for better productivity while the regulators are watching every one of their steps to ensure that they remain within the boundaries of law.

Data Protection professionals are developing a framework for protection while Data Governance Professionals are going beyond Data Protection Framework to develop a Data Governance Framework.

Already, we see conflicts emerging from the multitude of Data Protection Laws that make the life of a Data Manager miserable. Data moves across geographical boundaries freely, but when this data consists of Personal Data of Indians, EU Citizens, Californians, Canadians, Australians etc., the Data Governance official is immediately alerted to the fact that each of these data types is subject to different regulations, and the Governance model needs to implement them in such a manner that none of these laws is contravened.

These are conflicts arising out of overlapping regulations and we need a solution to overcome the challenges. Naavi has suggested a technical solution within the framework of a Personal Data Protection Standard of India (PDPSI) to address this issue.

But as we go along, there will be conflicts arising from Data Protection Professionals taking a stand different from the Data Governance Professionals when implementing certain operational decisions of business. This conflict is dangerous since it is internal to an organization and will expose all the behavioural challenges that confront Man Managers.

If this internal conflict is not handled with finesse, an organization can simply collapse, not because its business environment is negative, but because the internal management teams do not see eye to eye, each thinking that it is correct and the other is wrong.

It is this concern about the problems facing the future generation of corporate managers that has prompted me to start work on developing a “Theory of Data” that tries to develop an understanding of Data that all types of professionals, whether they are Lawyers, Computer Engineers, Data Analysts or Corporate Managers, can appreciate with mutual respect and empathy.

This Theory of Data should be consistent with the present and future regulations related to data. Since some regulations are already in place, it is possible that they may not be in sync with this theory. It is expected, however, that “Jurisprudence” will enable syncing of the present regulations with this theory, while new regulations may adopt the concepts propounded in this theory during the construction of the regulation itself.

This “Naavi’s Theory of Data” does not explore the “Technical aspects” but addresses the “Legal”, “Behavioural Science” and “Management” aspects of how “Data” comes into existence and lives through its life until its death. If it is necessary to distinguish this theory from the existing theories, it may be identified as “Naavi’s theory of Data” or “LBM Theory of Data”. For the rest of our discussion here, we shall however refer to this as simply the “Theory of Data”.

Theory and Hypothesis

The hallmark of a “Theory” is supposed to be the establishment of a principle through experimentation and testing. It may start with a hypothesis, which is a statement of what a situation is as per an educated guess. Then, through experimentation, the hypothesis is proved, disproved or further refined until it becomes the theory.

Naavi’s Theory of Data will also adopt a similar approach of first pronouncing some hypotheses and then subjecting them to observations that either support or refine them.

If readers have a hypothesis of their own, they are free to submit it to me, and I will try to incorporate a discussion of it as part of the development of this theory.

This is therefore a journey which we are starting now.

Naavi


Theory of Data

Naavi has been working around the concept of “Data” from Techno-legal, Behavioural and Managerial perspectives for some time. The objective is to bring better clarity on the nature of “Data”, and in particular “Personal Data”, so that regulations are clear from the point of view of compliance.

Unclear regulations are the bane of the industry because practitioners do not know how to comply or what to comply with. This leads to discretionary decisions which may go wrong. Even the judiciary and the regulator often unwittingly come to wrong decisions and have to face criticism from the industry.

In trying to present the Data and its implications in terms of the current laws and the continuously moving technology base, Naavi has been sporadically writing about “Cyber Laws for Quantum Computing Era”, “Impact of AI on Cyber Laws”, “Evidence under Section 65B of IEA”, “Dynamic Theory of Data”, “Nuclear Theory of Data” and so on.

Some of these thoughts have emerged over time and have been presented in the context of particular discussions.

The time has now come to bring more cogency to these thoughts and to consolidate the different explanations provided earlier into a single “Theory of Data”.

Since there is already a body of academicians who are working on the “Theory of Data” from the perspective of technology and my approach is basically from the “Legal, Behavioural Science and Managerial perspective” of Data, it is necessary to prefix this theory as “Naavi’s Theory of Data” or “LBM theory of Data”.

This is primarily an academic exercise.  However, some thoughts that may be presented in this theory may have the potential to be implemented in the new Data Governance Framework or Data Protection Frameworks that the industry may try to follow in the context of new regulations and new technology.

For some time this work may proceed in the background and would be released in public from time to time.

Any thoughts and words of wisdom from readers are welcome.

Naavi
