
Is “Data Science” really as important as it is made out to be?

A Brief History of Data Science



Don Kaluarachchi

3 years ago | 7 min read

While data science has existed as a discipline since the early 2000s, it has only really gained popularity over the last decade. Over the last year, during the pandemic, the field has gained major recognition.

The topic of data science has come up more and more over the last couple of months, leaving some wondering how and why it is important. To understand this, we need to look at what data science is and at a brief history of the field.

What is Data Science?

A commonly cited definition of data science is “a field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data”¹. This definition seems broad because data science is a field that encompasses a lot.

Data science is usually associated with the concept of ‘big data’, and it might be assumed that this is all data science includes. This is not the case. While ‘big data’ is one aspect of data science, the field comprises many other parts as well.

Data science is an interdisciplinary field² that incorporates elements of computer science, mathematics, statistics and a number of other fields. Because of this, it includes:

- Machine Learning: algorithms that use statistics to find patterns in large amounts of data.
- Data Analysis: examining data, cleaning it so that it is usable, and transforming it so that it can be modelled in a way that helps answer business problems.
- Data Engineering: acquiring, preparing and processing data.

The figure below shows the different responsibilities of a data scientist, that is, a person working in the field of data science.

Figure from Simplilearn

Data Acquisition: This is the process of collecting the data that is going to be processed. Data is available everywhere in today's world. The challenge at this stage is therefore not ‘collecting’ data as such. It is understanding the business requirements and priorities so that the right types and amounts of data are collected.

Data Preparation: This is where the data is preprocessed. At this stage, relevant data is extracted and transformed into useful forms, ready for the analysis and modelling that take place in the following stages.
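As a rough illustration of what preparation can involve, here is a minimal Python sketch using only the standard library. The records, the field names (`city`, `age`, `spend`) and the cleaning rules are all hypothetical, chosen purely for illustration. The sketch does four things:

- drops rows with missing values;
- tidies up text;
- converts strings to numbers;
- rescales one column to the range 0 to 1.

```python
# Hypothetical raw records, as they might arrive from a CSV export:
# every value is a string, and some values are missing.
raw_records = [
    {"city": "London", "age": "34", "spend": "120.50"},
    {"city": "Leeds", "age": "", "spend": "80.00"},
    {"city": "london ", "age": "29", "spend": "95.25"},
    {"city": "Bristol", "age": "41", "spend": None},
    {"city": "Leeds", "age": "52", "spend": "60.00"},
]


def prepare(records):
    """Drop incomplete rows, tidy text fields and convert numeric fields."""
    cleaned = []
    for row in records:
        # Skip any row that has a missing or empty value.
        if any(value in (None, "") for value in row.values()):
            continue
        cleaned.append({
            "city": row["city"].strip().title(),  # "london " -> "London"
            "age": int(row["age"]),
            "spend": float(row["spend"]),
        })
    return cleaned


def min_max_scale(values):
    """Rescale a list of numbers into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


prepared = prepare(raw_records)
scaled_spend = min_max_scale([row["spend"] for row in prepared])
print(prepared)
print(scaled_spend)
```

Real projects usually lean on libraries such as pandas for this kind of work, but the steps are the same in spirit.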

Data Analysis: This is the stage where the data is analysed before it is modelled. Exploratory Data Analysis (EDA)³ is carried out at this stage using a number of different tools, which help refine the available data. This refinement is what prepares the data for the modelling stage.
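To give a flavour of EDA, the sketch below (Python, standard library only) computes a few summary statistics for a hypothetical list of daily order counts. It then flags unusual values using the interquartile-range (IQR) rule. The data and the 1.5 × IQR threshold are illustrative assumptions, not a prescribed method.

```python
import statistics

# Hypothetical daily order counts collected during data acquisition.
daily_orders = [52, 48, 50, 55, 47, 51, 49, 53, 210, 50]

summary = {
    "count": len(daily_orders),
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "stdev": statistics.stdev(daily_orders),
}

# Flag possible outliers with the IQR rule: anything more than
# 1.5 * IQR below the first quartile or above the third quartile.
q1, _, q3 = statistics.quantiles(daily_orders, n=4)
iqr = q3 - q1
outliers = [x for x in daily_orders if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(summary)
print("possible outliers:", outliers)
```

Here the single value of 210 pulls the mean (66.5) well above the median (50.5). Spotting this kind of anomaly before modelling is exactly what this stage is for.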

Data Modelling: This is the stage where different machine learning techniques are applied to the data to help uncover patterns and relationships in it.
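As the simplest possible stand-in for a machine learning model, the Python sketch below fits a straight line to data by ordinary least squares. The advertising-spend and sales figures are invented for illustration. Real projects would typically use richer models and dedicated libraries.

```python
# Hypothetical training data: advertising spend (in thousands of pounds)
# and the resulting sales (in units).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 19.0, 29.0, 37.0, 45.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); both share the same 1/n factor.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(spend, sales)


def predict(x):
    """Use the fitted line to predict sales for a new spend value."""
    return slope * x + intercept


print(f"sales ≈ {slope:.2f} * spend + {intercept:.2f}")
print("predicted sales at spend = 6:", predict(6.0))
```

For this data the fitted line is roughly sales ≈ 8.4 × spend + 3.2. The model can then use this line to predict sales at spend levels it has not seen.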

Visualisation/Data Visualisation: This is where the findings from the previous stages of data analytics are presented in a format that is understandable to everyone. These findings contain a lot of valuable information and therefore are used to make important decisions.

Deploy and Maintain: The model is then deployed and continually improved (maintained) to ensure that it adapts to any changes in the environment.

This process is a cycle rather than a linear one. After the data has been visualised and the model deployed, the process is repeated again and again so that the model keeps improving.

The figure below shows a summary of some of the skills required for a role in data science. This helps get a better understanding of the type of work involved in the field.

Figure from datanami

A Brief History of Data Science

Even though data science was only officially introduced as a discipline in the early 2000s (2001, to be precise), it had informally been around for well over three decades before that.

John W. Tukey’s 1962 paper “The Future of Data Analysis”⁴ is thought to be one of the first publications to describe data science. Over the next couple of decades, data science was discussed without being officially called “data science”. That work had more to do with statistical analysis than with the data science we know today.

In 2001, as its importance came to be recognised, data science was first introduced as a discipline. In the same year, William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”⁵. This is often described as the first publication about the field of data science, and it became the starting point for the field's development.

Almost a decade later, in 2010, the field gained popularity due to the increasing need to analyse large amounts of data.

The increasing availability of massive amounts of data was fuelled by the growth of large tech companies, such as Google, that were collecting data.

This was the same year that Kenneth Cukier wrote a special report in ‘The Economist’⁶ about combining the skills of software developers, statisticians and artists to extract useful insights from data.

Later that year, in the publication “What is Data Science?”⁷, Mike Loukides wrote about creating models to analyse data and make predictions. He also described how these models could be incrementally improved to make better predictions.

From then on, research in the field grew rapidly, and so did the use of data science. Today, data science is associated in some way with most modern industries.

How Data Science is used in various industries

Data Science, in one way or another, is linked to almost every modern industry available today. Some of these areas include: Automotive, Aviation, Business (including supply chain management), Finance, Healthcare, and many other industries.

A few examples of the importance of data science in various industries are discussed below, but this is not a comprehensive list. In fact, the examples below cover only a very small percentage of all the uses of data science. They are discussed only to better illustrate its importance.

Self-Driving Cars

Like ‘data science’, self-driving cars seem to come up more and more. How self-driving cars work in full is beyond the scope of this article. It is important to note, however, that much of it involves data science: analysing huge amounts of data in order to make accurate decisions and predictions (see “The Data Science Behind Self-Driving Cars” for more details).

Airline Routing

There was a time when the airline industry started to lose money due to the rising cost of jet fuel and an increase in other expenses. It was at this point that data science was introduced for flight planning and routing.

Huge amounts of data collected over the previous two decades made it easier to make more accurate predictions about customer behaviour. This, in turn, made it easier to plan routes that would make the company more money.

Targeted Advertisements

Previously, adverts were aimed at a very general audience, a large portion of which might not have been interested in the product being advertised.

This changed with the use of data science. It became possible to identify exactly which customers were interested in a product, and therefore to make informed decisions about whom to target with the adverts.

Delivery Package Routing

Through the use of data science, it became possible to calculate the best route in terms of distance (the shortest distance), time (the shortest time) and various other factors. As a result, packages are sent out and delivered more efficiently than before.
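Shortest-path search is one classic building block behind this kind of routing. The Python sketch below uses Dijkstra's algorithm on a small, hypothetical road network. The edge weights are travel times in minutes, though they could equally be distances. Real delivery routing across many stops and vehicles is a harder optimisation problem (the vehicle routing problem). This is therefore only an illustration of the underlying idea, not how any particular company does it.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path) from start to goal.

    `graph` maps each stop to a dict of {neighbour: cost}. Costs must be
    non-negative, e.g. kilometres or minutes.
    """
    # Priority queue of (cost so far, current stop, path taken to get there).
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # A cheaper path to this stop was already found.
        settled.add(node)
        for neighbour, edge_cost in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + edge_cost, neighbour, path + [neighbour]))
    return float("inf"), []  # The goal cannot be reached.


# Hypothetical road network between a depot and delivery stops (minutes).
travel_minutes = {
    "Depot": {"A": 10, "B": 4},
    "B": {"A": 3, "C": 12},
    "A": {"C": 5},
    "C": {},
}

print(shortest_route(travel_minutes, "Depot", "C"))
```

For this network the function returns `(12, ['Depot', 'B', 'A', 'C'])`. The indirect route via B and A is quicker than driving to A directly.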

As previously stated, the use of data science is in no way limited to the situations mentioned above. These examples were selected from a range of different industries to show two things. Data science is used in many industries, and it has contributed to significant advancements in many of them.

How has Data Science helped during the global pandemic?

One reason the topic of data science has started to appear more and more is that it has been essential in getting through the pandemic. Some of the ways in which data science is helping during the pandemic include:

  • Understanding transmission risks⁸ of COVID-19 to make better decisions about the social measures to put in place.
  • Forecasting the spread of the virus in different cities and countries in order to take the necessary action.
  • Trying to tackle the problem of community spread by understanding it.
  • Sending out alerts using the track and trace system.
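The article does not say which forecasting methods were used. One classic approach to modelling the spread of an infectious disease is the SIR model, which splits a population into susceptible, infected and recovered groups. The Python sketch below steps a simple discrete-time version forward day by day. The population size and the `beta` and `gamma` values are made-up illustrative parameters, not estimates for COVID-19. Lowering `beta` is a crude way to represent social measures that reduce contact.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a simple discrete-time SIR model forward one day at a time.

    beta:  average number of transmitting contacts per infected person per day
    gamma: fraction of currently infected people who recover each day
    Returns a list of (susceptible, infected, recovered) tuples, one per day,
    starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections depend on contact between susceptible and infected people.
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative, made-up parameters; NOT fitted to real COVID-19 data.
baseline = simulate_sir(population=1_000_000, initial_infected=10,
                        beta=0.3, gamma=0.1, days=365)
# A lower beta crudely represents social measures that reduce contact.
distanced = simulate_sir(population=1_000_000, initial_infected=10,
                         beta=0.15, gamma=0.1, days=365)

peak_day = max(range(len(baseline)), key=lambda day: baseline[day][1])
print(f"baseline: infections peak on day {peak_day} "
      f"at about {baseline[peak_day][1]:,.0f} people")
print(f"with fewer contacts: peak of about {max(d[1] for d in distanced):,.0f} people")
```

Comparing the two runs shows the intuition behind social measures. With fewer transmitting contacts, the peak number of simultaneous infections is much lower.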

As with the previous examples, the use of data science is not limited to the uses mentioned above. However, these are areas in which data science has played a significant part in the progress made.

In conclusion, the answer to the question “is data science really as important as it is made out to be?” is: it depends.

It depends on who is asking the question, on what is considered important, and on the circumstances in which the question is being asked. What is certain is that the discipline has grown significantly over the last decade, and its popularity is only expected to grow in the years to come.
