Will Data Science Become Automated?
Pros and cons of completely automatic data science platforms.
Table of Contents
- Automation of Data Science
- Pros and Cons
With data science becoming more and more popular, companies are figuring out how many data scientists they will need on one team to build a successful product or solve a business problem.
While companies focus on hiring data scientists, they have most likely noticed that, instead of hiring people to perform data science, they could pay for a platform or find other ways to apply data science at their company.
Ultimately, data science can be automated, just like most technical processes, which is a bit of inception.
The question, however, becomes: should it be automated, and how well does data science perform when it is automated by a tool or platform? I will discuss these questions below by highlighting the pros and cons of automated data science and/or machine learning.
Automation of Data Science
Like most things in life, moderation is key, so eliminating your human data scientists and replacing them with a tool will probably lead to some chaos and confusion, at least at first. Just as an online education platform can teach many people to succeed in an academic area, so can an automated data science platform.
Data science can be learned by a human from a machine. But when you automate data science this early in the history of the field (yes, I know it is not as new a field as many people think), you can run into some serious problems. On the other hand, you can also see some great benefits.
Pros and Cons
There are pros and cons to everything, and automated data science is no exception. I am not going to detail the specific tools or companies whose main product is data science automation, but you can expect some of these pros and cons to apply to those tools.
- Easy to Use
The main function of automated data science platforms is to make it easier for users to implement data science in their business. Therefore, someone with a background in data analytics or product management could expect to use a platform easily to, say, categorize images.
Whereas hiring data scientists can cost a company well over $100,000 in salary and onboarding costs, an automated platform could cost significantly less than even one data scientist. It is important to note that some companies have many more than one data scientist.
Data science is widely known as a powerful tool in itself that can significantly impact a company or business. Data science and machine learning have led to countless products and served nearly every human in some way.
Use your phone today? Was it an iPhone? Did you use Face ID? Then you probably already used machine learning without even realizing it (unless you are a data scientist now and know it already). Maybe you used Netflix’s recommendation algorithm that suggested a show or movie.
These are some examples of the everyday machine learning you will encounter. There are countless more, and a company can truly benefit from the power of data science in its business, whether internally or externally.
I am going to highlight the cons next, as I believe they are more important and outweigh the pros (as of now — this could change quickly).
- Hard to Explain
The cons are where it gets tricky.
These points can really hurt a company when a user does not use the platform correctly and/or interprets the results and model incorrectly. It can be hard to explain the results of a complicated data science model. Now imagine you are not a data scientist and have no academic background in the various types of machine learning algorithms.
You will have to explain the platform's model results and implement its suggestions or predictions with regard to your company's integrations (sometimes), which could prove to be time-consuming and difficult.
- Misleading Results
Since you did not build the model yourself, you may be unaware of possible parameters that need to be tuned. Additionally, you might not know that you need to use an elbow plot to find the optimal number of clusters for an unsupervised segmentation algorithm.
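To make the elbow idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy points, the helper names (kmeans_inertia, pick_elbow), the deterministic farthest-point starting centers, and the 50% min_gain cutoff are mine, not what any particular platform does. It runs a basic k-means for several values of k, records the within-cluster sum of squared distances (the value on the y-axis of an elbow plot), and flags the first k after which adding another cluster stops helping much.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def init_centers(points, k):
    """Assumption: deterministic farthest-point start, so runs are repeatable."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that is farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_inertia(points, k, iterations=10):
    """Run a basic k-means and return the within-cluster sum of squares."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def pick_elbow(inertias, min_gain=0.5):
    """Hypothetical heuristic: first k where one more cluster cuts inertia by < min_gain."""
    ks = sorted(inertias)
    for k, next_k in zip(ks, ks[1:]):
        if inertias[k] == 0 or (inertias[k] - inertias[next_k]) / inertias[k] < min_gain:
            return k
    return ks[-1]


# Toy data (made up): three tight groups of four points each.
blob = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (30, 0)] for dx, dy in blob]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
best_k = pick_elbow(inertias)

for k in sorted(inertias):
    print(f"k={k}: inertia={inertias[k]:.2f}")
print("suggested number of clusters:", best_k)
```

On this toy data the curve drops sharply until k = 3 and then flattens, so the sketch flags three clusters. In practice you would plot the curve and judge the elbow by eye, since a fixed cutoff like this one can easily be fooled by real, noisier data.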
Not understanding the model from the ground up could lead to results that do not make much sense. Perhaps you used logistic regression to predict the temperature for the next few months, only to realize later that the algorithm is meant to be used as a classification model, despite its misleading name.
There are small nuances that can add up and could lead to some serious mistakes.
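The logistic regression mix-up mentioned above is easier to see with a small sketch. Below is a hand-rolled, single-feature logistic regression in plain Python. The data, the function names, and the learning-rate settings are all made up for illustration. Whatever you feed it, the output is a probability between 0 and 1 that gets thresholded into a class, so it can answer "will there be a heat warning?" but not "how many degrees will it be?".

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by plain gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical toy data: degrees above or below a baseline temperature,
# and whether a heat warning was issued that day (1) or not (0).
temp_offsets = [-3, -2, -1, 1, 2, 3]
heat_warning = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(temp_offsets, heat_warning)
probabilities = [sigmoid(w * x + b) for x in temp_offsets]
predicted_classes = [1 if p >= 0.5 else 0 for p in probabilities]

for x, p, c in zip(temp_offsets, probabilities, predicted_classes):
    print(f"offset {x:+d}: P(warning) = {p:.3f} -> class {c}")
```

If the goal really is a number such as next month's temperature, a regression model is the right family to reach for. An automated platform may not stop you from picking the wrong one.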
Ultimately, whether data science will be completely automated depends on the use case. Sure, use an automated data science platform if you already have a data analyst on your team.
Or, use the automated solution for predictions that are not harmful if incorrect. Categorizing clothes incorrectly is not the worst thing that can happen, but when you are in the health or finance industry and you classify a disease or large sums of money incorrectly, the harm is undeniable.
Figure out what kind of company you are and what your goals are, weigh the pros and cons, and from there you can decide if automated data science is right for you.
That being said, data science is already partly automated, and in the future it will face platforms that try to automate the entire process.
I hope this article sparks some interesting discussion. Of course, I am biased and prefer to keep data scientists around; however, I know how much of data science is already automated simply by importing popular, prebuilt libraries.
The solution may be that you could use the human-in-the-loop method: automate what you can, and then provide checks and balances to account for model error.
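As a rough sketch of what human-in-the-loop could look like, the snippet below accepts confident predictions automatically and sends the rest to a person for review. The Prediction class, the route_predictions function, the sample items, and the 0.9 threshold are all hypothetical. A real platform would expose its own way of reporting confidence.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's confidence in its label, from 0.0 to 1.0


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted ones and ones a person should check."""
    auto_accepted, needs_review = [], []
    for pred in predictions:
        if pred.confidence >= threshold:
            auto_accepted.append(pred)
        else:
            needs_review.append(pred)
    return auto_accepted, needs_review


# Made-up batch of clothing classifications.
batch = [
    Prediction("item-001", "t-shirt", 0.97),
    Prediction("item-002", "jacket", 0.62),
    Prediction("item-003", "sneakers", 0.91),
]

accepted, review_queue = route_predictions(batch, threshold=0.9)
print("auto-accepted:", [p.item_id for p in accepted])
print("sent to a human:", [p.item_id for p in review_queue])
```

For low-stakes tasks like categorizing clothes, a looser threshold may be fine. In health or finance, you would likely raise it, or route every prediction to a reviewer.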
Feel free to comment down below. Thank you for reading!
Originally published on Medium.