
5 Actionable Advice for Data Science Beginners

Here is what new data scientists should focus on.



Benedict Neo

3 years ago | 4 min read

1. Pick the brain of an expert

“Learn from the experts; you will not live long enough to figure it all out by yourself.” - Brian Tracy

There are myriad ways to learn data science. You can read articles, watch videos, enroll in online courses, turn up at meetups, and so on. But one thing you cannot “learn” is experience; that you have to gain through years of working in the field. There is much to learn from data science experts: their experience in managing end-to-end machine learning and deep learning projects, their philosophy when building a data science team from scratch, and their perseverance and grit in managing tough projects and overcoming hurdles. None of this can be learned in a course. It can only be acquired.

Here are a few data science experts to learn from:

Dean Abbott, Kenneth Cukier, John Elder, Bernard Marr

2. Ask the right questions

“If I had an hour to solve a problem I’d spend 55 minutes determining the proper questions to ask, for once I know the proper question, I could solve the problem in less than 5 minutes.” - Albert Einstein

Data scientists have to ask a plethora of questions in order to effectively produce something the business wants. And not just any questions; they have to be the right questions. The main goal of asking questions is to define the problem statement. In other words, asking questions is the first step data scientists take when solving a problem. Once you begin asking questions, it soon becomes second nature, and you will find yourself asking better questions as you gain experience.

3. Master the art of storytelling with data

“Storytelling, a primitive art, is as old as the beginning of mankind. People want to receive what’s out there in the form of stories, not just facts, opinion, analysis.” - Lee Gutkind

A great story has clear detail and visualization. Data is just a messy, unstructured pile until a data scientist extracts insight from it. That insight has to explain what happened, why it is important, and how this knowledge can be converted into something practical. Data visualization means using data, statistics, and programming skills to reveal patterns, test theories, and draw conclusions that can help an organization make good decisions. Data-driven stories benefit both stakeholders and customers. To begin, start with the question: “What data is most important?” Data contains a lot of clutter, and highlighting the important parts is key. Next, read the data and work out how to use it with your audience in mind. (More on Forbes.com)

4. Learn statistics the right way

Correlation does not imply causation.

Statistics is the art of connecting numbers to questions so that the “answers” emerge. Establishing quantitative connections to largely qualitative questions is the heart of statistics. It is said that “a data scientist is one who knows more statistics than a programmer and more programming than a statistician”. Statistics is not an easy topic and can be hard for beginners to digest. A good way to start is the book Think Stats, because it is important to understand first that statistics is fundamentally the “art” of unraveling the secrets hidden inside a dataset. Once you have a feel for what statistics is, move on to the programming side of statistics using Python.
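
The warning that correlation does not imply causation can be seen in a small simulation. The sketch below is an illustration only, not something taken from Think Stats: the variable names (temperature, ice_cream, sunburn) and the noise levels are made up, and it uses nothing beyond Python's standard library. A hidden factor drives two quantities that never influence each other, yet they end up strongly correlated. The association largely disappears once the hidden factor is accounted for with a partial correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical data: temperature is the hidden common cause.
temperature = [rng.gauss(0, 1) for _ in range(2000)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
sunburn = [t + rng.gauss(0, 0.5) for t in temperature]

raw = pearson(ice_cream, sunburn)
controlled = partial_correlation(ice_cream, sunburn, temperature)

print(f"ice cream vs sunburn: {raw:.2f}")
print(f"same, controlling for temperature: {controlled:.2f}")
```

With these settings the raw correlation should come out strongly positive, while the controlled one should sit close to zero. Ice cream does not cause sunburn; both simply follow the weather.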

5. Learn Python

Now, it’s my belief that Python is a lot easier than to teach to students programming and teach them C or C++ or Java at the same time because all the details of the languages are so much harder. Other scripting languages really don’t work very well there either. — Guido van Rossum

Python is arguably the most prevalent programming language in the booming world of data science. Why? Because Python is an easy-to-learn programming language with an active community. It also has tons of libraries and resources that make it the quintessential language for beginners to dive into the field and for experts to do their job efficiently. A staggering 48 percent of data scientists with five or fewer years of experience rated Python their preferred programming language. To start using Python for data science the right way, learn the basics first, then move on to data handling with pandas, data visualization with Matplotlib, statistics, and machine learning with scikit-learn, some of the most popular data science libraries in Python.
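
As a first taste of “learning the basics”, the sketch below summarizes a tiny made-up dataset using only Python's standard library. The CSV text, the column names (species, length_cm), and the helper summarize_by_group are all invented for illustration. Libraries such as pandas offer the same kind of grouped summary with far less code, which is a good reason to pick them up once the fundamentals feel comfortable.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical measurements; in practice this would come from a file.
RAW_CSV = """species,length_cm
cat,46.0
cat,51.0
cat,48.5
dog,70.0
dog,82.0
"""


def summarize_by_group(csv_text, group_col, value_col):
    """Return {group: (count, mean, median)} for one numeric column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {
        name: (len(values), statistics.mean(values), statistics.median(values))
        for name, values in groups.items()
    }


summary = summarize_by_group(RAW_CSV, "species", "length_cm")
for name, (count, mean, median) in summary.items():
    print(f"{name}: n={count}, mean={mean:.1f}, median={median:.1f}")
```

Reading rows, grouping them in a dictionary, and calling a few functions from the statistics module covers a surprising share of day-to-day exploratory work.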

One way to start learning today is the Python for Everybody specialization on Coursera. It is great for beginners, and the instructor, Charles Russell Severance, is an amazing teacher. Once you are comfortable with the basics of Python, move on to the Applied Data Science with Python Specialization, which shows how to apply statistical, machine learning, information visualization, text analysis, and social network analysis techniques through popular Python toolkits such as pandas, Matplotlib, scikit-learn, NLTK, and NetworkX to gain insight into data.

Created by

Benedict Neo

Data Science blogger

