How I Set Up a Curriculum to Learn Data Science from Scratch
There's no clear path to becoming a data scientist on your own, so I decided to create one
I originally wrote this article for another publication, when I decided to start this journey of learning how to code in Python and eventually learn data science.
I am starting this journey because of the big changes we’re seeing in the world: we seem to be heading toward a future where being a practical engineer alone is not going to cut it anymore.
This week I started gathering resources and building the curriculum I want to follow, but I was overwhelmed by the sheer amount of information scattered online about everything IT-related, especially coding.
For full disclosure, I have a soft spot for data. I am a hydrographic engineer, after all, and data acquisition, data processing, and remote sensing are a huge part of the job. All of the software I use day to day consists of commercial solutions that cost a huge amount of money and don’t give me the freedom to manipulate the data the way I would like.
In the same spirit of disclosure, combining the basics of data science with field engineering in a data-related discipline such as hydrographic surveying and GIS looks like a far more diversified profile, and that is how I decided to build my curriculum. I opted for a data science bootcamp of my own making, pieced together from the countless YouTube videos and bootcamp landing pages I scrolled through over the past week. This is the roadmap to becoming a data scientist that I ended up with:
1- Introduction to data science and linear regression
- Learn how to specify a Data Science problem. Understand how and why you need to clean your data.
- Learn how to incorporate Python modules such as Pandas and Matplotlib into your Jupyter Notebook.
- Use Matplotlib to visualize and better understand your data.
- Learn about the theory behind Linear Regression and how it works. Estimate and interpret regression coefficients using scikit-learn.
- Make a prediction using your model.
- Analyze and evaluate your regression results using goodness-of-fit metrics such as R-squared.
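To make this first module concrete, here is a minimal sketch of the fit, predict, and evaluate steps, assuming pandas and scikit-learn are installed. The data is made up purely for illustration, the variable names are my own, and the Matplotlib plotting step is left out to keep it short.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up data for illustration only (not from a real survey).
data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [3.1, 4.9, 7.2, 8.8, 11.0],
})

# scikit-learn expects a 2D table of features, hence the double brackets.
features = data[["x"]]
target = data["y"]

# Estimate the regression coefficients.
model = LinearRegression()
model.fit(features, target)
slope = model.coef_[0]
intercept = model.intercept_

# Make a prediction for a new value of x.
new_point = pd.DataFrame({"x": [6]})
prediction = model.predict(new_point)[0]

# R-squared: the share of the variance in y explained by the model.
r_squared = r2_score(target, model.predict(features))

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for x=6: {prediction:.2f}")
print(f"R-squared: {r_squared:.3f}")
```

The slope says how much y changes for each one-unit increase in x, and an R-squared close to 1 means the straight line explains most of the variation in this toy data.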
2- Introduction to Python programming
- Understand how to use variables and types to manage data. Work with Python collections: lists, NumPy arrays, and pandas DataFrames and Series.
- Understand how to use Python modules and import Python packages.
- Learn how to use Python functions to simplify complex operations.
- Understand function arguments, parameters, and return values.
- Understand how to work with Python objects. Familiarise yourself with Python naming conventions.
- Learn how to use the Python documentation to solve your own problems.
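As a rough illustration of this module, the sketch below touches variables, a list, a function with a default argument and a return value, a method call on an object, and the usual naming conventions. It only uses the standard library, so NumPy arrays and pandas DataFrames are left out. The function `feet_to_metres` and the depth values are hypothetical examples of mine.

```python
import statistics  # a module from Python's standard library

# Constants are written in UPPER_CASE by convention (PEP 8).
METRES_PER_FOOT = 0.3048


def feet_to_metres(depth_feet, decimals=2):
    """Convert a depth in feet to metres, rounded to `decimals` places."""
    return round(depth_feet * METRES_PER_FOOT, decimals)


# Variables use snake_case; each one refers to an object with a type.
site_name = "harbour a"            # str
depths_feet = [12.0, 15.5, 9.8]    # list of float

# Call the function once per list element.
depths_metres = [feet_to_metres(depth) for depth in depths_feet]

# Arguments can be passed by position or by keyword.
rough_depth = feet_to_metres(10, decimals=1)

# Objects carry methods: here a string method and a module function.
site_label = site_name.title()
mean_depth = statistics.mean(depths_metres)

print(site_label, depths_metres, round(mean_depth, 2))

# The built-in documentation is one call away, for example: help(round)
```

Running `help(feet_to_metres)` in a notebook prints the docstring above, which is a quick way to get used to reading documentation.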
3- Optimization and gradient descent
- Understand how optimization works in practice and how parameters in a Machine Learning model are estimated.
- Understand the role of Cost Functions. Introduction to calculus: derivatives, the power rule, and partial derivatives.
- Understand how to work with Python Tuples.
- Work with Python loops to run the Gradient Descent optimization algorithm.
- Understand the effect of the learning rate, multiple minima, and the pitfalls with optimization algorithms.
- Learn how to manipulate, reshape, concatenate, and transpose data in N-Dimensional arrays.
- Learn how to create 3D plots and charts. Understand the Mean Squared Error cost function. Work with a nested loop in Python.
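The core of this module can be sketched in plain Python: a Mean Squared Error cost function, its partial derivatives returned as a tuple, and a loop that repeatedly steps downhill. This is a minimal sketch on made-up data, not a definitive implementation. The function names, the learning rate, and the step count are my own assumptions, and the 3D plotting part is omitted.

```python
# Made-up data that roughly follows a straight line (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def mse(slope, intercept):
    """Mean Squared Error cost of the line y = slope * x + intercept."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(xs)


def gradient(slope, intercept):
    """Partial derivatives of the MSE, returned as a tuple."""
    n = len(xs)
    residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
    d_slope = sum(2 * r * x for r, x in zip(residuals, xs)) / n
    d_intercept = sum(2 * r for r in residuals) / n
    return d_slope, d_intercept


def gradient_descent(learning_rate=0.05, steps=2000):
    """Start at (0, 0) and step against the gradient `steps` times."""
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        d_slope, d_intercept = gradient(slope, intercept)  # tuple unpacking
        slope -= learning_rate * d_slope
        intercept -= learning_rate * d_intercept
    return slope, intercept


slope, intercept = gradient_descent()
print(f"slope={slope:.2f}, intercept={intercept:.2f}, cost={mse(slope, intercept):.4f}")

# A learning rate that is too large overshoots and the cost grows instead.
diverged = gradient_descent(learning_rate=0.1, steps=50)
print(f"cost with learning_rate=0.1: {mse(*diverged):.3e}")
```

On this toy data a learning rate of 0.05 settles on the least-squares line, while 0.1 makes every step overshoot the minimum, which is the pitfall the learning-rate bullet refers to.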
I chose the topics above based on several curricula I found online, on Udemy and other resources. I decided to tackle these three modules first and understand them in depth, so that I have a foundation to build on before taking on the countless disciplines that are out there waiting for a data scientist.
Great free data science resources
I know the roadmap I slapped on the page may not suit everyone reading this article. The key to successful roadmapping is knowing how you learn best, so that you get the most out of the resources you find online. Here are some to start with:
- FreeCodeCamp: The best free scientific computing program online. It offers the best of both worlds: video tutorials by an expert in each subject and five projects to complete at the end of the course. To top it all off, you get a certificate acknowledged by FANG companies, completely FREE.
- YouTube: It may sound simple, but YouTube has some of the best tutorials on any subject you want to learn, technical or not. It’s practically the motherland of tutorials. I linked the title of this bullet point to a walkthrough that goes from the basics of Python to complex topics like AI and machine learning.
- Udemy: I don’t think there is any platform richer than Udemy when it comes to quality courses delivered by industry experts. I know this resource only has paid courses, but given the value they add and the process you follow there, it’s a pretty smart investment in a future that is, at best, uncertain for the many careers fading away in a world that is slowly becoming only zeros and ones.
No matter how this experience ends, the one thing I am 100% positive about is that I am going to learn a skill that may set me on a different path. Maybe it’s data science, maybe it’s not, but at the very least it will set things in motion for my own butterfly effect, and personally, I am fine with that.
Navy Hydrographic Engineer and GIS Specialist looking to become a data scientist