Everything That Needs To Master SQL Skills For Data Science
If you Google “How to learn SQL?”, you get a lot of results, it’s overwhelming. Let’s go through it
I am writing this article by analyzing various LinkedIn posts written by some of the best data scientists in the industry along with the things I have experienced while learning SQL. I think that this is the list of some good resources for SQL. I have used some of them but not all.
This article consists of the importance of SQL, important topics one must cover, courses, practice platforms, interview preparation resources, and finally SQL notes.
Importance of SQL?
In the recent 25 Data Scientist job postings at Facebook, every job posting listed skills with SQL. In the 2020 LinkedIn top 10 Startups from India list, 7 have SQL as one of their Most Common Skill.
This often underappreciated language is amongst the top skills required not just in India, but worldwide. As long as there is ‘data’ in data science, SQL will remain an important part of it. While it may be over four decades old, SQL remains relevant in the 21st century thanks to a number of key advantages that it offers over the alternatives.
Where does SQL come into the Picture?
Data Science is the study and analysis of data. In order to analyze the data, we need to extract it from the database. This is where SQL comes into the picture.
Many database platforms are modeled after SQL. This is because it has become a standard for many database systems. As a matter of fact, modern big data systems like Hadoop, Spark make use of SQL for maintaining relational database systems and processing structured data.
Identifying the right data sources, getting the data, and preprocessing the data are fundamental steps in any descriptive or predictive analytics work. As mainly this data is stored in relational databases therefore, in order to query these databases, a data scientist must have a sound knowledge of SQL. And SQL plays the most critical role across these steps.
In order to experiment with data through the creation of test environments, data scientists make use of SQL as their standard tool, and to carry out data analytics with the data that is stored in relational databases like Oracle, Microsoft SQL, MySQL, we need SQL.
Important topics in SQL for Data Science
Before jumping into the resources let’s see what are the important topics. Make sure you cover the following topic but do not limit to those.
1] Group By Clause: The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups. Mostly we use aggregation functions with the group by clause and also use Having Clause to apply conditions along with group by clause.
2] Aggregation Functions: An aggregate function performs a calculation on a set of values and returns a single value. Ex. count, avg, min, max, etc.
3] String Functions and Operations: In order to perform various operations such as Convert string to uppercase, match a regular expression, etc.
Ex. 1] Find the student IDs whose name starts with ‘A’. 2] Get the pin code from the address column.
4] Date & Time operations: When the value contains the only date it is easy to handle but when the time portion is also involved then things get a bit complicated. So make sure you practice enough.
5] Output control statements: To get results as per requirements. Eg: order by clause, limit function to get limited rows.
6] Various operators: There are mainly three types of operators as Arithmetic, Logical, and Comparison operators.
7] Joins: This is one of the important topics and used to join multiple tables to get the desired output. Make sure you get all the concepts like types of joins, primary key, foreign key, composite key, etc.
8] Nested Queries: A subquery/nested is used to return data when that will be used in the main query as a condition to further restrict the data to be retrieved.
(Nested queries can be used to return either a scalar (single) value or a row set; whereas, joins are used to return rows. If you can perform an operation in both ways then the optimized way is to use Joins.)
9] Views & Indexing: Indexes are special lookup tables that the database search engine can use to speed up data retrieval. In simple words, an index in a database is similar to the index of a book.
10] Temporary Tables: It is a great feature that lets you store and process intermediate results by using the same selection, update, and join capabilities.
11] Windowing Functions: Window functions operate on a set of rows and return a single value for each row from the underlying query. They reduce the complexity of queries that analyze partitions (windows) of a data set.
12] Query Optimizations: When we are dealing with larger datasets, it is important to use the most efficient method for a SQL statement to access requested data.
And finally 13] Common Table Expression.
Resources to learn and practice:
1] Udacity’s SQL for Data Analysis:
This is one of the best free courses covering all the above topics with a clear explanation and practice quiz after each topic. The quality of practice quizzes makes this course very effective. Overall this is a great course.
2] Introduction to SQL Programming for Excel Users:
If you are an Excel user and wanted to learn SQL then this will be a great YouTube playlist for you. This covers major topics from the above list.
3] Udemy’s Master SQL For Data Science:
In this course, you’ll learn the skills you need to extract critical insight from data sitting in a database. There are over 100 puzzles scattered throughout the course with in-depth solutions providing plenty of opportunities for you to practice.
4] Khan Academy:
This is a non-profit educational organization, with the goal of creating a set of online tools that help educate students. It is a great platform with quality free courses with a detailed explanation. You can also try other courses here. Especially I love statistics one.
The entire course contains 5 parts starting with basics and leading you all the way up to more advanced lessons. In this course, there are challenges followed by video tutorials and a small project at the end of each topic.
5] 200+ SQL Interview Questions:
If you are preparing for an interview this course will help you much by solving complex queries. In this course, you will find 200+ Real World SQL Questions and Practical Answers.
6] LinkedIn Master SQL for Data Science:
Master SQL for Data Science LinkedIn
This contains a total 6 items. This set of courses covers every aspect required for data science.
If you are interested in the history of SQL then go through this video (only 1st part).
If you are a beginner, take any of the above courses or any other that you find good and complete all topics from the above list.
And if you know the basics then to revise all the topics this video will be helpful.
Practice is key of success.
Once you know all the topics it is time to practice. Without practice, you can’t master any skill. So let’s see some good platforms to practice………
SQL Query Practice Platforms:
This is one of the best practice platform with a great variety of questions. Here are some good questions are Second highest salary problem, Duplicate Emails, Classes More Than 5 Students, Rising Temperature, Classes More Than 5 Students.
2] SQL Zoo:
SQLZoo is a well established online platform (since 1999) for writing and running SQL queries against a live database. You can see the actual result of your query without having to scrupulously check your query matches a solution as there may be multiple ways to solve a problem. The assessments section contain more involved examples that allow you to deep-dive into a database at varying levels of difficulty
This is one of the good platforms to practice. Here the questions are divided into three parts as Easy, Medium, Hard.
4] SQL Bolt:
It is, essentially, a series of interactive lessons and exercises that are created to help users learn SQL easily. The lessons and topics found on this site are comprehensive and they cover all the important details of using SQL.
5] Select * SQL:
This is an interactive book that aims to be the best place on the internet for learning SQL. It is free of charge, free of ads, and doesn’t require registration or downloads. It helps you learn by running queries against a real-world dataset to complete projects of consequence.
Here you can practice topic wise and there are major 4 sections as Basics, Intermediate, Advanced, and SQL Analytical training. Here you can read the theory and practice the SQL queries.
7] Stanford University:
Here you can practice topic wise questions. The UI is not so good but you will get quality material. Here you can solve topic wise queries. The main topics covered are basics, joins, subqueries, modifying data, aggregations, date, string, and lastly recursive.
Interview Preparation Resources
If you are preparing for an interview then the following resources can help you,
1]Data Analyst SQL Interview Questions by Zachary Thomas:
The first 70% of SQL is pretty straightforward, the remaining 30% can be pretty tricky. Data analyst and data scientist interview questions at technology companies often pull from that 30%. Here he concentrated on those medium-hard problems. If you practice all the important topics from any of the above platforms(I will prefer Leetcode.) then this will be enough for interview preparation.
2]The Data Monk:
Here you can get company-wise all subjects (SQL, python, statistics, case studies, etc.) interview questions to crack Data Science and Data Analyst interview.
1] If you want to revise all concepts in a short span of time then here is SQL 100+ pages notes provided by GoalKicer.
2] This is the e-book created by stack overflow contributors. This book covers all the concepts.
Note: There are likely other resources. That’s great. I just haven’t gotten to all yet!
If you find anything which can help others then please add it in the comments, I will add those resources to the blog.
In this article, we went over all the resources which will help to plan your own SQL track. This article covered all resources starting from basics to the interview preparation. If you are still confused about “How to start?”, just go with the first resource under each heading.