How to be a successful Data Engineer in 2021?
Data Engineers build the infrastructure and architecture for data generation, while Data Scientists perform mathematical and statistical analysis on the curated data.
Who is a Data Engineer?
Before jumping into how to become a successful Data Engineer, let’s briefly understand what exactly Data Engineering is. Sadly, I have come across many folks who cannot accurately define the difference between a Data Engineer (or a Big Data Engineer) and a Data Scientist. (Yes, this is more common than you’d think!)
Data Engineers build the infrastructure and architecture for data generation, while Data Scientists perform mathematical and statistical analysis on the curated data. I know that can make the Data Engineer’s job sound narrow.
But Data Engineers design, create, integrate and optimize pipelines for the data collected from different sources, applying standardizations, DQMs, business logic transformations and more using Big Data tools. Of course, if you work at a startup, the line between a DE and a DS is blurred.
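To make the standardization, data quality and business logic steps a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `Order` record, the sample rows, the two quality rules and the "large order" threshold are hypothetical, and a real pipeline would run on a Big Data tool rather than a Python loop.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    country: str
    amount: float


def standardize(raw: dict) -> Order:
    """Standardization: trim whitespace, normalize case, cast types."""
    return Order(
        order_id=raw["order_id"].strip(),
        country=raw["country"].strip().upper(),
        amount=float(raw["amount"]),
    )


def passes_quality_checks(order: Order) -> bool:
    """Hypothetical quality rules: non-empty id and non-negative amount."""
    return bool(order.order_id) and order.amount >= 0


def apply_business_logic(order: Order) -> dict:
    """Hypothetical business rule: orders of 100 or more count as 'large'."""
    size = "large" if order.amount >= 100 else "small"
    return {
        "order_id": order.order_id,
        "country": order.country,
        "amount": order.amount,
        "size": size,
    }


def run_pipeline(raw_records: list[dict]) -> tuple[list[dict], list[Order]]:
    """Return (clean rows, rejected rows) so bad data is kept for review."""
    clean, rejected = [], []
    for raw in raw_records:
        order = standardize(raw)
        if passes_quality_checks(order):
            clean.append(apply_business_logic(order))
        else:
            rejected.append(order)
    return clean, rejected


# Made-up sample input, as it might arrive from different sources.
raw_records = [
    {"order_id": " A1 ", "country": "in", "amount": "250.0"},
    {"order_id": "A2", "country": "us ", "amount": "-5"},
    {"order_id": "A3", "country": "De", "amount": "40"},
]
clean, rejected = run_pipeline(raw_records)
print(clean)
print(rejected)
```

Keeping the rejected rows instead of silently dropping them is a deliberate choice in this sketch: it makes it possible to see later why a record never reached the curated data.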
To start out as a fresher in the Data Engineering world, you typically need a bachelor’s degree in technology, science or mathematics. But no matter what your background is, learning software engineering is a must.
Now that we have this out of the way, let’s look at which skills are necessary for a Data Engineer and which skills are the cherry on top.
Technical skills are the most important factor if you want to land a good data engineering role. You must master these skills first.
1. Distributed Systems Framework/Programming Languages – Since you are working on Big Data (data on the scale of terabytes or petabytes, which cannot be handled by a single ordinary computer), you need a cluster of computers at your disposal. For this, you need to learn Apache Hadoop and Apache Spark.
2. Cloud Platforms – Now we know that we’re going to use a cluster of computers and not a single node. Are we going to buy them? Nope. Why? That’s right, too expensive. Enter Cloud Platforms, where most data storage and computation now happens. The most used platforms are:
- AWS – Very widely used (as per ol’ reliable Gartner’s Magic Quadrant Chart)
- Azure – A close second place competitor
- GCP – Barely making it to the podium, bravo!
3. Programming – Python, Scala and Java. No need to master all of them. Just follow this simple principle when it comes to programming languages: “Be a jack of all trades but master of One!”
Also, you must be proficient in Data Structures and Algorithms. Can’t escape them!
4. SQL and NoSQL – A must-have for a Data Engineer, as you will mostly be dealing with structured or unstructured data, writing a lot of joins, aggregates, pivots, filters and more. Don’t like DBMS or SQL? Then data engineering is not for you, I’m afraid! If you know the logic, you can implement these transformations in any of the languages mentioned above.
5. Data Warehousing and Data Modeling – These are also very important skills that you will be using daily. Having a basic idea of Data Lake, Data Lakehouse, Data Fabric or Data Mesh will help a lot, too.
6. ETL/ELT tools – Tools like Airflow, Talend and many more are used to extract, transform and load your data very easily. They are pivotal to creating pipelines!
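To give a feel for the join, filter and aggregate work from point 4 and the extract, transform and load flow from point 6, here is a small sketch that uses only Python’s built-in `csv` and `sqlite3` modules. The table names, columns and sample rows are made up for illustration, and an in-memory SQLite database stands in for a real warehouse. Because the data is loaded first and transformed with SQL afterwards, it is closer to ELT than to classic ETL.

```python
import csv
import io
import sqlite3

# Extract: hypothetical sample data. In a real pipeline this would come
# from files, APIs or source databases.
CUSTOMERS_CSV = "customer_id,name,country\n1,Asha,IN\n2,Ben,US\n3,Chen,IN\n"
ORDERS_CSV = (
    "order_id,customer_id,amount\n"
    "10,1,120.0\n11,1,30.0\n12,2,75.5\n13,3,-1.0\n"
)


def extract(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_sources(conn):
    """Load the raw rows into staging tables."""
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER, name TEXT, country TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (:customer_id, :name, :country)",
        extract(CUSTOMERS_CSV),
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer_id, :amount)",
        extract(ORDERS_CSV),
    )


def transform_and_load(conn):
    """Join, filter and aggregate, then store the result as a new table."""
    conn.execute(
        """
        CREATE TABLE revenue_by_country AS
        SELECT c.country,
               COUNT(*)      AS order_count,
               SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE o.amount > 0          -- filter out invalid amounts
        GROUP BY c.country
        """
    )


conn = sqlite3.connect(":memory:")
load_sources(conn)
transform_and_load(conn)

rows = conn.execute(
    "SELECT country, order_count, revenue "
    "FROM revenue_by_country ORDER BY country"
).fetchall()
print(rows)  # [('IN', 2, 150.0), ('US', 1, 75.5)]
```

Tools like Airflow or Talend add scheduling, retries and monitoring around steps like these, but the logic inside each step is often just SQL of this kind.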
Apart from these, good-to-have skills include Machine Learning, Visualization and strong Analytical skills. Collaboration and DevOps tools like JIRA, Bitbucket/Git, Confluence etc. are used almost everywhere, too.
Certifications are pivotal if you want to be a successful data engineer, because Big Data technology is constantly evolving. Most of the clients I have worked with prefer someone who is certified in the technology they are implementing solutions in, be it AWS, Azure, Kubernetes or anything else.
There are two approaches you can follow while pursuing different certifications:
- Breadth First: Get basic certifications in different areas like AWS, Azure, GCP, Snowflake and more. This will make you more versatile in an ever-changing market.
- Depth First: Get pro-level certifications in a single area, such as AWS Big Data Specialty or AWS Solutions Architect Professional. This will give you expertise in one area/platform.
Which one you choose depends entirely on your career preference. For example, if you want to remain in AWS for your entire career, choose Depth First.
Do you want to pass your certification on the first try?
Then follow this Pro Tip: no matter which certification you go for, book the exam first, for a date a month or so away, and then start preparing. That way you will have enough motivation and pressure 😉
And last but not least, think about your career path. As a Data Engineer, it will look something like this:
- Data Engineer
- Senior Data Engineer
- Lead Data Engineer
- Big Data Architect/Solutions Architect and so on…
The higher you go on the corporate ladder, the more crucial it is to master your soft skills. Not only will you mentor your juniors and handle a lot of team management activities, but you will also often lead client calls where you suggest an architecture pattern and the technologies to go with it. That is where communication, presentation and collaboration skills, to name a few, come into play.
So, to sum it up:
Because of the soaring demand for cloud-based services, the demand for data engineers is also increasing. You don’t need to master all the skills in the world of Big Data to be good. Just pick a few skills, like a cloud platform, and gain experience in complex real-world projects. This will be super helpful in showcasing your worth in job interviews.
Senior Data Engineer @ ZS
A Technology Geek disguised as a Data Engineer in Management Consulting