Introduction

In this post, I describe one of the DevOps projects I worked on as a pro bono consultant. The goal was to build a proof-of-concept (PoC) demo showing a company how it could improve its existing infrastructure to support continuous integration and continuous deployment (CI/CD).

I named this project “shadowops” because one of my objectives was to create an ephemeral test environment (a shadow) that replicates the production environment.

The shadow infrastructure could serve as a cost-effective solution to test new changes on the database schema and the source code.

In what follows, I describe the project and my solutions to the problems I was presented with. I also explain the technical details of my architecture and the reasoning behind some of my decisions.

Project Overview

In this project, the company presented me with two challenges. First, the company’s developers shared a single AWS EC2 instance as their development environment, and from time to time their new code broke that environment. Second, they didn’t have a good way to test changes to their database schema.

The first problem is a classic struggle for many early-stage companies and less mature enterprises. The developers write unit tests for their own code, but they do so in isolation rather than as a concerted effort. If a developer’s code passes their own unit tests, there is no guarantee that it is harmless to other people’s code. In short, it is the classic continuous integration problem.

The second problem is also a classic one: a static schema. Many small companies hard-code their database schema in their source code. It is very convenient in the beginning.

However, as the company grows and the schema needs to change, all of the source code that depends on it must change accordingly; otherwise, it breaks.
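To make this failure mode concrete, here is a minimal Python sketch (the company's code is in R, and the table and column names below are hypothetical). It assumes the code relies on a fixed set of columns and reports which of those columns a changed schema no longer provides.

```python
# Hypothetical example: column names the application code assumes exist.
EXPECTED_COLUMNS = {"orders": {"order_id", "customer_id", "amount"}}


def missing_columns(expected, actual):
    """Return, per table, the expected columns that the actual schema lacks."""
    problems = {}
    for table, columns in expected.items():
        missing = columns - actual.get(table, set())
        if missing:
            problems[table] = sorted(missing)
    return problems


# A schema change renamed "amount" to "total_amount"; code that still
# references "amount" would now fail at query time.
changed_schema = {"orders": {"order_id", "customer_id", "total_amount"}}
print(missing_columns(EXPECTED_COLUMNS, changed_schema))
```

A check like this is only an illustration of the problem; it is not part of the original project.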

Solution

Here, I describe my solutions to the aforementioned problems. Obviously there are many technologies and approaches to tackle the CI/CD problems, but the following solutions are shaped partly by the company’s legacy practices and the technologies that they were comfortable with.

For the first problem, I used CircleCI as a continuous integration tool to create a demo to showcase how to implement an automated CI pipeline. CircleCI is a very mature CI tool.

For the second problem, after many sessions with the company’s engineers and studying their existing infrastructure and legacy practices, I decided to build a cost-effective temporary test environment.

This way, when schema or pipeline changes need to be tested, the company can quickly deploy the test environment, run the tests, and then automatically shut it down to avoid incurring additional cost.
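The deploy, test, and shut-down cycle can be sketched in Python as a context manager. This is a sketch under assumptions, not the project's actual code: it assumes Terraform 0.14 or later (for the `-chdir` option), and the directory name `shadow` and the test entry point in the usage note are hypothetical.

```python
import subprocess
from contextlib import contextmanager


def run_command(args):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(args, check=True)


@contextmanager
def ephemeral_environment(workdir, run=run_command):
    """Provision the test environment, then always tear it down."""
    try:
        # Provisioning happens inside the try block so that a partially
        # created environment is also destroyed if apply fails midway.
        run(["terraform", f"-chdir={workdir}", "apply", "-auto-approve"])
        yield
    finally:
        # Runs whether the tests passed or failed, so no resources are
        # left running and billing.
        run(["terraform", f"-chdir={workdir}", "destroy", "-auto-approve"])
```

It would be used as `with ephemeral_environment("shadow"): run_schema_tests()`, where `run_schema_tests` stands for whatever test suite is being exercised. The `run` parameter exists so the command runner can be swapped out, for example in a dry run.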

Ideally, after the successful implementation of this infrastructure, I can dockerize the different parts of the test environment and integrate it with CircleCI for the CD part.

Architecture Diagram

Here, I describe the solution’s architecture. In the project, I used AWS as a cloud provider. I picked AWS because it is a very mature infrastructure and many companies are familiar with it. The other reason is that the company uses AWS as its main cloud provider.

Below is the architecture diagram.

As you can see on the left-hand side, Terraform and Ansible are used for infrastructure provisioning and configuration management. The link between the two tools is the Terraform state files. On the right-hand side is my infrastructure, which consists of an EC2 instance and a Redshift cluster in the cloud.
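One way a state file can link the two tools is by turning the provisioned hosts into an Ansible inventory. The Python sketch below illustrates that idea only; it is not the mechanism used in this project. It assumes the version 4 state layout, and the group name `shadow`, the resource names, and the IP address are made up. The state layout is internal to Terraform, so a real implementation would more safely read the output of `terraform show -json`.

```python
import json


def inventory_from_state(state, group="shadow"):
    """Build an Ansible INI inventory from a parsed Terraform state (format v4)."""
    hosts = []
    for resource in state.get("resources", []):
        # Only EC2 instances are configured by Ansible in this sketch.
        if resource.get("type") != "aws_instance":
            continue
        for instance in resource.get("instances", []):
            ip = instance.get("attributes", {}).get("public_ip")
            if ip:
                hosts.append(ip)
    return "\n".join([f"[{group}]", *hosts]) + "\n"


# Trimmed-down, made-up state; a real file has many more fields.
example_state = json.loads("""
{
  "version": 4,
  "resources": [
    {"type": "aws_instance", "name": "airflow",
     "instances": [{"attributes": {"public_ip": "203.0.113.10"}}]},
    {"type": "aws_redshift_cluster", "name": "warehouse",
     "instances": [{"attributes": {"endpoint": "example:5439"}}]}
  ]
}
""")
print(inventory_from_state(example_state))
```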

R and Airflow are installed on the EC2 instance. The company writes its code in R and uses Airflow for workflow management. The EC2 instance uses a JDBC driver to connect to the Redshift cluster and query data.

The biggest part of the solution lives inside Airflow. I needed to write multiple tasks and tie them together to create a workflow.

This workflow must connect to the Redshift cluster and query the data. I designed the workflow to engage the database, perform analytics, and write data back into the cluster. Finally, if all tasks run successfully, it ends with an email task that connects to an SMTP server and sends an email announcing the success of the workflow.
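The shape of this workflow can be modeled in a few lines of plain Python. This is a framework-free sketch, not Airflow code and not the project's actual DAG: the three task functions are hypothetical stand-ins for the R programs, and `notify` stands in for the email task. In Airflow itself, the same behavior would come from chaining tasks with dependencies, since a task runs by default only when all of its upstream tasks have succeeded.

```python
def run_workflow(tasks, notify):
    """Run tasks in order, passing each result on; notify only if all succeed."""
    data = None
    for name, task in tasks:
        try:
            data = task(data)
        except Exception as error:
            # A failed task stops the chain, so the success email is never sent.
            print(f"task {name!r} failed: {error}")
            return False
    notify(data)
    return True


# Hypothetical stand-ins for the query, analytics, and write-back steps.
def query(_):
    return [1, 2, 3]


def analyze(rows):
    return sum(rows)


def write_back(total):
    return f"wrote total={total}"


sent = []  # collects "emails" instead of contacting an SMTP server
ok = run_workflow(
    [("query", query), ("analyze", analyze), ("write_back", write_back)],
    notify=sent.append,
)
```

The absence of a notification is what signals failure here, which mirrors the role the success email plays as a gate before deployment.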

Demo

Below, you can see my demo. I put it on YouTube, so you can watch it there. After the demo, I explain it in detail.

As you saw in the demo, the first part showed how to use CircleCI for continuous integration. I created a simple app in R with unit tests, and then connected CircleCI to my GitHub repository.

Now, when I push my changes to GitHub, it triggers CircleCI to execute the tasks that I defined. CircleCI starts a Docker container on its server, pulls a Docker image from Docker Hub, copies the source code into that container, and runs my unit tests. If they pass, it gives me a green build.

Then after getting the green build, I am allowed to merge the new code with the master branch. Obviously, this process is more detailed than what I described, but I hope you get the idea.

The second part was building a production replica. The demo first shows the Terraform code, an empty mailbox, a ready Redshift cluster, and the AWS EC2 console with no running instances.

Then I apply my Terraform code. It provisions the infrastructure (34 resources) and connects to the remote host to install the necessary software and configure the environment.

Infrastructure provisioning takes around 60 seconds, and configuration management takes around 5 minutes and 45 seconds. When the infrastructure is ready, I copy the Airflow webserver URL and paste it into my browser, which connects to my Airflow instance remotely.

Then, I show the Airflow DAG that I created. It consists of several tasks, each connected to the next. Each task is an R program whose job is to connect to the Redshift cluster, query data, perform analytics, and pass the data to the next task. If all of these analytics tasks run successfully, the last task sends a success email to the admin.

If not, the admin doesn’t receive the email and cannot proceed to the next stage, which is deployment. At the end of the demo, as you saw, all tasks ran successfully and the success email was sent, which means the test passed.

Engineering Challenges

In this project, I had three major challenges. The first one was very specific to this project: since the company was using R, I had to find a JDBC driver that works with R.

Finding the right package and resolving its dependencies was a challenge that took me many iterations to resolve. I used the RJDBC package for this purpose.

Another challenge was configuration management, because this project was very heavy on configuration. Originally, I used Terraform provisioners to connect to my remote hosts and install the necessary packages, but after one week and many iterations, I found that Terraform is not an ideal tool for this purpose, so I pivoted to Ansible.

However, instead of running Ansible manually, I used Terraform to drive Ansible, which required learning both tools in depth. My code for controlling Ansible from Terraform is open source and available on my GitHub.

The third challenge was to make the project a one-click deployment.

Doing that requires more than a working familiarity with the tools; you need to understand their mechanics well enough to connect them together. Below, I show some of the connections that were necessary to make the project a one-click deployment.

Originally published on Medium.

Saeed Mohajeryami