Visualizing U.S. Petroleum Pipeline Networks

Let’s use data to understand the role of the Colonial pipeline in the larger U.S. petroleum-product pipeline network.


Skanda Vivek

2 years ago | 4 min read

The impacts of the recent Colonial pipeline ransomware attack are being felt across the southeastern United States. The perpetrators, a well-known ransomware group called DarkSide, have claimed that their goal was never to disrupt society:

“Our goal is to make money, and not creating problems for society. From today we introduce moderation and check each company that our partners want to encrypt to avoid social consequences in the future.”

But one wonders if this is entirely true. After all, DarkSide is believed to be based in Russia, and the recent SolarWinds cyber-attacks, which have been attributed to Russia, breached multiple U.S. government servers.

This and other recent incidents suggest that Russia has an interest in compromising U.S. networks, arguably even critical infrastructure networks such as those that carry fuel across a major portion of the U.S.

Let’s take a deeper look at the U.S. petroleum product pipeline network, which will help us understand the role of the Colonial pipeline. In addition, the rise of nation-state cyber-attacks from Russia, North Korea, Iran, and China means that the U.S. could in the future face targeted attacks intended to maximize social disruption.

It is therefore better to account for vulnerabilities in critical infrastructure now, so that we can build resilience in case such cyber-attacks succeed.

Visualizing Pipeline Shapefiles

import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import re
import matplotlib.colors as mcolors
import geopandas as gpd
import contextily as ctx

shapefile = gpd.read_file('./PetroleumProduct_Pipelines_US_EIA/PetroleumProduct_Pipelines_US_202001.shp')
shapefile2 = shapefile[shapefile['geometry'].notnull()]
# Project to the Spherical Mercator coordinate system (EPSG:3857) for subsequent plotting on a map
shapefile3 = shapefile2.to_crs(epsg=3857).sample(frac=1)
fig = plt.figure()
ax = shapefile3.plot(column='Shape_Leng', cmap='jet', linewidth=3, figsize=(32,16))
Major petroleum product pipelines in the United States. Source: EIA. Image credits: Skanda Vivek

I obtained the data for major petroleum product pipelines in the United States from the EIA website and used the Python GeoPandas package to load the shapefile. The map colors each pipeline by its length (the Shape_Leng column), so the longest pipeline appears in dark red: the Colonial pipeline. A closer look at the 5 longest pipelines confirms this.

The second longest is the Southern Lights pipeline, which runs from the Midwest U.S. to Alberta, Canada. The third longest is the Plantation pipeline, which runs from Louisiana to D.C. along almost the same route as the Colonial pipeline but is a bit shorter.

5 largest petroleum product pipelines | Skanda Vivek

Converting Shapefiles To Networks

# Use shapely to convert each geometry into a WKT string
from shapely import wkt
# Reset the index so the positional lookups [i] below still work after dropping null geometries
shapefile2 = shapefile2.reset_index(drop=True)
shapefile2['str_geom'] = shapefile2.geometry.apply(lambda x: wkt.dumps(x))
net = nx.Graph()
for i in range(0, len(shapefile2)):
    tokens = shapefile2['str_geom'][i].split()
    a = np.zeros(len(tokens) - 1)  # e.g. 4 points with 8 coordinates means a has 8 values
    for j in range(0, len(a)):
        a[j] = float(re.findall(r"[-+]?\d*\.\d+|\d+", tokens[j+1])[0])
    for k in range(0, int(len(a)/2) - 1):
        # Connect each point to the next point along the pipeline
        net.add_edge((a[2*k], a[2*k+1]), (a[2*k+2], a[2*k+3]))

# Place every node at its own (x, y) coordinate
positions = {n: (n[0], n[1]) for n in list(net.nodes)}
fig, ax = plt.subplots(figsize=(16,8))
ax.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
nx.draw(net, positions, ax=ax, node_size=20)
Petroleum product pipeline network of nodes and edges | Skanda Vivek

Each shapefile row contains the coordinates of the points along a pipeline. I represent each point as a node and connect consecutive points with edges, building the network with the NetworkX Python package. Now that we have the network, we can run some classic network algorithms to discover which nodes, or pipeline segments, are essential to the pipeline network as a whole.
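The node-and-edge construction does not depend on parsing WKT strings. As a minimal, dependency-free sketch, the hypothetical helper edges_from_lines below takes each pipeline as a sequence of (x, y) points and links consecutive points. The function name and the toy coordinates are illustrative assumptions, not part of the EIA data or the code above.

```python
def edges_from_lines(lines):
    """Return the set of undirected edges linking consecutive points of each line.

    `lines` is an iterable of point sequences; each point is an (x, y) pair.
    """
    edges = set()
    for points in lines:
        points = [tuple(p) for p in points]
        # Pair every point with its successor along the same line
        for start, end in zip(points, points[1:]):
            if start != end:  # skip zero-length segments
                edges.add(frozenset((start, end)))
    return edges


# Toy coordinates (hypothetical): two lines that share the point (1.0, 1.0)
toy_lines = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)],
    [(1.0, 1.0), (1.0, 3.0)],
]
toy_edges = edges_from_lines(toy_lines)
toy_nodes = {point for edge in toy_edges for point in edge}
print(len(toy_nodes), "nodes,", len(toy_edges), "edges")  # prints: 4 nodes, 3 edges
```

Each edge could then be handed to NetworkX with net.add_edge(*edge). Note that two lines only join where they share an identical coordinate, so pipelines whose endpoints differ by even a tiny amount stay disconnected in the graph.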

Centrality Metrics

In graph theory and network analysis, centrality metrics identify the relative importance of nodes in the entire network. I will use betweenness centrality, which measures the amount of influence a node has over the flow of information (or in this case, the flow of petroleum) in a network. The betweenness centrality metric is given below:

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

Betweenness centrality | Wikipedia

In the equation, the numerator is the number of shortest paths between nodes s and t that pass through node v, and the denominator is the number of all shortest paths between s and t. The summation is over all pairs of nodes. Because the graph here is unweighted, "shortest" means fewest edges between shapefile points, not shortest geographic distance. In our case, betweenness centrality should give a sense of which pipeline segments are the most important for the transport of petroleum in the greater network.
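To make the definition concrete, here is a minimal pure-Python sketch of betweenness centrality for a small undirected, unweighted graph, using Brandes' algorithm (a breadth-first search from every node, followed by a backward accumulation pass). The function name betweenness and the toy graph are assumptions for illustration only. The scores are unnormalized, whereas nx.betweenness_centrality normalizes by default, so the values differ by a constant factor while the ranking of nodes stays the same.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality of an undirected, unweighted graph.

    `adj` maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) to each
        # node and remember which neighbours precede it on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing each node's share of
        # shortest paths on to its predecessors.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair (s, t) was counted once from each end
    return {v: c / 2.0 for v, c in score.items()}


# Toy trunk line (hypothetical): five points in a row, like one unbranched pipeline
trunk = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = betweenness(trunk)
top = max(scores, key=scores.get)
print(top, scores[top])  # prints: C 4.0
```

The middle node C scores highest because four pairs of nodes (A-D, A-E, B-D, B-E) can only reach each other through it. The same effect favors the middle of a long trunk line in a real pipeline network.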

fig, ax = plt.subplots(figsize=(16,8))
ax.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
# Compute betweenness centrality once and reuse it for both the node list and the colors
bc = nx.betweenness_centrality(net)
nodes = nx.draw_networkx_nodes(net, positions, ax=ax, node_size=20, node_color=list(bc.values()), nodelist=list(bc.keys()))
edges = nx.draw_networkx_edges(net, positions, ax=ax)
plt.title('Betweenness Centrality')
Petroleum product pipeline network, nodes colored by betweenness centrality value | Skanda Vivek

The node with the highest betweenness centrality lies right at the heart of the Colonial pipeline, in South Carolina. This was surprising to me, as I would have thought the most important node would be located closer to the geographic center of the U.S.


Both the sheer length of the Colonial pipeline and its betweenness centrality mark it as the most important asset in the U.S. petroleum product pipeline network. A larger concern is how an attacker motivated to maximize social disruption, such as a nation state, might act.

And what might those consequences be? We have already seen weeks of disruptions in gas supply and price hikes in the aftermath of the Colonial pipeline ransomware attack. What would this look like if multiple such pipelines were shut off? How might we build societal resilience to successful attacks?

There is much we can learn by combining multidimensional data sources and connecting the dots during disasters, so that we are better prepared as a society in the future. In the case of the Colonial pipeline cyber-attack, data on pipeline networks, petroleum refinery and storage locations, transport supply chains, and the locations of gas stations with shortages can help us understand the chain of events. This understanding will make us better prepared for future cyber-attacks.

In conclusion, there’s a lot to be done to make societies resilient to cyber-attacks, but I believe the first step lies in quantifying complex societal vulnerabilities to such attacks. Whether by chance or by design, the Colonial pipeline attack shut down what this analysis identifies as the most important pipeline in the entire U.S. petroleum product pipeline network, resulting in gas shortages felt for weeks.

