Visualizing U.S. Petroleum Pipeline Networks
Let’s use data to understand the role of the Colonial pipeline in the larger U.S. petroleum-product pipeline network.
Skanda Vivek
The impacts of the recent Colonial pipeline ransomware cyber-attack are being felt across the entire southeast of the United States. The perpetrators, a well-known ransomware group called DarkSide, have claimed that their goal was never to disrupt society:
“Our goal is to make money, and not creating problems for society. From today we introduce moderation and check each company that our partners want to encrypt to avoid social consequences in the future.”
But one wonders if this is entirely true. After all, DarkSide is believed to be based in Russia, and the recent SolarWinds cyber-attacks, which U.S. officials attributed to Russia, resulted in the breaching of multiple U.S. government servers.
This, along with other recent incidents, has shown that Russia does have an interest in compromising U.S. networks, perhaps even critical infrastructure networks such as those that carry fuel across a major portion of the U.S.
Let’s take a deeper look into the U.S. petroleum product pipeline network. This will help us understand the role of the Colonial pipeline. In addition, the rise of nation-state cyber-attacks from Russia, North Korea, Iran, and China means that in the future the U.S. could be at risk of targeted attacks intended to maximize social disruption.
It is therefore worth accounting for vulnerabilities in critical infrastructure now, so that resilience is in place if a cyber-attack succeeds.
Visualizing Pipeline Shapefiles
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import re
import matplotlib.colors as mcolors
import geopandas as gpd
import contextily as ctx

shapefile = gpd.read_file('./PetroleumProduct_Pipelines_US_EIA/PetroleumProduct_Pipelines_US_202001.shp')
shapefile2 = shapefile[shapefile['geometry'].notnull()]
# Project to the Spherical Mercator coordinate system (EPSG:3857) for subsequent plotting on a map
shapefile3 = shapefile2.to_crs(epsg=3857).sample(frac=1)

fig = plt.figure()
ax=shapefile3.plot(column='Shape_Leng', cmap='jet',linewidth=3,figsize=(32,16))
plt.axis('off')
ctx.add_basemap(ax)

I obtained the data for major petroleum product pipelines in the United States from the EIA website and used the Python GeoPandas package to load the shapefile. The map colors each pipeline by length, so the longest pipeline, the Colonial pipeline, appears in dark red. A closer look at the five longest pipelines confirms this.
The second longest is the Southern Lights pipeline, which runs from the Midwest U.S. to Alberta, Canada. The third longest is the Plantation pipeline, which runs from Louisiana to D.C. along almost the same route as the Colonial pipeline but is a bit shorter.
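One way to take that closer look is to rank rows by the `Shape_Leng` column that the plot uses for coloring. The following is a minimal sketch on a made-up table rather than the EIA shapefile: the `name` column and every length value are hypothetical placeholders. Because a GeoDataFrame is a pandas DataFrame subclass, the same `nlargest` call should work on the loaded shapefile.

```python
import pandas as pd

# Hypothetical stand-in for the shapefile's attribute table; names and lengths are invented.
pipelines = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "Shape_Leng": [12.5, 3.1, 40.2, 7.7, 22.0, 0.9],
})

# Keep the five rows with the largest Shape_Leng, longest first.
top5 = pipelines.nlargest(5, "Shape_Leng")
print(top5[["name", "Shape_Leng"]])
```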

Converting Shapefiles To Networks
# Use shapely to convert each geometry into its WKT string representation
from shapely import wkt
shapefile2['str_geom'] = shapefile2.geometry.apply(lambda x: wkt.dumps(x))
net = nx.Graph()
nt = 0  # running count of points across all pipelines
# Iterate over the values directly: rows with null geometry were dropped, so index labels may have gaps
for geom in shapefile2['str_geom']:
    tokens = geom.split()
    a = np.zeros(len(tokens)-1)  # e.g. 4 points with 8 coordinates means a has 8 values
    nt += len(a)/2
    for j in range(0, len(a)):
        # The first token is the geometry type, so coordinates start at tokens[1]
        a[j] = float(re.findall(r"[-+]?(?:\d*\.\d+|\d+)", tokens[j+1])[0])
    # Connect consecutive points (assumes each geometry is a single LINESTRING)
    for k in range(0, int(len(a)/2)-1):
        net.add_edge((a[2*k],a[2*k+1]),(a[2*k+2],a[2*k+3]))

positions = {n: (n[0], n[1]) for n in list(net.nodes)}
fig, ax = plt.subplots(figsize=(16,8))
ax.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
nx.draw(net, positions, ax=ax, node_size=20)
plt.tight_layout()

Each shapefile row contains the coordinates of the points along one pipeline. I represent each point as a node and connect consecutive points with edges, building the network with the NetworkX Python package. Now that we have the network, we can run some classic network algorithms to discover which nodes, or pipeline segments, are essential for the pipeline network as a whole.
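The node-and-edge construction can be illustrated without any geospatial libraries. The sketch below is a simplification under stated assumptions, not the code behind the figure: the helper names `polyline_edges` and `build_adjacency` and the toy coordinates are hypothetical, and each pipeline is assumed to be a single ordered list of (longitude, latitude) points.

```python
from collections import defaultdict

def polyline_edges(coords):
    """Yield one edge per pair of consecutive points along a single pipeline."""
    for start, end in zip(coords, coords[1:]):
        yield start, end

def build_adjacency(polylines):
    """Merge several pipelines into one undirected graph keyed by coordinate tuples."""
    adjacency = defaultdict(set)
    for coords in polylines:
        for u, v in polyline_edges(coords):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)

# Hypothetical (longitude, latitude) points, not taken from the EIA data.
toy_pipelines = [
    [(-95.0, 30.0), (-90.0, 32.0), (-85.0, 34.0)],
    [(-85.0, 34.0), (-80.0, 36.0)],  # starts exactly where the first line ends
]
adjacency = build_adjacency(toy_pipelines)
print(len(adjacency), "nodes")
```

Two pipelines join into one network only where their coordinates match exactly, as at (-85.0, 34.0) here. Endpoints that differ by even a tiny rounding error stay disconnected, which is worth keeping in mind when reading centrality results. With shapely geometries, the `coords` attribute of a LineString provides the same point sequence without string parsing.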
Centrality Metrics
In graph theory and network analysis, centrality metrics identify the relative importance of nodes in the entire network. I will use betweenness centrality, which measures how much influence a node has over the flow of information (or in this case, the flow of petroleum) through a network. The betweenness centrality of a node v is given below:
$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
In the equation, the numerator is the number of shortest paths between nodes s and t that pass through node v, and the denominator is the number of all shortest paths between s and t. The summation is over all pairs of nodes s and t. In our case, betweenness centrality should give a sense of which pipeline segments are most important for transporting petroleum through the greater network.
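The definition can be checked by hand on a small graph. The sketch below computes the sum directly, pair by pair, for a five-node chain. It is a brute-force illustration for unweighted, undirected graphs, not how NetworkX computes the metric, and the function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Unnormalized betweenness: sum over unordered pairs (s, t) of the
    fraction of shortest s-t paths that pass through v."""
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                       # s and t are disconnected
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v is on a shortest s-t path exactly when the distances add up
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score

# Toy chain A - B - C - D - E
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = betweenness(path_graph)
print(scores)  # {'A': 0.0, 'B': 3.0, 'C': 4.0, 'D': 3.0, 'E': 0.0}
```

The middle node C scores highest because four pairs (A-D, A-E, B-D, B-E) can only reach each other through it. By default, `nx.betweenness_centrality` rescales these raw counts (`normalized=True`), so its values differ from the ones above by a constant factor while the ranking stays the same.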
fig, ax = plt.subplots(figsize=(16,8))
ax.tick_params(left=True, bottom=True, labelleft=True, labelbottom=True)
# Compute betweenness centrality once and reuse it; it is expensive on a large network
centrality = nx.betweenness_centrality(net)
nodes = nx.draw_networkx_nodes(net, positions, ax=ax, node_size=20, cmap=plt.cm.jet,
                               node_color=list(centrality.values()), nodelist=list(centrality.keys()))
edges = nx.draw_networkx_edges(net, positions, ax=ax)
plt.axis('off')
plt.title('Betweenness Centrality')
plt.colorbar(nodes)
plt.tight_layout()

The node with the highest betweenness centrality lies right at the heart of the Colonial pipeline, in the state of South Carolina. This surprised me, as I had expected the most important node to be closer to the geographic center of the U.S.
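Betweenness depends on how the network is wired rather than on where a node sits on the map, which may help explain this result. The toy graph below is a sketch with hypothetical (x, y) positions, not real pipeline coordinates. It has a long trunk line running west to east that ends in a hub with four short spurs, and it shows how to pull the top-ranked node out of the dictionary that `nx.betweenness_centrality` returns.

```python
import networkx as nx

# Hypothetical (x, y) positions; not real pipeline coordinates.
hub = (9, 1)
trunk = [(0, 1), (2, 1), (4, 1), (6, 1), hub]       # long line running west to east
spurs = [(10, 0), (10, 2), (11, 0), (11, 2)]         # short branches off the hub

toy = nx.Graph()
toy.add_edges_from(zip(trunk, trunk[1:]))             # connect consecutive trunk points
toy.add_edges_from((hub, spur) for spur in spurs)     # connect the hub to each spur

centrality = nx.betweenness_centrality(toy)           # computed once, reused below
top_node = max(centrality, key=centrality.get)        # node with the highest score
print(top_node, round(centrality[top_node], 3))
```

Here the hub at (9, 1) ranks first even though it sits near the eastern edge of the layout, because every spur-to-spur and spur-to-trunk path has to pass through it.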
Conclusions
Both the sheer length of the Colonial pipeline and the betweenness centrality metric place the Colonial pipeline as the most important asset in the U.S. petroleum product pipeline network. A larger concern is how an attacker motivated to maximize social disruption, such as a nation state, might act.
And what might those consequences be? We have already seen weeks of disruptions in gas supply and price hikes in the aftermath of the Colonial pipeline ransomware attack. What would this look like if multiple such pipelines were shut off? How might we build societal resilience to successful attacks?
There is much we can learn by combining multidimensional data sources and connecting the dots during disasters, so that we are more prepared as a society in the future. In the case of the Colonial pipeline cyber-attack, data on pipeline networks, petroleum refinery and storage locations, transport supply chains, and the locations of gas stations with shortages can help us understand the chain of events. This understanding will make us better prepared for future cyber-attacks.
In conclusion, there’s a lot to be done to make societies resilient to cyber-attacks. But I believe the first step lies in quantifying complex societal vulnerabilities to such attacks. Whether by accident or by design, in the case of the Colonial pipeline incident, the data unequivocally shows that the attack shut down the most important pipeline in the entire U.S. petroleum product pipeline network, resulting in gas shortages felt for weeks.
Skanda Vivek
Senior Data Scientist in NLP. Creator of https://www.answerchatai.com/
