I have always been passionate about sports and about the huge world of AI and analytics, so combining the two was naturally a long-standing goal of mine.

After finishing Andrew Ng’s Deep Learning Specialization, I decided it was time to put my newly acquired knowledge into practice.

The goal of this project is to develop an open-source Artificial Intelligence solution for analyzing squash matches and gaining insights into how players can improve their game based on their court movement and positioning.

The first objective was to create an MVP (Minimum Viable Product) covering the whole pipeline, so that each stage of the process could then be improved over time. In this article I will explain the overall approach and then dive into each step.

The video that will be analyzed is a rally between Nick Matthew (England) and Miguel Angel Rodriguez (Colombia) in the PSA Dubai World Series Finals of 2016.

This is an ongoing project, so the article will be updated as the project evolves. The code used for this project can be found on my GitHub.

Pipeline

The MVP should be able to process a video of a squash match, track the players’ movement on the court, map them onto a 2D court and generate stats of their game. The pipeline developed for this solution was:

Court Detection: delineate the court polygon.
Player Identification: differentiate between players so as to generate individual stats.
Player Tracking: track each player’s movement around the court.
Court Mapping: project the players’ 3D positions onto a 2D court.
Stats Generation: generate analytics for each player.

The project makes extensive use of OpenCV for video and image processing, and of a customized version of the YOLOv3 algorithm for player detection, identification, and tracking.

Although R-CNN (Region-based Convolutional Neural Network) models may generally be more accurate in object detection tasks, the YOLO (You Only Look Once) family of models was chosen because it is much faster and can detect objects in real time.

Many of these techniques are very well explained on Adrian Rosebrock’s highly recommended blog, PyImageSearch.

Furthermore, two articles proved useful for applying these techniques to sports: Open Source Sports Video Analysis using Machine Learning, and Player Tracking in Squash with Computer Vision and Deep Learning.

1. Court Detection

The initial approach for this step was to detect the court boundaries automatically.

Even though there are several ways to approach this task, it is very difficult to develop a robust algorithm to identify the court lines because of occlusions, partial court views, bad lighting conditions, or shadows.

Nevertheless, an attempt was made using Hough line detection, following the methodology proposed in the publication Robust Camera Calibration for Sport Videos using Court Models.

This proved extremely challenging, and the results were not satisfactory. Therefore, in the MVP spirit of having a complete pipeline before introducing complex upgrades, a manual Court Detection class was implemented using OpenCV with a callback function and the Shapely library.

import classCourtDetection
# Court Detection instance
CD = classCourtDetection.CourtDetection()
# Court Detection (video_path, video_name, output_path and date_suffix are defined elsewhere in the project)
src_pts = CD.detectCourt(video_path, video_name, output_path, video_name.split('.')[0] + date_suffix)

The detectCourt() method samples a frame of the video and asks the user to delimit the court boundaries manually by setting 7 points: the 4 corners and the points where the “T” lines meet the walls and the door.

Fig. 2: Court Delimitation: after selecting the 7 coordinates, an image of the detected court (right) is exported — Image by Author

After the user has marked the 7 points in the court and confirmed that the resulting polygon is correct, a list with the coordinates is generated and the image with the detected court is exported.
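The article does not show how the court polygon is used downstream. One plausible use, assumed here rather than taken from the project's code, is to discard detections whose foot point lies off the court floor (for example, people standing behind the glass). Shapely's `Polygon(corners).contains(Point(x, y))` performs this check directly; the sketch below swaps in a standard-library ray-casting test so it runs without dependencies. The function names and corner coordinates are hypothetical, and the corners must be listed in order around the floor.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def inside_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at `point` and pointing right crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_on_court(foot_points: List[Point], court_corners: List[Point]) -> List[Point]:
    """Keep only the detections whose foot point lies on the court floor."""
    return [p for p in foot_points if inside_polygon(p, court_corners)]
```

The three “T” points lie on the edges of the floor polygon, so the four corners are enough for this test.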

2. Player Identification

In order to produce analytics for each player individually, it is necessary to distinguish between them.

To do this, and again in an MVP spirit, a Player Detection class was implemented using OpenCV with a callback function, Shapely for dealing with polygons and the YOLOv3 algorithm for player detection.

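from keras.models import load_model  # assumption: the YOLOv3 weights are saved as a Keras .h5 model
import classPlayerDetection
# Player Detection instance (master_path is defined elsewhere in the project)
yolo_model = load_model(master_path + '03 - Player Detection and Tracking/yolo_model.h5')
PDT = classPlayerDetection.PlayerDetection(yolo_model)
# Player Identification (dictPlayers is the per-player dictionary described in the Player Tracking section)
dictPlayers = PDT.identifyPlayers(video_path, video_name, dictPlayers, username, src_pts, output_path, video_name.split('.')[0] + date_suffix)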

The username argument of the identifyPlayers() method is a string with the name of the player to be identified.

As with court detection, a frame is sampled, the players are detected by the YOLO algorithm, and the user is asked to select the box that contains the player named in username.

Fig. 3: Player Detection and Identification (YOLOv3 custom implementation) — Image by Author

To implement the callback function, each box is first converted into a polygon using the Shapely library. The user then clicks inside one of these polygons (the players’ boxes), and the username is associated with that player. For each player, the pixel colour of their torso is stored.

The stored torso colours are then compared with the torso colours detected in each frame of the video to identify which player each box corresponds to. Note that the players must wear shirts of different colours for the method to work properly.
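A minimal sketch of the selection logic, assuming axis-aligned boxes given as `(xmin, ymin, xmax, ymax)` in pixels. For such boxes a simple bounds check gives the same answer as Shapely's polygon containment, so Shapely and the OpenCV window are left out; `box_under_click`, `torso_point` and `register_player` are hypothetical names. The torso point uses the same formula as the `torso` tracking coordinate in the Player Tracking section. In a real callback, the value stored for the player would be the frame pixel at that point (OpenCV frames are indexed `frame[y, x]` and stored in BGR order).

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def box_under_click(click: Tuple[int, int], boxes: List[Box]) -> Optional[int]:
    """Return the index of the first box containing the clicked pixel, or None."""
    x, y = click
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x1 <= x <= x2 and y1 <= y <= y2:
            return i
    return None


def torso_point(box: Box) -> Tuple[int, int]:
    """Horizontally centred pixel halfway between the top of the box and its centre."""
    x1, y1, x2, y2 = box
    xc = x1 + (x2 - x1) // 2
    yc = y1 + (y2 - y1) // 2
    return xc, y1 + (yc - y1) // 2


def register_player(username: str, click: Tuple[int, int], boxes: List[Box],
                    registry: Dict[str, Tuple[int, int]]) -> bool:
    """Associate `username` with the clicked box and remember its torso pixel."""
    idx = box_under_click(click, boxes)
    if idx is None:
        return False  # the click missed every box, so the user should try again
    registry[username] = torso_point(boxes[idx])
    return True
```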

Each player’s stored torso colour is compared with the torso colours found in each frame of the video using the Delta-E distance metric in the CIE Lab colour space.

The lower the Delta-E distance between two pixel colours, the more similar they are. Therefore, in each frame, the pixel colour of each detected player’s torso is extracted and compared with the stored value for each player, and each box’s coordinates are assigned to the player with the lower Delta-E distance.
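The article does not say which Delta-E formula was used (CIE76, CIE94 and CIEDE2000 are the common variants). The sketch below uses the simplest, CIE76, which is the Euclidean distance between two colours after converting them from sRGB to CIE L*a*b* with a D65 white point. The function names and sample colours are hypothetical. Pixels read with OpenCV are in BGR order, so they would need to be reversed before calling these functions.

```python
import math
from typing import Dict, Sequence, Tuple

Lab = Tuple[float, float, float]


def srgb_to_lab(rgb: Sequence[int]) -> Lab:
    """Convert an 8-bit sRGB colour to CIE L*a*b* (D65 white point)."""
    def linearize(c: float) -> float:
        # Undo the sRGB gamma curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(c1: Sequence[int], c2: Sequence[int]) -> float:
    """CIE76 Delta-E: Euclidean distance between two sRGB colours in L*a*b*."""
    return math.dist(srgb_to_lab(c1), srgb_to_lab(c2))


def closest_player(torso_rgb: Sequence[int], references: Dict[str, Sequence[int]]) -> str:
    """Name of the player whose stored torso colour has the smallest Delta-E."""
    return min(references, key=lambda name: delta_e76(torso_rgb, references[name]))
```

Switching to CIEDE2000 would only change `delta_e76`; the nearest-player assignment stays the same.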

3. Player Tracking

Once both players have been identified, the YOLOv3 algorithm is used to detect them in each frame of the video, and the Delta-E distance is used to determine which box corresponds to which player.

Fig. 4: Player Tracking (OpenCV + YOLOv3 custom implementation) — Image by Author

Even though Delta-E is much better suited than other metrics (such as cosine similarity) for comparing pixel colours, some boxes are still misclassified in certain frames. Since all the generated stats are based on court movement, some loss of accuracy was tolerable.

To track each player’s movements, several coordinates are extracted from each box:

# Calculate the player coordinates (between the feet) and the tracking coordinates
# box: one YOLO detection with pixel bounds xmin, ymin, xmax and ymax
x1 = int(box.xmin)
y1 = int(box.ymin)
x2 = int(box.xmax)
y2 = int(box.ymax)

# Centre of the box
xc = x1 + int((x2 - x1)/2)
yc = y1 + int((y2 - y1)/2)

# Player position: roughly between the feet, 25 px above the bottom of the box
player_pos = (xc - 1, y2 - 25)

# Tracking: reference points along the box
lower_center = (xc, y2)
mid_left = (x1, yc)
center = (xc, yc)
mid_right = (x2, yc)
torso = (xc, y1 + int((yc - y1)/2))  # halfway between the top of the box and its centre
upper_center = (xc, y1)
tracking_coords = [lower_center, mid_left, center, mid_right, torso, upper_center]

A dictionary is then created with the information of each player:

dictPlayers = {'player_A':
{'label': 'Player A', 'player_coords': [], 'player_torso': [], 'tracking_coords': [], '2d_court_coords': []},
'player_B':
{'label': 'Player B', 'player_coords': [], 'player_torso': [], 'tracking_coords': [], '2d_court_coords': []}}

This dictionary contains, for each player, the following keys:

label: string with player’s name (i.e. username)
player_coords: list of arrays of player’s position coordinates in each frame.
player_torso: list of arrays of player’s torso coordinates in each frame.
tracking_coords: list of arrays of player’s box coordinates in each frame.
2d_court_coords: list of arrays of player’s position coordinates in each frame, mapped onto a 2D court (for later analysis).

For each processed frame, the Delta-E distance between each detected torso colour and the stored values determines which player a box belongs to, and that box’s coordinates are appended to the dictPlayers dictionary under the corresponding player key.
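A minimal sketch of that per-frame update, assuming each detection has already been reduced to a dict holding its torso colour in L*a*b* (`torso_lab`, a hypothetical key) plus the coordinates computed in the tracking code above. The distance is the CIE76 Delta-E, i.e. the plain Euclidean distance in L*a*b*, which is an assumption about the metric variant; the function name is also hypothetical.

```python
import math
from typing import Dict, List, Sequence


def assign_and_record(dict_players: Dict[str, dict],
                      reference_lab: Dict[str, Sequence[float]],
                      detections: List[dict]) -> None:
    """Append one frame of detections to the per-player lists in dict_players.

    Each detection goes to the player whose stored torso colour (in L*a*b*)
    is closest to the detection's torso colour (CIE76 Delta-E).
    """
    for det in detections:
        key = min(reference_lab,
                  key=lambda k: math.dist(det['torso_lab'], reference_lab[k]))
        dict_players[key]['player_coords'].append(det['player_pos'])
        dict_players[key]['player_torso'].append(det['torso'])
        dict_players[key]['tracking_coords'].append(det['tracking_coords'])
```

Because every box is matched independently, two boxes can end up under the same key in one frame, which is the kind of misclassification mentioned above. Frames in which a player is not detected also leave the two players' lists with different lengths, which matters for frame-by-frame comparisons such as the distance between players.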

4. Court Mapping

With both players identified and tracked throughout the whole video, each player’s position was estimated as the point approximately between their feet: (xc - 1, y2 - 25).

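Once the players’ positions in each video frame were known, a homography transform was applied to map them onto a 2D court. A homography is a projective transformation between two planes: besides rotation and translation, it also accounts for scaling and perspective distortion, which is what turns the camera’s angled view of the court floor into a top-down view.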

Fig. 5: Explanation of Homography Transform (image from ResearchGate)

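The transform is estimated by matching several points on the court as it appears in the video to the corresponding points on a 2D court diagram. Four correspondences, no three of them collinear, determine the 3×3 homography matrix up to scale; with more points, a best-fit matrix is computed. With this matrix, any point on the court floor in the video can be mapped onto the 2D court: the point is written in homogeneous coordinates (x, y, 1), multiplied by the matrix, and divided by the resulting third component.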

In our solution, the seven points marked by the user during court detection are matched to the same seven points on the 2D court. The following GIF shows the resulting transformation.

Fig. 6: Projected 2D court movement — Image by Author
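In OpenCV, this step is typically done with `cv2.findHomography(src, dst)`, which returns the matrix and an inlier mask, and `cv2.perspectiveTransform` to map points. To show what those calls compute, the sketch below estimates the homography with the direct linear transform (DLT) in NumPy and maps points through it in homogeneous coordinates. It is an illustration under those assumptions, not the project's code; the function names are hypothetical.

```python
import numpy as np


def estimate_homography(src_pts, dst_pts) -> np.ndarray:
    """Estimate the 3x3 homography mapping src_pts onto dst_pts (DLT).

    Needs at least 4 correspondences, at least four of them in general
    position (no three collinear). With more points, such as the 7 court
    points, the SVD gives the least-squares solution of A h = 0.
    """
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector of A with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale so that H[2, 2] == 1


def apply_homography(H: np.ndarray, pts) -> np.ndarray:
    """Map (x, y) points with H: append 1, multiply, divide by the third component."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

For noisy pixel coordinates, normalizing the points before the SVD (Hartley normalization) or using `cv2.findHomography` improves numerical stability.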

5. Stats Generation

After the whole video has been processed, the complete sequence of court movement data is stored in the dictPlayers dictionary. This data is then used to generate insights about each player.

To start with, a heatmap of both players’ court movement throughout the rally was created. This heatmap can be used to assess a player’s court control.
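The article does not describe how the heatmap is computed. A minimal sketch, assuming the 2D court coordinates have been converted to metres on a 6.4 m by 9.75 m singles court, with x across the court and y measured from the front wall: count the share of frames falling in each cell of a coarse grid. The grid size and function name are hypothetical; the resulting grid can be rendered with any plotting library, for example Matplotlib's `imshow`.

```python
from typing import List, Sequence, Tuple

COURT_WIDTH_M = 6.4    # singles court width
COURT_LENGTH_M = 9.75  # front wall to back wall


def court_heatmap(positions: Sequence[Tuple[float, float]],
                  nx: int = 8, ny: int = 12) -> List[List[float]]:
    """Fraction of all frames spent in each cell of an nx-by-ny grid.

    Returns grid[row][col], with row indexing y (distance from the front wall)
    and col indexing x. Positions outside the court are ignored, so the cells
    sum to 1 only when every position lies on the court.
    """
    grid = [[0.0] * nx for _ in range(ny)]
    if not positions:
        return grid
    for x, y in positions:
        if 0 <= x <= COURT_WIDTH_M and 0 <= y <= COURT_LENGTH_M:
            # Points exactly on the far walls go into the last cell
            col = min(int(x / COURT_WIDTH_M * nx), nx - 1)
            row = min(int(y / COURT_LENGTH_M * ny), ny - 1)
            grid[row][col] += 1
    total = len(positions)
    return [[count / total for count in row] for row in grid]
```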

The basic strategy of squash is to stay as close to the center of the court (the ‘T’ zone) as possible, as from there you can hit the most effective shots.

Fig. 7: Player’s heatmap based on court movement — Image by Author

It can be seen that Matthew did a better job of staying close to the ‘T’. This not only gives him a better position for his next shots, but also allows him to use his energy more efficiently by minimizing his movement on the court.

Fig. 8: T-Control Score calculation for Nick Matthew — Image by Author

To quantify this and obtain an objective statistic for comparing both players, a ‘T-Control Score’ was defined: three ellipses are set around the ‘T’ zone, and the player is awarded more points for stepping inside the smaller ones than for stepping inside the bigger ones.

The ‘T-Control Score’ is the ratio of the total points earned to the number of points that would have been earned if all frames had been in the two smallest ellipses.

The weights for the awarded points were set empirically and tested until satisfactory results were achieved across different videos and matches.
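The exact ellipse sizes, weights and denominator are not published, so the sketch below only illustrates the scoring scheme under explicit assumptions: the ellipses are concentric and axis-aligned around the ‘T’ on the 2D court, each frame scores the weight of the smallest ellipse that contains the player, and `reference_points` stands in for the per-frame points used in the denominator (the article defines it via “all frames in the two smallest ellipses”). All names are hypothetical.

```python
from typing import Sequence, Tuple

Ellipse = Tuple[float, float]  # (semi-axis across the court, semi-axis along the court)


def t_control_score(positions: Sequence[Tuple[float, float]],
                    t_point: Tuple[float, float],
                    ellipses: Sequence[Ellipse],
                    weights: Sequence[float],
                    reference_points: float) -> float:
    """Score court control around the 'T' with concentric axis-aligned ellipses.

    ellipses are ordered from smallest to largest, and weights[i] is the number
    of points awarded for a frame whose smallest enclosing ellipse is ellipses[i].
    Returns total points / (number of frames * reference_points).
    """
    tx, ty = t_point
    total = 0.0
    for x, y in positions:
        for (a, b), w in zip(ellipses, weights):
            if ((x - tx) / a) ** 2 + ((y - ty) / b) ** 2 <= 1:
                total += w
                break  # only the smallest enclosing ellipse scores
    return total / (len(positions) * reference_points) if positions else 0.0
```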

In this rally, Nick Matthew got a score of 80.97%, whereas Miguel Angel Rodriguez got 63.22%. These scores appear consistent with each player’s heatmap.

Another metric calculated from the players’ movement around the court was the Zonal Coverage: the percentage of frames in which the player was in the front, mid or back zone of the court.
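A minimal sketch of the Zonal Coverage calculation, assuming 2D court coordinates with y measured from the front wall. Where the zone boundaries are drawn is not stated in the article, so `front_limit` and `back_limit` are hypothetical parameters (splitting the 9.75 m court into equal thirds would give 3.25 m and 6.5 m), as is the function name.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def zonal_coverage(positions: Sequence[Tuple[float, float]],
                   front_limit: float, back_limit: float) -> Dict[str, float]:
    """Percentage of frames spent in the front, mid and back zones.

    y is the distance from the front wall: y < front_limit is 'front',
    y >= back_limit is 'back', and anything in between is 'mid'.
    """
    zones = Counter()
    for _, y in positions:
        if y < front_limit:
            zones['front'] += 1
        elif y < back_limit:
            zones['mid'] += 1
        else:
            zones['back'] += 1
    n = len(positions)
    return {z: 100 * zones[z] / n if n else 0.0 for z in ('front', 'mid', 'back')}
```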

Fig. 9: Zonal Coverage of Nick Matthew — Image by Author

Each of these zones requires different types of shots, so identifying and understanding how each player uses the court can help improve their game strategy.

It can be seen that Matthew spent more time than Rodriguez in the ‘Attack Zone’. This enabled him to play more volleys and control the pace of the game.

Even though Rodriguez won this particular rally, in the long run a good squash player should always try to maximize their time in the ‘Mid-Court’, as it is the most strategic position. Whoever controls the ‘T’ sets the pace of the match.

Another interesting finding is that in 70% of the frames both players were on the left side of the court, which means the rally had a strong backhand component.

In a single rally this metric may not be very relevant, but when analyzing an entire match, or even a series of an opponent’s matches, it could help develop an optimal game strategy.
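This share follows directly from the frame-aligned 2D coordinates of both players. A minimal sketch, assuming x is measured from the left side wall as seen when facing the front wall and that both players' lists are aligned frame by frame; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def both_on_left_share(pos_a: Sequence[Tuple[float, float]],
                       pos_b: Sequence[Tuple[float, float]],
                       court_width: float = 6.4) -> float:
    """Fraction of frames in which both players are on the left half of the court."""
    half = court_width / 2
    frames = list(zip(pos_a, pos_b))
    if not frames:
        return 0.0
    both_left = sum(1 for (xa, _), (xb, _) in frames if xa < half and xb < half)
    return both_left / len(frames)
```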

Finally, the average distance between the players across all frames was calculated. Professional players tend to play closer to each other than amateurs.

This is because rallies between professionals are much faster, and any space a player gives away is exploited by the opponent to take control of the rally.

In the Matthew and Rodriguez rally, their average distance was 1.48 meters, whereas in several amateur videos processed it ranged from 1.96 to 2.89 meters.
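A minimal sketch of the average distance calculation, assuming frame-aligned 2D court coordinates for both players and a known scale factor from court units to metres (for instance, a hypothetical top-down court image 640 px wide for the 6.4 m court gives 100 px per metre). The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def average_distance(pos_a: Sequence[Tuple[float, float]],
                     pos_b: Sequence[Tuple[float, float]],
                     units_per_metre: float = 1.0) -> float:
    """Mean Euclidean distance between the two players over all shared frames, in metres.

    units_per_metre converts the 2D court coordinates to metres
    (e.g. pixels per metre of a top-down court image).
    """
    dists = [math.dist(a, b) for a, b in zip(pos_a, pos_b)]
    return sum(dists) / len(dists) / units_per_metre if dists else 0.0
```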

As can be seen, all these stats can be very useful for improving a player’s game, as well as for analyzing an opponent’s game and setting a match strategy accordingly.

Conclusions

As stated at the beginning of the article, this project will keep changing as each step of the pipeline is upgraded. The first objective was to develop a functional MVP with clear and actionable insights.

Several upgrades can be made, especially in the Court Detection and Player Identification steps. Ideally, the callback functions would be removed and the whole pipeline fully automated.

Furthermore, adding other steps such as ball tracking could significantly improve the game analytics.

Clearly there is still a lot of work to do, but I am happy with the progress I have made so far in starting to merge two of my passions: sports and analytics.

The code used for this project can be found on my GitHub.