Let’s admit it: some technologies are just more interesting and fashionable than others. One field that is both technologically fascinating and potentially able to radically change our lives is self-driving cars. They are coming. In a way, they are already here. They are attracting massive investments. Even Apple seems to be considering making an iCar. They are, simply, in our future.

The first time I saw a self-driving car was in 2010(!), in Tokyo, in the Toyota showroom. You could try it as a passenger for around 1 euro, so I did. The car was going very slowly and was apparently dependent on sensors embedded in the road. It also seemed able to detect when the car in front of mine stopped.

A Toyota self-driving car in front of my self-driving car (Tokyo, 2010)

Clearly, technology has improved a lot, and we can now rely more on computer vision and advanced sensors, like lidar and radar.
In Oslo, Norway, where I live, there is a self-driving bus going around as part of a trial. It goes slowly and carries only a few people, there is always a person on board for safety, and panels had to be placed along the road to help the lidar detect it, but it is still a starting point.

A classical task of self-driving cars is lane detection, which is the topic of this post.

The following content is mostly an extract from the book Hands-On Vision and Behavior for Self-Driving Cars, which I wrote for Packt Publishing with the help of Krishtof Korda.

Thresholding

The code requires Python 3.7, OpenCV and NumPy.

While it is easy for a human to follow a lane, it is not so simple for a computer. One problem is that an image of the road contains too much information. We need to simplify it, keeping only the parts of the image that we are interested in. We will only analyze the part of the image with the lane, but we also need to separate the lane markings from the rest of the image, for example, using color selection. After all, the road is typically black or dark, and lane markings are usually white or yellow.

The image can, of course, be decomposed into three channels: red, green, and blue. As we know, OpenCV stores the image as BGR (meaning, the first byte is the blue channel, not the red channel), but conceptually, there is no difference.
These are the three channels once separated:

BGR channels: blue, green, and red channels
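As a minimal sketch of the channel separation, assuming a small synthetic array in place of a real road frame (the pixel values are made up for illustration), the three channels can be obtained with plain NumPy slicing:

```python
import numpy as np

# Synthetic 4x6 BGR image standing in for a road frame (an assumption;
# in practice the image would come from cv2.imread() or a video frame).
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, :, 0] = 10    # blue
img[:, :, 1] = 120   # green
img[:, :, 2] = 250   # red

# OpenCV-style images are height x width x 3, stored in B, G, R order,
# so index 0 is blue and index 2 is red.
blue = img[:, :, 0]
green = img[:, :, 1]
red = img[:, :, 2]

print(blue.shape, int(blue[0, 0]), int(green[0, 0]), int(red[0, 0]))
```

With a real image, cv2.split(img) returns the same three channels in the same B, G, R order.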

They all seem fine. We can try to separate the lane by selecting the white pixels. As the white color is (255, 255, 255), we could leave some margin and select the pixels with a value of 180 or more. To do this, we create a black image with the same size as the selected channel, then paint white all the pixels that are at 180 or above in the original channel:

img_threshold = np.zeros_like(channel)
img_threshold[channel >= 180] = 255

This is how the output appears:

BGR channels: blue, green, and red channels, threshold above 180
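The two lines above can be wrapped in a small function to make them easier to try on different channels. This is only a sketch: threshold_channel and its min_value parameter are hypothetical names, and the tiny input array is made up to show which pixels pass the threshold.

```python
import numpy as np


def threshold_channel(channel, min_value=180):
    """Return a binary image: 255 where channel >= min_value, 0 elsewhere."""
    img_threshold = np.zeros_like(channel)       # black image, same size and dtype
    img_threshold[channel >= min_value] = 255    # paint the bright pixels white
    return img_threshold


# Made-up 2x3 channel, with values just below and just above the threshold.
channel = np.array([[0, 179, 180],
                    [200, 255, 90]], dtype=np.uint8)
binary = threshold_channel(channel)
print(binary)
```

Only the pixels at 180 or above (180, 200 and 255 here) end up white; everything else stays black.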

There are other color spaces (explained in the book) that you can try, like HLS, HSV, LAB and YCbCr.
After some experiments, it seems that the green channel can be used for edge detection, and the L channel from the HLS space can be used for additional thresholding, so we’ll stick to these. These settings should also work for a yellow line, while other colors might require different thresholds.

Perspective correction

Let’s take a step back and start simple. The easiest case that we can have is with a straight lane. Let’s see how it looks:

If we were flying over the road, and watching it from a bird’s eye view, the lanes would be parallel, but in the picture, they are not, because of the perspective.
The perspective depends on the focal length of the lens (lenses with a shorter focal length show a stronger perspective) and the position of the camera. Once the camera is mounted on a car, the perspective is fixed, so we can take it into consideration and correct the image.
OpenCV has a method to compute the perspective transformation: getPerspectiveTransform().
It takes two parameters, both arrays of four points, identifying the trapezoid of the perspective. One array is the source and one array is the destination. This means that the same method can be used to compute the inverse transformation, by just swapping the parameters:

perspective_correction = cv2.getPerspectiveTransform(src, dst)
perspective_correction_inv = cv2.getPerspectiveTransform(dst, src)

We need to select the area around the lanes, plus a small margin:

Trapezoid with the region of interest around the lanes

In our case, the destination is a rectangle (as we want to make it straight). The figure above shows the green trapezoid (the src variable in the previous code) with the original perspective and the white rectangle (the dst variable in the previous code), which is the desired perspective. Please note that, for clarity, they have been drawn as overlapping, but the coordinates of the rectangle passed as a parameter are shifted, as if it started at X coordinate 0.
We can now apply the perspective correction and get our bird’s eye view:

cv2.warpPerspective(img, perspective_correction, warp_size,
                    flags=cv2.INTER_LANCZOS4)

The warpPerspective() method accepts four parameters:
• The source image.
• The transformation matrix, obtained from getPerspectiveTransform().
• The size of the output image. In our case, the width is the same as the original image, but the height is only the height of the trapezoid/rectangle.
• Some flags, to specify the interpolation. INTER_LINEAR is a common choice, but I recommend experimenting and giving INTER_LANCZOS4 a try.
This is the result of the warp using INTER_LINEAR:

This is the result using INTER_LANCZOS4:

They are very similar, but a closer look shows that the interpolation performed with the LANCZOS4 resampling is sharper. We will see later that at the end of the pipeline, the difference is significant.
What is clear in both images is that our lines are now vertical, which intuitively could help us.

In the next part, we will see how to leverage this image.