Computer vision might sound like science fiction, but it is already a reality. From simple object detection to facial emotion recognition, its applications are already part of our daily lives.

It might sound complex (and it is), but you can implement a system to identify a person with a few lines of code.

There are many options that you can use in your program, such as OpenCV and TensorFlow, without having to know much about deep learning or fancy algorithms.

What is AWS Rekognition?

Amazon Rekognition is one of these options. It is a service from AWS that makes it easy to add image and video analysis to your application. Its features include:

Object detection: you can detect objects and scenes inside an image or a video. You can even train a model with your own custom objects.
Facial recognition: you can detect and recognize faces inside an image, whether for user verification or to identify a celebrity.
Text detection: it can extract text content from images and videos, so you can enable OCR in your application.

It is available through the AWS SDK, so you can integrate it with your application in a few steps. Let’s see how.

Note: this tutorial was tested with Python version 3.8 and Boto3 version 1.17.

Configuration

We need to set up credentials to execute any operation with the AWS SDK. If you already have them, you can skip this part.

We are going to create a user to access the Rekognition service. For that, open the IAM console, go to the Users menu and click Add User:

By author.

Type your username and select the Programmatic access option.

By author.

The next step is to set the permissions for the user. You have a couple of options here, such as adding the user to a group or copying permissions from another user. To access the Rekognition service, the user needs one of the Rekognition permissions. In our case, we choose AmazonRekognitionFullAccess.

By author.

Note: if you want AWS Rekognition to access a file directly from an S3 bucket, you will need to add S3 read access as well. Check how to do it here.

Review your data and click the Create User button. A success message should appear, alongside your Access Key ID and Secret Access Key. Copy them both and put them in a secure file (or download the .csv file).

Next we need to provide these keys so Boto3 can access the AWS services. For that, we create a file named ~/.aws/credentials:

[default]
aws_access_key_id = <your_access_key>
aws_secret_access_key = <your_secret_key>
region=us-east-1

Note: you can change the location where Boto3 looks for this file by setting the environment variable AWS_SHARED_CREDENTIALS_FILE (AWS_CONFIG_FILE changes the location of the separate config file, ~/.aws/config). There are other ways to give Boto3 your credentials, but they are for another time. On Windows, the default location is %USERPROFILE%\.aws\credentials.
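Before calling any AWS service, it can help to check that the credentials file is where Boto3 is expected to look and that it has both keys. The sketch below is only an illustration: the helper names credentials_path and missing_keys are made up for this example, and it assumes the AWS_SHARED_CREDENTIALS_FILE environment variable is what overrides the default location of the file.

```python
import configparser
import os
from pathlib import Path


def credentials_path() -> Path:
    """Return the credentials file location Boto3 is expected to use."""
    # Assumption: AWS_SHARED_CREDENTIALS_FILE overrides the default path.
    override = os.environ.get('AWS_SHARED_CREDENTIALS_FILE')
    if override:
        return Path(override)
    return Path.home() / '.aws' / 'credentials'


def missing_keys(path: Path, profile: str = 'default') -> list:
    """List the required keys that are absent from the given profile."""
    required = ['aws_access_key_id', 'aws_secret_access_key']
    # Interpolation is disabled so special characters in keys are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # yields no sections if the file does not exist
    if not parser.has_section(profile):
        return required
    return [key for key in required if not parser.has_option(profile, key)]
```

Calling missing_keys(credentials_path()) returns an empty list when the default profile has both keys, and the names of the missing ones otherwise.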

We can then install the Boto3 package:

pip install boto3

We also install the package Pillow to manipulate the images:

pip install Pillow

Detecting a label

For AWS Rekognition, a label (or a tag) is anything that it can identify inside an image or a video, like objects (person, tree), scenes (beach) or concepts (outdoor). If you are using a video, it can also detect activities (like swimming).

To detect all the labels inside an image, we first need to get the Rekognition client.

import boto3
rek_client = boto3.client('rekognition')

After that, you use the method detect_labels() to get a response with all the labels inside your image, passing the image in the Image parameter.

You have two options:

Upload your image bytes through the Bytes parameter.
Indicate the Bucket and Name of your object inside S3Object, if the image is stored in an S3 bucket.
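As a minimal sketch, the two options correspond to two shapes of the Image parameter. The helper names below (image_from_file and image_from_s3) are hypothetical, and so are the bucket and object names in the usage note.

```python
def image_from_file(path):
    """Build the Image argument from a local file, sent as raw bytes."""
    with open(path, 'rb') as f:
        return {'Bytes': f.read()}


def image_from_s3(bucket, key):
    """Build the Image argument for an object already stored in S3."""
    return {'S3Object': {'Bucket': bucket, 'Name': key}}
```

Either result can be passed as the Image argument, for example rek_client.detect_labels(Image=image_from_s3('my-bucket', 'photos/computer.jpg')). With the S3 option, the user also needs read access to the bucket.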

Let’s take the following photo of a computer as an example.

Photo by Kari Shea on Unsplash.

To detect all the labels, we pass the image to the detect_labels method.

import boto3

rek_client = boto3.client('rekognition')

with open('computer.jpg', 'rb') as image:
    response = rek_client.detect_labels(Image={'Bytes': image.read()})

labels = response['Labels']
print(f'Found {len(labels)} labels in the image:')
for label in labels:
    name = label['Name']
    confidence = label['Confidence']
    print(f'> Label "{name}" with confidence {confidence:.2f}')

This method returns a dictionary with a list of labels under the key Labels. Each label is a dictionary that can contain the following fields:

Name: name of the label
Confidence: level of confidence that the image contains the label.
Instances: contains the bounding box for each instance of the detected object inside the image.
Parents: a list with all ancestors of this label. For instance, if you have a car, this could return vehicle and transportation as its parents.
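To make the fields concrete, here is a small sketch that summarizes one label. The sample_label dictionary is hand-written for illustration (its values are not real Rekognition output), and describe_label is a hypothetical helper.

```python
def describe_label(label):
    """Summarize the fields of one entry from response['Labels']."""
    name = label['Name']
    confidence = label['Confidence']
    # Each parent is a dictionary with a Name key.
    parents = [parent['Name'] for parent in label.get('Parents', [])]
    parent_text = ', '.join(parents) or 'none'
    # Each instance carries a BoundingBox for one detected object.
    instances = label.get('Instances', [])
    return (f'{name} ({confidence:.1f}%), parents: {parent_text}, '
            f'instances: {len(instances)}')


# Illustrative values only, shaped like a single detect_labels entry.
sample_label = {
    'Name': 'Car',
    'Confidence': 98.5,
    'Instances': [
        {'BoundingBox': {'Left': 0.1, 'Top': 0.2, 'Width': 0.3, 'Height': 0.4},
         'Confidence': 98.5},
    ],
    'Parents': [{'Name': 'Vehicle'}, {'Name': 'Transportation'}],
}

print(describe_label(sample_label))
```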

If we run the code above with the image, we get the following result:

Found 14 labels in the image:
> Label "Pc" with confidence 99.93
> Label "Computer" with confidence 99.93
> Label "Electronics" with confidence 99.93
> Label "Laptop" with confidence 99.77
> Label "Computer Keyboard" with confidence 99.55
> Label "Hardware" with confidence 99.55
> Label "Keyboard" with confidence 99.55
> Label "Computer Hardware" with confidence 99.55
> Label "Wood" with confidence 81.52
> Label "Furniture" with confidence 78.39
> Label "Table" with confidence 77.92
> Label "Plywood" with confidence 77.74
> Label "Desk" with confidence 72.95
> Label "Tabletop" with confidence 61.14

As you can see, it returns many labels for the same object (such as computer, laptop, and hardware).
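If the overlapping labels are too noisy, one possible approach (an assumption of this sketch, not something Rekognition does for you) is to use the Parents field to keep only the most specific labels, dropping every label that appears as a parent of another one. The helper name leaf_labels and the sample data are made up for illustration.

```python
def leaf_labels(labels):
    """Keep only labels that are not listed as a parent of another label."""
    parent_names = {
        parent['Name']
        for label in labels
        for parent in label.get('Parents', [])
    }
    return [label for label in labels if label['Name'] not in parent_names]


# Illustrative data only, shaped like response['Labels'].
sample_labels = [
    {'Name': 'Electronics', 'Parents': []},
    {'Name': 'Computer', 'Parents': [{'Name': 'Electronics'}]},
    {'Name': 'Pc', 'Parents': [{'Name': 'Computer'}, {'Name': 'Electronics'}]},
    {'Name': 'Laptop', 'Parents': [{'Name': 'Pc'}, {'Name': 'Computer'},
                                   {'Name': 'Electronics'}]},
]

print([label['Name'] for label in leaf_labels(sample_labels)])
```

In this sample only Laptop survives, since every other label is an ancestor of it.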

We can use the Instances information to draw boxes around the object it detected, so we can actually see the object in the image.

import io

import boto3
from PIL import Image, ImageDraw, ImageFont

file_name = 'computer.jpg'

# Get Rekognition client
rek_client = boto3.client('rekognition')

# Read image bytes
with open(file_name, 'rb') as im:
    im_bytes = im.read()

# Upload image to AWS
response = rek_client.detect_labels(Image={'Bytes': im_bytes})

# Open the image and load a font to draw texts
image = Image.open(io.BytesIO(im_bytes))
font = ImageFont.truetype('arial.ttf', size=80)
draw = ImageDraw.Draw(image)

# Get all labels
w, h = image.size
for label in response['Labels']:
    name = label['Name']
    # Draw a box for each instance, if any
    for instance in label['Instances']:
        bbox = instance['BoundingBox']
        x0 = int(bbox['Left'] * w)
        y0 = int(bbox['Top'] * h)
        x1 = x0 + int(bbox['Width'] * w)
        y1 = y0 + int(bbox['Height'] * h)
        draw.rectangle([x0, y0, x1, y1], outline=(255, 0, 0), width=10)
        draw.text((x0, y1), name, font=font, fill=(255, 0, 0))

image.save('labels.jpg')

We use the BoundingBox information available inside each label instance to draw a rectangle around the identified object. The coordinates are expressed as ratios (values between 0 and 1) of the total width and height of the image, so we need to multiply them by the image size to get actual pixel positions.
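The conversion can be isolated in a small function, which makes it easy to check by hand. This is a sketch of the same arithmetic used in the code above; the name bbox_to_pixels and the sample values are only illustrative.

```python
def bbox_to_pixels(bbox, image_width, image_height):
    """Convert a BoundingBox of ratios into pixel corners (x0, y0, x1, y1)."""
    x0 = int(bbox['Left'] * image_width)
    y0 = int(bbox['Top'] * image_height)
    x1 = x0 + int(bbox['Width'] * image_width)
    y1 = y0 + int(bbox['Height'] * image_height)
    return x0, y0, x1, y1


# A box starting at 25% of the width and 50% of the height of a
# 1000x800 image, covering half the width and a quarter of the height.
box = {'Left': 0.25, 'Top': 0.5, 'Width': 0.5, 'Height': 0.25}
print(bbox_to_pixels(box, 1000, 800))  # (250, 400, 750, 600)
```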

Then we get a result similar to this.

By author.

Note: you will need a font file to write text with the Pillow library, as in the source code above. You can download one from here.

Conclusion

Object detection is a small part of computer vision. Although it might sound complex or even impossible, today it is quite simple to integrate it with your application.

There are some good options out there for you to use, and AWS Rekognition is one of them. It has many more features than just object detection.

With a few extra steps, you can integrate it with your application and use it without knowing any fancy algorithms or writing complex code.

If you’re already using AWS services, you might consider using AWS Rekognition as well.

Thank you for reading.