Understanding the Amazon Rainforest with Multi-Label Classification + VGG-19, Inceptionv3, AlexNet & Transfer Learning

The Power of Artificial Intelligence for Ecosystem Preservation


Tenzin Migmar


Table of Contents

i. Introduction

  • State of the Amazon Ecosystem
  • The potential of Artificial Intelligence

ii. Understanding the Dataset

iii. Computer vision; Conv-net architectures

  • Multi-label Classification
  • VGG-19
  • AlexNet
  • Inceptionv3
  • Transfer Learning

iv. Amazon Satellite Chip Imagery ML Classification Model

  • Data Visualization / Exploratory Analysis
  • Preprocessing
  • Model
  • Comparison of Model Performance

v. Closing Notes

The Amazon Rainforest, lush with greenery and biodiversity, houses an abundance of the flora and fauna found here on Earth. Dense and teeming with life, the Amazon’s functions aren’t limited to being a fascinating biome. The Amazon acts as a critical carbon sink, sequestering in excess of two billion tons of carbon — about 5% of annual emissions — out of the air.

The Amazon also provides a repository of natural resources: transportation and freshwater through the Amazon river, valuable minerals, contributions to the global food supply, and medicinal plants with special properties that can be used to treat diseases.

To quantify the impact of Amazonian flora on medicine: 25% of all drugs used today are sourced from the Amazon, 41 species of plants in the Brazilian Amazon can work as treatments for malaria, and 70% of plants found to have anti-cancer properties come from the Amazon rainforest. Nature’s pharmacy lies within the Amazon rainforest.

The Amazon is by design, ill-fated to human greed.

In a state of siege from farming, ranching, urban development, logging, and mining, the utility of the Amazon is being drained. The rainforest’s resources, beauty, and wonderment are being wrung out of its 5.5 million km² of exuberant life, and all who depend on the Amazon are being hung out to dry.

Even more disquieting, the destruction of the Amazon is feeding tonnes more carbon into the looming jaws of climate change and aggravating a positive feedback loop: deforestation raises temperatures in the Amazon, drying out the vegetation and in turn nurturing and promoting more forest fires.

In order to mitigate further damage to the Amazon, it will be critical to:

  • Heighten regulations and improve or draft policies that align with the best interests of the ecosystem.
  • Automate monitoring of human activity and operations. (This is where artificial intelligence comes into the equation.)
  • Understand the root causes and issues. Every solution begins with understanding the problem. Unshrouding the secrecy behind the Amazon rainforest’s land-use operations is a good place to mark the starting line in solving the holistic issue of rainforest destruction.

Harnessing the power of artificial intelligence, we can use satellite imagery and computer vision to better understand the human activities occurring within the Amazon and increase regulation accordingly.

Understanding the Dataset

The dataset is provided by Planet and poses a multi-label classification problem in which satellite image chips must be labeled according to their atmospheric conditions and land use.

The dataset includes a train_classes.csv file, which is where the labels are located; the image chips are found under train-jpg/test-jpg.

The image chips were sourced from 4-band satellites in sun-synchronous orbit and International Space Station orbit.

The labels are not mutually exclusive and can be classified as: artisinal mine, conventional mine, partly-cloudy, habitation, bare ground, blooming, selective logging, road, cultivation, water, agriculture, haze, primary, blowdown, cloudy, clear, and/or slash-burn.

Computer Vision; Conv-Net Architectures

Computer vision is a subset of machine learning, falling under the broad domain of artificial intelligence, that allows computers to see and build a high-level understanding of images.

“At an abstract level, the goal of computer vision problems is to use the observed image data to infer something about the world.”
Computer Vision: Models, Learning, and Inference, 2012.

The processes of computer vision mimic the human visual system; the field aims to teach computers the intricacies of taking in visual input data, processing it, and outputting results. As humans, we do this on the fly without giving it much thought or effort but it becomes a much more complex task when trying to gift this ability to machines.

In recent years, the field has been transformed by Convolutional Neural Networks: a class of neural networks that are specifically designed for visual imagery applications.

Convolutional Neural Networks are made up of two main components:

  1. Feature Extraction

This is the segment of the network where the model applies a series of operations to the image data with the main goal of deriving features* from images.

The layers of the convolutional neural network that collaborate to perform these operations and extract the features are:

  • Convolution → The first layer in feature extraction. The convolution layer applies filters to the input image in order to learn the image features.
  • Pooling → Output of the feature maps from the convolution layer will then be downsampled. Hence, the primary function of the pooling layer is for spatial dimensionality reduction. Here, the pooling operation is applied so that only the most salient features are highlighted in the feature maps.
  • Flattening → Layer before the fully-connected layer that flattens the pooled feature map input into a one-dimensional array; this output will then be passed to the fully-connected layer.

* Patterns, lines, characteristics of an object that allow us to identify it.
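As a minimal illustration of what the convolution and pooling layers compute, here is a NumPy sketch; the toy 6×6 image and the vertical-edge filter are illustrative only, not taken from the dataset:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def maxpool2d(fmap, size=2):
    """Non-overlapping max pooling: keeps the most salient value per window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1., 0., -1.]] * 3)   # simple vertical-edge filter
features = conv2d(image, edge)         # (4, 4) feature map
pooled = maxpool2d(features)           # (2, 2) downsampled map
print(features.shape, pooled.shape)    # (4, 4) (2, 2)
```

The convolution slides the filter over the image to produce a feature map, and pooling halves each spatial dimension while retaining the strongest responses — the same two operations a real convolutional layer and pooling layer perform, just without learned filters.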

2. Classification

This portion of the Convolutional Neural Network is responsible for the classification of the images into one or more of the predetermined classes.

This varies depending on the task: regular multi-class image classification means assigning the entire image to one of the classes, while semantic segmentation works on a more granular level, classifying every pixel within the image into a class. Again, this depends on the data.

Layers of the Convolutional Neural Network that are used here:

  • Fully connected layer → The final layer in a convolutional neural network. The fully connected layer generates the final prediction of an input image based on the probabilities of each of the classes.

As previously mentioned, the dataset poses a multi-label classification problem. Unlike multi-class classification, where each image is grouped under exactly one of the classes, in multi-label classification some images may fall under only one of the target values while others may carry two, three, four, or even all seventeen.
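As a concrete sketch of what multi-label targets look like, space-separated tag lists can be turned into binary indicator vectors — one column per label, with several 1s allowed per row. The tags below are a toy subset of the dataset’s seventeen; scikit-learn’s MultiLabelBinarizer is one way to do the conversion:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each image's tags, already split on spaces (toy subset of the 17 labels).
tags = [
    ["haze", "primary"],
    ["agriculture", "clear", "primary", "water"],
    ["clear", "primary"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)   # one row per image, one column per label
print(mlb.classes_)           # ['agriculture' 'clear' 'haze' 'primary' 'water']
print(Y)
# [[0 0 1 1 0]
#  [1 1 0 1 1]
#  [0 1 0 1 0]]
```

Because the rows are not one-hot, the final layer of a multi-label network uses a sigmoid per class with binary cross-entropy, rather than a softmax.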

Full disclosure: prior to this, I’ve never worked with satellite image data. Curious about which convolutional neural network architectures work best on satellite imagery, I decided to experiment with some common convolutional neural network architectures and compare the performances of each model.

Convolutional neural network architectures used:

  1. VGG-19

VGG-19 is a convolutional neural network architecture developed by the Visual Geometry Group; it became well known in the computer vision community after being named runner-up in the 2014 ILSVRC classification task. It is often associated with VGG-16, the difference being that VGG-19 has 19 layers with trainable weights rather than 16, hence the name.

The highlight of the VGG-19 architecture:

“using an architecture with very small ( 3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers.” — Very Deep Convolutional Networks For Large-Scale Image Recognition

The original authors go more in-depth on the configurations of the architecture in their original research paper here.

2. AlexNet

AlexNet, authored by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, is best known for winning the ILSVRC 2012 competition by an immense error margin and can be credited with stimulating the rise in popularity of convolutional neural networks.

Characteristics of the architecture include:

  • The architecture comprises 5 convolutional layers and 3 max-pooling layers.
  • The activation function used is ReLU instead of Tanh, which introduces non-linearity and improves training speed.
  • Overlapping max pooling is used to reduce the error rate.
  • The architecture also makes use of dropout layers after the fully-connected layers to avoid over-fitting.

3. Inceptionv3

Inceptionv3, as the name suggests, is the third variation of the Inception convolutional neural network. Inceptionv3 builds upon the foundations of its predecessors Inception and Inceptionv2.

Here’s a timeline of the Inception models and their papers:

Inception → The design of Inception highlights the increase in the network’s width and depth without compromising on computational resources.

Read the paper here.

Inceptionv2 → The second variation of Inception introduced in Rethinking the Inception Architecture for Computer Vision.

“we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.” — Rethinking the Inception Architecture for Computer Vision

Read the paper here.

Inceptionv3 layers:

  • Convolution
  • Average Pool
  • Max Pool
  • Dropout
  • Fully-connected

Same paper as Inceptionv2.

The Concept of Transfer Learning

Transfer learning is a method used in machine learning where knowledge (learnable parameters such as weights and biases) from a pre-trained model is leveraged in another model, with the benefits being:

  1. Faster training time
  2. Improved performances in applications where data is limited.
  3. Better generalization; prevents overfitting.

Simply put, it’s the use of a model that was already trained on a specific dataset and can now be applied to another problem.

Generally, transfer learning is best put to use when the data you have is scarce, or when your problem lies in the same domain as the data the model was previously trained on.

Keras offers deep learning image classification models with pre-trained weights available for transfer learning. Popular convolutional neural network architectures such as Xception, VGG16, VGG19, ResNet50, Inceptionv3, DenseNet121, and EfficientNetB7, among others, are available to be used.
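A minimal sketch of the transfer-learning pattern with one of those Keras models (VGG16 here): freeze the pre-trained base and train only a new classification head. Note that weights=None is used below so the sketch runs without downloading anything; in practice you would pass weights="imagenet" to actually load the pre-trained parameters:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

# Pre-trained base without its original ImageNet classifier head.
# (weights=None keeps this sketch offline; use weights="imagenet" for real transfer.)
base = VGG16(weights=None, include_top=False, input_shape=(128, 128, 3))
for layer in base.layers:
    layer.trainable = False   # freeze the transferred weights

# New head: only these layers are trained on the new task.
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
out = Dense(17, activation="sigmoid")(x)  # 17 non-exclusive labels

model = Model(inputs=base.input, outputs=out)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```

The head’s width (256) is an illustrative choice, not a prescription; the same freeze-and-replace pattern is what the Inceptionv3 model later in this article uses.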

Amazon Rainforest Satellite Imagery Multilabel Model

Data visualization is always a good place to start. If performed correctly, exploratory data analysis can extract key insights that can then be used to design a model that aligns with the data. For example: is the data limited? Then maybe transfer learning might be a good idea. Missing a lot of values? Feature engineering could possibly help. This stage of the process is to glean a better understanding of the crux of machine learning: data.

import pandas as pd
import os

root = "planets-dataset/planet/planet/"

csvPath = os.path.join(root, "train_classes.csv")
trainImages = os.path.join(root, "train-jpg")
testImages = os.path.join(root, "test-jpg")

df = pd.read_csv(csvPath)


l = set()

for tag in df['tags'].values:
    labels = tag.split(' ')
    for label in labels:
        l.add(label)

print(df.head(10))
print(df.shape)
print(l)

    image_name                                          tags
0   train_0     haze primary
1   train_1     agriculture clear primary water
2   train_2     clear primary
3   train_3     clear primary
4   train_4     agriculture clear habitation primary road
5   train_5     haze primary water
6   train_6     agriculture clear cultivation primary water
7   train_7     haze primary
8   train_8     agriculture clear cultivation primary
9   train_9     agriculture clear cultivation primary road

(40479, 2)

{'habitation', 'clear', 'cultivation', 'slash_burn', 'bare_ground', 'artisinal_mine', 'road', 'primary', 'selective_logging', 'conventional_mine', 'partly_cloudy', 'agriculture', 'water', 'cloudy', 'blow_down', 'haze', 'blooming'}

There are 40479 values in the training csv file and the 17 labels are printed.

import re

labels = df['tags']

def valueSearch(pattern):
    num = 0
    for label in labels:
        if re.search(pattern, label):
            num += 1
    return num

def showLabels(pattern):
    for label in labels:
        if re.search(pattern, label):
            print(label)

showLabels(r"^[\S]+$") # values with one label
showLabels(r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$") # values with six labels
showLabels(r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$") # values with nine labels

# values with one label
[...]

# values with six labels
agriculture clear habitation primary road water
blooming clear cultivation habitation primary slash_burn
agriculture clear habitation primary road water
agriculture bare_ground clear habitation primary road
agriculture clear cultivation habitation primary road
agriculture clear cultivation habitation primary road

# values with nine labels
agriculture clear cultivation cultivation habitation primary road slash_burn water
agriculture artisinal_mine clear conventional_mine cultivation habitation primary road water

Note: this dataset does not contain any images with more than nine labels.

Now, with the number of images per label count, we can plot the distribution to check for class imbalance.

import matplotlib.pyplot as plt

patternList = [r"^[\S]+$", r"^[^ ]* [^ ]*$", r"^[^ ]* [^ ]* [^ ]*$", r"^[^ ]* [^ ]* [^ ]* [^ ]*$", r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$", r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$",
r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$", r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$", r"^[^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]* [^ ]*$"]

valueList = []
chartLabels = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]

for p in patternList:
    valueList.append(valueSearch(p))

fig, ax = plt.subplots(), valueList)
ax.set_xticklabels(chartLabels, rotation=90)
plt.title("Distribution of Number of Labels")

Based on the distribution plot, most of the images have 2, 3, or 4 labels.
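The same distribution can also be computed without regular expressions. A pandas sketch, shown here on a toy series — with the real dataframe it is simply df['tags'].str.split().str.len().value_counts():

```python
import pandas as pd

# Toy stand-in for df['tags']: one space-separated tag string per image.
tags = pd.Series([
    "haze primary",
    "agriculture clear primary water",
    "clear primary",
    "clear primary",
])

# Split each string into tags, count them, and tally images per label count.
counts = tags.str.split().str.len().value_counts().sort_index()
print(counts)  # 2 labels -> 3 images, 4 labels -> 1 image
```

Counting words directly sidesteps the need for one regex per label count and extends to any maximum number of labels without new patterns.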

It’s also a good idea to visualize the images to get a sense of what the model centers predictions around.

import cv2

def showImages(*imgids):
    plt.figure(figsize=(12, 3))
    for i, imgid in enumerate(imgids):
        # cv2 reads in BGR order; convert to RGB before plotting.
        img = cv2.imread(trainImages + "/train_" + imgid)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        plt.subplot(1, len(imgids), i + 1)
        plt.imshow(img)
        plt.axis("off")

showImages("0.jpg", "1.jpg", "2.jpg", "3.jpg", "4.jpg")
showImages("5.jpg", "6.jpg", "7.jpg", "8.jpg", "9.jpg")

Data preprocessing

To preprocess the data, start by reading in the images. The dataset provides 40,479 images, but for the sake of speeding up training I read in 20,000 and split them 16,000–4,000 between training and validation.

import numpy as np

# Iterate through the image id names and create unique file paths,
# then cv2 imread and resize the images, and convert to an np array.
# Divide by 255 to normalize to 0-1.

def loadImages(imgsize):
    imgs = []
    x = []

    imgCount = 0

    for imageids in df['image_name'].values:
        if imgCount < 20000:
            imgs.append(os.path.join(trainImages + "/" + imageids + ".jpg"))
            imgCount += 1

    for path in imgs:
        img = cv2.imread(path)
        img = cv2.resize(img, (imgsize, imgsize))
        x.append(img)

    x = np.array(x, dtype=np.float32) / 255

    return x

X = loadImages(128)

The images are loaded into normalized NumPy arrays. Now to load in the target values or the y values. Again, this is a multi-label classification problem so there will be more than one label for some images. As previously mentioned, I split the data 16,000–4,000 for training-validation.

from sklearn.model_selection import train_test_split

# Iterates through the tags and splits them. If the passed category arg is in the split tags, feature appends 1
# to indicate presence of the label and 0 in case of absence. It then returns the list of binarized labels.

def loadLabels(category):
    feature = []
    for tags in df['tags'].values:
        tags = tags.split(' ')
        if category in tags:
            feature.append(1)
        else:
            feature.append(0)
    return feature

# Iterates through l (set of all labels - all 17 unique values) and creates a new df feature with the values being equal
# to loadLabels column; then returns the new df.

def createFeatures(l):
    for col in l:
        df[col] = loadLabels(col)
    return df

df = createFeatures(l)

# Drops unnecessary cols (image name and tags) and drops all rows after 20000. Then converts df values to numpy array and returns
# the dataframe.

def extractLabels(df):
    df = df.drop(columns=['image_name', 'tags'])
    df = df.drop(labels=range(20000, 40479), axis=0)
    df = df.to_numpy()
    return df

y = extractLabels(df)

# Train/validation split with test_size=0.2: 16,000 training / 4,000 validation.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

array([[0, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 1, 0],
[0, 0, 0, ..., 0, 1, 0],
[1, 0, 0, ..., 0, 1, 0],
[0, 0, 0, ..., 0, 1, 0],
[0, 0, 0, ..., 0, 1, 0]])


VGG-19 — Using the Sequential API

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint

vgg19 = Sequential()

vgg19.add(Conv2D(input_shape=(32, 32, 3), filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

vgg19.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

vgg19.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
vgg19.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

vgg19.add(Flatten())
vgg19.add(Dense(4096, activation="relu"))
vgg19.add(Dense(4096, activation="relu"))
vgg19.add(Dense(17, activation="sigmoid"))

vgg19.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model_checkpoint = ModelCheckpoint('vgg19.h5', monitor="accuracy", verbose=1, save_best_only=True), y_train, batch_size=100, epochs=10, validation_data=(X_test, y_test), callbacks=[model_checkpoint])

AlexNet — Using the Sequential API
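A sketch of an AlexNet-style Sequential model matching the characteristics listed earlier — 5 convolutional layers, 3 overlapping max-pooling layers, ReLU activations, and dropout after the fully-connected layers. The filter counts and kernel sizes below are the classic AlexNet values, adapted to a 128×128 input and a 17-unit sigmoid head for this dataset’s non-exclusive labels:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout

alexnet = Sequential([
    # 5 conv layers with 3 overlapping max-pool layers (3x3 windows, stride 2).
    Conv2D(96, (11, 11), strides=4, activation="relu", input_shape=(128, 128, 3)),
    MaxPooling2D((3, 3), strides=2),
    Conv2D(256, (5, 5), padding="same", activation="relu"),
    MaxPooling2D((3, 3), strides=2),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(256, (3, 3), padding="same", activation="relu"),
    MaxPooling2D((3, 3), strides=2),
    # Classifier head with dropout after the fully-connected layers.
    Flatten(),
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(17, activation="sigmoid"),  # 17 non-exclusive labels
])

alexnet.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```

Training then follows the same pattern as the VGG-19 model above:, y_train, batch_size=100, epochs=10, ...).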

Inceptionv3 + Transfer Learning — Functional API

from keras.applications.inception_v3 import InceptionV3
from keras.layers import MaxPooling2D, Dense, Dropout, Flatten
from tensorflow.keras.callbacks import ModelCheckpoint
from keras.models import Model

def Inceptionv3():

    inceptionv3 = InceptionV3(weights="imagenet", include_top=False, input_shape=(128, 128, 3))

    for layer in inceptionv3.layers:
        layer.trainable = False

    model = inceptionv3.output
    # With a 128x128 input, the base network's output feature map is only 2x2,
    # so the pooling window cannot be larger than that.
    model = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(model)
    model = Flatten()(model)
    model = Dense(4096, activation="relu")(model)
    model = Dropout(0.1)(model)
    output = Dense(17, activation="sigmoid")(model)

    model = Model(inputs=inceptionv3.input, outputs=output)

    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

    return model

inceptionv3 = Inceptionv3()

model_checkpoint = ModelCheckpoint('inceptionv3.h5', monitor="accuracy", verbose=1, save_best_only=True), y_train, batch_size=100, epochs=10, validation_data=(X_test, y_test), callbacks=[model_checkpoint])

Comparison of Model Performance

(Charts comparing training and validation accuracy across the three models. Images by author.)

It’s important to note that I resized the images for VGG-19 to 32 by 32 rather than 128 by 128; at the larger size, each VGG-19 epoch had an ETA of approximately 4 hours, and I wanted to speed up the training process.

Notably, the training accuracies are all within close margins but there is a disparity in the validation accuracies.

Taking a look at the chart, it’s difficult to crown a winner; however, the Inceptionv3 + Transfer Learning model has the potential to further improve its validation accuracy. On balance, I’ll declare Inceptionv3 + Transfer Learning the best performer of the three from a holistic perspective (training accuracy, validation accuracy, and training speed).

Closing Notes

Okay, I’ll admit, I finish every one of these machine learning projects off with “this dataset was interesting to work with,” but this one especially unlocked a deeper appreciation for artificial intelligence.

I started this project curious to learn more about land use, but ended up also learning about the many functions of the rainforest along the way. That’s the wonderful thing about working with unfamiliar data: you add to your own knowledge as you go.

Real-world applications of artificial intelligence are always the most exciting and awe-inspiring. Improving the quality of life here on Earth should be at the forefront of why we build, invent, and create; otherwise, these tasks quickly become meaningless. I’ll be on the lookout for more real-world-impact datasets in the future; if you have any, link them in the comments! I’d love to explore them.

