Tutorial: Machine learning classification of Sentinel-2 satellite imagery using R [Updated]


Note: This tutorial was updated on April 20th, 2020 based on reader feedback.

In my earlier post, I wrote about the events leading up to my paper in the journal GIScience & Remote Sensing. In this short post I would like to help you conduct your own machine learning classification of Sentinel-2 data using the open-source statistical software R. The process is fairly straightforward if you have experience in remote sensing and image classification. Even if you don’t have extensive experience, basic knowledge of remote sensing terminology is sufficient. My paper, linked below, provides detailed information about different machine learning algorithms, including explanations of key concepts. It has 100 references and is an “applied review” of sorts. It is also open access, so anyone can read it. I strongly encourage you to read it because it will serve as a useful reference going forward. Also, please cite it if you find this tutorial useful in your work or if you build upon it. Thank you! The paper can be cited as:

Abdi, A. M. (2020) Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GIScience & Remote Sensing, 57:1, 1-20, DOI: 10.1080/15481603.2019.1650447

The tutorial assumes that you are already well-grounded in R concepts. I’ve prepared a small script that contains the main steps for performing a classification using some standard, well-known algorithms. Each line is commented so that you know exactly what it does.

The full code is available in this Gist, and the complete dataset needed to run it can be downloaded here (155 MB) as a compressed ZIP file. I tested the code with 64-bit versions of RStudio 1.2.504 and R 3.6.3 on a laptop with an Intel i7-5600U CPU and 16 GB of RAM running 64-bit Windows 10. With these data and this setup, the training phase took 8 minutes and applying the models to new data took 2 minutes.

All you have to do is execute the code from within the directory where all the data are located. To get the most out of this code, it is essential to understand every step, and I hope the comments in the code help. If any concept is unclear, I strongly advise you to read my article linked above for a better understanding. Happy classifying!
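If you are unsure how to run a script "from within the directory where the data are located", here is a minimal sketch using only base R. The helper name run_from_data_dir and the paths in the usage comment are hypothetical placeholders, not names from the Gist or the dataset. Substitute the folder where you unzipped the data and the file where you saved the script.

```r
# Hypothetical helper (not part of the tutorial's script): run an R script
# with the working directory temporarily set to the data folder.
run_from_data_dir <- function(data_dir, script) {
  # Fail early if either path does not exist.
  stopifnot(dir.exists(data_dir), file.exists(script))

  # Resolve the script to an absolute path before changing directory,
  # so a relative script path still works afterwards.
  script <- normalizePath(script, mustWork = TRUE)

  # setwd() returns the previous working directory; restore it on exit,
  # even if the script stops with an error.
  old_wd <- setwd(data_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # Show which files the script will be able to see by relative name.
  message("Files in data directory: ", paste(list.files(), collapse = ", "))

  # Run the script in a fresh environment so it does not overwrite
  # objects in your global workspace.
  source(script, local = new.env())
  invisible(TRUE)
}

# Example call (placeholder paths, assumed for illustration only):
# run_from_data_dir("C:/data/sentinel2_tutorial", "C:/scripts/classification.R")
```

Because the script is sourced into a fresh environment, the objects it creates are not kept in your workspace. If you want to inspect intermediate results afterwards, it may be simpler to call setwd() on the data folder yourself and run the script line by line in RStudio.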