EX | Warm up R-spatial
R is a very powerful software tool with many functions. In this course, we are interested in using R to analyze spatial data and create maps based on it.
The first exercise in this unit is pragmatic and deals with preparing and handling the aerial photo data used for the rest of the course.
In the remainder of this course, we will classify or “predict” the location of buildings from this data with the help of suitable models. For this purpose, several types of data are necessary. Furthermore, this data has to be converted into a specific form for technical and organizational reasons.
The following script snippets show how the data should be organized and which tools to use.
In detail, we will perform the following tasks:
- download the prepared data from the course server
- prepare the grid and vector data
- visualize the data
- calculate some indices
- save the results
Step 1 - Get the data and setup the working environment
The Hessian State Department of Land Management and Geoinformation (Hessische Verwaltung fuer Bodenmanagement und Geoinformation; HVBG) commissions flights for the entire state of Hesse every 2 years. They make the imagery, known as digital orthophotos (DOPs), available in 20cm and 40cm resolution with 4 channels: Red (R), Green (G), Blue (B) and Near-Infrared (NIR). More information about the HVBG’s DOPs is available here (German only).
The 20-cm DOP of our study area is available for download from the course server. Please note that the HVBG has provided these DOPs free of charge for the purpose of education and that they may only be used in the context of this course.
Writing it down as a script
Remember: we start every single script by sourcing our geoAI_setup.R script from Unit 1.
Then we start the actual work:
- First, we import a digital orthophoto of Marburg. To import so-called raster data, we normally use the package raster, which is loaded with the call library(raster). We don't need to do this here, however, because the package was already loaded via the setup script.
- Next, we import a vector data set containing the areas of the buildings.
- Then, we check the georeferencing.
- Finally, we visualize the data.
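The four steps above can be sketched as a minimal script. The file names used here are assumptions and have to be adapted to the downloaded data:

```r
# data.R - skeleton of the workflow described above
source(file.path("src", "geoAI_setup.R"))    # setup script from Unit 1

# import raster (DOP) and vector (building) data; file names are placeholders
rasterStack <- raster::stack(file.path(envrmt$path_data, "marburg_dop.tif"))
buildings   <- sf::st_read(file.path(envrmt$path_data, "buildings.gpkg"))

# check the georeferencing
raster::compareCRS(rasterStack, buildings)

# visualize
raster::plotRGB(rasterStack)
```

Each of these steps is explained in detail below.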
Step 2 - Prepare the data
After downloading the data, move them into the data path of our working environment. We will start by creating a new R script called data.R in the src folder.
Raster data
Specifically, we use the stack function from the raster package to import the TIF file here. By using the :: syntax, i.e. package::function, we guarantee that we are using a specific function from a specific package. This concept is important because some packages use the same function names, so one function can hide another (this is called masking), and the :: syntax ensures that the correct function is used.
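A classic illustration of masking is filter(): once dplyr is loaded, calling filter() refers to dplyr's version rather than the one from base stats, and the :: syntax makes the choice explicit. This is a sketch using built-in example data:

```r
library(dplyr)                    # after this, filter() means dplyr::filter()

stats::filter(1:10, rep(1/3, 3))  # moving average from the base stats package
dplyr::filter(mtcars, cyl == 6)   # row filtering from dplyr
```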
Vector data
In addition to raster data, R can handle vector data as well. A different package, sf, is required to read vector data of many types. For training purposes, we will download some data from the OpenStreetMap (OSM) database. Take a look at the website for an overview of the available features.
# Example polygons
# OSM public building data for Marburg
# do not forget to add the osmdata package to the header section of your script
library(osmdata)
library(magrittr)  # provides the %>% pipe operator

# load OSM building footprints for the Marburg region
buildings <- opq(bbox = "marburg de") %>%
  add_osm_feature(key = "building") %>%
  osmdata_sf()
buildings <- buildings$osm_polygons
mapview::mapview(buildings)
Coordinate reference system
Geospatial data always needs a coordinate reference system (CRS). In R, you can check the CRS of an imported layer using the crs function from the raster package.
# 2 - check CRS and other info
#-----------------------#
raster::crs(rasterStack)
raster::crs(buildings)
If you want to work with both of these layers together, they should return the same CRS. If you are uncertain whether two layers share a CRS, you can use the compareCRS function to test it.
raster::compareCRS(rasterStack, buildings)
If these layers have the same CRS, this command returns TRUE; otherwise, it returns FALSE. Queries like this are useful in more complex geospatial workflows, e.g. if two layers share a CRS, continue with the analysis, otherwise stop.
In this case they are not the same, so we will transform the CRS of the vector data to the CRS of the raster data.
buildings <- sf::st_transform(buildings, raster::crs(rasterStack))
raster::crs(buildings)
Step 3 - Visualize the data
Now that we have the DOP imported into R, we want to see what it looks like. There are many options for visualizing geospatial data in R, whether raster or vector, ranging from basic static plots, e.g. plotRGB, to interactive plots, e.g. mapview.
# 3 - visualize the data ####
#-----------------------#
# simple RGB plot
raster::plotRGB(rasterStack)
# interactive plot
mapview::mapview(rasterStack)
Additional Resources
Many packages and functions have been written to visualize geospatial data in R; in fact, there are too many to mention here. If you're interested, Chapter 1.5 of Geocomputation with R by Lovelace, Nowosad & Muenchow provides a concise history of the packages developed by the R spatial community.
Step 4 - Calculate RGB indices
Visualizing spatial data is fantastic and makes sense (we want to make maps, after all), but conventional GIS software can do this with raster and vector data as well. One way in which R adds value as a GIS is in the statistical analyses that you can perform on raster data. For example, we can use the different spectral bands of an image to calculate so-called indices. One of the most widely known indices derived from remote sensing data is the Normalized Difference Vegetation Index (NDVI). NDVI is useful for distinguishing live green vegetation because of its spectral properties in the near-infrared and red wavelengths. Due to these spectral properties, NDVI requires more than the three standard RGB channels and is calculated from the Near-Infrared (NIR) and Red channels of an image as follows:

NDVI = (NIR - Red) / (NIR + Red)

For more information, check out Earth Lab.
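Since the DOP provides four channels, NDVI can be computed directly in R. The following sketch assumes that the NIR channel is stored as the fourth layer of the stack; check the band order of your data before relying on it:

```r
# NDVI from the DOP; the band order (R, G, B, NIR) is an assumption here
nir  <- rasterStack[[4]]
red  <- rasterStack[[1]]
NDVI <- (nir - red) / (nir + red)
names(NDVI) <- "NDVI"
raster::plot(NDVI)
```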
# 4 - calculate RGB indices ####
# We can use raster as a simple calculator.
# First, we assign the first three layers of the raster image to variables
# called - surprise - red, green and blue (to keep things simple and clear)
#-----------------------#
red <- rasterStack[[1]]
green <- rasterStack[[2]]
blue <- rasterStack[[3]]
# Then we calculate all of the indices we need or want
## Normalized difference turbidity index (NDTI)
NDTI <- (red - green) / (red + green)
names(NDTI) <- "NDTI"
## Visible Atmospherically Resistant Index (VARI)
VARI <- (green - red) / (green + red - blue)
names(VARI) <- "VARI"
## Triangular greenness index (TGI)
TGI <- -0.5*(190*(red - green)- 120*(red - blue))
names(TGI) <- "TGI"
rgbI <- raster::stack(NDTI, VARI, TGI)
raster::plot(rgbI)
Further reading There are plenty of remote sensing indices that can be calculated from simple RGB imagery as well – take a look here for some ideas.
Hint: For those interested in doing less typing and learning more about R package development and maintenance, the uavRst package contains these three and many more RGB indices in one simple function. The challenge is to get all the features of the package working, since it accesses the command line interfaces of SAGA, GRASS and the Orfeo Toolbox. If you're keen to challenge yourself, good luck!
Step 5 - Save the results for later
Finally, now that we have calculated some remote sensing indices that will be needed for our machine learning prediction later on, it would be time-efficient to only calculate them once (not every time we open an R session). The RDS format is ideal for this purpose, because it allows us to save a single R object to a file and restore it later. Please note that saveRDS is designed for saving a single R object only.
# 5 - stack and save as RDS ####
#-----------------------#
marburg_stack <- raster::stack(rasterStack, rgbI)
saveRDS(marburg_stack, file.path(envrmt$path_data, "dop_indices.RDS"))
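In a later session, the stack can then be restored with readRDS instead of recalculating the indices (same path as above):

```r
# restore the previously saved stack
marburg_stack <- readRDS(file.path(envrmt$path_data, "dop_indices.RDS"))
```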
Now repeat with Sentinel satellite data
Working with high-resolution aerial imagery is certainly nice, but also has its downsides. It is expensive to generate or procure, it often only covers relatively small areas and it is not always readily available. Satellite data, on the other hand, is continuously available and made readily accessible. One example of such satellite data that is often used in environmental remote sensing is the Sentinel-2 mission by the European Space Agency.
The package sen2r
The sen2r package allows you to download and preprocess Sentinel-2 images directly in R. To install sen2r on Windows, you first need to have Rtools installed:
- Go to http://cran.r-project.org/bin/windows/Rtools/
- Select the download link that corresponds to your version of R
- Open the .exe file and use the default settings
- Make sure to check the box for the installer to edit your PATH
- Run library(devtools) in R
- Run find_rtools() - if it returns TRUE, the installation worked properly
Then it is a matter of simply installing the package as we would with any other package.
install.packages("sen2r")
library(sen2r)
The sen2r GUI
The easiest way to use sen2r is to open the graphical user interface (GUI) and use it in interactive mode. However, you have to choose from a large number of options in the settings, and the knowledge required for this is also needed for the command line version presented below. Both interfaces can be automated; we recommend the API, but ultimately it is up to you. To open the GUI, call the function of the same name:
sen2r::sen2r()
The sen2r API
In the following script, Sentinel-2 data are used to calculate the surface albedo. The following steps are necessary:
- Set up the working environment (attention: additional libraries will be loaded here)
- Download data by configuring and executing sen2r via the API
- Calculate the surface albedo (exemplary)
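A minimal non-interactive call could look like the following sketch. The study area file, time window and output path are placeholders that you will need to adapt to your project; see ?sen2r for the full parameter list:

```r
library(sen2r)

# hypothetical AOI file, dates and output folder - adjust before running
out_paths <- sen2r(
  gui        = FALSE,
  extent     = file.path(envrmt$path_data, "study_area.geojson"),
  timewindow = c("2021-06-01", "2021-06-30"),
  list_prods = "BOA",
  path_out   = file.path(envrmt$path_data, "sentinel")
)
```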
The sen2r vignette offers plenty of helpful information about how to use the GUI as well as how to access the functionality of sen2r from within R.
Optional exercises
Now that we’ve covered some basics, it’s time to practice on your own. The following tasks serve as an orientation framework for practicing in a targeted manner. It requires you to solve some technical, content-related and conceptual problems. Let’s go.
At Robert Hijmans' raster homepage you will find a lot of straightforward exercises, including our basic examples from above. Robert also provides the necessary data. Another highly recommended place is Geocomputation with R by Robin Lovelace, Jakub Nowosad and Jannes Muenchow. It is the outstanding reference and a perfect starting point for everything related to spatio-temporal data analysis and processing with R.
A good approach to improving your skills is to dive into these kinds of exercises and use your own data in place of the example data. This means:
- Do the exercises with the example data (technical base check)
- Do the exercises with your own data (advanced technical base check)
- Understand the operation
It is a good habit to document what you learn (the knowledge you gain), any open questions you may have and any problems that arise. Documenting your progress in an Rmarkdown document is particularly useful for this purpose; the blogdown package is, in fact, excellent for this. The key is practice: not just getting sample source code to run, but changing it and understanding what it does.
- Read and work through the following chapters:
- Read and work through Robert Hijmans' page about unsupervised classification and follow his guidelines. Instead of the example data from Robert's tutorial, you can use either the Sentinel data or the DOP data. Since you will not find sufficient water areas in the data (unlike in Robert's example), you can combine the vegetation-covered classes and the vegetation-free classes.
Where can I find more information?
For more information, you can look at the following resources:
- Spatial Data Analysis by Robert Hijmans. Very comprehensive and recommended. Many of the examples are based on his lecture and are adapted for our conditions.
- Geocomputation with R by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow is the outstanding reference for everything related to spatio-temporal data analysis and processing with R.
Comments?
You can leave comments under this gist if you have questions or comments about any of the code chunks that are not included as a gist. Please copy the corresponding line into your comment to make it easier to answer your question.