EX Warm up R-spatial

R is a very powerful software tool with many functions. In this course, we are interested in using R to analyze spatial data and create maps based on it.

The first exercise in this unit is pragmatic and deals with preparing and handling the aerial photo data used for the rest of the course.

In the remainder of this course, we will classify or “predict” the location of buildings from this data with the help of suitable models. For this purpose, several types of data are necessary. Furthermore, this data has to be converted into a specific form for technical and organizational reasons.

The following script snippets show how the data should be organized and which tools to use.

In detail, we will perform the following tasks:

  1. download the prepared data from the course server
  2. prepare the grid and vector data
  3. visualize the data
  4. calculate some indices
  5. save the results

Step 1 - Get the data and setup the working environment

The Hessian State Department of Land Management and Geoinformation (Hessische Verwaltung fuer Bodenmanagement und Geoinformation; HVBG) commissions flights for the entire state of Hesse every 2 years. They make the imagery, known as digital orthophotos (DOPs), available in 20cm and 40cm resolution with 4 channels: Red (R), Green (G), Blue (B) and Near-Infrared (NIR). More information about the HVBG’s DOPs is available here (German only).

The 20-cm DOP of our study area is available for download from the course server. Please note that the HVBG has provided these DOPs free of charge for the purpose of education and that they may only be used in the context of this course.

Writing it down as a script. Remember: We start every single script by sourcing our script geoAI_setup.R, from Unit 1. Then we start the actual work:

  • First, we import a digital orthophoto of Marburg. To import so-called raster data, we normally use the package terra, which is loaded with the call library(terra). We don’t need to do this, however, because we already loaded the package via the setup script.
  • Next, we import a vector data set containing the areas of the buildings.
  • Then, we check the georeferencing.
  • Finally, we visualize the data.

Step 2 - Prepare the data

After downloading the file, save it into the data path of our working environment. We will start by creating a new R script called data.R in the src folder.

Raster data

Specifically, we use the rast function from the terra package to import the TIF file here. By using the :: syntax, i.e. package::function, we guarantee that we are using a specific function from a specific package. This concept is important to ensure that we are using the correct function (because some packages use the same function names, which is called masking).

Vector data

In addition to raster data, R can handle vector data as well. A different package, sf, is required to read vector data of many types. For training purposes, we will download some data from the Openstreetmap (OSM) database. Take a look at the website for an overview of the available features.

# Example Polygons
# OSM public data buildings marburg
# load OSM data for the Marburg region

buildings <- osmdata::opq(bbox = "marburg de") %>%
  osmdata::add_osm_feature(key = "building") %>%
  osmdata::osmdata_sf()

buildings <- buildings$osm_polygons

# have a look at your data
buildings

plot(st_geometry(buildings))

Coordinate reference system

Geospatial data always needs a coordinate reference system (CRS). In R, you can check the CRS of an imported layer using the crs function in the terra package.

# check CRS
terra::crs(dop)
terra::crs(buildings)

If you want to work with both of these layers together they should return the same CRS. In cases when you are uncertain if two layers have the same CRS, you can use the same.crs function to test if they are the same.

terra::same.crs(dop, buildings)

If these layers have the same CRS, this command will return TRUE. Otherwise, R will return FALSE. Queries like this can be useful for more complex geospatial data workflows, e.g. if two layers have the same CRS, then continue with the analysis, otherwise stop.

In this case they are not the same, so we will transform the CRS of the vector data to the CRS of the raster data.

buildings <- sf::st_transform(buildings, terra::crs(dop))
terra::same.crs(dop, buildings)

Step 3 - Visualize the data

Now that we have the DOP imported into R, we want to see what it looks like. There are many options for visualizing geospatial data in R, whether it be raster or vector data, with options ranging from basic static plots, e.g. plotRGB to interactive plots, e.g. mapview.

#  visualize the data 
# simple RGB plot
terra::plotRGB(dop)

# interactive plot
mapview::viewRGB(raster::stack(dop), r=1,g=2,b=3)

Additional Resources Many packages and functions have been written to visualize geospatial data in R. In fact, there are too many to mention here. If you’re interested, Chapter 1.5 of Geocomputation with R by Lovelace, Nowosad & Muenchow provides a concise history of the packages developed by the R spatial community.

Step 4 - Calculate RGB indices

Visualizing spatial data is fantastic and makes sense – we want to make maps after all – but conventional GIS software can do this with raster and vector data as well. One way in which R adds value as a GIS is in the statistical analyses that you can do with raster data. For example, we can use the different spectral bands of an image to calculate so-called indices. One of the most widely known indices created used remote sensing data is the Normalized difference vegetation index (NDVI). NDVI is useful for distinguishing live green vegetation, because of its properties in the near-infrared and red wavelengths. Due to these spectral properties, NDVI requires more than the three standard RGB channels and is calculated using the Near Infrared and Red channels of an image as follows:

The equation for calculating NDVI.
# calculate RGB indices 
# We can use raster as simple calculator
# First, we assign the three first layers in the raster image to variables
# called - surprise - red, green and blue (this is to keep it simple and clear)
#-----------------------#
red   <- dop[[1]]
green <- dop[[2]]
blue  <- dop[[3]]

# Then we calculate all of the indices we need or want

## Normalized difference turbidity index (NDTI)
NDTI <- (red - green) / (red + green)
names(NDTI) <- "NDTI"

## Visible Atmospherically Resistant Index (VARI)
VARI <- (green - red) / (green + red - blue)
names(VARI) <- "VARI"

## Triangular greenness index (TGI)
TGI <- -0.5 * (190 * (red - green) - 120 * (red - blue))
names(TGI) <- "TGI"

rgbI <- c(NDTI, VARI, TGI)
terra::plot(rgbI)

Further reading There are plenty of remote sensing indices that can be calculated from simple RGB imagery as well – take a look here for some ideas.

Hint: For those interested in doing less typing and learning more about R package development and maintenance, the uavRst package contains these three and many more RGB indices in one simple function. The challenge is to get all the features of the package working, since it accesses the command line interfaces of SAGA, GRASS, and Orfeo toolbox. If you’re keen to challenge yourself – good luck!

Step 5 - Save the results for later

Finally, now that we have calculated some remote sensing indices that will be necessary for our machine learning prediction later on, it would be useful and time-efficient to only have to calculate them once (not every time that we open an R session).

# stack and save

dop_indices <- c(dop, rgbI)

terra::writeRaster(dop_indices, file.path(envrmt$path_data, "dop_indices.tif"))

Optional exercises

Now that we’ve covered some basics, it’s time to practice on your own. The following tasks serve as an orientation framework for practicing in a targeted manner. It requires you to solve some technical, content-related and conceptual problems. Let’s go.

At rspatial, you will find a lot of straightforward exercises, including our basic examples from before. It also provides the necessary data. Another highly recommend place is Geocomputation with R by Robin Lovelace, Jakub Nowosad and Jannes Muenchow. It is the outstanding reference and a perfect starting point for everything related to spatio-temporal data analysis and processing with R.

A good approach to improving your skills is to dive into these kind of exercises and use your own data in place of the example data. This means:

  1. Do the exercises with the example data (technical base check)
  2. Do the exercises with your own data (advanced technical base check)
  3. Understand the operation

It is a good habit to document what you learn (the knowledge you gain) and any open questions you may have as well as problems that arise. Documenting your progress in an Rmarkdown document is particularly useful for this purpose. The blogdown package is, in fact, excellent for this. The key is practice: not just getting sample source code to run, but changing it and understanding what it does.

  1. Read and operate the following chapters:
  2. Read and work through Robert Hijmans’ page about unsupervised classification. Follow his guidelines. Instead of the example data from Robert’s tutorial, you can use the DOP data. Since you will not find sufficient water areas in the data (unlike in Roberts’ example) you can combine the vegetation-covered classes and the vegetation-free classes.

Where can I find more information?

For more information, you can look at the following resources:

  • Spatial Data Analysis by Robert Hijmans. Very comprehensive and recommended. Many of the examples are based on his lecture and are adapted for our conditions.

  • Geocomputation with R by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow is the outstanding reference for everything related to spatio-temporal data analysis and processing with R.

Comments?

You can leave comments under this issue if you have questions or comments about any of the code chunks. Please copy the corresponding line into your comment to make it easier to answer the question.

Updated: