1 Introduction to the project

This is a tutorial made at the ‘SDM-Course’ of University of Marburg on how to create a species distribution model with the method ‘BioClim’, wich predicts the possibility of appearance of species. This tutorial uses data of Butterfly species in Pakistan to create such model. The final goal will be to create a species richness map of the butterfly species in Pakistan.

2 How does BioClim work?

Bioclim is an envelope-style profile method that only uses presence records, in order to characterize sites that are located within the occupied environmental space of a species (Elith et al. 2006, S. 132).

The model interpolates the bioclimatic envelope of a species, which is the summary of the climate conditions at locations where the species has been recorded (Beaumont et al. 2005, S. 253). Since the model is range-based, it creates this bioclimatic envelope as a rectilinear volume. This volume is limited by the extreme values of all climatic parameters that have been determined at the location of the species’ recorded occurrences and therefor indicates the species’ tolerable environmental space.

By comparing climatic parameters of locations to the environmental envelope of the species, locations with climatic values that lie within the environmental volume of a species are then defined by the model as suitable and used to create the possible distribution of the species. The range can also be confined with the climatic values lying within different percentile limits instead of extreme values (ebd.).

To create a Species Distribution Model with BioClim the dismo-package is a useful tool, and contains the function ‘bioclim()’ which needs a Raster object or matrix and a two-column matrix or Spatial Points object. Also this package implements a few species distribution models and provides a number of functions that can assist in using Boosted Regression Trees (Hijmans et al. 2021).

3 Preparation

Set working directory.

wd <- c("~/Uni/6_SS 2022/SpeciesDistribution/Assignment02/bsc-species-distribution-modelling-2022-ClaraPuettker/Bioclim_Project")
setwd(wd)

The necessary packages need to be installed and loaded.

library(sp)
library(rgdal)
## rgdal: version: 1.5-23, (SVN revision 1121)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/clara/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/clara/Documents/R/win-library/4.0/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Overwritten PROJ_LIB was C:/Users/clara/Documents/R/win-library/4.0/rgdal/proj
library(dismo)
## Loading required package: raster

4 Data input

As already mentioned the model needs a Raster object with our environmental data and our spatial points of the butterfly occurrences. So we will first load this data.

4.1 Data input of the environment data

First we load the files with the environmental data for Pakistan into the R environment and then stack them.

env_files <- list.files(path= "~/Uni/6_SS 2022/SpeciesDistribution/Assignment02/bsc-species-distribution-modelling-2022-ClaraPuettker/Bioclim_Project/Data/Environment", full.names=TRUE ) #files downloaded from course unit
predictors <- stack(env_files)
plot(predictors)

The environmental data consists of information about the precipitation, radiation, the average temperature, maximum and minimum temperature, water vapor pressure and wind speed.

Additionally we will load the country border of Pakistan. This is needed, to enclose our study area or rather our area of interests.

pakistan <- getData("GADM", country = "PAK", level = 0 ) #vector map of country border

4.2 Data input of occurrence data

Then we will load our occurrence data. In order to be able to use the occurrences in our model we then need to create Spatial Points out of the data frame.

occ_butterflies_csv <- read.csv("~/Uni/6_SS 2022/SpeciesDistribution/Assignment02/bsc-species-distribution-modelling-2022-ClaraPuettker/Bioclim_Project//Data/Occurrences/PakistanLadakh.csv") #file downloaded from course unit

occ_butterflies <- occ_butterflies_csv #allocating data to a new object since the data frame is still needed later on

coordinates(occ_butterflies) <- c("x", "y") #creating Spatial Points Data Frame
plot(occ_butterflies, pch=20, cex=0.5)
plot(pakistan, add=T)

occ_butterflies_pak <- crop(occ_butterflies, pakistan) #crop out occurrence data outside of Pakistan

5 Modeling

With all the data loaded the model can be fitted. This for we use the bioclim-Tool and name the results with the variable bc. We have to put them in this variable, because this one is needed as object for the next step.

bc <- bioclim(predictors, occ_butterflies_pak) #creating bioclim model

Now with the function ‘predict()’ from the dismo package a prediction with distribution of suitable habitats for all butterflies can be created. Therefor we have to include a plot function.

prediction <- dismo::predict(object=bc, x = predictors) #creating a prediction map
plot(prediction, xlab= "Latitude", ylab= "Longitude", main= "Prediction Map")
plot(pakistan, add=T, border= "grey")

On this Prediction map values from 0 to 1 can be observed. The value of 1 describes a location that has the median value of the data for all the considered environment variables. The value 0 describes a location with a value of one environmental variable that is outside the environmental range (Hijmans et al. 2021). It basically shows us the suitability of habitats generally for butterflies in Pakistan.

6 Species richness map

Our final goal is to make a species richness map that tells us where the species can be found individually and the total amount of species found in the cells of the created map. In order to create such a map, we need presence and absence information for each species as one layer, which then can be stacked in order to combine the information to the desired species richness map.

For this step we are following very useful hints from the tutorial of M. Hallenberger (2021) who already created a tutorial on this subject and provides helpful information on how to create the needed raster layers.

6.1 Setting a threshold and testing it on one species

To create the presence and absence information we are setting a threshold which functions as a cut-off point for the binary decision of indicating a presence or absence (Morrow 2019). We do this With a function created by Morrow (2019) and choose to use the method of minimum training presence (mtp) which works as follows:

“This threshold finds the lowest predicted suitability value for an occurrence point. Essentially, it assumes that the least suitable habitat at which the species is known to occur is the minimum suitability value for the species. The MTP threshold ensures that all occurrence points fall within the area of the binary model.” (ebd.)

The function looks like this:

sdm_threshold <- function(sdm, occs, type = "mtp", binary = FALSE){
  occPredVals <- raster::extract(sdm, occs)
  if(type == "mtp"){
    thresh <- min(na.omit(occPredVals))
  } else if(type == "p10"){
    if(length(occPredVals) < 10){
      p10 <- floor(length(occPredVals) * 0.9)
    } else {
      p10 <- ceiling(length(occPredVals) * 0.9)
    }
    thresh <- rev(sort(occPredVals))[p10]
  }
  sdm_thresh <- sdm
  sdm_thresh[sdm_thresh < thresh] <- NA
  if(binary){
    sdm_thresh[sdm_thresh >= thresh] <- 1
  }
  return(sdm_thresh)
}

First we will try to use the function for one species, in this case for the species “Acraea issoria”:

occs_Acraea_issoria <- subset(occ_butterflies_csv, occ_butterflies_csv$species == "Acraea_issoria")
butterfly_mtp <- sdm_threshold(prediction, as.data.frame(occs_Acraea_issoria)[,2:3], "mtp")

plot(butterfly_mtp, xlab = "Latitude", ylab = "Longitude")

This map now shows the distribution of the species “Acraea issoria” based on the prediction of the bioclim model and the created threshold.

6.2 Looping the function

Now the function can be looped to repeat the process for all species.

memory.limit(9999999999)
## [1] 1e+10
list_results=list()
list_binary=list()

occ_butterflies_csv$species <- as.factor(occ_butterflies_csv$species)
list_butterflies <- levels(occ_butterflies_csv$species)


for (i in list_butterflies) {

  data_butterfly <- subset(occ_butterflies_csv, occ_butterflies_csv$species == i) 
  
  data_butterfly <- data_butterfly[,2:3]
  
  butterfly_pred <- prediction
  
  butterfly_occs <- subset(occ_butterflies_csv, occ_butterflies_csv$species == i)
  
  butterfly_mtp <- sdm_threshold(butterfly_pred, as.data.frame(butterfly_occs)[,2:3], "mtp", binary=TRUE)
  
  list_binary[i] <- butterfly_mtp
  
  list_results[i] <- prediction
  }

6.3 Creating final species richness map

Now the results for the species can be stacked and plotted and the final species richness map can be created. For that we generate a plot, where the x-axis describes the Latitude and the y-axis the Longitude.

butterfly_stack <- stack(list_binary)
results <- stackApply(butterfly_stack,indices=c(1),fun=sum, na.rm=TRUE)

plot(results,main="Species Richness Map", xlab= "Latitude", ylab= "Longitude")
plot(pakistan, add = TRUE, border="dark grey")

7 Discussion of the Method

One useful parameter to evaluate the model is the AUC. In order to get this value absence points are needed, which can be created with the randomPoints() function from the dismo package. In a next step with the evaluate() function, model evaluation statistics can be created.

pseudo_absence <- randomPoints(predictors, n= 1000) #creating pseudo-absences
evaluation <- evaluate(p= occ_butterflies_pak, a= pseudo_absence, model= bc, x= predictors) #creating model evaluaton statistics

After that it is possible to view the statistics and consult the values.In this case the AUC value lies at 0.755 which is considered to be represent a moderate model performance (a value of 0.5 represents a random prediction, so that a value above that indicates a prediction that is better than random). A higher value would indicate a better model performance.

Overall it can be said that the BioCLim model is a straight forward approach to create a species distribution model and in the end just needs 2 objects to perform the calculation. On the other hand it is one of the first implemented tools and it can be found that it doesn’t always perform as well as other more recent and complex models.

8 Sources

Beaumont, L. J., Hughes, L., & M. Poulsen (2005): Predicting species distributions: use of climatic parameters in BIOCLIM and its impact on predictions of species’ current and future distributions. Ecological modeling 186(2): 251-270.

Biodiversity and Climate Change Virtual Laboratory (BCCVL) (2021):SDM-Interpretation of model outputs. https://support.bccvl.org.au/support/solutions/articles/6000127046-sdm-interpretation-of-model-outputs (19.06.21)

Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B., Manion, G., Moritz, C., Nakamura, M., Nakazawa, M., Overton, J.McC, Peterson, A.T., Phillips, S., Richardson, K., Scachetti-Pereira, R., Schapire, R., Soberon, J., Williams, S., Wisz, M. & N. Zimmerman (2006): Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129-151.

Hallenberger, M. (2021): Species richness of butterflies in Pakistan. Species Distribution Modeling with BIOCLIM. https://geomoer.github.io/moer-bsc-project-seminar-SDM/unit99/student_tutorials-03b_bioclim_Hallenberger.html#references (25.06.21)

Hijmans, R. J., Phillips, S., Leathwick, J. & J. Elith (2021): Species Distribution Modeling. Package dismo.

Morrow, C. B. (2019): Thresholding species distribution models. https://babichmorrowc.github.io/post/2019-04-12-sdm-threshold/ (25.06.21)

9 Session Info

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dismo_1.3-5   raster_3.5-15 rgdal_1.5-23  sp_1.4-7     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.8.3     knitr_1.33       magrittr_2.0.1   lattice_0.20-41 
##  [5] R6_2.5.0         rlang_0.4.11     fastmap_1.1.0    stringr_1.4.0   
##  [9] highr_0.9        tools_4.0.3      grid_4.0.3       xfun_0.30       
## [13] terra_1.5-21     jquerylib_0.1.4  rgeos_0.5-5      htmltools_0.5.2 
## [17] yaml_2.2.1       digest_0.6.29    sass_0.4.1       codetools_0.2-16
## [21] evaluate_0.14    rmarkdown_2.13   stringi_1.5.3    compiler_4.0.3  
## [25] bslib_0.3.1      jsonlite_1.8.0