Introduction

This projekt deals with species distribution modelling of butterfly species in Pakistan. In the following script, distribution data of butterfly species as well as environmental data from http://chelsa-climate.org/ database is prepared to perform “boosted regression tree”(BRT) modelling process. BRT is an ensemble of methods for fitting statistical models (Elith et al. 2008). Aim of this work is to use the created model in order to predict on potential distribution areas of a particular buttefly species.

Theory: boosted regression trees

Ensemble of multiple methods to fit data to a statistical model.
combines the strength of two different algorithms: “Regression Trees” (predictors + response, recursive binary splits) and “Boosting” (adaptive method, combining several simple models for increased model performance)

benefits of BRT-modelling:

handle different types of predictor variables
handles missing data automatically
can fit complex non-linear relationships, and automatically handles interaction effects between predictors
fitting multiple trees -> powerful ecological insight, superior predictive performance

Video BRT

Befor we start, i recommend you to watch the following video. It gives further information and clarification concerning BRT. It illustrates how to use BRT and also the mathmatics behind it.

library(vembedr)

## Warning: package 'vembedr' was built under R version 3.6.2

embed_url("https://www.youtube.com/watch?v=3CC4N4z3GJc")

Librarys

First of all, you will need the following librarys for the process:

library(dismo)
library(raster)
library(gbm)
library(maptools)
library(rgdal)
library(shapefiles)

After loading in the librarys, we can start with the data preparation now. ***

Data preparation: distribution data

# distribution-data

distribution_shape <- read.shp("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_Pakistan_all_test.shp")

distribution_dataframe <- as.data.frame(distribution_shape)

occurrence_csv <- paste0(system.file(package="dismo"), "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_shape_csv.csv")
head(occurrence_csv)

## [1] "C:/Users/ich/Documents/R/win-library/3.6/dismoC:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_shape_csv.csv"

occurrence <- read.csv("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_shape_csv.csv")
head(occurrence)

##   X shp.record    shp.x    shp.y shp.shape.type header.file.code
## 1 1          1 72.07099 35.99492              1             9994
## 2 2          2 71.99009 35.15735              1             9994
## 3 3          3 73.39397 34.45065              1             9994
## 4 4          4 73.16466 33.82338              1             9994
## 5 5          5 73.16823 34.13390              1             9994
## 6 6          6 73.41688 34.00422              1             9994
##   header.file.length header.file.version header.shape.type header.xmin
## 1              82230                1000                 1    62.23056
## 2              82230                1000                 1    62.23056
## 3              82230                1000                 1    62.23056
## 4              82230                1000                 1    62.23056
## 5              82230                1000                 1    62.23056
## 6              82230                1000                 1    62.23056
##   header.ymin header.xmax header.ymax header.zmin header.zmax header.mmin
## 1    24.13072    77.77783    37.11705           0           0           0
## 2    24.13072    77.77783    37.11705           0           0           0
## 3    24.13072    77.77783    37.11705           0           0           0
## 4    24.13072    77.77783    37.11705           0           0           0
## 5    24.13072    77.77783    37.11705           0           0           0
## 6    24.13072    77.77783    37.11705           0           0           0
##   header.mmax
## 1           0
## 2           0
## 3           0
## 4           0
## 5           0
## 6           0

occurrence_lonlat <- occurrence [,3:4]
head(occurrence_lonlat)

##      shp.x    shp.y
## 1 72.07099 35.99492
## 2 71.99009 35.15735
## 3 73.39397 34.45065
## 4 73.16466 33.82338
## 5 73.16823 34.13390
## 6 73.41688 34.00422

dataframe_occurrence_record_lonlat <- occurrence [,2:4]
head(dataframe_occurrence_record_lonlat)

##   shp.record    shp.x    shp.y
## 1          1 72.07099 35.99492
## 2          2 71.99009 35.15735
## 3          3 73.39397 34.45065
## 4          4 73.16466 33.82338
## 5          5 73.16823 34.13390
## 6          6 73.41688 34.00422

write.csv(dataframe_occurrence_record_lonlat, "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/dataframe_occurrence_record_lonlat.csv")
head(dataframe_occurrence_record_lonlat)

##   shp.record    shp.x    shp.y
## 1          1 72.07099 35.99492
## 2          2 71.99009 35.15735
## 3          3 73.39397 34.45065
## 4          4 73.16466 33.82338
## 5          5 73.16823 34.13390
## 6          6 73.41688 34.00422

Now we have built a dataset of the distribution data containing different types of data, showing the lat/lon values of all species sightings.

Data preparation: species and subspecies names

We go on with cutting out the species and subspecies names of our data. ***

# read in species_csv

species_subspecies <- read.csv("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/distribution_Pakistan_all_species.csv")

species_subspecies_cut <- species_subspecies [,0:2]
head(species_subspecies_cut)

##           species.C.80 subspecies.C.80
## 1      Actinor_radians                
## 2      Actinor_radians                
## 3      Actinor_radians                
## 4 Aeromachus_stigmatus                
## 5 Aeromachus_stigmatus                
## 6 Aeromachus_stigmatus

Now we can combine the two generated dataframes to one:

Combine species + distribution + cut (subset) for only 1 species (–> Pyrgus_cashmirensis)

species_distribution_combined <- data.frame(dataframe_occurrence_record_lonlat, species_subspecies_cut)
head(species_distribution_combined)

##   shp.record    shp.x    shp.y         species.C.80 subspecies.C.80
## 1          1 72.07099 35.99492      Actinor_radians                
## 2          2 71.99009 35.15735      Actinor_radians                
## 3          3 73.39397 34.45065      Actinor_radians                
## 4          4 73.16466 33.82338 Aeromachus_stigmatus                
## 5          5 73.16823 34.13390 Aeromachus_stigmatus                
## 6          6 73.41688 34.00422 Aeromachus_stigmatus

# subset for species: Pyrgus_cashmirensis

subset_Pyrgus_cashmirensis <- subset(species_distribution_combined, species.C.80=="Pyrgus_cashmirensis")
subset_Pyrgus_cashmirensis

##     shp.record    shp.x    shp.y        species.C.80 subspecies.C.80
## 363        363 76.34609 35.53195 Pyrgus_cashmirensis   pseudoalpinus
## 364        364 75.50427 35.33850 Pyrgus_cashmirensis   pseudoalpinus
## 365        365 75.60242 35.95936 Pyrgus_cashmirensis   pseudoalpinus
## 366        366 75.31322 35.00677 Pyrgus_cashmirensis   pseudoalpinus
## 367        367 75.22376 35.24662 Pyrgus_cashmirensis   pseudoalpinus
## 368        368 74.88911 35.27707 Pyrgus_cashmirensis   pseudoalpinus
## 369        369 74.64545 35.03342 Pyrgus_cashmirensis   pseudoalpinus
## 370        370 74.78403 36.11664 Pyrgus_cashmirensis   pseudoalpinus
## 371        371 74.89111 36.29153 Pyrgus_cashmirensis   pseudoalpinus
## 372        372 74.59487 36.24870 Pyrgus_cashmirensis   pseudoalpinus
## 373        373 75.16594 36.87331 Pyrgus_cashmirensis   pseudoalpinus
## 374        374 74.62342 36.82334 Pyrgus_cashmirensis   pseudoalpinus
## 375        375 74.27721 36.12735 Pyrgus_cashmirensis   pseudoalpinus
## 376        376 74.11302 36.27012 Pyrgus_cashmirensis   pseudoalpinus
## 377        377 73.81678 36.45929 Pyrgus_cashmirensis   pseudoalpinus
## 378        378 73.79090 36.14564 Pyrgus_cashmirensis   pseudoalpinus
## 379        379 73.96044 35.92971 Pyrgus_cashmirensis   pseudoalpinus
## 380        380 73.24153 36.80245 Pyrgus_cashmirensis   pseudoalpinus
## 381        381 73.14444 36.33322 Pyrgus_cashmirensis   pseudoalpinus
## 382        382 72.70274 36.60831 Pyrgus_cashmirensis   pseudoalpinus
## 383        383 72.48888 36.51182 Pyrgus_cashmirensis   pseudoalpinus
## 384        384 72.34802 36.63651 Pyrgus_cashmirensis   pseudoalpinus
## 385        385 72.31571 36.32925 Pyrgus_cashmirensis   pseudoalpinus
## 386        386 72.06908 36.23717 Pyrgus_cashmirensis   pseudoalpinus
## 387        387 71.88414 36.24764 Pyrgus_cashmirensis   pseudoalpinus
## 388        388 73.04208 36.08512 Pyrgus_cashmirensis   pseudoalpinus
## 389        389 72.80182 36.10568 Pyrgus_cashmirensis   pseudoalpinus
## 390        390 72.62492 36.11361 Pyrgus_cashmirensis   pseudoalpinus
## 391        391 72.63135 35.95371 Pyrgus_cashmirensis   pseudoalpinus
## 392        392 72.29680 35.96861 Pyrgus_cashmirensis   pseudoalpinus
## 393        393 71.83670 35.97789 Pyrgus_cashmirensis   pseudoalpinus
## 394        394 72.05561 35.75802 Pyrgus_cashmirensis   pseudoalpinus
## 395        395 71.91070 35.60360 Pyrgus_cashmirensis   pseudoalpinus
## 396        396 71.73938 35.65428 Pyrgus_cashmirensis   pseudoalpinus
## 397        397 70.74320 33.87831 Pyrgus_cashmirensis    cashmirensis
## 398        398 71.43325 34.53719 Pyrgus_cashmirensis    cashmirensis
## 399        399 76.32010 34.95386 Pyrgus_cashmirensis    cashmirensis
## 400        400 75.82137 34.89866 Pyrgus_cashmirensis    cashmirensis
## 401        401 74.00003 35.16516 Pyrgus_cashmirensis    cashmirensis
## 402        402 73.80492 35.13423 Pyrgus_cashmirensis    cashmirensis
## 403        403 73.73651 34.75946 Pyrgus_cashmirensis    cashmirensis
## 404        404 73.61873 34.90044 Pyrgus_cashmirensis    cashmirensis
## 405        405 73.39967 34.77456 Pyrgus_cashmirensis    cashmirensis

subset_Pyrgus_cashmirensis_lat_lon <- subset_Pyrgus_cashmirensis[,2:3]
subset_Pyrgus_cashmirensis_lat_lon

##        shp.x    shp.y
## 363 76.34609 35.53195
## 364 75.50427 35.33850
## 365 75.60242 35.95936
## 366 75.31322 35.00677
## 367 75.22376 35.24662
## 368 74.88911 35.27707
## 369 74.64545 35.03342
## 370 74.78403 36.11664
## 371 74.89111 36.29153
## 372 74.59487 36.24870
## 373 75.16594 36.87331
## 374 74.62342 36.82334
## 375 74.27721 36.12735
## 376 74.11302 36.27012
## 377 73.81678 36.45929
## 378 73.79090 36.14564
## 379 73.96044 35.92971
## 380 73.24153 36.80245
## 381 73.14444 36.33322
## 382 72.70274 36.60831
## 383 72.48888 36.51182
## 384 72.34802 36.63651
## 385 72.31571 36.32925
## 386 72.06908 36.23717
## 387 71.88414 36.24764
## 388 73.04208 36.08512
## 389 72.80182 36.10568
## 390 72.62492 36.11361
## 391 72.63135 35.95371
## 392 72.29680 35.96861
## 393 71.83670 35.97789
## 394 72.05561 35.75802
## 395 71.91070 35.60360
## 396 71.73938 35.65428
## 397 70.74320 33.87831
## 398 71.43325 34.53719
## 399 76.32010 34.95386
## 400 75.82137 34.89866
## 401 74.00003 35.16516
## 402 73.80492 35.13423
## 403 73.73651 34.75946
## 404 73.61873 34.90044
## 405 73.39967 34.77456

#final presence data ***

p_a <- 1

final_presence <- cbind(subset_Pyrgus_cashmirensis_lat_lon, p_a)
final_presence

##        shp.x    shp.y p_a
## 363 76.34609 35.53195   1
## 364 75.50427 35.33850   1
## 365 75.60242 35.95936   1
## 366 75.31322 35.00677   1
## 367 75.22376 35.24662   1
## 368 74.88911 35.27707   1
## 369 74.64545 35.03342   1
## 370 74.78403 36.11664   1
## 371 74.89111 36.29153   1
## 372 74.59487 36.24870   1
## 373 75.16594 36.87331   1
## 374 74.62342 36.82334   1
## 375 74.27721 36.12735   1
## 376 74.11302 36.27012   1
## 377 73.81678 36.45929   1
## 378 73.79090 36.14564   1
## 379 73.96044 35.92971   1
## 380 73.24153 36.80245   1
## 381 73.14444 36.33322   1
## 382 72.70274 36.60831   1
## 383 72.48888 36.51182   1
## 384 72.34802 36.63651   1
## 385 72.31571 36.32925   1
## 386 72.06908 36.23717   1
## 387 71.88414 36.24764   1
## 388 73.04208 36.08512   1
## 389 72.80182 36.10568   1
## 390 72.62492 36.11361   1
## 391 72.63135 35.95371   1
## 392 72.29680 35.96861   1
## 393 71.83670 35.97789   1
## 394 72.05561 35.75802   1
## 395 71.91070 35.60360   1
## 396 71.73938 35.65428   1
## 397 70.74320 33.87831   1
## 398 71.43325 34.53719   1
## 399 76.32010 34.95386   1
## 400 75.82137 34.89866   1
## 401 74.00003 35.16516   1
## 402 73.80492 35.13423   1
## 403 73.73651 34.75946   1
## 404 73.61873 34.90044   1
## 405 73.39967 34.77456   1

Now we have the final presence dataset for the species Pyrgus cashmirensis. Later we will combine it with a generated set of pseudo-absence data and the predictor varriables of the environmental layers.

In order to convert our environmental data into the desired dataframe for model procession, have a look at the following procedure: # Preparation of environmental data: chelsa-dataframe

## chelsa data preparation ####

#merge

merged_chelsa_annual_mean_temp_path <- "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/"
merged_chelsa_annual_mean_temp_path

## [1] "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/"

merged_chelsa_annual_mean_temp_files <- list.files("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/")
merged_chelsa_annual_mean_temp_files

##  [1] "CHELSA_temp10_01_1979-2013_V1.2_land.tif"    
##  [2] "CHELSA_temp10_02_1979-2013_V1.2_land.tif"    
##  [3] "CHELSA_temp10_03_1979-2013_V1.2_land.tif"    
##  [4] "CHELSA_temp10_04_1979-2013_V1.2_land.tif"    
##  [5] "CHELSA_temp10_05_1979-2013_V1.2_land.tif"    
##  [6] "CHELSA_temp10_06_1979-2013_V1.2_land.tif"    
##  [7] "CHELSA_temp10_07_1979-2013_V1.2_land.tif"    
##  [8] "CHELSA_temp10_08_1979-2013_V1.2_land (1).tif"
##  [9] "CHELSA_temp10_09_1979-2013_V1.2_land.tif"    
## [10] "CHELSA_temp10_10_1979-2013_V1.2_land.tif"    
## [11] "CHELSA_temp10_11_1979-2013_V1.2_land.tif"    
## [12] "CHELSA_temp10_12_1979-2013_V1.2_land.tif"    
## [13] "merged"

raster_list <- list(merged_chelsa_annual_mean_temp_files) 
raster_list

## [[1]]
##  [1] "CHELSA_temp10_01_1979-2013_V1.2_land.tif"    
##  [2] "CHELSA_temp10_02_1979-2013_V1.2_land.tif"    
##  [3] "CHELSA_temp10_03_1979-2013_V1.2_land.tif"    
##  [4] "CHELSA_temp10_04_1979-2013_V1.2_land.tif"    
##  [5] "CHELSA_temp10_05_1979-2013_V1.2_land.tif"    
##  [6] "CHELSA_temp10_06_1979-2013_V1.2_land.tif"    
##  [7] "CHELSA_temp10_07_1979-2013_V1.2_land.tif"    
##  [8] "CHELSA_temp10_08_1979-2013_V1.2_land (1).tif"
##  [9] "CHELSA_temp10_09_1979-2013_V1.2_land.tif"    
## [10] "CHELSA_temp10_10_1979-2013_V1.2_land.tif"    
## [11] "CHELSA_temp10_11_1979-2013_V1.2_land.tif"    
## [12] "CHELSA_temp10_12_1979-2013_V1.2_land.tif"    
## [13] "merged"

for(i in 1:12)
{
  tilesloop <- paste0(merged_chelsa_annual_mean_temp_path, merged_chelsa_annual_mean_temp_files[i])
  tilesloop       
  raster_list[[i]] <- raster(tilesloop)
  rasterliste <- raster_list[[i]]
  rasterliste
}

raster_list$filename     <- "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/merged/merged_chelsa_annual_mean_temp.tif"
exporttif <- raster_list$filename 
exporttif

## [1] "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/merged/merged_chelsa_annual_mean_temp.tif"

raster_list$overwrite    <- TRUE


all_merged_chelsa_annual_mean_temp_raster <- do.call(raster::merge, raster_list, quote=TRUE)
all_merged_chelsa_annual_mean_temp_raster

## class      : RasterLayer 
## dimensions : 20880, 43200, 902016000  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : -180.0001, 179.9999, -90.00014, 83.99986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/merged/merged_chelsa_annual_mean_temp.tif 
## names      : merged_chelsa_annual_mean_temp 
## values     : -457, 345  (min, max)

plot(all_merged_chelsa_annual_mean_temp_raster, main="all_merged chelsa annual mean temperature")

chelsa_annual_global_prec <- 
  raster("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_global_prec/CHELSA_bio10_12.tif")
chelsa_prec_warmest_quarter <- 
  raster("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/prec_warmest_quarter/CHELSA_bio10_18.tif")
chelsa_warmest_quarter_mean_temp <- 
  raster("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/warmest_quarter_mean_t_global/CHELSA_bio10_10.tif")

plot(all_merged_chelsa_annual_mean_temp_raster, main="all_merged chelsa annual mean temperature")

plot(chelsa_annual_global_prec, main="chelsa annual global prec")

plot(chelsa_prec_warmest_quarter, main="chelsa prec warmest quarter")

plot(chelsa_warmest_quarter_mean_temp, main="chelsa warmest quarter mean temperature")

# crop onto pakistan extent

extend_pakistan <- extent(c(58, 83, 23, 38))

cropped_chelsa_annual_mean_temp_raster <- crop(all_merged_chelsa_annual_mean_temp_raster, extend_pakistan)
cropped_chelsa_annual_mean_temp_raster

## class      : RasterLayer 
## dimensions : 1800, 3000, 5400000  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 57.99986, 82.99986, 22.99986, 37.99986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171245_16916_80103.grd 
## names      : merged_chelsa_annual_mean_temp 
## values     : -341, 224  (min, max)

plot(cropped_chelsa_annual_mean_temp_raster)

#plot(wrld_simpl, add = TRUE)

cropped_chelsa_annual_global_prec <- crop(chelsa_annual_global_prec,extend_pakistan)
cropped_chelsa_annual_global_prec

## class      : RasterLayer 
## dimensions : 1800, 3000, 5400000  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 57.99986, 82.99986, 22.99986, 37.99986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171249_16916_09070.grd 
## names      : CHELSA_bio10_12 
## values     : 14, 3406  (min, max)

plot(cropped_chelsa_annual_global_prec)

#plot(wrld_simpl, add = TRUE)

cropped_chelsa_prec_warmest_quarter <- crop(chelsa_prec_warmest_quarter, extend_pakistan)
cropped_chelsa_prec_warmest_quarter

## class      : RasterLayer 
## dimensions : 1800, 3000, 5400000  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 57.99986, 82.99986, 22.99986, 37.99986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171253_16916_34908.grd 
## names      : CHELSA_bio10_18 
## values     : 0, 3221  (min, max)

plot(cropped_chelsa_prec_warmest_quarter)

#plot(wrld_simpl, add = TRUE)

cropped_chelsa_warmest_quarter_mean_temp <- crop(chelsa_warmest_quarter_mean_temp, extend_pakistan,)
cropped_chelsa_warmest_quarter_mean_temp

## class      : RasterLayer 
## dimensions : 1800, 3000, 5400000  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 57.99986, 82.99986, 22.99986, 37.99986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171258_16916_93479.grd 
## names      : CHELSA_bio10_10 
## values     : -110, 402  (min, max)

plot(cropped_chelsa_warmest_quarter_mean_temp)

#plot(wrld_simpl, add = TRUE)

# rasterstack

rasterstack_pakistan_chelsa <- raster::stack(cropped_chelsa_annual_global_prec, 
                                             cropped_chelsa_annual_mean_temp_raster, 
                                             cropped_chelsa_prec_warmest_quarter, 
                                             cropped_chelsa_warmest_quarter_mean_temp)
rasterstack_pakistan_chelsa

## class      : RasterStack 
## dimensions : 1800, 3000, 5400000, 4  (nrow, ncol, ncell, nlayers)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 57.99986, 82.99986, 22.99986, 37.99986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## names      : CHELSA_bio10_12, merged_chelsa_annual_mean_temp, CHELSA_bio10_18, CHELSA_bio10_10 
## min values :              14,                           -341,               0,            -110 
## max values :            3406,                            224,            3221,             402

plot(rasterstack_pakistan_chelsa)

# dataframe chelsa

pakistan_chelsa_dataframe <- as.data.frame(rasterstack_pakistan_chelsa)
head(pakistan_chelsa_dataframe)

##   CHELSA_bio10_12 merged_chelsa_annual_mean_temp CHELSA_bio10_18
## 1             241                             30              22
## 2             229                             32              21
## 3             228                             29              18
## 4             245                             28              18
## 5             281                             22              19
## 6             322                              9              22
##   CHELSA_bio10_10
## 1             286
## 2             288
## 3             285
## 4             283
## 5             277
## 6             263

str(pakistan_chelsa_dataframe)

## 'data.frame':    5400000 obs. of  4 variables:
##  $ CHELSA_bio10_12               : int  241 229 228 245 281 322 351 366 372 373 ...
##  $ merged_chelsa_annual_mean_temp: int  30 32 29 28 22 9 -5 -9 -14 -16 ...
##  $ CHELSA_bio10_18               : int  22 21 18 18 19 22 22 23 23 26 ...
##  $ CHELSA_bio10_10               : int  286 288 285 283 277 263 247 243 237 235 ...

Summary chelsa data preparation:

The procedure above shows how to merge several environmental layers of one kind in order to use it for the model. After that it is shown how to read in all files needed for prediction, as well as cropping it onto the right extent (extent of pakistan) and adding country borders.

Finally, the created raster stack of all four predictor sets is transformed into a data frame so we can combine it with our presence and absence data later. *** # Generate pseudo-absence data with dismo package

pseudo_absence_pakistan <- randomPoints(cropped_chelsa_annual_global_prec, 42)
pseudo_absence_pakistan

##              x        y
##  [1,] 71.83736 36.95403
##  [2,] 70.54569 36.05403
##  [3,] 60.44569 25.75403
##  [4,] 70.82069 37.57903
##  [5,] 73.31236 23.79569
##  [6,] 72.22903 24.49569
##  [7,] 73.61236 36.05403
##  [8,] 70.66236 34.18736
##  [9,] 72.03736 24.78736
## [10,] 79.12069 25.17903
## [11,] 82.77069 35.33736
## [12,] 70.83736 32.40403
## [13,] 63.17069 30.96236
## [14,] 78.73736 26.47069
## [15,] 66.97903 27.32903
## [16,] 60.31236 27.81236
## [17,] 80.23736 37.28736
## [18,] 72.65403 32.11236
## [19,] 72.84569 28.15403
## [20,] 58.67069 33.50403
## [21,] 59.74569 29.42903
## [22,] 66.24569 27.56236
## [23,] 75.89569 27.57903
## [24,] 81.13736 34.35403
## [25,] 71.91236 24.87069
## [26,] 60.17069 33.92903
## [27,] 66.65403 37.92069
## [28,] 71.28736 26.52069
## [29,] 75.80403 29.03736
## [30,] 73.11236 24.18736
## [31,] 80.32903 27.62903
## [32,] 78.55403 29.48736
## [33,] 61.86236 32.57903
## [34,] 61.97903 35.27069
## [35,] 71.06236 23.42903
## [36,] 77.27069 30.87069
## [37,] 70.85403 30.77069
## [38,] 82.89569 31.67903
## [39,] 76.27069 27.47069
## [40,] 70.61236 34.64569
## [41,] 70.82903 24.29569
## [42,] 72.78736 23.55403

head(pseudo_absence_pakistan)

##             x        y
## [1,] 71.83736 36.95403
## [2,] 70.54569 36.05403
## [3,] 60.44569 25.75403
## [4,] 70.82069 37.57903
## [5,] 73.31236 23.79569
## [6,] 72.22903 24.49569

plot(pseudo_absence_pakistan)

# add column "0" for absence:

p_a <- 0

final_absence <- cbind(pseudo_absence_pakistan, p_a)
final_absence

##              x        y p_a
##  [1,] 71.83736 36.95403   0
##  [2,] 70.54569 36.05403   0
##  [3,] 60.44569 25.75403   0
##  [4,] 70.82069 37.57903   0
##  [5,] 73.31236 23.79569   0
##  [6,] 72.22903 24.49569   0
##  [7,] 73.61236 36.05403   0
##  [8,] 70.66236 34.18736   0
##  [9,] 72.03736 24.78736   0
## [10,] 79.12069 25.17903   0
## [11,] 82.77069 35.33736   0
## [12,] 70.83736 32.40403   0
## [13,] 63.17069 30.96236   0
## [14,] 78.73736 26.47069   0
## [15,] 66.97903 27.32903   0
## [16,] 60.31236 27.81236   0
## [17,] 80.23736 37.28736   0
## [18,] 72.65403 32.11236   0
## [19,] 72.84569 28.15403   0
## [20,] 58.67069 33.50403   0
## [21,] 59.74569 29.42903   0
## [22,] 66.24569 27.56236   0
## [23,] 75.89569 27.57903   0
## [24,] 81.13736 34.35403   0
## [25,] 71.91236 24.87069   0
## [26,] 60.17069 33.92903   0
## [27,] 66.65403 37.92069   0
## [28,] 71.28736 26.52069   0
## [29,] 75.80403 29.03736   0
## [30,] 73.11236 24.18736   0
## [31,] 80.32903 27.62903   0
## [32,] 78.55403 29.48736   0
## [33,] 61.86236 32.57903   0
## [34,] 61.97903 35.27069   0
## [35,] 71.06236 23.42903   0
## [36,] 77.27069 30.87069   0
## [37,] 70.85403 30.77069   0
## [38,] 82.89569 31.67903   0
## [39,] 76.27069 27.47069   0
## [40,] 70.61236 34.64569   0
## [41,] 70.82903 24.29569   0
## [42,] 72.78736 23.55403   0

# rename latlon columns

final_absence_df <- as.data.frame(final_absence)
final_absence_df

##           x        y p_a
## 1  71.83736 36.95403   0
## 2  70.54569 36.05403   0
## 3  60.44569 25.75403   0
## 4  70.82069 37.57903   0
## 5  73.31236 23.79569   0
## 6  72.22903 24.49569   0
## 7  73.61236 36.05403   0
## 8  70.66236 34.18736   0
## 9  72.03736 24.78736   0
## 10 79.12069 25.17903   0
## 11 82.77069 35.33736   0
## 12 70.83736 32.40403   0
## 13 63.17069 30.96236   0
## 14 78.73736 26.47069   0
## 15 66.97903 27.32903   0
## 16 60.31236 27.81236   0
## 17 80.23736 37.28736   0
## 18 72.65403 32.11236   0
## 19 72.84569 28.15403   0
## 20 58.67069 33.50403   0
## 21 59.74569 29.42903   0
## 22 66.24569 27.56236   0
## 23 75.89569 27.57903   0
## 24 81.13736 34.35403   0
## 25 71.91236 24.87069   0
## 26 60.17069 33.92903   0
## 27 66.65403 37.92069   0
## 28 71.28736 26.52069   0
## 29 75.80403 29.03736   0
## 30 73.11236 24.18736   0
## 31 80.32903 27.62903   0
## 32 78.55403 29.48736   0
## 33 61.86236 32.57903   0
## 34 61.97903 35.27069   0
## 35 71.06236 23.42903   0
## 36 77.27069 30.87069   0
## 37 70.85403 30.77069   0
## 38 82.89569 31.67903   0
## 39 76.27069 27.47069   0
## 40 70.61236 34.64569   0
## 41 70.82903 24.29569   0
## 42 72.78736 23.55403   0

colnames(final_absence_df)

## [1] "x"   "y"   "p_a"

names(final_absence_df)[1] <- "shp.x"
names(final_absence_df)[2] <- "shp.y"
final_absence_df

##       shp.x    shp.y p_a
## 1  71.83736 36.95403   0
## 2  70.54569 36.05403   0
## 3  60.44569 25.75403   0
## 4  70.82069 37.57903   0
## 5  73.31236 23.79569   0
## 6  72.22903 24.49569   0
## 7  73.61236 36.05403   0
## 8  70.66236 34.18736   0
## 9  72.03736 24.78736   0
## 10 79.12069 25.17903   0
## 11 82.77069 35.33736   0
## 12 70.83736 32.40403   0
## 13 63.17069 30.96236   0
## 14 78.73736 26.47069   0
## 15 66.97903 27.32903   0
## 16 60.31236 27.81236   0
## 17 80.23736 37.28736   0
## 18 72.65403 32.11236   0
## 19 72.84569 28.15403   0
## 20 58.67069 33.50403   0
## 21 59.74569 29.42903   0
## 22 66.24569 27.56236   0
## 23 75.89569 27.57903   0
## 24 81.13736 34.35403   0
## 25 71.91236 24.87069   0
## 26 60.17069 33.92903   0
## 27 66.65403 37.92069   0
## 28 71.28736 26.52069   0
## 29 75.80403 29.03736   0
## 30 73.11236 24.18736   0
## 31 80.32903 27.62903   0
## 32 78.55403 29.48736   0
## 33 61.86236 32.57903   0
## 34 61.97903 35.27069   0
## 35 71.06236 23.42903   0
## 36 77.27069 30.87069   0
## 37 70.85403 30.77069   0
## 38 82.89569 31.67903   0
## 39 76.27069 27.47069   0
## 40 70.61236 34.64569   0
## 41 70.82903 24.29569   0
## 42 72.78736 23.55403   0

Now we have generated a set of 42 random points as absence data, because according to Elith et al. 2008 it is advisable to use the exact same number of absences as we have it for presences. The data was also given the name “0” as needed for absence data for BRT-modelling. We made sure the data is in the correct form by renaming the colums with the same names as used in the presence dataframe.

Finally, we can combine presence and absence data: ***

Combine presence + absence

final_p_a_lat_lon <- rbind(final_presence, final_absence_df)
final_p_a_lat_lon

##        shp.x    shp.y p_a
## 363 76.34609 35.53195   1
## 364 75.50427 35.33850   1
## 365 75.60242 35.95936   1
## 366 75.31322 35.00677   1
## 367 75.22376 35.24662   1
## 368 74.88911 35.27707   1
## 369 74.64545 35.03342   1
## 370 74.78403 36.11664   1
## 371 74.89111 36.29153   1
## 372 74.59487 36.24870   1
## 373 75.16594 36.87331   1
## 374 74.62342 36.82334   1
## 375 74.27721 36.12735   1
## 376 74.11302 36.27012   1
## 377 73.81678 36.45929   1
## 378 73.79090 36.14564   1
## 379 73.96044 35.92971   1
## 380 73.24153 36.80245   1
## 381 73.14444 36.33322   1
## 382 72.70274 36.60831   1
## 383 72.48888 36.51182   1
## 384 72.34802 36.63651   1
## 385 72.31571 36.32925   1
## 386 72.06908 36.23717   1
## 387 71.88414 36.24764   1
## 388 73.04208 36.08512   1
## 389 72.80182 36.10568   1
## 390 72.62492 36.11361   1
## 391 72.63135 35.95371   1
## 392 72.29680 35.96861   1
## 393 71.83670 35.97789   1
## 394 72.05561 35.75802   1
## 395 71.91070 35.60360   1
## 396 71.73938 35.65428   1
## 397 70.74320 33.87831   1
## 398 71.43325 34.53719   1
## 399 76.32010 34.95386   1
## 400 75.82137 34.89866   1
## 401 74.00003 35.16516   1
## 402 73.80492 35.13423   1
## 403 73.73651 34.75946   1
## 404 73.61873 34.90044   1
## 405 73.39967 34.77456   1
## 1   71.83736 36.95403   0
## 2   70.54569 36.05403   0
## 3   60.44569 25.75403   0
## 4   70.82069 37.57903   0
## 5   73.31236 23.79569   0
## 6   72.22903 24.49569   0
## 7   73.61236 36.05403   0
## 8   70.66236 34.18736   0
## 9   72.03736 24.78736   0
## 10  79.12069 25.17903   0
## 11  82.77069 35.33736   0
## 12  70.83736 32.40403   0
## 13  63.17069 30.96236   0
## 14  78.73736 26.47069   0
## 15  66.97903 27.32903   0
## 16  60.31236 27.81236   0
## 17  80.23736 37.28736   0
## 18  72.65403 32.11236   0
## 19  72.84569 28.15403   0
## 20  58.67069 33.50403   0
## 21  59.74569 29.42903   0
## 22  66.24569 27.56236   0
## 23  75.89569 27.57903   0
## 24  81.13736 34.35403   0
## 25  71.91236 24.87069   0
## 26  60.17069 33.92903   0
## 27  66.65403 37.92069   0
## 28  71.28736 26.52069   0
## 29  75.80403 29.03736   0
## 30  73.11236 24.18736   0
## 31  80.32903 27.62903   0
## 32  78.55403 29.48736   0
## 33  61.86236 32.57903   0
## 34  61.97903 35.27069   0
## 35  71.06236 23.42903   0
## 36  77.27069 30.87069   0
## 37  70.85403 30.77069   0
## 38  82.89569 31.67903   0
## 39  76.27069 27.47069   0
## 40  70.61236 34.64569   0
## 41  70.82903 24.29569   0
## 42  72.78736 23.55403   0

Extract rastervalues of chelsa data for whole point data:

Since the extract function of the raster package requires a data frame with lat/lon values only, we cut them out of our presence/absence dataset.

final_lat_lon <- final_p_a_lat_lon[,1:2]
final_lat_lon

##        shp.x    shp.y
## 363 76.34609 35.53195
## 364 75.50427 35.33850
## 365 75.60242 35.95936
## 366 75.31322 35.00677
## 367 75.22376 35.24662
## 368 74.88911 35.27707
## 369 74.64545 35.03342
## 370 74.78403 36.11664
## 371 74.89111 36.29153
## 372 74.59487 36.24870
## 373 75.16594 36.87331
## 374 74.62342 36.82334
## 375 74.27721 36.12735
## 376 74.11302 36.27012
## 377 73.81678 36.45929
## 378 73.79090 36.14564
## 379 73.96044 35.92971
## 380 73.24153 36.80245
## 381 73.14444 36.33322
## 382 72.70274 36.60831
## 383 72.48888 36.51182
## 384 72.34802 36.63651
## 385 72.31571 36.32925
## 386 72.06908 36.23717
## 387 71.88414 36.24764
## 388 73.04208 36.08512
## 389 72.80182 36.10568
## 390 72.62492 36.11361
## 391 72.63135 35.95371
## 392 72.29680 35.96861
## 393 71.83670 35.97789
## 394 72.05561 35.75802
## 395 71.91070 35.60360
## 396 71.73938 35.65428
## 397 70.74320 33.87831
## 398 71.43325 34.53719
## 399 76.32010 34.95386
## 400 75.82137 34.89866
## 401 74.00003 35.16516
## 402 73.80492 35.13423
## 403 73.73651 34.75946
## 404 73.61873 34.90044
## 405 73.39967 34.77456
## 1   71.83736 36.95403
## 2   70.54569 36.05403
## 3   60.44569 25.75403
## 4   70.82069 37.57903
## 5   73.31236 23.79569
## 6   72.22903 24.49569
## 7   73.61236 36.05403
## 8   70.66236 34.18736
## 9   72.03736 24.78736
## 10  79.12069 25.17903
## 11  82.77069 35.33736
## 12  70.83736 32.40403
## 13  63.17069 30.96236
## 14  78.73736 26.47069
## 15  66.97903 27.32903
## 16  60.31236 27.81236
## 17  80.23736 37.28736
## 18  72.65403 32.11236
## 19  72.84569 28.15403
## 20  58.67069 33.50403
## 21  59.74569 29.42903
## 22  66.24569 27.56236
## 23  75.89569 27.57903
## 24  81.13736 34.35403
## 25  71.91236 24.87069
## 26  60.17069 33.92903
## 27  66.65403 37.92069
## 28  71.28736 26.52069
## 29  75.80403 29.03736
## 30  73.11236 24.18736
## 31  80.32903 27.62903
## 32  78.55403 29.48736
## 33  61.86236 32.57903
## 34  61.97903 35.27069
## 35  71.06236 23.42903
## 36  77.27069 30.87069
## 37  70.85403 30.77069
## 38  82.89569 31.67903
## 39  76.27069 27.47069
## 40  70.61236 34.64569
## 41  70.82903 24.29569
## 42  72.78736 23.55403

rasValue_pakistan_all <- extract(rasterstack_pakistan_chelsa, final_lat_lon)

rasValue_pakistan_all

##       CHELSA_bio10_12 merged_chelsa_annual_mean_temp CHELSA_bio10_18
##  [1,]             138                           -170              21
##  [2,]             269                           -105              31
##  [3,]             157                           -195              27
##  [4,]             858                           -157              88
##  [5,]             471                           -171              83
##  [6,]             397                           -101              78
##  [7,]             964                           -166             158
##  [8,]             200                           -204              58
##  [9,]             149                           -181              39
## [10,]              81                           -119              28
## [11,]              54                           -165              15
## [12,]              63                           -146              17
## [13,]              98                            -86              28
## [14,]              74                           -130              25
## [15,]              55                           -109              16
## [16,]              89                            -92              30
## [17,]             146                           -140              51
## [18,]              41                           -134              11
## [19,]             124                           -176              35
## [20,]             178                           -188              25
## [21,]             138                           -122              17
## [22,]             141                           -141               9
## [23,]             231                           -108              18
## [24,]             238                            -97              12
## [25,]             355                           -193              13
## [26,]             230                           -165              70
## [27,]             217                           -138              54
## [28,]             288                           -148              37
## [29,]             411                           -173              93
## [30,]             512                           -170              62
## [31,]             361                            -74              30
## [32,]             660                           -111              92
## [33,]             601                            -37             121
## [34,]             438                            -12              38
## [35,]             332                             19              71
## [36,]             476                             85             101
## [37,]             182                           -151              32
## [38,]             517                           -178              58
## [39,]             571                           -140             148
## [40,]             698                           -130             210
## [41,]            1220                           -131             518
## [42,]             988                            -63             345
## [43,]            1450                           -109             753
## [44,]             187                           -193               8
## [45,]             627                           -156              12
## [46,]             142                            192               5
## [47,]             514                            -93               6
## [48,]             870                            190               7
## [49,]             580                            177              98
## [50,]             133                           -149              52
## [51,]             333                             74              47
## [52,]             473                            177              72
## [53,]             957                            161             134
## [54,]             171                           -189             118
## [55,]             393                            111              63
## [56,]              76                             90               0
## [57,]             787                            154             111
## [58,]             190                            139              35
## [59,]             128                            126              17
## [60,]              23                            -41               9
## [61,]             424                            123              70
## [62,]             210                            162              50
## [63,]             201                             38               1
## [64,]              44                            137               1
## [65,]             265                            108             124
## [66,]             484                            134              87
## [67,]             194                           -241             152
## [68,]             414                            182              58
## [69,]             216                             49               0
## [70,]             309                            -37              11
## [71,]             190                            168              43
## [72,]             419                            134              73
## [73,]             809                            182             139
## [74,]            1037                            146             205
## [75,]            1358                            135             227
## [76,]              96                             79               0
## [77,]             299                             38               0
## [78,]             490                            202              91
## [79,]            1595                             44             958
## [80,]             270                            139              49
## [81,]             148                           -145             150
## [82,]             763                            122             113
## [83,]             259                             53              32
## [84,]             363                            199              55
## [85,]             682                            197             138
##       CHELSA_bio10_10
##  [1,]              63
##  [2,]             128
##  [3,]              36
##  [4,]              73
##  [5,]              61
##  [6,]             128
##  [7,]              61
##  [8,]              29
##  [9,]              52
## [10,]             115
## [11,]              66
## [12,]              89
## [13,]             148
## [14,]             104
## [15,]             128
## [16,]             141
## [17,]              90
## [18,]             108
## [19,]              58
## [20,]              50
## [21,]             116
## [22,]              98
## [23,]             126
## [24,]             135
## [25,]              39
## [26,]              66
## [27,]              92
## [28,]              83
## [29,]              53
## [30,]              57
## [31,]             152
## [32,]             110
## [33,]             177
## [34,]             205
## [35,]             230
## [36,]             287
## [37,]              83
## [38,]              54
## [39,]              80
## [40,]              87
## [41,]              81
## [42,]             146
## [43,]              99
## [44,]              47
## [45,]              75
## [46,]             365
## [47,]             146
## [48,]             321
## [49,]             324
## [50,]              81
## [51,]             292
## [52,]             330
## [53,]             339
## [54,]              17
## [55,]             325
## [56,]             353
## [57,]             343
## [58,]             340
## [59,]             347
## [60,]             254
## [61,]             335
## [62,]             364
## [63,]             285
## [64,]             370
## [65,]             312
## [66,]             325
## [67,]               0
## [68,]             337
## [69,]             299
## [70,]             208
## [71,]             344
## [72,]             333
## [73,]             319
## [74,]             325
## [75,]             313
## [76,]             346
## [77,]             285
## [78,]             331
## [79,]             215
## [80,]             354
## [81,]              81
## [82,]             311
## [83,]             277
## [84,]             340
## [85,]             326

Final training data

Now we can create a dataframe out of our generated datasets for 1. occurences, 2. absences and 3. environmental predictors:

pakistan_butterflies_train_final_Pyrgus_cashmirensis <- data.frame(final_p_a_lat_lon, rasValue_pakistan_all)
pakistan_butterflies_train_final_Pyrgus_cashmirensis

##        shp.x    shp.y p_a CHELSA_bio10_12 merged_chelsa_annual_mean_temp
## 363 76.34609 35.53195   1             138                           -170
## 364 75.50427 35.33850   1             269                           -105
## 365 75.60242 35.95936   1             157                           -195
## 366 75.31322 35.00677   1             858                           -157
## 367 75.22376 35.24662   1             471                           -171
## 368 74.88911 35.27707   1             397                           -101
## 369 74.64545 35.03342   1             964                           -166
## 370 74.78403 36.11664   1             200                           -204
## 371 74.89111 36.29153   1             149                           -181
## 372 74.59487 36.24870   1              81                           -119
## 373 75.16594 36.87331   1              54                           -165
## 374 74.62342 36.82334   1              63                           -146
## 375 74.27721 36.12735   1              98                            -86
## 376 74.11302 36.27012   1              74                           -130
## 377 73.81678 36.45929   1              55                           -109
## 378 73.79090 36.14564   1              89                            -92
## 379 73.96044 35.92971   1             146                           -140
## 380 73.24153 36.80245   1              41                           -134
## 381 73.14444 36.33322   1             124                           -176
## 382 72.70274 36.60831   1             178                           -188
## 383 72.48888 36.51182   1             138                           -122
## 384 72.34802 36.63651   1             141                           -141
## 385 72.31571 36.32925   1             231                           -108
## 386 72.06908 36.23717   1             238                            -97
## 387 71.88414 36.24764   1             355                           -193
## 388 73.04208 36.08512   1             230                           -165
## 389 72.80182 36.10568   1             217                           -138
## 390 72.62492 36.11361   1             288                           -148
## 391 72.63135 35.95371   1             411                           -173
## 392 72.29680 35.96861   1             512                           -170
## 393 71.83670 35.97789   1             361                            -74
## 394 72.05561 35.75802   1             660                           -111
## 395 71.91070 35.60360   1             601                            -37
## 396 71.73938 35.65428   1             438                            -12
## 397 70.74320 33.87831   1             332                             19
## 398 71.43325 34.53719   1             476                             85
## 399 76.32010 34.95386   1             182                           -151
## 400 75.82137 34.89866   1             517                           -178
## 401 74.00003 35.16516   1             571                           -140
## 402 73.80492 35.13423   1             698                           -130
## 403 73.73651 34.75946   1            1220                           -131
## 404 73.61873 34.90044   1             988                            -63
## 405 73.39967 34.77456   1            1450                           -109
## 1   71.83736 36.95403   0             187                           -193
## 2   70.54569 36.05403   0             627                           -156
## 3   60.44569 25.75403   0             142                            192
## 4   70.82069 37.57903   0             514                            -93
## 5   73.31236 23.79569   0             870                            190
## 6   72.22903 24.49569   0             580                            177
## 7   73.61236 36.05403   0             133                           -149
## 8   70.66236 34.18736   0             333                             74
## 9   72.03736 24.78736   0             473                            177
## 10  79.12069 25.17903   0             957                            161
## 11  82.77069 35.33736   0             171                           -189
## 12  70.83736 32.40403   0             393                            111
## 13  63.17069 30.96236   0              76                             90
## 14  78.73736 26.47069   0             787                            154
## 15  66.97903 27.32903   0             190                            139
## 16  60.31236 27.81236   0             128                            126
## 17  80.23736 37.28736   0              23                            -41
## 18  72.65403 32.11236   0             424                            123
## 19  72.84569 28.15403   0             210                            162
## 20  58.67069 33.50403   0             201                             38
## 21  59.74569 29.42903   0              44                            137
## 22  66.24569 27.56236   0             265                            108
## 23  75.89569 27.57903   0             484                            134
## 24  81.13736 34.35403   0             194                           -241
## 25  71.91236 24.87069   0             414                            182
## 26  60.17069 33.92903   0             216                             49
## 27  66.65403 37.92069   0             309                            -37
## 28  71.28736 26.52069   0             190                            168
## 29  75.80403 29.03736   0             419                            134
## 30  73.11236 24.18736   0             809                            182
## 31  80.32903 27.62903   0            1037                            146
## 32  78.55403 29.48736   0            1358                            135
## 33  61.86236 32.57903   0              96                             79
## 34  61.97903 35.27069   0             299                             38
## 35  71.06236 23.42903   0             490                            202
## 36  77.27069 30.87069   0            1595                             44
## 37  70.85403 30.77069   0             270                            139
## 38  82.89569 31.67903   0             148                           -145
## 39  76.27069 27.47069   0             763                            122
## 40  70.61236 34.64569   0             259                             53
## 41  70.82903 24.29569   0             363                            199
## 42  72.78736 23.55403   0             682                            197
##     CHELSA_bio10_18 CHELSA_bio10_10
## 363              21              63
## 364              31             128
## 365              27              36
## 366              88              73
## 367              83              61
## 368              78             128
## 369             158              61
## 370              58              29
## 371              39              52
## 372              28             115
## 373              15              66
## 374              17              89
## 375              28             148
## 376              25             104
## 377              16             128
## 378              30             141
## 379              51              90
## 380              11             108
## 381              35              58
## 382              25              50
## 383              17             116
## 384               9              98
## 385              18             126
## 386              12             135
## 387              13              39
## 388              70              66
## 389              54              92
## 390              37              83
## 391              93              53
## 392              62              57
## 393              30             152
## 394              92             110
## 395             121             177
## 396              38             205
## 397              71             230
## 398             101             287
## 399              32              83
## 400              58              54
## 401             148              80
## 402             210              87
## 403             518              81
## 404             345             146
## 405             753              99
## 1                 8              47
## 2                12              75
## 3                 5             365
## 4                 6             146
## 5                 7             321
## 6                98             324
## 7                52              81
## 8                47             292
## 9                72             330
## 10              134             339
## 11              118              17
## 12               63             325
## 13                0             353
## 14              111             343
## 15               35             340
## 16               17             347
## 17                9             254
## 18               70             335
## 19               50             364
## 20                1             285
## 21                1             370
## 22              124             312
## 23               87             325
## 24              152               0
## 25               58             337
## 26                0             299
## 27               11             208
## 28               43             344
## 29               73             333
## 30              139             319
## 31              205             325
## 32              227             313
## 33                0             346
## 34                0             285
## 35               91             331
## 36              958             215
## 37               49             354
## 38              150              81
## 39              113             311
## 40               32             277
## 41               55             340
## 42              138             326

Final model

Pyrgus_cashmirensis.tc5.lr01 <- gbm.step(data=pakistan_butterflies_train_final_Pyrgus_cashmirensis, gbm.x = 4:7, gbm.y = 3, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)

## 
##  
##  GBM STEP - version 2.9 
##  
## Performing cross-validation optimisation of a boosted regression tree model 
## for p_a and using a family of bernoulli 
## Using 85 observations and 4 predictors 
## creating 10 initial models of 50 trees 
## 
##  folds are stratified by prevalence 
## total mean deviance =  1.3862 
## tolerance is fixed at  0.0014 
## ntrees resid. dev. 
## 50    1.0092 
## now adding trees... 
## 100   0.8515 
## 150   0.7829 
## 200   0.7504 
## 250   0.7401 
## 300   0.7321 
## 350   0.7298 
## 400   0.7234 
## 450   0.7249 
## 500   0.7285 
## 550   0.7387 
## 600   0.7317 
## 650   0.7282 
## 700   0.7401 
## 750   0.7467 
## 800   0.7498 
## 850   0.7637 
## 900   0.7672 
## 950   0.7693 
## 1000   0.7686 
## 1050   0.7674

## fitting final gbm model with a fixed number of 400 trees for p_a

## 
## mean total deviance = 1.386 
## mean residual deviance = 0.354 
##  
## estimated cv deviance = 0.723 ; se = 0.138 
##  
## training data correlation = 0.903 
## cv correlation =  0.786 ; se = 0.056 
##  
## training data AUC score = 0.994 
## cv AUC score = 0.905 ; se = 0.03 
##  
## elapsed time -  0.01 minutes

A model is built with the pakistan training data, using chelsa climate data as predictors and the binary absence/presence data as the response variable (as required by family = “bernoulli” in the BRT-model).

We are using a tree complexity of 5, a learning rate of 0,01 (since we have only 84 presence/absence points we need a low learning rate) and a bag fraction of 0,5.

Sources

Elith, J., Leathwick, J. R. and Hastie, T. (2008): A working guide to boosted regression trees, Journal of Animal Ecology 77, 802–813.
Hijmans, R. J. and Elith, J. (2019): Spatial Distribution Models, https://rspatial.org/sdm/SDM.pdf access: 12.01.2020
https://www.youtube.com/watch?v=3CC4N4z3GJc access: 25.01.2020
Oliver, J. (2018): A very brief introduction to species distribution models in R https://jcoliver.github.io/learn-r/011-species-distribution-models.html access: 23.01.2020

SDM_BRT_FINAL_Lukas

Lukas Hilberg

25 1 2020