This projekt deals with species distribution modelling of butterfly species in Pakistan. In the following script, distribution data of butterfly species as well as environmental data from http://chelsa-climate.org/ database is prepared to perform “boosted regression tree”(BRT) modelling process. BRT is an ensemble of methods for fitting statistical models (Elith et al. 2008). Aim of this work is to use the created model in order to predict on potential distribution areas of a particular buttefly species.
benefits of BRT-modelling:
Befor we start, i recommend you to watch the following video. It gives further information and clarification concerning BRT. It illustrates how to use BRT and also the mathmatics behind it.
library(vembedr)
## Warning: package 'vembedr' was built under R version 3.6.2
embed_url("https://www.youtube.com/watch?v=3CC4N4z3GJc")
First of all, you will need the following librarys for the process:
library(dismo)
library(raster)
library(gbm)
library(maptools)
library(rgdal)
library(shapefiles)
After loading in the librarys, we can start with the data preparation now. ***
# distribution-data
distribution_shape <- read.shp("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_Pakistan_all_test.shp")
distribution_dataframe <- as.data.frame(distribution_shape)
occurrence_csv <- paste0(system.file(package="dismo"), "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_shape_csv.csv")
head(occurrence_csv)
## [1] "C:/Users/ich/Documents/R/win-library/3.6/dismoC:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_shape_csv.csv"
occurrence <- read.csv("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/distribution_shape_csv.csv")
head(occurrence)
## X shp.record shp.x shp.y shp.shape.type header.file.code
## 1 1 1 72.07099 35.99492 1 9994
## 2 2 2 71.99009 35.15735 1 9994
## 3 3 3 73.39397 34.45065 1 9994
## 4 4 4 73.16466 33.82338 1 9994
## 5 5 5 73.16823 34.13390 1 9994
## 6 6 6 73.41688 34.00422 1 9994
## header.file.length header.file.version header.shape.type header.xmin
## 1 82230 1000 1 62.23056
## 2 82230 1000 1 62.23056
## 3 82230 1000 1 62.23056
## 4 82230 1000 1 62.23056
## 5 82230 1000 1 62.23056
## 6 82230 1000 1 62.23056
## header.ymin header.xmax header.ymax header.zmin header.zmax header.mmin
## 1 24.13072 77.77783 37.11705 0 0 0
## 2 24.13072 77.77783 37.11705 0 0 0
## 3 24.13072 77.77783 37.11705 0 0 0
## 4 24.13072 77.77783 37.11705 0 0 0
## 5 24.13072 77.77783 37.11705 0 0 0
## 6 24.13072 77.77783 37.11705 0 0 0
## header.mmax
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
occurrence_lonlat <- occurrence [,3:4]
head(occurrence_lonlat)
## shp.x shp.y
## 1 72.07099 35.99492
## 2 71.99009 35.15735
## 3 73.39397 34.45065
## 4 73.16466 33.82338
## 5 73.16823 34.13390
## 6 73.41688 34.00422
dataframe_occurrence_record_lonlat <- occurrence [,2:4]
head(dataframe_occurrence_record_lonlat)
## shp.record shp.x shp.y
## 1 1 72.07099 35.99492
## 2 2 71.99009 35.15735
## 3 3 73.39397 34.45065
## 4 4 73.16466 33.82338
## 5 5 73.16823 34.13390
## 6 6 73.41688 34.00422
write.csv(dataframe_occurrence_record_lonlat, "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/dataframe_occurrence_record_lonlat.csv")
head(dataframe_occurrence_record_lonlat)
## shp.record shp.x shp.y
## 1 1 72.07099 35.99492
## 2 2 71.99009 35.15735
## 3 3 73.39397 34.45065
## 4 4 73.16466 33.82338
## 5 5 73.16823 34.13390
## 6 6 73.41688 34.00422
Now we have built a dataset of the distribution data containing different types of data, showing the lat/lon values of all species sightings.
We go on with cutting out the species and subspecies names of our data. ***
# read in species_csv
species_subspecies <- read.csv("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/distribution_Pakistan_all_species.csv")
species_subspecies_cut <- species_subspecies [,0:2]
head(species_subspecies_cut)
## species.C.80 subspecies.C.80
## 1 Actinor_radians
## 2 Actinor_radians
## 3 Actinor_radians
## 4 Aeromachus_stigmatus
## 5 Aeromachus_stigmatus
## 6 Aeromachus_stigmatus
Now we can combine the two generated dataframes to one:
species_distribution_combined <- data.frame(dataframe_occurrence_record_lonlat, species_subspecies_cut)
head(species_distribution_combined)
## shp.record shp.x shp.y species.C.80 subspecies.C.80
## 1 1 72.07099 35.99492 Actinor_radians
## 2 2 71.99009 35.15735 Actinor_radians
## 3 3 73.39397 34.45065 Actinor_radians
## 4 4 73.16466 33.82338 Aeromachus_stigmatus
## 5 5 73.16823 34.13390 Aeromachus_stigmatus
## 6 6 73.41688 34.00422 Aeromachus_stigmatus
# subset for species: Pyrgus_cashmirensis
subset_Pyrgus_cashmirensis <- subset(species_distribution_combined, species.C.80=="Pyrgus_cashmirensis")
subset_Pyrgus_cashmirensis
## shp.record shp.x shp.y species.C.80 subspecies.C.80
## 363 363 76.34609 35.53195 Pyrgus_cashmirensis pseudoalpinus
## 364 364 75.50427 35.33850 Pyrgus_cashmirensis pseudoalpinus
## 365 365 75.60242 35.95936 Pyrgus_cashmirensis pseudoalpinus
## 366 366 75.31322 35.00677 Pyrgus_cashmirensis pseudoalpinus
## 367 367 75.22376 35.24662 Pyrgus_cashmirensis pseudoalpinus
## 368 368 74.88911 35.27707 Pyrgus_cashmirensis pseudoalpinus
## 369 369 74.64545 35.03342 Pyrgus_cashmirensis pseudoalpinus
## 370 370 74.78403 36.11664 Pyrgus_cashmirensis pseudoalpinus
## 371 371 74.89111 36.29153 Pyrgus_cashmirensis pseudoalpinus
## 372 372 74.59487 36.24870 Pyrgus_cashmirensis pseudoalpinus
## 373 373 75.16594 36.87331 Pyrgus_cashmirensis pseudoalpinus
## 374 374 74.62342 36.82334 Pyrgus_cashmirensis pseudoalpinus
## 375 375 74.27721 36.12735 Pyrgus_cashmirensis pseudoalpinus
## 376 376 74.11302 36.27012 Pyrgus_cashmirensis pseudoalpinus
## 377 377 73.81678 36.45929 Pyrgus_cashmirensis pseudoalpinus
## 378 378 73.79090 36.14564 Pyrgus_cashmirensis pseudoalpinus
## 379 379 73.96044 35.92971 Pyrgus_cashmirensis pseudoalpinus
## 380 380 73.24153 36.80245 Pyrgus_cashmirensis pseudoalpinus
## 381 381 73.14444 36.33322 Pyrgus_cashmirensis pseudoalpinus
## 382 382 72.70274 36.60831 Pyrgus_cashmirensis pseudoalpinus
## 383 383 72.48888 36.51182 Pyrgus_cashmirensis pseudoalpinus
## 384 384 72.34802 36.63651 Pyrgus_cashmirensis pseudoalpinus
## 385 385 72.31571 36.32925 Pyrgus_cashmirensis pseudoalpinus
## 386 386 72.06908 36.23717 Pyrgus_cashmirensis pseudoalpinus
## 387 387 71.88414 36.24764 Pyrgus_cashmirensis pseudoalpinus
## 388 388 73.04208 36.08512 Pyrgus_cashmirensis pseudoalpinus
## 389 389 72.80182 36.10568 Pyrgus_cashmirensis pseudoalpinus
## 390 390 72.62492 36.11361 Pyrgus_cashmirensis pseudoalpinus
## 391 391 72.63135 35.95371 Pyrgus_cashmirensis pseudoalpinus
## 392 392 72.29680 35.96861 Pyrgus_cashmirensis pseudoalpinus
## 393 393 71.83670 35.97789 Pyrgus_cashmirensis pseudoalpinus
## 394 394 72.05561 35.75802 Pyrgus_cashmirensis pseudoalpinus
## 395 395 71.91070 35.60360 Pyrgus_cashmirensis pseudoalpinus
## 396 396 71.73938 35.65428 Pyrgus_cashmirensis pseudoalpinus
## 397 397 70.74320 33.87831 Pyrgus_cashmirensis cashmirensis
## 398 398 71.43325 34.53719 Pyrgus_cashmirensis cashmirensis
## 399 399 76.32010 34.95386 Pyrgus_cashmirensis cashmirensis
## 400 400 75.82137 34.89866 Pyrgus_cashmirensis cashmirensis
## 401 401 74.00003 35.16516 Pyrgus_cashmirensis cashmirensis
## 402 402 73.80492 35.13423 Pyrgus_cashmirensis cashmirensis
## 403 403 73.73651 34.75946 Pyrgus_cashmirensis cashmirensis
## 404 404 73.61873 34.90044 Pyrgus_cashmirensis cashmirensis
## 405 405 73.39967 34.77456 Pyrgus_cashmirensis cashmirensis
subset_Pyrgus_cashmirensis_lat_lon <- subset_Pyrgus_cashmirensis[,2:3]
subset_Pyrgus_cashmirensis_lat_lon
## shp.x shp.y
## 363 76.34609 35.53195
## 364 75.50427 35.33850
## 365 75.60242 35.95936
## 366 75.31322 35.00677
## 367 75.22376 35.24662
## 368 74.88911 35.27707
## 369 74.64545 35.03342
## 370 74.78403 36.11664
## 371 74.89111 36.29153
## 372 74.59487 36.24870
## 373 75.16594 36.87331
## 374 74.62342 36.82334
## 375 74.27721 36.12735
## 376 74.11302 36.27012
## 377 73.81678 36.45929
## 378 73.79090 36.14564
## 379 73.96044 35.92971
## 380 73.24153 36.80245
## 381 73.14444 36.33322
## 382 72.70274 36.60831
## 383 72.48888 36.51182
## 384 72.34802 36.63651
## 385 72.31571 36.32925
## 386 72.06908 36.23717
## 387 71.88414 36.24764
## 388 73.04208 36.08512
## 389 72.80182 36.10568
## 390 72.62492 36.11361
## 391 72.63135 35.95371
## 392 72.29680 35.96861
## 393 71.83670 35.97789
## 394 72.05561 35.75802
## 395 71.91070 35.60360
## 396 71.73938 35.65428
## 397 70.74320 33.87831
## 398 71.43325 34.53719
## 399 76.32010 34.95386
## 400 75.82137 34.89866
## 401 74.00003 35.16516
## 402 73.80492 35.13423
## 403 73.73651 34.75946
## 404 73.61873 34.90044
## 405 73.39967 34.77456
#final presence data ***
p_a <- 1
final_presence <- cbind(subset_Pyrgus_cashmirensis_lat_lon, p_a)
final_presence
## shp.x shp.y p_a
## 363 76.34609 35.53195 1
## 364 75.50427 35.33850 1
## 365 75.60242 35.95936 1
## 366 75.31322 35.00677 1
## 367 75.22376 35.24662 1
## 368 74.88911 35.27707 1
## 369 74.64545 35.03342 1
## 370 74.78403 36.11664 1
## 371 74.89111 36.29153 1
## 372 74.59487 36.24870 1
## 373 75.16594 36.87331 1
## 374 74.62342 36.82334 1
## 375 74.27721 36.12735 1
## 376 74.11302 36.27012 1
## 377 73.81678 36.45929 1
## 378 73.79090 36.14564 1
## 379 73.96044 35.92971 1
## 380 73.24153 36.80245 1
## 381 73.14444 36.33322 1
## 382 72.70274 36.60831 1
## 383 72.48888 36.51182 1
## 384 72.34802 36.63651 1
## 385 72.31571 36.32925 1
## 386 72.06908 36.23717 1
## 387 71.88414 36.24764 1
## 388 73.04208 36.08512 1
## 389 72.80182 36.10568 1
## 390 72.62492 36.11361 1
## 391 72.63135 35.95371 1
## 392 72.29680 35.96861 1
## 393 71.83670 35.97789 1
## 394 72.05561 35.75802 1
## 395 71.91070 35.60360 1
## 396 71.73938 35.65428 1
## 397 70.74320 33.87831 1
## 398 71.43325 34.53719 1
## 399 76.32010 34.95386 1
## 400 75.82137 34.89866 1
## 401 74.00003 35.16516 1
## 402 73.80492 35.13423 1
## 403 73.73651 34.75946 1
## 404 73.61873 34.90044 1
## 405 73.39967 34.77456 1
Now we have the final presence dataset for the species Pyrgus cashmirensis. Later we will combine it with a generated set of pseudo-absence data and the predictor varriables of the environmental layers.
In order to convert our environmental data into the desired dataframe for model procession, have a look at the following procedure: # Preparation of environmental data: chelsa-dataframe
## chelsa data preparation ####
#merge
merged_chelsa_annual_mean_temp_path <- "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/"
merged_chelsa_annual_mean_temp_path
## [1] "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/"
merged_chelsa_annual_mean_temp_files <- list.files("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/")
merged_chelsa_annual_mean_temp_files
## [1] "CHELSA_temp10_01_1979-2013_V1.2_land.tif"
## [2] "CHELSA_temp10_02_1979-2013_V1.2_land.tif"
## [3] "CHELSA_temp10_03_1979-2013_V1.2_land.tif"
## [4] "CHELSA_temp10_04_1979-2013_V1.2_land.tif"
## [5] "CHELSA_temp10_05_1979-2013_V1.2_land.tif"
## [6] "CHELSA_temp10_06_1979-2013_V1.2_land.tif"
## [7] "CHELSA_temp10_07_1979-2013_V1.2_land.tif"
## [8] "CHELSA_temp10_08_1979-2013_V1.2_land (1).tif"
## [9] "CHELSA_temp10_09_1979-2013_V1.2_land.tif"
## [10] "CHELSA_temp10_10_1979-2013_V1.2_land.tif"
## [11] "CHELSA_temp10_11_1979-2013_V1.2_land.tif"
## [12] "CHELSA_temp10_12_1979-2013_V1.2_land.tif"
## [13] "merged"
raster_list <- list(merged_chelsa_annual_mean_temp_files)
raster_list
## [[1]]
## [1] "CHELSA_temp10_01_1979-2013_V1.2_land.tif"
## [2] "CHELSA_temp10_02_1979-2013_V1.2_land.tif"
## [3] "CHELSA_temp10_03_1979-2013_V1.2_land.tif"
## [4] "CHELSA_temp10_04_1979-2013_V1.2_land.tif"
## [5] "CHELSA_temp10_05_1979-2013_V1.2_land.tif"
## [6] "CHELSA_temp10_06_1979-2013_V1.2_land.tif"
## [7] "CHELSA_temp10_07_1979-2013_V1.2_land.tif"
## [8] "CHELSA_temp10_08_1979-2013_V1.2_land (1).tif"
## [9] "CHELSA_temp10_09_1979-2013_V1.2_land.tif"
## [10] "CHELSA_temp10_10_1979-2013_V1.2_land.tif"
## [11] "CHELSA_temp10_11_1979-2013_V1.2_land.tif"
## [12] "CHELSA_temp10_12_1979-2013_V1.2_land.tif"
## [13] "merged"
for(i in 1:12)
{
tilesloop <- paste0(merged_chelsa_annual_mean_temp_path, merged_chelsa_annual_mean_temp_files[i])
tilesloop
raster_list[[i]] <- raster(tilesloop)
rasterliste <- raster_list[[i]]
rasterliste
}
raster_list$filename <- "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/merged/merged_chelsa_annual_mean_temp.tif"
exporttif <- raster_list$filename
exporttif
## [1] "C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/merged/merged_chelsa_annual_mean_temp.tif"
raster_list$overwrite <- TRUE
all_merged_chelsa_annual_mean_temp_raster <- do.call(raster::merge, raster_list, quote=TRUE)
all_merged_chelsa_annual_mean_temp_raster
## class : RasterLayer
## dimensions : 20880, 43200, 902016000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : -180.0001, 179.9999, -90.00014, 83.99986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_mean_temperature/merged/merged_chelsa_annual_mean_temp.tif
## names : merged_chelsa_annual_mean_temp
## values : -457, 345 (min, max)
plot(all_merged_chelsa_annual_mean_temp_raster, main="all_merged chelsa annual mean temperature")
chelsa_annual_global_prec <-
raster("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/annual_global_prec/CHELSA_bio10_12.tif")
chelsa_prec_warmest_quarter <-
raster("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/prec_warmest_quarter/CHELSA_bio10_18.tif")
chelsa_warmest_quarter_mean_temp <-
raster("C:/Users/ich/Desktop/UNI/räumliche_Vorhersage/Pakistan_alles/chelsa/warmest_quarter_mean_t_global/CHELSA_bio10_10.tif")
plot(all_merged_chelsa_annual_mean_temp_raster, main="all_merged chelsa annual mean temperature")
plot(chelsa_annual_global_prec, main="chelsa annual global prec")
plot(chelsa_prec_warmest_quarter, main="chelsa prec warmest quarter")
plot(chelsa_warmest_quarter_mean_temp, main="chelsa warmest quarter mean temperature")
# crop onto pakistan extent
extend_pakistan <- extent(c(58, 83, 23, 38))
cropped_chelsa_annual_mean_temp_raster <- crop(all_merged_chelsa_annual_mean_temp_raster, extend_pakistan)
cropped_chelsa_annual_mean_temp_raster
## class : RasterLayer
## dimensions : 1800, 3000, 5400000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 57.99986, 82.99986, 22.99986, 37.99986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171245_16916_80103.grd
## names : merged_chelsa_annual_mean_temp
## values : -341, 224 (min, max)
plot(cropped_chelsa_annual_mean_temp_raster)
#plot(wrld_simpl, add = TRUE)
cropped_chelsa_annual_global_prec <- crop(chelsa_annual_global_prec,extend_pakistan)
cropped_chelsa_annual_global_prec
## class : RasterLayer
## dimensions : 1800, 3000, 5400000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 57.99986, 82.99986, 22.99986, 37.99986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171249_16916_09070.grd
## names : CHELSA_bio10_12
## values : 14, 3406 (min, max)
plot(cropped_chelsa_annual_global_prec)
#plot(wrld_simpl, add = TRUE)
cropped_chelsa_prec_warmest_quarter <- crop(chelsa_prec_warmest_quarter, extend_pakistan)
cropped_chelsa_prec_warmest_quarter
## class : RasterLayer
## dimensions : 1800, 3000, 5400000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 57.99986, 82.99986, 22.99986, 37.99986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171253_16916_34908.grd
## names : CHELSA_bio10_18
## values : 0, 3221 (min, max)
plot(cropped_chelsa_prec_warmest_quarter)
#plot(wrld_simpl, add = TRUE)
cropped_chelsa_warmest_quarter_mean_temp <- crop(chelsa_warmest_quarter_mean_temp, extend_pakistan,)
cropped_chelsa_warmest_quarter_mean_temp
## class : RasterLayer
## dimensions : 1800, 3000, 5400000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 57.99986, 82.99986, 22.99986, 37.99986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : C:/Users/ich/AppData/Local/Temp/RtmpaYkQIk/raster/r_tmp_2020-01-25_171258_16916_93479.grd
## names : CHELSA_bio10_10
## values : -110, 402 (min, max)
plot(cropped_chelsa_warmest_quarter_mean_temp)
#plot(wrld_simpl, add = TRUE)
# rasterstack
rasterstack_pakistan_chelsa <- raster::stack(cropped_chelsa_annual_global_prec,
cropped_chelsa_annual_mean_temp_raster,
cropped_chelsa_prec_warmest_quarter,
cropped_chelsa_warmest_quarter_mean_temp)
rasterstack_pakistan_chelsa
## class : RasterStack
## dimensions : 1800, 3000, 5400000, 4 (nrow, ncol, ncell, nlayers)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 57.99986, 82.99986, 22.99986, 37.99986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## names : CHELSA_bio10_12, merged_chelsa_annual_mean_temp, CHELSA_bio10_18, CHELSA_bio10_10
## min values : 14, -341, 0, -110
## max values : 3406, 224, 3221, 402
plot(rasterstack_pakistan_chelsa)
# dataframe chelsa
pakistan_chelsa_dataframe <- as.data.frame(rasterstack_pakistan_chelsa)
head(pakistan_chelsa_dataframe)
## CHELSA_bio10_12 merged_chelsa_annual_mean_temp CHELSA_bio10_18
## 1 241 30 22
## 2 229 32 21
## 3 228 29 18
## 4 245 28 18
## 5 281 22 19
## 6 322 9 22
## CHELSA_bio10_10
## 1 286
## 2 288
## 3 285
## 4 283
## 5 277
## 6 263
str(pakistan_chelsa_dataframe)
## 'data.frame': 5400000 obs. of 4 variables:
## $ CHELSA_bio10_12 : int 241 229 228 245 281 322 351 366 372 373 ...
## $ merged_chelsa_annual_mean_temp: int 30 32 29 28 22 9 -5 -9 -14 -16 ...
## $ CHELSA_bio10_18 : int 22 21 18 18 19 22 22 23 23 26 ...
## $ CHELSA_bio10_10 : int 286 288 285 283 277 263 247 243 237 235 ...
The procedure above shows how to merge several environmental layers of one kind in order to use it for the model. After that it is shown how to read in all files needed for prediction, as well as cropping it onto the right extent (extent of pakistan) and adding country borders.
Finally, the created raster stack of all four predictor sets is transformed into a data frame so we can combine it with our presence and absence data later. *** # Generate pseudo-absence data with dismo package
pseudo_absence_pakistan <- randomPoints(cropped_chelsa_annual_global_prec, 42)
pseudo_absence_pakistan
## x y
## [1,] 71.83736 36.95403
## [2,] 70.54569 36.05403
## [3,] 60.44569 25.75403
## [4,] 70.82069 37.57903
## [5,] 73.31236 23.79569
## [6,] 72.22903 24.49569
## [7,] 73.61236 36.05403
## [8,] 70.66236 34.18736
## [9,] 72.03736 24.78736
## [10,] 79.12069 25.17903
## [11,] 82.77069 35.33736
## [12,] 70.83736 32.40403
## [13,] 63.17069 30.96236
## [14,] 78.73736 26.47069
## [15,] 66.97903 27.32903
## [16,] 60.31236 27.81236
## [17,] 80.23736 37.28736
## [18,] 72.65403 32.11236
## [19,] 72.84569 28.15403
## [20,] 58.67069 33.50403
## [21,] 59.74569 29.42903
## [22,] 66.24569 27.56236
## [23,] 75.89569 27.57903
## [24,] 81.13736 34.35403
## [25,] 71.91236 24.87069
## [26,] 60.17069 33.92903
## [27,] 66.65403 37.92069
## [28,] 71.28736 26.52069
## [29,] 75.80403 29.03736
## [30,] 73.11236 24.18736
## [31,] 80.32903 27.62903
## [32,] 78.55403 29.48736
## [33,] 61.86236 32.57903
## [34,] 61.97903 35.27069
## [35,] 71.06236 23.42903
## [36,] 77.27069 30.87069
## [37,] 70.85403 30.77069
## [38,] 82.89569 31.67903
## [39,] 76.27069 27.47069
## [40,] 70.61236 34.64569
## [41,] 70.82903 24.29569
## [42,] 72.78736 23.55403
head(pseudo_absence_pakistan)
## x y
## [1,] 71.83736 36.95403
## [2,] 70.54569 36.05403
## [3,] 60.44569 25.75403
## [4,] 70.82069 37.57903
## [5,] 73.31236 23.79569
## [6,] 72.22903 24.49569
plot(pseudo_absence_pakistan)
# add column "0" for absence:
p_a <- 0
final_absence <- cbind(pseudo_absence_pakistan, p_a)
final_absence
## x y p_a
## [1,] 71.83736 36.95403 0
## [2,] 70.54569 36.05403 0
## [3,] 60.44569 25.75403 0
## [4,] 70.82069 37.57903 0
## [5,] 73.31236 23.79569 0
## [6,] 72.22903 24.49569 0
## [7,] 73.61236 36.05403 0
## [8,] 70.66236 34.18736 0
## [9,] 72.03736 24.78736 0
## [10,] 79.12069 25.17903 0
## [11,] 82.77069 35.33736 0
## [12,] 70.83736 32.40403 0
## [13,] 63.17069 30.96236 0
## [14,] 78.73736 26.47069 0
## [15,] 66.97903 27.32903 0
## [16,] 60.31236 27.81236 0
## [17,] 80.23736 37.28736 0
## [18,] 72.65403 32.11236 0
## [19,] 72.84569 28.15403 0
## [20,] 58.67069 33.50403 0
## [21,] 59.74569 29.42903 0
## [22,] 66.24569 27.56236 0
## [23,] 75.89569 27.57903 0
## [24,] 81.13736 34.35403 0
## [25,] 71.91236 24.87069 0
## [26,] 60.17069 33.92903 0
## [27,] 66.65403 37.92069 0
## [28,] 71.28736 26.52069 0
## [29,] 75.80403 29.03736 0
## [30,] 73.11236 24.18736 0
## [31,] 80.32903 27.62903 0
## [32,] 78.55403 29.48736 0
## [33,] 61.86236 32.57903 0
## [34,] 61.97903 35.27069 0
## [35,] 71.06236 23.42903 0
## [36,] 77.27069 30.87069 0
## [37,] 70.85403 30.77069 0
## [38,] 82.89569 31.67903 0
## [39,] 76.27069 27.47069 0
## [40,] 70.61236 34.64569 0
## [41,] 70.82903 24.29569 0
## [42,] 72.78736 23.55403 0
# rename latlon columns
final_absence_df <- as.data.frame(final_absence)
final_absence_df
## x y p_a
## 1 71.83736 36.95403 0
## 2 70.54569 36.05403 0
## 3 60.44569 25.75403 0
## 4 70.82069 37.57903 0
## 5 73.31236 23.79569 0
## 6 72.22903 24.49569 0
## 7 73.61236 36.05403 0
## 8 70.66236 34.18736 0
## 9 72.03736 24.78736 0
## 10 79.12069 25.17903 0
## 11 82.77069 35.33736 0
## 12 70.83736 32.40403 0
## 13 63.17069 30.96236 0
## 14 78.73736 26.47069 0
## 15 66.97903 27.32903 0
## 16 60.31236 27.81236 0
## 17 80.23736 37.28736 0
## 18 72.65403 32.11236 0
## 19 72.84569 28.15403 0
## 20 58.67069 33.50403 0
## 21 59.74569 29.42903 0
## 22 66.24569 27.56236 0
## 23 75.89569 27.57903 0
## 24 81.13736 34.35403 0
## 25 71.91236 24.87069 0
## 26 60.17069 33.92903 0
## 27 66.65403 37.92069 0
## 28 71.28736 26.52069 0
## 29 75.80403 29.03736 0
## 30 73.11236 24.18736 0
## 31 80.32903 27.62903 0
## 32 78.55403 29.48736 0
## 33 61.86236 32.57903 0
## 34 61.97903 35.27069 0
## 35 71.06236 23.42903 0
## 36 77.27069 30.87069 0
## 37 70.85403 30.77069 0
## 38 82.89569 31.67903 0
## 39 76.27069 27.47069 0
## 40 70.61236 34.64569 0
## 41 70.82903 24.29569 0
## 42 72.78736 23.55403 0
colnames(final_absence_df)
## [1] "x" "y" "p_a"
names(final_absence_df)[1] <- "shp.x"
names(final_absence_df)[2] <- "shp.y"
final_absence_df
## shp.x shp.y p_a
## 1 71.83736 36.95403 0
## 2 70.54569 36.05403 0
## 3 60.44569 25.75403 0
## 4 70.82069 37.57903 0
## 5 73.31236 23.79569 0
## 6 72.22903 24.49569 0
## 7 73.61236 36.05403 0
## 8 70.66236 34.18736 0
## 9 72.03736 24.78736 0
## 10 79.12069 25.17903 0
## 11 82.77069 35.33736 0
## 12 70.83736 32.40403 0
## 13 63.17069 30.96236 0
## 14 78.73736 26.47069 0
## 15 66.97903 27.32903 0
## 16 60.31236 27.81236 0
## 17 80.23736 37.28736 0
## 18 72.65403 32.11236 0
## 19 72.84569 28.15403 0
## 20 58.67069 33.50403 0
## 21 59.74569 29.42903 0
## 22 66.24569 27.56236 0
## 23 75.89569 27.57903 0
## 24 81.13736 34.35403 0
## 25 71.91236 24.87069 0
## 26 60.17069 33.92903 0
## 27 66.65403 37.92069 0
## 28 71.28736 26.52069 0
## 29 75.80403 29.03736 0
## 30 73.11236 24.18736 0
## 31 80.32903 27.62903 0
## 32 78.55403 29.48736 0
## 33 61.86236 32.57903 0
## 34 61.97903 35.27069 0
## 35 71.06236 23.42903 0
## 36 77.27069 30.87069 0
## 37 70.85403 30.77069 0
## 38 82.89569 31.67903 0
## 39 76.27069 27.47069 0
## 40 70.61236 34.64569 0
## 41 70.82903 24.29569 0
## 42 72.78736 23.55403 0
Now we have generated a set of 42 random points as absence data, because according to Elith et al. 2008 it is advisable to use the exact same number of absences as we have it for presences. The data was also given the name “0” as needed for absence data for BRT-modelling. We made sure the data is in the correct form by renaming the colums with the same names as used in the presence dataframe.
Finally, we can combine presence and absence data: ***
final_p_a_lat_lon <- rbind(final_presence, final_absence_df)
final_p_a_lat_lon
## shp.x shp.y p_a
## 363 76.34609 35.53195 1
## 364 75.50427 35.33850 1
## 365 75.60242 35.95936 1
## 366 75.31322 35.00677 1
## 367 75.22376 35.24662 1
## 368 74.88911 35.27707 1
## 369 74.64545 35.03342 1
## 370 74.78403 36.11664 1
## 371 74.89111 36.29153 1
## 372 74.59487 36.24870 1
## 373 75.16594 36.87331 1
## 374 74.62342 36.82334 1
## 375 74.27721 36.12735 1
## 376 74.11302 36.27012 1
## 377 73.81678 36.45929 1
## 378 73.79090 36.14564 1
## 379 73.96044 35.92971 1
## 380 73.24153 36.80245 1
## 381 73.14444 36.33322 1
## 382 72.70274 36.60831 1
## 383 72.48888 36.51182 1
## 384 72.34802 36.63651 1
## 385 72.31571 36.32925 1
## 386 72.06908 36.23717 1
## 387 71.88414 36.24764 1
## 388 73.04208 36.08512 1
## 389 72.80182 36.10568 1
## 390 72.62492 36.11361 1
## 391 72.63135 35.95371 1
## 392 72.29680 35.96861 1
## 393 71.83670 35.97789 1
## 394 72.05561 35.75802 1
## 395 71.91070 35.60360 1
## 396 71.73938 35.65428 1
## 397 70.74320 33.87831 1
## 398 71.43325 34.53719 1
## 399 76.32010 34.95386 1
## 400 75.82137 34.89866 1
## 401 74.00003 35.16516 1
## 402 73.80492 35.13423 1
## 403 73.73651 34.75946 1
## 404 73.61873 34.90044 1
## 405 73.39967 34.77456 1
## 1 71.83736 36.95403 0
## 2 70.54569 36.05403 0
## 3 60.44569 25.75403 0
## 4 70.82069 37.57903 0
## 5 73.31236 23.79569 0
## 6 72.22903 24.49569 0
## 7 73.61236 36.05403 0
## 8 70.66236 34.18736 0
## 9 72.03736 24.78736 0
## 10 79.12069 25.17903 0
## 11 82.77069 35.33736 0
## 12 70.83736 32.40403 0
## 13 63.17069 30.96236 0
## 14 78.73736 26.47069 0
## 15 66.97903 27.32903 0
## 16 60.31236 27.81236 0
## 17 80.23736 37.28736 0
## 18 72.65403 32.11236 0
## 19 72.84569 28.15403 0
## 20 58.67069 33.50403 0
## 21 59.74569 29.42903 0
## 22 66.24569 27.56236 0
## 23 75.89569 27.57903 0
## 24 81.13736 34.35403 0
## 25 71.91236 24.87069 0
## 26 60.17069 33.92903 0
## 27 66.65403 37.92069 0
## 28 71.28736 26.52069 0
## 29 75.80403 29.03736 0
## 30 73.11236 24.18736 0
## 31 80.32903 27.62903 0
## 32 78.55403 29.48736 0
## 33 61.86236 32.57903 0
## 34 61.97903 35.27069 0
## 35 71.06236 23.42903 0
## 36 77.27069 30.87069 0
## 37 70.85403 30.77069 0
## 38 82.89569 31.67903 0
## 39 76.27069 27.47069 0
## 40 70.61236 34.64569 0
## 41 70.82903 24.29569 0
## 42 72.78736 23.55403 0
Since the extract function of the raster package requires a data frame with lat/lon values only, we cut them out of our presence/absence dataset.
final_lat_lon <- final_p_a_lat_lon[,1:2]
final_lat_lon
## shp.x shp.y
## 363 76.34609 35.53195
## 364 75.50427 35.33850
## 365 75.60242 35.95936
## 366 75.31322 35.00677
## 367 75.22376 35.24662
## 368 74.88911 35.27707
## 369 74.64545 35.03342
## 370 74.78403 36.11664
## 371 74.89111 36.29153
## 372 74.59487 36.24870
## 373 75.16594 36.87331
## 374 74.62342 36.82334
## 375 74.27721 36.12735
## 376 74.11302 36.27012
## 377 73.81678 36.45929
## 378 73.79090 36.14564
## 379 73.96044 35.92971
## 380 73.24153 36.80245
## 381 73.14444 36.33322
## 382 72.70274 36.60831
## 383 72.48888 36.51182
## 384 72.34802 36.63651
## 385 72.31571 36.32925
## 386 72.06908 36.23717
## 387 71.88414 36.24764
## 388 73.04208 36.08512
## 389 72.80182 36.10568
## 390 72.62492 36.11361
## 391 72.63135 35.95371
## 392 72.29680 35.96861
## 393 71.83670 35.97789
## 394 72.05561 35.75802
## 395 71.91070 35.60360
## 396 71.73938 35.65428
## 397 70.74320 33.87831
## 398 71.43325 34.53719
## 399 76.32010 34.95386
## 400 75.82137 34.89866
## 401 74.00003 35.16516
## 402 73.80492 35.13423
## 403 73.73651 34.75946
## 404 73.61873 34.90044
## 405 73.39967 34.77456
## 1 71.83736 36.95403
## 2 70.54569 36.05403
## 3 60.44569 25.75403
## 4 70.82069 37.57903
## 5 73.31236 23.79569
## 6 72.22903 24.49569
## 7 73.61236 36.05403
## 8 70.66236 34.18736
## 9 72.03736 24.78736
## 10 79.12069 25.17903
## 11 82.77069 35.33736
## 12 70.83736 32.40403
## 13 63.17069 30.96236
## 14 78.73736 26.47069
## 15 66.97903 27.32903
## 16 60.31236 27.81236
## 17 80.23736 37.28736
## 18 72.65403 32.11236
## 19 72.84569 28.15403
## 20 58.67069 33.50403
## 21 59.74569 29.42903
## 22 66.24569 27.56236
## 23 75.89569 27.57903
## 24 81.13736 34.35403
## 25 71.91236 24.87069
## 26 60.17069 33.92903
## 27 66.65403 37.92069
## 28 71.28736 26.52069
## 29 75.80403 29.03736
## 30 73.11236 24.18736
## 31 80.32903 27.62903
## 32 78.55403 29.48736
## 33 61.86236 32.57903
## 34 61.97903 35.27069
## 35 71.06236 23.42903
## 36 77.27069 30.87069
## 37 70.85403 30.77069
## 38 82.89569 31.67903
## 39 76.27069 27.47069
## 40 70.61236 34.64569
## 41 70.82903 24.29569
## 42 72.78736 23.55403
rasValue_pakistan_all <- extract(rasterstack_pakistan_chelsa, final_lat_lon)
rasValue_pakistan_all
## CHELSA_bio10_12 merged_chelsa_annual_mean_temp CHELSA_bio10_18
## [1,] 138 -170 21
## [2,] 269 -105 31
## [3,] 157 -195 27
## [4,] 858 -157 88
## [5,] 471 -171 83
## [6,] 397 -101 78
## [7,] 964 -166 158
## [8,] 200 -204 58
## [9,] 149 -181 39
## [10,] 81 -119 28
## [11,] 54 -165 15
## [12,] 63 -146 17
## [13,] 98 -86 28
## [14,] 74 -130 25
## [15,] 55 -109 16
## [16,] 89 -92 30
## [17,] 146 -140 51
## [18,] 41 -134 11
## [19,] 124 -176 35
## [20,] 178 -188 25
## [21,] 138 -122 17
## [22,] 141 -141 9
## [23,] 231 -108 18
## [24,] 238 -97 12
## [25,] 355 -193 13
## [26,] 230 -165 70
## [27,] 217 -138 54
## [28,] 288 -148 37
## [29,] 411 -173 93
## [30,] 512 -170 62
## [31,] 361 -74 30
## [32,] 660 -111 92
## [33,] 601 -37 121
## [34,] 438 -12 38
## [35,] 332 19 71
## [36,] 476 85 101
## [37,] 182 -151 32
## [38,] 517 -178 58
## [39,] 571 -140 148
## [40,] 698 -130 210
## [41,] 1220 -131 518
## [42,] 988 -63 345
## [43,] 1450 -109 753
## [44,] 187 -193 8
## [45,] 627 -156 12
## [46,] 142 192 5
## [47,] 514 -93 6
## [48,] 870 190 7
## [49,] 580 177 98
## [50,] 133 -149 52
## [51,] 333 74 47
## [52,] 473 177 72
## [53,] 957 161 134
## [54,] 171 -189 118
## [55,] 393 111 63
## [56,] 76 90 0
## [57,] 787 154 111
## [58,] 190 139 35
## [59,] 128 126 17
## [60,] 23 -41 9
## [61,] 424 123 70
## [62,] 210 162 50
## [63,] 201 38 1
## [64,] 44 137 1
## [65,] 265 108 124
## [66,] 484 134 87
## [67,] 194 -241 152
## [68,] 414 182 58
## [69,] 216 49 0
## [70,] 309 -37 11
## [71,] 190 168 43
## [72,] 419 134 73
## [73,] 809 182 139
## [74,] 1037 146 205
## [75,] 1358 135 227
## [76,] 96 79 0
## [77,] 299 38 0
## [78,] 490 202 91
## [79,] 1595 44 958
## [80,] 270 139 49
## [81,] 148 -145 150
## [82,] 763 122 113
## [83,] 259 53 32
## [84,] 363 199 55
## [85,] 682 197 138
## CHELSA_bio10_10
## [1,] 63
## [2,] 128
## [3,] 36
## [4,] 73
## [5,] 61
## [6,] 128
## [7,] 61
## [8,] 29
## [9,] 52
## [10,] 115
## [11,] 66
## [12,] 89
## [13,] 148
## [14,] 104
## [15,] 128
## [16,] 141
## [17,] 90
## [18,] 108
## [19,] 58
## [20,] 50
## [21,] 116
## [22,] 98
## [23,] 126
## [24,] 135
## [25,] 39
## [26,] 66
## [27,] 92
## [28,] 83
## [29,] 53
## [30,] 57
## [31,] 152
## [32,] 110
## [33,] 177
## [34,] 205
## [35,] 230
## [36,] 287
## [37,] 83
## [38,] 54
## [39,] 80
## [40,] 87
## [41,] 81
## [42,] 146
## [43,] 99
## [44,] 47
## [45,] 75
## [46,] 365
## [47,] 146
## [48,] 321
## [49,] 324
## [50,] 81
## [51,] 292
## [52,] 330
## [53,] 339
## [54,] 17
## [55,] 325
## [56,] 353
## [57,] 343
## [58,] 340
## [59,] 347
## [60,] 254
## [61,] 335
## [62,] 364
## [63,] 285
## [64,] 370
## [65,] 312
## [66,] 325
## [67,] 0
## [68,] 337
## [69,] 299
## [70,] 208
## [71,] 344
## [72,] 333
## [73,] 319
## [74,] 325
## [75,] 313
## [76,] 346
## [77,] 285
## [78,] 331
## [79,] 215
## [80,] 354
## [81,] 81
## [82,] 311
## [83,] 277
## [84,] 340
## [85,] 326
Now we can create a dataframe out of our generated datasets for 1. occurences, 2. absences and 3. environmental predictors:
pakistan_butterflies_train_final_Pyrgus_cashmirensis <- data.frame(final_p_a_lat_lon, rasValue_pakistan_all)
pakistan_butterflies_train_final_Pyrgus_cashmirensis
## shp.x shp.y p_a CHELSA_bio10_12 merged_chelsa_annual_mean_temp
## 363 76.34609 35.53195 1 138 -170
## 364 75.50427 35.33850 1 269 -105
## 365 75.60242 35.95936 1 157 -195
## 366 75.31322 35.00677 1 858 -157
## 367 75.22376 35.24662 1 471 -171
## 368 74.88911 35.27707 1 397 -101
## 369 74.64545 35.03342 1 964 -166
## 370 74.78403 36.11664 1 200 -204
## 371 74.89111 36.29153 1 149 -181
## 372 74.59487 36.24870 1 81 -119
## 373 75.16594 36.87331 1 54 -165
## 374 74.62342 36.82334 1 63 -146
## 375 74.27721 36.12735 1 98 -86
## 376 74.11302 36.27012 1 74 -130
## 377 73.81678 36.45929 1 55 -109
## 378 73.79090 36.14564 1 89 -92
## 379 73.96044 35.92971 1 146 -140
## 380 73.24153 36.80245 1 41 -134
## 381 73.14444 36.33322 1 124 -176
## 382 72.70274 36.60831 1 178 -188
## 383 72.48888 36.51182 1 138 -122
## 384 72.34802 36.63651 1 141 -141
## 385 72.31571 36.32925 1 231 -108
## 386 72.06908 36.23717 1 238 -97
## 387 71.88414 36.24764 1 355 -193
## 388 73.04208 36.08512 1 230 -165
## 389 72.80182 36.10568 1 217 -138
## 390 72.62492 36.11361 1 288 -148
## 391 72.63135 35.95371 1 411 -173
## 392 72.29680 35.96861 1 512 -170
## 393 71.83670 35.97789 1 361 -74
## 394 72.05561 35.75802 1 660 -111
## 395 71.91070 35.60360 1 601 -37
## 396 71.73938 35.65428 1 438 -12
## 397 70.74320 33.87831 1 332 19
## 398 71.43325 34.53719 1 476 85
## 399 76.32010 34.95386 1 182 -151
## 400 75.82137 34.89866 1 517 -178
## 401 74.00003 35.16516 1 571 -140
## 402 73.80492 35.13423 1 698 -130
## 403 73.73651 34.75946 1 1220 -131
## 404 73.61873 34.90044 1 988 -63
## 405 73.39967 34.77456 1 1450 -109
## 1 71.83736 36.95403 0 187 -193
## 2 70.54569 36.05403 0 627 -156
## 3 60.44569 25.75403 0 142 192
## 4 70.82069 37.57903 0 514 -93
## 5 73.31236 23.79569 0 870 190
## 6 72.22903 24.49569 0 580 177
## 7 73.61236 36.05403 0 133 -149
## 8 70.66236 34.18736 0 333 74
## 9 72.03736 24.78736 0 473 177
## 10 79.12069 25.17903 0 957 161
## 11 82.77069 35.33736 0 171 -189
## 12 70.83736 32.40403 0 393 111
## 13 63.17069 30.96236 0 76 90
## 14 78.73736 26.47069 0 787 154
## 15 66.97903 27.32903 0 190 139
## 16 60.31236 27.81236 0 128 126
## 17 80.23736 37.28736 0 23 -41
## 18 72.65403 32.11236 0 424 123
## 19 72.84569 28.15403 0 210 162
## 20 58.67069 33.50403 0 201 38
## 21 59.74569 29.42903 0 44 137
## 22 66.24569 27.56236 0 265 108
## 23 75.89569 27.57903 0 484 134
## 24 81.13736 34.35403 0 194 -241
## 25 71.91236 24.87069 0 414 182
## 26 60.17069 33.92903 0 216 49
## 27 66.65403 37.92069 0 309 -37
## 28 71.28736 26.52069 0 190 168
## 29 75.80403 29.03736 0 419 134
## 30 73.11236 24.18736 0 809 182
## 31 80.32903 27.62903 0 1037 146
## 32 78.55403 29.48736 0 1358 135
## 33 61.86236 32.57903 0 96 79
## 34 61.97903 35.27069 0 299 38
## 35 71.06236 23.42903 0 490 202
## 36 77.27069 30.87069 0 1595 44
## 37 70.85403 30.77069 0 270 139
## 38 82.89569 31.67903 0 148 -145
## 39 76.27069 27.47069 0 763 122
## 40 70.61236 34.64569 0 259 53
## 41 70.82903 24.29569 0 363 199
## 42 72.78736 23.55403 0 682 197
## CHELSA_bio10_18 CHELSA_bio10_10
## 363 21 63
## 364 31 128
## 365 27 36
## 366 88 73
## 367 83 61
## 368 78 128
## 369 158 61
## 370 58 29
## 371 39 52
## 372 28 115
## 373 15 66
## 374 17 89
## 375 28 148
## 376 25 104
## 377 16 128
## 378 30 141
## 379 51 90
## 380 11 108
## 381 35 58
## 382 25 50
## 383 17 116
## 384 9 98
## 385 18 126
## 386 12 135
## 387 13 39
## 388 70 66
## 389 54 92
## 390 37 83
## 391 93 53
## 392 62 57
## 393 30 152
## 394 92 110
## 395 121 177
## 396 38 205
## 397 71 230
## 398 101 287
## 399 32 83
## 400 58 54
## 401 148 80
## 402 210 87
## 403 518 81
## 404 345 146
## 405 753 99
## 1 8 47
## 2 12 75
## 3 5 365
## 4 6 146
## 5 7 321
## 6 98 324
## 7 52 81
## 8 47 292
## 9 72 330
## 10 134 339
## 11 118 17
## 12 63 325
## 13 0 353
## 14 111 343
## 15 35 340
## 16 17 347
## 17 9 254
## 18 70 335
## 19 50 364
## 20 1 285
## 21 1 370
## 22 124 312
## 23 87 325
## 24 152 0
## 25 58 337
## 26 0 299
## 27 11 208
## 28 43 344
## 29 73 333
## 30 139 319
## 31 205 325
## 32 227 313
## 33 0 346
## 34 0 285
## 35 91 331
## 36 958 215
## 37 49 354
## 38 150 81
## 39 113 311
## 40 32 277
## 41 55 340
## 42 138 326
Pyrgus_cashmirensis.tc5.lr01 <- gbm.step(data=pakistan_butterflies_train_final_Pyrgus_cashmirensis, gbm.x = 4:7, gbm.y = 3, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)
##
##
## GBM STEP - version 2.9
##
## Performing cross-validation optimisation of a boosted regression tree model
## for p_a and using a family of bernoulli
## Using 85 observations and 4 predictors
## creating 10 initial models of 50 trees
##
## folds are stratified by prevalence
## total mean deviance = 1.3862
## tolerance is fixed at 0.0014
## ntrees resid. dev.
## 50 1.0092
## now adding trees...
## 100 0.8515
## 150 0.7829
## 200 0.7504
## 250 0.7401
## 300 0.7321
## 350 0.7298
## 400 0.7234
## 450 0.7249
## 500 0.7285
## 550 0.7387
## 600 0.7317
## 650 0.7282
## 700 0.7401
## 750 0.7467
## 800 0.7498
## 850 0.7637
## 900 0.7672
## 950 0.7693
## 1000 0.7686
## 1050 0.7674
## fitting final gbm model with a fixed number of 400 trees for p_a
##
## mean total deviance = 1.386
## mean residual deviance = 0.354
##
## estimated cv deviance = 0.723 ; se = 0.138
##
## training data correlation = 0.903
## cv correlation = 0.786 ; se = 0.056
##
## training data AUC score = 0.994
## cv AUC score = 0.905 ; se = 0.03
##
## elapsed time - 0.01 minutes
A model is built with the pakistan training data, using chelsa climate data as predictors and the binary absence/presence data as the response variable (as required by family = “bernoulli” in the BRT-model).
We are using a tree complexity of 5, a learning rate of 0,01 (since we have only 84 presence/absence points we need a low learning rate) and a bag fraction of 0,5.
Elith, J., Leathwick, J. R. and Hastie, T. (2008): A working guide to boosted regression trees, Journal of Animal Ecology 77, 802–813.
Hijmans, R. J. and Elith, J. (2019): Spatial Distribution Models, https://rspatial.org/sdm/SDM.pdf access: 12.01.2020
https://www.youtube.com/watch?v=3CC4N4z3GJc access: 25.01.2020
Oliver, J. (2018): A very brief introduction to species distribution models in R https://jcoliver.github.io/learn-r/011-species-distribution-models.html access: 23.01.2020