Species Distribution Modelling with GAMs

Jens Meyer

15 6 2021


1 Generalized Additive Modells

library(mgcv)

model <- gam(presence ~ s(x1) + s(x2) + s(x3) +     # one dimensional smooths
                s(x4, x5),                          # two dimensional smooths
                data = __,                          # your data
                family = ___,                       # your distribution (e.g. gaussian)
                method = "REML"                     # sets parameter for your smooths
                ...)                        

1.1 Smooths / Splines

Generalized Additive Models use spline functions (smooths).

1.1.1 Basis functions

The spline functions are composed of simpler basis functions witch are weighted and summed up.
\[s(x) = \sum_{k = 1}^K \beta_k b_k(x)\]


The more basis functions a smooth is made of, the more complex of a relationship the smooth can model (Simpson 2020). With the parameter k in s(x1, k = __) you can pick the number of basis functions.
source (Introduction to Generalized Additive Models with R and mgcv by Gavin Simpson)

To avoid over-fitting the Data and modelling noise, the “Wiggliness” of the Spline is penalized (Ross 2017). The “Wiggliness” is measured via the curvature \(\int_{\mathbb{R}} [f^{\prime\prime}]^2 dx = \boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta} = \large{W}\) of the function which is subtracted from the Fit with a factor \(\lambda\) (Simpson 2020). \[\mathcal{L}_p = \log(\text{Likelihood}) - \lambda W\]

The larger \(\lambda\) the straighter the spline function. With the argument method = “REML” (Restricted Maximum Likelihood) \(\lambda\) is picked automatically. One parameter you have check and pick manually sometimes is the amount of basis functions k the smooth is made of. k has to be large enough that the “true function” can be represented properly, but not too big due to computational costs (Simpson 2020).

1.1.2 Different types of smooths

In mgcv different smooth classes for different modelling tasks are implemented. The type of smooth can be picked with the bs parameter in s(x1, bs = __).

The default is a low-rank thin plate spline bs = 'tp'

Many others available

  • Cyclic splines bs = 'cc' or bs = 'cp'
  • Duchon splines bs = 'ds'
  • Spline on the sphere bs = 'sos'
  • MRFs bs = 'mrf'
  • Soap-film smooth bs = 'so'

For more information on different Types of Smooths have a look at…

?mgcv::smooth.terms

1.1.3 Two - Dimensional smooths

With s(x1, x2)you can include relationships between the two variables x1 and x2. If you want to use the s() function for a two-dimensional smooth x1 and x2 should be equally “wiggly” in both dimensions. te() and ti() are other functions in mgcv for creating multidimensional smooths (Ross 2017).

1.2 Different Distributions

With the argument family in gam(presence ~ (...), family = ___) you can specify the type and distribution of your data the model should fit like gaussian, binomial etc. For the species richness map of butterflies in Pakistan I will use a binomial distribution family = binomial witch is useful for Occurrence Data (Ross 2017).

For more information on different distributions have a look at…

?mgcv::family.mgcv

1.3 Summary

  1. GAMs are able to model non linear relationships between one - or multidimensional variables
  2. GAMs sum up little basis-functions to smooths
  3. GAMs use a penalty for curvature to trade of between generality and fit
  4. Make sure the smooths can be wiggly enough with s(..., k = __)
  5. Choose the right smooth-type for your data with s(..., bs = __)
  6. Specify the right distribution for your data with gam(..., family = __)

For further Information on GAMs watch the lectures by Gavin Simpson (https://www.youtube.com/watch?v=sgw4cu8hrZM) and Noam Ross (https://www.youtube.com/watch?v=q4_t8jXcQgc)

2 Modeling Species Richness

2.1 Data

2.1.1 Occurence data

The Occurrence data set used in this project contains 5870 sightings of butterflies with species and geographical coordinates attached.

##          species        x        y
## 1 Acraea_issoria 73.08635 34.43060
## 2 Acraea_issoria 73.36326 34.24367
## 3 Acraea_issoria 73.36758 34.14221
## 4 Acraea_issoria 73.82637 33.84266
## 5 Acraea_issoria 74.19839 33.94852
## 6  Acraea_violae 72.34951 31.70947

2.1.2 Environmental data

For the prediction of the spices distribution RasterData from WorldClim was used. The data sets used are raster - data sets with BioClimatic variables and Elevation with a resolution of 5min. The Coordinate System throughout the whole project is WGS 84.

2.1.3 People density

Additional to the environmental data, a raster - data set with population numbers was downloaded from https://worldpop.org.

2.2 Preprocessing

The aim of the preprocessing is to generate a dataframe containing coordinates and the values of each predictor within the Pakistani border. Each row should represent a raster cells with the same extent and spatial resolution as the cells of the raster data sets from WorldClim. Predictors with high collinearity are removed from the dataframe.

2.2.1 Loading required packages

library(sp)
library(raster)
library(plyr)
library(ggplot2)
library(viridis)
library(RColorBrewer)
library(rgdal)
library(mgcv)
library(gratia)

#boosting memory limit
memory.limit(9999999999)

2.2.2 Loading occurence - and environmental data sets

## Pakistans border
pak_border <- raster::getData(name = "GADM", country = "PAK", level = 1, path = "data/raw/")


## Butterfly occurrences
points <- read.csv(file = "data/raw/butterflies.csv")


### georeferencing
coordinates(points) <- c("x", "y")
points@proj4string@projargs <- pak_border@proj4string@projargs


## bioclim data
files <- list.files(path = "data/raw/wc2.1_5m_bio/", full.names = TRUE)
predictors <- stack(files)
predictors_crop <- crop(predictors, pak_border)
pred_pak <- mask(predictors_crop, pak_border)
## elevation data
elev <- raster("data/raw/wc2.1_5m_elev/wc2.1_5m_elev.tif")
elev <- crop(elev, pak_border)
elev <- mask(elev, pak_border)

pred_pak$elev <- elev

2.2.3 Loading and transforming people density data set

With aggregate(ppd, fact = __, fun = sum) the spatial resolution of the data set is changed until it has the resolution of the WorldClim data sets. cellStats(...) is used to check if values are changed in the process. resample(ppd_100, pred_pak, method = "ngb") ensures that extent and resolution of both rasters are the same.

## peopledensity of Pakistan
ppd <- raster("data/raw/pak_ppp_2020_UNadj.tif")
cellStats(ppd, stat = sum)
## [1] 220892332
ppd_100 <- aggregate(ppd, fact = 100, fun = sum)
cellStats(ppd_100, stat = sum)
## [1] 220892331
1/(res(ppd_100)/res(pred_pak))
## [1] 1 1
ppd <- resample(x= ppd_100, y=pred_pak, method="ngb")
cellStats(ppd, sum)
## [1] 220892325
ppd <- mask(ppd, pak_border)
pred_pak$ppd <- ppd

2.2.4 Removing predictors with high collinearity

Now we remove all predictors with high collinearity which means that one predictor can be estimated using the other predictors (Naimi et al. 2014). Therefore we use the vifstep() function from the usdm package with a threshold of 5.

set.seed(123)
vif <- usdm::vifstep(pred_pak, th = 5)
vif
## 13 variables from the 21 input variables have collinearity problem: 
##  
## wc2.1_5m_bio_6 wc2.1_5m_bio_11 wc2.1_5m_bio_10 wc2.1_5m_bio_1 wc2.1_5m_bio_3 wc2.1_5m_bio_12 wc2.1_5m_bio_5 wc2.1_5m_bio_13 wc2.1_5m_bio_7 wc2.1_5m_bio_17 elev wc2.1_5m_bio_16 wc2.1_5m_bio_19 
## 
## After excluding the collinear variables, the linear correlation coefficients ranges between: 
## min correlation ( ppd ~ wc2.1_5m_bio_9 ):  0.03585369 
## max correlation ( wc2.1_5m_bio_9 ~ wc2.1_5m_bio_2 ):  0.6976587 
## 
## ---------- VIFs of the remained variables -------- 
##         Variables      VIF
## 1 wc2.1_5m_bio_14 3.956877
## 2 wc2.1_5m_bio_15 3.142364
## 3 wc2.1_5m_bio_18 3.391822
## 4  wc2.1_5m_bio_2 3.130466
## 5  wc2.1_5m_bio_4 2.343441
## 6  wc2.1_5m_bio_8 3.197521
## 7  wc2.1_5m_bio_9 2.935347
## 8             ppd 1.121282
pred_pak <- usdm::exclude(pred_pak, vif = vif)

For more information on collinearity and the variance inflation factor have a look at

?usdm::vif

2.2.5 Let’s take a look at our preditors

2.2.6 Creating absence points

At first points at the center of each raster cell are created. Those points are used to extract the cellnumbers of the raster which are then used to extract coordinates later on. The cellnumbers are extracted with the argument cellnumbers = TRUE of the extract() function. The cellnumbers of cells with butterfly occurrences are later used for generating absence points. df[,4] contains the cellnumbers for which the coordinates are extracted and later saved in the x and y column.

points_df <- as.data.frame(points)
df <- extract(x = pred_pak, y = points, cellnumbers = TRUE, sp = TRUE)
df <- as.data.frame(df)

cell_coords <- coordinates(pred_pak)[df[,4],]
cell_coords <- as.data.frame(cell_coords)


df$x <- cell_coords$x
df$y <- cell_coords$y

df[is.na(df$x),]
##                    species  x  y cells Precipitation_of_Driest_Month
## 3854   Parnassius_delphius NA NA    NA                            NA
## 3901    Parnassius_epaphus NA NA    NA                            NA
## 3947     Parnassius_loxias NA NA    NA                            NA
## 4617 Polyommatus_chrysopis NA NA    NA                            NA
##      Precipitation_Seasonality Precipitation_of_Warmest_Quarter
## 3854                        NA                               NA
## 3901                        NA                               NA
## 3947                        NA                               NA
## 4617                        NA                               NA
##      Mean_Diurnal_Range Temperature_Seasonality
## 3854                 NA                      NA
## 3901                 NA                      NA
## 3947                 NA                      NA
## 4617                 NA                      NA
##      Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 3854                                  NA                                 NA
## 3901                                  NA                                 NA
## 3947                                  NA                                 NA
## 4617                                  NA                                 NA
##      People_Density
## 3854             NA
## 3901             NA
## 3947             NA
## 4617             NA
df[is.na(df$y),]
##                    species  x  y cells Precipitation_of_Driest_Month
## 3854   Parnassius_delphius NA NA    NA                            NA
## 3901    Parnassius_epaphus NA NA    NA                            NA
## 3947     Parnassius_loxias NA NA    NA                            NA
## 4617 Polyommatus_chrysopis NA NA    NA                            NA
##      Precipitation_Seasonality Precipitation_of_Warmest_Quarter
## 3854                        NA                               NA
## 3901                        NA                               NA
## 3947                        NA                               NA
## 4617                        NA                               NA
##      Mean_Diurnal_Range Temperature_Seasonality
## 3854                 NA                      NA
## 3901                 NA                      NA
## 3947                 NA                      NA
## 4617                 NA                      NA
##      Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 3854                                  NA                                 NA
## 3901                                  NA                                 NA
## 3947                                  NA                                 NA
## 4617                                  NA                                 NA
##      People_Density
## 3854             NA
## 3901             NA
## 3947             NA
## 4617             NA

2.2.7 Remove points outside of the raster extend

There are four points in the data set which are outside of the raster extend and therefore have no values in the x and y columns.

bad <- c(3854,3901,3947,4617)
bad <- points_df[bad,]

coordinates(bad) <- c(2,3)
crs(bad) <- crs(pred_pak)

plot(bad, main = "Points outside the raster extent")
plot(pred_pak$Precipitation_of_Driest_Month, add = TRUE, legend = FALSE)
plot(pak_border, add = TRUE)
abline(h = extent(pred_pak)[4])

The points are above the raster extend and therefore extract() assigned NAs. To move on these points must be removed.

points <- points_df[-c(3854, 3901, 3947, 4617),]
coordinates(points) <- c(2,3)
crs(points) <- crs(pred_pak)

df <- extract(x = pred_pak, y = points, cellnumbers = TRUE, sp = TRUE)
df <- as.data.frame(df)

cell_coords <- coordinates(pred_pak)[df[,4],]
cell_coords <- as.data.frame(cell_coords)

df$x <- cell_coords$x
df$y <- cell_coords$y

df[is.na(df$x),]
##  [1] species                             x                                  
##  [3] y                                   cells                              
##  [5] Precipitation_of_Driest_Month       Precipitation_Seasonality          
##  [7] Precipitation_of_Warmest_Quarter    Mean_Diurnal_Range                 
##  [9] Temperature_Seasonality             Mean_Temperature_of_Wettest_Quarter
## [11] Mean_Temperature_of_Driest_Quarter  People_Density                     
## <0 Zeilen> (oder row.names mit Länge 0)
df[is.na(df$y),]
##  [1] species                             x                                  
##  [3] y                                   cells                              
##  [5] Precipitation_of_Driest_Month       Precipitation_Seasonality          
##  [7] Precipitation_of_Warmest_Quarter    Mean_Diurnal_Range                 
##  [9] Temperature_Seasonality             Mean_Temperature_of_Wettest_Quarter
## [11] Mean_Temperature_of_Driest_Quarter  People_Density                     
## <0 Zeilen> (oder row.names mit Länge 0)

Now there are no more rows with NAs in the x or y column

2.2.8 Remove all cells with no preditors

There are still those Points left with cellnumbers and therefore x and y values but with Nas in all the other columns. These points are removed too. I therefore borrowed a function from a stackoverflow thread.

cell_coords_na <- df[rowSums(is.na(df)) == 8,c(2,3)]
coordinates(cell_coords_na) <- c(1,2)
crs(cell_coords_na) <- crs(pred_pak)


delete.na <- function(DF, n=0) {
  DF[rowSums(is.na(DF)) <= n,]
}
df <- delete.na(df, n = 7)

Let’s have a look at our data:

cells <- df[,c(2,3)]
coordinates(cells) <- c(1,2)
crs(cells) <- crs(pred_pak)

plot(pred_pak$Precipitation_of_Driest_Month, main = "Butterfly occurrences")
plot(cell_coords_na, add = TRUE, cex = 0.2, col = "red")
plot(cells, add = TRUE, cex = 0.2, col = "black")

2.2.9 seperating the different butterflyspecies and saving them in a list

The smooths of the GAMs applied later don’t work with dataframes only containing a few rows. For that only species with 10 or more occurrences where kept in the dataframe. Species with 9 or less where excluded. Of the 429 species of the original data set 212 were excluded.

species <- unique(df$species)
species_list <- list()

for (species_i in species) {
  
  
  if (length(df[df$species == species_i, 1]) >= 10) {
    species_list[[species_i]] <- df[df$species == species_i, ]
  }
  
}
species <- names(species_list)
print(species)
##   [1] "Aglais_caschmirensis"    "Aglais_rizana"          
##   [3] "Anaphaeis_aurota"        "Apolria_nabellica"      
##   [5] "Aporia_belucha"          "Aporia_soracta"         
##   [7] "Argynnis_aglaja"         "Argynnis_childreni"     
##   [9] "Argynnis_jainadeva"      "Argynnis_kamala"        
##  [11] "Argyreus_hyperbius"      "Arhoplala_dodonaea"     
##  [13] "Ariadne_merione"         "Athyma_opalina"         
##  [15] "Aulocera_brahminus"      "Aulocera_gilgitica"     
##  [17] "Aulocera_padma"          "Aulocera_saraswati"     
##  [19] "Aulocera_swaha"          "Azanus_ubaldus"         
##  [21] "Azanus_uranus"           "Badamia_axclamationis"  
##  [23] "Boloria_sipora"          "Borbo_bevani"           
##  [25] "Byasa_polyeuctes"        "Callerebia_annada"      
##  [27] "Carcharodus_alceae"      "Carcharodus_dravira"    
##  [29] "Catopsilia_pomona"       "Catopsilia_pyranthe"    
##  [31] "Celastrina_argiolus"     "Chaetopracta_odata"     
##  [33] "Chazara_enervata"        "Chazara_heydenreichi"   
##  [35] "Chilades_putli"          "Chilades_trochylus"     
##  [37] "Cigaritis_acamas"        "Cigaritis_chitralensis" 
##  [39] "Clossiana_jerdoni"       "Colias_alpherakii"      
##  [41] "Colias_eogene"           "Colias_erate"           
##  [43] "Colias_fieldii"          "Colias_wiskotti"        
##  [45] "Colotis_calais"          "Colotis_etrida"         
##  [47] "Colotis_fausta"          "Colotis_protractus"     
##  [49] "Colotis_vestalis"        "Danaus_chrysippus"      
##  [51] "Danaus_genutia"          "Deudoryx_epijarbas"     
##  [53] "Dodona_durga"            "Dodona_eugenes"         
##  [55] "Eogenes_lesliei"         "Erymnis_pathan"         
##  [57] "Euchloe_daphalis"        "Euchloe_lucilla"        
##  [59] "Euchrysops_cnejus"       "Eurema_hecabe"          
##  [61] "Eurema_laeta"            "Euthalia_aconthea"      
##  [63] "Everes_argiades"         "Everes_dipora"          
##  [65] "Gegenes_nostrodamus"     "Gegenes_pumilio"        
##  [67] "Gonepteryx_mahaguru"     "Gonepteryx_nepalensis"  
##  [69] "Graphium_cloanthus"      "Hasora_chromus"         
##  [71] "Heliophorus_bakeri"      "Hesperia_comma"         
##  [73] "Hipparchia_parisatis"    "Hypolimnas_bolina"      
##  [75] "Hypolimnas_misippus"     "Hyponephele_brevistigma"
##  [77] "Hyponephele_cheena"      "Hyponephele_chitralica" 
##  [79] "Hyponephele_cyri"        "Hyponephele_naricina"   
##  [81] "Hyponephele_pulchella"   "Hyponephele_pulchra"    
##  [83] "Hyponephele_tenuistigma" "Hyrcanana_evansii"      
##  [85] "Iolana_gigantea"         "Issoria_lathonia"       
##  [87] "Ixias_pyrene"            "Junonia_almana"         
##  [89] "Junonia_hierta"          "Junonia_lemonias"       
##  [91] "Junonia_orithya"         "Kanetisa_digna"         
##  [93] "Kaniska_canace"          "Karanasa_astorica"      
##  [95] "Karanasa_bolorica"       "Karanasa_moorei"        
##  [97] "Karanasa_pupilata"       "Kirinia_cashmirensis"   
##  [99] "Lachides_contracta"      "Lampides_boeticus"      
## [101] "Lasiommata_menava"       "Lasiommata_schakra"     
## [103] "Lasiommata_tarbena"      "Leptotes_plinius"       
## [105] "Lethe_rohria"            "Libythea_lepita"        
## [107] "Limenitis_hydaspes"      "Lycaena_aditya"         
## [109] "Lycaena_kasyapa"         "Lycaena_phlaeas"        
## [111] "Melitaea_chitralensis"   "Melitaea_fergana"       
## [113] "Melitaea_lutko"          "Melitaea_mixta"         
## [115] "Melitaea_persea"         "Melitaea_robertsi"      
## [117] "Neozephyrus_syla"        "Neptis_hylas"           
## [119] "Neptis_mahendra"         "Neptis_sappho"          
## [121] "Neptis_soma"             "Nesa_sena"              
## [123] "Nymphalis_xanthomelas"   "Papilio_bianor"         
## [125] "Papilio_demoleus"        "Papilio_machaon"        
## [127] "Papilio_polytes"         "Paralasa_chitralica"    
## [129] "Paralasa_kalinda"        "Paralasa_mani"          
## [131] "Paralasa_shallada"       "Parnara_guttata"        
## [133] "Parnassius_actius"       "Parnassius_charltonius" 
## [135] "Parnassius_delphius"     "Parnassius_epaphus"     
## [137] "Parnassius_hardwickii"   "Parnassius_jacquemontii"
## [139] "Parnassius_simo"         "Parnassius_stenosemus"  
## [141] "Pelopidas_mathias"       "Pelopidas_sinensis"     
## [143] "Phalanta_phalantha"      "Pieris_ajaka"           
## [145] "Pieris_brassicae"        "Pieris_canidia"         
## [147] "Pieris_krueperi"         "Pieris_rapae"           
## [149] "Plebejus_agestis"        "Plebejus_artaxerxes"    
## [151] "Plebejus_ashretha"       "Plebejus_astorica"      
## [153] "Plebejus_christophi"     "Plebejus_eumedon"       
## [155] "Plebejus_eversmani"      "Plebejus_indicus"       
## [157] "Plebejus_jaloka"         "Plebejus_loewii"        
## [159] "Plebejus_pheretiades"    "Polygonia_agnicula"     
## [161] "Polygonia_c-album"       "Polygonia_egea"         
## [163] "Polyommatus_ariana"      "Polyommatus_bilucha"    
## [165] "Polyommatus_devanica"    "Polyommatus_galathea"   
## [167] "Polyommatus_icadius"     "Polyommatus_icarus"     
## [169] "Polyommatus_lehanus"     "Polyommatus_metallica"  
## [171] "Polyommatus_omphisa"     "Polyommatus_pseuderos"  
## [173] "Polyommatus_sarta"       "Pontia_callidice"       
## [175] "Pontia_chloridice"       "Pontia_daplidice"       
## [177] "Pontia_glauconome"       "Potanthus_dara"         
## [179] "Prosotas_nora"           "Pseudochazara_baldiva"  
## [181] "Pseudochazara_lehana"    "Pseudophilotes_vicrama" 
## [183] "Pyrgus_alpinus"          "Pyrgus_cashmirensis"    
## [185] "Rapala_iarbus"           "Rapala_nissa"           
## [187] "Rapala_selira"           "Satyrium_deria"         
## [189] "Satyrus_pimpla"          "Sephisa_dichroa"        
## [191] "Spialia_carnea"          "Spialia_galba"          
## [193] "Spialia_geron"           "Spindasis_ictis"        
## [195] "Spindasis_vulcanus"      "Taractrocera_maevius"   
## [197] "Tarucus_alteratus"       "Tarucus_balkanicus"     
## [199] "Tarucus_callinara"       "Tarucus_nara"           
## [201] "Tarucus_theophrastus"    "Tarucus_venosus"        
## [203] "Tirumala_limniace"       "Vanessa_cardui"         
## [205] "Vanessa_indica"          "Ypthima_asterope"       
## [207] "Ypthima_avanta"          "Ypthima_nareda"         
## [209] "Ypthima_sakra"           "Zizeeria_karsandra"     
## [211] "Zizeeria_maha"           "Zizina_otis"

Now the Data has the structure we wanted and we can start to fit a model.

3 Using GAMs to predict species distributions

To create a species richness map we will need three steps:
1. Use a GAM to predict the distribution of one butterfly-species.
2. Use for-loops the apply the functions used in 1. to predict the distribution of all species in our list.
3 Sum up all distributions to a species richness map

3.1 Distribution of Aglais caschmirensis

To predict the distribution of Aglais caschmirensis we need identify all cells where the species has occurrences. We use the function ddply() from the plyr package. All cells with equivalent values in all columns specified inside the brackets are summarized to one cell with a new column presence having the value 1.

test <- species_list[["Aglais_caschmirensis"]]

test <-  ddply(test, .(x,y,cells,Precipitation_of_Driest_Month,Precipitation_Seasonality,
                     Precipitation_of_Warmest_Quarter,Mean_Diurnal_Range,
                     Temperature_Seasonality,Mean_Temperature_of_Wettest_Quarter,
                     Mean_Temperature_of_Driest_Quarter,People_Density),
           summarize, presence = 1)


head(test)
##          x        y cells Precipitation_of_Driest_Month
## 1 71.12500 33.79167  8040                            15
## 2 71.62500 35.95833  2768                            13
## 3 71.70833 35.79167  3175                            14
## 4 71.70833 35.87500  2972                            13
## 5 71.70833 35.95833  2769                            13
## 6 71.79167 36.12500  2364                            16
##   Precipitation_Seasonality Precipitation_of_Warmest_Quarter Mean_Diurnal_Range
## 1                  55.55941                              206           12.13867
## 2                  62.77824                               70           10.12242
## 3                  66.13657                               58           11.45983
## 4                  64.91814                               59           11.02450
## 5                  62.71591                               65           10.41258
## 6                  55.94557                               84           10.03908
##   Temperature_Seasonality Mean_Temperature_of_Wettest_Quarter
## 1                767.8447                          11.4294996
## 2                890.4564                           0.3888333
## 3                844.9498                           8.4488335
## 4                868.7729                           5.7328334
## 5                875.3101                           2.3113332
## 6                850.5377                          -3.5383334
##   Mean_Temperature_of_Driest_Quarter People_Density presence
## 1                         12.7331667      12254.709        1
## 2                          3.5798333       2660.602        1
## 3                         11.2356663       6340.192        1
## 4                          8.8926668       3476.803        1
## 5                          5.6725001       2975.141        1
## 6                          0.2651667       1485.123        1

We save this step in a function to use it later in step 2.

counting <- function(data, species) {
  
  return( 
    
    ddply(data[[species]], .(x,y,cells,Precipitation_of_Driest_Month,Precipitation_Seasonality,
                       Precipitation_of_Warmest_Quarter,Mean_Diurnal_Range,
                       Temperature_Seasonality,Mean_Temperature_of_Wettest_Quarter,
                       Mean_Temperature_of_Driest_Quarter,People_Density),
               summarize, presence = 1)
  )
}

3.1.1 creating absence points

We need absence points for our species so that the modell is able to evaluate weather a certain combination of predictors is good or bad. Our first step is to create a dataframe with all predictor values

raster <- raster::rasterToPoints(pred_pak$Precipitation_of_Driest_Month)
raster <- as.data.frame(raster)
raster <- raster[,c(1,2)]
coordinates(raster) <- c(1,2)
crs(raster) <- crs(pred_pak)

plot(pred_pak$Precipitation_of_Driest_Month, main = "New Dataframe")
plot(raster, add = TRUE, cex = 0.2, col = "black")

#assigning values
raster_df <- extract(x = pred_pak, y = raster, cellnumbers = TRUE, sp = TRUE)
raster_df <- as.data.frame(raster_df)

cell_ids <- c(raster_df$cells)

Then we subset all cells/rows where is no occurrence of Aglais_caschmirensis

cell_ids_spec <- test$cells
cell_ids_spec <- intersect(cell_ids_spec, cell_ids)

id <- !(cell_ids %in% cell_ids_spec)
cell_ids_spec_abs <- cell_ids[id]

length(cell_ids_spec)
## [1] 50
length(cell_ids_spec_abs)
## [1] 11723
length(cell_ids)
## [1] 11773

Now we pick random points out of the subset

set.seed(123)

i <- runif(length(cell_ids_spec)*10, min = 1, max = length(cell_ids_spec_abs))
cell_ids_absence <- cell_ids_spec_abs[i]

abs_points <- raster_df[raster_df$cells %in% cell_ids_absence, ]

Now we want to merge the two dataframes. For that we need to rearrange the absence-points dataframe

head(test)
##          x        y cells Precipitation_of_Driest_Month
## 1 71.12500 33.79167  8040                            15
## 2 71.62500 35.95833  2768                            13
## 3 71.70833 35.79167  3175                            14
## 4 71.70833 35.87500  2972                            13
## 5 71.70833 35.95833  2769                            13
## 6 71.79167 36.12500  2364                            16
##   Precipitation_Seasonality Precipitation_of_Warmest_Quarter Mean_Diurnal_Range
## 1                  55.55941                              206           12.13867
## 2                  62.77824                               70           10.12242
## 3                  66.13657                               58           11.45983
## 4                  64.91814                               59           11.02450
## 5                  62.71591                               65           10.41258
## 6                  55.94557                               84           10.03908
##   Temperature_Seasonality Mean_Temperature_of_Wettest_Quarter
## 1                767.8447                          11.4294996
## 2                890.4564                           0.3888333
## 3                844.9498                           8.4488335
## 4                868.7729                           5.7328334
## 5                875.3101                           2.3113332
## 6                850.5377                          -3.5383334
##   Mean_Temperature_of_Driest_Quarter People_Density presence
## 1                         12.7331667      12254.709        1
## 2                          3.5798333       2660.602        1
## 3                         11.2356663       6340.192        1
## 4                          8.8926668       3476.803        1
## 5                          5.6725001       2975.141        1
## 6                          0.2651667       1485.123        1
head(abs_points)
##     cells Precipitation_of_Driest_Month Precipitation_Seasonality
## 7     369                             4                  40.15966
## 10    372                             4                  41.03106
## 76    953                             8                  50.68524
## 99    976                             6                  35.71460
## 113   990                             1                  56.36630
## 125  1160                            10                  44.67263
##     Precipitation_of_Warmest_Quarter Mean_Diurnal_Range Temperature_Seasonality
## 7                                 50           11.20733                961.3560
## 10                                53           11.66400                973.4099
## 76                                38           10.90267                952.2430
## 99                                63           11.21775                963.1484
## 113                               40           12.11900                974.1218
## 125                               54           10.48300                923.5392
##     Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 7                            -6.7246666                        -15.0544996
## 10                            3.5699999                        -17.7334995
## 76                            0.3978333                          4.0085001
## 99                           -9.3888330                        -17.4855003
## 113                           6.0609999                        -10.2709999
## 125                          -2.6106668                          0.6716668
##     People_Density        x        y
## 7        147.88110 74.70833 36.95833
## 10        46.96605 74.95833 36.95833
## 76      1340.48291 72.62500 36.70833
## 99       116.48557 74.54167 36.70833
## 113       61.76914 75.70833 36.70833
## 125      675.12177 72.95833 36.62500
abs_points <- abs_points[,c(10,11,1,2:9)]
abs_points$presence <- 0

head(abs_points)
##            x        y cells Precipitation_of_Driest_Month
## 7   74.70833 36.95833   369                             4
## 10  74.95833 36.95833   372                             4
## 76  72.62500 36.70833   953                             8
## 99  74.54167 36.70833   976                             6
## 113 75.70833 36.70833   990                             1
## 125 72.95833 36.62500  1160                            10
##     Precipitation_Seasonality Precipitation_of_Warmest_Quarter
## 7                    40.15966                               50
## 10                   41.03106                               53
## 76                   50.68524                               38
## 99                   35.71460                               63
## 113                  56.36630                               40
## 125                  44.67263                               54
##     Mean_Diurnal_Range Temperature_Seasonality
## 7             11.20733                961.3560
## 10            11.66400                973.4099
## 76            10.90267                952.2430
## 99            11.21775                963.1484
## 113           12.11900                974.1218
## 125           10.48300                923.5392
##     Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 7                            -6.7246666                        -15.0544996
## 10                            3.5699999                        -17.7334995
## 76                            0.3978333                          4.0085001
## 99                           -9.3888330                        -17.4855003
## 113                           6.0609999                        -10.2709999
## 125                          -2.6106668                          0.6716668
##     People_Density presence
## 7        147.88110        0
## 10        46.96605        0
## 76      1340.48291        0
## 99       116.48557        0
## 113       61.76914        0
## 125      675.12177        0
test_points <- test
abs_points_points <- abs_points

test <- rbind(test, abs_points)

Let’s have a look at our points

Now we can fit our first GAM

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## presence ~ s(x, y, bs = "ds") + s(Mean_Temperature_of_Wettest_Quarter) + 
##     s(Precipitation_Seasonality) + s(Precipitation_of_Warmest_Quarter) + 
##     s(Mean_Diurnal_Range) + s(Temperature_Seasonality) + s(Mean_Temperature_of_Wettest_Quarter) + 
##     s(Mean_Temperature_of_Driest_Quarter)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -16.31      15.17  -1.075    0.282
## 
## Approximate significance of smooth terms:
##                                          edf Ref.df Chi.sq p-value  
## s(x,y)                                 7.325  8.898 17.144  0.0318 *
## s(Mean_Temperature_of_Wettest_Quarter) 1.000  1.000  0.919  0.3377  
## s(Precipitation_Seasonality)           3.485  3.968  9.071  0.0463 *
## s(Precipitation_of_Warmest_Quarter)    2.591  3.315  5.355  0.1707  
## s(Mean_Diurnal_Range)                  2.241  2.784  2.580  0.3343  
## s(Temperature_Seasonality)             1.000  1.000  2.688  0.1011  
## s(Mean_Temperature_of_Driest_Quarter)  2.801  3.479  3.362  0.4275  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.646   Deviance explained = 71.4%
## -REML = 63.113  Scale est. = 1         n = 538

Non of our eight predictors is significant for Aglais caschmirensis. Let’s have a look at the predicted distribution

test_pred <- c(predict.gam(test_gam, newdata = raster_df, type = "response"))
raster$test <- as.vector(test_pred)

pred_test <- rasterize(raster, pred_pak)
plot(pred_test$test, main = "Distribution of Aglais caschmirensis")

Now we wrap all these Steps into a function and apply it to all species in out list.

3.2 Predict the distribution of all species

3.2.1 creating absence points

We use the function created in 1. to loop over our list

for (species_i in species) {
  species_list[[species_i]] <- counting(data = species_list, species = species_i)
}

3.2.2 Dumping all the steps from Aglais caschmirensis testing into one function

spec_dist <- function(species) {
  
  test <- species_list[[species]]
  
  cell_ids_spec <- test$cells
  cell_ids_spec <- intersect(cell_ids_spec, cell_ids)
  
  id <- !(cell_ids %in% cell_ids_spec)
  cell_ids_spec_abs <- cell_ids[id]
  
  
  set.seed(123)
  i <- runif(length(cell_ids_spec)*10, min = 1, max = length(cell_ids_spec_abs))
  cell_ids_absence <- cell_ids_spec_abs[i]
  
  abs_points <- raster_df[raster_df$cells %in% cell_ids_absence, ]
  
  abs_points <- abs_points[,c(10,11,1,2:9)]
  abs_points$presence <- 0
  
  test_points <- test
  abs_points_points <- abs_points
  
  test <- rbind(test, abs_points)
  
  test_gam <- gam(presence ~ s(x, y, bs = "ds") + s(Mean_Temperature_of_Wettest_Quarter) + s(Precipitation_Seasonality) +
                    s(Precipitation_of_Warmest_Quarter) + s(Mean_Diurnal_Range) + s(Temperature_Seasonality) +
                    s(Mean_Temperature_of_Wettest_Quarter) + s(Mean_Temperature_of_Driest_Quarter),
                  method = "REML", family = binomial, data = test)
  
  #prediction
  test_pred <- c(predict.gam(test_gam, newdata = raster_df, type = "response"))
  raster$species <- as.vector(test_pred)
  return(raster)

  
}

3.2.3 Runing the function for all species in the list

3.3 Summing up all distribution to a species richness map

Before we are able to calculate out species richness map we need to

for (species_i in names(species_list)) {
  species <- rasterize(species_list[[species_i]], pred_pak)$species
  species_list[[species_i]] <- species
}

Now we convert the list into a rasterstack and sum up all occurrences to finally get our richness map

3.3.1 Some other plots

4 References

4.1 Literature

Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K. & A.G. Toxopeus (2014): Where is positional uncertainty a problem for species distribution modelling?. Ecography 37 (2): 191-203.

Ross, N. (2017): Nonlinear Models in R: The Wonderful World of mgcv. https://nyhackr.blob.core.windows.net/presentations/Nonlinear-Modeling-in-R-with-GAMs_Noam-Ross.pdf (Zugriff: 10.06.2021)

Simpson, G.(2020): Introduction to Generalized Additive Models with R and mgcv. https://github.com/gavinsimpson/intro-gam-webinar-2020/blob/master/README.md (Zugriff: 10.06.2021)

Wood, S.N., Pya N. & B. Saefken (2016): Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575.

Wood, S.N. (2006): Generalized additive models : an introduction with R. Texts in statistical science.

4.2 Data

WorldClim: Fick, S.E. & R.J. Hijmans (2017): WorldClim 2: new 1km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37 (12). 4302-4315.

People Density: WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project.

5 Session Info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] usdm_1.1-18        gratia_0.6.0       mgcv_1.8-35        nlme_3.1-152      
##  [5] rgdal_1.5-23       RColorBrewer_1.1-2 viridis_0.6.1      viridisLite_0.4.0 
##  [9] ggplot2_3.3.3      plyr_1.8.6         raster_3.4-10      sp_1.4-5          
## 
## loaded via a namespace (and not attached):
##  [1] prettydoc_0.4.1   tidyselect_1.1.1  xfun_0.23         bslib_0.2.5.1    
##  [5] purrr_0.3.4       splines_4.1.0     lattice_0.20-44   colorspace_2.0-1 
##  [9] vctrs_0.3.8       generics_0.1.0    htmltools_0.5.1.1 yaml_2.2.1       
## [13] utf8_1.2.1        rlang_0.4.11      jquerylib_0.1.4   pillar_1.6.1     
## [17] glue_1.4.2        withr_2.4.2       lifecycle_1.0.0   stringr_1.4.0    
## [21] munsell_0.5.0     gtable_0.3.0      mvnfast_0.2.7     codetools_0.2-18 
## [25] evaluate_0.14     knitr_1.33        parallel_4.1.0    fansi_0.5.0      
## [29] highr_0.9         Rcpp_1.0.6        scales_1.1.1      jsonlite_1.7.2   
## [33] gridExtra_2.3     png_0.1-7         digest_0.6.27     stringi_1.6.1    
## [37] dplyr_1.0.6       grid_4.1.0        tools_4.1.0       magrittr_2.0.1   
## [41] sass_0.4.0        patchwork_1.1.1   tibble_3.1.2      tidyr_1.1.3      
## [45] crayon_1.4.1      pkgconfig_2.0.3   ellipsis_0.3.2    Matrix_1.3-3     
## [49] rmarkdown_2.8     R6_2.5.0          compiler_4.1.0