library(mgcv)
<- gam(presence ~ s(x1) + s(x2) + s(x3) + # one dimensional smooths
model s(x4, x5), # two dimensional smooths
data = __, # your data
family = ___, # your distribution (e.g. gaussian)
method = "REML" # sets parameter for your smooths
...)
Generalized Additive Models use spline functions (smooths).
The spline functions are composed of simpler basis functions witch are weighted and summed up.
\[s(x) = \sum_{k = 1}^K \beta_k b_k(x)\]
The more basis functions a smooth is made of, the more complex of a relationship the smooth can model (Simpson 2020). With the parameter k in s(x1, k = __)
you can pick the number of basis functions.
source (Introduction to Generalized Additive Models with R and mgcv by Gavin Simpson)
To avoid over-fitting the Data and modelling noise, the “Wiggliness” of the Spline is penalized (Ross 2017). The “Wiggliness” is measured via the curvature \(\int_{\mathbb{R}} [f^{\prime\prime}]^2 dx = \boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta} = \large{W}\) of the function which is subtracted from the Fit with a factor \(\lambda\) (Simpson 2020). \[\mathcal{L}_p = \log(\text{Likelihood}) - \lambda W\]
The larger \(\lambda\) the straighter the spline function. With the argument method = “REML” (Restricted Maximum Likelihood) \(\lambda\) is picked automatically. One parameter you have check and pick manually sometimes is the amount of basis functions k the smooth is made of. k has to be large enough that the “true function” can be represented properly, but not too big due to computational costs (Simpson 2020).
In mgcv different smooth classes for different modelling tasks are implemented. The type of smooth can be picked with the bs parameter in s(x1, bs = __)
.
The default is a low-rank thin plate spline bs = 'tp'
Many others available
bs = 'cc'
or bs = 'cp'
bs = 'ds'
bs = 'sos'
bs = 'mrf'
bs = 'so'
For more information on different Types of Smooths have a look at…
::smooth.terms ?mgcv
With s(x1, x2)
you can include relationships between the two variables x1 and x2. If you want to use the s()
function for a two-dimensional smooth x1 and x2 should be equally “wiggly” in both dimensions. te()
and ti()
are other functions in mgcv for creating multidimensional smooths (Ross 2017).
With the argument family in gam(presence ~ (...), family = ___)
you can specify the type and distribution of your data the model should fit like gaussian, binomial etc. For the species richness map of butterflies in Pakistan I will use a binomial distribution family = binomial
witch is useful for Occurrence Data (Ross 2017).
For more information on different distributions have a look at…
::family.mgcv ?mgcv
s(..., k = __)
s(..., bs = __)
gam(..., family = __)
For further Information on GAMs watch the lectures by Gavin Simpson (https://www.youtube.com/watch?v=sgw4cu8hrZM) and Noam Ross (https://www.youtube.com/watch?v=q4_t8jXcQgc)
The Occurrence data set used in this project contains 5870 sightings of butterflies with species and geographical coordinates attached.
## species x y
## 1 Acraea_issoria 73.08635 34.43060
## 2 Acraea_issoria 73.36326 34.24367
## 3 Acraea_issoria 73.36758 34.14221
## 4 Acraea_issoria 73.82637 33.84266
## 5 Acraea_issoria 74.19839 33.94852
## 6 Acraea_violae 72.34951 31.70947
For the prediction of the spices distribution RasterData from WorldClim was used. The data sets used are raster - data sets with BioClimatic variables and Elevation with a resolution of 5min. The Coordinate System throughout the whole project is WGS 84.
Additional to the environmental data, a raster - data set with population numbers was downloaded from https://worldpop.org.
The aim of the preprocessing is to generate a dataframe containing coordinates and the values of each predictor within the Pakistani border. Each row should represent a raster cells with the same extent and spatial resolution as the cells of the raster data sets from WorldClim. Predictors with high collinearity are removed from the dataframe.
library(sp)
library(raster)
library(plyr)
library(ggplot2)
library(viridis)
library(RColorBrewer)
library(rgdal)
library(mgcv)
library(gratia)
#boosting memory limit
memory.limit(9999999999)
## Pakistans border
<- raster::getData(name = "GADM", country = "PAK", level = 1, path = "data/raw/")
pak_border
## Butterfly occurrences
<- read.csv(file = "data/raw/butterflies.csv")
points
### georeferencing
coordinates(points) <- c("x", "y")
@proj4string@projargs <- pak_border@proj4string@projargs
points
## bioclim data
<- list.files(path = "data/raw/wc2.1_5m_bio/", full.names = TRUE)
files <- stack(files)
predictors <- crop(predictors, pak_border)
predictors_crop <- mask(predictors_crop, pak_border) pred_pak
## elevation data
<- raster("data/raw/wc2.1_5m_elev/wc2.1_5m_elev.tif")
elev <- crop(elev, pak_border)
elev <- mask(elev, pak_border)
elev
$elev <- elev pred_pak
With aggregate(ppd, fact = __, fun = sum)
the spatial resolution of the data set is changed until it has the resolution of the WorldClim data sets. cellStats(...)
is used to check if values are changed in the process. resample(ppd_100, pred_pak, method = "ngb")
ensures that extent and resolution of both rasters are the same.
## peopledensity of Pakistan
<- raster("data/raw/pak_ppp_2020_UNadj.tif")
ppd cellStats(ppd, stat = sum)
## [1] 220892332
<- aggregate(ppd, fact = 100, fun = sum)
ppd_100 cellStats(ppd_100, stat = sum)
## [1] 220892331
1/(res(ppd_100)/res(pred_pak))
## [1] 1 1
<- resample(x= ppd_100, y=pred_pak, method="ngb")
ppd cellStats(ppd, sum)
## [1] 220892325
<- mask(ppd, pak_border)
ppd $ppd <- ppd pred_pak
Now we remove all predictors with high collinearity which means that one predictor can be estimated using the other predictors (Naimi et al. 2014). Therefore we use the vifstep()
function from the usdm package with a threshold of 5.
set.seed(123)
<- usdm::vifstep(pred_pak, th = 5)
vif vif
## 13 variables from the 21 input variables have collinearity problem:
##
## wc2.1_5m_bio_6 wc2.1_5m_bio_11 wc2.1_5m_bio_10 wc2.1_5m_bio_1 wc2.1_5m_bio_3 wc2.1_5m_bio_12 wc2.1_5m_bio_5 wc2.1_5m_bio_13 wc2.1_5m_bio_7 wc2.1_5m_bio_17 elev wc2.1_5m_bio_16 wc2.1_5m_bio_19
##
## After excluding the collinear variables, the linear correlation coefficients ranges between:
## min correlation ( ppd ~ wc2.1_5m_bio_9 ): 0.03585369
## max correlation ( wc2.1_5m_bio_9 ~ wc2.1_5m_bio_2 ): 0.6976587
##
## ---------- VIFs of the remained variables --------
## Variables VIF
## 1 wc2.1_5m_bio_14 3.956877
## 2 wc2.1_5m_bio_15 3.142364
## 3 wc2.1_5m_bio_18 3.391822
## 4 wc2.1_5m_bio_2 3.130466
## 5 wc2.1_5m_bio_4 2.343441
## 6 wc2.1_5m_bio_8 3.197521
## 7 wc2.1_5m_bio_9 2.935347
## 8 ppd 1.121282
<- usdm::exclude(pred_pak, vif = vif) pred_pak
For more information on collinearity and the variance inflation factor have a look at
::vif ?usdm
At first points at the center of each raster cell are created. Those points are used to extract the cellnumbers of the raster which are then used to extract coordinates later on. The cellnumbers are extracted with the argument cellnumbers = TRUE
of the extract()
function. The cellnumbers of cells with butterfly occurrences are later used for generating absence points. df[,4]
contains the cellnumbers for which the coordinates are extracted and later saved in the x and y column.
<- as.data.frame(points)
points_df <- extract(x = pred_pak, y = points, cellnumbers = TRUE, sp = TRUE)
df <- as.data.frame(df)
df
<- coordinates(pred_pak)[df[,4],]
cell_coords <- as.data.frame(cell_coords)
cell_coords
$x <- cell_coords$x
df$y <- cell_coords$y
df
is.na(df$x),] df[
## species x y cells Precipitation_of_Driest_Month
## 3854 Parnassius_delphius NA NA NA NA
## 3901 Parnassius_epaphus NA NA NA NA
## 3947 Parnassius_loxias NA NA NA NA
## 4617 Polyommatus_chrysopis NA NA NA NA
## Precipitation_Seasonality Precipitation_of_Warmest_Quarter
## 3854 NA NA
## 3901 NA NA
## 3947 NA NA
## 4617 NA NA
## Mean_Diurnal_Range Temperature_Seasonality
## 3854 NA NA
## 3901 NA NA
## 3947 NA NA
## 4617 NA NA
## Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 3854 NA NA
## 3901 NA NA
## 3947 NA NA
## 4617 NA NA
## People_Density
## 3854 NA
## 3901 NA
## 3947 NA
## 4617 NA
is.na(df$y),] df[
## species x y cells Precipitation_of_Driest_Month
## 3854 Parnassius_delphius NA NA NA NA
## 3901 Parnassius_epaphus NA NA NA NA
## 3947 Parnassius_loxias NA NA NA NA
## 4617 Polyommatus_chrysopis NA NA NA NA
## Precipitation_Seasonality Precipitation_of_Warmest_Quarter
## 3854 NA NA
## 3901 NA NA
## 3947 NA NA
## 4617 NA NA
## Mean_Diurnal_Range Temperature_Seasonality
## 3854 NA NA
## 3901 NA NA
## 3947 NA NA
## 4617 NA NA
## Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 3854 NA NA
## 3901 NA NA
## 3947 NA NA
## 4617 NA NA
## People_Density
## 3854 NA
## 3901 NA
## 3947 NA
## 4617 NA
There are four points in the data set which are outside of the raster extend and therefore have no values in the x and y columns.
<- c(3854,3901,3947,4617)
bad <- points_df[bad,]
bad
coordinates(bad) <- c(2,3)
crs(bad) <- crs(pred_pak)
plot(bad, main = "Points outside the raster extent")
plot(pred_pak$Precipitation_of_Driest_Month, add = TRUE, legend = FALSE)
plot(pak_border, add = TRUE)
abline(h = extent(pred_pak)[4])
The points are above the raster extend and therefore extract()
assigned NAs. To move on these points must be removed.
<- points_df[-c(3854, 3901, 3947, 4617),]
points coordinates(points) <- c(2,3)
crs(points) <- crs(pred_pak)
<- extract(x = pred_pak, y = points, cellnumbers = TRUE, sp = TRUE)
df <- as.data.frame(df)
df
<- coordinates(pred_pak)[df[,4],]
cell_coords <- as.data.frame(cell_coords)
cell_coords
$x <- cell_coords$x
df$y <- cell_coords$y
df
is.na(df$x),] df[
## [1] species x
## [3] y cells
## [5] Precipitation_of_Driest_Month Precipitation_Seasonality
## [7] Precipitation_of_Warmest_Quarter Mean_Diurnal_Range
## [9] Temperature_Seasonality Mean_Temperature_of_Wettest_Quarter
## [11] Mean_Temperature_of_Driest_Quarter People_Density
## <0 Zeilen> (oder row.names mit Länge 0)
is.na(df$y),] df[
## [1] species x
## [3] y cells
## [5] Precipitation_of_Driest_Month Precipitation_Seasonality
## [7] Precipitation_of_Warmest_Quarter Mean_Diurnal_Range
## [9] Temperature_Seasonality Mean_Temperature_of_Wettest_Quarter
## [11] Mean_Temperature_of_Driest_Quarter People_Density
## <0 Zeilen> (oder row.names mit Länge 0)
Now there are no more rows with NAs in the x or y column
There are still those Points left with cellnumbers and therefore x and y values but with Nas in all the other columns. These points are removed too. I therefore borrowed a function from a stackoverflow thread.
<- df[rowSums(is.na(df)) == 8,c(2,3)]
cell_coords_na coordinates(cell_coords_na) <- c(1,2)
crs(cell_coords_na) <- crs(pred_pak)
<- function(DF, n=0) {
delete.na rowSums(is.na(DF)) <= n,]
DF[
}<- delete.na(df, n = 7) df
Let’s have a look at our data:
<- df[,c(2,3)]
cells coordinates(cells) <- c(1,2)
crs(cells) <- crs(pred_pak)
plot(pred_pak$Precipitation_of_Driest_Month, main = "Butterfly occurrences")
plot(cell_coords_na, add = TRUE, cex = 0.2, col = "red")
plot(cells, add = TRUE, cex = 0.2, col = "black")
The smooths of the GAMs applied later don’t work with dataframes only containing a few rows. For that only species with 10 or more occurrences where kept in the dataframe. Species with 9 or less where excluded. Of the 429 species of the original data set 212 were excluded.
<- unique(df$species)
species <- list()
species_list
for (species_i in species) {
if (length(df[df$species == species_i, 1]) >= 10) {
<- df[df$species == species_i, ]
species_list[[species_i]]
}
}<- names(species_list)
species print(species)
## [1] "Aglais_caschmirensis" "Aglais_rizana"
## [3] "Anaphaeis_aurota" "Apolria_nabellica"
## [5] "Aporia_belucha" "Aporia_soracta"
## [7] "Argynnis_aglaja" "Argynnis_childreni"
## [9] "Argynnis_jainadeva" "Argynnis_kamala"
## [11] "Argyreus_hyperbius" "Arhoplala_dodonaea"
## [13] "Ariadne_merione" "Athyma_opalina"
## [15] "Aulocera_brahminus" "Aulocera_gilgitica"
## [17] "Aulocera_padma" "Aulocera_saraswati"
## [19] "Aulocera_swaha" "Azanus_ubaldus"
## [21] "Azanus_uranus" "Badamia_axclamationis"
## [23] "Boloria_sipora" "Borbo_bevani"
## [25] "Byasa_polyeuctes" "Callerebia_annada"
## [27] "Carcharodus_alceae" "Carcharodus_dravira"
## [29] "Catopsilia_pomona" "Catopsilia_pyranthe"
## [31] "Celastrina_argiolus" "Chaetopracta_odata"
## [33] "Chazara_enervata" "Chazara_heydenreichi"
## [35] "Chilades_putli" "Chilades_trochylus"
## [37] "Cigaritis_acamas" "Cigaritis_chitralensis"
## [39] "Clossiana_jerdoni" "Colias_alpherakii"
## [41] "Colias_eogene" "Colias_erate"
## [43] "Colias_fieldii" "Colias_wiskotti"
## [45] "Colotis_calais" "Colotis_etrida"
## [47] "Colotis_fausta" "Colotis_protractus"
## [49] "Colotis_vestalis" "Danaus_chrysippus"
## [51] "Danaus_genutia" "Deudoryx_epijarbas"
## [53] "Dodona_durga" "Dodona_eugenes"
## [55] "Eogenes_lesliei" "Erymnis_pathan"
## [57] "Euchloe_daphalis" "Euchloe_lucilla"
## [59] "Euchrysops_cnejus" "Eurema_hecabe"
## [61] "Eurema_laeta" "Euthalia_aconthea"
## [63] "Everes_argiades" "Everes_dipora"
## [65] "Gegenes_nostrodamus" "Gegenes_pumilio"
## [67] "Gonepteryx_mahaguru" "Gonepteryx_nepalensis"
## [69] "Graphium_cloanthus" "Hasora_chromus"
## [71] "Heliophorus_bakeri" "Hesperia_comma"
## [73] "Hipparchia_parisatis" "Hypolimnas_bolina"
## [75] "Hypolimnas_misippus" "Hyponephele_brevistigma"
## [77] "Hyponephele_cheena" "Hyponephele_chitralica"
## [79] "Hyponephele_cyri" "Hyponephele_naricina"
## [81] "Hyponephele_pulchella" "Hyponephele_pulchra"
## [83] "Hyponephele_tenuistigma" "Hyrcanana_evansii"
## [85] "Iolana_gigantea" "Issoria_lathonia"
## [87] "Ixias_pyrene" "Junonia_almana"
## [89] "Junonia_hierta" "Junonia_lemonias"
## [91] "Junonia_orithya" "Kanetisa_digna"
## [93] "Kaniska_canace" "Karanasa_astorica"
## [95] "Karanasa_bolorica" "Karanasa_moorei"
## [97] "Karanasa_pupilata" "Kirinia_cashmirensis"
## [99] "Lachides_contracta" "Lampides_boeticus"
## [101] "Lasiommata_menava" "Lasiommata_schakra"
## [103] "Lasiommata_tarbena" "Leptotes_plinius"
## [105] "Lethe_rohria" "Libythea_lepita"
## [107] "Limenitis_hydaspes" "Lycaena_aditya"
## [109] "Lycaena_kasyapa" "Lycaena_phlaeas"
## [111] "Melitaea_chitralensis" "Melitaea_fergana"
## [113] "Melitaea_lutko" "Melitaea_mixta"
## [115] "Melitaea_persea" "Melitaea_robertsi"
## [117] "Neozephyrus_syla" "Neptis_hylas"
## [119] "Neptis_mahendra" "Neptis_sappho"
## [121] "Neptis_soma" "Nesa_sena"
## [123] "Nymphalis_xanthomelas" "Papilio_bianor"
## [125] "Papilio_demoleus" "Papilio_machaon"
## [127] "Papilio_polytes" "Paralasa_chitralica"
## [129] "Paralasa_kalinda" "Paralasa_mani"
## [131] "Paralasa_shallada" "Parnara_guttata"
## [133] "Parnassius_actius" "Parnassius_charltonius"
## [135] "Parnassius_delphius" "Parnassius_epaphus"
## [137] "Parnassius_hardwickii" "Parnassius_jacquemontii"
## [139] "Parnassius_simo" "Parnassius_stenosemus"
## [141] "Pelopidas_mathias" "Pelopidas_sinensis"
## [143] "Phalanta_phalantha" "Pieris_ajaka"
## [145] "Pieris_brassicae" "Pieris_canidia"
## [147] "Pieris_krueperi" "Pieris_rapae"
## [149] "Plebejus_agestis" "Plebejus_artaxerxes"
## [151] "Plebejus_ashretha" "Plebejus_astorica"
## [153] "Plebejus_christophi" "Plebejus_eumedon"
## [155] "Plebejus_eversmani" "Plebejus_indicus"
## [157] "Plebejus_jaloka" "Plebejus_loewii"
## [159] "Plebejus_pheretiades" "Polygonia_agnicula"
## [161] "Polygonia_c-album" "Polygonia_egea"
## [163] "Polyommatus_ariana" "Polyommatus_bilucha"
## [165] "Polyommatus_devanica" "Polyommatus_galathea"
## [167] "Polyommatus_icadius" "Polyommatus_icarus"
## [169] "Polyommatus_lehanus" "Polyommatus_metallica"
## [171] "Polyommatus_omphisa" "Polyommatus_pseuderos"
## [173] "Polyommatus_sarta" "Pontia_callidice"
## [175] "Pontia_chloridice" "Pontia_daplidice"
## [177] "Pontia_glauconome" "Potanthus_dara"
## [179] "Prosotas_nora" "Pseudochazara_baldiva"
## [181] "Pseudochazara_lehana" "Pseudophilotes_vicrama"
## [183] "Pyrgus_alpinus" "Pyrgus_cashmirensis"
## [185] "Rapala_iarbus" "Rapala_nissa"
## [187] "Rapala_selira" "Satyrium_deria"
## [189] "Satyrus_pimpla" "Sephisa_dichroa"
## [191] "Spialia_carnea" "Spialia_galba"
## [193] "Spialia_geron" "Spindasis_ictis"
## [195] "Spindasis_vulcanus" "Taractrocera_maevius"
## [197] "Tarucus_alteratus" "Tarucus_balkanicus"
## [199] "Tarucus_callinara" "Tarucus_nara"
## [201] "Tarucus_theophrastus" "Tarucus_venosus"
## [203] "Tirumala_limniace" "Vanessa_cardui"
## [205] "Vanessa_indica" "Ypthima_asterope"
## [207] "Ypthima_avanta" "Ypthima_nareda"
## [209] "Ypthima_sakra" "Zizeeria_karsandra"
## [211] "Zizeeria_maha" "Zizina_otis"
Now the Data has the structure we wanted and we can start to fit a model.
To create a species richness map we will need three steps:
1. Use a GAM to predict the distribution of one butterfly-species.
2. Use for-loops the apply the functions used in 1. to predict the distribution of all species in our list.
3 Sum up all distributions to a species richness map
To predict the distribution of Aglais caschmirensis we need identify all cells where the species has occurrences. We use the function ddply()
from the plyr package. All cells with equivalent values in all columns specified inside the brackets are summarized to one cell with a new column presence having the value 1.
<- species_list[["Aglais_caschmirensis"]]
test
<- ddply(test, .(x,y,cells,Precipitation_of_Driest_Month,Precipitation_Seasonality,
test
Precipitation_of_Warmest_Quarter,Mean_Diurnal_Range,
Temperature_Seasonality,Mean_Temperature_of_Wettest_Quarter,
Mean_Temperature_of_Driest_Quarter,People_Density),presence = 1)
summarize,
head(test)
## x y cells Precipitation_of_Driest_Month
## 1 71.12500 33.79167 8040 15
## 2 71.62500 35.95833 2768 13
## 3 71.70833 35.79167 3175 14
## 4 71.70833 35.87500 2972 13
## 5 71.70833 35.95833 2769 13
## 6 71.79167 36.12500 2364 16
## Precipitation_Seasonality Precipitation_of_Warmest_Quarter Mean_Diurnal_Range
## 1 55.55941 206 12.13867
## 2 62.77824 70 10.12242
## 3 66.13657 58 11.45983
## 4 64.91814 59 11.02450
## 5 62.71591 65 10.41258
## 6 55.94557 84 10.03908
## Temperature_Seasonality Mean_Temperature_of_Wettest_Quarter
## 1 767.8447 11.4294996
## 2 890.4564 0.3888333
## 3 844.9498 8.4488335
## 4 868.7729 5.7328334
## 5 875.3101 2.3113332
## 6 850.5377 -3.5383334
## Mean_Temperature_of_Driest_Quarter People_Density presence
## 1 12.7331667 12254.709 1
## 2 3.5798333 2660.602 1
## 3 11.2356663 6340.192 1
## 4 8.8926668 3476.803 1
## 5 5.6725001 2975.141 1
## 6 0.2651667 1485.123 1
We save this step in a function to use it later in step 2.
<- function(data, species) {
counting
return(
ddply(data[[species]], .(x,y,cells,Precipitation_of_Driest_Month,Precipitation_Seasonality,
Precipitation_of_Warmest_Quarter,Mean_Diurnal_Range,
Temperature_Seasonality,Mean_Temperature_of_Wettest_Quarter,
Mean_Temperature_of_Driest_Quarter,People_Density),presence = 1)
summarize,
) }
We need absence points for our species so that the modell is able to evaluate weather a certain combination of predictors is good or bad. Our first step is to create a dataframe with all predictor values
<- raster::rasterToPoints(pred_pak$Precipitation_of_Driest_Month)
raster <- as.data.frame(raster)
raster <- raster[,c(1,2)]
raster coordinates(raster) <- c(1,2)
crs(raster) <- crs(pred_pak)
plot(pred_pak$Precipitation_of_Driest_Month, main = "New Dataframe")
plot(raster, add = TRUE, cex = 0.2, col = "black")
#assigning values
<- extract(x = pred_pak, y = raster, cellnumbers = TRUE, sp = TRUE)
raster_df <- as.data.frame(raster_df)
raster_df
<- c(raster_df$cells) cell_ids
Then we subset all cells/rows where is no occurrence of Aglais_caschmirensis
<- test$cells
cell_ids_spec <- intersect(cell_ids_spec, cell_ids)
cell_ids_spec
<- !(cell_ids %in% cell_ids_spec)
id <- cell_ids[id]
cell_ids_spec_abs
length(cell_ids_spec)
## [1] 50
length(cell_ids_spec_abs)
## [1] 11723
length(cell_ids)
## [1] 11773
Now we pick random points out of the subset
set.seed(123)
<- runif(length(cell_ids_spec)*10, min = 1, max = length(cell_ids_spec_abs))
i <- cell_ids_spec_abs[i]
cell_ids_absence
<- raster_df[raster_df$cells %in% cell_ids_absence, ] abs_points
Now we want to merge the two dataframes. For that we need to rearrange the absence-points dataframe
head(test)
## x y cells Precipitation_of_Driest_Month
## 1 71.12500 33.79167 8040 15
## 2 71.62500 35.95833 2768 13
## 3 71.70833 35.79167 3175 14
## 4 71.70833 35.87500 2972 13
## 5 71.70833 35.95833 2769 13
## 6 71.79167 36.12500 2364 16
## Precipitation_Seasonality Precipitation_of_Warmest_Quarter Mean_Diurnal_Range
## 1 55.55941 206 12.13867
## 2 62.77824 70 10.12242
## 3 66.13657 58 11.45983
## 4 64.91814 59 11.02450
## 5 62.71591 65 10.41258
## 6 55.94557 84 10.03908
## Temperature_Seasonality Mean_Temperature_of_Wettest_Quarter
## 1 767.8447 11.4294996
## 2 890.4564 0.3888333
## 3 844.9498 8.4488335
## 4 868.7729 5.7328334
## 5 875.3101 2.3113332
## 6 850.5377 -3.5383334
## Mean_Temperature_of_Driest_Quarter People_Density presence
## 1 12.7331667 12254.709 1
## 2 3.5798333 2660.602 1
## 3 11.2356663 6340.192 1
## 4 8.8926668 3476.803 1
## 5 5.6725001 2975.141 1
## 6 0.2651667 1485.123 1
head(abs_points)
## cells Precipitation_of_Driest_Month Precipitation_Seasonality
## 7 369 4 40.15966
## 10 372 4 41.03106
## 76 953 8 50.68524
## 99 976 6 35.71460
## 113 990 1 56.36630
## 125 1160 10 44.67263
## Precipitation_of_Warmest_Quarter Mean_Diurnal_Range Temperature_Seasonality
## 7 50 11.20733 961.3560
## 10 53 11.66400 973.4099
## 76 38 10.90267 952.2430
## 99 63 11.21775 963.1484
## 113 40 12.11900 974.1218
## 125 54 10.48300 923.5392
## Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 7 -6.7246666 -15.0544996
## 10 3.5699999 -17.7334995
## 76 0.3978333 4.0085001
## 99 -9.3888330 -17.4855003
## 113 6.0609999 -10.2709999
## 125 -2.6106668 0.6716668
## People_Density x y
## 7 147.88110 74.70833 36.95833
## 10 46.96605 74.95833 36.95833
## 76 1340.48291 72.62500 36.70833
## 99 116.48557 74.54167 36.70833
## 113 61.76914 75.70833 36.70833
## 125 675.12177 72.95833 36.62500
<- abs_points[,c(10,11,1,2:9)]
abs_points $presence <- 0
abs_points
head(abs_points)
## x y cells Precipitation_of_Driest_Month
## 7 74.70833 36.95833 369 4
## 10 74.95833 36.95833 372 4
## 76 72.62500 36.70833 953 8
## 99 74.54167 36.70833 976 6
## 113 75.70833 36.70833 990 1
## 125 72.95833 36.62500 1160 10
## Precipitation_Seasonality Precipitation_of_Warmest_Quarter
## 7 40.15966 50
## 10 41.03106 53
## 76 50.68524 38
## 99 35.71460 63
## 113 56.36630 40
## 125 44.67263 54
## Mean_Diurnal_Range Temperature_Seasonality
## 7 11.20733 961.3560
## 10 11.66400 973.4099
## 76 10.90267 952.2430
## 99 11.21775 963.1484
## 113 12.11900 974.1218
## 125 10.48300 923.5392
## Mean_Temperature_of_Wettest_Quarter Mean_Temperature_of_Driest_Quarter
## 7 -6.7246666 -15.0544996
## 10 3.5699999 -17.7334995
## 76 0.3978333 4.0085001
## 99 -9.3888330 -17.4855003
## 113 6.0609999 -10.2709999
## 125 -2.6106668 0.6716668
## People_Density presence
## 7 147.88110 0
## 10 46.96605 0
## 76 1340.48291 0
## 99 116.48557 0
## 113 61.76914 0
## 125 675.12177 0
<- test
test_points <- abs_points
abs_points_points
<- rbind(test, abs_points) test
Let’s have a look at our points
Now we can fit our first GAM
##
## Family: binomial
## Link function: logit
##
## Formula:
## presence ~ s(x, y, bs = "ds") + s(Mean_Temperature_of_Wettest_Quarter) +
## s(Precipitation_Seasonality) + s(Precipitation_of_Warmest_Quarter) +
## s(Mean_Diurnal_Range) + s(Temperature_Seasonality) + s(Mean_Temperature_of_Wettest_Quarter) +
## s(Mean_Temperature_of_Driest_Quarter)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -16.31 15.17 -1.075 0.282
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(x,y) 7.325 8.898 17.144 0.0318 *
## s(Mean_Temperature_of_Wettest_Quarter) 1.000 1.000 0.919 0.3377
## s(Precipitation_Seasonality) 3.485 3.968 9.071 0.0463 *
## s(Precipitation_of_Warmest_Quarter) 2.591 3.315 5.355 0.1707
## s(Mean_Diurnal_Range) 2.241 2.784 2.580 0.3343
## s(Temperature_Seasonality) 1.000 1.000 2.688 0.1011
## s(Mean_Temperature_of_Driest_Quarter) 2.801 3.479 3.362 0.4275
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.646 Deviance explained = 71.4%
## -REML = 63.113 Scale est. = 1 n = 538
Non of our eight predictors is significant for Aglais caschmirensis. Let’s have a look at the predicted distribution
<- c(predict.gam(test_gam, newdata = raster_df, type = "response"))
test_pred $test <- as.vector(test_pred)
raster
<- rasterize(raster, pred_pak)
pred_test plot(pred_test$test, main = "Distribution of Aglais caschmirensis")
Now we wrap all these Steps into a function and apply it to all species in out list.
We use the function created in 1. to loop over our list
for (species_i in species) {
<- counting(data = species_list, species = species_i)
species_list[[species_i]] }
<- function(species) {
spec_dist
<- species_list[[species]]
test
<- test$cells
cell_ids_spec <- intersect(cell_ids_spec, cell_ids)
cell_ids_spec
<- !(cell_ids %in% cell_ids_spec)
id <- cell_ids[id]
cell_ids_spec_abs
set.seed(123)
<- runif(length(cell_ids_spec)*10, min = 1, max = length(cell_ids_spec_abs))
i <- cell_ids_spec_abs[i]
cell_ids_absence
<- raster_df[raster_df$cells %in% cell_ids_absence, ]
abs_points
<- abs_points[,c(10,11,1,2:9)]
abs_points $presence <- 0
abs_points
<- test
test_points <- abs_points
abs_points_points
<- rbind(test, abs_points)
test
<- gam(presence ~ s(x, y, bs = "ds") + s(Mean_Temperature_of_Wettest_Quarter) + s(Precipitation_Seasonality) +
test_gam s(Precipitation_of_Warmest_Quarter) + s(Mean_Diurnal_Range) + s(Temperature_Seasonality) +
s(Mean_Temperature_of_Wettest_Quarter) + s(Mean_Temperature_of_Driest_Quarter),
method = "REML", family = binomial, data = test)
#prediction
<- c(predict.gam(test_gam, newdata = raster_df, type = "response"))
test_pred $species <- as.vector(test_pred)
rasterreturn(raster)
}
Before we are able to calculate out species richness map we need to
for (species_i in names(species_list)) {
<- rasterize(species_list[[species_i]], pred_pak)$species
species <- species
species_list[[species_i]] }
Now we convert the list into a rasterstack and sum up all occurrences to finally get our richness map
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K. & A.G. Toxopeus (2014): Where is positional uncertainty a problem for species distribution modelling?. Ecography 37 (2): 191-203.
Ross, N. (2017): Nonlinear Models in R: The Wonderful World of mgcv. https://nyhackr.blob.core.windows.net/presentations/Nonlinear-Modeling-in-R-with-GAMs_Noam-Ross.pdf (Zugriff: 10.06.2021)
Simpson, G.(2020): Introduction to Generalized Additive Models with R and mgcv. https://github.com/gavinsimpson/intro-gam-webinar-2020/blob/master/README.md (Zugriff: 10.06.2021)
Wood, S.N., Pya N. & B. Saefken (2016): Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575.
Wood, S.N. (2006): Generalized additive models : an introduction with R. Texts in statistical science.
WorldClim: Fick, S.E. & R.J. Hijmans (2017): WorldClim 2: new 1km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37 (12). 4302-4315.
People Density: WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] usdm_1.1-18 gratia_0.6.0 mgcv_1.8-35 nlme_3.1-152
## [5] rgdal_1.5-23 RColorBrewer_1.1-2 viridis_0.6.1 viridisLite_0.4.0
## [9] ggplot2_3.3.3 plyr_1.8.6 raster_3.4-10 sp_1.4-5
##
## loaded via a namespace (and not attached):
## [1] prettydoc_0.4.1 tidyselect_1.1.1 xfun_0.23 bslib_0.2.5.1
## [5] purrr_0.3.4 splines_4.1.0 lattice_0.20-44 colorspace_2.0-1
## [9] vctrs_0.3.8 generics_0.1.0 htmltools_0.5.1.1 yaml_2.2.1
## [13] utf8_1.2.1 rlang_0.4.11 jquerylib_0.1.4 pillar_1.6.1
## [17] glue_1.4.2 withr_2.4.2 lifecycle_1.0.0 stringr_1.4.0
## [21] munsell_0.5.0 gtable_0.3.0 mvnfast_0.2.7 codetools_0.2-18
## [25] evaluate_0.14 knitr_1.33 parallel_4.1.0 fansi_0.5.0
## [29] highr_0.9 Rcpp_1.0.6 scales_1.1.1 jsonlite_1.7.2
## [33] gridExtra_2.3 png_0.1-7 digest_0.6.27 stringi_1.6.1
## [37] dplyr_1.0.6 grid_4.1.0 tools_4.1.0 magrittr_2.0.1
## [41] sass_0.4.0 patchwork_1.1.1 tibble_3.1.2 tidyr_1.1.3
## [45] crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.2 Matrix_1.3-3
## [49] rmarkdown_2.8 R6_2.5.0 compiler_4.1.0