Treeful Docs

Author

Christoph Trost

Published

June 20, 2023

About this Book

This book is the documentation for Treeful, an R Shiny application that helps people plant trees based on species distribution modelling.

This project was funded by the German Federal Ministry of Education and Research from March 1st, 2023 to September 1st, 2023.

Who is behind this

Two data scientists as part of the Prototype Fund:

About this Project and Technical Approach

The purpose is to build a shiny app that allows users to explore habitat shapes of trees in Europe. These habitat shapes are plots of existing trees from a merged database of close to 9 million trees. We extracted the climatic variables present at each tree location. These shapes allow users to compare their own location, a potential planting site, against a species’ habitat, for both past and future climates.

Here’s an example of what we mean by tree suitability: the scatter points are tree occurrences, with points from one location in the Harz mountains at different points in time, including a projection for 2070. That place has already fallen out of the habitat of Picea abies.

Infrastructure

The docker-compose file outlines the containers and is a good place to start.

Code
version: '3'
services:
  db:
      build: "1_ETL/4_postgres/."
      image: "127.0.0.1:5000/treeful_db:0.0.2"
      ports:
        - "5432:5432"
      secrets:
        - postgres_pw
      environment:
        POSTGRES_DB: treeful-test
        POSTGRES_PW_FILE: /run/secrets/postgres_pw
        PGDATA: /var/lib/postgresql/data/pgdata
      volumes:
        -
          type: bind
          source: ./1_ETL/4_postgres
          target: /var/lib/postgresql/data
      deploy:
        resources:
            limits:
              cpus: '4'
              memory: 8096M
            reservations:
              cpus: '2'
              memory: 1024M
  frontend:
    build:
      context: "./3_shiny_frontend/"
      secrets:
          - postgres_pw
          - postgres_host
    image: "127.0.0.1:5000/treeful_frontend:0.0.3"
    volumes:
      - ./1_ETL/2_Data/1_output/tree_profiles:/srv/shiny-server/inst/app/tree_profiles
      - ./1_ETL/2_Data/0_raw_data:/srv/shiny-server/data
    ports:
      - "8080:3838"
    secrets:
      - postgres_pw
      - postgres_host
    environment:
      POSTGRES_DB: treeful-test
      POSTGRES_PW_FILE: /run/secrets/postgres_pw
      POSTGRES_HOST_FILE: /run/secrets/postgres_host
    deploy:
      resources:
          limits:
            cpus: '2'
            memory: 4096M
          reservations:
            cpus: '1'
            memory: 1024M
      # restart_policy:
      #   condition: on-failure
      #   delay: 5s
      #   max_attempts: 3
      #   window: 120s

  # etl:
  #   # building container to fetch tree dbs and raster data and stick into postgres
  #   build: 1_ETL/
  #   image: treeful_etl
  #   volumes:
  #     - .:/home/rstudio
  #   secrets:
  #     - postgres_pw
  #     - postgres_host
  #     - copernicus_key
  #     - copernicus_uid
  #     - gbif_email
  #     - gbif_pw
  #     - gbif_uid
  #     - keyring_pw
secrets:
      postgres_pw:
        file: ./1_ETL/0_secrets/postgres_pw.txt
      postgres_host:
        file: ./1_ETL/0_secrets/postgres_host.txt
      copernicus_key:
        file: ./1_ETL/0_secrets/copernicus_key.txt
      copernicus_uid:
        file: ./1_ETL/0_secrets/copernicus_uid.txt
      gbif_email:
        file: ./1_ETL/0_secrets/gbif_email.txt
      gbif_pw:
        file: ./1_ETL/0_secrets/gbif_pw.txt
      gbif_uid:
        file: ./1_ETL/0_secrets/gbif_uid.txt
      keyring_pw:
        file: ./1_ETL/0_secrets/keyring_pw.txt

ETL Container

This container runs through the main script: fetching tree location data, getting bioclimatic variables, extracting those at each tree location, and writing everything to PostGIS.
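As a rough sketch, the main script is a sequence of sourced steps (the file names here are illustrative, not the actual ones in the repo):

# hypothetical outline of the ETL main script
source("get_external_datasets.R")      # EU Forest download, Copernicus CDS requests
source("merge_tree_cadastres.R")       # harmonize city tree cadastres
source("match_names_and_fetch_gbif.R") # fuzzy name matching, GBIF download
source("extract_bioclim_to_postgis.R") # terra::extract() and write to PostGIS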

Postgis Database Container

All processed data from the ETL pipeline ends up here in a few tables. We chose not to write rasters to PostGIS.

Shiny Container

The actual app runs in a Docker swarm and connects to the PostGIS database.
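The helper backend_con() used throughout this pipeline is not shown in full here. A minimal sketch, assuming the secrets and environment variables from the docker-compose above (the user name and helper internals are illustrative):

# sketch of a connection helper reading Docker secrets
backend_con <- function() {
  RPostgres::dbConnect(
    RPostgres::Postgres(),
    dbname   = Sys.getenv("POSTGRES_DB"),
    host     = readLines(Sys.getenv("POSTGRES_HOST_FILE")),
    user     = "postgres", # assumption
    password = readLines(Sys.getenv("POSTGRES_PW_FILE"))
  )
}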

Treeful ETL Pipeline: Obtaining and transforming tree occurrence data

This section documents how tree occurrence data is obtained. At the end, we will have our tree locations with the corresponding bioclimatic variables in a PostGIS database.

Get Climate Projection data & other external datasets

First, we obtain data not included in this repository, namely EU Forest and Copernicus CDS raster files. These downloads will take a while. (Some datasets are neither downloaded nor included in this repo; they have been shared with us privately.)

We have used Copernicus CMIP5 regional projections, with the RCP4.5 and RCP8.5 experiments from the NorESM1-M (NCC, Norway) model. The datasets can be built and obtained here. Our code contains a short helper script for the automated download (set the environment variables for your CDS login beforehand).

Code
#################### Trees4F from figshare ###################
if (!file.exists("2_Data/0_raw_data/EUforestspecies_AMauri.csv")) {
  download.file("https://springernature.figshare.com/ndownloader/files/6662535", destfile = "2_Data/0_raw_data/EUforestspecies_AMauri.csv")
}

#################### COPERNICUS CDS Download ###################
# REGISTER YOURSELF AND ENTER DETAILS FROM HERE:
# https://cds.climate.copernicus.eu/user/register
# Beware: the user is the UID at the bottom of your user page.

library(ecmwfr) # CDS API client used for the download requests

uid <- Sys.getenv("COPERNICUS_UID")

wf_set_key(user = uid,
           key = Sys.getenv("COPERNICUS_KEY"),
           service = "cds")

request <- list(
  region = "europe",
  variable = c("annual_mean_temperature", "annual_precipitation", "isothermality", "maximum_temperature_of_warmest_month", 
               "mean_diurnal_range", "mean_temperature_of_coldest_quarter", "mean_temperature_of_driest_quarter", "mean_temperature_of_warmest_quarter", 
               "mean_temperature_of_wettest_quarter", "minimum_temperature_of_coldest_month", "precipitation_of_coldest_quarter", "precipitation_of_driest_month", 
               "precipitation_of_driest_quarter", "precipitation_of_warmest_quarter", "precipitation_of_wettest_month",
               "precipitation_of_wettest_quarter", "precipitation_seasonality", "temperature_annual_range", "temperature_seasonality", "volumetric_soil_water", "aridity"),
  derived_variable = "annual_mean",
  model = "noresm1_m",
  ensemble_member = "r1i1p1",
  experiment = c("rcp4_5", "rcp8_5"),
  statistic = "mean",
  version = "1.0",
  format = "zip",
  dataset_short_name = "sis-biodiversity-cmip5-regional",
  target = "download.zip"
)

ncfile <- wf_request(
  user = uid,
  request = request,   
  transfer = TRUE,  
  path = "2_Data/0_raw_data/future",
  verbose = FALSE
)


request <- list(
  region = "europe",
  origin = "era5",
  variable = c("annual_mean_temperature", "annual_precipitation", "isothermality", "maximum_temperature_of_warmest_month", "mean_diurnal_range", "mean_temperature_of_coldest_quarter", "mean_temperature_of_driest_quarter", "mean_temperature_of_warmest_quarter", "mean_temperature_of_wettest_quarter", "minimum_temperature_of_coldest_month", "precipitation_of_coldest_quarter", "precipitation_of_driest_month", "precipitation_of_driest_quarter", "precipitation_of_warmest_quarter", "precipitation_of_wettest_month", 
               "precipitation_of_wettest_quarter", "precipitation_seasonality", "temperature_annual_range", "temperature_seasonality", "volumetric_soil_water"),
  derived_variable = "annual_mean",
  statistic = "mean",
  version = "1.0",
  format = "zip",
  dataset_short_name = "sis-biodiversity-era5-regional",
  target = "download.zip"
)

ncfile <- wf_request(
  user = uid,
  request = request,   
  transfer = TRUE,  
  path = "2_Data/0_raw_data/past",
  verbose = FALSE
)


if (file.exists("2_Data/0_raw_data/past/download.zip")) {
  utils::unzip("2_Data/0_raw_data/past/download.zip", exdir = "2_Data/0_raw_data/past/")
  file.remove("2_Data/0_raw_data/past/download.zip")
}
if (file.exists("2_Data/0_raw_data/future/download.zip")) {
  utils::unzip("2_Data/0_raw_data/future/download.zip", exdir = "2_Data/0_raw_data/future/")
  file.remove("2_Data/0_raw_data/future/download.zip")
}

Getting Tree Location Data

To produce reliable climate envelopes, we would like to obtain as many trees as possible from the various biogeographical regions of Europe. This section details how we process them and produce a database with these occurrences:

Tree occurrences in Treeful per source database

Note

We are not doing any sampling of tree occurrences. This means that small areas with many tree records will produce a higher density in the climate envelopes. In the current state of our app and plots, this density is directly visible to the user.

Academic data sources

We used the TRY plant trait database and EU Forest. Both contain tree locations and botanical names, and both are academic, high-quality datasets with extensive data cleaning in place. We used EU Forest to generate our master list of botanical names; only species on this list are used downstream. A simple fuzzy matching ensures that Sorbus Torminalis and Sorbus-torminalis are both matched as Sorbus torminalis (we used a conservative string distance of 1).

Code
############ Load Packages ############
if (!require(librarian)) install.packages("librarian")
library(librarian)
shelf(data.table, stringr, stringdist, fuzzyjoin, dplyr, janitor, tibble, rgbif)

############ Define Function for fuzzy matching ############ 
flatten_names <- function(tree_name) {
  tree_name %>%
    str_replace_all(" ", "") %>%
    str_replace_all("\\.", "") %>%
    str_replace_all("\\-", "") %>%
    str_replace_all("\\'", "") %>%
    tolower()
}


get_fuzzy_plant_names <- function(plant_database, master_list, max_string_dist = 3) {
  # Prepare for matching: extract unique names
  database_names <- plant_database[, .(latin_name = unique(latin_name))]

  # Flatten names
  master_list[, flat_name := flatten_names(latin_name)]
  database_names[, flat_name := flatten_names(latin_name)]

  # Match
  stringdist <- as.data.table(
    stringdist_join(database_names, master_list,
                    by = c("flat_name" = "flat_name"),
                    mode = "inner",
                    method = "dl",
                    max_dist = max_string_dist,
                    distance_col = "dist"))
  # Clean matching
  stringdist <- stringdist[, .(data_set_name = latin_name.x, master_list_name = latin_name.y, distance = dist)]
  stringdist <- setorder(stringdist, -distance)
  return(stringdist)
  # As an example of possible other matching functions:
  # amatch <- amatch(database_names[, flat_name], master_list[, flat_name], maxDist = 0.1)
}
############ MAIN: Create master list of botanical names ############ 
# Load datasets
# Use trees4EU as master list, more coverage
master_list <- data.table::fread("2_Data/0_raw_data/EUforestspecies_AMauri.csv") %>% 
  janitor::clean_names() %>% 
  dplyr::select(latin_name = species_name) %>% 
  distinct()
master_list <- na.omit(master_list)

############ OPEN TREES DATA BASE ############ 
# Load and prepare database

# Load database
open_trees_db <- fread("2_Data/1_output/all_merged.csv")
setnames(open_trees_db,"species","latin_name")
open_trees_db <- na.omit(open_trees_db)

# Match db names
max_string_dist = 1 
open_trees_matching_table<- get_fuzzy_plant_names(open_trees_db, master_list, max_string_dist)
open_trees_matching_table_selection <- open_trees_matching_table[distance<=1,]
# Filter database
open_trees_db_selection <- open_trees_db[open_trees_matching_table_selection[,.(data_set_name,master_list_name)], on = c(latin_name = "data_set_name"), nomatch = NULL]
# open_trees_db_selection2 <- open_trees_db[latin_name %in% open_trees_matching_table_selection[,data_set_name]]

############ TRY DATA BASE ############ 
# Load database
if (file.exists("2_Data/0_raw_data/tree_georef_3.txt")) {
  try_trees <- data.table::fread("2_Data/0_raw_data/tree_georef_3.txt") %>% 
    janitor::clean_names()
  
  setnames(try_trees,"acc_species_name","latin_name")
  try_trees <- na.omit(try_trees)
  
  
  # Match db names
  max_string_dist = 1 
  try_trees_matching_table<- get_fuzzy_plant_names(try_trees, master_list, max_string_dist)
  try_trees_matching_table_selection <- try_trees_matching_table[distance<=1,]
  # Filter database
  try_trees_selection <- try_trees[try_trees_matching_table_selection[,.(data_set_name,master_list_name)], on = c(latin_name = "data_set_name"), nomatch = NULL]
  # open_trees_db_selection2 <- open_trees_db[latin_name %in% open_trees_matching_table_selection[,data_set_name]]
  
  
} else {
  try_trees_selection <- tibble(master_list_name = character(), 
                                tree_georef_1_std_value = numeric(),
                                obs_data_std_value = numeric())
}


############ Trees4F DATA BASE ############ 
# Load and prepare database

# Load database
trees4f_db <- data.table::fread("2_Data/0_raw_data/EUforestspecies_AMauri.csv") %>% 
  janitor::clean_names()

setnames(trees4f_db,"species_name","latin_name")
trees4f_db <- na.omit(trees4f_db)

# Match db names
max_string_dist = 3 
trees4f_db_matching_table<- get_fuzzy_plant_names(trees4f_db, master_list, max_string_dist)
trees4f_db_matching_table_selection <- trees4f_db_matching_table[distance<=1,]
# Filter database
trees4f_db_selection <- trees4f_db[trees4f_db_matching_table_selection[,.(data_set_name,master_list_name)], on = c(latin_name = "data_set_name"), nomatch = NULL]
# open_trees_db_selection2 <- open_trees_db[latin_name %in% open_trees_matching_table_selection[,data_set_name]]




# enhance master list with GBIF taxo IDs and write master list to file
tree_master_list <- master_list %>% 
  mutate(gbif_taxo_id = name_backbone_checklist(name=.$latin_name)$usageKey) %>% 
  # remove  unmatched or genus level taxo matches
  filter(str_length(gbif_taxo_id) > 5 & !is.na(gbif_taxo_id))

fwrite(tree_master_list,"2_Data/1_output/eu_native_trees_master.csv")
tree_master_list <- fread("2_Data/1_output/eu_native_trees_master.csv") 


rm(open_trees_db, open_trees_matching_table, open_trees_matching_table_selection, try_trees, try_trees_matching_table, 
   try_trees_matching_table_selection, trees4f_db, trees4f_db_matching_table, trees4f_db_matching_table_selection
   )

#EOF

We finish this file with rgbif::name_backbone_checklist(), which obtains the GBIF taxon ID for each botanical name. These IDs become part of our master data list.

European Tree Cadastres

We wanted to include the proliferating corpus of tree cadastres of European cities, and collected two dozen or so data sources into this file. Since they are all unharmonized, extensive data cleaning happens here (we did not deal with CRS transformations, but only kept datasets already in EPSG:4326):

Code
#############
# This script loads various tree location datasets and merges them into one.
# Goal: have everything in one db with harmonized botanical species names and X/Y coordinates in EPSG:4326

shelf(readxl, janitor, data.table, stringr, readr, jsonlite, dplyr, tidyr, tibble, sf)

tree_dbs <- read_xlsx("2_Data/0_raw_data/opendata_trees.xlsx") %>% 
  janitor::clean_names() %>% 
  filter(suitable == "y" & epsg == "4326") %>% 
  mutate(location = janitor::make_clean_names(location)) %>% 
  mutate(botanical_col = tolower(botanical_col)) 
  # mutate(lon_col = janitor::make_clean_names(lon_col)) %>% 
  # mutate(lat_col = janitor::make_clean_names(lat_col)) 


tree_dbs <- data.table(tree_dbs)
for (i in tree_dbs[,file_name]) {
  utils::unzip(zipfile=paste("2_Data/0_raw_data/tree_cadastres/zip/",i,".zip",sep=""))
}

for (i in 1:nrow(tree_dbs)) {

  ifelse(str_detect(tree_dbs$file_name[i], "csv"),
         assign(tolower(tree_dbs$location[i]),
                janitor::clean_names(read_delim(paste0("2_Data/0_raw_data/tree_cadastres/", tree_dbs$file_name[i])))),
         assign(tolower(tree_dbs$location[i]),
                janitor::clean_names(bind_cols(jsonlite::read_json(paste0("2_Data/0_raw_data/tree_cadastres/", tree_dbs$file_name[i]), simplifyVector = TRUE)$features$properties,
                                     jsonlite::read_json(paste0("2_Data/0_raw_data/tree_cadastres/", tree_dbs$file_name[i]), simplifyVector = TRUE)$features$geometry)
                                     )
                )
  )
  print(i)
}



# # Compress all the csv files with gzip
# tree_dbs <- data.table(tree_dbs)
# for (i in tree_dbs[,file_name]) {
#      db <- fread(paste("2_Data/0_raw_data/tree_cadastres/",i,sep=""))
#      fwrite(db,paste("2_Data/0_raw_data/tree_cadastres/",i,".gz",sep=""),compress="gzip")
# }

# tree_dbs[,file_name:=paste(file_name,".gz",sep="")]
# write.xlsx2(tree_dbs, "2_Data/0_raw_data/opendata_trees.xlsx", sheetName = "Sheet1", 
#             col.names = TRUE, row.names = TRUE, append = FALSE)

# 
# tree_dbs <- data.table(tree_dbs)
# for (i in tree_dbs[,location]) {
#     print(i)
#     assign(tolower(i), paste(fread(paste("2_Data/0_raw_data/tree_cadastres/",tree_dbs[location==i,file_name],sep=""))))
# }
              
# too small data to automate. manually merge mutate
#split_species <- filter(tree_dbs, str_detect(botanical_col, "\\+"))
#mutate(tourcoing, merged_species = paste0(genre, " ", espece)) %>% write_csv(file = "data/opentrees/tourcoing.csv")

merged_geo <- filter(tree_dbs, lon_col == lat_col)
sep_geo <- filter(tree_dbs, lon_col != lat_col)


#######
# unmerge merged geographies
for (i in 1:nrow(merged_geo)) {
  temp_db <- get(tolower(merged_geo$location[i]), envir = globalenv()) %>% 
    dplyr::select(matches(paste0(merged_geo$botanical_col[i], "|", merged_geo$lon_col[i]), ignore.case = TRUE)) 
  
  temp_db <- temp_db %>% 
    separate(merged_geo$lon_col[i], into = c("x", "y"), sep = ",") %>% 
    mutate(x = str_remove(x, "c\\("), y = str_remove(y, "\\)")) %>% 
    rename(species = merged_geo$botanical_col[i])
  
  assign(tolower(merged_geo$location[i]), temp_db)
  print(i)
  print(merged_geo$location[i])
}

# treat regular lat lon sources
for (i in 1:nrow(sep_geo)) {
  temp_db <- get(tolower(sep_geo$location[i]), envir = globalenv()) %>% 
    dplyr::select(matches(paste0(sep_geo$botanical_col[i], "|", sep_geo$lon_col[i], "|", sep_geo$lat_col[i]), ignore.case = TRUE))
  
  temp_db <- temp_db %>% 
    rename(species = sep_geo$botanical_col[i], 
           x = sep_geo$lat_col[i],
           y = sep_geo$lon_col[i]
           )
  
  assign(tolower(sep_geo$location[i]), temp_db)
  print(i)
  print(sep_geo$location[i])
}

#bring it all together into one df
all_trees <- tibble()

for (i in 1:nrow(tree_dbs)) {
  temp_db <- get(tolower(tree_dbs$location[i]), envir = globalenv()) %>% 
    dplyr::select(x, y, species)
  
  temp_db <- temp_db %>% 
    mutate(x = as.numeric(x)) %>% 
    mutate(y = as.numeric(y)) %>% 
    mutate(db_city_origin = tree_dbs$location[i])
  
  assign(tolower(tree_dbs$location[i]), temp_db)
  all_trees <- bind_rows(all_trees, temp_db)
  print(i)
  print(tree_dbs$location[i])
}

all_trees_centroids <- all_trees %>% 
  st_as_sf(coords = c("y", "x"), na.fail = F) %>% 
  group_by(db_city_origin) %>% 
  summarise(st_union(geometry)) %>%
  st_centroid() 

#### save it all on disk, 4 million trees
write_csv(all_trees, "2_Data/1_output/all_merged.csv")


#"specie|especi|nom_lat|latboomsoort|espece|
#nom_cientific|wetenschappelijke_naam|classe|dendro_taxon_tid|
#  nome_scien|traeart|SPECIE|sortiment|essence_scient|gattung_lat|species|gattung|geo_point"

# all tree lists saved as csv in the data folder. Why csv?
# Because the majority of open-data portal tree lists were csv, so it is the least cumbersome conversion.
# csv also means fewer characters, and therefore less data, when things get very large.

This writes a file with only the botanical name, X, Y and source database for the next step.

GBIF

The largest share of our tree data comes from GBIF. We simply query a bounding box of Europe for all species from our master list, which is already enriched with GBIF taxon IDs. You can regenerate the dataset with the most recently added tree occurrences using the snippet below (GBIF takes a while to prepare the dataset, and the environment variables GBIF_EMAIL, GBIF_PWD and GBIF_USER need to be configured):

gbif_download <- occ_download(
  pred_in("taxonKey", tree_master_list$gbif_taxo_id),
  # pred("taxonKey", 5284884),
  # this is the bounding box of Europe
  pred_within("POLYGON((-15 75,-15 30,40 30,40 75,-15 75))"),
  pred_lt("coordinateUncertaintyInMeters", 1000), # downstream processing needs 1 km accuracy
  pred("hasCoordinate", TRUE),
  pred("hasGeospatialIssue", FALSE), # remove GBIF default geospatial issues
  pred("occurrenceStatus", "PRESENT"),
  pred_gte("year", 1960), # only keep trees seen after 1960
  format = "SIMPLE_CSV")

occ_download_wait(gbif_download)
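Once GBIF has prepared the download, it can be fetched and imported into the session; a minimal sketch (the target path is an assumption):

zip_file <- occ_download_get(gbif_download, path = "2_Data/0_raw_data", overwrite = TRUE)
gbif_trees <- occ_download_import(zip_file)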

Getting Bioclimatic Variables

We tested CHELSA, WorldClim and Copernicus CDS as sources of bioclimatic variables. We prefer Copernicus for its extensive documentation and the convenient time frame of its past bioclimatic data, 1979-2018, which is where we expect the largest overlap with the life spans of the trees in our occurrence dataset.

Our functions getpastclimate() and getfutureclimate() could in theory switch between bioclimatic data sources, but only the Copernicus path is fully implemented. These functions read our Copernicus raster files.

Reading Climate Rasters into R

In order to extract bioclimatic variables from each species location, we’ll have to load raster files into R. The function described here will solve a few complexities:

  • We convert raster values into units fit for analysis.
  • Stacked rasters need to be treated accordingly.
  • Efficiency matters greatly here, especially when it comes to RAM-efficient raster reading.

One thing that could, but probably should not, be done here: CRS reprojection. Sometimes a source arrives in a different CRS; reproject it outside of your ETL pipeline and then read the result.
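A one-off reprojection outside the pipeline might look like this (file names are hypothetical):

library(terra)
r <- terra::rast("2_Data/0_raw_data/raster_in_other_crs.tif") # hypothetical input
r_4326 <- terra::project(r, "EPSG:4326")
terra::writeRaster(r_4326, "2_Data/0_raw_data/raster_4326.tif")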

Code
##################### Function to get climate rasters ##################
# In theory these can be adjusted to other climate raster providers. For some, snippets already exist in the
# functions but will probably not work in the entire Treeful workflow. All processing here is only tested with Copernicus.
# You pass a provider and a bioclimatic variable like bio01 or bio13 to the function and it returns one raster; you can stack rasters later on.


getpastclimate <- function(source = "copernicus", bioclim = "bio01") {
  if (source == "climateeu") {
    bio_path <- case_when(bioclim == "bio01" ~ "MAT",
                          bioclim == "bio12" ~ "MAP")
    
    bio_raster <- raster(paste0("data/climateEU/Normal_1961-1990_bioclim/", bio_path, ".asc"))

    raster::crs(bio_raster) <- "+proj=aea +lat_0=30 +lon_0=10 +lat_1=43 +lat_2=62 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs +type=crs"
    
    bio_raster <- raster::projectRaster(bio_raster, crs = 4326)
    
  }
  
  if (source == "copernicus") {
    # Get bioclimate data from copernicus. Download bioclimate file with login at https://cds.climate.copernicus.eu/
    
    bio_path <- toupper(bioclim)

    bio_raster <- terra::rast(paste0("2_Data/0_raw_data/past/", bio_path, "_era5-to-1km_1979-2018-mean_v1.0.nc"))
    # convert bioclim values as per the Copernicus documentation. case_when() does not work on raster objects, so we use if/else.
    if (bioclim %in% c("bio01", "bio05", "bio06", "bio08", "bio09", "bio10", "bio11")) 
    {bio_raster <- bio_raster - 273.15
    } else if (bioclim == "bio12") {bio_raster <- bio_raster*3600*24*365*1000
    } else if (bioclim %in% c("bio13", "bio14")) {bio_raster <- bio_raster*3600*24*30.5*1000
    } else if (bioclim %in% c("bio16", "bio17", "bio18", "bio19")) {bio_raster <- bio_raster*3600*24*91.3*1000
    }
    # a bit unclear whether bio13-bio19 can and should also be converted like bio12; probably not, as they are not on an annual reference period

    
  }
  # worldclim would be great because it fetches all 19 bioclimatic variables at once and has a good time range (1970-2000),
  # but the units of each variable are strange.

  if (source == "worldclim") {
    
    # worldclim file names use unpadded numbers: bio01 -> bio_1, ..., bio19 -> bio_19
    bio_path <- str_replace(bioclim, "^bio0?", "bio_")
    
    bio_raster <- raster(paste0("2_Data/1_output/worldclim_cropped/wc2.1_30s_", bio_path, ".tif"))
    

  }
  
  if (source == "chelsa") {
    
    bio_path <- case_when(bioclim == "bio01" ~ "bio1",
                          bioclim == "bio12" ~ "bio12"
    )
    
    bio_raster <- raster(paste0("2_Data/1_output/CHELSA_cropped/CHELSA_", bio_path, "_1981-2010_V.2.1.tif"))
    
    if (bioclim == "bio01") {bio_raster <- raster::calc(bio_raster, function(x) { x / 10 - 273.15 })
    } else if (bioclim == "bio12") {
      bio_raster <- raster::calc(bio_raster, function(x) { x / 10})
    }
    
    
  }
  
  
  return(bio_raster)
}


getsoilproperties <- function(variable = "STU_EU_DEPTH_ROOTS") {
  soil_layer <- terra::rast(paste0("2_Data/0_raw_data/soil/", variable, "_4326.tif"))
  return(soil_layer)
}

Merging Tree Locations and Bioclimatic Variables

We now have millions of tree locations and rasters of past bioclimatic variables. We extract each bioclimatic variable for each tree location. For that, we first merge all tree data sources into a large simple feature object.

Code
#################### Load DBs, merge, turn into SF, extract bioclim #############################

if (!file.exists("2_Data/1_output/tree_db.csv")) {
  
  ######################## turn name-matched data.tables into SF ###################
  trees4f_sf <- trees4f_db_selection %>% 
    sf::st_as_sf(coords = c("x", "y"), crs = 3035) %>% 
    sf::st_transform(crs = 4326) %>% 
    mutate(db = "trees4f")
  
  open_trees_sf <- open_trees_db_selection %>% 
    filter(!is.na(x) & !is.na(y)) %>% 
    st_as_sf(coords = c("y", "x"), crs = 4326) %>% 
    mutate(db = "cadastres")
  
  try_trees_sf <- try_trees_selection %>% 
    filter(!is.na(tree_georef_1_std_value) & !is.na(obs_data_std_value)) %>% 
    st_as_sf(coords = c("obs_data_std_value", "tree_georef_1_std_value"), crs = 4326) %>% 
    mutate(db = "try")
  
  gbif_trees_sf <- gbif_trees %>% 
    dplyr::select(taxonkey, x = decimallatitude, y = decimallongitude) %>% 
    left_join(dplyr::select(tree_master_list, master_list_name = latin_name, gbif_taxo_id), by = c("taxonkey" = "gbif_taxo_id")) %>% 
    filter(!is.na(x) & !is.na(y) & !is.na(master_list_name)) %>% 
    st_as_sf(coords = c("y", "x"), crs = 4326) %>% 
    mutate(db = "gbif")
  
  sendstatus("got all dbs and turned them into sf")
  rm(gbif_trees, trees4f_db_selection, open_trees_db_selection, try_trees_selection)
  ######################## bind all three sources into one ####################################
  
  tree_dbs <- rbind(
    as.data.table(trees4f_sf)[, .(master_list_name, db, geometry)],
    as.data.table(open_trees_sf)[, .(master_list_name, db, geometry)],
    as.data.table(try_trees_sf)[, .(master_list_name, db, geometry)],
    as.data.table(gbif_trees_sf)[, .(master_list_name, db, geometry)]
  )
  
  rm(gbif_trees_sf, open_trees_sf, try_trees_sf, trees4f_sf, try_species)
  gc()
  
  ######################### The heart of it all: getting bioclimatic vars for each tree ##########
  sendstatus(paste0("Cutting down tree db to species with n>1500, currently at ", nrow(tree_dbs)))
  
  tree_count <- tree_dbs %>% 
    group_by(master_list_name) %>% 
    summarise(n=n()) %>% 
    filter(n>1500)
  
  tree_dbs <- tree_dbs %>% 
    filter(master_list_name %in% tree_count$master_list_name)
  gc()
  
  sendstatus(paste0("Cut small species occurrences. Starting extraction for ", nrow(tree_dbs), " tree occurrences"))
  
  tree_dbs <- tree_dbs %>% 
    st_as_sf(crs = 4326)
  
  bioclim_stack <- c(
    getpastclimate(source = "copernicus", bioclim = "bio01"),
    getpastclimate(source = "copernicus", bioclim = "bio02"),
    getpastclimate(source = "copernicus", bioclim = "bio03"),
    getpastclimate(source = "copernicus", bioclim = "bio04"),
    getpastclimate(source = "copernicus", bioclim = "bio05"),
    getpastclimate(source = "copernicus", bioclim = "bio06"),
    getpastclimate(source = "copernicus", bioclim = "bio07"),
    getpastclimate(source = "copernicus", bioclim = "bio08"),
    getpastclimate(source = "copernicus", bioclim = "bio09"),
    getpastclimate(source = "copernicus", bioclim = "bio10"),
    getpastclimate(source = "copernicus", bioclim = "bio11"),
    getpastclimate(source = "copernicus", bioclim = "bio12"),
    getpastclimate(source = "copernicus", bioclim = "bio13"),
    getpastclimate(source = "copernicus", bioclim = "bio14"),
    getpastclimate(source = "copernicus", bioclim = "bio15"),
    getpastclimate(source = "copernicus", bioclim = "bio16"),
    getpastclimate(source = "copernicus", bioclim = "bio17"),
    getpastclimate(source = "copernicus", bioclim = "bio18"),
    getpastclimate(source = "copernicus", bioclim = "bio19")
  )
  sendstatus("read and stacked bioclim rasters. starting bioclim extraction")
  
  tree_dbs <- tree_dbs %>% 
    mutate(terra::extract(bioclim_stack, ., ID = F)) 
  
  rm(bioclim_stack)
  gc()
  
  ##### soil data extraction begin
  soil_stack <- c(getsoilproperties("STU_EU_DEPTH_ROOTS"),
                  getsoilproperties("STU_EU_T_CLAY"),
                  getsoilproperties("STU_EU_S_CLAY"),
                  getsoilproperties("STU_EU_T_SAND"),
                  getsoilproperties("STU_EU_S_SAND"),
                  getsoilproperties("STU_EU_T_SILT"),
                  getsoilproperties("STU_EU_S_SILT"),
                  getsoilproperties("STU_EU_T_OC"),
                  getsoilproperties("STU_EU_S_OC"),
                  getsoilproperties("STU_EU_T_BD"),
                  getsoilproperties("STU_EU_S_BD"),
                  getsoilproperties("STU_EU_T_GRAVEL"),
                  getsoilproperties("STU_EU_S_GRAVEL"),
                  getsoilproperties("SMU_EU_T_TAWC"),
                  getsoilproperties("SMU_EU_S_TAWC"),
                  getsoilproperties("STU_EU_T_TAWC"),
                  getsoilproperties("STU_EU_S_TAWC"))
  sendstatus("read and stacked soil rasters. starting soil extraction")
  tree_dbs <- tree_dbs %>% 
    mutate(terra::extract(soil_stack, ., ID = F)) %>% 
    mutate(across(.cols = starts_with(c("BIO", "STU", "SMU")), ~ round(.x, digits = 2), .names = "{.col}"))
  
  rm(soil_stack)
  gc()
  
  # tree_dbs <- tree_dbs %>% 
  #   st_drop_geometry()
  ################################ write it all to csv #################################
  sendstatus("saving DB to disk")
  data.table::fwrite(x = tree_dbs, file = "2_Data/1_output/tree_db.csv")
} else {
  sendstatus("tree db exists, reading from disk/n")
  tree_dbs <- fread("2_Data/1_output/tree_db.csv")
}

con <- backend_con()

if (!RPostgres::dbExistsTable(conn = con, name = "tree_dbs")) {
  # writing trees to postgres DB
  sendstatus("writing tree db to postgres")
  
  sf::st_write(tree_dbs, dsn = con, table = "tree_dbs",
               append = FALSE)
}


tree_db_sample_size <- group_by(tree_dbs, master_list_name) %>% 
  summarise(n=n())

tree_master_list <- fread("2_Data/1_output/eu_native_trees_master.csv") %>% 
  left_join(tree_db_sample_size, by = c("latin_name" = "master_list_name")) %>% 
  filter(n>1500)

sendstatus(paste0("Had ", nrow(tree_db_sample_size), " species, now down to ", nrow(tree_master_list)))

fwrite(tree_master_list, "2_Data/1_output/eu_native_trees_master.csv")

rm(tree_db_sample_size, tree_master_list)
gc()

DBI::dbDisconnect(conn = con)

##### EOF ####

What we found to be fastest and least memory-intensive is stacking all rasters first and then running terra::extract():

  bioclim_stack <- c(
    getpastclimate(source = "copernicus", bioclim = "bio01"),
    getpastclimate(source = "copernicus", bioclim = "bio02")
    ######## etc #####
    ######## etc #####
  )
  
  tree_dbs <- tree_dbs %>% 
    mutate(terra::extract(bioclim_stack, ., ID = F)) 

Enrich Species Master Table

This section explains how to make your species distribution data more user-friendly. For a specialist audience, you may be able to simply expose botanical names. Here we add descriptions and images for each species with a simple Wikipedia scraper.

Code
library(librarian)
shelf(rvest, data.table, tibble, stringr)
# load list with botanical names and taxonomy IDs
tree_master_list <- fread("2_Data/1_output/eu_native_trees_master.csv")

# a simple function to search wikipedia for botanical names
wikipedia <- function(search_terms, lang = c("en", "de", "es", "fr")) {
    lang <- match.arg(lang) # pick a single language; if() on a length-4 vector is an error
    # if "lang" is "en", use the English Wikipedia version
    if (grepl("en", lang)) {
      return(paste0("https://en.wikipedia.org/w/index.php?search=", URLencode(search_terms)))
    }
    
    # if "lang" is "de", use the German Wikipedia version
    else if (grepl("de", lang)) {
      return(paste0("https://de.wikipedia.org/w/index.php?search=", URLencode(search_terms)))
    }
    
    
    # Spanish and French are not implemented yet; for anything else,
    # fall back to the English version
    else {
      return(paste0("https://en.wikipedia.org/w/index.php?search=", URLencode(search_terms)))
    }
  }

# scrape wikipedia for first paragraph and image
for (i in 1:nrow(tree_master_list)) {
  tree_url <- wikipedia(tree_master_list$latin_name[i], "de")
  tree_page <-  tree_url %>% 
    rvest::read_html()
  
  tree_tribble <- tibble::tribble(~descr_de, ~url, ~image_url,
                  rvest::html_text(rvest::html_element(tree_page, xpath = "/html/body/div[3]/div[3]/div[5]/div[1]/p[1]"), trim = TRUE),
                  tree_url,
                  rvest::html_attr(html_element(tree_page, xpath = "/html/body/div[3]/div[3]/div[5]/div[1]/table[1]/tbody/tr[2]/td/span/a/img"), "src")
                  ) 
  tree_master_list$descr_de[i] <- tree_tribble$descr_de
  tree_master_list$url[i] <- tree_tribble$url
  tree_master_list$image_url[i] <- tree_tribble$image_url
  cat("fetched ", tree_tribble$descr_de, " \n for ", tree_master_list$latin_name[i])
  # be kind to wikipedia
  Sys.sleep(1.2)
}

tree_master_list <- tree_master_list %>% 
  mutate(image_url = stringr::str_remove(image_url, "^\\/\\/"))

sendstatus("writing master list with drescriptions to postgres")

fwrite(tree_master_list, "2_Data/1_output/eu_native_trees_master.csv")
con <- backend_con()
RPostgres::dbWriteTable(con, "tree_master_list", tree_master_list, overwrite = TRUE)

DBI::dbDisconnect(conn = con)

Calculate Percentile Ranges from Mean

We have used the sampled and well-documented EU Forest dataset to calculate percentile ranges. Bioclimatic variables of tree occurrences do not follow a normal distribution, hence we calculate the distance from the mean for each species and variable. The EU Forest dataset is sampled and contains enough occurrences, while occurrences from crowd-sourced efforts (such as GBIF) are more prone to distorted data (showing where trees have been mapped rather than where trees occur). We have used a cutoff of n > 200 occurrences from EU Forest, as per this source. Below are histograms for a few sample bioclimatic variables: BIO01 (Annual Mean Temperature), BIO05 (Max Temperature of Warmest Month), BIO06 (Min Temperature of Coldest Month), BIO08 (Mean Temperature of Wettest Quarter), BIO09 (Mean Temperature of Driest Quarter), BIO12 (Annual Precipitation), BIO13 (Precipitation of Wettest Month), BIO14 (Precipitation of Driest Month).
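A minimal sketch of this calculation, assuming a data frame tree_dbs with the columns master_list_name, db and the extracted BIO* variables (column names and cutoffs other than n > 200 are assumptions):

# percentile ranges per species and bioclim variable (sketch)
percentile_ranges <- tree_dbs %>%
  filter(db == "trees4f") %>%                 # EU Forest occurrences only
  group_by(master_list_name) %>%
  filter(n() > 200) %>%                       # cutoff of n > 200
  summarise(across(starts_with("BIO"),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        p05  = ~ quantile(.x, 0.05, na.rm = TRUE),
                        p95  = ~ quantile(.x, 0.95, na.rm = TRUE))))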

Histogram of BIO01 of tree occurrences

Histogram of BIO05 of tree occurrences

Histogram of BIO12 of tree occurrences

Histogram of BIO13 of tree occurrences

Histogram of BIO14 of tree occurrences

Make Raster data available to Shiny App

Tables created in this ETL pipeline are written to a PostGIS DB. To make raster files available to Shiny, you have two options (in practice, more like one):

Option 1: Writing Raster files to PostGIS DB

The bioclimatic and soil rasters used in this project are several GB in size and usually do not fit into memory. This snippet reads them as a raster stack, writes the stack to the PostGIS DB and removes it.

Warning

This section relies on the package rpostgis. Currently, there is no other simple way to write raster data to PostGIS from R; neither sf nor terra nor stars supports it. See this discussion. We therefore switched to the approach below: the Shiny app reads values directly from raster files on disk. This is slightly less performant for our use case.

Code
print("large data transfer out starting. Writing all Rasters as rasterstack to Postgres. WE DO NOT RECOMMEND THIS APPROACH")

con <- backend_con()

# 
# pastbio01 <- getpastclimate(source = "copernicus", bioclim = "bio01")
# 
# 
# pastbio12 <- getpastclimate(source = "copernicus", bioclim = "bio12")
# crs(pastbio12) <- "+proj=longlat +datum=WGS84 +no_defs +type=crs"
# # we set this proj here and it seems to stick to R raster object. 
# # when writing into postgis, there SRID appears to be 3395

########################## Past Raster ############################
if (!RPostgres::dbExistsTable(conn = con, name = "past")) {
  sendstatus("Reading in Copernicus past raster")
  # stack all 19 bioclim layers (bio01 ... bio19)
  past <- raster::stack(lapply(
    sprintf("bio%02d", 1:19),
    function(b) raster::raster(getpastclimate(source = "copernicus", bioclim = b))
  ))
  sendstatus("Writing Copernicus Past to DB")
  rpostgis::pgWriteRast(con,
                        name = "past", raster = past, overwrite = TRUE
  )
  rm(past)
  gc()
} else {
  cat("Raster layers exist in Postgres. Skipping. ")
}
  
########################## Future 2050 4.5 Raster ############################
if (!RPostgres::dbExistsTable(conn = con, name = "future_2050_4_5")) {
  sendstatus("starting with reading Copernicus Future 2050 4.5")
  future <- raster::stack(lapply(
    sprintf("bio%02d", 1:19),
    function(b) raster::raster(getfutureclimate(source = "copernicus", bioclim = b,
                                                experiment = "rcp45", future_date = "2050-01-01"))
  ))
  
  sendstatus("Writing Copernicus Future 2050 4.5")
  rpostgis::pgWriteRast(con,
                        name = "future_2050_4_5", raster = future, overwrite = TRUE
  )
  rm(future)
  gc()
} else {
  cat("Raster layers exist in Postgres. Skipping. ")
}
  
  
########################## Future 2050 8.5 Raster ############################
if (!RPostgres::dbExistsTable(conn = con, name = "future_2050_8_5")) {
  sendstatus("starting with reading Copernicus Future 2050 8.5")
  future <- raster::stack(lapply(
    sprintf("bio%02d", 1:19),
    function(b) raster::raster(getfutureclimate(source = "copernicus", bioclim = b,
                                                experiment = "rcp85", future_date = "2050-01-01"))
  ))
  
  sendstatus("Writing Copernicus Future 2050 8.5")
  rpostgis::pgWriteRast(con,
                        name = "future_2050_8_5", raster = future, overwrite = TRUE
  )
  rm(future)
  gc()
} else {
  cat("Raster layers exist in Postgres. Skipping. ")
}

########################## Future 2070 4.5 Raster ############################
if (!RPostgres::dbExistsTable(conn = con, name = "future_2070_4_5")) {
  sendstatus("starting with reading Copernicus Future 2070 4.5")
  future <- raster::stack(lapply(
    sprintf("bio%02d", 1:19),
    function(b) raster::raster(getfutureclimate(source = "copernicus", bioclim = b,
                                                experiment = "rcp45", future_date = "2070-01-01"))
  ))
  
  sendstatus("Writing Copernicus Future 2070 4.5")
  rpostgis::pgWriteRast(con,
                        name = "future_2070_4_5", raster = future, overwrite = TRUE
  )
  rm(future)
  gc()
} else {
  cat("Raster layers exist in Postgres. Skipping. ")
}


########################## Future 2070 8.5 Raster ############################
if (!RPostgres::dbExistsTable(conn = con, name = "future_2070_8_5")) {
  sendstatus("starting with reading Copernicus Future 2070 8.5")
  future <- raster::stack(lapply(
    sprintf("bio%02d", 1:19),
    function(b) raster::raster(getfutureclimate(source = "copernicus", bioclim = b,
                                                experiment = "rcp85", future_date = "2070-01-01"))
  ))
  
  sendstatus("Writing Copernicus Future 2070 8.5")
  rpostgis::pgWriteRast(con,
                        name = "future_2070_8_5", raster = future, overwrite = TRUE
  )
  rm(future)
  gc()
} else {
  cat("Raster layers exist in Postgres. Skipping. ")
}
  
  
  print("Reading in Soil Rasters")
  soil <- raster::stack(raster::raster(getsoilproperties("STU_EU_DEPTH_ROOTS")),
                        raster::raster(getsoilproperties("STU_EU_T_CLAY")),
                        raster::raster(getsoilproperties("STU_EU_S_CLAY")),
                        raster::raster(getsoilproperties("STU_EU_T_SAND")),
                        raster::raster(getsoilproperties("STU_EU_S_SAND")),
                        raster::raster(getsoilproperties("STU_EU_T_SILT")),
                        raster::raster(getsoilproperties("STU_EU_S_SILT")),
                        raster::raster(getsoilproperties("STU_EU_T_OC")),
                        raster::raster(getsoilproperties("STU_EU_S_OC")),
                        raster::raster(getsoilproperties("STU_EU_T_BD")),
                        raster::raster(getsoilproperties("STU_EU_S_BD")),
                        raster::raster(getsoilproperties("STU_EU_T_GRAVEL")),
                        raster::raster(getsoilproperties("STU_EU_S_GRAVEL")),
                        raster::raster(getsoilproperties("SMU_EU_T_TAWC")),
                        raster::raster(getsoilproperties("SMU_EU_S_TAWC")),
                        raster::raster(getsoilproperties("STU_EU_T_TAWC")),
                        raster::raster(getsoilproperties("STU_EU_S_TAWC"))
  )
  
  
  print("Writing Soil to DB")
  rpostgis::pgWriteRast(con,
                        name = "soil", raster = soil, overwrite = TRUE
  )
  rm(soil)
  gc()
  




DBI::dbDisconnect(conn = con)

Option 2: Mount raster files into Shiny container

The raster files downloaded from Copernicus CDS can simply be mounted into the Docker container. See our docker-compose for details on this.
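Inside the app, the mounted files can then be read with the same helpers used in the ETL pipeline and queried at a user-supplied planting site. A minimal sketch (the coordinates are illustrative):

# read a mounted raster and extract its value at a planting site
past_bio01 <- getpastclimate(source = "copernicus", bioclim = "bio01")
site <- terra::vect(data.frame(lon = 10.5, lat = 51.8),
                    geom = c("lon", "lat"), crs = "EPSG:4326")
terra::extract(past_bio01, site)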

Shiny App Development

The shiny app itself is developed as a package, using the golem framework. Most of the server and UI code is self-explanatory.
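For orientation, golem wraps a regular shinyApp() call in run_app(); simplified from the golem-generated template (app_ui and app_server are the package's internal UI and server functions):

# simplified from the golem template
run_app <- function(...) {
  golem::with_golem_options(
    app = shiny::shinyApp(ui = app_ui, server = app_server),
    golem_opts = list(...)
  )
}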

Database Deployment

We use the Docker image postgis/postgis and only add the raster extension to it with a short init script.

Code
FROM postgis/postgis
# need to enable raster extensions


COPY ./initialize-raster.sh /docker-entrypoint-initdb.d/initialize-raster.sh
# the init script runs, among other things:
# CREATE EXTENSION postgis_raster;
# SET postgis.gdal_enabled_drivers = 'ENABLE_ALL';
# SELECT name, default_version, installed_version
# FROM pg_available_extensions WHERE name LIKE 'postgis%' OR name LIKE 'address%';

Shiny App Deployment

Running the package barebone in a Docker container

This approach means the R process will simply launch the shiny app. For a more robust deployment, use Shiny Server.

  • Generate the Dockerfile from the app. Golem handles this with golem::add_dockerfile() but you need to edit the Dockerfile by hand. When adding new packages, ensure they’re at the end of the Dockerfile in order to leverage build caching.
  • On your host: clone the repository with git clone https://github.com/3ful/treeful.git and move to the directory with cd ./treeful
  • Run stack with swarm and compose
    • launch your docker swarm: docker swarm init
    • If this is the first time running, the PostGIS DB will be empty and you need to supply a first user password. While the ETL is running, the Shiny app will not find any data.
    • In order to serve the same image to all nodes you need a docker registry. Create one with: docker service create --name registry --publish published=5000,target=5000 registry:2
    • Push images to registry: docker compose push
    • Now, deploy your swarm: docker stack deploy --compose-file docker-compose.yml treeful
    • Now you can easily scale up your shiny app to 3 replicas with docker service scale treeful_frontend=3
    • You can watch visitors to your frontend app with docker service logs treeful_frontend -f

Running in Shiny Server

This is the better option for serving the app to multiple users. Our Dockerfile in 3_shiny_frontend uses this approach. Important aspects:

  • You can use rocker images, such as shiny-verse.
  • Make the folder /srv/shiny-server writable by the user shiny. Otherwise the cache folder cannot be written and performance might suffer.
  • When building the frontend container, the database container needs to be running, and the secrets must be mounted and available during the package install. Remember, the shiny app is a package.
  • golem::add_shinyserver_file() will generate the app.R file that Shiny Server needs; a sketch follows below.
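For reference, the generated app.R essentially loads the package and launches it (a sketch; the exact file golem generates may differ by version):

# app.R for Shiny Server (sketch)
library(treeful)
treeful::run_app()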

Running in Shinyproxy

We abandoned this option. The main reason is that Shinyproxy spins up a container for each user; since this app requires a fair amount of RAM, it did not serve our needs. If you need authentication, however, this is your option.