Scaling Up: working with multiple subsets or multiple datasets • ManyEcoEvo

library(ManyEcoEvo)
#> Loading required package: rmarkdown
#> Loading required package: bookdown
#> Registered S3 method overwritten by 'parsnip':
#>   method          from 
#>   print.nullmodel vegan
#> Registered S3 method overwritten by 'lava':
#>   method         from    
#>   print.estimate EnvStats
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(tidyr)

1 A tidy approach with list-columns and nested dataframes

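This section takes a tidy modelling approach built on list-columns. Each dataset or data subset is stored as a nested data frame in one row of a tibble, so the same analysis can be applied to every row.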
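A minimal sketch of the list-column pattern, using only dplyr, purrr and tidyr on made-up toy data (the `toy` tibble and its `group`, `x` and `y` columns are hypothetical, not ManyEcoEvo data). `nest()` packs each group's rows into the `data` list-column, and `map()` fits the same model to every nested data frame:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical toy data: two groups ("a", "b") with a predictor x and response y
toy <- tibble(
  group = rep(c("a", "b"), each = 5),
  x = rep(1:5, times = 2),
  y = c(2, 4, 6, 8, 10, 1, 2, 3, 4, 5)
)

# One row per group; each group's rows are stored in the list-column `data`
nested <- toy %>%
  nest(data = c(x, y)) %>%
  mutate(
    # Fit the same linear model to every nested data frame
    fit = map(data, ~ lm(y ~ x, data = .x)),
    # Pull the slope out of each fitted model as an ordinary numeric column
    slope = map_dbl(fit, ~ coef(.x)[["x"]])
  )

nested
```

Because the data, the fitted models and the extracted results all sit in the same row, adding another group simply adds another row; no extra code is needed.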

1.1 Working with multiple datasets

You may have several different datasets on which you want to replicate the same analyses and then compare the results, e.g. the blue tit and Eucalyptus datasets.

The approach is demonstrated here with the ManyEcoEvo analysis pipeline, using only the blue tit and Eucalyptus data and following the pipeline through to functions such as make_viz().
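A minimal sketch of replicating one analysis across separate datasets and comparing the results. The `blue_tit` and `eucalyptus` tibbles below are tiny made-up stand-ins, and `summarise_estimates()` is a hypothetical helper; neither reflects the real ManyEcoEvo data or pipeline functions:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-ins for two separate datasets (not the real ManyEcoEvo data)
blue_tit   <- tibble(estimate = c(0.10, 0.30, 0.20))
eucalyptus <- tibble(estimate = c(-0.50, 0.10, 0.40, 0.00))

# Store each whole dataset in a list-column, one row per dataset
datasets <- tibble(
  dataset = c("blue tit", "eucalyptus"),
  data    = list(blue_tit, eucalyptus)
)

# Hypothetical analysis step, applied identically to every dataset
summarise_estimates <- function(df) {
  tibble(n = nrow(df), mean_estimate = mean(df$estimate))
}

# Map the analysis over the datasets, then unnest for side-by-side comparison
comparison <- datasets %>%
  mutate(result = map(data, summarise_estimates)) %>%
  select(dataset, result) %>%
  unnest(result)

comparison
```

Swapping `summarise_estimates()` for any function that takes one dataset and returns a tibble keeps the rest of the code unchanged.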

1.2 Creating data subsets based on various exclusion principles

Different subsets can be generated with:

- generate_exclusion_subsets()
- generate_expertise_subsets()
- generate_outlier_subsets()
- generate_rating_subsets()
- generate_yi_subsets()
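The general idea of storing several subsets side by side can be sketched with plain dplyr and purrr. This does not call the generate_*_subsets() functions listed above, whose arguments are not shown here; the `analyses` data and the `subset_rules` exclusion rules are hypothetical and only illustrate the pattern of one row per subset:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical toy data: one estimate per analysis, with an illustrative rating
analyses <- tibble(
  id       = 1:6,
  estimate = c(0.2, 0.5, -0.1, 3.0, 0.3, 0.1),
  rating   = c(80, 55, 90, 70, 40, 65)
)

# Hypothetical exclusion rules, one function per subset; these are NOT the
# rules applied by ManyEcoEvo's generate_*_subsets() functions
subset_rules <- list(
  all         = function(df) df,
  no_outliers = function(df) filter(df, abs(estimate) < 2),
  high_rating = function(df) filter(df, rating >= 60)
)

# One row per subset, each holding its filtered data in a list-column
subsets <- tibble(
  subset_name = names(subset_rules),
  data        = map(subset_rules, ~ .x(analyses))
) %>%
  # Record how many analyses survive each rule
  mutate(n = map_int(data, nrow))

subsets
```

Any downstream analysis can then be mapped over the `data` column exactly as for multiple datasets.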

1.3 Out-of-sample predictions

Note that generate_exclusion_subsets() does not currently need to be run on the ManyEcoEvo_yi dataset, because the default subsetting functions from subset_fns_yi() that it calls do not produce any different subsets of the data. Extracting the unique values of exclusions_all from each nested dataset confirms this: both datasets contain only "retain".

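ManyEcoEvo_yi %>% 
  # pull exclusions_all out of each nested data frame, keeping only its unique values
  hoist(data, "exclusions_all", .transform = unique) %>% 
  # drop every column whose name contains "data" so only exclusions_all is printed
  select(-contains("data"))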
#> # A tibble: 2 × 1
#>   exclusions_all
#>   <chr>         
#> 1 retain        
#> 2 retain
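A complementary check is to count, for each dataset, how many observations are flagged with anything other than "retain". The sketch below uses a hypothetical toy tibble shaped like ManyEcoEvo_yi (one row per dataset, observations nested in `data`); the "exclude" value is made up for illustration and may not match the labels used in the package:

```r
library(dplyr)
library(purrr)

# Hypothetical toy stand-in shaped like ManyEcoEvo_yi: one row per dataset,
# with each dataset's observations nested in the `data` list-column
toy_yi <- tibble(
  dataset = c("blue tit", "eucalyptus"),
  data = list(
    tibble(exclusions_all = c("retain", "retain", "retain")),
    tibble(exclusions_all = c("retain", "exclude"))
  )
)

# Count observations not marked "retain"; zero means this flag removes nothing
excluded_counts <- toy_yi %>%
  mutate(n_excluded = map_int(data, ~ sum(.x$exclusions_all != "retain"))) %>%
  select(dataset, n_excluded)

excluded_counts
```

Counts of zero for every dataset mean that subsetting on this flag would return the data unchanged.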