Scaling Up: working with multiple subsets or multiple datasets
Source:vignettes/multiple_datasets.Rmd
library(ManyEcoEvo)
#> Loading required package: rmarkdown
#> Loading required package: bookdown
#> Registered S3 method overwritten by 'parsnip':
#> method from
#> print.nullmodel vegan
#> Registered S3 method overwritten by 'lava':
#> method from
#> print.estimate EnvStats
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
library(tidyr)
1 A tidy approach with list-columns and nested dataframes
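This vignette takes a tidy modelling approach built on list-columns: each dataset or subset is stored as a nested data frame in a single row, so the same analysis step can be applied to every row.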
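As a minimal sketch of the list-column idea, the example below nests a data frame by group and fits the same model to each nested subset. It uses the built-in `mtcars` data purely as a stand-in (an assumption for illustration); it does not use any ManyEcoEvo data or functions.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Toy stand-in data: mtcars is NOT part of ManyEcoEvo, it is used only to
# illustrate the nesting pattern.
nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%   # one row per group; `data` is a list-column of tibbles
  ungroup()

fits <- nested %>%
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)), # same model for every subset
    slope = map_dbl(model, ~ coef(.x)[["wt"]]),   # one number per fitted model
    n     = map_int(data, nrow)                   # rows in each subset
  )

fits %>% select(cyl, n, slope)
```

Keeping the data, the fitted model, and its summaries in the same row means each result stays attached to the subset that produced it.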
1.1 Working with multiple datasets
You may have several different datasets on which you want to run the same analyses and then compare the results, e.g. blue tit vs. Eucalyptus.
This section demonstrates the ManyEcoEvo analysis pipeline using only the blue tit and Eucalyptus data, through to make_viz() etc.
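The general pattern of holding several separate datasets in one tibble and running an identical step on each can be sketched as follows. The two datasets are simulated placeholders, and `summarise_estimates()` is a hypothetical helper written for this example; neither comes from ManyEcoEvo, and this is not the package's pipeline.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)

# Simulated placeholder datasets with the same columns
# (NOT the real blue tit or Eucalyptus data).
dataset_a <- tibble(estimate = rnorm(20, mean = 0.2), se = runif(20, 0.1, 0.3))
dataset_b <- tibble(estimate = rnorm(15, mean = -0.1), se = runif(15, 0.1, 0.3))

# One row per dataset; the data themselves live in a list-column.
all_data <- tibble(
  dataset = c("dataset_a", "dataset_b"),
  data    = list(dataset_a, dataset_b)
)

# Hypothetical analysis step: inverse-variance weighted mean of the estimates.
summarise_estimates <- function(df) {
  w <- 1 / df$se^2
  tibble(n = nrow(df), weighted_mean = sum(w * df$estimate) / sum(w))
}

# Apply the same step to every dataset, then unnest for side-by-side comparison.
results <- all_data %>%
  mutate(summary = map(data, summarise_estimates)) %>%
  select(dataset, summary) %>%
  unnest(summary)

results
```

Because each dataset occupies one row, adding a third dataset only requires another row in `all_data`; the analysis code is unchanged.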
1.2 Creating data subsets based on various exclusion principles
The following functions generate different subsets of the data:
- generate_exclusion_subsets()
- generate_expertise_subsets()
- generate_outlier_subsets()
- generate_rating_subsets()
- generate_yi_subsets()
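To illustrate the general idea behind subsetting by exclusion rules, the sketch below applies a named list of filtering functions to one toy data frame and stores each resulting subset in a list-column. The column names (`rating`, `flagged`) and the rules are hypothetical, and the code does not call or reproduce the `generate_*_subsets()` functions listed above.

```r
library(dplyr)
library(purrr)

# Toy data with made-up columns, for illustration only.
analyses <- tibble(
  id      = 1:10,
  rating  = rep(1:4, length.out = 10),
  flagged = rep(c(TRUE, FALSE), 5)
)

# Hypothetical exclusion rules: each takes a data frame and returns a subset.
rules <- list(
  complete    = function(df) df,
  high_rating = function(df) filter(df, rating >= 3),
  unflagged   = function(df) filter(df, !flagged)
)

# One row per exclusion rule, with the matching subset in a list-column.
subsets <- tibble(
  exclusion_set = names(rules),
  data          = unname(map(rules, ~ .x(analyses)))
) %>%
  mutate(n = map_int(data, nrow))

subsets %>% select(exclusion_set, n)
```

The resulting tibble has the same shape as the nested data used elsewhere in this vignette, so any downstream step written for a `data` list-column can be mapped over the subsets.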
1.3 Out-of-sample predictions
Note that generate_exclusion_subsets() does not currently need to be run on the ManyEcoEvo_yi
dataset, since the default subsetting functions it calls via subset_fns_yi()
do not produce any distinct subsets of the data: