SM B: Effect Size Analysis

Code
library(targets)
library(withr)
library(here)
library(tidyverse)
library(performance)
library(broom.mixed)
library(gt)
library(lme4)
library(MuMIn)
library(ManyEcoEvo)
library(ggrepel)
set.seed(1234)
Code
ManyEcoEvo_results <- 
  ManyEcoEvo_results %>% 
  mutate(effects_analysis = 
           map(effects_analysis, 
               rename, 
               id_col = study_id)) #%>% 
  # mutate_at(c("data", 
  #             "diversity_data", 
  #             "diversity_indices", 
  #             "effects_analysis"),
  #           .funs = ~ map(.x, anonymise_teams,
  #                         TeamIdentifier_lookup))

B.1 Meta-analysis

B.1.1 Effect Sizes \(Z_r\)

B.1.1.1 Effect of categorical review rating

The figures below (Figure B.1,Figure B.2) shows the fixed effect of categorical review rating on deviation from the meta-analytic mean. There is very little difference in deviation for analyses in any of the review categories. It is worth noting that each analysis features multiple times in these figures corresponding to the multiple reviewers that provided ratings.

Figure B.1: Orchard plot of meta-analytic model fitted to all eucalyptus analyses with a fixed effect for categorical peer-review ratings, and random effects for analyst ID and reviewer ID. Black circles denote coefficient mean for each categorical publishability rating. Thick error bars represent 95% confidence intervals and whiskers indicate 95% prediction intervals. Effect sizes are represented by circles and their size corresponds to the precision of the estimate.
Figure B.2: Orchard plot of meta-analytic model fitted to all blue tit analyses with a fixed effect for categorical peer-review ratings, and random effects for analyst ID and reviewer ID. Black circles denote coefficient mean for each categorical publishability rating. Thick error bars represent 95% confidence intervals and whiskers indicate 95% prediction intervals. Effect sizes are represented by circles and their size corresponds to the precision of the estimate.

B.1.1.2 Post-hoc analysis: Exploring the effect of removing analyses with poor peer-review ratings on heterogeneity

In Figure B.3 we display the results of our post-hoc analysis, examining the effect of removing analyses that were reviewed at least once as being ‘unpublishable’, ‘unpublishable’ or requiring ‘major revisions’, as compared with retaining the full set of analyses. Removing these analyses from the blue tit data had little impact on the overall amount of deviation or the distribution of the results. For the Eucalytpus analyses, removing ‘unpublishable’ analyses meant dropping the outlier ‘Brooklyn-2-2-1’ which made a substantial difference to the amount of observerd deviation from the meta-analytic mean.

Figure B.3: Forest plots of meta-analytic estimated standardized effect sizes (\(Zr\), blue circles) and their 95% confidence intervals for each effect size included in the meta-analysis model. The meta-analytic mean effect size is denoted by a black triangle and a dashed vertical line, with error bars also representing the 95% confidence interval. The solid black vertical line demarcates effect size of 0, indicating no relationship between the test variable and the response variable. The left side of each panel shows the analysis team names (anonymous arbitrary names assigned by us), each followed by three numbers. The first number is the submission ID (some analyst teams submitted results to us on >1 submission form), the second number is the analysis ID (some analyst teams included results of >1 analysis in a given submission), and the third number is the effect ID (some analysts submitted values for >1 effect per analysis). Thus, each row in each forest plot is uniquely identified, but it is possible to determine which effects come from which analyses and which analysis teams. The plots in the top row depict effects from analyses of blue tit data, and the bottom row plots depict effects from analyses of Eucalyptus data. The right-most plots depict all usable effect sizes. The, plots on the left side exclude effects from analysis sets that received at least one rating of “unpublishable” from peer reviewers, and the plots in the middle exclude effects from analysis sets that received at least one rating of either “unpublishable” or “major revision” from peer reviewers.

B.1.1.3 Post-hoc analysis: Exploring the effect of excluding estimates in which we had reduced confidence

For each dataset (blue tit, Eucalyptus), we created a second, more conservative version, that excluded effects based on estimates of \(df\) that we considered less reliable (Table B.1). We compared the outcomes of analyses of the primary dataset (constituted according to our registered plan) with the outcomes of analyses of the more conservative version of the dataset. We also compared results from analyses of both of these versions of the dataset to versions with our post-hoc removal of outliers described in the main text. Our more conservative exclusions (based on unreliable estimates of \(df\)) had minimal impact on the meta-analytic mean for both blue tit and Eucalyptus analyses, regardless of whether outliers were excluded (Table B.1).

Code
ManyEcoEvo_viz %>% 
    dplyr::filter(estimate_type == "Zr", 
                  model_name == "MA_mod") %>% 
  hoist(tidy_mod_summary) %>% 
  unnest(tidy_mod_summary) %>% 
    filter(publishable_subset == "All", expertise_subset == "All") %>% 
  select(-publishable_subset, -expertise_subset) %>% 
  select(dataset, 
         exclusion_set, 
         estimate, 
         std.error, 
         statistic, 
         p.value, 
         starts_with("conf")) %>% 
  mutate(exclusion_set = 
           case_when(exclusion_set == "complete" ~ 
                       "Primary dataset",
                     exclusion_set == "complete-rm_outliers" ~ 
                       "Primary dataset, outliers removed",
                     exclusion_set == "partial" ~ 
                       "Conservative exclusions",
                     TRUE ~ "Conservative exclusions, outliers removed")) %>% 
group_by(exclusion_set) %>% 
  gt::gt() %>% 
  gt::opt_stylize(style = 6, color = "gray") %>% 
  gt::fmt(columns = "p.value",
          fns = function(x) gtsummary::style_pvalue(x, prepend_p = FALSE)) %>% 
  gt::fmt_number(columns = c(-p.value, -dataset)) %>% 
  gt::cols_label(estimate = gt::md("$$\\hat\\mu$$"), 
                 std.error = gt::md("$$SE[\\hat\\mu]$$"),
                 conf.low = gt::md("95\\%CI")) %>% 
  gt::cols_merge(columns = starts_with("conf"), 
                 pattern = "[{1},{2}]") %>% 
  gt::cols_move(columns = conf.low, after = std.error) 
Table B.1: Estimated meta-analytic mean, standard error, and 95% confidence intervals, from analyses of the primary data set, the more conservative version of the dataset which excluded effects based on less reliable estimates of \(df\), and both of these datasets with outliers removed.
dataset $$\hat\mu$$ $$SE[\hat\mu]$$ 95%CI statistic p.value
Primary dataset
blue tit −0.35 0.03 [−0.41,−0.28] −10.49 <0.001
eucalyptus −0.09 0.06 [−0.22,0.03] −1.47 0.14
Conservative exclusions
blue tit −0.36 0.03 [−0.42,−0.29] −10.50 <0.001
eucalyptus −0.11 0.07 [−0.24,0.03] −1.55 0.12
Primary dataset, outliers removed
blue tit −0.35 0.03 [−0.42,−0.29] −10.95 <0.001
eucalyptus −0.03 0.01 [−0.06,0.00] −2.23 0.026
Conservative exclusions, outliers removed
blue tit −0.36 0.03 [−0.43,−0.30] −11.09 <0.001
eucalyptus −0.04 0.02 [−0.07,−0.01] −2.52 0.012
Code
plot_forest <- function(data, intercept = TRUE, MA_mean = TRUE ){
  if(MA_mean == FALSE){
    data <- filter(data, term != "Overall")
  }
  
    p <- ggplot(data, aes(y = term, 
                        x =  estimate, 
                        ymin = conf.low, 
                        ymax = conf.high,
                        # shape = point_shape,
                        colour = parameter_type)) +
    geom_pointrange() +
    ggforestplot::theme_forest() +
    theme(axis.line = element_line(size = 0.10, colour = "black"),
          axis.line.y = element_blank(),
          text = element_text(family = "Helvetica"),
          axis.text.y = element_blank()) +
    guides(shape = "none", colour = "none") +
    coord_flip() +
    labs(y = "Standardised Effect Size, Zr",
         x = element_blank()) +
    scale_x_continuous(breaks = c(-4,-3,-2,-1,0,1),
                       minor_breaks = seq(from = -4.5, to = 1.5, by = 0.5)) +
    NatParksPalettes::scale_color_natparks_d("Glacier")
    
    if(intercept == TRUE){
      p <- p + geom_hline(yintercept = 0)
    }
    if(MA_mean == TRUE){
      p <- p + geom_hline(aes(yintercept = meta_analytic_mean), 
                          data = data,
                          colour = "#01353D", 
                          linetype = "dashed")
    }
    
  return(p)
}
Code
complete_euc_data <- 
  ManyEcoEvo_viz %>% 
  filter(exclusion_set == "complete", 
         estimate_type == "Zr", 
         model_name == "MA_mod",
         dataset == "eucalyptus",
         publishable_subset == "All") %>% 
  select(model) %>% 
  mutate(plot_data = map(model, 
                         .f = ~ broom::tidy(.x, 
                                            conf.int = TRUE, 
                                            include_studies = TRUE) %>% 
                           dplyr::mutate(point_shape = 
                                           ifelse(stringr::str_detect(term, "overall"), 
                                                  "diamond", 
                                                  "circle"),
                                         Parameter = 
                                           forcats::fct_reorder(term, 
                                                                estimate) %>% 
                                           forcats::fct_reorder(., 
                                                                point_shape,
                                                                .desc = TRUE))
  ),
  meta_analytic_mean = map_dbl(plot_data, 
                               ~ filter(.x, Parameter == "overall") %>% 
                                 pull(estimate))) %>% 
  select(plot_data, meta_analytic_mean) %>% 
  unnest(cols = c("plot_data")) %>% 
  mutate(parameter_type = case_when(str_detect(Parameter, "overall") ~ "mean",
                                    TRUE ~ "study"))

# complete_euc_data <- 
#   complete_euc_data %>% 
#   rename(id_col = term) %>% 
#   group_by(type) %>%  
#   group_split() %>% 
#   set_names(., complete_euc_data$type %>%  unique) %>% 
#   # map_if(.x = ., names(.) == "study",
#          # .f = ~ anonymise_teams(.x, TeamIdentifier_lookup)) %>% 
#   bind_rows() %>% 
#   rename(term = id_col)

min_outlier_euc <- complete_euc_data %>% 
  filter(type == "study") %>% 
  slice_min(estimate, n = 3) %>% 
  pull(term)

sample_size_euc_Zr <- ManyEcoEvo_results %>% 
    filter(exclusion_set == "complete", dataset == "eucalyptus") %>% 
    pluck("data", 1) %>% 
    select(id_col, sample_size) %>% 
    rename(term = id_col) %>% 
    mutate(sample_size = as.numeric(sample_size))

mean_n_euc_Zr <- sample_size_euc_Zr %>% 
  drop_na(sample_size) %>% 
  pull(sample_size) %>% 
  mean() %>% 
  round(2)

N_outliers_Zr_euc <- sample_size_euc_Zr %>% 
  filter(term %in% min_outlier_euc) %>% 
   arrange(desc(sample_size))

B.1.1.4 Post-hoc analysis: Exploring the effect of including only analyses conducted by analysis teams with at least one member self-rated as “highly proficient” or “expert” in conducting statitistical analyses in their research area

Code
plot_forest <- function(data, intercept = TRUE, MA_mean = TRUE){
  if (MA_mean == FALSE){
    data <- filter(data, Parameter != "overall")
  }
  
  p <- ggplot(data, aes(y = estimate, 
                        x =  term, 
                        ymin = conf.low, 
                        ymax = conf.high,
                        shape = parameter_type,
                        colour = parameter_type)) +
    geom_pointrange(fatten = 2) +
    ggforestplot::theme_forest() +
    theme(axis.line = element_line(linewidth = 0.10, colour = "black"),
          axis.line.y = element_blank(),
          text = element_text(family = "Helvetica")#,
          # axis.text.y = element_blank()
    ) +
    guides(shape = guide_legend(title = NULL), 
           colour = guide_legend(title = NULL)) +
    coord_flip() +
    ylab(bquote(Standardised~Effect~Size~Z[r])) +
    xlab(element_blank()) +
    # scale_y_continuous(breaks = c(-4,-3,-2,-1,0,1),
    # minor_breaks = seq(from = -4.5, to = 1.5, by = 0.5)) +
    NatParksPalettes::scale_color_natparks_d("Glacier")
  
  if(intercept == TRUE){
    p <- p + geom_hline(yintercept = 0)
  }
  if(MA_mean == TRUE){
    p <- p + geom_hline(aes(yintercept = meta_analytic_mean), 
                        data = data,
                        colour = "#01353D", 
                        linetype = "dashed")
  }
  
  return(p)
}

bt_experts_only <- 
  ManyEcoEvo_viz %>% 
  filter(exclusion_set == "complete", 
         estimate_type == "Zr", 
         model_name == "MA_mod",
         dataset == "blue tit",
         publishable_subset == "All", 
         expertise_subset == "expert") %>% 
  select(model) %>% 
  mutate(plot_data = map(model, 
                         .f = ~ broom::tidy(.x, 
                                            conf.int = TRUE, 
                                            include_studies = TRUE)%>% 
                           dplyr::mutate(point_shape = 
                                           ifelse(stringr::str_detect(term, "overall"), 
                                                  "diamond", 
                                                  "circle"),
                                         Parameter = 
                                           forcats::fct_reorder(term, 
                                                                estimate) %>% 
                                           forcats::fct_reorder(., 
                                                                point_shape,
                                                                .desc = TRUE))
  ),
  meta_analytic_mean = map_dbl(plot_data, 
                               ~ filter(.x, Parameter == "overall") %>% 
                                 pull(estimate))) %>% 
  select(plot_data, meta_analytic_mean) %>% 
  unnest(cols = c("plot_data")) %>% 
  mutate(parameter_type = case_when(str_detect(Parameter, "overall") ~ "mean",
                                    TRUE ~ "study")) 

# bt_experts_only <- 
#   bt_experts_only %>% 
#   rename(id_col = term) %>% 
#   group_by(type) %>%  
#   group_split() %>% 
#   set_names(., bt_experts_only$type %>%  unique) %>% 
#   # map_if(.x = ., names(.) == "study",
#          # .f = ~ anonymise_teams(.x, TeamIdentifier_lookup)) %>% 
#   bind_rows() %>% 
#   rename(term = id_col)

bt_forest_experts <- bt_experts_only %>% 
  arrange(desc(type)) %>% 
  mutate(type = forcats::as_factor(type)) %>% 
  group_by(type) %>% 
  arrange(desc(estimate),.by_group = TRUE) %>% 
  mutate(term = forcats::as_factor(term),
         point_shape = case_when(str_detect(type, "summary") ~ "mean",
                                 TRUE ~ "study")) %>% 
  plot_forest(intercept = TRUE, MA_mean = TRUE) +
  theme(axis.text.x = element_text(size = 15), 
        axis.title.x = element_text(size = 15),
        axis.text.y = element_blank()
  ) +
  scale_y_continuous(limits = c(-1.6, 0.65)) 

euc_experts_only <- 
  ManyEcoEvo_viz %>% 
  filter(exclusion_set == "complete", 
         estimate_type == "Zr", 
         model_name == "MA_mod",
         dataset == "eucalyptus",
         publishable_subset == "All",
         expertise_subset == "expert") %>% 
  select(model) %>% 
  mutate(plot_data = map(model, 
                         .f = ~ broom::tidy(.x, 
                                            conf.int = TRUE, 
                                            include_studies = TRUE) %>% 
                           dplyr::mutate(point_shape = 
                                           ifelse(stringr::str_detect(term, "overall"), 
                                                  "diamond", 
                                                  "circle"),
                                         Parameter = 
                                           forcats::fct_reorder(term, 
                                                                estimate) %>% 
                                           forcats::fct_reorder(., 
                                                                point_shape,
                                                                .desc = TRUE))
  ),
  meta_analytic_mean = map_dbl(plot_data, 
                               ~ filter(.x, Parameter == "overall") %>% 
                                 pull(estimate))) %>% 
  select(plot_data, meta_analytic_mean) %>% 
  unnest(cols = c("plot_data")) %>% 
  mutate(parameter_type = case_when(str_detect(Parameter, "overall") ~ "mean",
                                    TRUE ~ "study"))


# euc_experts_only <- 
#   euc_experts_only %>% 
#   rename(id_col = term) %>% 
#   group_by(type) %>%  
#   group_split() %>% 
#   set_names(., euc_experts_only$type %>%  unique) %>% 
#   # map_if(.x = ., names(.) == "study",
#          # .f = ~ anonymise_teams(.x, TeamIdentifier_lookup)) %>% 
#   bind_rows() %>% 
#   rename(term = id_col)

euc_forest_experts <- euc_experts_only %>% 
  arrange(desc(type)) %>% 
  mutate(type = forcats::as_factor(type)) %>% 
  group_by(type) %>% 
  arrange(desc(estimate),.by_group = TRUE) %>% 
  mutate(term = forcats::as_factor(term),
         point_shape = case_when(str_detect(type, "summary") ~ "mean",
                                 TRUE ~ "study")) %>% 
  plot_forest(intercept = TRUE, MA_mean = TRUE) +
  theme(axis.text.x = element_text(size = 15), 
        axis.title.x = element_text(size = 15),
        axis.text.y = element_blank()
  ) +
  scale_y_continuous(limits = c(-5, 1), 
                     breaks = c(-5, -4, -3, -2, -1, 0, 1) )

# ---- Extract Viz & Summary Stats

bt_forest_experts

euc_forest_experts
(a) Blue tit dataset analyses
(b) Eucalyptus dataset analyses
Figure B.4: Estimated meta-analytic mean effect size (\(Z_r\)), standard error, and 95% confidence intervals, from analyses of the primary data set with at least one member self-rated as “highly proficient” or “expert” in conducting statistical analyses in their research area.

B.1.2 Out of sample predictions \(y_i\)

B.1.2.1 Non-truncated \(y_{i}\) meta-analysis forest plot

Below is the non-truncated version of Figure 3.3 showing a forest plot of the out-of-sample predictions, \(y_{i}\), on the response-scale (stems counts), for Eucalyptus analyses, showing the full error bars of all model estimates.

Code
plot_forest_2 <- function(data, intercept = TRUE, MA_mean = TRUE, y_zoom = numeric(2L)){
  if(MA_mean == FALSE){
    data <- filter(data, study_id != "overall")
  }
  
  plot_data <- data %>% 
    group_by(study_id) %>% 
    group_nest() %>% 
    hoist(data, "estimate",.remove = FALSE) %>% 
    hoist(estimate, y50 = 2) %>% 
    select(-estimate) %>% 
    unnest(data) %>% 
    arrange(desc(type)) %>% 
    mutate(type = forcats::as_factor(type)) %>% 
    group_by(type) %>% 
    arrange(desc(y50),.by_group = TRUE) %>% 
    mutate(study_id = forcats::as_factor(study_id),
           point_shape = case_when(str_detect(type, "summary") ~ "diamond",
                                   TRUE ~ "circle"))
  
  p <- ggplot(plot_data, aes(y = estimate, 
                        x =  study_id,
                        ymin = conf.low, 
                        ymax = conf.high,
                        # shape = type,
                        shape = point_shape,
                        colour = estimate_type
                        )) +
    geom_pointrange(position = position_dodge(width = 0.5)) +
    ggforestplot::theme_forest() +
    theme(axis.line = element_line(linewidth = 0.10, colour = "black"),
          axis.line.y = element_blank(),
          text = element_text(family = "Helvetica")) +
    guides(shape = "none", colour = "none") +
    coord_flip(ylim = y_zoom) +
    labs(y = "Model estimated out of sample predictions, stem counts",
         x = element_blank()) +
    scale_y_continuous(breaks = scales::breaks_extended(10)) +
    NatParksPalettes::scale_color_natparks_d("Glacier") 
  
  if(intercept == TRUE){
    p <- p + geom_hline(yintercept = 0)
  }
  if(MA_mean == TRUE){
    p <- p +
      geom_hline(aes(yintercept = plot_data %>%
                       filter(type == "summary", estimate_type == "y25") %>%
                       pluck("estimate")),
                 data = data,
                 colour = "#01353D",
                 linetype = "dashed") +
      geom_hline(aes(yintercept = plot_data %>%
                       filter(type == "summary", estimate_type == "y50") %>%
                       pluck("estimate")),
                 data = data,
                 colour = "#088096",
                 linetype = "dashed") +
      geom_hline(aes(yintercept = plot_data %>%
                       filter(type == "summary", estimate_type == "y75") %>%
                       pluck("estimate")),
                 data = data,
                 colour = "#58B3C7" ,
                 linetype = "dashed")
  }
  
  print(p)
}

# TODO put into R/ and build into package to call!
fit_MA_mv <- function(effects_analysis, Z_colname, VZ_colname, estimate_type){
  Zr <- effects_analysis %>%  pull({{Z_colname}})
  VZr <- effects_analysis %>%  pull({{VZ_colname}})
  mod <- ManyEcoEvo::fit_metafor_mv(estimate = Zr, 
                        variance = VZr, 
                        estimate_type = estimate_type, 
                        data = effects_analysis)
  return(mod)
}

back_transformed_predictions <- 
  ManyEcoEvo_yi %>% 
  dplyr::mutate(data = 
                  purrr::map(data, 
                             ~ dplyr::filter(.x,
                                             stringr::str_detect(response_variable_type, "constructed",                                                                       negate = TRUE)))) %>% 
  prepare_response_variables_yi(estimate_type = "yi",
                                param_table = ManyEcoEvo:::analysis_data_param_tables) %>% 
  generate_yi_subsets()


raw_mod_data_logged <- 
  back_transformed_predictions %>% 
  filter(dataset == "eucalyptus") %>%
  group_by(estimate_type) %>% 
  select(estimate_type, data) %>% 
  unnest(data) %>% 
  rename(study_id = id_col) %>% 
  hoist(params, param_mean = list("value", 1), param_sd = list("value", 2)) %>% 
  rowwise() %>% 
  mutate(exclusion_threshold = param_mean + 3*param_sd) %>% 
  filter(fit < exclusion_threshold) %>% 
  mutate(log_vals = map2(fit, se.fit, log_transform, 1000)) %>% 
  unnest(log_vals) %>%
  select(study_id, 
         TeamIdentifier,
         estimate_type, 
         starts_with("response_"), 
         -response_id_S2, 
         ends_with("_log")) %>% 
  group_by(estimate_type) %>% 
  nest()
  

mod_data_logged <- raw_mod_data_logged %>% 
  mutate(MA_mod = 
           map(data, 
               ~fit_MA_mv(.x, mean_log, std.error_log, "yi")))


plot_data_logged <- mod_data_logged %>% 
  mutate(tidy_mod = map(.x = MA_mod,
                        ~broom::tidy(.x,
                                     conf.int = TRUE, 
                                     include_studies = TRUE) %>% 
                          rename(study_id = term))) %>% 
  select(tidy_mod) %>% 
  unnest(cols = c(tidy_mod)) 

plot_data_logged %>% 
  mutate(response_scale = map2(estimate, std.error, log_back, 1000)) %>% 
  select(estimate_type, study_id, type, response_scale) %>% 
  unnest(response_scale) %>% 
  rename(estimate = mean_origin, conf.low = lower, conf.high = upper) %>% 
#  filter(estimate <1000) %>% 
  plot_forest_2(MA_mean = T,y_zoom = c(0,140))
Figure B.5: Forest plot of meta-analytic estimated out of sample predictions, \(y_{i}\), on the response-scale (stems counts), for Eucalyptus analyses. Circles represent individual analysis estimates. Triangles represent the meta-analytic mean for each prediction scenario. Navy blue coloured points correspond to \(y_{25}\) scenario, blue coloured points correspond to the \(y_{50}\) scenario, while light blue points correspond to the \(y_{75}\) scenario. Error bars are 95% confidence intervals. Outliers (observations more than 3SD above the mean) have been removed prior to model fitting.