| tags: [ reproducibility reproducible research software computational reproducibility ] categories: [talk ]

John Blischak: reproducibility and workflowr

There are institutional roadblocks to full reproducibility, and methods sections alone are insufficient. But good software practice helps future you, and it helps your labmates after you leave.

Lowndes et al. (2017), Peng et al. (2011)

Strategies for computational reproducibility

  1. Record the computing environment
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14.1
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    bookdown_0.7   
##  [5] rprojroot_1.3-2 tools_3.5.1     htmltools_0.3.6 yaml_2.2.0     
##  [9] Rcpp_1.0.0      stringi_1.2.4   rmarkdown_1.10  blogdown_0.9   
## [13] knitr_1.20      stringr_1.3.1   digest_0.6.18   xfun_0.4       
## [17] evaluate_0.12
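
One way to keep this record next to the results is to write it to a file at the end of each run. A minimal sketch, assuming a hypothetical file name `session_info.txt`; it writes to a temporary directory so it is safe to run as-is, and in a real project you would point it at your own output directory:

```r
# Capture the printed session information as a character vector.
info <- capture.output(sessionInfo())

# Write it to a file; the file name and location are hypothetical choices.
out_file <- file.path(tempdir(), "session_info.txt")
writeLines(info, out_file)

# Read it back to confirm the record was saved.
saved <- readLines(out_file)
```

Committing a file like this alongside the figures gives later readers the R and package versions that produced them.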
  2. Set the seed! You want to ensure exact (direct) computational reproducibility before someone can extend your analyses. The goal is for point estimates and outputs to be the same no matter the computing environment.
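
As a small illustration of what setting the seed buys you, the sketch below draws random numbers twice from the same seed. The seed value `2018` is an arbitrary choice for the example:

```r
# Fix the state of the random number generator before any random draws.
set.seed(2018)  # the seed value itself is arbitrary
first_run <- rnorm(5)

# Resetting the same seed reproduces exactly the same draws.
set.seed(2018)
second_run <- rnorm(5)

same_draws <- identical(first_run, second_run)
```

The seed only pins down the draws for a given generator configuration. For example, the default behaviour of `sample()` changed in R 3.6.0, which is one reason to record the computing environment as well.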

  3. Organize into subdirectories

Put code, data, and figures into distinct subdirectories.

Use relative paths. Open question: what about getting around the difference between R Markdown paths and script paths?
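
The path mismatch arises because knitr by default evaluates chunks with the working directory set to the document's own directory, while a script runs from wherever it was launched. One possible workaround is to build every path from the project root rather than from the current directory. The sketch below does this in base R; the function `find_project_root` and the marker file `.projroot` are hypothetical names for illustration, and packages such as rprojroot implement the same idea more fully.

```r
# Walk up from `start` until a directory containing `marker` is found.
# `find_project_root` and the marker name ".projroot" are hypothetical.
find_project_root <- function(start = getwd(), marker = ".projroot") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) return(dir)
    parent <- dirname(dir)
    # dirname() of the filesystem root returns the root itself: stop there.
    if (identical(parent, dir)) stop("No '", marker, "' found above ", start)
    dir <- parent
  }
}

# Demonstrate with a throwaway project layout in a temporary directory.
project <- file.path(tempdir(), "demo_project")
dir.create(file.path(project, "code"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data"), showWarnings = FALSE)
file.create(file.path(project, ".projroot"))

# Starting from code/ (where a script or .Rmd might live), resolve the root
# and build the data path from it.
root <- find_project_root(file.path(project, "code"))
data_path <- file.path(root, "data", "input.csv")
```

Because the path is anchored at the root, the same line works whether it is run from a script in `code/` or from a chunk in an R Markdown file.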

  4. Run code in a clean environment

Clearing the workspace removes objects, but it won’t detach attached packages:

rm(list = ls())

So it is better to restart your R session in RStudio.
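
To see why clearing the workspace is not enough, the sketch below attaches a package that ships with R, clears the workspace, and then checks the search path. The object name `model_input` is a hypothetical placeholder:

```r
# Attach a package that ships with R and create an object in the workspace.
library(splines)
model_input <- 1:10

# Clear all (non-hidden) objects from the global environment.
rm(list = ls())

# The object is gone, but the package is still attached.
object_left <- exists("model_input", inherits = FALSE)
still_attached <- "package:splines" %in% search()
```

Any code run after this point can still call functions from the attached package without loading it, so a missing `library()` call would go unnoticed until the next fresh session.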

Best practice: produce final results from the command line:

Rscript -e 'rmarkdown::render("fit_model.R")'