Adaptive Preregistration for Modelling

A template suitable for use in Ecology and Conservation with accompanying rationale and explanation.

Note: Black boxes provide an explanation or justification of a particular section or preregistration item. Text contained in ‘preregistration item’ boxes requires a response.
Authors

Elliot Gould

Hannah Fraser

Libby Rumpff

Jian Yen

Megan Good

Chris Jones

1 Problem Formulation

Rationale & Explanation

This section specifies the decision-making context in which the model will be used, or the intended scope and context of its conclusions. Important components include the decision maker and stakeholders (including experts) and their views on: i) the nature of the problem or decision addressed and how the scope of the modelling tool fits within the broader context (i.e. model purpose); ii) the spatial and temporal scales relevant to the decision context; iii) the desired outputs; iv) their role and inclusion in model development and testing; v) whether they foresee unacceptable outcomes that need to be represented in the model (i.e. as constraints); and vi) which future scenarios the model needs to account for (noting this may be revised later). It should also summarise the domain of applicability of the model and reasonable extrapolation limits (Grimm et al. 2014).

1.1 Model Context and Purpose

Rationale & Explanation

Defining the purpose of the model is critical because the model purpose influences choices at later stages of model development (Jakeman, Letcher, and Norton 2006). Common model purposes in ecology include: gaining a better qualitative understanding of the target system, synthesising and reviewing knowledge, and providing guidance for management and decision-making (Jakeman, Letcher, and Norton 2006). Note that modelling objectives are distinct from the analytical objectives of the model.

The scope of the model includes temporal and spatial resolutions, which should also be defined here (Mahmoud et al. 2009). Any external limitations on model development, analysis and flexibility should also be outlined in this section (Jakeman, Letcher, and Norton 2006).

1.1.1 Key stakeholders and model users

Preregistration Item

Identify relevant interest groups:

1.1.2 Model purpose, context and problem context

Preregistration Item

Briefly outline:

1.1.3 Analytical objectives

Explanation

How will the model be analysed, and what analytical questions will it be used to answer? For example, you might use your model in a scenario analysis to determine which management decision is associated with minimum regret or the highest likelihood of improvement. Other examples from ecological decision-making include: comparing the performance of alternative management actions under budget constraints (Fraser et al. 2017), searching for robust decisions under uncertainty (McDonald-Madden, Baxter, and Possingham 2008), or choosing the conservation policy that minimises uncertainty [insert ref]. See other examples in Moallemi, Elsawah, and Ryan (2019).

Preregistration Item

Provide detail on the analytical purpose and scope of the model:

1.1.4 Logistical Constraints

Preregistration Item

1.1.5 Model Scope, Scale and Resolution

Preregistration Item

1.1.6 Intended application of results

Explanation

Preregistration Items in this section are relevant to model transferability (Yates et al. 2018) and to constraints on the generality of interpretations of the model analysis. How far can the results be extrapolated, given the study design (data + model + analysis)? For instance, if there are many confounding variables and not enough spatial or environmental replication, then making broader, more general claims beyond the stated boundaries of the model (see the analytical objectives, section 1.1.3) may not be warranted. However, larger generalisations about results may be acceptable if the data come from experimentally manipulated or controlled systems.

Preregistration Item

1.2 Scenario Analysis Operationalisation

Preregistration Item (delete as necessary)

2 Define Conceptual Model

Explanation

Conceptual models underpin the formal or quantitative model (Cartwright et al. 2016). The conceptual model describes the biological mechanisms relevant to the ecological problem and should capture basic premises about how the target system works, including any prior knowledge and assumptions about system processes. Conceptual models may be represented in a variety of formats, such as influence diagrams, linguistic models, block diagrams, or bond graphs, and these illustrate how model drivers are linked to both outputs or observed responses and internal (state) variables (Jakeman, Letcher, and Norton 2006).

2.1 Choose elicitation and representation method

Preregistration Item

2.2 Explain Critical Conceptual Design Decisions

Preregistration Item

List and explain critical conceptual design decisions (Grimm et al. 2014), including:

2.3 Model assumptions and uncertainties

Preregistration Item

Specify key assumptions and uncertainties underlying the model design, describing how uncertainty and variation will be represented in the model (Moallemi, Elsawah, and Ryan 2019). Sources of uncertainty may include:

2.4 Identify predictor and response variables

Explanation

The identification and definition of primary model input variables should be driven by scenario definitions, and by the scope of the model described in the problem formulation phase (Mahmoud et al. 2009).

Preregistration Item

Identify and define system variables and structures, referencing scenario definitions and the scope of the model as described within problem formulation:

2.5 Define prior knowledge, data specification and evaluation

Explanation

This section specifies the plan for collecting, processing and preparing data available for parameterisation, determining model structure, and for scenario analysis. It also allows the researchers to disclose any prior interaction with the data.

2.5.1 Collate available data sources that could be used to parameterise or structure the model

Preregistration Item

For pre-existing data (delete as appropriate):

Sampling Plan (for data you will collect, delete as appropriate):

2.5.2 Data Processing and Preparation

Preregistration Item

2.5.3 Describe any data exploration or preliminary data analyses

Explanation

In most modelling cases, it is necessary to perform preliminary analyses to understand the data and check that assumptions and requirements of the chosen modelling procedures are met. Data exploration prior to model fitting or development may include exploratory analyses to check for collinearity, spatial and temporal coverage, quality and resolution, outliers, or the need for transformations (Yates et al. 2018).
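
For example, a minimal collinearity and outlier screen in Python might look like the sketch below; the file name, predictor names and thresholds are hypothetical placeholders, not prescriptions:

```python
import pandas as pd

# Hypothetical predictor table; replace with the data sources collated above.
predictors = pd.read_csv("site_covariates.csv")[["rainfall", "temperature", "canopy_cover"]]

# Pairwise correlations: flag pairs above an illustrative |r| > 0.7 threshold.
corr = predictors.corr()
high_corr = (corr.abs() > 0.7) & (corr.abs() < 1.0)
print("Highly correlated pairs:\n", corr.where(high_corr).stack())

# Simple univariate outlier screen: values more than 3 standard deviations from the mean.
z_scores = (predictors - predictors.mean()) / predictors.std()
print("Potential outliers per predictor:\n", (z_scores.abs() > 3).sum())
```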

Preregistration Item

For each separate preliminary or investigatory analysis:

2.5.4 Data evaluation, exclusion and missing data

Explanation

Documenting issues with reliability is important because data quality and ecological relevance might be constrained by measurement error, inappropriate experimental design, and heterogeneity and variability inherent in ecological systems (Grimm et al. 2014). Ideally, model input data should be internally consistent across temporal and spatial scales and resolutions, and appropriate to the problem at hand (Mahmoud et al. 2009).

Preregistration Item

2.6 Conceptual model evaluation

Preregistration Item

3 Formalise and Specify Model

Explanation

In this section, describe the quantitative methods you will use to build the model(s) and explain how they are relevant to the purpose of the client, manager, or user.

3.1 Model class, modelling framework and approach

Explanation

Modelling approaches can be described as occurring on a spectrum from correlative or phenomenological to mechanistic or process-based (Yates et al. 2018): correlative models use mathematical functions fitted to data to describe underlying processes, whereas mechanistic models explicitly represent processes and details of component parts of a biological system that are expected to give rise to the data (White and Marshall 2019). A model ‘class’, ‘family’ or ‘type’ is often used to describe a set of models, each of which has a distinct but related sampling distribution (C. C. Liu and Aitkin 2008). The model family is driven by choices about the types of variables covered and the nature of their treatment, as well as structural features of the model, such as link functions, spatial and temporal scales of processes, and their interactions (Jakeman, Letcher, and Norton 2006).

Preregistration Item

3.2 Choose model features and family

Explanation

All modelling approaches require the selection of model features, which should conform with the conceptual model and data specified in previous steps (Jakeman, Letcher, and Norton 2006). The choice of model family is determined in conjunction with the selection of these features. Model features include elements such as the functional form of interactions, data structures, measures used to specify links, and any bins or discretisation of continuous variables. It is usually difficult to change fundamental features of a model beyond an early stage of model development, so careful thought and planning here is useful to the modeller (Jakeman, Letcher, and Norton 2006). However, if these fundamental aspects of the model do need to change, document how and why the changes were made, including any results used to support them.
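
As an illustration only, the sketch below records one such set of choices, a Poisson GLM with its default log link fitted via statsmodels; the dataset, response and predictor names are hypothetical, and the family used in your model should follow from the conceptual model and data specification above:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical dataset: counts of a species at surveyed sites with two covariates.
data = pd.read_csv("survey_counts.csv")

# The family and link are explicit, documented modelling decisions:
# here, a Poisson GLM with its default log link for count data.
model = smf.glm(
    formula="count ~ rainfall + canopy_cover",
    data=data,
    family=sm.families.Poisson(),  # e.g. swap for sm.families.NegativeBinomial() if overdispersed
)
fit = model.fit()
print(fit.summary())
```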

3.2.1 Operationalising Model Variables

Preregistration Item

3.2.2 Choose model family

Preregistration Item

3.3 Describe approach for identifying model structure

Explanation

This section relates to the process of determining the best, most efficient, or most parsimonious representation of the system at the appropriate scale of concern (Jakeman, Letcher, and Norton 2006) that best meets the analytical objectives specified in the problem formulation phase. Model structure refers to the choice of variables included in the model and the nature of the relationships among those variables. Approaches to finding model structure and parameters may be knowledge-supported or data-driven (Boets et al. 2015). Model selection methods can include traditional inferential approaches, such as unconstrained searches of a dataset for patterns that explain variations in the response variable, or the use of ensemble-modelling methods (Barnard et al. 2019). Ensemble-modelling procedures might aim to derive a single model or a multi-model average (Yates et al. 2018). Actions to refine a model could include iteratively dropping or adding parameters, or aggregating / disaggregating system descriptors, such as dimensionality and processes (Jakeman, Letcher, and Norton 2006).
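
For instance, a knowledge-supported comparison of a small, pre-specified candidate set could be sketched as below; the formulas and dataset are hypothetical, and an information criterion such as AIC is only one of the possible structure-selection criteria discussed above:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("survey_counts.csv")  # hypothetical dataset

# A small, pre-specified candidate set of model structures (hypothetical predictors).
candidates = {
    "climate_only": "count ~ rainfall + temperature",
    "habitat_only": "count ~ canopy_cover",
    "additive": "count ~ rainfall + temperature + canopy_cover",
    "interaction": "count ~ rainfall * canopy_cover + temperature",
}

# Fit each candidate and compare AIC; lower values indicate a more parsimonious fit.
fits = {name: smf.glm(formula, data=data, family=sm.families.Poisson()).fit()
        for name, formula in candidates.items()}
for name, fit in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{name}: AIC = {fit.aic:.1f}")
```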

Preregistration Item

Describe and justify the approach that will be used to determine the model structure, including decisions about:
    • the functional form of interactions (if any)
    • data structures,
    • measures used to specify links,
    • any bins or discretisation of continuous variables (Jakeman, Letcher, and Norton 2006),
    • any other relevant features of the model structure.

3.4 Describe parameter estimation technique and performance criteria

Explanation

Before calibrating the model to the data, the performance criteria for judging the calibration (or model fit) are specified. These criteria and their underlying assumptions should reflect the desired properties of the parameter estimates / structure (Jakeman, Letcher, and Norton 2006). For example, modellers might seek parameter estimates that are robust to outliers, unbiased, and yield appropriate predictive performance. Modellers will need to consider whether the assumptions of the estimation technique yielding those desired properties are suited to the problem at hand. For integrated or sub-divided models, other considerations might include choices about where to disaggregate the model for parameter estimation; e.g. spatial sectioning (streams into reaches) and temporal sectioning (piece-wise linear models) (Jakeman, Letcher, and Norton 2006).
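
As a concrete illustration of how desired estimator properties can drive this choice, the sketch below contrasts ordinary least squares with a robust (Huber) M-estimator in statsmodels; the dataset and variable names are placeholders:

```python
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("biomass_plots.csv")  # hypothetical dataset

# Ordinary least squares: efficient when errors are well-behaved, but sensitive to outliers.
ols_fit = smf.ols("biomass ~ rainfall + grazing_intensity", data=data).fit()

# Robust M-estimation (Huber loss by default in statsmodels): downweights outlying
# observations, trading some efficiency for robustness.
rlm_fit = smf.rlm("biomass ~ rainfall + grazing_intensity", data=data).fit()

print(ols_fit.params)
print(rlm_fit.params)  # compare how much the estimates shift under the robust loss
```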

3.4.1 Parameter estimation technique

Preregistration Item

3.4.2 Parameter estimation / model fit performance criteria

Preregistration Item

3.5 Model assumptions and uncertainties

Preregistration Item

3.6 Specify formal model(s)

Explanation

Once critical decisions have been made about the modelling approach and method of model specification, the conceptual model is translated into the quantitative model.

Preregistration Item

4 Model Calibration, Validation & Checking

4.1 Model calibration and validation scheme

Explanation

This section pertains to any data calibration, validation or testing schemes that will be implemented. For example, the model may be tested on data independent of those used to parameterise the model (external validation), or the model may be cross-validated on random sub-samples of the data used to parameterise it (internal cross-validation) (Yates et al. 2018; Barnard et al. 2019). For some types of models, hyper-parameters are estimated from the data and may be tuned on further independent holdouts of the training data (“validation data”).
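
A minimal sketch of one such scheme, an external holdout plus five-fold internal cross-validation, is given below using scikit-learn; the simulated data and fold count are placeholders, and blocked or spatially stratified splits may be more appropriate for structured ecological data:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Simulated placeholder data; replace with the calibration dataset described above.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=200)

# Reserve an external holdout for final evaluation (section 5); never used in fitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Five-fold internal cross-validation on the remaining data.
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
for fold, (fit_idx, val_idx) in enumerate(kfold.split(X_train)):
    # ... fit the model on X_train[fit_idx], y_train[fit_idx];
    # evaluate it on X_train[val_idx], y_train[val_idx]
    print(f"fold {fold}: {len(fit_idx)} calibration rows, {len(val_idx)} validation rows")
```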

Preregistration Item

4.1.1 Describe calibration/validation data

Explanation & Rationale

The following items pertain to properties of the datasets used for calibration (training), validation, and testing.

Preregistration Item

If partitioning data for cross-validation or similar approach (delete as needed):

If using external / independent holdout data for model testing and evaluation (delete as needed):

4.2 Implementation verification

Explanation & Examples

Model implementation verification is the process of ensuring that the model has been correctly implemented, and that the model performs as described by the model description (Grimm et al. 2014). This process is distinct from model checking, which assesses the model’s performance in representing the system of interest (Conn et al. 2018).

  • Checks for implementation verification should include i) thoroughly checking the code for bugs or programming errors, and ii) confirming that the implemented model performs as described by the model description (Grimm et al. 2014).
  • Qualitative tests could include syntax checking of code and peer code review (Ivimey-Cook et al. 2023). Technical measures include unit tests (see the sketch below), or in-built checks within functions to prevent potential errors.
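
For example, a unit test for a hypothetical model component (here, a simple discrete-time logistic growth step) might look like the following; the function and expected behaviours are illustrative only:

```python
# test_model_components.py -- run with `pytest`
import math

def logistic_step(n, r, k):
    """Hypothetical model component: one discrete-time logistic growth step."""
    return n + r * n * (1 - n / k)

def test_no_change_at_carrying_capacity():
    # At carrying capacity the population should not change.
    assert math.isclose(logistic_step(100.0, r=0.1, k=100.0), 100.0)

def test_growth_below_carrying_capacity():
    # Below carrying capacity the population should increase.
    assert logistic_step(50.0, r=0.1, k=100.0) > 50.0
```
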
Preregistration Item

4.3 Model checking

Rationale & Explanation

“Model Checking” goes by many names (“conditional verification”, “quantitative verification”, “model output verification”), and refers to a series of analyses that assess a model’s performance in representing the system of interest (Conn et al. 2018). Model checking aids in diagnosing assumption violations, and reveals where a model might need to be altered to better represent the data, and therefore the system (Conn et al. 2018). Quantitative model checking diagnostics include goodness-of-fit tests and tests on residuals or errors, such as for heteroscedasticity, cross-correlation, and autocorrelation (Jakeman, Letcher, and Norton 2006).

4.3.1 Quantitative model checking

Preregistration Item

During this process, observed data, or data and patterns that guided model design and calibration, are compared to model output in order to identify if and where there are any systematic differences.
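
As an example, residual diagnostics for heteroscedasticity and autocorrelation could be scripted roughly as below; the fitted model shown is a simulated placeholder standing in for the calibrated model from earlier sections:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

# Simulated placeholder fit; in practice use the calibrated model and its design matrix.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.4, size=200)
fit = sm.OLS(y, X).fit()

# Heteroscedasticity: Breusch-Pagan test on the residuals.
_, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value: {bp_pvalue:.3f}")

# Autocorrelation: Durbin-Watson statistic (values near 2 suggest little autocorrelation).
print(f"Durbin-Watson statistic: {durbin_watson(fit.resid):.2f}")
```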

4.3.2 Qualitative model checking

Explanation

This step is largely informal and case-specific, but requires ‘face validation’ with model users / clients / managers who were not involved in the development of the model, to assess whether the interactions and outcomes of the model are feasible and defensible (Grimm et al. 2014). This process is sometimes called a “laugh test” or a “pub test”, and in addition to checking the model’s believability, it builds the client’s confidence in the model (Jakeman, Letcher, and Norton 2006). Face validation could include structured walk-throughs, or presenting descriptions, visualisations or summaries of model results to experts for assessment.

Preregistration Item

4.3.3 Assumption Violation Checks

Preregistration Item

The consequences of assumption violations on the interpretation of results should be assessed (Araújo et al. 2019).

5 Model Validation and Evaluation

Explanation

The model validation and evaluation phase comprises a suite of analyses that collectively inform inferences about whether, and under what conditions, a model is suitable to meet its intended purpose (Augusiak, Van den Brink, and Grimm 2014). Errors in the design and implementation of the model, and their implications for model output, are assessed. Ideally, independent data are compared against model outputs to assess whether the model exhibits the accuracy required for its intended purpose. The outcomes of these analyses build confidence in the model applications and increase understanding of the model’s strengths and limitations. Model evaluation, including model analysis, should complement model checking and should consider over-fitting and extrapolation. The higher the proportion of calibrated or uncertain parameters, “the greater the risk that the model seems to work correctly, but for the wrong reasons” (citation). Evaluation thus complements model checking because it can rule out the possibility that the model fits the calibration data well but has not captured the relevant ecological mechanisms of the system pertinent to the research question or the decision problem underpinning the model (Grimm et al. 2014). Evaluation of model outputs against external data, in conjunction with the results from model checking, provides information about the structural realism, and therefore credibility, of the model (Grimm and Berger 2016).

5.1 Model output corroboration

Explanation

Ideally, model outputs or predictions are compared to independent data and patterns that were not used to develop, parameterise, or verify the model. Testing against a dataset of response and predictor variables that is spatially and/or temporally independent from the training dataset minimises the risk of artificially inflating model performance measures (Araújo et al. 2019). Although the corroboration of model outputs against an independent validation dataset is considered the ‘gold standard’ for showing that a model properly represents the internal organisation of the system, model validation is not always possible because empirical experiments are infeasible or model users are working on rapid-response time-frames (which is often why ecologists model in the first place) (Grimm et al. 2014). Independent predictions might instead be tested on sub-models. Alternatively, patterns in model output that are robust and seem characteristic of the system can be identified and evaluated in consultation with the literature or with experts to judge how accurate the model output is (Grimm et al. 2014).

Preregistration Item

5.2 Choose performance metrics and criteria

Explanation

Model performance can be quantified by a range of tests, including measures of agreement between predictions and independent observations, or estimates of accuracy, bias, calibration, discrimination, refinement, resolution and skill (Araújo et al. 2019). Note that the performance metrics and criteria in this section are used for evaluating the structured and parameterised model, ideally on independent holdout data, so this step is additional to any performance criteria used for determining model structure or parameterisation in section 3.4.
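
By way of illustration, a few common agreement metrics for continuous predictions can be computed as in the sketch below; the observed and predicted values are placeholders, and the metrics actually reported should be those justified in the preregistration item that follows:

```python
import numpy as np

# Placeholder observed values and model predictions on the independent holdout data.
observed = np.array([2.3, 4.1, 3.8, 5.0, 1.9])
predicted = np.array([2.6, 3.9, 3.5, 5.4, 2.2])

errors = predicted - observed
rmse = np.sqrt(np.mean(errors ** 2))        # overall accuracy
bias = np.mean(errors)                      # systematic over- or under-prediction
r = np.corrcoef(observed, predicted)[0, 1]  # discrimination between high and low values

print(f"RMSE = {rmse:.2f}, bias = {bias:.2f}, Pearson r = {r:.2f}")
```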

Preregistration Item

5.3 Model analysis

Rationale & Explanation

Uncertainty in models arises from incomplete system understanding (e.g. which processes to include, or which processes interact), from imprecise, finite and sparse data measurements, and from uncertainty in the input conditions and scenarios for model simulations or runs (Jakeman, Letcher, and Norton 2006). Non-technical uncertainties can also be introduced throughout the modelling process, such as uncertainties arising from problem-framing, indeterminacies, and modeller or client values (Jakeman, Letcher, and Norton 2006).

The purpose of model analysis is to prevent blind trust in the model by understanding how model outputs have emerged, and to ‘challenge’ the model by verifying whether the model is still believable and fit for purpose if one or more parameters are changed (Grimm et al. 2014).

Model analysis typically consists of sensitivity analyses preceded by uncertainty analyses (Saltelli et al. 2019), together with a suite of other simulation or computational experiments. The aim of these experiments is to increase understanding of the model behaviour by identifying which processes and process interactions explain characteristic behaviours of the model system (Grimm et al. 2014). Uncertainty analyses and sensitivity analyses augment one another in drawing conclusions about model uncertainty.

Because the results from a full suite of sensitivity analysis and uncertainty analysis can be difficult to interpret due to the number and complexity of causal relations examined (Jakeman, Letcher, and Norton 2006), it is useful for the analyst to relate the choice of analysis to the modelling context, purpose and analytical objectives defined in the problem formulation phase, in tandem with any critical uncertainties that have emerged during model development and testing prior to this point.

5.3.1 Uncertainty Analyses

Explanation

Uncertainty can arise from different modelling techniques, response data and predictor variables (Araújo et al. 2019). Uncertainty analyses characterise the uncertainty in model outputs and identify how uncertainty in model parameters affects uncertainty in model output, but they do not identify which model assumptions drive this behaviour (Grimm et al. 2014; Saltelli et al. 2019). Uncertainty analyses can include, for example, propagating known uncertainties through the model, or investigating the effect of different model scenarios with different combinations of parameters and modelling techniques (Araújo et al. 2019). They could also include characterising the output distribution, such as by constructing it empirically from model output data points, extracting summary statistics (e.g. the mean, median and variance) from this distribution, and perhaps constructing confidence intervals on the mean (Saltelli et al. 2019).
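
For example, propagating parameter uncertainty through the model by Monte Carlo simulation and summarising the resulting output distribution could be sketched as below; the toy model and parameter distributions are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2024)
n_draws = 10_000

# Illustrative parameter uncertainty: distributions for growth rate and carrying capacity.
r_draws = rng.normal(loc=0.1, scale=0.02, size=n_draws)
k_draws = rng.normal(loc=100.0, scale=10.0, size=n_draws)

def project(r, k, n0=10.0, steps=20):
    """Toy model output: population after 20 discrete logistic growth steps."""
    n = n0
    for _ in range(steps):
        n = n + r * n * (1 - n / k)
    return n

# Propagate the parameter draws through the model and summarise the output distribution.
outputs = np.array([project(r, k) for r, k in zip(r_draws, k_draws)])
print(f"mean = {outputs.mean():.1f}, median = {np.median(outputs):.1f}")
print(f"95% interval = ({np.percentile(outputs, 2.5):.1f}, {np.percentile(outputs, 97.5):.1f})")
```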

Preregistration Item

5.3.2 Sensitivity analyses

Explanation

Sensitivity analysis examines how uncertainty in model outputs can be apportioned to different sources of uncertainty in model input (Saltelli et al. 2019).
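
A minimal one-at-a-time sketch is shown below, reusing the toy projection from the uncertainty-analysis sketch above; variance-based methods (e.g. Sobol indices) are generally preferable when input interactions matter, but follow the same logic of attributing output variation to inputs:

```python
def project(r, k, n0=10.0, steps=20):
    """Toy logistic projection reused from the uncertainty-analysis sketch."""
    n = n0
    for _ in range(steps):
        n = n + r * n * (1 - n / k)
    return n

baseline = {"r": 0.1, "k": 100.0}
perturbation = 0.10  # vary each input by +/- 10%, one at a time

for name in baseline:
    low, high = dict(baseline), dict(baseline)
    low[name] *= 1 - perturbation
    high[name] *= 1 + perturbation
    # Output change attributable to this input alone, holding the others at baseline.
    spread = project(**high) - project(**low)
    print(f"output change for +/-10% in {name}: {spread:.2f}")
```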

Preregistration Item

5.3.3 Model application or scenario analysis

Preregistration Item

5.3.4 Other simulation experiments / robustness analyses

Preregistration Item

References

Araújo, MB, RP Anderson, A Márcia Barbosa, CM Beale, CF Dormann, R Early, RA Garcia, et al. 2019. “Standards for Distribution Models in Biodiversity Assessments.” Science Advances 5 (1): eaat4858.
Augusiak, Jacqueline, Paul J Van den Brink, and Volker Grimm. 2014. “Merging Validation and Evaluation of Ecological Models to ‘Evaludation’: A Review of Terminology and a Practical Approach.” Ecological Modelling 280: 117–28.
Barnard, David M., Matthew J. Germino, David S. Pilliod, Robert S. Arkle, Cara Applestein, Bill E. Davidson, and Matthew R. Fisk. 2019. “Can’t See the Random Forest for the Decision Trees: Selecting Predictive Models for Restoration Ecology.” Restoration Ecology.
Boets, Pieter, Dries Landuyt, Gert Everaert, Steven Broekx, and Peter L M Goethals. 2015. “Evaluation and Comparison of Data-Driven and Knowledge-Supported Bayesian Belief Networks to Assess the Habitat Suitability for Alien Macroinvertebrates.” Environmental Modelling & Software 74: 92–103.
Cartwright, Samantha J, Katharine M Bowgen, Catherine Collop, Kieran Hyder, Jacob Nabe-Nielsen, Richard Stafford, Richard A Stillman, Robert B Thorpe, and Richard M Sibly. 2016. “Communicating Complex Ecological Models to Non-Scientist End Users.” Ecological Modelling 338: 51–59.
Conn, Paul B, Devin S Johnson, Perry J Williams, Sharon R Melin, and Mevin B Hooten. 2018. “A Guide to Bayesian Model Checking for Ecologists.” Ecological Monographs 88 (4): 526–42.
Dwork, Cynthia, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. 2015. “The reusable holdout: Preserving validity in adaptive data analysis.” Science 349 (6248): 636–38. https://doi.org/10.1126/science.aaa9375.
Fraser, Hannah, Libby Rumpff, Jian D L Yen, Doug Robinson, and Brendan A Wintle. 2017. “Integrated Models to Support Multiobjective Ecological Restoration Decisions.” Conservation Biology 31 (6): 1418–27.
Grimm, Volker, Jacqueline Augusiak, Andreas Focks, Béatrice M Frank, Faten Gabsi, Alice S A Johnston, Chun Liu, et al. 2014. “Towards Better Modelling and Decision Support: Documenting Model Development, Testing, and Analysis Using TRACE.” Ecological Modelling 280: 129–39.
Grimm, Volker, and Uta Berger. 2016. “Structural Realism, Emergence, and Predictions in Next-Generation Ecological Modelling: Synthesis from a Special Issue.” Ecological Modelling 326: 177–87. https://doi.org/10.1016/j.ecolmodel.2016.01.001.
Ivimey-Cook, Edward R, Joel L Pick, Kevin R Bairos-Novak, Antica Culina, Elliot Gould, Matthew Grainger, Benjamin M Marshall, et al. 2023. “Implementing Code Review in the Scientific Workflow: Insights from Ecology and Evolutionary Biology.” Journal of Evolutionary Biology. https://doi.org/10.1111/jeb.14230.
Jakeman, A J, R A Letcher, and J P Norton. 2006. “Ten Iterative Steps in Development and Evaluation of Environmental Models.” Environmental Modelling & Software 21 (5): 602–14.
Liu, Charles C., and Murray Aitkin. 2008. “Bayes Factors: Prior Sensitivity and Model Generalizability.” Journal of Mathematical Psychology 52 (6): 362–75. https://doi.org/10.1016/j.jmp.2008.03.002.
Liu, Zelin, Changhui Peng, Timothy Work, Jean-Noel Candau, Annie DesRochers, and Daniel Kneeshaw. 2018. “Application of Machine-Learning Methods in Forest Ecology: Recent Progress and Future Challenges.” Environmental Reviews 26 (4): 339–50.
Mahmoud, Mohammed, Yuqiong Liu, Holly Hartmann, Steven Stewart, Thorsten Wagener, Darius Semmens, Robert Stewart, et al. 2009. “A Formal Framework for Scenario Development in Support of Environmental Decision-Making.” Environmental Modelling & Software 24 (7): 798–808.
McDonald-Madden, Eve, Peter W. J. Baxter, and Hugh P. Possingham. 2008. “Making Robust Decisions for Conservation with Restricted Money and Knowledge.” Journal of Applied Ecology 45 (6): 1630–38.
Moallemi, Enayat A., Sondoss Elsawah, and Michael J. Ryan. 2019. “Strengthening ‘Good’ Modelling Practices in Robust Decision Support: A Reporting Guideline for Combining Multiple Model-Based Methods.” Mathematics and Computers in Simulation.
Moon, Katie, Angela M. Guerrero, Vanessa. M. Adams, Duan Biggs, Deborah A. Blackman, Luke Craven, Helen Dickinson, and Helen Ross. 2019. “Mental Models for Conservation Research and Practice.” Conservation Letters 12 (3): e12642.
Saltelli, Andrea, Ksenia Aleksankina, William Becker, Pamela Fennell, Federico Ferretti, Niels Holst, Sushan Li, and Qiongli Wu. 2019. “Why so Many Published Sensitivity Analyses Are False: A Systematic Review of Sensitivity Analysis Practices.” Environmental Modelling & Software 114: 29–39.
White, Craig R, and Dustin J Marshall. 2019. “Should We Care If Models Are Phenomenological or Mechanistic.” Trends in Ecology & Evolution 34 (4): 276–78.
Yates, KL, PJ Bouchet, MJ Caley, K Mengersen, CF Randin, S Parnell, AH Fielding, et al. 2018. “Outstanding Challenges in the Transferability of Ecological Models.” Trends in Ecology & Evolution 33 (10): 790–802.