| tags: [ QRP ] categories: [meeting ]

2018-09-08 meeting QRP survey design

Themis, Ethics

  • [x] Location / public / private
  • [x] Named Researcher credentials and experience
RIOT accreditation counts for the training question. Just give one sentence 
spiel for the other questions.

Survey Design

Estimating Prevalence and Risk of QRPs

\(Risk = Likelihood \times Consequence\)


% papers or researchers engaging in the practice.

If researchers:

  • can compare the number of estimated vs. self-reported? Some interesting questions about whether self reporters estimate the prevalence to be greater than non-self reporters, is the self reported rate lower than the estimated rate?
  • In the discussion section, might be able to interpret in the context of other QRP papers, eg Fraser et al and most other QRP survey papers. Directly comparable measures (minus the sample of course).

If papers, might be able to measure directly some of these, e.g. during the systematic review. This would only work for some particular QRPs, namely the “failure to report” practices.

I think this gives a better or more accurate estimate of “risk”, though. Since people are authoring many papers etc.. Estimating the likelihood of the QRP across studies gives a better sense of how distorted the literature is.. type I error rate across whole field. And ultimately, our main concern is about the overall bias present in the evidence-base / literature.

HF thinks that a comparison of impersonal estimates vs. self-reporting of QRP 
engagement is not really important. Because it really only tells you how good 
people are at estimating overall prevalence, and maybe provides some information 
about how forthcoming people are... So asking people to estimate the number of 
papers / studies in which people engage in a QRP provides complementary information.

Framing the question

When asking people to both self-report and also to estimate the prevalence of these 
practices, it's important to make sure that we're not eliciting how *common the 
method is* but rather, of the times that people are using this method, how 
prevalent is the method?

Frequency of engagement

HF suggests not to bother. She used this measure in her paper (Never, occasionally, 
often, etc.), but apart from presenting the frequency of these, doesn't think it 
reveals a great deal of useful information. Also, it might be difficult to disentangle
the frequency of engaging the questionable practice from the frequency you use 
the particular method / practice.


How to estimate or elicit judgments about this?

  1. Use the proportion of those who say it is “definitely” unacceptable, (and/or) “maybe” unacceptable, as a proxy for the severity of the practice (e.g. the voting exercise we discussed last time).

Do you consider the practice defensible? (Yes, No, Maybe or Depends, Unsure?).

  • But this doesn’t really get at the degree to which the practice is likely to cause a type I error. Some practices might be more severe than others in how they skew the model to a positive signal/finding.
  1. Elicit judgment about how likely it is that the practice will result in a false positive.

What is the chance (0 - 100%) that the practice will artificially skew the model/analysis to a positive signal / finding?

LR thinks the second option is the best, namely because people make different 
value judgments, and might draw the line differently. So, we will need to unpack
this a bit further. One method to do so is to provide a visual / scenario. 
We discussed the working definition of QRPs for non-NHST research, such as 
providing model outputs. We can formulate the question in one of two ways:

1. Provide some model output and position it as if there was 100% best practice 
in terms of reporting and other practices marked as questionable. Then we ask 
participants "How much would the uncertainty around this model output be reduced
if you engaged in this practice?" Instead of just asking for a single estimate, 
we can ask them to provide the *worst* it could be, the *best* it could be, and 
on average, what it would be.

2. The second option is to ask them, :"if you *didn't* engage in this QRP, how 
much would the uncertainty around the model outputs increase?" Again, we elicit 
the worst, best, and on average estimates.

We can use a slider to capture their estimated distributions. HF says the slider 
in Qualtrics is crap. We could use a shiny app instead of qualtrics, and then we 
might have better control over the shape of the output dataset too (qualtrics 
outputs are terrible, most definitely not tidy).

After getting them to estimate the consequences of the practice, next we ask 
them to make value judgments about the practice... Is it defensible? "Yes", "No", 
"possibly or depends". This will provide some interesting analysis because people 
might draw the line differently!!! You would assume that people who don't think 
the practice is questionable will say that it *won't* skew the model's output in 
any way... Another benefit of asking this question this way, is that we ensure 
people are really centreing on our working definition of the QRP when trying to 
elicit the consequences.

Another very important point that LR raised, was about tieing in the impact of 
the uncertainty on the management decision under consideration... in my working 
definition of the QRP this is a realy important facet. So we need to frame in 
the scenario description that the uncertainty around the model outputs as a result 
of the research practice is that it pushes the DM or the manager to either a) a 
decision at all, or b) a particular decision.

But the tolerance to uncertainty will affect the decision... So do we have a 
decision rule provided in the scenario about what the decision is based on the 
model outputs? OR do we leave this up to the experts? There is a sliding scale of 
tolerance to uncertainty, e.g. when knowing which decision you're in... if massive 
uncertainty, some people wouldn't want to make a decision...

but it definitely affects the decision... two options. leave in the question.. 
two, put onto participants to make this judgment. 

BUT. Most people might be reluctant to think about this because they will say 
that it's up to the managers, so could be worth providing the decision rules, 
and stating that the managers have said the decision rules are XYZ.

IMO and as an aside it's a QRP if in as an analyst or researcher you fail to have 
a systematic method for choosing between alternatives based ont the system model 
outputs (or perhaps just bad practice, "decision blindeness", and B) definitely 
questionable if you fail to report your systematic method for choosing between 
alternatives. SO, it's perhaps just worth having this in the description.

How to frame as a question... e.g. "would it be advisable to thin or not".. what 
would your decision be if you did this particular QRP.

But coming back to this definition of QRPs… it’s not just about uncertianty… it might also be about the direction or the estimate…

Comparing Expert and non-expert estimates

Instead of funnelling people to asnwer only the methods on which they are an 
expert on, we can get participants to answer for fields on which they aren't 
experts. For example, we can ask that non-Bayesians answer the Bayesian questions. 
By doing this we can try to get a sense of the bias that experts have towards 
tolerance to questionable practices within their own disciplines.

In terms of avoiding instances where people who have no idea what the research 
practice actually is but are answering for that question, we can have a bail 
out / skip option, where if people don't know what the method is that we are 
asking about, they can skip the question.

LR says this is a common sort of question done in expert elicitation / judgment 

What methods do you have the most experience in? Which of these fields do you have experience in?

question order and truth-telling

Plain language statement: using the term “questionable”, as in “questionable reserach practices”? Does that prevent participants from being forth-coming in their self reporting of engaging in these practices?

Question order: Ask participants to estimate the prevalence of these practices first? “If everyone else is doing it, then it must be ok”.

HF and LR agree that framing the study is really important for ensuring that 
people truthfully self-report. HF used the term 'traditional' research practies 
in her PLS for the consent forms. The location of where we elicit people's value 
judgments on the defensibility of particular practices is therefore also 

In the design of the survey, will also need to provide an example or description of the research practice, perhaps in a pop-up text box to reduce textual information overload to participants. Some qrp’s might need this, and others probably won’t.