Random responses inflate statistical estimates in heavily skewed addictions data

King, Kevin M.; Kim, Dale S.; & McCabe, Connor J. (2018). Random responses inflate statistical estimates in heavily skewed addictions data. Drug and Alcohol Dependence, 183, 102-110. PMCID: PMC5803341 NIHMSID: NIHMS928743

Background: Some respondents may respond at random to self-report surveys, rather than responding conscientiously (Meade and Craig, 2012), and this has only recently come to the attention of researchers in the addictions field (Godinho et al., 2016). Almost no research in the published addictions literature has reported screening for random responses. We illustrate how random responses can bias statistical estimates using simulated and real data, and how this is especially problematic in skewed data, as is common with substance use outcomes.

Method: We first tested the effects of varying amounts and types of random responses on covariance-based statistical estimates in distributions with varying amounts of skew. We replicated these findings in correlations from a real dataset (Add Health) by replacing varying amounts of real data with simulated random responses.

Results: Skew and the proportion of random responses influenced the amount and direction of bias. When the data were not skewed, uniformly random responses deflated estimates, while long-string random responses inflated estimates. As the distributions became more skewed, all types of random responses began to inflate estimates, even at very small proportions. We observed similar effects in the Add Health data.

Conclusions: Failing to screen for random responses in survey data produces biased statistical estimates, and data with only 2.5% random responses can inflate covariance-based estimates (i.e., correlations, Cronbach’s alpha, regression coefficients, factor loadings, etc.) when data are heavily skewed. Screening for random responses can substantially improve data quality, reliability and validity.
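The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure; it is a minimal illustration under stated assumptions: two 0-4 survey items are generated from correlated latent normal scores (latent correlation 0.3), the cut points are chosen so that most respondents answer 0 (heavy skew), and a fraction of respondents is then replaced with uniformly random answers. The function names, thresholds, sample size, seed, and contamination levels are all hypothetical choices made for this example.

```python
import random
from math import sqrt

# Assumed cut points on a latent normal score; they make 0 the dominant answer.
THRESHOLDS = (1.3, 1.8, 2.3, 2.8)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def to_item(z):
    """Map a latent score to a 0-4 response: the number of cut points exceeded."""
    return sum(z > t for t in THRESHOLDS)


def simulate(n, rho, rng):
    """Draw n conscientious respondents answering two heavily skewed items."""
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x.append(to_item(z1))
        y.append(to_item(z2))
    return x, y


def contaminate(x, y, proportion, rng):
    """Replace a given proportion of respondents with uniform random answers."""
    x, y = list(x), list(y)
    k = round(proportion * len(x))
    for i in rng.sample(range(len(x)), k):
        x[i] = rng.randint(0, 4)
        y[i] = rng.randint(0, 4)
    return x, y


rng = random.Random(2018)  # fixed seed so the run is repeatable
x, y = simulate(20000, 0.3, rng)
clean_r = pearson(x, y)
print(f"no random responders: r = {clean_r:.3f}")

results = {}
for p in (0.025, 0.05, 0.10):
    cx, cy = contaminate(x, y, p, rng)
    results[p] = pearson(cx, cy)
    print(f"{p:.1%} random responders: r = {results[p]:.3f}")
```

In this setup the random responders answer the two items independently of each other, yet their average on both items (about 2) sits far above the skewed sample's average (close to 0). Mixing the two groups therefore adds shared between-group variance to both items, which pushes the observed correlation upward rather than diluting it.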


Statistical bias
Online surveys
Invalid responses
Skewed data


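Reference type: JOUR
Authors: King, Kevin M.; Kim, Dale S.; McCabe, Connor J.
Year: 2018
Journal: Drug and Alcohol Dependence
Volume: 183
Pages: 102-110
Date: December 9, 2017
ISSN: 0376-8716
DOI: 10.1016/j.drugalcdep.2017.10.033
PMCID: PMC5803341
NIHMSID: NIHMS928743
Record ID: 7225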