International Journal of Cardiology
Article in Press

How easily can omission of patients, or selection amongst poorly-reproducible measurements, create artificial correlations? Methods for detection and implications for observational research design in cardiology

Received 6 August 2011; received in revised form 28 November 2011; accepted 3 December 2011; published online 30 January 2012.
Corrected Proof

Abstract 

Background

When reported correlation coefficients seem too high to be true, does investigative verification of source data provide suitable reassurance? This study tests how easily omission of patients, or selection amongst irreproducible measurements, generates fictitious strong correlations without any data fabrication.

Method and results

Two forms of manipulation are applied to a pair of normally-distributed, uncorrelated variables: first, exclusion of patients least favourable to a hypothesised association and, second, making multiple poorly-reproducible measurements per patient and choosing the most supportive.

Excluding patients raises correlations powerfully, from 0.0±0.11 (no patients omitted) to 0.40±0.11 (one-fifth omitted), 0.59±0.08 (one-third omitted) and 0.78±0.05 (half omitted). Study size offers no protection: omitting just one-fifth of 75 patients (i.e. publishing 60) makes 92% of correlations statistically significant.
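The omission effect can be explored with a short simulation. The following is a minimal sketch, not the authors' procedure: the abstract does not state the exact exclusion rule, so the sketch assumes that the "least favourable" patients are those with the smallest product of standardised x and y values. The function names, the default of 75 patients and the number of trials are illustrative choices, and the output need not match the figures reported above.

```python
import numpy as np


def omit_least_favourable(x, y, fraction):
    """Drop the given fraction of patients that most oppose a positive correlation.

    Assumption (not from the source): a patient's "favourability" is the
    product of the standardised x and y values.
    """
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    n_drop = int(round(fraction * len(x)))
    # argsort is ascending, so the first n_drop indices are the least favourable.
    keep = np.argsort(zx * zy)[n_drop:]
    return x[keep], y[keep]


def simulate_omission(n_patients=75, fraction=0.2, n_trials=500, seed=0):
    """Return mean and SD of Pearson r across simulated studies after omission."""
    rng = np.random.default_rng(seed)
    r = np.empty(n_trials)
    for i in range(n_trials):
        # Two independent (truly uncorrelated) normal variables.
        x = rng.standard_normal(n_patients)
        y = rng.standard_normal(n_patients)
        x_kept, y_kept = omit_least_favourable(x, y, fraction)
        r[i] = np.corrcoef(x_kept, y_kept)[0, 1]
    return r.mean(), r.std()


if __name__ == "__main__":
    for fraction in (0.0, 0.2, 1 / 3, 0.5):
        mean_r, sd_r = simulate_omission(fraction=fraction)
        print(f"omitted {fraction:.2f}: r = {mean_r:.2f} ± {sd_r:.2f}")
```

Ranking patients by the product of standardised values is only one plausible reading of "least favourable"; a different rule (for example, distance from a hypothesised regression line) would give different numbers but can be swapped into `omit_least_favourable` without changing the rest of the sketch.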

Worse, simply selecting the most favourable amongst several measurements raises correlations from 0.0±0.12 (single measurement of each variable) to 0.73±0.06 (best of 2), and 0.90±0.03 (best of 4). 100% of correlation coefficients become statistically significant.
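Selection amongst repeated measurements can be sketched in the same way. This is an illustration under stated assumptions rather than the authors' model: each patient is assumed to have k measurements of each variable, every measurement is an independent standard normal draw (the limiting case of no reproducibility at all), and the "most supportive" choice is taken to be the pairing of one x and one y measurement with the largest product. The function name and defaults are hypothetical, and the output need not match the figures reported above.

```python
import numpy as np


def simulate_best_of_k(n_patients=75, k=2, n_trials=500, seed=0):
    """Return mean and SD of Pearson r when the best of k measurements is chosen.

    Assumptions (not from the source): measurements are independent standard
    normal draws, and the chosen pair maximises the product x * y.
    """
    rng = np.random.default_rng(seed)
    rows = np.arange(n_patients)
    r = np.empty(n_trials)
    for i in range(n_trials):
        x = rng.standard_normal((n_patients, k))
        y = rng.standard_normal((n_patients, k))
        # All k * k pairings of an x measurement with a y measurement.
        products = x[:, :, None] * y[:, None, :]
        best = products.reshape(n_patients, -1).argmax(axis=1)
        # Convert the flat index back to (x index, y index).
        ix, iy = np.divmod(best, k)
        r[i] = np.corrcoef(x[rows, ix], y[rows, iy])[0, 1]
    return r.mean(), r.std()


if __name__ == "__main__":
    for k in (1, 2, 4):
        mean_r, sd_r = simulate_best_of_k(k=k)
        print(f"best of {k}: r = {mean_r:.2f} ± {sd_r:.2f}")
```

With k = 1 there is nothing to choose between, so the sketch reduces to an ordinary correlation of two unrelated variables. Partially reproducible measurements could be modelled by adding a shared per-patient component to each draw.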

Scatterplots may reveal a telltale “shave sign” or “bite sign”. Simple statistical tests are presented for these suspicious signatures in single or multiple studies.

Conclusion

Correlations are vulnerable to data manipulation. Cardiology is especially vulnerable to patient deletion (because we cardiologists may ourselves completely control enrolment and measurement) and to selection of “best” measurements (because alternative heartbeats are numerous, and some modalities are poorly reproducible). Source data verification cannot detect these manipulations, but statistical tests might highlight suspicious data and, by aggregating across studies, unreliable laboratories or research fields. Cardiological correlation research needs adequately-informed planning and guarantees of integrity, with teeth.

Keywords: Correlation coefficients, Observational studies, Prediction

PII: S0167-5273(11)02178-4

doi:10.1016/j.ijcard.2011.12.018
