Modeling heaping in self-reported cigarette counts

Research output: Contribution to journal › Article › peer-review

60 Scopus citations


In studies of smoking behavior, some subjects report exact cigarette counts, whereas others report rounded-off counts, particularly multiples of 20, 10 or 5. This form of data reporting error, known as heaping, can bias the estimation of parameters of interest such as mean cigarette consumption. We present a model to describe heaped count data from a randomized trial of bupropion treatment for smoking cessation. The model posits that the reported cigarette count is a deterministic function of an underlying precise cigarette count variable and a heaping behavior variable, both of which are at best partially observed. To account for an excess of zeros, as would likely occur in a smoking cessation study where some subjects successfully quit, we model the underlying count variable with zero-inflated count distributions. We study the sensitivity of the inference on smoking cessation by fitting various models that either do or do not account for heaping and zero inflation, comparing the models by means of Bayes factors. Our results suggest that sufficiently rich models for both the underlying distribution and the heaping behavior are indispensable to obtaining a good fit with heaped smoking data. The analyses moreover reveal that bupropion has a significant effect on the fraction abstinent, but not on mean cigarette consumption among the non-abstinent.
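The structure described in the abstract can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not taken from the paper: the underlying count is zero-inflated Poisson, heaping behavior is independent of the underlying count, heaping rounds to the nearest multiple of 1, 5, 10 or 20 (halves rounded up), and all numeric values are hypothetical. The function names are likewise illustrative.

```python
import math

def reported_count(true_count, unit):
    """Deterministic heaping rule (an assumption of this sketch):
    round the true count to the nearest multiple of `unit`,
    with halves rounded up. unit == 1 means exact reporting."""
    return unit * ((2 * true_count + unit) // (2 * unit))

def zip_pmf(x, pi_zero, lam):
    """Zero-inflated Poisson pmf: a point mass at zero with weight
    pi_zero (abstinent subjects) mixed with a Poisson(lam) count."""
    log_pois = -lam + x * math.log(lam) - math.lgamma(x + 1)
    return (pi_zero if x == 0 else 0.0) + (1.0 - pi_zero) * math.exp(log_pois)

def reported_pmf(y, pi_zero, lam, heap_probs, max_count=200):
    """Probability of reporting y, obtained by summing over every
    (true count, heaping unit) pair that maps to y. Counts above
    max_count are truncated, which is harmless for moderate lam."""
    total = 0.0
    for unit, p_unit in heap_probs.items():
        for x in range(max_count + 1):
            if reported_count(x, unit) == y:
                total += p_unit * zip_pmf(x, pi_zero, lam)
    return total

# Hypothetical parameter values, chosen only for illustration.
PI_ZERO = 0.3                                   # fraction abstinent
LAM = 15.0                                      # mean count among smokers
HEAP_PROBS = {1: 0.4, 5: 0.2, 10: 0.25, 20: 0.15}  # P(heaping unit)

if __name__ == "__main__":
    for y in (0, 13, 15, 20):
        print(y, reported_pmf(y, PI_ZERO, LAM, HEAP_PROBS))
```

Under these assumptions, a reported value such as 20 collects probability from every true count that any heaping unit rounds to 20, so it is more likely than the underlying distribution alone would imply, while a value such as 13 can arise only from exact reporting. A model that ignores heaping would attribute these spikes to the underlying count distribution instead.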

Original language: English (US)
Pages (from-to): 3789-3804
Number of pages: 16
Journal: Statistics in Medicine
Issue number: 19
State: Published - Aug 30 2008


Keywords

  • Bayesian inference
  • Heaped data
  • Rounded data
  • Smoking cessation
  • Zero-inflated negative binomial
  • Zero-inflated Poisson

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

