One hahaha, two hahaha, three hahaha…

(Picture of Count von Count taken from noticingappeals.com)

I just posted a working copy of a paper over at academia.edu: http://bit.ly/1qLRzVY. My co-authors and I received the proofs for it today. It is entitled “Count-based Research in Management: Suggestions for improvement”, by Dane Blevins, Eric Tsang, and me!

Counts of events are frequently used as outcome variables in a wide range of disciplines, including strategic management and organizational behavior (OB). For example, the number of patents obtained could be an outcome of interest to researchers in either camp, and things like the number of acquisitions or the number of corporate board interlocks could be of interest to a strategic management researcher. Even to a humble industrial psychologist, such as myself, a count of words or lines of code written might be an interesting index of performance/effectiveness.

The most basic way to approach counts (which are naturally bounded below by 0, and only take on integer values) is to use the Poisson distribution (named for Siméon Denis Poisson, one of the all-time great names for a mathematician). The Poisson distribution works quite well in dealing with those two conditions, but it imposes a pretty strict assumption that the variance of the data is equal to the mean of the data. When this isn’t the case, Poisson models can give some pretty spectacularly bad results; with over-dispersed data, the standard errors tend to come out too small, so effects look more significant than they really are. And real data is almost *always* over-dispersed relative to the basic Poisson.
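A quick way to eyeball the mean-equals-variance assumption is to divide the sample variance by the sample mean. The snippet below is a minimal sketch in Python, not code from our paper; the function name `dispersion_ratio` and the example counts are made up for illustration.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    A value near 1 is what a Poisson would produce; a value well
    above 1 suggests over-dispersion.
    """
    return variance(counts) / mean(counts)

# Hypothetical patent counts for eight firms (illustrative only).
patents = [0, 0, 1, 0, 3, 12, 0, 2]
print(round(dispersion_ratio(patents), 2))
```

This is only a rough check on the raw counts. It ignores any predictors, so it is a first look rather than a substitute for a proper test on a fitted model.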

In general, researchers are better off using the negative binomial as a default. This distribution is sometimes called the Gamma-Poisson, because it can be thought of this way: data y_1, …, y_n distributed as negative binomial can be modeled as Poisson counts whose rate parameters lambda_1, …, lambda_n are themselves drawn from an underlying gamma distribution with parameters alpha and beta. Letting the rates vary according to these two parameters allows the data to show more variability than the Poisson allows. In this way, the negative binomial can be thought of as a “robust” version of the Poisson: it tolerates greater variability.
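The Gamma-Poisson story is easy to simulate. Here is a small sketch using only the Python standard library, under some assumptions of mine: alpha is treated as the gamma shape and beta as the gamma scale (the convention of `random.gammavariate`), the values alpha = 2 and beta = 2 are arbitrary, and the helper names `poisson_draw` and `gamma_poisson_sample` are hypothetical.

```python
import math
import random
from statistics import mean, variance

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def gamma_poisson_sample(n, alpha, beta, rng):
    """Draw n counts, each with its own gamma-distributed Poisson rate.

    Assumption: alpha is the gamma shape and beta the gamma *scale*,
    which is the convention used by random.gammavariate.
    """
    return [poisson_draw(rng.gammavariate(alpha, beta), rng) for _ in range(n)]

rng = random.Random(2014)  # arbitrary seed, for reproducibility
nb_counts = gamma_poisson_sample(20000, alpha=2.0, beta=2.0, rng=rng)
pois_counts = [poisson_draw(4.0, rng) for _ in range(20000)]

# Both samples have a mean near alpha * beta = 4, but only the plain
# Poisson sample has a variance close to its mean.
print("gamma-Poisson:", round(mean(nb_counts), 2), round(variance(nb_counts), 2))
print("plain Poisson:", round(mean(pois_counts), 2), round(variance(pois_counts), 2))
```

Under this parameterization the mixture has mean alpha * beta and variance mean + mean^2 / alpha, so with these values the variance should land near 12 rather than 4. That extra mean^2 / alpha term is the additional variability the negative binomial tolerates.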

In addition, the kind of data that management researchers encounter often have a substantial number of zeros, more than these basic models allow for. This can sometimes be accounted for using zero-inflated models, which, basically, blend two pieces: a logistic model predicting whether an observation is a “structural” zero, and a count data model (Poisson or negative binomial) for the remaining observations.
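To make the blending concrete, here is a minimal sketch of the probabilities implied by a zero-inflated Poisson. It is an illustration rather than the specification from our paper: the names `poisson_pmf`, `zip_pmf`, and `pi_zero` are hypothetical, and the structural-zero probability is a fixed, made-up number instead of the output of a fitted logistic model.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson.

    pi_zero is the probability of a structural zero. In a regression it
    would come from the logistic part of the model; here it is a fixed,
    made-up number.
    """
    if k == 0:
        # A zero can be structural, or an ordinary Poisson zero.
        return pi_zero + (1.0 - pi_zero) * poisson_pmf(0, lam)
    return (1.0 - pi_zero) * poisson_pmf(k, lam)

# Illustrative values only: 30% structural zeros, Poisson rate of 2.
print("P(0), plain Poisson:", round(poisson_pmf(0, 2.0), 3))
print("P(0), zero-inflated:", round(zip_pmf(0, 0.3, 2.0), 3))
```

The zero-inflated version puts noticeably more probability on zero while the positive counts keep their Poisson shape, just scaled down by 1 - pi_zero. Swapping `poisson_pmf` for a negative binomial probability function would give the zero-inflated negative binomial in the same way.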

Our paper goes through a number of examples and uses a simulation study to help researchers figure out when to use one model versus another. We even provide a little decision tree figure to help make it even easier!