Half of a parameter

Science produces models that provide parsimonious descriptions of the world. In cognitive psychology, models regularly compete to explain a few phenomena. But models can survive experiment after experiment, both because of the difficulty of capturing participant’s nuanced behavior, and because models often make highly overlapping predictions. In these cases, a model succeeds through its relative parsimony.

In cognitive psychology, measures of information criteria, specifically Akaike’s and the Bayesian information criteria (Schwarz 1978; Akaike 1973), determine a winning model. These criteria measure complexity by tallying the number of parameters in a model. Long maths and specific assumptions justify the claim that a model with more parameters is more complex than a model with fewer parameters, and so does our intuition that a model is complex if it has many moving parts. Unfortunately, the assumptions fail in common situations, such as when the models are fit in a Bayesian rather than frequentist setting. An alphabet soup of other information criteria exists (in addition to Akaike’s the Bayesian criteria, there is the DIC, WAIC, KIC, NIC, TIC, etc), and these other criteria assign complexity more complexly. These criteria are sensitive not only to the number of parameters in a model but also to the varied roles that a parameter can have. They assign the complexity of a model based on the model’s number of ‘effective parameters.’

For intuition on why tallying the number of parameters is an insufficient measure complexity, consider two models of response time. Both models assume that response times are distributed according to a normal distribution. In this simple example, the variability of the distributions are known, and so the models have only a single free parameter, which is the average response time. In one model, that average can be any number, a value from negative to positive infinity. This is the kind of model implicitly assumed when we conduct a t-test on the averages of response times. Of course the model is a simplification of response times, but this model also has the glaring flaw that it allows the average response time to be negative; a participant cannot respond to stimulation before the stimulus appears. The second model addresses this flaw by adding the constraint that the average response time cannot be negative. Although the second model is more constrained, the models have the same number of parameters. The second model can only account for half of the patterns of data as the first; the second model is twice as parsimonious as the first (Gelman, Hwang, and Vehtari 2014). To adjudicate between these model requires a measure that is sensitive to complexity but does not simply tally the number of parameters in each model.

References

Akaike, Hirotogu. 1973. “Information Theory and an Extension of the Maximum Likelihood Principle.” In Proceedings of the Second International Symposium on Information Theory, 267–81.
Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding Predictive Information Criteria for Bayesian Models.” Statistics and Computing 24 (6): 997–1016. https://doi.org/10.1007/s11222-013-9416-2.
Schwarz, Gideon. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6 (2): 461–64. https://doi.org/10.1214/aos/1176344136.

View Page Source

Patrick Sadil
Patrick Sadil
Research Associate; Biostatistics Faculty