Logit models don’t make mistakes, people do


I recently found myself reading the large methodological literature on logit models in sociology again. And I think I may have found a way to explain certain things about logit models more intuitively. I’ll try out some ideas here to see if they make sense to people.

The functional form assumption of logit models

It’s worth writing out the basic assumption of logit models first. For a logit model to estimate something interpretable, a functional form assumption on the conditional probability $Pr(Y=1 \mid X)$ is needed: \(Pr(Y=1 \mid X)=\frac{\exp(\alpha+\beta X)}{1+\exp(\alpha+\beta X)}.\)

I’ll call this the logit functional form. Given this assumption, using a logit model to regress $Y$ on $X$ will give you consistent estimates of $\alpha$ and $\beta$.
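To make this concrete, here is a minimal simulation sketch (in Python, assuming numpy and statsmodels are available; the DGP and parameter values are made up for illustration):

```python
# Simulate Y from the logit functional form and check that a logit
# regression of Y on X recovers alpha and beta.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
alpha, beta = -1.0, 0.5                    # illustrative true values

x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(alpha + beta * x)))  # Pr(Y=1 | X), logit form
y = rng.binomial(1, p)

res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(res.params)                          # close to [-1.0, 0.5]
```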

Why do nested logits not make sense?

Short answer: because you (the researcher) are making self-contradictory assumptions! In two nested logit models, the researcher first uses logit to regress $Y$ on $X_1$, then uses logit again to regress $Y$ on both $X_1$ and $X_2$. But to do this, the researcher implicitly assumes both that $Pr(Y=1 \mid X_1)=\frac{\exp(\alpha+\beta X_1)}{1+\exp(\alpha+\beta X_1)}$ and that $Pr(Y=1 \mid X_1, X_2)=\frac{\exp(\alpha+\beta_1 X_1 + \beta_2 X_2)}{1+\exp(\alpha+\beta_1 X_1 + \beta_2 X_2)}$. However, by the law of iterated expectations, $Pr(Y=1 \mid X_1)=E[Pr(Y=1 \mid X_1, X_2) \mid X_1]$, which means that if $Pr(Y=1 \mid X_1, X_2)$ has a logit functional form, it is generally impossible for $Pr(Y=1 \mid X_1)$ to also have a logit functional form. (This can be numerically verified with a simple DGP; see the sketch below.) Hence, by estimating nested logits, the researcher contradicts themselves. Logit, on the other hand, makes no mistakes.
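Here is one way to run that numerical check (a sketch assuming numpy and scipy, with an illustrative DGP in which $X_2 \sim N(0,1)$ is independent of $X_1$; all parameter values are made up). If the marginal log odds were linear in $X_1$, their second differences on an equally spaced grid would be zero:

```python
# Marginalize a logit-form Pr(Y=1 | X1, X2) over X2 by Monte Carlo
# and check whether the resulting Pr(Y=1 | X1) is still logit in X1.
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(0)
alpha, b1, b2 = -1.0, 0.5, 2.0           # illustrative values

x2 = rng.normal(size=1_000_000)          # X2 ~ N(0, 1), independent of X1
x1_grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
marginal = np.array([expit(alpha + b1 * x1 + b2 * x2).mean() for x1 in x1_grid])

# Linear log odds would make these second differences (approximately) zero.
print(np.diff(logit(marginal), n=2))     # clearly nonzero => not logit in X1
```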

Are logits biased if there's any unmodelled variation in $Y$?

Short answer: no! Recent methodological literature tends to give the impression that omitting covariates that are independent of the variable of interest will bias the estimated logit coefficient. But this impression is false. The reasoning in the literature amounts to saying that if $Pr(Y=1 \mid X_1, X_2)$ has a logit functional form $\frac{\exp(\alpha+\beta_1 X_1 + \beta_2 X_2)}{1+\exp(\alpha+\beta_1 X_1 + \beta_2 X_2)}$, then using a logit model to regress $Y$ on $X_1$ will give you a biased estimate of $\beta_1$, even if $X_1$ and $X_2$ are independent. This is technically true, but again, a researcher who doesn’t want to be self-contradictory just wouldn’t use a logit model to regress $Y$ on $X_1$ if they believe $Pr(Y=1 \mid X_1, X_2)$ has a logit functional form.
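For what it’s worth, the “technically true” part is easy to reproduce (a sketch under an assumed DGP, again using numpy and statsmodels with made-up parameter values):

```python
# Pr(Y=1 | X1, X2) has a logit form with X1 and X2 independent;
# a logit of Y on X1 alone still shrinks the coefficient toward zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500_000
alpha, b1, b2 = 0.0, 1.0, 2.0            # illustrative values

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # independent of x1
p = 1 / (1 + np.exp(-(alpha + b1 * x1 + b2 * x2)))
y = rng.binomial(1, p)

short = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
print(short.params[1])                   # noticeably below 1.0
```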

If, instead, we just assume $Pr(Y=1 \mid X_1)$ alone has a logit functional form $\frac{\exp(\alpha+\beta X_1)}{1+\exp(\alpha+\beta X_1)}$, then the logit model will simply give us consistent estimates of $\alpha$ and $\beta$, regardless of whether there’s any variation in $Y$ within each level of $X_1$. Hence, logit is not really “biased” in the presence of unmodelled variation, as long as the researcher doesn’t go against their own assumptions.
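The flip side can also be simulated (a sketch with an assumed DGP of my own: $Y$ is a threshold of a uniform $U$, and $X_2 := U$ “explains” all of the variation in $Y$ within levels of $X_1$):

```python
# Pr(Y=1 | X1) itself has the logit form; the unmodelled variable X2
# fully determines Y given X1, yet logit on X1 remains consistent.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
alpha, beta = -1.0, 0.5                  # illustrative true values

x1 = rng.normal(size=n)
u = rng.uniform(size=n)                  # unmodelled variation in Y
y = (u < 1 / (1 + np.exp(-(alpha + beta * x1)))).astype(int)
x2 = u                                   # "explains" Y within each X1

res = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
print(res.params)                        # still close to [-1.0, 0.5]
```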

What's the way forward?

Short answer: forget about logit, and forget about the linear probability model too, for that matter. Even if logit itself doesn’t make mistakes, it nevertheless makes it easy for researchers to do so, because the researcher has to assume the logit functional form in order to use logit. But there really is no need for us to assume any functional form at all, including the assumed functional form of the linear probability model, $Pr(Y=1 \mid X)=\alpha + \beta X$.

What we care about are conditional probabilities like $Pr(Y=1 \mid X)$, or perhaps log odds $\log \left[\frac{Pr(Y=1 \mid X)}{1-Pr(Y=1 \mid X)} \right]$ (or perhaps an effect measure such as a risk difference or a marginal odds ratio). These estimands are in no way tied to any specific model (logit, probit, linear probability model). In fact, since we, as humans, tend to make mistakes about functional form assumptions, we shouldn’t be given the power to determine the functional form at all. Given a chosen estimand, we can and should just let the machine take over and learn the functional form for us. Replacing logit with machine learning, we have nothing to lose and everything to gain.
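As a rough illustration of that estimand-first workflow (a sketch assuming scikit-learn; the learner, DGP, and plug-in estimator here are illustrative choices of mine, not a specific recommendation): pick the risk difference as the estimand, let a flexible classifier learn $Pr(Y=1 \mid X)$, and read the estimand off the fitted probabilities.

```python
# Estimate Pr(Y=1 | X1, X2) with a flexible learner, then compute a
# plug-in marginal risk difference for the binary exposure X1.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 50_000
x1 = rng.binomial(1, 0.5, size=n)        # binary exposure of interest
x2 = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-1.0 + 1.0 * x1 + 2.0 * x2)))
y = rng.binomial(1, p)

X = np.column_stack([x1, x2])
clf = GradientBoostingClassifier().fit(X, y)

# E[Pr(Y=1 | X1=1, X2)] - E[Pr(Y=1 | X1=0, X2)], averaged over X2
p1 = clf.predict_proba(np.column_stack([np.ones(n), x2]))[:, 1].mean()
p0 = clf.predict_proba(np.column_stack([np.zeros(n), x2]))[:, 1].mean()
print(p1 - p0)                           # plug-in risk difference
```

No logit functional form enters anywhere: swap in any other learner (ideally with cross-fitting and honest uncertainty quantification) and the estimand stays exactly the same.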
