Endogeneity has received attention in the past decade as a significant source of bias in the results reported in a wide variety of studies. Top journals may now desk-reject a paper if there is reason to believe that endogeneity is at play.

Endogeneity refers to the situation in which an explanatory (independent) variable is correlated with the error term.
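As a minimal sketch of that definition, consider a simple linear model with one regressor. The notation is an assumption made for illustration and does not come from the source: $y$ is the outcome, $x$ the explanatory variable, $\beta$ the true effect, and $e$ the error term.

```latex
y = \beta x + e, \qquad \operatorname{Cov}(x, e) \neq 0

\operatorname{plim}\, \hat{\beta}_{\mathrm{OLS}}
  = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
  = \beta + \frac{\operatorname{Cov}(x, e)}{\operatorname{Var}(x)}
```

The second line shows why this matters: when $\operatorname{Cov}(x, e) \neq 0$, the ordinary least squares estimate does not converge to $\beta$, no matter how large the sample is.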

To understand one reason why it is critical, it helps to look at the three criteria for **causality** between x and y (Holland, 1986; Kenny, 1979):

- y follows x in time
- y changes as x changes
- There are no other causes that would eliminate the relation between x and y.

Endogeneity can violate the third condition, and can have several **sources**. These are:

- Omitted variable:
  - Not including an important variable or control variable, such as testing the predictive power of EQ without controlling for IQ.
  - Omitting fixed effects.
  - Using random effects without justification.
  - In all other cases, using independent variables that are not exogenous (exogenous meaning that they are not predicted by the workings of the specific model).

- Omitted selection:
  - Comparing a treatment group to other, non-equivalent groups (the groups being compared need to be equivalent).
  - Comparing entities that are grouped nominally, but where selection into the group is not equivalent.
  - Using a non-representative sample, such as one produced by self-selection.

- Simultaneity:
  - Reverse causality.

- Measurement error:
  - Including imperfectly measured variables as independent variables without modelling the measurement error.

- Common method variance:
  - Independent and dependent variables are gathered from the same rating source.

- Inconsistent inference:
  - Using conventional standard errors without examining for heteroskedasticity.
  - Not using cluster-robust standard errors in panel data.

- Model misspecification:
  - Not correlating the disturbances of potentially endogenous regressors in mediation models (endogeneity should be tested using a Hausman test or an augmented regression).
  - Using full information estimators (e.g., ML or 3SLS) without comparing the estimates to a limited information estimator (e.g., 2SLS).
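The first source in the list, the omitted variable, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the chapter: the data are synthetic, the coefficients (a true EQ effect of 0.5, an IQ effect of 1.0) are invented for illustration, and `z` is a hypothetical instrument that influences EQ but has no direct effect on performance. The sketch compares a naive regression, a regression that controls for IQ, and a hand-rolled two-stage least squares (2SLS) estimate.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=50_000, true_effect=0.5, seed=0):
    """Generate synthetic data; every coefficient here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    iq = rng.normal(size=n)   # confounder: drives both EQ and performance
    z = rng.normal(size=n)    # hypothetical instrument: affects EQ only
    eq = 0.8 * iq + 0.8 * z + rng.normal(size=n)
    perf = true_effect * eq + 1.0 * iq + rng.normal(size=n)
    return iq, z, eq, perf


iq, z, eq, perf = simulate()

# Naive model: IQ is omitted, so it sits in the error term and is
# correlated with EQ. The slope on EQ absorbs part of the IQ effect.
naive = ols(perf, eq)[1]

# Controlled model: including IQ removes the omitted variable bias.
controlled = ols(perf, eq, iq)[1]

# 2SLS by hand, for the case where IQ is unobserved:
# stage 1 predicts EQ from the instrument, stage 2 regresses
# performance on the predicted EQ.
stage1 = ols(eq, z)
eq_hat = stage1[0] + stage1[1] * z
tsls = ols(perf, eq_hat)[1]

print(f"naive OLS slope:       {naive:.3f}")
print(f"OLS controlling for IQ: {controlled:.3f}")
print(f"2SLS slope:            {tsls:.3f}")
```

With these assumed coefficients the naive slope settles well above the true effect of 0.5, while the controlled and 2SLS slopes both land close to it. The manual second stage gives the right point estimate but not the right standard errors, so a dedicated instrumental variable routine is the safer choice for inference.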

The above list comes from a chapter well worth reading by Antonakis, Bendahan, Jacquart and Lalive (2014):

Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2014). Causality and endogeneity: Problems and solutions. In D.V. Day (Ed.), The Oxford Handbook of Leadership and Organizations (pp. 93-117). New York: Oxford University Press.

For an introduction in which Antonakis covers most of the above and more in an easy-to-understand manner, view the lecture *Endogeneity: An inconvenient truth*.

## Endogeneity: An inconvenient truth (full version), by John Antonakis

A key assumption of regression analysis (or structural equation modeling) is that the modeled independent variables are not endogenous. Yet the problems of endogeneity are not well known to researchers working in many social science disciplines (e.g., management, applied psychology, sociology).