Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Variance Inflation Factor

The Variance Inflation Factor (VIF) is a means of detecting multicollinearity among the independent variables of a model. The basic idea is to express a particular variable $x_k$ by a linear model based on all other independent variables. If the resulting model fits well (i.e. its goodness of fit is high), the tested variable $x_k$ is likely to be (multi)collinear with one or more of the other variables.

In general, the VIF is calculated for all independent variables of a model. In a second step, the variables showing the highest values are removed from the model. As a rule of thumb, the VIF of every variable should be less than 10 in order to avoid problems with the stability of the coefficients.

From a mathematical point of view, the VIF measures how much the variance of an estimated regression coefficient increases in comparison to the case in which the independent variables are orthogonal to each other. The VIF of the k-th variable is defined by the following formula:

$$\mathrm{VIF}_k = \frac{1}{1 - r_k^2},$$

where $r_k^2$ is the goodness of fit of the linear model for $x_k$ based on all other variables.
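
The definition can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: it assumes that NumPy is available, that the goodness of fit is taken as the coefficient of determination of an ordinary least-squares fit with an intercept, and that the data matrix holds one observation per row. The function name `vif` and the synthetic variables `x1` to `x4` are made up for this example.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    values = np.empty(p)
    for k in range(p):
        y = X[:, k]
        # Design matrix: intercept plus all other independent variables.
        A = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot  # goodness of fit for variable k
        # A perfect fit means exact collinearity; report it as infinity.
        values[k] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return values


# Synthetic data (an assumption for illustration only):
# x3 is almost the sum of x1 and x2, x4 is unrelated to the rest.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.05 * rng.normal(size=200)
x4 = rng.normal(size=200)

vif_values = vif(np.column_stack([x1, x2, x3, x4]))
print(np.round(vif_values, 2))
```

In this sketch the three mutually dependent variables receive large VIF values, while the unrelated variable stays close to 1. Note that a VIF of 10 corresponds to a goodness of fit of 0.9.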

Example: The following example shows the interpretation and application of the VIF. Assuming that we want to estimate the boiling point of chemical substances from various structural parameters, we select six suitable independent variables and calculate the corresponding VIF values:

Parameter     VIF
O-Atoms       9.792
S-Atoms       59.085
JHET          2.533
n-Branch      1.561
Randic-Ix     122.933
RandicToz     138.540
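
To make these numbers easier to read, the defining formula can be inverted to recover the goodness of fit behind each VIF, i.e. 1 - 1/VIF. The following short sketch does this for the values of the table and flags the variables that violate the rule of thumb (VIF below 10). The variable names `vif_table`, `r_squared` and `flagged` are chosen for this illustration only.

```python
# VIF values copied from the table above.
vif_table = {
    "O-Atoms": 9.792,
    "S-Atoms": 59.085,
    "JHET": 2.533,
    "n-Branch": 1.561,
    "Randic-Ix": 122.933,
    "RandicToz": 138.540,
}

# Invert VIF = 1 / (1 - r^2) to obtain the goodness of fit r^2.
r_squared = {name: 1.0 - 1.0 / v for name, v in vif_table.items()}

# Rule of thumb from the text: the VIF should be less than 10.
flagged = [name for name, v in vif_table.items() if v >= 10]

for name, v in vif_table.items():
    mark = "  <- VIF >= 10" if name in flagged else ""
    print(f"{name:10s} VIF = {v:8.3f}   r^2 = {r_squared[name]:.4f}{mark}")
```

Read this way, a VIF above 100 means that more than 99 % of the variance of the variable in question is explained by the other independent variables.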

As the table above shows, at least three variables ("S-Atoms", "Randic-Ix" and "RandicToz") have VIF values far above 10 and are thus not linearly independent of the other variables. If we remove the variable showing the highest VIF ("RandicToz"), the new VIF values are as follows:

Parameter     VIF
O-Atoms       7.218
S-Atoms       7.698
JHET          2.548
n-Branch      1.346
Randic-Ix     1.024

The removal of the variable "RandicToz" has resolved the multicollinearity problem: all remaining VIF values are below 10. Since the VIF values of "S-Atoms" and "Randic-Ix" decreased drastically as well, the removed variable can evidently be expressed to a large extent by these two variables.
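
The procedure used in this example (calculate all VIF values, remove the variable with the highest value, recalculate) can be sketched as a simple loop. The code below is a hedged illustration rather than a definitive implementation. It assumes NumPy, an ordinary least-squares fit with an intercept, and removal of one variable per step until every VIF is below the threshold of 10. The function names `vif` and `prune_by_vif` as well as the synthetic variables `a` to `d` are hypothetical and do not reproduce the boiling point data.

```python
import numpy as np


def vif(X):
    """VIF of each column of X, using least squares with an intercept."""
    n, p = X.shape
    values = np.empty(p)
    for k in range(p):
        y = X[:, k]
        A = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        values[k] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return values


def prune_by_vif(X, names, threshold=10.0):
    """Remove the variable with the highest VIF until all are below threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] < threshold:
            break
        # Drop the worst variable and recompute all VIF values next round.
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names, vif(X)


# Synthetic data (an assumption for illustration only):
# c is almost exactly a + b, d is unrelated to the other variables.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.01 * rng.normal(size=200)
d = rng.normal(size=200)

kept, final_vif = prune_by_vif(np.column_stack([a, b, c, d]), ["a", "b", "c", "d"])
for name, value in zip(kept, final_vif):
    print(f"{name}: {value:.3f}")
```

As in the example above, removing a single variable is enough here: once `c` is dropped, the VIF values of `a` and `b` fall drastically as well. Removing only one variable per step is a deliberate choice in this sketch, because the VIF values of the remaining variables change after every removal.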