Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics, data analysis, and chemometrics.


PCA - Different Forms

The principal components may be calculated by eigenanalysis of one of three different matrices:
 

  • the scatter matrix: this is simply the product AᵀA of the data matrix A with its transpose, and does not imply any scaling
  • the variance-covariance matrix: which is equal (apart from a constant factor) to the scatter matrix after mean-centering the data
  • the correlation matrix: which is equal (apart from a constant factor) to the scatter matrix after standardizing the data
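
The three matrices listed above can be sketched in a few lines of NumPy. This is only an illustration, not the DataLab implementation: the function names `pca_matrix` and `principal_components` are made up for this example, the data matrix is assumed to hold one sample per row and one variable per column, and the covariance and correlation matrices are assumed to use the usual 1/(n-1) normalization.

```python
import numpy as np

def pca_matrix(A, kind="scatter"):
    """Return the matrix whose eigenanalysis yields the principal components.

    A    : data matrix, shape (n_samples, n_variables)
    kind : "scatter", "covariance" or "correlation"
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if kind == "scatter":
        return A.T @ A                        # no centering, no scaling
    centered = A - A.mean(axis=0)             # remove the column means
    if kind == "covariance":
        return centered.T @ centered / (n - 1)
    if kind == "correlation":
        z = centered / A.std(axis=0, ddof=1)  # standardize each column
        return z.T @ z / (n - 1)
    raise ValueError(f"unknown kind: {kind!r}")

def principal_components(A, kind="scatter"):
    """Eigenvalues (descending) and eigenvectors (as columns) of the chosen matrix."""
    # eigh is appropriate because all three matrices are symmetric
    eigvals, eigvecs = np.linalg.eigh(pca_matrix(A, kind))
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    return eigvals[order], eigvecs[:, order]
```

The constant factor 1/(n-1) rescales the eigenvalues but leaves the eigenvectors, and therefore the directions of the principal components, unchanged.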


Which matrix is chosen for the PCA depends on the problem at hand. Most often the best results are obtained by experimenting with all three approaches. Generally speaking, the choice is determined by whether the absolute numbers in the data (scatter matrix) or the relationships between the variables (correlation matrix) are more important. If a fixed offset in the variables causes problems, one may use the variance-covariance matrix. Details about these matrices can be found on a separate page.

To see the effects of the different scalings, take as an example the data set WORLDPOP, which contains demographic data on all countries of the world (as of 1988). It is quite natural that the absolute numbers are important in this case, so go to the DataLab and look at the first two principal components using the three different matrices. For this data set, standardization prior to the PCA does not make sense and results in poorly differentiated PC plots. However, keep in mind that the opposite may be true for other data sets.
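
For readers without access to DataLab, the effect of the scaling can be tried out on synthetic data. The sketch below does not use WORLDPOP: the two variables (`population`, `birth_rate`) and their distributions are invented purely for illustration, and the helper `first_pc` is a hypothetical name. It prints the absolute loadings of the first principal component for each of the three matrices.

```python
import numpy as np

def first_pc(A, kind):
    """Eigenvector belonging to the largest eigenvalue for the chosen matrix."""
    A = np.asarray(A, dtype=float)
    if kind != "scatter":
        A = A - A.mean(axis=0)            # mean-centering (covariance, correlation)
    if kind == "correlation":
        A = A / A.std(axis=0, ddof=1)     # standardization (correlation only)
    # The factor 1/(n-1) is omitted: it does not change the eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    return eigvecs[:, -1]                 # eigh sorts ascending, so take the last column

# Synthetic, made-up data: one variable with very large absolute numbers,
# one with small numbers.
rng = np.random.default_rng(42)
population = rng.lognormal(mean=16, sigma=1.5, size=100)
birth_rate = rng.normal(loc=25, scale=8, size=100)
data = np.column_stack([population, birth_rate])

for kind in ("scatter", "covariance", "correlation"):
    print(kind, np.round(np.abs(first_pc(data, kind)), 3))
```

With the scatter and covariance matrices the first principal component is dominated by the variable with the large absolute numbers. With the correlation matrix both variables enter with equal weight, which is always the case for two standardized variables with nonzero correlation.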

Another approach worth checking is the 3D rotational display of the first three principal components: start the PCA, copy the scores into the data matrix, and view the first three PCs with the command "3D Rotation".
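
Outside DataLab, the scores needed for such a display can be computed as in the following minimal sketch. It assumes the variance-covariance matrix is used and that the data matrix holds one sample per row; the function name `pca_scores` is hypothetical. The resulting three columns can then be passed to any 3D scatter plot.

```python
import numpy as np

def pca_scores(A, n_components=3):
    """Project mean-centered data onto the first principal components."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)
    cov = centered.T @ centered / (A.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)              # symmetric matrix
    order = np.argsort(eigvals)[::-1][:n_components]    # largest eigenvalues first
    return centered @ eigvecs[:, order]                 # shape (n_samples, n_components)
```

Each column of the result holds the scores of one principal component, ordered by decreasing variance.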