# R–Data Exploration

# PCA

**Principal Components Analysis** (PCA) allows us to study and explore a set of quantitative variables measured on a set of objects.

###### Core Idea

With PCA we seek to reduce the dimensionality (reduce the number of variables) of a data set while retaining as much as possible of the variation present in the data.

Before performing a PCA (or any other multivariate method) we should start with some preliminary explorations:

- Descriptive statistics
- Basic graphical displays
- Distribution of variables
- Pair-wise correlations among variables
- Perhaps transforming some variables
- etc.
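The preliminary explorations listed above could be sketched in base R roughly as follows. The `USArrests` data set (shipped with R) is only an assumed stand-in for your own data, and the log transformation is an arbitrary illustration, not a recommendation.

```r
# USArrests ships with base R; it is only a stand-in data set for illustration
dat <- USArrests

# Descriptive statistics for every variable
summary(dat)

# Spread of each variable: very different scales suggest standardizing before PCA
variable_sd <- sapply(dat, sd)
variable_sd

# Pair-wise correlations among variables
cor_matrix <- cor(dat)
round(cor_matrix, 2)

# Basic graphical displays: distributions and a scatterplot matrix
boxplot(scale(dat), main = "Standardized variables")
pairs(dat)

# One possible transformation (illustrative only): log of a strictly positive variable
dat$log_Assault <- log(dat$Assault)
```

Which transformations make sense, if any, depends on the distributions seen in the plots.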

The minimal output from any PCA should contain 3 things:

- **Eigenvalues** provide information about the amount of variability captured by each principal component
- **Scores** or PCs (principal components) provide coordinates to graphically represent objects in a lower-dimensional space
- **Loadings** provide information to determine what variables characterize each principal component
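As a minimal sketch, these three outputs can be pulled out of base R's `prcomp()`. The choice of `USArrests` and of standardizing the variables (`scale. = TRUE`) are assumptions made for illustration. Note that `prcomp()` stores the eigenvectors in `$rotation`; here they are rescaled by the component standard deviations to obtain loadings.

```r
# Stand-in data set; PCA on standardized variables
dat <- USArrests
pca <- prcomp(dat, center = TRUE, scale. = TRUE)

# Eigenvalues: variance captured by each principal component
eigenvalues <- pca$sdev^2

# Scores: coordinates of the objects (rows) on the principal components
scores <- pca$x

# Loadings: eigenvectors (pca$rotation) scaled column-wise by sqrt(eigenvalue)
loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Objects represented in the space of the first two components
plot(scores[, 1:2], type = "n", main = "Objects on PC1 and PC2")
text(scores[, 1:2], labels = rownames(dat), cex = 0.7)
```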

###### Some questions to keep in mind

- How many PCs should be retained?
- How good (or bad) is the data approximation with the retained PCs?
- What variables characterize each PC?
- Which variables are influential, and how are they correlated?
- Which variables are responsible for the patterns among objects?
- Are there any outlier objects?
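A few of these questions can be approached numerically. The sketch below, again assuming `USArrests` and standardized variables, tabulates the proportion of variance per component, applies the "eigenvalue greater than 1" rule (one common heuristic among several), and computes squared cosines as one possible measure of how well each object is represented by the retained PCs. The value `k <- 2` is an arbitrary choice for illustration.

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
eigenvalues <- pca$sdev^2

# How many PCs? Proportion and cumulative proportion of variance
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)
data.frame(PC = seq_along(eigenvalues),
           eigenvalue = eigenvalues,
           proportion = prop_var,
           cumulative = cum_var)

# Scree plot as a visual aid
screeplot(pca, type = "lines", main = "Scree plot")

# Heuristic for standardized data: keep PCs with eigenvalue > 1
n_keep <- sum(eigenvalues > 1)

# How good is the approximation? Squared cosine of each object
# on the first k components (values near 1 mean well represented)
k <- 2
cos2 <- rowSums(pca$x[, 1:k, drop = FALSE]^2) / rowSums(pca$x^2)
head(sort(cos2), 5)  # the five least well represented objects
```

Objects with extreme scores or very low squared cosines are candidates to inspect further; neither is proof of an outlier on its own.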

$$\text{Loadings} = \text{Eigenvectors} \cdot \sqrt{\text{Eigenvalues}}$$
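This relation can be checked from scratch with `eigen()`. The sketch assumes a PCA on standardized variables (that is, on the correlation matrix of the stand-in `USArrests` data), in which case the loadings coincide with the correlations between the original variables and the scores.

```r
# Standardized data and eigendecomposition of the correlation matrix
Z <- scale(USArrests)
eig <- eigen(cor(USArrests))
eigenvectors <- eig$vectors
eigenvalues <- eig$values

# Loadings: each eigenvector (column) multiplied by sqrt of its eigenvalue
loadings <- sweep(eigenvectors, 2, sqrt(eigenvalues), `*`)
rownames(loadings) <- colnames(Z)

# Scores: standardized data projected onto the eigenvectors
scores <- Z %*% eigenvectors

# The loadings should match the variable-score correlations
# up to floating-point error
max_diff <- max(abs(loadings - cor(Z, scores)))
max_diff
```

Be aware that terminology varies between software packages: some use "loadings" for the unscaled eigenvectors themselves.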