PCA and Eigenvectors Visualization

Interactive visualization of Principal Component Analysis, covariance ellipses, and eigenvectors for understanding dimensionality reduction through geometric interpretation

Analysis Results

Covariance Matrix Σ

[ 1.00  0.70 ]
[ 0.70  1.00 ]

Eigenvalues (λ)

λ₁ (PC1): 1.70
λ₂ (PC2): 0.30
Total Variance: 2.00

Eigenvectors

v₁ (PC1): [0.71, 0.71]
v₂ (PC2): [-0.71, 0.71]

Explained Variance

PC1:
85%
PC2:
15%
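The numbers in this panel can be reproduced with NumPy's symmetric eigendecomposition. A minimal sketch (assuming `numpy` is available; eigenvector signs may differ, since either sign is a valid eigenvector):

```python
import numpy as np

# Covariance matrix shown in the panel above
cov = np.array([[1.00, 0.70],
                [0.70, 1.00]])

# eigh returns eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending

print(eigvals)                  # [1.7 0.3]
print(eigvals / eigvals.sum())  # [0.85 0.15] -> 85% / 15%
print(eigvecs[:, 0])            # ~[0.71, 0.71] up to sign (PC1)
```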

Covariance Matrix

Measures how variables vary together. For centered data: Σ = (1/n)XᵀX. Diagonal elements are variances, off-diagonal are covariances.
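A quick numerical check of this definition (a sketch using random data; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))      # n samples x 2 features
Xc = X - X.mean(axis=0)             # center each dimension
cov = (Xc.T @ Xc) / len(Xc)         # Σ = (1/n) XᵀX

# Diagonal entries are the per-feature variances
assert np.allclose(np.diag(cov), Xc.var(axis=0))
```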

Eigenvectors

Principal directions of maximum variance: orthogonal vectors that define the axes of the covariance ellipse. The first eigenvector points in the direction of greatest variance.

Eigenvalues

Amount of variance explained by each eigenvector: a larger eigenvalue means more variance in that direction. Each eigenvalue is the squared length of the corresponding semi-axis of the 1σ covariance ellipse.

Covariance Ellipse

Visual representation of the covariance matrix. Shows the shape and orientation of data distribution. Semi-axes aligned with eigenvectors, lengths proportional to √eigenvalues.

Data Centering

Subtracting the mean from each dimension: x_centered = x - μ. Essential for PCA to find directions of maximum variance around the mean.

Dimensionality Reduction

Keeping only top-k principal components reduces dimensions while preserving maximum variance. Reconstruction error = sum of discarded eigenvalues.
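The reconstruction-error identity can be verified numerically. A sketch (assuming `numpy`; the data parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data
X = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=2000)
Xc = X - X.mean(axis=0)

cov = (Xc.T @ Xc) / len(Xc)
eigvals, Q = np.linalg.eigh(cov)
eigvals, Q = eigvals[::-1], Q[:, ::-1]     # sort descending

k = 1
Qk = Q[:, :k]
X_hat = Xc @ Qk @ Qk.T                     # project onto PC1, then back

# Mean squared reconstruction error = sum of discarded eigenvalues
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))
assert np.isclose(mse, eigvals[k:].sum())
```

The identity is exact for the empirical covariance: the per-sample error is the squared component along the discarded eigenvectors, and averaging it over samples yields exactly the discarded eigenvalues.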

Covariance Matrix

For centered data matrix X, Σ = (1/n)XᵀX

Eigendecomposition

Σ can be decomposed as Σ = QΛQᵀ, where Q contains the eigenvectors as columns and Λ is the diagonal matrix of eigenvalues.

PCA Transformation

Projects data onto the principal components: z = Qᵀ(x − μ). A pure rotation when all components are kept, a projection when only the top-k are retained.

Reconstruction

Reconstructs data using only k principal components: x̂ = Q_k z_k + μ

Explained Variance Ratio

Fraction of total variance explained by the first principal component: λ₁ / (λ₁ + λ₂)

Covariance Ellipse

Parametric equation for the covariance ellipse at 1σ (multiply by k for the kσ ellipse): x(t) = μ + √λ₁ cos(t)·v₁ + √λ₂ sin(t)·v₂
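The parametric form can be checked against the implicit equation of the ellipse, (x − μ)ᵀΣ⁻¹(x − μ) = 1 at 1σ. A sketch (assuming `numpy`; μ is taken as the origin for simplicity):

```python
import numpy as np

cov = np.array([[1.0, 0.7], [0.7, 1.0]])
mu = np.zeros(2)

eigvals, Q = np.linalg.eigh(cov)
eigvals, Q = eigvals[::-1], Q[:, ::-1]     # sort descending

t = np.linspace(0, 2 * np.pi, 200)
# 1σ ellipse: μ + √λ₁ cos(t) v₁ + √λ₂ sin(t) v₂
ellipse = (mu[:, None]
           + np.sqrt(eigvals[0]) * np.cos(t) * Q[:, [0]]
           + np.sqrt(eigvals[1]) * np.sin(t) * Q[:, [1]])

# Every point satisfies (x - μ)ᵀ Σ⁻¹ (x - μ) = 1
d = ellipse - mu[:, None]
mahal = np.einsum('ij,jk,ki->i', d.T, np.linalg.inv(cov), d)
assert np.allclose(mahal, 1.0)
```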

1. Center the Data

Subtract the mean from each dimension: x_centered = x - μ. This shifts the data to be centered at the origin.

2. Compute Covariance Matrix

Calculate Σ = (1/n)XᵀX where X is the centered data matrix. This captures how variables vary together.

3. Find Eigenvectors and Eigenvalues

Solve Σv = λv. Sort eigenvectors by eigenvalues in descending order. Larger eigenvalues indicate directions of more variance.

4. Project onto Principal Components

Transform data: z = Qᵀ(x - μ). This rotates the coordinate system to align with principal directions.

5. Optional: Dimensionality Reduction

Keep only top-k components: z_k = Q_kᵀ(x - μ). This reduces dimensions while preserving maximum variance.

6. Optional: Reconstruct

Reconstruct from k components: x̂ = Q_k z_k + μ. Reconstruction error = sum of discarded eigenvalues.
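The six steps above can be sketched as one small NumPy pipeline (the data parameters are illustrative, and `pca_pipeline` is a hypothetical helper name):

```python
import numpy as np

def pca_pipeline(X, k):
    """Steps 1-6: center, covariance, eigendecompose, project, reconstruct."""
    mu = X.mean(axis=0)                      # step 1: center
    Xc = X - mu
    cov = (Xc.T @ Xc) / len(Xc)              # step 2: Σ = (1/n) XᵀX
    eigvals, Q = np.linalg.eigh(cov)         # step 3: Σv = λv
    order = np.argsort(eigvals)[::-1]        #   sort descending
    eigvals, Q = eigvals[order], Q[:, order]
    Z = Xc @ Q                               # step 4: z = Qᵀ(x - μ)
    Zk = Xc @ Q[:, :k]                       # step 5: keep top-k
    X_hat = Zk @ Q[:, :k].T + mu             # step 6: x̂ = Q_k z_k + μ
    return Z, Zk, X_hat, eigvals

rng = np.random.default_rng(2)
X = rng.multivariate_normal([1, -1], [[1.0, 0.7], [0.7, 1.0]], size=500)
Z, Zk, X_hat, eigvals = pca_pipeline(X, k=1)

# Keeping all components makes the reconstruction exact
_, _, X_full, _ = pca_pipeline(X, k=2)
assert np.allclose(X_full, X)
```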

Data Visualization

Project high-dimensional data to 2D or 3D for visualization while preserving as much variance as possible. Essential for exploratory data analysis.

Feature Extraction

Extract compact feature representations for machine learning. Used in face recognition (Eigenfaces), handwriting recognition, and more.

Noise Reduction

Remove noise by reconstructing with fewer components. Noise is typically captured by the smaller eigenvalues (later PCs).

Image Compression

Compress images by keeping top-k principal components. Achieve significant compression while preserving main features.

Anomaly Detection

Detect outliers by measuring reconstruction error. Anomalies have high reconstruction error when using few PCs.
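A minimal sketch of this idea (assuming `numpy`; the injected outlier and data parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=500)
outlier = np.array([[3.0, -3.0]])        # lies far off the PC1 trend line
data = np.vstack([X, outlier])

mu = data.mean(axis=0)
Xc = data - mu
cov = (Xc.T @ Xc) / len(Xc)
eigvals, Q = np.linalg.eigh(cov)
Q1 = Q[:, [np.argmax(eigvals)]]          # top principal component only

X_hat = Xc @ Q1 @ Q1.T                   # reconstruct with 1 PC
errors = np.sum((Xc - X_hat) ** 2, axis=1)

# The injected outlier has the largest reconstruction error
assert np.argmax(errors) == len(data) - 1
```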

Multicollinearity

Handle correlated features in regression analysis. PCA transforms to orthogonal (uncorrelated) components.
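That the transformed components are uncorrelated follows directly from the eigendecomposition: the covariance of z = Qᵀ(x − μ) is QᵀΣQ = Λ, which is diagonal. A numerical check (a sketch; parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# Strongly correlated features
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
Xc = X - X.mean(axis=0)

cov = (Xc.T @ Xc) / len(Xc)
_, Q = np.linalg.eigh(cov)
Z = Xc @ Q                               # rotate into PC coordinates

# Covariance of the rotated data is diagonal: components are uncorrelated
cov_z = (Z.T @ Z) / len(Z)
off_diag = cov_z[0, 1]
assert abs(off_diag) < 1e-10
```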

Visual Guide

When ρ = 0 (Uncorrelated)

Covariance ellipse is axis-aligned, and becomes a circle when the two variances are equal (as here, both 1.00). The eigenvectors align with the coordinate axes, the eigenvalues equal the individual variances, and in the circular case there is no preferred direction at all.

When ρ > 0 (Positive Correlation)

Data trends upward. The first eigenvector points in the direction of the trend, so the covariance ellipse tilts upward (exactly 45° when the two variances are equal).

When ρ < 0 (Negative Correlation)

Data trends downward. The covariance ellipse tilts the other way (−45° when the variances are equal), reflecting the inverse relationship between the variables.

When |ρ| = 1 (Perfect Correlation)

The ellipse degenerates into a line: one eigenvalue approaches zero, the data is essentially one-dimensional, and a single PC reconstructs it perfectly.

Effect of Noise

Isotropic noise adds the same amount to both eigenvalues. This makes the ellipse more circular and reduces the advantage of dimensionality reduction.

Why Eigenvectors?

Eigenvectors are the directions that the covariance matrix only scales, never rotates (Σv = λv). They are the "natural axes" of the data distribution.

Legend

Data Points
Mean Point
PC1 (1st Eigenvector)
PC2 (2nd Eigenvector)
Covariance Ellipses (1σ, 2σ, 3σ)
Projected Points
Reconstructed Points