This article was automatically translated from the original Turkish version.

Principal Component Analysis

Principal Component Analysis (PCA) is a transformation technique that reduces the dimensionality of datasets containing many interrelated variables while preserving as much of the variation in the data as possible. The method was introduced by Karl Pearson in 1901 and later developed by Hotelling in 1933. The goal is to find the linear transformation that represents the data with fewer variables while retaining as much variance as possible. The variables obtained after the transformation are called principal components; they are ordered by decreasing variance, so the component with the highest variance comes first.

Principal Component Analysis (PCA) is commonly used for the following purposes:

Dimensionality Reduction: Reducing the number of variables in the dataset to make it more manageable.
Elimination of Correlation Structure: Removing the linear correlation among the variables to obtain mutually uncorrelated components.
Data Preparation for Analysis: Transforming the data into a more suitable format for other statistical analyses.

PCA can produce up to as many principal components as there are variables, but the first two, the first principal component (PC1) and the second principal component (PC2), usually receive the most attention.
First Principal Component (PC1): This is the direction along which the data points exhibit the highest variance. It is the line that best captures the shape of the data, in the sense that the points projected onto it are spread out as much as possible. The more variance PC1 captures, the more information from the original dataset is preserved; no other principal component has a higher variance.
Second Principal Component (PC2): PC2 explains the next highest variance in the dataset and is necessarily uncorrelated with PC1. Geometrically, the PC2 direction is orthogonal (perpendicular) to the PC1 direction, so the correlation between the PC1 and PC2 scores is zero.
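
The two properties described above (PC1 carries the largest variance, and PC2 is orthogonal to and uncorrelated with PC1) can be checked numerically. The following is a minimal sketch using NumPy; the synthetic data, the random seed, and all variable names are assumptions made for illustration and do not come from the original article.

```python
import numpy as np

# Hypothetical toy data: 2 variables (rows) x 200 observations (columns).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.vstack([
    t + 0.3 * rng.normal(size=200),
    0.5 * t + 0.3 * rng.normal(size=200),
])

# Center each variable (row) so that its mean is zero.
X_centered = X - X.mean(axis=1, keepdims=True)

# The eigenvectors of the covariance matrix give the principal directions.
# np.cov treats each row as a variable by default.
eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered))
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]

# Scores: coordinates of every observation along PC1 and PC2.
scores = eigvecs.T @ X_centered

var_pc1, var_pc2 = scores.var(axis=1, ddof=1)
dot_product = float(pc1 @ pc2)                    # close to 0: orthogonal directions
correlation = float(np.corrcoef(scores)[0, 1])    # close to 0: uncorrelated scores

print(f"variance along PC1: {var_pc1:.3f}, along PC2: {var_pc2:.3f}")
print(f"pc1 . pc2 = {dot_product:.2e}, corr(PC1, PC2) = {correlation:.2e}")
```

Both printed quantities on the second line should be zero up to floating-point rounding, and the variance along PC1 should not be smaller than the variance along PC2.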

When PCA is applied, a scatter plot is typically used to illustrate the relationship between PC1 and PC2. The axes for PC1 and PC2 are shown perpendicular to each other. The first and second principal components are graphically represented below.

Figure: First and Second Principal Components (generated by artificial intelligence).

Mathematical Model of Principal Component Analysis
Let the data consist of M observation vectors x_1, x_2, ..., x_M, each of size N×1, collected as the columns of the N×M data matrix X. Each column of X is one observation and each row corresponds to a different variable (data type), as shown below.

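$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NM}
\end{bmatrix}
$$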

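Since the variables may have different units of measurement, the data are standardized before the analysis. The first step is centering: the mean of each variable, computed over all M observations, is subtracted from every value of that variable, so that each variable has zero mean. When the variables are measured on very different scales, each centered variable is commonly also divided by its standard deviation. The mean vector m is computed as follows.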

$$
m = \frac{1}{M} \sum_{i=1}^{M} x_i = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_N \end{bmatrix}
$$

After subtracting the means, the matrix X~ is obtained as follows.

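$$
\tilde{X} = \begin{bmatrix}
x_{11} - m_1 & x_{12} - m_1 & \cdots & x_{1M} - m_1 \\
x_{21} - m_2 & x_{22} - m_2 & \cdots & x_{2M} - m_2 \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} - m_N & x_{N2} - m_N & \cdots & x_{NM} - m_N
\end{bmatrix}
$$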
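
The centering step can be sketched in a few lines of NumPy. The small data matrix below is a hypothetical example (N = 2 variables in the rows, M = 4 observations in the columns), and the names `X`, `m`, and `X_tilde` are chosen here only to mirror the symbols in the text.

```python
import numpy as np

# Hypothetical toy data: N = 2 variables (rows), M = 4 observations (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 3.0, 2.0, 6.0]])

# Mean vector m: one mean per variable, kept as an N x 1 column.
m = X.mean(axis=1, keepdims=True)

# Centered matrix: m_j is subtracted from every entry of row j (broadcasting).
X_tilde = X - m

print(m.ravel())   # [5. 3.]
print(X_tilde)     # each row of the centered matrix now sums to zero
```

Keeping `m` as a column (`keepdims=True`) lets the subtraction broadcast across all M columns without an explicit loop.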

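In the next step, the covariance matrix C is calculated from the centered data as follows.

$$
C = \tilde{X} \tilde{X}^T
$$

Written out, C is an N×N matrix whose entry in row j and column k is the sum, over all M observations, of the products of the centered values of variables j and k:

$$
C = \begin{bmatrix}
\sum_{i=1}^{M} (x_{1i} - m_1)^2 & \sum_{i=1}^{M} (x_{1i} - m_1)(x_{2i} - m_2) & \cdots & \sum_{i=1}^{M} (x_{1i} - m_1)(x_{Ni} - m_N) \\
\sum_{i=1}^{M} (x_{2i} - m_2)(x_{1i} - m_1) & \sum_{i=1}^{M} (x_{2i} - m_2)^2 & \cdots & \sum_{i=1}^{M} (x_{2i} - m_2)(x_{Ni} - m_N) \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{M} (x_{Ni} - m_N)(x_{1i} - m_1) & \sum_{i=1}^{M} (x_{Ni} - m_N)(x_{2i} - m_2) & \cdots & \sum_{i=1}^{M} (x_{Ni} - m_N)^2
\end{bmatrix}
$$

Dividing this matrix by M − 1 gives the usual sample covariance matrix. The constant factor rescales the eigenvalues but does not change the eigenvectors.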
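
As an illustration of the covariance step, the sketch below forms the product of the centered matrix with its transpose for a small hypothetical dataset (rows are variables, columns are observations) and compares it with NumPy's `np.cov`, which divides by M − 1. The data values and variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data: N = 2 variables (rows), M = 4 observations (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 3.0, 2.0, 6.0]])
M = X.shape[1]

# Center each variable, then form the N x N matrix of summed products.
X_tilde = X - X.mean(axis=1, keepdims=True)
C = X_tilde @ X_tilde.T

# Dividing by M - 1 gives the sample covariance matrix,
# which is what np.cov returns by default (rows treated as variables).
sample_cov = C / (M - 1)

print(C)                                  # entries: 20, 14 / 14, 14
print(np.allclose(sample_cov, np.cov(X)))  # True
```

The diagonal entries (20 and 14) are the summed squared deviations of each variable, and the off-diagonal entry (14) is positive because the two variables tend to rise together in this toy dataset.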

Variance and covariance describe how variables behave within a dataset. The diagonal entries of the covariance matrix C are variances, indicating the spread of a single variable around its mean. The off-diagonal entries are covariances, which indicate how two variables change together: a positive covariance means that the two variables tend to increase or decrease together, while a negative covariance means that when one variable increases, the other tends to decrease. The covariance matrix C is then decomposed into its eigenvalues and eigenvectors.

$$
C v = \lambda v
$$

Here, λ represents the eigenvalues and v represents the eigenvectors. The eigenvalues are ordered from largest to smallest, and the first P eigenvectors corresponding to the largest eigenvalues are selected to form the columns of the projection matrix W.

$$
W = \begin{bmatrix} w_1 & w_2 & \cdots & w_P \end{bmatrix}
$$
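
The decomposition and the selection of the leading eigenvectors might look as follows in NumPy. This is a sketch under stated assumptions: the 2×2 matrix `C` is a hypothetical symmetric example, `P = 1` is an arbitrary choice, and `np.linalg.eigh` is used because it is intended for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted here.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for the covariance matrix C.
C = np.array([[20.0, 14.0],
              [14.0, 14.0]])
P = 1  # number of components to keep (assumed for illustration)

# eigh returns eigenvalues in ascending order; eigenvectors are the columns.
eigvals, eigvecs = np.linalg.eigh(C)

# Re-order from largest to smallest eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Projection matrix W: the first P eigenvectors as columns (N x P).
W = eigvecs[:, :P]

# Check C v = lambda v for every eigenpair (column j is scaled by eigvals[j]).
residual = C @ eigvecs - eigvecs * eigvals
print(np.allclose(residual, 0.0))  # True
print(W.shape)                     # (2, 1)
```

Because the eigenvectors returned for a symmetric matrix are orthonormal, the columns of `W` have unit length and are mutually perpendicular.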

Using the projection matrix W, each centered observation \tilde{x}_i = x_i − m (the i-th column of X~) is mapped from N dimensions to P dimensions, for i = 1, 2, ..., M.

$$
y_i = W^T \tilde{x}_i
$$
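
Putting the steps together, the projection can be sketched end to end as below. The toy data, the choice `P = 1`, and the variable names are assumptions made for illustration; the sketch projects the centered observations and also reports the share of total variance retained, computed from the eigenvalues.

```python
import numpy as np

# Hypothetical toy data: N = 2 variables (rows), M = 4 observations (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 3.0, 2.0, 6.0]])
P = 1  # target dimensionality (assumed for illustration)

# 1) Center the data.
m = X.mean(axis=1, keepdims=True)
X_tilde = X - m

# 2) Form C and decompose it, sorting eigenvalues from largest to smallest.
C = X_tilde @ X_tilde.T
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3) Keep the first P eigenvectors and project every centered observation.
W = eigvecs[:, :P]          # N x P
Y = W.T @ X_tilde           # P x M; column i is y_i

# Share of the total variance kept by the first P components.
retained = eigvals[:P].sum() / eigvals.sum()

print(Y.shape)              # (1, 4)
print(f"retained variance share: {retained:.3f}")
```

Each column of `Y` is the reduced representation of one observation, and the summed squares of the projected values along each kept component equal the corresponding eigenvalue of C.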

Bibliography

Bulut, Hasan. "R Uygulamaları ile Çok Değişkenli İstatistiksel Yöntemler." Ankara: Nobel Akademik Yayıncılık, 2018.

H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, vol. 24, pp. 417–441, 1933.

H. S. Yavuz and M. A. Çay, "Applications of the Principal Component Analysis Method and Some Classical and Robust Adaptations in Face Recognition," ESOGÜ Journal of Engineering and Architecture Faculty, vol. 22, no. 1, pp. 49–63, 2009.

IBM. "Principal Component Analysis." Accessed May 17, 2025.

K. Pearson, "On lines and planes of closest fit to systems of points in space," Philosophical Magazine, vol. 2, no. 11, pp. 559-572, 1901.

Author Information

Author: Sertaç Arısoy, December 8, 2025 at 11:39 AM
