Truth or Statistics: PCA Limitations Illustrated

Principal Component Analysis

PCA is a well-known method, included in most data mining packages, and is inexpensive to compute: it requires only the sample mean and the sample covariance (first- and second-order statistics). The PCA method assumes that parameters and/or features are linearly correlated. PCA transforms the observed data into a set of orthogonal, linearly uncorrelated components.

https://www.quora.com/Are-there-implicit-Gaussian-assumptions-in-the-use-of-PCA-principal-components-analysis
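To make the "only mean and covariance" point concrete, here is a minimal sketch of PCA written directly with NumPy. The function name `pca_from_moments` and the synthetic two-feature data are my own illustrative assumptions, not taken from any package.

```python
import numpy as np

def pca_from_moments(X):
    """PCA of X (n_samples, n_features) using only the sample mean and covariance."""
    mean = X.mean(axis=0)                      # first-order statistic
    cov = np.cov(X - mean, rowvar=False)       # second-order statistic
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = (X - mean) @ eigvecs              # project data onto principal axes
    return scores, eigvals, eigvecs

# Hypothetical data: two linearly correlated features
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

scores, eigvals, eigvecs = pca_from_moments(X)
score_cov = np.cov(scores, rowvar=False)       # should be (numerically) diagonal
print(score_cov)
```

The off-diagonal entries of `score_cov` come out numerically zero: the transformed components are uncorrelated, which is all PCA promises. Uncorrelated is a weaker property than statistically independent.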

All methods have limitations; I wish that data scientists would discuss them.
The picture below originates from the scikit-learn community site. I rearranged the panels by moving the "True Sources" signal panel to the top, followed by "Observed Samples", then "PCA transformed signals", then "ICA transformed signals".

As can easily be seen, PCA distorts the original signals significantly. ICA (Independent Component Analysis) uses non-linear, computationally more expensive estimation methods to extract something similar to the "true source" signals; the Quora explanation of PCA linked above also mentions ICA.
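The comparison can be reproduced numerically with a rough sketch along the lines of the scikit-learn example. The source shapes, noise level, and mixing matrix below are illustrative assumptions, and `best_match` is a hypothetical helper of mine that reports, for each true source, the largest absolute correlation with any recovered component.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Three independent, non-Gaussian "true sources" (illustrative choices)
s1 = np.sin(2 * t)                 # sinusoid
s2 = np.sign(np.sin(3 * t))        # square wave
s3 = 2 * (t % 1) - 1               # sawtooth with period 1
S = np.column_stack([s1, s2, s3])
S += 0.2 * rng.normal(size=S.shape)   # additive noise on each source
S /= S.std(axis=0)                    # normalize to unit variance

# Assumed mixing matrix: each observation is a linear mix of all sources
A = np.array([[1.0, 1.0, 1.0],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])
X = S @ A.T                           # "observed samples"

S_pca = PCA(n_components=3).fit_transform(X)
S_ica = FastICA(n_components=3, random_state=0, max_iter=1000).fit_transform(X)

def best_match(estimates, sources):
    """For each true source, the largest |correlation| with any estimated component."""
    n = sources.shape[1]
    corr = np.corrcoef(sources.T, estimates.T)[:n, n:]
    return np.abs(corr).max(axis=1)

print("PCA:", best_match(S_pca, S).round(2))
print("ICA:", best_match(S_ica, S).round(2))
```

Correlation is used here because ICA recovers sources only up to order, sign, and scale, so a direct sample-by-sample difference would be misleading.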

After seeing many signals in my lifetime, in optics, audio, electrical and process control systems, I find that the "Observations from mixed signal" panel resembles a filtered, under-sampled representation of the true sources. This is an example of how attempting to reduce noise in a signal with filters also corrupts the "true sources" signal.
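A small sketch of that last point, under assumptions of my own choosing (a 5-cycle square wave, Gaussian noise, and a 51-sample moving-average filter; `moving_average` is a hypothetical helper): the filter reduces the noise, but it also smears the sharp edges of the clean signal.

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
clean = np.sign(np.sin(2 * np.pi * 5 * t))     # assumed "true source": square wave
rng = np.random.default_rng(0)
noise = 0.3 * rng.normal(size=t.size)
noisy = clean + noise

def moving_average(x, width):
    """Simple low-pass filter: average over a sliding window of `width` samples."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

smoothed = moving_average(noisy, 51)

# Convolution is linear, so the two effects can be looked at separately.
residual_noise = moving_average(noise, 51)         # what is left of the noise
distortion = moving_average(clean, 51) - clean     # what the filter does to the source

print("noise std before / after:", noise.std(), residual_noise.std())
print("largest distortion of the clean signal:", np.abs(distortion).max())
```

The noise level drops, while the distortion of the clean signal is largest around the square-wave transitions, which is exactly where the information about the source lives.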

Friday, July 29, 2016
