Thursday, June 30, 2016

How to interpret Singular Value Decomposition results (Python 3)?


I'm trying to learn how to reduce dimensionality in datasets. I came across some tutorials on Principal Component Analysis and Singular Value Decomposition. As I understand it (overly simplified), it finds the direction of greatest variance, then the direction of next-highest variance, and so on, so that the low-variance directions can be dropped.

I'm confused about how to interpret the output matrices. I looked at the documentation, but it wasn't much help. I followed some tutorials and still wasn't sure what exactly the resulting matrices were. I provided some code to get a feel for the distribution of each variable in the dataset (sklearn.datasets).

My initial input array is an (n x m) matrix of n samples and m attributes. I could do a common PCA plot of PC1 vs. PC2, but how do I know which of the original dimensions each PC represents?

Sorry if this is a basic question. A lot of the resources are very math-heavy, which I'm fine with, but a more intuitive answer would be useful. Nothing I've seen talks about how to interpret the output in terms of the original labeled data.

I'm open to using sklearn's decomposition.PCA.
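For reference, here is a minimal sketch of how the sklearn route could be read in terms of the original attributes. It runs on a small synthetic matrix with made-up attribute names (`feat_a` to `feat_d`), not on the dataset from the question. The idea is that each row of the fitted `components_` array holds one weight per original attribute, so the attributes with the largest absolute weights are the ones that PC mostly "represents".

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (an assumption, not the question's dataset):
# 200 samples, 4 attributes, two of which share a common signal.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_demo = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),  # feat_a: shared signal + noise
    base + 0.1 * rng.normal(size=(200, 1)),  # feat_b: shared signal + noise
    rng.normal(size=(200, 1)),               # feat_c: independent
    0.5 * rng.normal(size=(200, 1)),         # feat_d: independent, low variance
])
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]  # hypothetical labels

pca = PCA(n_components=2)
scores = pca.fit_transform(X_demo)  # shape (200, 2): the PC1/PC2 coordinates

# components_ has shape (n_components, n_features): one row per PC,
# one weight per original attribute.
for i, axis in enumerate(pca.components_, start=1):
    ratio = pca.explained_variance_ratio_[i - 1]
    order = np.argsort(-np.abs(axis))  # attributes sorted by |weight|
    ranked = ", ".join(f"{feature_names[j]}={axis[j]:+.2f}" for j in order)
    print(f"PC{i} ({ratio:.0%} of variance): {ranked}")

# Names of the two attributes that weigh most heavily on PC1.
top_pc1 = [feature_names[j] for j in np.argsort(-np.abs(pca.components_[0]))[:2]]
```

A PC is generally a weighted mix of all attributes rather than a single one, so reading the weights is more of a ranking exercise than a one-to-one mapping.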

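# Singular Value Decomposition
import numpy as np
# X is the (n x m) data matrix; np.linalg.svd returns the third matrix already transposed
U, s, V = np.linalg.svd(X, full_matrices=True)
print(U.shape, s.shape, V.shape, sep="\n")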
(442, 442)
(10,)
(10, 10)
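One way to read those three shapes, as a rough sketch. It assumes a random stand-in matrix with the same (442, 10) shape as the question's `X`, centers the columns first (the PCA convention), and uses `full_matrices=False` so that `U` keeps only the 10 columns that matter. The names `X_demo`, `Vt`, and `scores` are illustrative.

```python
import numpy as np

# Random stand-in with the same shape as the question's X (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(442, 10))
X_centered = X_demo - X_demo.mean(axis=0)  # PCA works on centered columns

# Thin SVD: U is (442, 10), s is (10,), Vt is (10, 10).
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Rows of Vt are the principal axes: Vt[k] holds one weight per original
# attribute (column of X), so it says how PC(k+1) mixes those attributes.
pc1_weights = Vt[0]

# Sample coordinates in PC space (what a PC1 vs. PC2 plot shows).
scores = U * s                      # same as X_centered @ Vt.T
same_scores = np.allclose(scores, X_centered @ Vt.T)

# Singular values come sorted in descending order; squared and scaled,
# they give the variance captured along each axis.
explained_variance = s**2 / (X_centered.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Multiplying the three factors back together recovers the centered data.
reconstructed = (U * s) @ Vt

print(U.shape, s.shape, Vt.shape, sep="\n")
print(np.round(explained_ratio, 3))
```

With `full_matrices=True`, as in the question, `U` is padded out to (442, 442); the extra columns are not paired with any singular value and carry no information about the data.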
