PRINCIPAL COMPONENT ANALYSIS FOR MACHINE LEARNING
Polina Shulpina,
Moscow Technical University of Communications and Informatics (MTUCI), Moscow, Russia,
polli-lionet@yandex.ru
V. A. Dokuchaev,
Moscow Technical University of Communications and Informatics (MTUCI), Moscow, Russia,
v.a.dokuchaev@mtuci.ru
DOI: 10.36724/2664-066X-2022-8-6-18-24
SYNCHROINFO JOURNAL. Volume 8, Number 6 (2022). P. 18-24.
Abstract
Training a supervised machine learning model involves several stages. First, the data is passed through the model to produce predictions (forecasts). Next, these predictions are compared with the actual values (ground truth). Finally, the model is optimized by minimizing a cost function; this is how the model improves. Often, an input dataset contains many columns (features). Using every column in a model can lead to problems known as the curse of dimensionality, so it is necessary to be selective about which features are used. We apply Principal Component Analysis (PCA), one of the main methods for reducing the dimensionality of data while losing the least amount of information.
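The dimensionality reduction described above can be sketched in a few lines of NumPy. The snippet below is a minimal illustration under assumptions of our own, not the authors' implementation: the function name pca_reduce, the choice of k = 2 components, and the synthetic data are introduced only for the example. It standardizes the features, builds their covariance matrix, takes its eigendecomposition, and projects the data onto the top-k eigenvectors (the principal components).

import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the k principal components
    that retain the most variance (illustrative sketch)."""
    # Center and scale each feature so no single column dominates
    # the covariance structure.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)

    # Eigendecomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort components by explained variance, largest first.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    explained = eigvals[order[:k]] / eigvals.sum()

    # Project the data onto the selected principal components.
    return X_std @ components, explained

# Example: reduce 10 correlated features to 2 principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # introduce correlation
X_reduced, explained = pca_reduce(X, k=2)
print(X_reduced.shape, explained)

In practice the same result is obtained with a library routine (for example, an SVD-based PCA implementation), but the explicit covariance/eigenvector form above matches the way the method is usually derived.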
Keywords: principal component analysis, PCA, machine learning, deep learning, feature scaling, feature extraction, data preprocessing
Information about authors:
Polina Shulpina, Network Information Technologies and Services, MTUCI, Moscow, Russia
V.A. Dokuchaev, DSc, Prof., Network Information Technologies and Services, MTUCI, Moscow, Russia