Probabilistic methods of linear algebra and Big Data

Vladimir Agishev, Galina Goremykina

Abstract


This study explores the application of probabilistic methods in linear algebra to improve the computational efficiency of big data processing. As traditional methods face challenges of scalability and computational complexity, the research investigates the potential of randomized numerical linear algebra (RandNLA) approaches, which have been adopted by many linear algebra software packages in recent years, to address these issues, with a particular focus on the benefits of dimensionality reduction. Key probabilistic algorithms, above all sketching techniques, are employed to reduce computational cost substantially without processing the entire dataset. The performance of these algorithms is demonstrated on a variety of test datasets, with special emphasis on least squares problems, which are prevalent in optimization tasks. The results indicate that probabilistic methods considerably accelerate computation, achieving high performance with minimal loss of accuracy. The study also highlights refinement techniques that improve the precision of the approximate solutions produced by randomized algorithms. This research offers valuable insight into the effectiveness of probabilistic methods as a promising tool for big data analysis, particularly in machine learning and optimization, and points to future work on integrating these methods into a wider range of applications, paving the way for innovative solutions to complex data challenges.
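The abstract names two of the techniques it evaluates: sketch-and-solve for overdetermined least squares and refinement of the sketched solution. Since the full text is available only in Russian, the fragment below is not the authors' implementation; it is a minimal, self-contained Python illustration of both ideas, in the spirit of the Blendenpik (Avron et al.) and LSQR (Paige, Saunders) entries in the reference list. The problem sizes, the dense Gaussian sketch, and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(0)

# A tall least-squares problem: minimize ||Ax - b||_2 with m >> n.
m, n = 20_000, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Sketch-and-solve: compress the rows with a random sketch S (s x m) and
# solve the much smaller problem min ||(SA)x - (Sb)||_2 instead of the
# full one.  The sketch size s is a small multiple of n.
s = 4 * n
S = rng.standard_normal((s, m)) / np.sqrt(s)  # dense Gaussian sketch
SA, Sb = S @ A, S @ b
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)

# Refinement (sketch-and-precondition, Blendenpik-style): QR-factor the
# sketched matrix and use R as a right preconditioner.  A @ R^{-1} is
# well conditioned, so LSQR converges in a handful of iterations.
_, R = np.linalg.qr(SA)
M = LinearOperator(
    (m, n),
    matvec=lambda y: A @ solve_triangular(R, y),
    rmatvec=lambda z: solve_triangular(R, A.T @ z, trans="T"),
)
y, *_ = lsqr(M, b, atol=1e-12, btol=1e-12)
x_refined = solve_triangular(R, y)

# Compare residual norms against the direct dense solve.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
res = lambda x: np.linalg.norm(A @ x - b)
print(f"sketched residual / exact: {res(x_sketch) / res(x_exact):.4f}")
print(f"refined  residual / exact: {res(x_refined) / res(x_exact):.6f}")
```

A dense Gaussian sketch is used here only for readability; practical RandNLA codes replace it with a structured transform (subsampled randomized Hadamard or CountSketch) so that forming SA costs far less than a direct factorization of A.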

Full Text:

PDF (Russian)

References


Drineas P., Mahoney M. W., Muthukrishnan S., Sarlos T. Faster Least Squares Approximation. https://arxiv.org/abs/0710.1435

Avron H., Maymounkov P., Toledo S. Blendenpik: Supercharging LAPACK's Least-Squares Solver. https://doi.org/10.1137/090767911

Mahoney M. W. Randomized algorithms for matrices and data. https://arxiv.org/abs/1104.5557

Erichson N. B., Zheng P., Manohar K., Brunton S. L., Kutz J. N., Aravkin A. Y. Sparse Principal Component Analysis via Variable Projection. https://arxiv.org/abs/1804.00341

Tropp J. A., Yurtsever A., Udell M., Cevher V. Fixed-rank approximation of a positive-semidefinite matrix from streaming data. Advances in Neural Information Processing Systems, vol. 30, 2017. https://arxiv.org/abs/1706.05736

Voronin S., Martinsson P.-G. Efficient algorithms for CUR and interpolative matrix decompositions. Advances in Computational Mathematics, vol. 43, no. 3 (2016), pp. 495–516. https://arxiv.org/abs/1412.8447

Dong Y., Martinsson P.-G. Simpler is better: A comparative study of randomized algorithms for computing the CUR decomposition. 2021. https://arxiv.org/abs/2104.05877

Halko N., Martinsson P.-G., Tropp J. A. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. https://arxiv.org/abs/0909.4061

Erichson N. B., Voronin S., Brunton S. L., Kutz J. N. Randomized Matrix Decompositions Using R. https://arxiv.org/abs/1608.02148

Ootomo H., Yokota R. Mixed-Precision Random Projection for RandNLA on Tensor Cores. https://arxiv.org/abs/2304.04612

Hanzely F., Mishchenko K., Richtárik P. SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems, vol. 31, 2018. https://arxiv.org/abs/1809.03054

Rothchild D., Panda A., Ullah E., Ivkin N., Stoica I., Braverman V., Gonzalez J., Arora R. FetchSGD: Communication-efficient federated learning with sketching. International Conference on Machine Learning, PMLR, 2020, pp. 8253–8265. https://arxiv.org/abs/2007.07682

Needell D., Ward R., Srebro N. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Advances in Neural Information Processing Systems. https://arxiv.org/abs/1310.5715

Gonen A., Orabona F., Shalev-Shwartz S. Solving ridge regression using sketched preconditioned SVRG. https://arxiv.org/abs/1602.02350

Gower R., Le Roux N., Bach F. Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods. International Conference on Artificial Intelligence and Statistics, PMLR, 2018, pp. 707–715. https://arxiv.org/abs/1710.07462

Murray R., Demmel J., Mahoney M. W., Erichson N. B., Melnichenko M., Malik O. A., Grigori L., Luszczek P., Dereziński M., Lopes M. E., Liang T., Luo H., Dongarra J. Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software. https://arxiv.org/abs/2302.11474

Sarlos T. Improved approximation algorithms for large matrices via random projections. Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), IEEE Computer Society, 2006, pp. 143–152. ISBN: 0769527205. DOI: 10.1109/FOCS.2006.37

Hitczenko P., Kwapień S. On the Rademacher series. Probability in Banach Spaces. Progress in Probability, vol. 35, pp. 31–36. DOI: 10.1007/978-1-4612-0253-0_2

Woodruff D. P. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, vol. 10, no. 1–2 (2014), pp. 1–157. https://arxiv.org/abs/1411.4357

Rokhlin V., Tygert M. A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, vol. 105, no. 36 (2008), pp. 13212–13217. DOI: 10.1073/pnas.0804869105

Ben-Israel A., Greville T. N. E. Generalized Inverses: Theory and Applications. Springer-Verlag, New York, 2003. DOI: 10.1007/b97366

Lopes M. E., Wang S., Mahoney M. Error estimation for randomized least-squares algorithms via the bootstrap. Proceedings of the 35th International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, vol. 80, PMLR, 2018, pp. 3217–3226. https://arxiv.org/abs/1803.08021

Kane D. M., Nelson J. Sparser Johnson-Lindenstrauss Transforms. Proceedings of the 2012 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). https://arxiv.org/abs/1012.1577

Trefethen L. N., Bau D. III. Numerical Linear Algebra. SIAM, 1997. DOI: 10.1137/1.9780898719574

Paige C. C., Saunders M. A. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software, vol. 8, no. 1 (1982), pp. 43–71. DOI: 10.1145/355984.355989



ISSN: 2307-8162