Experimental evaluation of the temporal efficiency of big data processing for specified storage formats
References
D. Chong, H. Shi, “Big data analytics: A literature review,” J. Manag. Anal., vol. 2, p. 175–201, 2015.
R. Moro Visconti, D. Morea, “Big Data for the Sustainability of Healthcare Project Financing,” Sustainability, vol. 11, p. 3748, 2019. doi:10.3390/su11133748.
L. Ardito, V. Scuotto, M. Del Giudice, A. Messeni, “A bibliometric analysis of research on Big Data analytics for business and management,” Manag. Decis., vol. 57, p. 1993–2009, 2018. doi:10.1108/MD-07-2018-0754.
F. Cappa, R. Oriani, E. Peruffo, I.P. McCarthy, “Big Data for Creating and Capturing Value in the Digitalized Environment: Unpacking the Effects of Volume, Variety and Veracity on Firm Performance,” Journal of Product Innovation Management, vol. 38, no. 1, p. 49–67, 2021. doi:10.1111/jpim.12545.
E. Nikulchev, D. Ilin, A. Silaeva, et al., “Digital Psychological Platform for Mass Web-Surveys,” Data, vol. 5, no. 4, p. 95, 2020. doi:10.3390/data5040095.
I. Mavridis, H. Karatza, “Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark,” J. Syst. Softw., vol. 125, p. 133–151, 2017.
S. Lee, J.Y. Jo, Y. Kim, “Survey of Data Locality in Apache Hadoop,” In 2019 IEEE International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD), Honolulu, USA, 29-31 May 2019; pp. 46–53.
K. Garg, D. Kaur, “Sentiment Analysis on Twitter Data using Apache Hadoop and Performance Evaluation on Hadoop MapReduce and Apache Spark,” In Proceedings on the International Conference on Artificial Intelligence (ICAI), Las Vegas, Nevada, USA, 29 July - 01 August 2019; pp. 233–238.
Hive. 2020 Apache Hive Specification. Available online: https://cwiki.apache.org/confluence/display/HIVE.
Impala. 2020 Apache Impala Specification. Available online: https://impala.apache.org/impala-docs.html.
E. Nazari, M.H. Shahriari, H. Tabesh, “BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink,” Frontiers in Health Informatics, vol. 8, no. 1, p. 14, 2019.
S. Salloum, R. Dautov, X. Chen, P.X. Peng, J.Z. Huang, “Big data analytics on Apache Spark,” International Journal of Data Science and Analytics, vol. 1, no. 3, pp. 145–164, 2016.
A. Gusev, D. Ilin, E. Nikulchev, “The Dataset of the Experimental Evaluation of Software Components for Application Design Selection Directed by the Artificial Bee Colony Algorithm,” Data, vol. 5, p. 59, 2020.
A. Gusev, D. Ilin, P. Kolyasnikov, E. Nikulchev, “Effective Selection of Software Components Based on Experimental Evaluations of Quality of Operation,” Engineering Letters, vol. 28, no. 2, p. 420–427, 2020.
A. Ramírez, J.A. Parejo, J.R. Romero, S. Segura, A. Ruiz-Cortés, “Evolutionary composition of QoS-aware web services: A many-objective perspective,” Expert Syst. Appl., vol. 72, p. 357–370, 2017.
S. Gholamshahi, S.M.H. Hasheminejad, “Software component identification and selection: A research review,” Softw. Pract. Exp., vol. 49, p. 40–69, 2019.
R.F. Munir, A. Abelló, O. Romero, M. Thiele, W. Lehner, “A cost-based storage format selector for materialized results in big data frameworks,” Distrib Parallel Databases, vol. 38, p. 335–364, 2020. doi:10.1007/s10619-019-07271-0.
X. Wang, Z. Xie, “The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle,” In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, pp. 177–186, 2020.
D. He, D. Wu, R. Huang, G. Marchionini, P. Hansen, S.J. Cunningham, “ACM/IEEE Joint Conference on Digital Libraries 2020 in Wuhan virtually,” ACM Sigweb Newsl, vol. 1, p. 1–7, 2020.
S. Ahmed, M.U. Ali, J. Ferzund, M.A. Sarwar, A. Rehman, A. Mehmood, “Modern Data Formats for Big Bioinformatics Data Analytics,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 4, p. 366–377, 2017. doi:10.14569/IJACSA.2017.080450.
D. Plase, L. Niedrite, R. Taranovs, “A Comparison of HDFS Compact Data Formats: Avro Versus Parquet,” Moksl. Liet. Ateitis, vol. 9, p. 267–276, 2017.
D. Ilin, E. Nikulchev, “Performance Analysis of Software with a Variant NoSQL Data Schemes,” In 2020 13th International Conference “Management of large-scale system development” (MLSD), p. 1–5, 2020. doi:10.1109/MLSD49919.2020.9247656.
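T. L. Saaty, “Ob izmerenii neosyazaemogo. Podhod k otnositel'nym izmereniyam na osnove glavnogo sobstvennogo vektora matricy parnyh sravnenij” [On the measurement of intangibles. A principal eigenvector approach to relative measurement derived from paired comparisons], Cloud of science, vol. 2, no. 1, p. 5–39, 2015. (In Russian)
T. L. Saaty, “Otnositel'noe izmerenie i ego obobshchenie v prinyatii reshenij. Pochemu parnye sravneniya yavlyayutsya klyuchevymi v matematike dlya izmereniya neosyazaemyh faktorov” [Relative measurement and its generalization in decision making. Why pairwise comparisons are central in mathematics for the measurement of intangible factors], Cloud of science, vol. 3, no. 2, p. 171–262, 2016. (In Russian)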
S. Sakr, A. Liu, A.G. Fayoumi, “The family of mapreduce and large-scale data processing systems,” ACM Comput. Surv. (CSUR), vol. 46, p. 1–44, 2013.
S. Chellappan, D. Ganesan, “Introduction to Apache Spark and Spark Core,” In Practical Apache Spark; Apress: Berkeley, CA, USA; pp. 79–113, 2018.
V. Belov, A. Tatarintsev, E. Nikulchev, “Choosing a Data Storage Format in the Apache Hadoop System Based on Experimental Evaluation Using Apache Spark,” Symmetry, vol. 13, no. 2, p. 195, 2021. doi:10.3390/sym13020195.
ISSN: 2307-8162