Hybrid Naive Bayes TF-IDF Algorithm and Lexicon Approach for Sentiment Analysis of Reviews

Ahmad Harits Ramadhani; Huzaifah Qahar Djauhari; Vincent Lius; Andi Nugroho

Abstract

As people increasingly rely on social media to express opinions publicly, accurate sentiment analysis has become essential, notably for assessing application reviews. This study addresses the need for precise sentiment classification, a need illustrated by the prevalence of negative feedback in the reviews of certain apps. We propose a hybrid approach that combines lexicon-based sentiment labeling with a Naive Bayes classifier trained on TF-IDF features. Using a dataset of 5000 reviews, we explore two Indonesian lexicons, InSet and SentiStrengthID, to label sentiments. Our primary objective is to classify reviews as positive or negative. By evaluating the effectiveness of combining Naive Bayes with TF-IDF and lexicon-based methods, this study contributes to a deeper understanding of sentiment analysis in the context of application reviews.
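The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the sample reviews, the smoothed IDF formula, the choice to drop reviews with a zero lexicon score, and all function names are assumptions made for illustration. A real run would load InSet or SentiStrengthID in place of the toy lexicon.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (word -> polarity weight). Illustrative only;
# these are not actual InSet or SentiStrengthID entries.
TOY_LEXICON = {"bagus": 3, "mantap": 4, "suka": 2,
               "jelek": -3, "lambat": -2, "error": -4}

def tokenize(text):
    return text.lower().split()

def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum the polarity weights of known words and map the sign to a label."""
    score = sum(lexicon.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # assumption: reviews without polarity evidence are dropped

def fit_idf(docs):
    """Smoothed inverse document frequency (one common variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; out-of-vocabulary tokens are ignored."""
    tf = Counter(tok for tok in doc if tok in idf)
    return {tok: cnt * idf[tok] for tok, cnt in tf.items()}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    weight = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for tok, w in vec.items():
            weight[y][tok] += w
    model = {}
    for y, n_docs in Counter(labels).items():
        total = sum(weight[y].values()) + alpha * len(vocab)
        log_like = {tok: math.log((weight[y][tok] + alpha) / total)
                    for tok in vocab}
        model[y] = (math.log(n_docs / len(labels)), log_like)
    return model

def predict(model, vec):
    """Return the class with the highest log prior plus weighted log likelihood."""
    def score(y):
        prior, log_like = model[y]
        return prior + sum(w * log_like[tok] for tok, w in vec.items())
    return max(model, key=score)

# Made-up sample reviews; the last one has no lexicon words.
reviews = [
    "aplikasi bagus dan mantap",
    "suka sekali aplikasi ini bagus",
    "aplikasi jelek sering error",
    "login lambat dan error terus",
    "update terbaru",
]

# Step 1: label with the lexicon. Step 2: TF-IDF. Step 3: train Naive Bayes.
labeled = [(tokenize(r), lexicon_label(r)) for r in reviews]
labeled = [(doc, y) for doc, y in labeled if y is not None]
docs, labels = zip(*labeled)
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
model = train_nb(vectors, labels, vocab=set(idf))

prediction = predict(model, tfidf(tokenize("aplikasi error terus"), idf))
```

In this sketch the lexicon only produces training labels, while the classifier learns from every token in the labeled reviews, so words absent from the lexicon (such as "terus" here) can still influence a prediction.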

ISSN: 2307-8162

International Journal of Open Information Technologies