Recognizing real estate agent ads using machine learning methods

Ilya Zelenskiy

Abstract


This paper describes a solution to the problem of validating open data on the state of the housing stock, so that the verified data can be used in urban environment monitoring. In this study, validation means automatically recognizing potentially unreliable ads published by real estate agents. Automation is complicated by the lack of formal criteria for attributing an ad to an agent and by the dependence of existing methods on additional resources and information sources. Existing methods for this and similar problems of recognizing unreliable information are reviewed, using fake news detection as an example. Two groups of methods are distinguished: social-based and content-based. The former are poorly suited to the task at hand, while the latter label data as reliable or unreliable using natural language analysis and classification methods, which closely matches the problem posed in this work. As a result, the problem is formulated as binary classification of ads using machine learning methods. A pre-labeled dataset of Moscow apartment ads from open sources was used to train and test the classification models. Six classification models were compared on the target metrics: accuracy, precision, recall, F1, and ROC-AUC. The model with the best metrics was selected, and its hyperparameters were optimized.
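The five metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The label convention (1 for an agent ad, 0 for any other ad), the 0.5 decision threshold, the toy scores, and the function names are assumptions made for illustration only; they are not taken from the paper and do not reproduce its models or results.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = agent ad, 0 = other ad.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]   # assumed classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason to report both kinds of metric when comparing models.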


Full Text:

PDF (Russian)






ISSN: 2307-8162