Analysis and Forecasting for Labor Markets Based on Online Data

V. S. Giorgashvili, M. A. Bakaev


Incomplete data is a common problem in sociological, economic, and statistical studies that rely on online data. Possible causes include errors and changes at the source websites and failures or errors in the data collection tools. Because missing data is generally undesirable in labor market forecasting, the preferred solution is to fill the gaps with a method that does not bias the results. In this paper we briefly review methods for handling incomplete data and describe the application of the k-means method to fill gaps in labor market online data that we previously collected with a dedicated software system. We evaluate the effectiveness of the method by comparing its results (average wages and the number of ads posted by companies) with data additionally collected by the system through an enhanced API-based mechanism. Further, we use an autoregressive integrated moving average (ARIMA) model to forecast labor market demand for IT specialists. Validation against data subsequently collected for the last months of 2018 suggests reasonable accuracy of the model, which can be useful in labor market monitoring and management.
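The abstract does not spell out how k-means is used to fill the gaps, so the following is only a minimal sketch of one common reading of cluster-based imputation: cluster the fully observed records, assign each incomplete record to the nearest centroid using only the fields it does have, and copy the missing fields from that centroid. The function names (`kmeans`, `impute`), the two features (average wage, number of ads), and all numbers are hypothetical illustrations, not the authors' data or implementation. In practice the features would likely need scaling before distances are computed.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on fully observed rows; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: centroid = column-wise mean (keep old one if cluster is empty).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def impute(rows, k=2):
    """Fill None fields with the value from the nearest centroid (assumed scheme)."""
    complete = [tuple(r) for r in rows if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in rows:
        observed = [i for i, v in enumerate(r) if v is not None]
        # Distance uses only the fields that are present in this row.
        best = min(centroids,
                   key=lambda c: sum((r[i] - c[i]) ** 2 for i in observed))
        filled.append(tuple(best[i] if v is None else v
                            for i, v in enumerate(r)))
    return filled


# Hypothetical toy data: (average wage, number of ads) per company.
rows = [
    (40.0, 5.0), (42.0, 6.0), (38.0, 4.0),
    (90.0, 20.0), (95.0, 22.0), (88.0, 19.0),
    (41.0, None),   # ad count missing
    (None, 21.0),   # wage missing
]
filled = impute(rows, k=2)
print(filled[6])  # (41.0, 5.0)
print(filled[7])  # (91.0, 21.0)
```

Filling from the centroid is the simplest choice; filling from the mean of the nearest few cluster members, or comparing against independently collected values as the abstract describes, would be natural refinements.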

Full Text:

PDF (Russian)





ISSN: 2307-8162