Semantic Data Fragmentation for Identification of Covariant Conceptual Drift in Machine Learning Models

Igor Kashirin

Abstract


Several types of data drift in ML models are considered: actual positive and negative drifts, as well as varieties of fragmentary drift in different directions. An overview of the current state of the problem is given, covering the sliding-window method, trigger-based model ensembles, covariate shift, personalization drift correction, seasonal correction, online learning, low-precision sampling, and feature monitoring and clipping. A modernized theory of binary relations was used to design the new method. As an example, the subject area "communication services" is considered, for which a dedicated ontological knowledge model architecture is designed. The new drift correction method forms the basis of a new technology for designing classification, regression and forecasting models for subject areas that have been specifically formalized. When choosing the scope of the "sliding window", the structure of the knowledge model is taken into account first. The input features of the training dataset are grouped according to the structure of the concepts and relationships of the knowledge base. The resulting drift compensation technology raises the ROC AUC of the ML models from 0.74 to 0.80, which suggests that it is an effective means of automatic drift correction.
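The abstract states that input features are grouped according to the concepts and relationships of the knowledge base and that the sliding-window scope follows the knowledge model, but it does not spell out the procedure. The Python sketch below illustrates one possible reading, under assumptions that are ours rather than the paper's: the concept-to-feature mapping (`groups`), the fixed window length, the threshold, and the use of the two-sample Kolmogorov-Smirnov statistic as the drift measure are all hypothetical. It flags a concept group when any feature in that group shifts between the previous window and the current one.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

Row = Dict[str, float]


def ks_statistic(ref: Sequence[float], cur: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the reference and current samples."""
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    gap = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so the maximum gap is attained at one of those values.
    for x in ref_sorted + cur_sorted:
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        f_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def drifted_concepts(
    concept_groups: Dict[str, List[str]],
    stream: List[Row],
    window: int,
    threshold: float,
) -> Dict[str, float]:
    """Compare the latest `window` rows with the `window` rows just before
    them, one concept group at a time. A group is flagged when the largest
    per-feature KS statistic inside it exceeds `threshold`."""
    if len(stream) < 2 * window:
        raise ValueError("stream must contain at least two full windows")
    reference, current = stream[-2 * window:-window], stream[-window:]
    flagged: Dict[str, float] = {}
    for concept, features in concept_groups.items():
        score = max(
            ks_statistic([r[f] for r in reference], [r[f] for r in current])
            for f in features
        )
        if score > threshold:
            flagged[concept] = score
    return flagged


# Hypothetical "communication services" concepts and their input features.
groups = {
    "usage": ["minutes", "traffic_gb"],
    "billing": ["monthly_charge"],
}

# Synthetic stream: call minutes jump by 20 in the second half, other features stay put.
stream: List[Row] = [
    {"minutes": float(i % 10), "traffic_gb": float(i % 5), "monthly_charge": 10.0}
    for i in range(50)
] + [
    {"minutes": float(i % 10) + 20.0, "traffic_gb": float(i % 5), "monthly_charge": 10.0}
    for i in range(50)
]

print(drifted_concepts(groups, stream, window=50, threshold=0.3))
```

Here only the hypothetical "usage" group is flagged (the script prints `{'usage': 1.0}`), while "billing" is not. Grouping by concept means an alarm points at a part of the knowledge model rather than at an isolated column; if the knowledge model suggested different time scales for different concepts, a per-group window length could replace the single `window` parameter.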


References


M. Asghari, D. Sierra-Sosa, M. Telahun, A. Kumar, A. S. Elmaghraby, Aggregate density-based concept drift identification for dynamic sensor data models, Neural Computing and Applications 33 (8) (2021) 3267–3279.

M. Tsiakmaki, G. Kostopoulos, S. Kotsiantis, O. Ragos, Transfer learning from deep neural networks for predicting student performance, Applied Sciences 10 (6) (2020) 2145–2218.

Ch. Schröer, F. Kruse, J. M. Gómez, A systematic literature review on applying CRISP-DM process model, Procedia Computer Science 181 (2021) 526–534.

F. Bayram, B. S. Ahmed, A. Kassler, From concept drift to model degradation: An overview on performance-aware drift detectors, Knowledge-Based Systems 245 (2022) 108632.

J. P. Barddal, H. M. Gomes, F. Enembreck, SFNClassifier: A scale-free social network method to handle concept drift, in: Proceedings of the 29th Annual ACM Symposium on Applied Computing (SAC '14), ACM, New York, NY, USA, 2014, pp. 786–791, doi:10.1145/2554850.2554855.

J. P. Barddal, H. M. Gomes, F. Enembreck, SNCStream: A social network-based data stream clustering algorithm, in: Proceedings of the 30th Annual ACM Symposium on Applied Computing (SAC '15), ACM, New York, NY, USA, 2015, pp. 935–940, doi:10.1145/2695664.2695674.

R. F. de Mello, Y. Vaz, C. H. G. Ferreira, A. Bifet, On learning guarantees to unsupervised concept drift detection on data streams, Expert Systems with Applications 117 (2019) 90–102.

C. Göpfert, L. Pfannschmidt, J. P. Göpfert, B. Hammer, Interpretation of linear classifiers by means of feature relevance bounds, Neurocomputing 298 (2018) 69–79.

I. Goldenberg, G. I. Webb, Survey of distance measures for quantifying concept drift and shift in numeric data, Knowledge and Information Systems 60 (2) (2019) 591–615.

J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Computing Surveys 46 (4) (2014) 1–37.

M. Khannouz, T. Glatard, Mondrian Forest for data stream classification under memory constraints, 2022.

M. Khannouz, B. Li, T. Glatard, OrpailleCC: A library for data stream analysis on embedded systems, The Journal of Open Source Software 4 (2019) 1485.

J. L. Lobo, J. Del Ser, E. Osaba, Lightweight alternatives for hyper-parameter tuning in drifting data streams, in: 2021 International Conference on Data Mining Workshops (ICDMW), 2021, pp. 304–311.

A. Dehghani, O. Sarbishei, T. Glatard, E. Shihab, A quantitative comparison of overlapping and non-overlapping sliding windows for human activity recognition using inertial sensors, Sensors 19 (22) (2019).

H. M. Gomes, J. Read, A. Bifet, Streaming random patches for evolving data stream classification, in: 2019 IEEE International Conference on Data Mining (ICDM), 2019, pp. 240–249.

H. Du, Y. Zhang, K. Gang, L. Zhang, Y.-C. Chen, Online ensemble learning algorithm for imbalanced data stream, Applied Soft Computing 107 (2021) 107378.

E. Della Valle, G. Ziffer, A. Bernardo, V. Cerqueira, A. Bifet, Towards time-evolving analytics: Online learning for time-dependent evolving data streams, Data Science (2022), p. 16.

T. M. Ali, A. Nawaz, A. Rehman, et al., A sequential machine learning-cum-attention mechanism for effective segmentation of brain tumor, Frontiers in Oncology 12 (2022) 1–10.

F. S. Tsai, S. Cabrilo, H. H. Chou, F. Hu, A. D. Tang, Open innovation and SME performance: The roles of reverse knowledge sharing and stakeholder relationships, Journal of Business Research 148 (2022) 433–443.

K. Xia, K.-Z. Lee, Y. Bengio, E. Bareinboim, The causal-neural connection: Expressiveness, learnability, and inference, Advances in Neural Information Processing Systems 34 (2021) 10823–10836.

T. D. Wang, B. Parsia, J. Hendler, A survey of the Web ontology landscape, in: The Semantic Web - ISWC 2006, Lecture Notes in Computer Science, vol. 4273, 2006, p. 682.

I. Yu. Kashirin, I. Yu. Filatov, Formalized description of intuitive perception of spatial situations, in: 2019 8th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 2019, pp. 1–4.



ISSN: 2307-8162