A Data Stream Dynamic Classification Method with Control of the Decision-making Area

  • Anastasiya O. Gurina
  • Larisa A. Gurina
Keywords: dynamic classification, concept drift, anomaly detection, autoencoder, ensemble, incremental and online machine learning, classification quality indicators

Abstract

A dynamic classification problem is considered in which the classified data are assumed to change over time. Data streams such as computer network traffic, sensor readings, and bank transactions are subject to data drift, the emergence of new classes, and anomalies. Existing data stream classification methods are analyzed, and it is shown that no single effective method simultaneously addresses anomaly detection, drift, and model adaptation to new data. The importance of controlling the classifiers' decision-making area for obtaining a high-quality solution is emphasized. A dynamic classification method is proposed that is based on a scalable ensemble of autoencoders whose decision-making area is controlled using the EDCAP criterion. The properties of autoencoders are exploited to detect drift, anomalies, and new classes. Each autoencoder in the ensemble is trained to recognize a single class, and the size of its recognition area is controlled using the EDCAP criterion. The classification result is obtained by analyzing the responses of all autoencoders in the ensemble. When a new data class is detected, the ensemble is scaled by adding a new autoencoder; when drift is detected, only the affected autoencoders are retrained. The quality of the proposed dynamic classifier is compared with that of an incremental algorithm based on an adaptive Hoeffding tree. The advantages of the proposed method are demonstrated on a synthetic data stream that includes drift, a new class, and anomalies.
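The abstract describes the method only at a high level, so the sketch below merely illustrates the general idea of a scalable one-class autoencoder ensemble under stated assumptions: scikit-learn's MLPRegressor stands in for each autoencoder, and a mean-plus-three-sigma reconstruction-error threshold approximates the EDCAP-controlled decision-making area, whose exact formulation is not given in the abstract. All class names and parameters here are illustrative, not the authors' code.

```python
# A minimal sketch, NOT the authors' implementation: MLPRegressor stands in for
# each autoencoder (trained to reproduce its input through a bottleneck), and a
# mean + 3*sigma reconstruction-error threshold stands in for the EDCAP-controlled
# decision-making area described in the abstract.
import numpy as np
from sklearn.neural_network import MLPRegressor


class OneClassAE:
    """One-class recognizer: accepts a sample if it reconstructs it well enough."""

    def __init__(self, n_features):
        bottleneck = max(2, n_features // 2)           # narrower than the input
        self.model = MLPRegressor(hidden_layer_sizes=(bottleneck,),
                                  max_iter=2000, random_state=0)
        self.threshold = None                          # radius of the decision area

    def fit(self, X):
        self.model.fit(X, X)                           # learn to reconstruct the class
        err = self._errors(X)
        self.threshold = err.mean() + 3.0 * err.std()  # assumed rule, not EDCAP itself
        return self

    def _errors(self, X):
        return np.mean((self.model.predict(X) - X) ** 2, axis=1)

    def accepts(self, X):
        return self._errors(X) <= self.threshold


class ScalableEnsemble:
    """One autoencoder per known class; an unrecognized sample signals an anomaly
    or a new class, in which case the ensemble is scaled with a new member."""

    def __init__(self):
        self.members = {}                              # class label -> OneClassAE

    def add_class(self, label, X):
        self.members[label] = OneClassAE(X.shape[1]).fit(X)

    def retrain_class(self, label, X):                 # reaction to detected drift
        self.members[label].fit(X)

    def predict(self, x):
        x = np.asarray(x).reshape(1, -1)
        hits = [lbl for lbl, ae in self.members.items() if ae.accepts(x)[0]]
        if not hits:
            return None                                # outside every decision area
        # if several decision areas overlap, the best reconstruction wins
        return min(hits, key=lambda lbl: self.members[lbl]._errors(x)[0])


# Usage sketch on synthetic data: two known classes, then a third one appears.
rng = np.random.default_rng(0)
ens = ScalableEnsemble()
ens.add_class(0, rng.normal(0.0, 0.1, size=(200, 3)))
ens.add_class(1, rng.normal(1.0, 0.1, size=(200, 3)))

sample = rng.normal(3.0, 0.1, size=3)                  # far from both known classes
if ens.predict(sample) is None:                        # rejected by every autoencoder
    ens.add_class(2, rng.normal(3.0, 0.1, size=(200, 3)))
```

For the baseline mentioned in the abstract, skmultiflow's HoeffdingAdaptiveTreeClassifier (reference 43) can be updated incrementally through its partial_fit method in a standard test-then-train loop over the same stream.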

Information about authors

Anastasiya O. Gurina

Senior Researcher at the Center for Scientific Research and Advanced Development, JSC «Infotex», e-mail: asya.gurina001512@yandex.ru

Larisa A. Gurina

Researcher at the Research Center for Topogeodesic and Navigation Support, 27th Central Research Institute of the Ministry of Defense of the Russian Federation, e-mail: larisa.gurina0702@gmail.com

References

1. Kubát M. A Machine Learning-based Approach to Load Balancing in Computer Networks // Cybernetics and Systems. 1992. V. 23(3—4). Pp. 389—400.
2. Widmer G., Kubát M. Effective Learning in Dynamic Environments by Explicit Context Tracking // Proc. Machine Learning Conf. 1993. V. 667. Pp. 227—243.
3. Widmer G., Kubát M. Learning in the Presence of Concept Drift and Hidden Contexts // Machine Learning. 1996. V. 23(1). Pp. 69—101.
4. Machine Learning Monitoring. Pt. 5. Why You Should Care about Data and Concept Drift [Electronic resource] https://evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift (accessed 14.06.2022).
5. Zang W. e. a. Comparative Study between Incremental and Ensemble Learning on Data Streams: Case Study // J. Big Data. 2014. V. 1(5). Pp. 1—5.
6. Žliobaitė I., Pechenizkiy M., Gama J. An Overview of Concept Drift Applications // Big Data Analysis: New Algorithms for a New Society. N.-Y.: Springer, 2016. V. 16. Pp. 91—114.
7. Webb G. I. e. a. Understanding Concept Drift [Electronic resource] https://www.researchgate.net/publication/315765198_Understanding_Concept_Drift (accessed 14.06.2022).
8. Nakhaeizadeh G., Taylor C.C., Kunisch G. Dynamic Supervised Learning: Some Basic Issues and Application Aspects // Classification and Knowledge Organization. Studies in Classification, Data Analysis, and Knowledge Organization. Berlin: Springer, 1997. Pp. 123—135.
9. Tsymbal A. The Problem of Concept Drift: Definitions and Related Work. Dublin: Trinity College, 2004. V. 106(2). P. 7.
10. Zhukov A.V., Sidorov D.N. A Modification of the Random Forest Algorithm for Classification of Non-stationary Streaming Data // Bulletin of the South Ural State University. Series «Mathematical Modelling and Programming». 2016. No. 4(9). Pp. 86—95. (in Russian).
11. Gimarov V.A. Methods and Automated Systems for Dynamic Classification of Complex Technogenic Objects: Abstract of Dr. Sci. (Eng.) Dissertation. Moscow: D.I. Mendeleev Russian Univ. of Chemical Technology Publ., 2004. (in Russian).
12. Nakhaeizadeh G., Taylor C., Lanquillon C. Evaluating Usefulness for Dynamic Classification // KDD. 1998. V. 1. Pp. 87—93.
13. Gama J. e. a. Learning with Drift Detection // Lecture Notes in Computer Science. Proc. Brazilian Symp. Artificial Intelligence. 2004. V. 3171. Pp. 286—295.
14. Baena-García M. e. a. Early Drift Detection Method // Proc. IV Intern. Workshop Knowledge Discovery from Data Streams. 2006. V. 6. Pp. 77—86.
15. Althabiti M., Abdullah M. CDDM: Concept Drift Detection Model for Data Stream // Intern. J. Interactive Mobile Technol. 2020. V. 14(10). Pp. 90—106.
16. Dong F. e. a. Fuzzy Competence Model Drift Detection for Data-Driven Decision Support Systems // Knowledge-based Syst. 2018. V. 143. Pp. 284—294.
17. Boracchi G. e. a. QuantTree: Histograms for Change Detection in Multivariate Data Streams // Proc. XXXV Intern. Conf. Machine Learning. 2018. V. 80. Pp. 639—648.
18. Kifer D., Ben-David S., Gehrke J. Detecting Change in Data Streams // Proc. XXX Intern. Conf. Very Large Data Bases. 2004. V. 4. Pp. 180—191.
19. Sethi T.S., Kantardzic M. On the Reliable Detection of Concept Drift from Streaming Unlabeled Data // Expert Systems with Appl. 2017. V. 82. Pp. 77—99.
20. Jaworski M., Rutkowski L., Angelov P. Concept Drift Detection Using Autoencoders in Data Streams Processing // Proc. Intern. Conf. Artificial Intelligence and Soft Computing. 2020. V. 12415. Pp. 124—133.
21. Ditzler G. Incremental Learning of Concept Drift from Imbalanced Data. Glassboro: Rowan University, 2011.
22. Krawczyk B., Woźniak M. One-class Classifiers with Incremental Learning and Forgetting for Data Streams with Concept Drift // Soft Comput. 2015. V. 19. Pp. 3387—3400.
23. Bifet A., Gavaldà R. Learning from Time-changing Data with Adaptive Windowing // Proc. SIAM Intern. Conf. Data Mining. 2007. Pp. 443—448.
24. Sun Y., Shao H., Wang S. Efficient Ensemble Classification for Multi-label Data Streams with Concept Drift // Information. 2019. V. 10(5). Pp. 158—172.
25. Sarnovsky M., Kolarik M. Classification of the Drifting Data Streams Using Heterogeneous Diversified Dynamic Class-weighted Ensemble // PeerJ Computer Sci. 2021. V. 7. Pp. 459—490.
26. Ludwig S. Applying a Neural Network Ensemble to Intrusion Detection // J. Artificial Intelligence and Soft Computing Research. 2019. V. 9(3). Pp. 177—188.
27. Kolter J.Z., Maloof M. Dynamic Weighted Majority: an Ensemble Method for Drifting Concepts // J. Mach. Learn. Res. 2007. V. 8. Pp. 2755—2790.
28. Best Practices for Dealing with Concept Drift [Electronic resource] https://opendatascience.com/best-practices-for-dealing-with-concept-drift/ (accessed 19.06.2022).
29. Montiel J. Learning from Evolving Data Streams // Proc. XIX Python in Sci. Conf. 2020. Pp. 70—77.
30. Bifet A., Gavaldà R. Adaptive Learning from Evolving Data Streams // Advances in Intelligent Data Analysis VIII. Lecture Notes in Computer Sci. Berlin, Heidelberg: Springer, 2009. Pp. 249—260.
31. Gözüaçik Ö., Can F. Concept Learning Using One-class Classifiers for Implicit Drift Detection in Evolving Data Streams // Artificial Intelligence Rev. 2021. V. 54(3). Pp. 1—23.
32. Kemker R. e. a. Measuring Catastrophic Forgetting in Neural Networks // Proc. Conf. Artificial Intelligence. 2017. V. 32(1). Pp. 1—15.
33. Sharkey A. On Combining Artificial Neural Nets // Connection Sci. 1996. V. 8(3—4). Pp. 299—314.
34. Tennant M. e. a. Scalable Real-time Classification of Data Streams with Concept Drift // Future Generation Computer Syst. 2017. V. 75. Pp. 187—199.
35. Alam K.R., Siddique N., Adeli H. A Dynamic Ensemble Learning Algorithm for Neural Networks // Neural Computing and Appl. 2020. V. 32(1). Pp. 8675—8690.
36. Suryawanshi S., Goswami A., Patil P. Incremental Ensemble of One Class Classifier for Data Streams with Concept Drift Adaption // Proc. Intern. Advanced Computing Conf. Communications in Computer and Information Sci. 2021. V. 1467. Pp. 407—416.
37. Li Z. e. a. Incremental Learning Imbalanced Data Streams with Concept Drift: The Dynamic Updated Ensemble Algorithm // Knowledge-Based Syst. 2020. V. 195(4). P. 105694.
38. Gurina A.O., Eliseev V.L., Kolpinskiy S.V. Dynamic Classification Approach Using Scalable Ensemble of Autoencoders to Classify Data with Drift // J. Phys.: Conf. Ser. 2021. V. 2134(1). P. 012009.
39. Gurina A.O., Eliseev V.L. An Empirical Quality Criterion for a One-class Classifier // Information Systems and Technologies: Proc. XXVII Intern. Sci.-Tech. Conf. Nizhny Novgorod: R.E. Alekseev Nizhny Novgorod State Technical Univ. Publ., 2021. Pp. 648—657. (in Russian).
40. Gurina A., Eliseev V. Quality Criteria and Method of Synthesis for Adversarial Attack-resistant Classifiers // Machine Learning and Knowledge Extraction. 2022. V. 4(2). Pp. 519—541.
41. Gurina A.O., Guzev O.Yu., Eliseev V.L. Detection of Anomalous Events on a Host Using an Autoencoder // Intern. J. Open Information Technol. 2020. V. 8(8). Pp. 26—36. (in Russian).
42. Patent No. 2773010 RF. A Method for Detecting Anomalies in Multidimensional Data / A.O. Gurina, O.Yu. Guzev // Bulletin of Inventions. 2022. No. 16. (in Russian).
43. Skmultiflow.trees.HoeffdingAdaptiveTreeClassifier — Scikit-multiflow 0.5.3 Documentation [Electronic resource] https://scikit-multiflow.readthedocs.io/en/stable/api/generated/skmultiflow.trees.HoeffdingAdaptiveTreeClassifier.html?highlight=HoeffdingAdaptiveTreeClassifier()#skmultiflow.trees.HoeffdingAdaptiveTreeClassifier (accessed 23.06.2022).
---
For citation: Gurina A.O., Gurina L.A. A Data Stream Dynamic Classification Method with Control of the Decision-making Area // Bulletin of MPEI. 2023. No. 1. Pp. 120—135. (in Russian). DOI: 10.24160/1993-6982-2023-1-120-135.
---
The work was supported by the Russian Foundation for Basic Research (RFBR), scientific project No. 20-37-90073.
Published
2022-10-24
Section
System Analysis, Management and Information Processing (2.3.1)