Machine learning in conditions of low data availability

Keywords

machine learning
transfer learning
classification
training set
system performance
neural networks

How to Cite

Juričić, V. (2023). Machine learning in conditions of low data availability. Polytechnica, 7(2), 26-32. https://doi.org/10.36978/cte.7.2.3

Abstract

Machine learning is the subject of numerous scientific and professional research projects and is an important component of systems used in medicine, banking, computer security, communications and many other fields. It is one of the most active areas of research, with constant development of new algorithms and approaches and improvement of existing methods. The performance of a machine learning model is significantly affected by the dataset used for training: the quality of the data, the balance of the value distribution and the size of the set. This is a potential problem for machine learning methods that require pre-labelled data, since data acquisition can be extremely complex, expensive and time-consuming; trained on too little data, a classical machine learning model will most likely not perform well. One approach to this problem is transfer learning, in which the model draws not only on a dataset from the target domain but also on data from other, ideally related, domains. In this work, conditions of low dataset availability were simulated and the performance of three models was analysed under them, one of the models being based on a previously trained (pre-trained) model. The process of creating the training sets is described, and the results of evaluating the three models on sets of different sizes are presented.
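For illustration, the transfer-learning setup described in the abstract can be sketched in a few lines of Keras code. This is a minimal sketch, not the paper's actual implementation: it assumes a TensorFlow implementation with the pre-trained MobileNetV2 network as the source-domain model, and the dataset objects train_ds and val_ds as well as the five-class head are hypothetical placeholders for a small labelled training set.

import tensorflow as tf

# Pre-trained MobileNetV2 as a frozen feature extractor (source domain: ImageNet).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,      # drop the original ImageNet classification head
    weights="imagenet",
    pooling="avg",          # global average pooling -> one feature vector per image
)
base.trainable = False      # freeze the transferred weights; only the new head is trained

# New classification head for the small target-domain dataset
# (five classes here; the number is illustrative).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# train_ds and val_ds are hypothetical tf.data.Dataset objects standing in for
# the reduced training sets simulated in the paper:
# model.fit(train_ds, validation_data=val_ds, epochs=10)

Because only the small classification head is trained while the transferred weights stay fixed, such a model can reach usable accuracy with far fewer labelled examples than a network trained from scratch, which is the effect the paper examines.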


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2023 V. Juričić