Abstract
The aim of this paper is, in the context of testing ChatGPT on student assignments in statistics, to recognise cases in which large language models behave similarly to human reasoning and cases in which they "think" differently, and to identify the opportunities, risks and limitations of applying artificial intelligence in teaching. The capabilities and limitations of large language models are analysed, together with the ways in which this rapidly growing field is trying to overcome existing biases and shortcomings. The paper tests ChatGPT, a chatbot based on the GPT-4 large language model, on the content of an introductory statistics course taught to second-year students of an informatics programme. The test was carried out by manually entering 170 statistics quiz questions into the ChatGPT interface. The questions were divided into three categories: theoretical questions requiring reproduction of knowledge, theoretical questions testing understanding of the field, and computational tasks. The quiz questions were posed in Croatian and the answers given in Croatian were analysed. The accuracy of students and of ChatGPT in answering the quiz questions was compared by question category using the Wilcoxon rank-sum test. The results show that ChatGPT performs statistically significantly better than students in the two theoretical question categories (knowledge reproduction and understanding), while students perform better on computational tasks, although that difference in accuracy is not statistically significant (at the p < 0.01 level).
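A minimal sketch of the per-category comparison described above, assuming the per-question accuracy scores of the two groups are available as two independent samples; the variable names and all numbers below are hypothetical illustrations, not the study's data. SciPy's scipy.stats.ranksums implements the two-sided Wilcoxon rank-sum test used here.

# Minimal sketch of the per-category comparison from the abstract.
# All accuracy values are hypothetical placeholders, NOT the study's data.
from scipy.stats import ranksums

# Hypothetical per-question accuracy (share of correct answers) for one
# category, e.g. theoretical questions requiring knowledge reproduction.
student_accuracy = [0.55, 0.62, 0.48, 0.71, 0.66, 0.58, 0.60, 0.53]
chatgpt_accuracy = [0.80, 0.75, 0.90, 0.85, 0.70, 0.95, 0.88, 0.78]

# Two-sided Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test).
statistic, p_value = ranksums(student_accuracy, chatgpt_accuracy)
print(f"W = {statistic:.3f}, p = {p_value:.4f}")

# Mirroring the abstract's criterion: a difference counts as significant
# only when p < 0.01.
if p_value < 0.01:
    print("Statistically significant difference at the 0.01 level.")
else:
    print("No statistically significant difference at the 0.01 level.")

Note that ranksums uses a normal approximation for the p-value; for small samples, scipy.stats.mannwhitneyu can compute an exact p-value for the equivalent test.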