Abstract
This study demonstrates the application of artificial intelligence methods in veterinary ophthalmology through a combined approach of computer vision and language analysis. A customized U-Net model was used to detect and segment symptoms of canine eye diseases, including eye cloudiness, scleral redness, excessive tearing, and a colored ocular protrusion. The resulting outputs served as input to large language models (GPT-4o, Mistral 7B, Gemini2, Llama-3, and Claude 4) for interpreting the symptoms and providing a preliminary diagnosis. The evaluation was carried out using linguistic and semantic metrics (MPNet, MiniLM, BERTScore, CLIPScore, BLEU, METEOR, ROUGE, and SPICE). The results show that integrating U-Net segmentation with the analytical capabilities of large language models (LLMs) enables effective preliminary diagnosis of canine eye diseases. The model with a ResNet34 backbone achieved the highest accuracy in recognizing scleral redness, while GPT-4o was the most successful at interpreting symptoms and formulating a diagnosis. This approach contributes to the development of systems that can increase the accuracy and efficiency of veterinary diagnostics.

References
Anderson, P., Fernando, B., Johnson, M., & Gould, S. (2016). SPICE: Semantic Propositional Image Caption Evaluation. https://doi.org/10.48550/ARXIV.1607.08822
Boevé, M., & Stades, F. (1985). Glaucoma in dogs and cats. Review and retrospective evaluation of 421 patients. I. Pathobiological background, classification and breed predisposition. Tijdschrift Voor Diergeneeskunde, 110(6), 219–227.
Azad, R., Aghdam, E. K., Rauland, A., Jia, Y., Avval, A. H., Bozorgpour, A., Karimijafarbigloo, S., Cohen, J. P., Adeli, E., & Merhof, D. (2022). Medical Image Segmentation Review: The success of U-Net. https://doi.org/10.48550/ARXIV.2211.14830
Bucur, A.-M. (2023). Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media. https://doi.org/10.48550/ARXIV.2307.02313
Buric, M., Grozdanic, S., & Ivasic-Kos, M. (2024). Diagnosis of ophthalmologic diseases in canines based on images using neural networks for image segmentation. Heliyon, e38287. https://doi.org/10.1016/j.heliyon.2024.e38287
Burić, M., Ivašić-Kos, M., & Grozdanić, S. (n.d.). DogEyeSeg4: Dog Eye Segmentation 4-Class Ophthalmic Disease Dataset (No. urn:nbn:hr:195:405214) [Dataset]. Faculty of Informatics and Digital Technologies, University of Rijeka. Retrieved August 22, 2024, from https://urn.nsk.hr/urn:nbn:hr:195:405214
Burić, M., Paulin, G., & Ivašić-Kos, M. (n.d.). Object Detection Using Synthesized Data. In Proceedings of the ICT Innovations Conference, Ohrid, North Macedonia, 15.
Dandekar, A., Zen, R. A. M., & Bressan, S. (2018). A Comparative Study of Synthetic Dataset Generation Techniques. In S. Hartmann, H. Ma, A. Hameurlain, G. Pernul, & R. R. Wagner (Eds.), Database and Expert Systems Applications (pp. 387–395). Springer International Publishing. https://doi.org/10.1007/978-3-319-98812-2_35
Deane, J., Kearney, S., Kim, K. I., & Cosker, D. (2021). DynaDog+T: A Parametric Animal Model for Synthetic Canine Image Generation (No. arXiv:2107.07330). arXiv. http://arxiv.org/abs/2107.07330
Denkowski, M., & Lavie, A. (2014). Meteor Universal: Language Specific Translation Evaluation for Any Target Language. Proceedings of the Ninth Workshop on Statistical Machine Translation, 376–380. https://doi.org/10.3115/v1/W14-3348
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/ARXIV.1810.04805
Ganesan, K. (2018). ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks. https://doi.org/10.48550/ARXIV.1803.01937
Gemini Team, Reid, M., Savinov, N., Teplyashin, D., Lepikhin, D., Lillicrap, T., Alayrac, J., Soricut, R., Lazaridou, A., Firat, O., Schrittwieser, J., Antonoglou, I., Anil, R., Borgeaud, S., Dai, A., Millican, K., Dyer, E., Glaese, M., … Vinyals, O. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. https://doi.org/10.48550/ARXIV.2403.05530
González-Chávez, O., Ruiz, G., Moctezuma, D., & Ramirez-delReal, T. A. (2023). Are metrics measuring what they should? An evaluation of image captioning task metrics (No. arXiv:2207.01733). arXiv. http://arxiv.org/abs/2207.01733
Grozdanić, S., Đukić, S., Luzhetskiy, S., Milčić-Matić, N., & Lazić, T. (2020). Atlas bolesti oka pasa i mačaka. Oculus Vet.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition (No. arXiv:1512.03385; Version 1). arXiv. http://arxiv.org/abs/1512.03385
He, Z., Bhasuran, B., Jin, Q., Tian, S., Hanna, K., Shavor, C., Arguello, L. G., Murray, P., & Lu, Z. (2024). Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study. Journal of Medical Internet Research, 26, e56655. https://doi.org/10.2196/56655
Hessel, J., Holtzman, A., Forbes, M., Le Bras, R., & Choi, Y. (2021). CLIPScore: A Reference-free Evaluation Metric for Image Captioning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7514–7528. https://doi.org/10.18653/v1/2021.emnlp-main.595
Hrga, I., & Ivasic-Kos, M. (2024). Measuring the Sensitivity of Image Captioning Metrics to Caption Perturbations. In X.-S. Yang, R. S. Sherratt, N. Dey, & A. Joshi (Eds.), Proceedings of Eighth International Congress on Information and Communication Technology (Vol. 696, pp. 1053–1063). Springer Nature Singapore. https://doi.org/10.1007/978-981-99-3236-8_85
Huang, K.-W., Yang, Y.-R., Huang, Z.-H., Liu, Y.-Y., & Lee, S.-H. (2023). Retinal Vascular Image Segmentation Using Improved UNet Based on Residual Module. Bioengineering, 10(6), 722. https://doi.org/10.3390/bioengineering10060722
Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. de las, Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., & Sayed, W. E. (2023). Mistral 7B. https://doi.org/10.48550/ARXIV.2310.06825
Johnsen, D. A. J., Maggs, D. J., & Kass, P. H. (2006). Evaluation of risk factors for development of secondary glaucoma in dogs: 156 cases (1999–2004). Journal of the American Veterinary Medical Association, 229(8), 1270–1274. https://doi.org/10.2460/javma.229.8.1270
Katic, T., Pavlovski, M., Sekulic, D., & Vucetic, S. (2021). Learning Semi-Structured Representations of Radiology Reports (No. arXiv:2112.10746). arXiv. http://arxiv.org/abs/2112.10746
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2001). BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02, 311. https://doi.org/10.3115/1073083.1073135
Anthropic PBC. (2024). Claude LLM (Version 3) [Computer software].
Petit, O., Thome, N., Rambour, C., & Soler, L. (2021). U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. https://doi.org/10.48550/ARXIV.2103.06104
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation (No. arXiv:1505.04597). arXiv. http://arxiv.org/abs/1505.04597
Sasazawa, Y., Yokote, K., Imaichi, O., & Sogawa, Y. (2023). Text Retrieval with Multi-Stage Re-Ranking Models. https://doi.org/10.48550/ARXIV.2311.07994
Savage, C. H., Park, H., Kwak, K., Smith, A. D., Rothenberg, S. A., Parekh, V. S., Doo, F. X., & Yi, P. H. (2024). General-Purpose Large Language Models Versus a Domain-Specific Natural Language Processing Tool for Label Extraction From Chest Radiograph Reports. American Journal of Roentgenology, AJR.23.30573. https://doi.org/10.2214/AJR.23.30573
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition (No. arXiv:1409.1556). arXiv. http://arxiv.org/abs/1409.1556
Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2020). MPNet: Masked and Permuted Pre-training for Language Understanding. https://doi.org/10.48550/ARXIV.2004.09297
Sreng, S., Maneerat, N., Hamamoto, K., & Win, K. Y. (2020). Deep Learning for Optic Disc Segmentation and Glaucoma Diagnosis on Retinal Images. Applied Sciences, 10(14), 4916. https://doi.org/10.3390/app10144916
Strom, A. R., Hässig, M., Iburg, T. M., & Spiess, B. M. (2011). Epidemiology of canine glaucoma presented to University of Zurich from 1995 to 2009. Part 1: Congenital and primary glaucoma (4 and 123 cases). Veterinary Ophthalmology, 14(2), 121–126. https://doi.org/10.1111/j.1463-5224.2010.00855.x
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision (No. arXiv:1512.00567; Version 3). arXiv. http://arxiv.org/abs/1512.00567
Tan, M., & Le, Q. V. (2020). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (No. arXiv:1905.11946). arXiv. http://arxiv.org/abs/1905.11946
Thamizharasan, A., Murugan, M. S., & Parthiban, S. (2016). Surgical Management of Cherry Eye in a Dog. Intas Polivet, 17(II), 420–421.
Thirunavukarasu, A. J., Mahmood, S., Malem, A., Foster, W. P., Sanghera, R., Hassan, R., Zhou, S., Wong, S. W., Wong, Y. L., Chong, Y. J., Shakeel, A., Chang, Y.-H., Tan, B. K. J., Jain, N., Tan, T. F., Rauz, S., Ting, D. S. W., & Ting, D. S. J. (2024). Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study. PLOS Digital Health, 3(4), e0000341. https://doi.org/10.1371/journal.pdig.0000341
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., … Scialom, T. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models (No. arXiv:2307.09288). arXiv. http://arxiv.org/abs/2307.09288
Tripathi, R. M., Kashyap, D. K., & Giri, D. K. (2014). Surgical Management of Cherry Eye in a Dog. Intas Polivet, 15(1), 131–132.
Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020). MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. https://doi.org/10.48550/ARXIV.2002.10957
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. https://doi.org/10.48550/ARXIV.1904.09675

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright (c) 2025
