Abstract
This study applies artificial intelligence methods to veterinary ophthalmology by combining image segmentation with language understanding models. A customized U-Net model was used to detect and segment canine ocular symptoms, including ocular opacity, scleral redness, excessive tearing, and colored ocular protrusion. The segmentation results were used as input to large language models (GPT-4o, Mistral 7B, Gemini 2, Llama-3, and Claude 4) to interpret the symptoms and provide preliminary diagnoses. The evaluation was performed using linguistic and semantic metrics (MPNet, MiniLM, BERTScore, CLIPScore, BLEU, METEOR, ROUGE, and SPICE). The results show that integrating U-Net segmentation with the analytical capabilities of LLMs enables effective preliminary diagnosis of canine eye diseases. The ResNet34-based model achieved the highest accuracy in identifying scleral redness, while GPT-4o performed best in interpreting symptoms and suggesting diagnoses. This approach contributes to the development of intelligent systems that can improve the accuracy and efficiency of veterinary diagnostics.

References
Anderson, P., Fernando, B., Johnson, M., & Gould, S. (2016). SPICE: Semantic Propositional Image Caption Evaluation. https://doi.org/10.48550/ARXIV.1607.08822
Boevé, M., & Stades, F. (1985). Glaucoma in dogs and cats. Review and retrospective evaluation of 421 patients. I. Pathobiological background, classification and breed predisposition. Tijdschrift Voor Diergeneeskunde, 110(6), 219–227.
Azad, R., Aghdam, E. K., Rauland, A., Jia, Y., Avval, A. H., Bozorgpour, A., Karimijafarbigloo, S., Cohen, J. P., Adeli, E., & Merhof, D. (2022). Medical Image Segmentation Review: The success of U-Net. https://doi.org/10.48550/ARXIV.2211.14830
Bucur, A.-M. (2023). Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media. https://doi.org/10.48550/ARXIV.2307.02313
Buric, M., Grozdanic, S., & Ivasic-Kos, M. (2024). Diagnosis of ophthalmologic diseases in canines based on images using neural networks for image segmentation. Heliyon, e38287. https://doi.org/10.1016/j.heliyon.2024.e38287
Burić, M., Ivašić-Kos, M., & Grozdanić, S. (n.d.). DogEyeSeg4: Dog Eye Segmentation 4-Class Ophthalmic Disease Dataset (No. urn:nbn:hr:195:405214) [Dataset]. Faculty of Informatics and Digital Technologies, University of Rijeka. Retrieved August 22, 2024, from https://urn.nsk.hr/urn:nbn:hr:195:405214
Burić, M., Paulin, G., & Ivašić-Kos, M. (n.d.). Object Detection Using Synthesized Data. In Proceedings of the ICT Innovations Conference, Ohrid, North Macedonia, 15.
Dandekar, A., Zen, R. A. M., & Bressan, S. (2018). A Comparative Study of Synthetic Dataset Generation Techniques. In S. Hartmann, H. Ma, A. Hameurlain, G. Pernul, & R. R. Wagner (Eds.), Database and Expert Systems Applications (pp. 387–395). Springer International Publishing. https://doi.org/10.1007/978-3-319-98812-2_35
Deane, J., Kearney, S., Kim, K. I., & Cosker, D. (2021). DynaDog+T: A Parametric Animal Model for Synthetic Canine Image Generation (No. arXiv:2107.07330). arXiv. http://arxiv.org/abs/2107.07330
Denkowski, M., & Lavie, A. (2014). Meteor Universal: Language Specific Translation Evaluation for Any Target Language. Proceedings of the Ninth Workshop on Statistical Machine Translation, 376–380. https://doi.org/10.3115/v1/W14-3348
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/ARXIV.1810.04805
Ganesan, K. (2018). ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks. https://doi.org/10.48550/ARXIV.1803.01937
Gemini Team, Reid, M., Savinov, N., Teplyashin, D., Lepikhin, D., Lillicrap, T., Alayrac, J., Soricut, R., Lazaridou, A., Firat, O., Schrittwieser, J., Antonoglou, I., Anil, R., Borgeaud, S., Dai, A., Millican, K., Dyer, E., Glaese, M., … Vinyals, O. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. https://doi.org/10.48550/ARXIV.2403.05530
González-Chávez, O., Ruiz, G., Moctezuma, D., & Ramirez-delReal, T. A. (2023). Are metrics measuring what they should? An evaluation of image captioning task metrics (No. arXiv:2207.01733). arXiv. http://arxiv.org/abs/2207.01733
Grozdanić, S., Đukić, S., Luzhetskiy, S., Milčić-Matić, N., & Lazić, T. (2020). Atlas bolesti oka pasa i mačaka [Atlas of eye diseases of dogs and cats]. Oculus Vet.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition (No. arXiv:1512.03385; Version 1). arXiv. http://arxiv.org/abs/1512.03385
He, Z., Bhasuran, B., Jin, Q., Tian, S., Hanna, K., Shavor, C., Arguello, L. G., Murray, P., & Lu, Z. (2024). Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study. Journal of Medical Internet Research, 26, e56655. https://doi.org/10.2196/56655
Hessel, J., Holtzman, A., Forbes, M., Le Bras, R., & Choi, Y. (2021). CLIPScore: A Reference-free Evaluation Metric for Image Captioning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7514–7528. https://doi.org/10.18653/v1/2021.emnlp-main.595
Hrga, I., & Ivasic-Kos, M. (2024). Measuring the Sensitivity of Image Captioning Metrics to Caption Perturbations. In X.-S. Yang, R. S. Sherratt, N. Dey, & A. Joshi (Eds.), Proceedings of Eighth International Congress on Information and Communication Technology (Vol. 696, pp. 1053–1063). Springer Nature Singapore. https://doi.org/10.1007/978-981-99-3236-8_85
Huang, K.-W., Yang, Y.-R., Huang, Z.-H., Liu, Y.-Y., & Lee, S.-H. (2023). Retinal Vascular Image Segmentation Using Improved UNet Based on Residual Module. Bioengineering, 10(6), 722. https://doi.org/10.3390/bioengineering10060722
Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. de las, Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., & Sayed, W. E. (2023). Mistral 7B. https://doi.org/10.48550/ARXIV.2310.06825
Johnsen, D. A. J., Maggs, D. J., & Kass, P. H. (2006). Evaluation of risk factors for development of secondary glaucoma in dogs: 156 cases (1999–2004). Journal of the American Veterinary Medical Association, 229(8), 1270–1274. https://doi.org/10.2460/javma.229.8.1270
Katic, T., Pavlovski, M., Sekulic, D., & Vucetic, S. (2021). Learning Semi-Structured Representations of Radiology Reports (No. arXiv:2112.10746). arXiv. http://arxiv.org/abs/2112.10746
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2001). BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02, 311. https://doi.org/10.3115/1073083.1073135
Anthropic PBC. (2024). Claude LLM (Version 3) [Computer software]. Anthropic PBC.
Petit, O., Thome, N., Rambour, C., & Soler, L. (2021). U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. https://doi.org/10.48550/ARXIV.2103.06104
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation (No. arXiv:1505.04597). arXiv. http://arxiv.org/abs/1505.04597
Sasazawa, Y., Yokote, K., Imaichi, O., & Sogawa, Y. (2023). Text Retrieval with Multi-Stage Re-Ranking Models. https://doi.org/10.48550/ARXIV.2311.07994
Savage, C. H., Park, H., Kwak, K., Smith, A. D., Rothenberg, S. A., Parekh, V. S., Doo, F. X., & Yi, P. H. (2024). General-Purpose Large Language Models Versus a Domain-Specific Natural Language Processing Tool for Label Extraction From Chest Radiograph Reports. American Journal of Roentgenology, AJR.23.30573. https://doi.org/10.2214/AJR.23.30573
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition (No. arXiv:1409.1556). arXiv. http://arxiv.org/abs/1409.1556
Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2020). MPNet: Masked and Permuted Pre-training for Language Understanding. https://doi.org/10.48550/ARXIV.2004.09297
Sreng, S., Maneerat, N., Hamamoto, K., & Win, K. Y. (2020). Deep Learning for Optic Disc Segmentation and Glaucoma Diagnosis on Retinal Images. Applied Sciences, 10(14), 4916. https://doi.org/10.3390/app10144916
Strom, A. R., Hässig, M., Iburg, T. M., & Spiess, B. M. (2011). Epidemiology of canine glaucoma presented to University of Zurich from 1995 to 2009. Part 1: Congenital and primary glaucoma (4 and 123 cases). Veterinary Ophthalmology, 14(2), 121–126. https://doi.org/10.1111/j.1463-5224.2010.00855.x
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision (No. arXiv:1512.00567; Version 3). arXiv. http://arxiv.org/abs/1512.00567
Tan, M., & Le, Q. V. (2020). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (No. arXiv:1905.11946). arXiv. http://arxiv.org/abs/1905.11946
Thamizharasan, A., Murugan, M. S., & Parthiban, S. (2016). Surgical Management of Cherry Eye in a Dog. Intas Polivet, 17(II), 420–421.
Thirunavukarasu, A. J., Mahmood, S., Malem, A., Foster, W. P., Sanghera, R., Hassan, R., Zhou, S., Wong, S. W., Wong, Y. L., Chong, Y. J., Shakeel, A., Chang, Y.-H., Tan, B. K. J., Jain, N., Tan, T. F., Rauz, S., Ting, D. S. W., & Ting, D. S. J. (2024). Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study. PLOS Digital Health, 3(4), e0000341. https://doi.org/10.1371/journal.pdig.0000341
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., … Scialom, T. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models (No. arXiv:2307.09288). arXiv. http://arxiv.org/abs/2307.09288
Tripathi, R. M., Kashyap, D. K., & Giri, D. K. (2014). Surgical Management of Cherry Eye in a Dog. Intas Polivet, 15(1), 131–132.
Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020). MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. https://doi.org/10.48550/ARXIV.2002.10957
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. https://doi.org/10.48550/ARXIV.1904.09675

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright (c) 2025
