Korean Institute of Information Technology


The Journal of Korean Institute of Information Technology - Vol. 21 , No. 8

The Journal of Korean Institute of Information Technology - Vol. 21, No. 8, pp. 143-152
Abbreviation: Journal of KIIT
ISSN: 1598-8619 (Print) 2093-7571 (Online)
Print publication date 31 Aug 2023
Received 15 Jun 2023 Revised 10 Jul 2023 Accepted 13 Jul 2023
DOI: https://doi.org/10.14801/jkiit.2023.21.8.143

Leveraging Image Classification and Semantic Segmentation for Robust Cardiomegaly Diagnosis in Pet
Jun-Young Oh* ; In-Gyu Lee* ; Young-Min Go* ; Euijong Lee* ; Ji-Hoon Jeong**
*Dept. of Computer Science, Chungbuk National University
**Dept. of Computer Science, Chungbuk National University

Correspondence to : Ji-Hoon Jeong Dept. of Computer Science, Chungbuk National University, 28644, Chungdaero 1, Cheongju, Chungcheongbuk-do, Korea Tel.: +82-43-261-2254, Email: jh.jeong@chungbuk.ac.kr



With increasing global interest in pet health, the importance of artificial intelligence (AI) in this field has grown. However, veterinary AI has not been studied as extensively as AI in human medicine. Therefore, we applied AI to pet health management, presenting in particular a diagnostic flow based on computer vision. The diagnostic flow consists of data collection, data preprocessing, object detection and classification, and image segmentation. For classification, we used YOLOv5 to detect the heart in X-ray images and classify it as normal or abnormal. Subsequently, for cases classified as abnormal, image segmentation visually shows the degree of left atrial enlargement. The classification accuracy was 0.8933 for the normal class and 0.8800 for the abnormal class, for an overall classification accuracy of 0.8866. The model also achieved an F1 score of 0.8874 and an area under the curve (AUC) score of 0.8866. Segmentation performance, evaluated using the dice score, averaged 0.9026.



Keywords: artificial intelligence, object detection, segmentation, cardiomegaly, left atrial enlargement

Ⅰ. Introduction

As the number of pets worldwide increases, interest in pet healthcare is growing, and artificial intelligence (AI) technology is attracting attention. Pet healthcare AI can be used in various ways[1]. For example, it can monitor and diagnose the health status of pets, or analyze their biological signals and data to detect diseases or health abnormalities early and notify the owners. This can reduce the time and cost of diagnosis and treatment, making it easier for pets to lead healthier lives. Among the many applications of digital healthcare AI, we focus on disease diagnosis using computer vision technology.

AI based on computer vision has been used more actively in human medicine than in veterinary medicine. For instance, Guo et al. (2022) proposed the artery-aware network (AANet) for pulmonary embolism detection in computed tomography pulmonary angiography (CTPA) images[2], achieving state-of-the-art performance. Qadri et al. proposed the OP-convNet model for automatic vertebrae segmentation in computed tomography (CT) images[3], which outperformed previous models such as VGGNet, ResNet, and DenseNet[4]-[6]. Liu et al. (2019) proposed a model based on atlas registration and a linearized kernel sparse representative classifier for subcortical brain segmentation[7], which showed significant potential relative to the methods it was compared with.

In contrast to these studies, we applied computer vision techniques to veterinary medicine rather than human medicine. Specifically, we focus on myxomatous mitral valve disease (MMVD) and left atrial enlargement[8]. MMVD is characterized by impaired functioning of the mitral valve, which leads to a decline in heart function, and it is frequently observed in older dogs[9]. To apply computer vision, we focused on X-ray images, which can be examined visually. Left atrial enlargement, a sign of MMVD, has traditionally been measured using the vertebral heart size (VHS) and the vertebral left atrial size (VLAS)[10][11]. While VHS measures the overall size of the heart, VLAS specifically measures left atrial enlargement. Both methods depend on the observer's subjective judgment, which can lead to misdiagnosis.

Three core computer vision tasks are object detection, classification, and semantic segmentation, and we used all three to diagnose cardiomegaly. First, we employed YOLOv5 for object detection and classification. Then, to obtain accurate information about left atrial enlargement, we used the semantic segmentation model UNet[12]. This allowed us to perform precise image segmentation, delineating the boundaries of the heart and of left atrial enlargement, one of the signs of cardiomegaly. By combining object detection, classification, and segmentation, we were able to extract detailed information about left atrial enlargement.

Overall, we summarize our contributions as follows:

We applied AI to diagnose heart disease in veterinary medicine, a field that has received relatively little research attention compared with human medicine. By introducing AI technology to veterinary medicine, we contribute to the diagnosis and treatment of animal health conditions, which can help promote research and the adoption of new technologies in the field.

By combining object detection and classification with segmentation, we achieved accurate diagnosis of heart disease. This approach allowed us to classify hearts as normal or abnormal and to segment the heart precisely.

Our main novelty lies in integrating these techniques, which enables a more accurate diagnosis of heart disease in veterinary medicine. This has the potential to improve the prevention and treatment of cardiomegaly in pets.

In this paper, we propose a flow for diagnosing cardiomegaly in dogs using computer vision. We first employ object detection to locate the heart, followed by classification to distinguish between normal and abnormal hearts. Hearts classified as abnormal are then segmented to assess the extent of left atrial enlargement. This approach enables accurate and rapid diagnosis of cardiomegaly in dogs.

Ⅱ. Materials and Methods

The experimental process consists of four stages. First, we collect data. Second, we preprocess the data for training. Third, we classify hearts as normal or abnormal. Finally, we diagnose cardiomegaly by visualizing a segmented image of each heart classified as abnormal. An overview of the experiment is shown in Fig. 1.
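The gating between the last two stages (segmentation runs only on hearts classified as abnormal) can be sketched as follows. This is a minimal sketch, not the authors' code: `detect` and `segment` are hypothetical callables standing in for the trained YOLOv5 and UNet models, and the `Diagnosis` record and box format are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) pixel box


@dataclass
class Diagnosis:
    label: str                  # "normal" or "abnormal"
    box: Box                    # detected heart region
    mask: Optional[Any] = None  # segmentation output, filled only for abnormal hearts


def diagnose(image: Any,
             detect: Callable[[Any], Tuple[str, Box]],
             segment: Callable[[Any], Any]) -> Diagnosis:
    """Detect and classify the heart, then segment only if it is abnormal."""
    label, box = detect(image)
    if label == "abnormal":
        return Diagnosis(label, box, segment(image))
    return Diagnosis(label, box)


# Usage with hypothetical stand-in models.
segment_calls = []

def fake_detect(image):
    return ("abnormal" if image == "xray-abnormal" else "normal", (10, 10, 50, 50))

def fake_segment(image):
    segment_calls.append(image)
    return "left-atrium mask"

print(diagnose("xray-abnormal", fake_detect, fake_segment))
print(diagnose("xray-normal", fake_detect, fake_segment))
```

Keeping the two models behind plain callables makes the flow easy to test without loading either network.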

Fig. 1. 
Overview of diagnosing cardiomegaly using YOLOv5 and UNet

2.1 Data acquisition

The data used in this study combine public and private datasets. The public dataset was obtained from AIHUB ("X-ray data for diagnosing diseases in companion animals (thoracic images) for pet disease diagnosis"). The private dataset was obtained from a partner hospital affiliated with Zentry Inc. This dataset included X-ray images of both dogs with cardiomegaly and healthy dogs.

2.2 Data preprocessing

Preprocessing is necessary to prepare the data for training a deep learning model. First, we used center cropping to standardize the image sizes. Because the heart is the region of interest, center cropping, which crops each image around its center, was chosen to ensure that the heart is not cut off.
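A minimal sketch of the center-crop step. The paper does not give the crop size, so this assumes images are loaded as NumPy arrays and uses a hypothetical 512 × 512 target; it also assumes every image is at least that large.

```python
import numpy as np


def center_crop(image: np.ndarray, crop_h: int, crop_w: int) -> np.ndarray:
    """Crop a (H, W) or (H, W, C) array around its center.

    Raises ValueError if the requested crop is larger than the image.
    """
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds image size")
    top = (h - crop_h) // 2   # equal margins above and below (floor for odd sizes)
    left = (w - crop_w) // 2  # equal margins left and right
    return image[top:top + crop_h, left:left + crop_w]


# A synthetic 600 x 800 grayscale "radiograph" cropped to a hypothetical 512 x 512.
xray = np.arange(600 * 800, dtype=np.float32).reshape(600, 800)
cropped = center_crop(xray, 512, 512)
print(cropped.shape)  # (512, 512)
```

Cropping symmetrically keeps the image center, where the heart usually lies on a thoracic radiograph, inside the output.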

Next, the standardized images were labeled. Because cardiomegaly diagnosis involves two steps, two labeling methods were used: bounding boxes were drawn around the heart for object detection and classification, and segmentation masks were annotated for image segmentation.
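YOLOv5 reads bounding-box labels from text files with one line per object, in the form `class x_center y_center width height`, with coordinates normalized by the image width and height. The paper does not describe its label files, so this sketch assumes boxes were drawn in pixel corner coordinates and uses hypothetical class IDs 0 = normal and 1 = abnormal, so that the class on the heart box carries the normal/abnormal label.

```python
def to_yolo_line(class_id: int, x_min: float, y_min: float, x_max: float,
                 y_max: float, img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_c = (x_min + x_max) / 2 / img_w  # box center, as a fraction of width
    y_c = (y_min + y_max) / 2 / img_h  # box center, as a fraction of height
    bw = (x_max - x_min) / img_w       # box width, as a fraction of width
    bh = (y_max - y_min) / img_h       # box height, as a fraction of height
    return f"{class_id} {x_c:.6f} {y_c:.6f} {bw:.6f} {bh:.6f}"


# Hypothetical heart box on a 512 x 512 crop, labeled abnormal (class 1).
line = to_yolo_line(1, 128, 160, 384, 416, 512, 512)
print(line)  # 1 0.500000 0.562500 0.500000 0.500000
```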

2.3 Normal/Abnormal classification

In this study, we classified chest X-ray images as normal or abnormal using YOLOv5, a later version in the YOLO series[13]. YOLOv5 is known for strong object detection performance, good accuracy, and efficient inference. It uses a deep convolutional neural network, consisting of a backbone network and detection heads, to detect and classify objects within an input image. The backbone extracts multi-scale features from the input image, enabling the model to capture object details at different scales. The detection heads, consisting of multiple convolutional layers, then generate predictions for bounding boxes and class probabilities. These heads operate on feature maps at different scales to detect objects of various sizes. YOLOv5 also uses anchor boxes and can apply a focal loss function to handle object detection challenges such as scale variance and class imbalance.
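To make the multi-scale detection heads concrete, the sketch below computes the raw output shape of each head under YOLOv5's default layout: three output scales with strides 8, 16, and 32, three anchor boxes per grid cell, and, per anchor, four box values, one objectness score, and one score per class. The 640-pixel input size and the two classes (normal, abnormal) are assumptions; the paper does not state them.

```python
def yolo_head_shapes(img_size=640, strides=(8, 16, 32), anchors_per_cell=3,
                     num_classes=2):
    """Raw output shape (anchors, grid_h, grid_w, values) for each detection scale."""
    shapes = []
    for s in strides:
        g = img_size // s  # grid cells per side: coarser grids see larger objects
        # 4 box values + 1 objectness score + one score per class
        shapes.append((anchors_per_cell, g, g, 5 + num_classes))
    return shapes


shapes = yolo_head_shapes()
total = sum(a * gh * gw for a, gh, gw, _ in shapes)
print(shapes)  # [(3, 80, 80, 7), (3, 40, 40, 7), (3, 20, 20, 7)]
print(total)   # 25200 candidate boxes before confidence filtering and NMS
```

The fine 80 × 80 grid handles small objects and the coarse 20 × 20 grid handles large ones; a heart on a thoracic radiograph is large, so most confident predictions would be expected from the coarser scales.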

We trained the model on 1,000 images, 500 normal and 500 abnormal, with a batch size of 4 for 50 epochs. YOLOv5 uses mean squared error (MSE) as the loss function for bounding box regression. The MSE for bounding boxes is computed from the distances between the corresponding diagonal corner points of the predicted and ground-truth boxes, averaged over the two corners. The MSE is calculated as follows, where P denotes the predicted bounding box, G the ground-truth bounding box, and (x_1, y_1) and (x_2, y_2) the top-left and bottom-right corners of a box:

$$\mathrm{MSE} = \frac{1}{2}\left[\left(x_1^{P}-x_1^{G}\right)^2+\left(y_1^{P}-y_1^{G}\right)^2+\left(x_2^{P}-x_2^{G}\right)^2+\left(y_2^{P}-y_2^{G}\right)^2\right] \tag{1}$$

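The corner-distance MSE described above can be sketched as follows: the squared distance between the predicted and ground-truth top-left corners and between the bottom-right corners, averaged over the two corners. This illustrates the formula as the text describes it; it is not taken from the YOLOv5 code base, and the `(x1, y1, x2, y2)` box tuple is an assumed representation.

```python
def corner_mse(pred, gt):
    """Average squared distance between matching diagonal corners.

    pred, gt: boxes as (x1, y1, x2, y2), with (x1, y1) the top-left corner
    and (x2, y2) the bottom-right corner, in the same units.
    """
    d_top_left = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d_bottom_right = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return (d_top_left + d_bottom_right) / 2


print(corner_mse((10, 10, 50, 50), (12, 9, 47, 54)))  # 15.0
```

Here the top-left corners are 5 squared pixels apart and the bottom-right corners 25, giving an average of 15.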
YOLOv5 uses the softmax activation function to predict the class of an object. The softmax function is commonly used in multi-class classification to compute a probability for each class. It transforms the input values into probabilities representing the likelihood that the object belongs to each class. The softmax function is defined as follows, where z_i is the i-th element of the input and N is the total number of classes:

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}} \tag{2}$$

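A minimal sketch of the softmax function described above, in plain Python. It subtracts the largest input before exponentiating, which leaves the result unchanged mathematically but avoids overflow for large inputs; the two input values are illustrative logits for the normal and abnormal classes, not model outputs.

```python
import math


def softmax(z):
    """Map a list of real scores to probabilities that sum to 1."""
    m = max(z)                             # shift for numerical stability
    exps = [math.exp(v - m) for v in z]    # e^(z_i - max z)
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.5])  # illustrative logits: normal, abnormal
print(probs)
```

With two classes, softmax reduces to the logistic sigmoid of the score difference, so the first probability here equals 1 / (1 + e^(-1.5)).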
During training, YOLOv5 minimizes the cross-entropy loss so that the predicted class probabilities align closely with the ground-truth class labels, improving the accuracy and reliability of classifying objects in the image. The cross-entropy loss (CE) is defined as follows, where y is a one-hot encoded vector whose i-th element y_i is 1 for the correct class and 0 for the remaining classes, and ŷ_i is the predicted probability for class i:

$$\mathrm{CE} = -\sum_{i=1}^{N} y_i \log \hat{y}_i \tag{3}$$

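The cross-entropy loss described above can be sketched for a single example as follows. The small `eps` clip is an assumption added so that a predicted probability of exactly 0 does not produce an infinite loss; the inputs are illustrative.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """CE = -sum_i y_i * log(y_hat_i) for a one-hot y_true.

    Predictions are clipped to at least eps so that log() stays finite.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


loss = cross_entropy([1, 0], [0.8, 0.2])  # true class: normal
print(round(loss, 4))  # 0.2231
```

Because y is one-hot, only the predicted probability of the correct class contributes, so the loss is simply -log(0.8) here and falls to 0 as that probability approaches 1.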
The performance of YOLOv5 was evaluated with three metrics: classification accuracy, F1 score, and AUC score[14]. Classification accuracy is the proportion of correctly classified images.

The F1 score is the harmonic mean of precision and recall, both derived from the confusion matrix: precision = TP / (TP + FP) and recall = TP / (TP + FN). The AUC score is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate across decision thresholds. Higher values of all three metrics indicate better performance. A true positive (TP) is a positive case that the model correctly predicts as positive, and a true negative (TN) is a negative case that the model correctly predicts as negative. A false positive (FP) is a negative case that the model incorrectly predicts as positive, and a false negative (FN) is a positive case that the model incorrectly predicts as negative. The evaluation metrics are computed as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4}$$
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$

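The metrics above can be computed directly from confusion-matrix counts. This sketch assumes a binary task. `roc_auc` uses the rank-based (Mann-Whitney) form of AUC, the probability that a randomly chosen positive scores higher than a randomly chosen negative; it needs per-image scores, which the paper does not report, so its example inputs are made up. The usage plugs in the counts implied by Section 3.1 (67 of 75 normal and 66 of 75 abnormal images correct), treating normal as the positive class, which is an assumption.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(pos_scores, neg_scores) -> float:
    """Area under the ROC curve via pairwise ranking; ties count as 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Normal as positive: 67 normal correct (TP), 8 normal missed (FN),
# 66 abnormal correct (TN), 9 abnormal called normal (FP).
metrics = classification_metrics(tp=67, fp=9, tn=66, fn=8)
print(f"accuracy={metrics['accuracy']:.4f}  f1={metrics['f1']:.4f}")
# accuracy=0.8867  f1=0.8874

print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # made-up scores; 5 of 6 pairs ranked correctly
```

With these counts the accuracy and F1 score agree with Table 1 up to rounding. The pairwise AUC loop is O(n·m); for large test sets a sort-based implementation gives the same value faster.
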
2.4 Diagnosis of cardiomegaly

To precisely segment the regions identified as abnormal, we employed UNet, a widely used fully convolutional network architecture for medical image segmentation. UNet has an encoder-decoder structure with skip connections. The encoder path, composed of convolutional and pooling layers, extracts high-level features and encodes contextual information. The decoder path, symmetric to the encoder, uses up-convolutional layers and concatenation with skip connections to progressively recover spatial resolution and refine the segmentation output. The skip connections transfer fine-grained details from the encoder to the decoder.
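The encoder-decoder data flow with one skip connection can be illustrated at the level of array shapes. This is a simplified sketch, not a trainable UNet: it assumes (C, H, W) NumPy arrays, uses random arrays as stand-ins for learned feature maps, and substitutes nearest-neighbour upsampling for UNet's learned up-convolution, so it shows only how resolution is reduced, restored, and combined with the skip features.

```python
import numpy as np


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling on a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling (stand-in for a learned up-convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


rng = np.random.default_rng(0)
enc1 = rng.random((16, 64, 64))            # stand-in encoder features, full resolution
down = max_pool2x2(enc1)                   # (16, 32, 32): encoder halves H and W
up = upsample2x(down)                      # (16, 64, 64): decoder restores H and W
dec1 = np.concatenate([up, enc1], axis=0)  # (32, 64, 64): skip connection joins channels
print(down.shape, up.shape, dec1.shape)
```

Concatenating along the channel axis is what lets the decoder see the full-resolution encoder features directly; in a real UNet, convolutions after the concatenation would learn how to combine the two.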

We trained the model on 300 images, 200 normal and 100 abnormal, with a batch size of 8 for 100 epochs. We used adaptive moment estimation (Adam) as the optimizer, a widely used optimization algorithm in machine learning and deep learning. Adam combines momentum with root mean square propagation (RMSProp) and has been shown to perform well across a variety of tasks[15][16]. Segmentation was evaluated with the dice score, which measures the overlap between the ground truth and the prediction. The dice score is defined as follows, where G denotes the ground truth and P the prediction:

$$\text{Dice score} = \frac{2\,|G \cap P|}{|G| + |P|} \tag{6}$$
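The training and evaluation pieces described above, one Adam update step and the dice score 2|G ∩ P| / (|G| + |P|) on binary masks, can be sketched in NumPy. The paper does not report its learning rate or β values, so the function defaults (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the values suggested by Kingma and Ba [15], not the authors' settings. The toy quadratic and the 8 × 8 masks are illustrative only.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum-style first moment plus RMSProp-style second moment."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction (t counts from 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def dice_score(gt, pred, eps=1e-7):
    """Dice = 2|G ∩ P| / (|G| + |P|) for binary masks.

    eps avoids 0/0 when both masks are empty (that case then scores 1.0).
    """
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    intersection = np.logical_and(gt, pred).sum()
    return float((2.0 * intersection + eps) / (gt.sum() + pred.sum() + eps))


# Adam on a toy quadratic f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)

# Dice on two 4x4 squares offset by one row: 12 shared pixels out of 16 + 16.
gt = np.zeros((8, 8), dtype=np.uint8)
gt[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=np.uint8)
pred[3:7, 2:6] = 1
print(round(dice_score(gt, pred), 4))  # 0.75
```

On the first Adam step the bias-corrected ratio m̂ / √v̂ equals the sign of the gradient, so the parameter moves by almost exactly lr regardless of the gradient's magnitude.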

Ⅲ. Experimental Results
3.1 Normal/Abnormal classification

To evaluate the model, we assembled a separate test set of 75 normal and 75 abnormal chest X-ray images, which served as a benchmark for assessing classification accuracy. The results are summarized in Table 1. The model correctly classified 67 of the 75 normal images (accuracy 0.8933) and 66 of the 75 abnormal images (accuracy 0.8800).

Table 1. 
Model performance of classification
Evaluation criteria                     Performance
Classification accuracy   Normal        0.8933 (±0.01)
                          Abnormal      0.8800 (±0.02)
                          Total         0.8866 (±0.01)
Metrics                   Precision     0.8933 (±0.01)
                          Recall        0.8815 (±0.02)
                          F1 score      0.8874 (±0.02)
                          AUC score     0.8866 (±0.01)

The overall classification accuracy of the model was 0.8866. Fig. 2 visualizes the classification results on the test set, illustrating how the model distinguishes between normal and abnormal chest X-ray images. We also evaluated the model with several common performance metrics: precision, recall, F1 score, and AUC score reached 0.8933, 0.8815, 0.8874, and 0.8866, respectively. For a more detailed analysis, Fig. 3 shows the confusion matrix and Fig. 4 shows the ROC curve. Overall, these results indicate that the model classifies chest X-ray images reliably and suggest that it could assist veterinary professionals in the rapid and accurate diagnosis and assessment of cardiomegaly.

Fig. 2. 
Visualization of the classification results. Normal is shown in red and abnormal in pink

Fig. 3. 
Confusion matrix of normal/abnormal classification experiment

Fig. 4. 
ROC curve of normal/abnormal classification experiment

3.2 Diagnosis of cardiomegaly

Building on the normal/abnormal classification, we conducted further experiments targeting the diagnosis of left atrial enlargement, a characteristic sign of cardiomegaly. For this, we used the UNet architecture, which is widely employed in medical image segmentation. The test set consisted of 76 images that YOLOv5 had classified as abnormal. The average dice score reached 0.9026. Fig. 5 plots the dice score of each test sample: the 6th sample had the lowest dice score (0.8245), and the 72nd sample had the highest (0.9515). Fig. 6 shows the segmentation results. The high average dice score indicates that the model delineates the regions affected by cardiomegaly with good precision.

Fig. 5. 
Dice score of each test sample in the cardiomegaly diagnosis experiment. The blue bar marks the lowest score and the red bar the highest

Fig. 6. 
Segmentation results for cardiomegaly diagnosis

Ⅳ. Discussion & Conclusion

During our experiments, we encountered three main issues. First, there were misclassified cases, although they were relatively infrequent. While the current accuracy is already high, higher performance is needed for optimal use in clinical settings. An example of a misclassified case is shown in Fig. 7. Second, although the dice score was generally high, in some cases the predicted segmentation did not accurately capture the enlarged portion of the left atrium compared with the ground truth. Segmentation performance therefore needs to be improved to ensure a more precise diagnosis of the presence and extent of symptoms. Third, high-quality data are insufficient. The characteristics of medical data make high-quality data difficult to obtain, yet training a deep learning model requires a large amount of data. It is therefore important to find ways to obtain more data.

Fig. 7. 
Misclassified cases. Each image was assigned to the opposite class

In this paper, our experiments achieved a classification accuracy of 0.8866 and a dice score of 0.9026. These findings suggest that our approach could be applied in a clinical setting. However, the existing limitations must be acknowledged, and further research is needed to improve both classification and segmentation performance. To address these challenges, it is essential to explore advanced UNet architectures such as UNet 3+, UNet++, and other variants[17]-[19]. We then plan to combine the strengths of UNet with those of other architectures to create models tailored to veterinary AI. Furthermore, to overcome the scarcity of medical data, we intend to investigate generating synthetic medical data with generative models such as generative adversarial networks (GAN) and diffusion models[20][21]. This strategy would allow us to augment the existing data and train more comprehensive and diverse models.


This work was supported by the research grant of the Chungbuk National University in 2022 and partly supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIT) (No. 2021R1G1A101097111)

1. K. Matsubara, M. Ibaraki, M. Nemoto, H. Watabe, and Y. Kimura, "A Review on AI in PET Imaging", Annals of Nuclear Medicine, Vol. 36, No. 2, pp. 133-143, Nov. 2022.
2. J. Guo, X. Liu, Y. Chen, S. Zhang, G. Tao, H. Yu, H. Zhu, W. Lei, H. Li, and N. Wang, "AANet: Artery-Aware Network for Pulmonary Embolism Detection in CTPA Images", Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 473-483, Sep. 2022.
3. S. F. Qadri, L. Shen, M. Ahmad, S. Qadri, S. S. Zareen, and S. Khan, "Op-convNet: A Patch Classification-Based Framework for CT Vertebrae Segmentation", IEEE Access, Vol. 9, pp. 158227-158240, Sep. 2021.
4. K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", arXiv:1409.1556, Sep. 2014.
5. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770-778, Jun. 2016.
6. G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 4700-4708, Jul. 2017.
7. Y. Liu, Y. Wei, and C. Wang, "Subcortical Brain Segmentation Based on Atlas Registration and Linearized Kernel Sparse Representative Classifier", IEEE Access, Vol. 7, pp. 31547-31557, Mar. 2019.
8. M. Hollmer, J. Willesen, A. Tolver, and J. Koch, "Left Atrial Volume and Function in Dogs with Naturally Occurring Myxomatous Mitral Valve Disease", Journal of Veterinary Cardiology, Vol. 19, No. 1, pp. 24-34, Feb. 2017.
9. E. L. Malcolm, L. C. Visser, K. L. Phillips, and L. R. Johnson, "Diagnostic Value of Vertebral Left Atrial Size as Determined from Thoracic Radiographs for Assessment of Left Atrial Size in Dogs with Myxomatous Mitral Valve Disease", Journal of the American Veterinary Medical Association (JAVMA), Vol. 253, No. 8, pp. 1038-1045, Oct. 2018.
10. J. W. Buchanan and J. Bücheler, "Vertebral Scale System to Measure Canine Heart Size in Radiographs", Journal of the American Veterinary Medical Association (JAVMA), Vol. 206, No. 2, pp. 194-199, Jan. 1995.
11. B. W. Keene, et al., "ACVIM Consensus Guidelines for the Diagnosis and Treatment of Myxomatous Mitral Valve Disease in Dogs", Journal of Veterinary Internal Medicine (JVIM), Vol. 33, No. 3, pp. 1127-1140, May 2019.
12. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation", Medical Image Computing and Computer Assisted Intervention (MICCAI), Vol. 9351, pp. 234-241, Oct. 2015.
13. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 779-788, Jun. 2016.
14. D. M. Powers, "Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation", arXiv:2010.16061, Oct. 2020.
15. D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", arXiv:1412.6980, Dec. 2014.
16. S. Ruder, "An Overview of Gradient Descent Optimization Algorithms", arXiv:1609.04747, Sep. 2016.
17. H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.-W. Chen, and J. Wu, "UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, pp. 1055-1059, May 2020.
18. Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: A Nested U-Net Architecture for Medical Image Segmentation", Deep Learning in Medical Image Analysis (DLMIA) and Multimodal Learning for Clinical Decision Support (ML-CDS), Vol. 11045, pp. 3-11, Sep. 2018.
19. M. Yahyatabar, P. Jouvet, and F. Cheriet, "Dense-Unet: a Light Model for Lung Fields Segmentation in Chest X-ray Images", Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, pp. 1242-1245, Jul. 2020.
20. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks", Communications of the Association for Computing Machinery (ACM), Vol. 63, No. 11, pp. 139-144, Oct. 2020.
21. J. Ho, A. Jain, and P. Abbeel, "Denoising Diffusion Probabilistic Models", Advances in Neural Information Processing Systems (NeurIPS), Vol. 33, pp. 6840-6851, Dec. 2020.

Jun-Young Oh

2017 ~ 2023 : B.S. degree in School of Computer Science, Chungbuk National University

2023 ~ present : Integrated M.S.&Ph.D. candidate in Dept. Computer Science, Chungbuk National University

Research interests : Machine Learning, Artificial Intelligence, Computer Vision

In-Gyu Lee

2018 ~ present : Undergraduate course in School of Computer Science, Chungbuk National University

Research interests : Machine Learning, Artificial Intelligence, Computer Vision

Young-Min Go

2019 ~ present : Undergraduate course in School of Computer Science, Chungbuk National University

Research interests : Machine Learning, Artificial Intelligence, Computer Vision

Euijong Lee

2006 ~ 2012 : B.S. degree in Department of Computer Science, Korea University

2012 ~ 2018 : Ph.D. degree in Dept. Computer Science and Engineering, Korea University

2020 ~ present : Assistant Professor, School of Computer Science, Chungbuk National University

Research interests : Self-Adaptive Software, Internet of Things, and Software Modeling

Ji-Hoon Jeong

2009 ~ 2015 : B.S. degree in Department of Computer Science & Dept. Brain and Cognitive Sciences, Korea University

2015 ~ 2021 : Ph.D. degree in Dept. Brain and Cognitive Engineering, Korea University

2022 ~ present : Assistant Professor, School of Computer Science, Chungbuk National University

Research interests : Machine Learning, Brain-machine Interface, and Artificial Intelligence