Explainable Hybrid CNN-Transformer Framework for Papaya Leaf Disease Classification with Layer-Wise Grad-CAM Analysis
DOI:
https://doi.org/10.53799/vxkj7t08
Keywords:
Agricultural artificial intelligence, Crop disease detection, Explainable deep learning, Hybrid neural networks, Image classification, Papaya leaf diseases, Visual attention mechanisms
Abstract
While deep learning achieves high accuracy in plant disease classification, its black-box nature limits adoption by agricultural practitioners who require transparent and interpretable predictions for decision-making. This paper presents a hybrid CNN-Transformer framework with systematic layer-wise interpretability that combines convolutional neural networks with vision transformers to prioritize model transparency alongside competitive classification accuracy. Unlike existing hybrid approaches that apply single-layer visualization, this work introduces a systematic multi-depth Grad-CAM analysis that captures progressive feature evolution from edge detection through texture analysis to disease-region localization, revealing hierarchical diagnostic reasoning unavailable in standard single-layer explanations. Five-fold stratified cross-validation demonstrates performance statistically comparable to high-performing CNN and Transformer baselines, with the key distinction being multi-level interpretability rather than superior accuracy. Quantitative evaluation through the Pointing Game localization protocol achieves 87.3% accuracy on expert-annotated disease regions. Robustness evaluation under synthetic perturbations shows graceful performance degradation, though real-world field validation remains future work.
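The Pointing Game protocol mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which a saliency map scores a "hit" when its maximum-activation pixel falls inside the annotated disease region, and accuracy is the fraction of hits. The function names, the nested-list data layout, and the optional `tolerance` parameter are hypothetical choices for illustration; the abstract does not state how the authors implemented the protocol or whether they used a tolerance.

```python
from typing import List, Sequence, Tuple

Heatmap = Sequence[Sequence[float]]   # e.g. a Grad-CAM map, rows x cols
Mask = Sequence[Sequence[bool]]       # expert annotation, same shape


def peak_location(heatmap: Heatmap) -> Tuple[int, int]:
    """Return (row, col) of the first maximum value in the heatmap."""
    best_rc, best_val = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_rc, best_val = (r, c), value
    return best_rc


def pointing_game_hit(heatmap: Heatmap, mask: Mask, tolerance: int = 0) -> bool:
    """True if the heatmap peak lies on, or within `tolerance` pixels of,
    an annotated pixel. A square neighbourhood is used here for simplicity
    (an assumption, not the paper's stated setting)."""
    r, c = peak_location(heatmap)
    rows, cols = len(mask), len(mask[0])
    for rr in range(max(0, r - tolerance), min(rows, r + tolerance + 1)):
        for cc in range(max(0, c - tolerance), min(cols, c + tolerance + 1)):
            if mask[rr][cc]:
                return True
    return False


def pointing_game_accuracy(pairs: List[Tuple[Heatmap, Mask]], tolerance: int = 0) -> float:
    """Fraction of (heatmap, mask) pairs whose peak hits the annotation."""
    hits = sum(pointing_game_hit(h, m, tolerance) for h, m in pairs)
    return hits / len(pairs)


# Toy data: one map peaks inside the annotated region, one peaks outside it.
hit_map = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]]
hit_mask = [[False, False, False], [False, True, False], [False, False, False]]
miss_map = [[0.9, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
miss_mask = [[False, False, False], [False, False, False], [False, False, True]]

accuracy = pointing_game_accuracy([(hit_map, hit_mask), (miss_map, miss_mask)])
```

On this toy pair the score is one hit out of two. A reported figure such as 87.3% would correspond to the same ratio computed over the full annotated test set.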
License
Copyright (c) 2025 AIUB Journal of Science and Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
AJSE contents are under the terms of the Creative Commons Attribution License. This permits anyone to copy, distribute, transmit and adapt the work non-commercially provided the original work and source are appropriately cited.