Explainable Hybrid CNN-Transformer Framework for Papaya Leaf Disease Classification with Layer-Wise Grad-CAM Analysis

Authors

  • S M Tawhid, American International University-Bangladesh
  • Amit Arpon Paul, American International University-Bangladesh
  • Abdul Kader Mohim, American International University-Bangladesh
  • Dip Nandi, American International University-Bangladesh

DOI:

https://doi.org/10.53799/vxkj7t08

Keywords:

Agricultural artificial intelligence, Crop disease detection, Explainable deep learning, Hybrid neural networks, Image Classification, Papaya leaf diseases, Visual attention mechanisms

Abstract

While deep learning achieves high accuracy in plant disease classification, its black-box nature limits adoption by agricultural practitioners, who need transparent and interpretable predictions for decision-making. This paper presents a hybrid CNN-Transformer framework that combines convolutional neural networks with vision transformers and pairs competitive classification accuracy with systematic layer-wise interpretability. Unlike existing hybrid approaches that visualize a single layer, this work applies Grad-CAM at multiple depths to capture the progressive evolution of features from edge detection through texture analysis to disease-region localization, revealing hierarchical diagnostic reasoning that single-layer explanations do not show. Five-fold stratified cross-validation shows performance statistically comparable to high-performing CNN and Transformer baselines; the key distinction is multi-level interpretability rather than superior accuracy. Under the Pointing Game localization protocol, the explanations achieve 87.3% accuracy on expert-annotated disease regions. Robustness evaluation under synthetic perturbations shows graceful performance degradation, though real-world field validation remains future work.
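The Pointing Game score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation of the protocol: an explanation counts as a hit when the peak of its saliency map falls inside the annotated region, and accuracy is the fraction of hits. The function names, the list-of-lists heatmap format, and the square `tolerance` window are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]


def argmax_2d(heatmap: Grid) -> Tuple[int, int]:
    """Return (row, col) of the largest value; ties go to the first one seen."""
    best_pos = (0, 0)
    best_val = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_val, best_pos = value, (r, c)
    return best_pos


def is_hit(heatmap: Grid, mask: Grid, tolerance: int = 0) -> bool:
    """True if the heatmap peak lies within `tolerance` pixels of the mask.

    The tolerance is applied as a square window around the peak, which is an
    assumption made here for simplicity.
    """
    r, c = argmax_2d(heatmap)
    n_rows, n_cols = len(mask), len(mask[0])
    for rr in range(max(0, r - tolerance), min(n_rows, r + tolerance + 1)):
        for cc in range(max(0, c - tolerance), min(n_cols, c + tolerance + 1)):
            if mask[rr][cc]:
                return True
    return False


def pointing_game_accuracy(heatmaps: Sequence[Grid],
                           masks: Sequence[Grid],
                           tolerance: int = 0) -> float:
    """Fraction of images whose saliency peak hits the annotated region."""
    if not heatmaps or len(heatmaps) != len(masks):
        raise ValueError("need the same non-zero number of heatmaps and masks")
    hits = sum(is_hit(h, m, tolerance) for h, m in zip(heatmaps, masks))
    return hits / len(heatmaps)


# Toy data: the first peak lands on the lesion mask, the second does not.
heat_a = [[0.1, 0.2], [0.9, 0.3]]
mask_a = [[0, 0], [1, 0]]
heat_b = [[0.8, 0.1], [0.2, 0.3]]
mask_b = [[0, 0], [0, 1]]

strict = pointing_game_accuracy([heat_a, heat_b], [mask_a, mask_b])
relaxed = pointing_game_accuracy([heat_a, heat_b], [mask_a, mask_b], tolerance=1)
```

In this toy case the strict score is 0.5 and the relaxed score is 1.0, because the second peak is one pixel away from its mask. In practice the heatmaps would be Grad-CAM maps resized to the resolution of the expert annotations.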

Author Biographies

  • S M Tawhid, American International University-Bangladesh

     B.Sc. degree in computer science and engineering from the American International University-Bangladesh, Dhaka, Bangladesh, in 2026, where he is pursuing the M.Sc. degree. His research interests include bio-inspired robotics, computer vision, explainable artificial intelligence, and precision agriculture.

  • Amit Arpon Paul, American International University-Bangladesh

     B.Sc. degree in computer science and engineering from the American International University-Bangladesh, Dhaka, Bangladesh, expected in 2025. His research interests include deep learning, computer vision, and explainable artificial intelligence for agricultural applications.

  • Abdul Kader Mohim, American International University-Bangladesh

     B.Sc. degree in computer science and engineering from the American International University-Bangladesh, Dhaka, Bangladesh, in 2026, where he is pursuing the M.Sc. degree. His research interests include deep learning, computer vision, and agricultural AI applications.

  • Dip Nandi, American International University-Bangladesh

    Received the Ph.D. degree from RMIT University, Australia. He is currently a Professor and the Associate Dean with the Faculty of Science and Technology, American International University-Bangladesh, Dhaka, Bangladesh. He was a recipient of the Institute Gold Medal Award in 2000. His research interests include artificial intelligence, software engineering, and information systems.

Published

31-05-2026

How to Cite

[1]
“Explainable Hybrid CNN-Transformer Framework for Papaya Leaf Disease Classification with Layer-Wise Grad-CAM Analysis”, AJSE, vol. 24, no. 2, pp. 156–166, May 2026, doi: 10.53799/vxkj7t08.