Probing Adversarial Robustness of Protein Language Models: A Reproducible Case Study of ESM-2 Under Substitution-Based Attacks

Authors

DOI:

https://doi.org/10.53799/gteskx50

Abstract

We present a reproducible case study probing the adversarial robustness of ESM-2 (esm2_t6_8M_UR50D), a representative protein language model, under substitution-based adversarial attacks. We measure the shift in pseudo-perplexity across eight benchmark protein sequences at mutation rates of 1%, 5%, and 10%. Random non-synonymous substitutions increase ESM-2’s mean perplexity by +9.9%, +11.9%, and +23.5%, respectively, with individual sequences exhibiting shifts as high as +66.6%. We find that sequence length modulates robustness: longer, evolutionarily conserved sequences such as Ubiquitin show a clear dose-response to mutation rate, while shorter sequences saturate at low mutation budgets. Beyond model vulnerabilities, we analyze the biosecurity risks associated with AI-generated proteins, situating our findings within existing governance frameworks, including the Biological Weapons Convention, the NIH P3CO framework, and NSABB guidelines. We propose mitigation strategies, including toxicity screening pipelines and access control policies, and identify adversarial training, Bayesian uncertainty estimation, and knowledge distillation as promising defense directions for future investigation. All code and data are publicly available in the GitHub repository for ESM2 adversarial robustness.
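
As an illustration of the measurement described in the abstract, the following minimal Python sketch applies random substitutions at a given mutation rate and computes the percentage shift in pseudo-perplexity. It is not the released code: the names `mutate`, `pseudo_perplexity`, `percent_shift`, and the `log_prob` callback are hypothetical, and the rule of always mutating at least one residue is an assumption. In practice, `log_prob(seq, i)` would mask position `i` and read the model's log-probability of the true residue; it is left as a pluggable function here so the sketch carries no model dependency.

```python
import math
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rate, rng):
    """Replace a fraction `rate` of positions with a different residue.

    Assumption: at least one position is always mutated, so very short
    sequences still receive a perturbation at low rates.
    """
    n_mut = max(1, round(rate * len(seq)))
    positions = rng.sample(range(len(seq)), n_mut)
    out = list(seq)
    for i in positions:
        # Exclude the original residue so every edit is a real substitution.
        out[i] = rng.choice([a for a in AMINO_ACIDS if a != seq[i]])
    return "".join(out)


def pseudo_perplexity(seq, log_prob):
    """exp of the negative mean per-residue log-probability.

    `log_prob(seq, i)` is a hypothetical callback returning
    log p(seq[i] | all other residues).
    """
    total = sum(log_prob(seq, i) for i in range(len(seq)))
    return math.exp(-total / len(seq))


def percent_shift(original, mutated, log_prob):
    """Relative change in pseudo-perplexity, in percent."""
    base = pseudo_perplexity(original, log_prob)
    return 100.0 * (pseudo_perplexity(mutated, log_prob) - base) / base
```

Passing a seeded `random.Random` instance as `rng` keeps the sampled substitutions reproducible across runs, which matters when comparing shifts at different mutation rates.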

Published

31-05-2026

How to Cite

[1] “Probing Adversarial Robustness of Protein Language Models: A Reproducible Case Study of ESM-2 Under Substitution-Based Attacks”, AJSE, vol. 24, no. 2, pp. 141–149, May 2026, doi: 10.53799/gteskx50.