Probing Adversarial Robustness of Protein Language Models: A Reproducible Case Study of ESM-2 Under Substitution-Based Attacks
DOI:
https://doi.org/10.53799/gteskx50
Abstract
We present a reproducible case study of the adversarial robustness of ESM-2 (esm2_t6_8M_UR50D), a representative protein language model, under substitution-based adversarial attacks. We measure the shift in pseudo-perplexity across eight benchmark protein sequences at mutation rates of 1%, 5%, and 10%. Random non-synonymous substitutions increase ESM-2’s mean pseudo-perplexity by +9.9%, +11.9%, and +23.5%, respectively, with individual sequences shifting by as much as +66.6%. Sequence length modulates robustness: longer, evolutionarily conserved sequences such as ubiquitin show a clear dose-response to mutation rate, while shorter sequences saturate at low mutation budgets. Beyond model vulnerabilities, we analyze biosecurity risks associated with AI-generated proteins, situating our findings within existing governance frameworks including the Biological Weapons Convention, the NIH P3CO framework, and NSABB guidelines. We propose mitigation strategies including toxicity screening pipelines and access control policies, and identify adversarial training, Bayesian uncertainty estimation, and knowledge distillation as promising defense directions for future investigation. All code and data are publicly available in the GitHub repository for ESM2 adversarial robustness.
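The measurement summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names (`random_substitutions`, `pseudo_perplexity`, `relative_shift`), the rounding rule with a minimum of one mutation, and the `masked_log_prob` callback are all hypothetical. In practice the callback would mask position `i` and return the model's log-probability of the true residue (for example from ESM-2); it is left abstract here so the sketch runs without model weights.

```python
import math
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_substitutions(seq, rate, rng):
    """Substitute a fraction `rate` of positions in a non-empty sequence.

    Assumption: the mutation budget is round(rate * len(seq)), with a
    minimum of one substitution so that short sequences are still perturbed.
    Each chosen position receives a residue different from the original.
    """
    n_mut = max(1, round(rate * len(seq)))
    positions = rng.sample(range(len(seq)), n_mut)
    residues = list(seq)
    for pos in positions:
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def pseudo_perplexity(seq, masked_log_prob):
    """Pseudo-perplexity: exp of the negative mean masked log-probability.

    `masked_log_prob(seq, i)` is a hypothetical callback returning the
    natural-log probability of residue seq[i] when position i is masked.
    """
    total = sum(masked_log_prob(seq, i) for i in range(len(seq)))
    return math.exp(-total / len(seq))


def relative_shift(baseline, perturbed):
    """Percentage change of the perturbed score relative to the baseline."""
    return 100.0 * (perturbed - baseline) / baseline
```

As a sanity check, a scorer that assigns uniform probability 1/20 to every residue yields a pseudo-perplexity of 20 for any sequence, so the relative shift between an original and a mutated sequence is zero under that scorer; a real model is needed to observe the shifts reported above.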
License
Copyright (c) 2025 AIUB Journal of Science and Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
AJSE contents are under the terms of the Creative Commons Attribution License. This permits anyone to copy, distribute, transmit, and adapt the work non-commercially, provided the original work and source are appropriately cited.