Abstract

Bacteriophages use receptor-binding proteins (RBPs) to adhere to bacterial hosts, yet the sequence and structural diversity of RBPs remains poorly understood. Tail fibers, a major class of RBPs, are elongated and flexible trimeric proteins, making their full-length structures difficult to resolve experimentally. Advances in deep learning-based protein structure prediction, such as AlphaFold2-multimer (AF2M) and ESMFold, provide opportunities for studying these challenging proteins. Here, we introduce RBPseg, a method that combines monomeric ESMFold predictions with a structure-based domain identification approach to divide tail fiber sequences into manageable fractions for high-confidence modeling with AF2M. Using this approach, we generated complete tail fiber models, validated by single-particle cryo-electron microscopy of five fibers from three phages. A structural classification of 67 fibers identified 16 distinct classes and 89 domains, revealing patterns of modularity, convergence, divergence, and domain swapping. Our findings suggest that these structural classes represent at least 24% of the known tail fiber universe, providing key insights into their evolution and functionality.
