Introduction

Over the last several years, deep learning breakthroughs in 3D protein structure prediction have also enabled exciting new architectures for protein design. First came RFDiffusion in 2023, a diffusion model adapted from the RoseTTAFold structure prediction network by training it on denoising tasks, which generates realistic de novo protein backbones with desirable functional properties. The network has been widely adopted across the scientific community and has enabled a series of novel design tasks, such as binder design for the SARS-CoV-2 RBD antigen, which we discussed previously. Despite this success, RFDiffusion was optimized for protein-only design, limiting its relevance to macromolecular complexes that incorporate small-molecule ligands and nucleic acids. This has left many biological processes out of reach for in silico design: DNA replication, mRNA splicing, epigenetic regulation, metabolic regulation, and photosynthesis, to name a few.

To close this gap, the Baker Lab introduced RoseTTAFold All-Atom (RFAA) (Krishna et al.), a deep learning network trained on full biological assemblies from the Protein Data Bank. Unlike earlier methods such as AlphaFold2 and RoseTTAFold, RFAA models proteins, nucleic acids, small molecules, metals, and covalent modifications by capturing their atomic geometry. This breakthrough not only advances protein structure prediction but also enables modeling of complex assemblies. The team also fine-tuned RoseTTAFold All-Atom for de novo design tasks (RFdiffusionAA), enabling protein pockets to be scaffolded around small molecules. Together, these tools set the stage for an unprecedented level of intentional protein design across a wide range of design tasks.

Design of Photosynthetic Systems

Classic photosynthesis depends on chlorophyll to capture light, primarily in the blue (~430–450 nm) and red (~660–680 nm) regions. This leaves much of the green-to-orange range (~500–600 nm) underutilized, significantly constraining overall light energy capture. Moreover, the multi-step energy transfer in natural systems incurs losses, making the process less efficient than theoretically possible.

Cyanobacteria have partially overcome these limitations by using bilin pigments, like phycoerythrobilin, in their phycobilisomes. These pigments absorb green light, complementing chlorophyll and broadening the range of captured wavelengths.
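To make the wavelength ranges above concrete, here is a minimal Python sketch that labels a wavelength against the approximate bands quoted in this section and converts it to photon energy with E = hc/λ. The band edges are the rough figures from the text, not measured absorption spectra, and the function and dictionary names are our own illustrative choices.

```python
# h*c expressed in eV·nm, so that E [eV] = HC_EV_NM / wavelength [nm]
HC_EV_NM = 1239.841984

# Approximate ranges quoted in the text above (nm); illustrative only
BANDS = {
    "chlorophyll blue band": (430, 450),
    "underused green-orange gap": (500, 600),
    "chlorophyll red band": (660, 680),
}


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the energy of a single photon in electronvolts."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def classify(wavelength_nm: float) -> str:
    """Label a wavelength using the approximate bands listed in BANDS."""
    for label, (low, high) in BANDS.items():
        if low <= wavelength_nm <= high:
            return label
    return "outside the ranges quoted above"


if __name__ == "__main__":
    for wavelength in (440, 550, 670):
        energy = photon_energy_ev(wavelength)
        print(f"{wavelength} nm: {energy:.2f} eV, {classify(wavelength)}")
```

Green photons near 550 nm carry more energy per photon than the red photons chlorophyll absorbs, which is part of why capturing this range is attractive for engineered systems.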

For engineered photosynthetic systems, enhancing light capture remains a major challenge. Bilins absorb the wavelengths that chlorophyll neglects, have tunable properties, and are structurally simple, making them promising candidates for creating novel light-harvesting proteins. Such proteins could help overcome the natural inefficiencies of photosynthesis and boost energy capture in synthetic applications.

Workflow: Generating Ligand-Aware Protein Backbones with RFDiffusion All-Atom on Superbio

Step 1: Identify and Characterize Your Ligand of Interest

To plan a diffusion task, first identify your ligand of interest. In Krishna et al., the authors walk through the in silico design and in vitro validation of binders for three molecules of interest: digoxigenin, heme, and bilin.

For our design task, we are interested in developing a novel protein scaffold that binds a bilin molecule called phycoerythrobilin (PEB). While this is only a first step in a pipeline of in silico and in vitro iterations, it provides a fundamentally new and exciting method for molecular design.
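One way to characterize a ligand before setting up a design run is to pull its atoms out of an existing structure file and summarize them. The sketch below is a minimal illustration, assuming the ligand is stored as HETATM records in a PDB-format file under a three-letter residue name. The residue name "PEB", the helper names, and the coordinates are all hypothetical toy values, not real phycoerythrobilin data, and nothing here reflects the Superbio interface.

```python
from dataclasses import dataclass


@dataclass
class LigandAtom:
    name: str
    element: str
    x: float
    y: float
    z: float


def toy_hetatm(serial, atom, resname, chain, resseq, x, y, z, element):
    """Build one fixed-width PDB HETATM line (toy data for this example only)."""
    return (
        f"HETATM{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}{'':4}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}"
    )


def extract_ligand(pdb_lines, resname):
    """Collect HETATM atoms whose residue name (columns 18-20) matches."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            atoms.append(
                LigandAtom(
                    name=line[12:16].strip(),
                    element=line[76:78].strip(),
                    x=float(line[30:38]),
                    y=float(line[38:46]),
                    z=float(line[46:54]),
                )
            )
    return atoms


def summarize(atoms):
    """Return atom counts and the geometric centroid of the ligand."""
    if not atoms:
        raise ValueError("no ligand atoms found")
    n = len(atoms)
    centroid = (
        sum(a.x for a in atoms) / n,
        sum(a.y for a in atoms) / n,
        sum(a.z for a in atoms) / n,
    )
    n_heavy = sum(1 for a in atoms if a.element != "H")
    return {"n_atoms": n, "n_heavy": n_heavy, "centroid": centroid}


# Toy records: three made-up ligand atoms plus one water to be ignored
TOY_PDB = [
    toy_hetatm(1, "C1", "PEB", "A", 201, 0.0, 0.0, 0.0, "C"),
    toy_hetatm(2, "N1", "PEB", "A", 201, 2.0, 0.0, 0.0, "N"),
    toy_hetatm(3, "O1", "PEB", "A", 201, 1.0, 3.0, 0.0, "O"),
    toy_hetatm(4, "O", "HOH", "A", 301, 9.0, 9.0, 9.0, "O"),
]

if __name__ == "__main__":
    print(summarize(extract_ligand(TOY_PDB, "PEB")))
```

In practice you would read the lines from a real structure file containing your ligand. The heavy-atom count and centroid give a quick sense of the ligand's size and where it sits relative to any protein in the file.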