Introduction

In 2020, AlphaFold2 reshaped structural biology by setting a new standard for protein structure prediction accuracy at the CASP14 benchmark, sparking a transformation in structure-based drug design. Its use of multiple sequence alignments (MSAs) and graph-based protein representations became foundational, inspiring tools such as RoseTTAFold, developed and open-sourced by the Baker Lab.

While accurate protein structure prediction alone represents a milestone, most essential biological processes depend on protein interactions with other macromolecules, such as RNA and DNA. The ability to predict the structures of protein–nucleic acid complexes therefore has the potential to unveil both fundamental biological mechanisms and novel therapeutic targets.

image.png

Fig. 1 from the RoseTTAFoldNA paper by Baek et al.

This is where RoseTTAFold2NA steps in as a cutting-edge tool for predicting protein–nucleic acid complex structures. RoseTTAFold2NA builds on RoseTTAFold’s architecture by adding nucleotide-specific tokens and refining interaction layers to capture protein–nucleic acid contacts. Trained on a diverse dataset of protein, RNA, DNA, and protein–NA complexes, it accurately models these complex interactions, advancing capabilities in drug discovery.

On Superbio, we have made RoseTTAFold2NA accessible through a user-friendly application, illustrated here with a use case centered on characterizing the endonuclease domain of the retrotransposon LINE-1. The principles demonstrated in this tutorial hold promise for generalizing to a range of putative drug targets.

Workflow Tutorial

The LINE-1 (Long Interspersed Nuclear Element-1) retrotransposon is a genetic element capable of copying and inserting itself into new genomic locations through retrotransposition, facilitated by its two essential proteins, ORF1p and ORF2p. ORF2p, which has recently been studied in detail, enables LINE-1 RNA to be reverse transcribed and integrated into DNA. This process drives the LINE-1 replication cycle and is associated with a broad range of diseases, from cancer to autoimmunity (Miller et al., 2021). In particular, the endonuclease domain of ORF2p, which is responsible for initiating reverse transcription of the LINE-1 RNA, is an attractive target for therapeutic inhibition that would halt the retrotransposon’s replication cycle.

We aim to predict the experimentally characterized protein–DNA complex using only amino acid and DNA sequences in FASTA format, submitted together to RoseTTAFold2NA.
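As a rough illustration of the input format, the sketch below parses FASTA text into (header, sequence) records using only the Python standard library. The function name `read_fasta` and the toy records are hypothetical placeholders; they are not the actual LINE-1 sequences, and nothing here is part of the RoseTTAFold2NA or Superbio code.

```python
from typing import Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may wrap over several lines; join them later.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


# Toy records for illustration only (not the real LINE-1 sequences).
example = """>toy_protein
MKTAYIAKQR
QISFVKSHFS
>toy_dna
ACGTTTAAAA
"""

records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

A quick check like this, run on each .fa file before upload, can catch a missing header line or an unexpectedly short sequence.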

  1. Obtain sequences in .fa format for the ORF2p endonuclease domain and its binding site:

    ORF2p is one of the proteins comprising the LINE-1 retrotransposition complex. The endonuclease domain of the ORF2p protein is essential for introducing nicks into genomic DNA, which enables ORF2p to initiate the process of integrating LINE-1 DNA into the host genome. Therefore, the formation of the endonuclease–DNA complex is a necessary precursor to the propagation of LINE-1 throughout the genome.

    LINE1-EN-protein.fa

    LINE1-EN-DNA.fa

    As described in Miller et al., the endonuclease domain of the LINE-1 ORF2p protein has been characterized experimentally. We will obtain FASTA sequences to fold together in our RoseTTAFold2NA deployment, then compare the result to the PDB structure obtained via cryo-EM.

  2. Navigate to RoseTTAFold2NA on Superbio

    Screenshot 2025-01-21 at 8.19.58 AM.png

  3. In the upper upload field, add the .fa file containing the amino acid sequence of the LINE-1 ORF2p protein, then add the second .fa file (here, the DNA sequence) in the other upload field. Simply hit upload.

    When the files are successfully uploaded, they will populate the upload fields and green check marks will appear, like so:

    Screenshot 2025-01-21 at 8.22.00 AM.png

  4. Navigate to the bottom of the page, where there is a single drop-down parameter. Select the label to assign to your second .fa file: ‘Protein’, ‘RNA’, ‘DNA’, or ‘ssDNA’, depending on the sequence.

    In this case, we will select DNA, as the sequence provided encodes one strand of the dsDNA binding site for the LINE-1 ORF2p endonuclease in the human genome.

    Screenshot 2025-01-21 at 8.22.17 AM.png

  5. Click ‘Submit Job > Run on GPU’ to launch the job. If you get stuck, click ‘Need Help’.

    Screenshot 2025-01-21 at 8.22.43 AM.png
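The label chosen in step 4 depends only on what the second sequence contains. As a hedged sketch, the helpers below guess a label from a sequence's alphabet and build the complementary strand of a DNA sequence, which can help when checking which strand of a double-stranded site a file holds. `guess_label` and `reverse_complement` are hypothetical helpers written for this tutorial, not functions exposed by Superbio or RoseTTAFold2NA. The rule that anything made only of A, C, G, T, or N counts as DNA is an assumption, so a short peptide using only those letters would be mislabeled, and the sketch cannot tell ‘DNA’ from ‘ssDNA’; that choice stays with the user.

```python
DNA_LETTERS = set("ACGTN")
RNA_LETTERS = set("ACGUN")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def guess_label(seq: str) -> str:
    """Guess 'DNA', 'RNA', or 'Protein' from the letters in a sequence."""
    letters = set(seq.upper())
    if not letters:
        raise ValueError("empty sequence")
    if letters <= DNA_LETTERS:
        return "DNA"
    if letters <= RNA_LETTERS:
        return "RNA"
    # Assumption: anything else is treated as an amino acid sequence.
    return "Protein"


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    dna = dna.upper()
    if not set(dna) <= DNA_LETTERS:
        raise ValueError("not a DNA sequence")
    return dna.translate(COMPLEMENT)[::-1]


toy_site = "TTTTAACG"  # toy sequence, not the real LINE-1 target site
print(guess_label(toy_site))         # DNA
print(reverse_complement(toy_site))  # CGTTAAAA
```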

Examine Results