Fig. 5a and 5b, Cui et al. (2024), Nature Methods.

In addition to cell embeddings, the pre-trained scGPT model learns gene token embeddings, which can be used to group functionally related genes. Above, Cui et al. show zero-shot inference with the pre-trained pan-human scGPT model on the Human Immune dataset. The left panel visualizes HLA gene programs, with HLA class I and class II genes separating into distinct programs; their products form the MHC class I and class II complexes recognized by CD8 and CD4 T cells, respectively. The right panel uses the zero-shot model to explore the CD gene network, whose members are routinely used as lineage markers to distinguish immune subsets.
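
To illustrate how gene token embeddings can group functionally related genes, here is a minimal sketch that ranks genes by cosine similarity. The three-dimensional vectors are made-up toy values chosen for illustration, not scGPT output; real embeddings have many more dimensions and would come from the pre-trained model. The helper names `cosine_similarity` and `most_similar` are hypothetical and are not part of scGPT or Superbio.

```python
import math

# Toy embeddings: made-up values for illustration only, NOT scGPT output.
toy_embeddings = {
    "HLA-A": [0.9, 0.1, 0.0],
    "HLA-B": [0.8, 0.2, 0.1],
    "HLA-DRA": [0.1, 0.9, 0.2],
    "HLA-DRB1": [0.0, 0.8, 0.3],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(gene, embeddings):
    """Return (gene, similarity) pairs for all other genes, most similar first."""
    query = embeddings[gene]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != gene
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in most_similar("HLA-A", toy_embeddings):
        print(f"{name}\t{score:.3f}")
```

With these toy vectors, the nearest neighbor of each gene is the other member of its class, which is the kind of grouping a gene program captures.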

The Superbio scGPT-GRN implementation performs zero-shot gene network inference on a given .h5ad file using the pan-human scGPT model. Note that the authors report improved performance after fine-tuning scGPT on data from a similar tissue. Superbio provides scGPT models fine-tuned on blood, brain, heart, kidney, lung, and pan-cancer tissue. We recommend choosing the model that matches the tissue type of your own data, as described below.

  1. Navigate to “scGPT: Gene Regulatory Network Inference on Pre-Trained Models” on Superbio.

  2. Find scRNA-seq dataset(s) of interest. We recommend searching CELLxGENE for datasets of interest and downloading the AnnData file (.h5ad format) for scGPT ingestion.

  3. To upload data from a remote or local source, click ‘Remote’ or ‘Local’. Note that local file uploads are limited to 100 MB, while remote uploads have no size limit.

  4. To load the Human Immune dataset provided by Superbio instead, click ‘Use Demo Data’.

  5. If desired, preview the dataset before loading. This information is useful when setting the workflow parameters before launch.

  6. Now turn to the workflow parameters on the right. The top three parameters offer ways to QC and filter scRNA-seq data so that only high-quality events appear in the pipeline output.

  7. ‘FILTER GENE COUNTS’ excludes genes that appear too rarely in the dataset. For example, a gene that appears only once may be excluded from consideration. A typical value is 3; leave this parameter at -1 to skip this filter.

    ‘FILTER CELL COUNTS’ excludes cell events that fall below a gene threshold, as these are likely to be low-quality data points. The recommended threshold is 200; leave this parameter at -1 to skip this filter.

    ‘FILTER GENES ON CLUSTER’ applies after clustering. scGPT uses the Louvain method for community detection to form clusters, and clusters with fewer than the chosen number of genes can then be excluded for QC purposes. The scGPT documentation recommends a threshold of 5.
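
The three filters described in step 7 can be sketched in plain Python to make their effect concrete. This is an illustration under stated assumptions, not the Superbio or scGPT implementation: it assumes the gene filter thresholds the total count of each gene across all cells, the cell filter thresholds the number of genes detected per cell, and -1 disables a filter. The function names, gene names, and toy count matrix are hypothetical.

```python
def filter_genes(counts, gene_names, min_counts=3):
    """Drop genes whose total count across all cells is below min_counts.

    counts is a list of rows (one per cell), each a list of per-gene counts.
    A threshold of -1 skips the filter (assumed convention).
    """
    if min_counts < 0:
        return counts, gene_names
    n_genes = len(gene_names)
    totals = [sum(row[j] for row in counts) for j in range(n_genes)]
    keep = [j for j in range(n_genes) if totals[j] >= min_counts]
    kept_counts = [[row[j] for j in keep] for row in counts]
    return kept_counts, [gene_names[j] for j in keep]


def filter_cells(counts, min_genes=200):
    """Drop cells in which fewer than min_genes genes have a non-zero count."""
    if min_genes < 0:
        return counts
    return [row for row in counts if sum(1 for c in row if c > 0) >= min_genes]


def filter_clusters(clusters, min_size=5):
    """Keep only clusters that contain at least min_size genes."""
    return {name: genes for name, genes in clusters.items() if len(genes) >= min_size}


if __name__ == "__main__":
    # Toy 3-cell x 4-gene count matrix with hypothetical gene names.
    genes = ["G1", "G2", "G3", "G4"]
    counts = [[5, 0, 1, 0], [2, 0, 0, 0], [4, 1, 3, 0]]

    counts, genes = filter_genes(counts, genes, min_counts=3)
    counts = filter_cells(counts, min_genes=2)  # tiny threshold for the toy data
    print(genes, counts)

    clusters = {"0": ["A", "B", "C", "D", "E"], "1": ["F", "G"]}
    print(filter_clusters(clusters, min_size=5))
```

In the toy run, genes G2 and G4 fall below the count threshold of 3 and are dropped, the second cell is dropped because only one gene remains detected in it, and the two-gene cluster is discarded.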