Fig. 1e, Cui et al. (2024), Nature Methods.
scGPT is a pre-trained model built from stacked transformer layers with multi-head attention “that generate cell and gene embedding simultaneously” during pre-training. The figure above is the authors’ UMAP visualization of the resulting cell embeddings for a random 10% sample (~3 million cells) of the training dataset, drawn from the 51 tissues represented.
Without the need for additional fine-tuning, the above embeddings can be used to propagate labels like “cell type” or “disease conditions” from the reference map to the query cells.
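To make the idea of label propagation concrete, here is a minimal sketch of one common way to do it: for each query cell, find the k most similar reference cells in embedding space and take a majority vote over their labels. This is an illustration only, not the scGPT or Superbio implementation. The function names, the choice of cosine similarity, the value of k, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagate_labels(ref_embeddings, ref_labels, query_embeddings, k=3):
    """Assign each query cell the majority label of its k nearest reference cells.

    Hypothetical helper for illustration; real pipelines typically use an
    approximate nearest-neighbor index instead of this brute-force loop.
    """
    if len(ref_embeddings) != len(ref_labels):
        raise ValueError("each reference embedding needs exactly one label")
    predictions = []
    for query in query_embeddings:
        # Score every reference cell against this query cell.
        scored = [
            (cosine_similarity(query, ref), label)
            for ref, label in zip(ref_embeddings, ref_labels)
        ]
        # Keep the labels of the k most similar reference cells.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top_labels = [label for _, label in scored[:k]]
        # Majority vote decides the propagated label.
        predictions.append(Counter(top_labels).most_common(1)[0][0])
    return predictions


# Toy data (made up): two well-separated groups of reference cells.
ref_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
]
ref_labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
query_embeddings = [[0.95, 0.05], [0.05, 0.95]]

predicted = propagate_labels(ref_embeddings, ref_labels, query_embeddings, k=3)
print(predicted)  # ['alpha', 'beta']
```

Because the vote only uses distances between embeddings, no model weights are updated, which is why this kind of mapping needs no fine-tuning.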
Navigate to “scGPT: Reference Mapping Using Cell Embedding” on Superbio.
Provide an scRNA-seq file in .h5ad format. We recommend downloading a dataset of your choice from CELLxGENE, or using the Superbio-provided demo data.
In either case, scroll down to the ‘Upload Single Cell RNA-Seq Data File’ window in the scGPT application on Superbio. Click ‘Local’ or ‘Remote’ to upload data from a local or remote source. We recommend ‘Remote’ because it places no limit on file size, whereas local uploads are limited to 100 MB.
If using Demo Data, simply click ‘Use Demo Data’ to select and preview the ‘demo_test.h5ad’ file. The Demo Data provided corresponds to the human pancreas scRNA-seq dataset discussed in Cui et al. The .h5ad file can also be found at this Google Drive link.
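If you are unsure which upload option fits your file, a quick size check before uploading can help. The sketch below is a hypothetical helper, not part of Superbio. It assumes the local limit means 100 megabytes counted as 100 × 1,000,000 bytes; the platform may count the limit differently.

```python
import os

# Assumption: the local upload limit is 100 MB, counted in decimal megabytes.
LOCAL_UPLOAD_LIMIT_BYTES = 100 * 1000 * 1000


def suggest_upload_source(size_bytes, limit_bytes=LOCAL_UPLOAD_LIMIT_BYTES):
    """Return 'Local' if the file fits under the local limit, else 'Remote'."""
    return "Local" if size_bytes <= limit_bytes else "Remote"


def suggest_for_file(path):
    """Look up the size of a file on disk and suggest an upload source."""
    return suggest_upload_source(os.path.getsize(path))


print(suggest_upload_source(50_000_000))   # Local
print(suggest_upload_source(250_000_000))  # Remote
```

For a real file, call suggest_for_file with the path to your .h5ad file.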
Now let’s move to setting the workflow parameters. On the right, please specify the column name corresponding to the cell type in your scRNA-seq count matrix. If left blank, the default will be ‘Celltype’. For demo data, the parameter is selected automatically.
Next, select the gene column name, that is, the header of the column in your .h5ad file that contains gene names. The default is ‘index’. For demo data, this value will be selected automatically.
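Before submitting, it can be worth checking that the two column names you plan to enter actually exist in your file. The sketch below mirrors the defaults described above (‘Celltype’ and ‘index’) in a small, hypothetical validation helper; it is not Superbio's own code. It assumes that ‘index’ refers to the row index of the gene table rather than a named column, and that the cell type column must be present in the cell-level metadata.

```python
DEFAULT_CELL_TYPE_COLUMN = "Celltype"
DEFAULT_GENE_COLUMN = "index"


def resolve_columns(obs_columns, var_columns, cell_type_column="", gene_column=""):
    """Apply the documented defaults and check that the columns exist.

    obs_columns: column names of the cell-level metadata table.
    var_columns: column names of the gene-level metadata table.
    """
    # A blank field falls back to the default, as in the form.
    cell_type = cell_type_column.strip() or DEFAULT_CELL_TYPE_COLUMN
    gene = gene_column.strip() or DEFAULT_GENE_COLUMN

    if cell_type not in obs_columns:
        raise KeyError(f"cell type column {cell_type!r} not found in cell metadata")
    # Assumption: 'index' means the gene table's row index, so it is always available.
    if gene != "index" and gene not in var_columns:
        raise KeyError(f"gene column {gene!r} not found in gene metadata")
    return cell_type, gene


# Toy column lists (made up) standing in for a real file's metadata.
resolved = resolve_columns(obs_columns=["Celltype", "batch"], var_columns=["gene_ids"])
print(resolved)  # ('Celltype', 'index')
```

If you have the anndata package installed, you could load your file with anndata.read_h5ad and pass list(adata.obs.columns) and list(adata.var.columns) to this helper.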
Once the data are selected and the parameters are filled in, click ‘Submit Job’ > ‘Run on GPU’ to launch the workflow.
Navigate to ‘Jobs’ on the left-hand sidebar, click on the ‘Completed’ job, and view or download your propagated CELLxGENE labels.