by Google Health

Data is processed in two stages, using two different models:

  1. A Google Health model (the CXR network) generates embeddings for chest X-rays (CXRs). These embeddings make it easier to train models for specific medical prediction tasks, such as detecting clinical conditions (e.g., Covid-19 or pneumonia). For example, the CXR network can generate an embedding for every image in a given CXR dataset; the embeddings, together with the labels for the desired target task (such as Covid-19), are then used as examples to train a small ML model.
  2. A smaller training pipeline, provided by Superbio, takes these embeddings as inputs and predicts clinical conditions (e.g., Covid-19) or patient outcomes (e.g., hospitalization). This is the part of the process that is tailored to the problem at hand.
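
The second stage can be illustrated with a minimal sketch: a small classifier trained on precomputed embeddings and binary labels. This is not the Superbio pipeline itself. The logistic-regression model, the function names (`train_classifier`, `predict`), the 8-dimensional embeddings, and the synthetic data are all assumptions made for illustration; real embeddings would come from the CXR network.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(embeddings, labels, lr=0.1, epochs=100):
    """Fit logistic regression with batch gradient descent (illustrative only)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(score) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (condition present) or 0 (control) for one embedding."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic stand-in for CXR embeddings: NOT real data, just two noisy clusters.
random.seed(0)
DIM = 8  # assumed size; the real embedding dimension is set by the CXR network
labels = [i % 2 for i in range(200)]
embeddings = [
    [random.gauss(1.0 if y else -1.0, 1.0) for _ in range(DIM)] for y in labels
]

weights, bias = train_classifier(embeddings, labels)
accuracy = sum(
    predict(weights, bias, x) == y for x, y in zip(embeddings, labels)
) / len(labels)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

Because the embeddings already summarize each image, the downstream model can stay small, which is why a modest number of labeled images can be enough to train it.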

Images:

Images should be provided in either .png or .dcm format, with multiple files bundled in a single .zip archive. We recommend using at least 200 images to train a new model; more training images will likely improve results.
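
Before uploading, it may help to check an archive against these requirements. The sketch below uses only Python's standard library; the helper names (`list_image_files`, `check_archive`) and the example file names are hypothetical and not part of the app.

```python
import io
import zipfile
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".png", ".dcm"}
RECOMMENDED_MIN_IMAGES = 200  # recommendation stated above


def list_image_files(archive):
    """Return the sorted names of .png/.dcm files inside a .zip archive."""
    with zipfile.ZipFile(archive) as zf:
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    return sorted(
        name for name in names
        if PurePosixPath(name).suffix.lower() in ALLOWED_EXTENSIONS
    )


def check_archive(archive):
    """Summarize how many usable images the archive holds."""
    images = list_image_files(archive)
    return {
        "n_images": len(images),
        "meets_recommendation": len(images) >= RECOMMENDED_MIN_IMAGES,
    }


# Demo with a tiny in-memory archive (placeholder bytes, not real images).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("scans/a.png", b"placeholder")
    zf.writestr("scans/b.dcm", b"placeholder")
    zf.writestr("scans/notes.txt", b"ignored")
buffer.seek(0)

image_files = list_image_files(buffer)
report = check_archive(buffer)
print(image_files, report)
```

For a real upload, pass the path to the .zip file instead of the in-memory buffer.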

Labels:

Labels should be provided in CSV format, with 1 indicating the presence of a condition and 0 indicating the control group. If multiple labels are instead provided as strings in a single column, a multilabel model will be trained.
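
As a rough illustration of the two label styles, the sketch below reads a label file and reports which kind of model it would imply. The column names `image` and `label`, the function `read_labels`, and the example rows are assumptions; the source does not specify the exact CSV layout.

```python
import csv
import io


def read_labels(csv_file, image_col="image", label_col="label"):
    """Return (mode, labels), where mode is "binary" or "multilabel".

    Column names are hypothetical; adjust them to match the actual file.
    """
    rows = list(csv.DictReader(csv_file))
    values = [row[label_col].strip() for row in rows]
    if all(value in {"0", "1"} for value in values):
        # 1 = condition present, 0 = control group.
        labels = {row[image_col]: int(row[label_col]) for row in rows}
        return "binary", labels
    # Anything else is treated as string labels in a single column.
    labels = {row[image_col]: row[label_col].strip() for row in rows}
    return "multilabel", labels


# Made-up example files, for illustration only.
binary_csv = io.StringIO("image,label\na.png,1\nb.png,0\n")
string_csv = io.StringIO("image,label\na.png,pneumonia\nb.png,covid\n")

binary_mode, binary_labels = read_labels(binary_csv)
string_mode, string_labels = read_labels(string_csv)
print(binary_mode, binary_labels)
print(string_mode, string_labels)
```

With a real file, open it with `open(path, newline="")` and pass the file object to `read_labels`.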