The Genomic Data Commons (GDC) is a repository and computational platform for cancer researchers who seek to better understand cancer, its clinical progression, and its response to therapy.

The Biospecimen Research Group facilitates the management and oversight of the Genomic Data Commons and its expansive enterprise, which includes data harmonization workflows processed through a pipeline automation system; robust data access, submission, and analysis tools with programmatic interfaces; a data model and dictionary with more than 700 properties; and associated documentation resources.

This resource supports the submission, harmonization, analysis, and distribution of genomic and clinical data from pivotal cancer research programs such as The Cancer Genome Atlas Program. The Genomic Data Commons harmonizes raw sequence data, applies state-of-the-art methods for generating higher-level data, such as mutation calls and structural variants, and provides scalable downloads and web-based analysis tools.  

The Genomic Data Commons attracts over 80,000 unique visitors each month from 90 countries. Functioning as a big data enterprise, it houses more than 18 petabytes of data, and monthly downloads exceed 2 petabytes.

Officially launched in 2016, the Genomic Data Commons has been redesigned as GDC 2.0, which expands on the initial GDC Data Portal with a cohort-centric design and scientific analysis tools that guide research. A framework for analysis tools was created so that scientific tools, with improved visualizations, can operate within GDC 2.0.

The GDC integrates new scientific analysis tools that let researchers visualize mutations in protein-coding genes by consequence type and protein domain. Researchers can explore the most mutated cases and the genes affected by high-impact mutations in a custom cohort, visualize sequencing reads for a specified gene, position, SNP, or variant, and conduct hierarchical clustering of genes and expression values.

With the development of GDC 2.0, the Genomic Data Commons aims to establish an application-centric ecosystem in which third-party analysis tools function seamlessly within its framework, thereby amplifying research impact.

For further information about the Genomic Data Commons (GDC), please contact Sharon Gaheen at BRGSupport@nih.gov.