Cancer Data Science Initiatives

The Cancer Data Science Initiatives team builds interdisciplinary collaborations to develop innovative approaches in the application of data science, computing, and computational science to reduce the burden of cancer and improve the outcomes for cancer patients.

We work collaboratively with multidisciplinary teams and organizations across the research community, nationally and internationally, to accelerate cancer and biomedical research.

Eric A. Stahlberg, Ph.D. Director eric.stahlberg@nih.gov

Advancing digital twins for cancer

Using predictive models to improve the lives of cancer patients

A recent manuscript highlights several cutting-edge pilot projects exploring the future for cancer patient digital twins.

Bio-IT World Europe 2023

Bringing together thought leaders to advance virtual human modeling and biomedical digital twins

Eric Stahlberg moderated the panel on biomedical digital twins.

Advancing predictive models for cancer research

With expertise in artificial intelligence, data science, and scientific computing, we provide leadership and bring diverse groups of experts together to develop cutting-edge approaches for complex research challenges. This includes the development of predictive computational models, building communities and collaborations, and applying scientific computing at scale to advance cancer research.

ATOM Consortium leadership

The Frederick National Laboratory for Cancer Research cofounded and helps lead the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium, a collaboration to transform drug discovery from a slow and high-failure process into a rapid, integrated, and patient-centric model. ATOM launched the AMPL software to generate machine-learning models that can predict key safety and pharmacokinetic-relevant parameters.

The consortium works on generative molecular design to determine design criteria that consider pharmacology, safety, efficacy, and developability. Our goal is to create an active-learning design platform that enables researchers to selectively incorporate results from mechanistic simulation and human-relevant experimentation to generate and optimize new drug candidates.

Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE)

The IMPROVE project continues to develop a framework for evaluating machine-learning models, both to characterize them and to guide improvements. Starting with models that predict tumor response to drug treatment, the IMPROVE project provides an opportunity to explore new model configurations, evaluate model stability, quantify uncertainty, and guide development of new data to make data-driven models even more effective.
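As a rough illustration of what evaluating model stability and quantifying uncertainty can mean in practice, the sketch below refits a toy dose-response model on bootstrap resamples and reports the spread of its predictions. It is not IMPROVE code: the data are synthetic, the straight-line model stands in for a real drug-response predictor, and the names `fit_line` and `bootstrap_predictions` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_predictions(xs, ys, x_query, n_resamples=200, seed=0):
    """Refit the model on resampled data and collect predictions at x_query."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            # Degenerate resample: a slope cannot be estimated, so skip it.
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x_query)
    return preds


# Synthetic example: tumor response (fraction of control) versus drug dose.
doses = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
responses = [1.00, 0.85, 0.71, 0.52, 0.40, 0.22]

preds = bootstrap_predictions(doses, responses, x_query=2.5)
mean_pred = statistics.fmean(preds)  # central prediction
spread = statistics.stdev(preds)     # small spread suggests a stable model
print(f"prediction at dose 2.5: {mean_pred:.3f} +/- {spread:.3f}")
```

A small spread across resamples suggests the prediction is insensitive to which samples were observed; a large spread points to regions where new data would be most useful.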

Collaboration

Digital twin ideas lab: Innovative, cross-disciplinary research and a roadmap

Interactive NCI/DOE workshop seeding new partnerships.

Virtual Human Global Summit

Global leaders explore the future for medical digital twins

We drive innovative cross-disciplinary collaborations to advance precision predictive oncology.

Data and model resources and clearinghouse

We develop innovative data-management resources that streamline the transformation of data into FAIR data resources to foster the development of new data science innovations.

We established and continue to operate the predictive oncology data and model clearinghouse that enables the research community to access newly developed software, computational and artificial intelligence models, and key datasets.

We host interdisciplinary workshops to foster broad adoption of cutting-edge artificial intelligence and data science analytic tools and capabilities developed by the National Cancer Institute, Frederick National Laboratory for Cancer Research, and the NCI-DOE Collaboration.

Model and Data Clearinghouse (MoDAC)

MoDAC is a data repository and model clearinghouse developed to transition resources to the broader research community. We provide consultation and development support to accelerate applications of high-performance computing platforms, including graphics processing units; to enable scalable workflows for National Cancer Institute environments; and to enable the use of artificial intelligence and machine-learning platforms.

Storage and sharing of large, annotated datasets
Download asynchronously to a Globus endpoint or AWS S3 bucket
Download synchronously to the user’s computer

Computational resources for cancer research

We provide an online resource for cancer investigators and data scientists to get started with emerging computational predictive oncology approaches, including predictive models for drug discovery, natural language processing, tumor classification, and multiscale modeling.

ATOM Modeling Pipeline (AMPL)

AMPL is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery and to generate machine-learning models that can predict key safety and pharmacokinetic-relevant parameters.

Benchmarked on large pharmaceutical datasets
Extends functionality of DeepChem

Cancer Distributed Learning Environment (CANDLE)

CANDLE is an open-source deep-learning software platform that brings artificial intelligence acceleration to multiple cancer research areas, including the DOE Exascale Computing Project and Joint Design of Advanced Computing Solutions for Cancer (JDACS4C). Applications extend to other areas, including image analysis. CANDLE can run locally and also scales efficiently on powerful supercomputers, including the NIH Biowulf system.

View a workshop presentation about CANDLE.

Benchmarks 
Documentation 
FTP site
