The Cancer Data Science Initiatives team builds interdisciplinary collaborations to develop innovative approaches in the application of data science, computing, and computational science to reduce the burden of cancer and improve the outcomes for cancer patients.  

We work collaboratively with multidisciplinary teams and organizations across the research community, nationally and internationally, to accelerate cancer and biomedical research.

Advancing predictive models for cancer research

With expertise in artificial intelligence, data science, and scientific computing, we provide leadership and bring diverse groups of experts together to develop cutting-edge approaches for complex research challenges. This work includes developing predictive computational models, building communities and collaborations, and applying scientific computing at scale to advance cancer research.

ATOM Consortium leadership 

The Frederick National Laboratory for Cancer Research cofounded and helps lead the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium, a collaboration to transform drug discovery from a slow, high-failure process into a rapid, integrated, and patient-centric model. ATOM launched the ATOM Modeling Pipeline (AMPL), software for generating machine-learning models that can predict key safety and pharmacokinetic-relevant parameters.

The consortium works on generative molecular design to determine design criteria that consider pharmacology, safety, efficacy, and developability. Our goal is to create an active-learning design platform that enables researchers to selectively incorporate results from mechanistic simulation and human-relevant experimentation to generate and optimize new drug candidates. 

Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE) 

The IMPROVE project is developing a framework for evaluating machine-learning models in order to characterize their performance and guide improvements. Starting with models that predict tumor response to drug treatment, the IMPROVE project provides an opportunity to explore new model configurations, evaluate model stability, quantify uncertainty, and guide the development of new data to make data-driven models even more effective.
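
As one illustration of what quantifying uncertainty in a model evaluation can look like, the sketch below computes a bootstrap confidence interval for the mean absolute error of a set of predictions. It is a minimal example under stated assumptions, not the IMPROVE framework's own method: the function names, the choice of metric, and the response values are all hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_mae_interval(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean absolute error.

    Resamples (observed, predicted) pairs with replacement and recomputes
    the error each time; the spread of those scores indicates how much the
    metric could vary with a different test sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            mean_absolute_error([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical observed vs. predicted drug-response values (illustration only).
observed = [0.82, 0.45, 0.30, 0.67, 0.91, 0.12, 0.55, 0.74]
predicted = [0.78, 0.50, 0.41, 0.60, 0.85, 0.20, 0.49, 0.80]

point = mean_absolute_error(observed, predicted)
low, high = bootstrap_mae_interval(observed, predicted)
print(f"MAE = {point:.4f}, 95% bootstrap interval = [{low:.4f}, {high:.4f}]")
```

A wide interval relative to the point estimate suggests that the evaluation set is too small or too heterogeneous to rank models reliably, which is one way an evaluation can point to where new data would help most.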

Data and model resources and clearinghouse 

We develop innovative data-management resources that streamline the transformation of data into FAIR (findable, accessible, interoperable, and reusable) data resources to foster new data science innovations.

We established and continue to operate the predictive oncology data and model clearinghouse that enables the research community to access newly developed software, computational and artificial intelligence models, and key datasets.

We host interdisciplinary workshops to foster broad adoption of cutting-edge artificial intelligence and data science analytic tools and capabilities developed by the National Cancer Institute, Frederick National Laboratory for Cancer Research, and the NCI-DOE Collaboration.

Model and Data Clearinghouse (MoDAC) 

MoDAC is a data repository and model clearinghouse developed to transition resources to the broader research community. We provide consultation and development support to accelerate applications of high-performance computing platforms (including graphics processing units), enable scalable workflows for National Cancer Institute environments, and support the use of artificial intelligence and machine-learning platforms.

  • Storage and sharing of large, annotated datasets 

  • Download asynchronously to a Globus endpoint or an AWS S3 bucket 

  • Download synchronously to the user’s computer 

Computational resources for cancer research 

We provide an online resource that helps cancer investigators and data scientists get started with emerging computational predictive oncology approaches, including predictive models for drug discovery, natural language processing, tumor classification, and multiscale modeling.

ATOM Modeling Pipeline (AMPL) 

AMPL is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery and to generate machine-learning models that can predict key safety and pharmacokinetic-relevant parameters. 

  • Benchmarked on large pharmaceutical datasets 

  • Extends functionality of DeepChem 

Cancer Distributed Learning Environment (CANDLE) 

CANDLE is an open-source deep-learning software platform that brings artificial intelligence acceleration to multiple cancer research areas, including work under the DOE Exascale Computing Project and the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program. Applications extend to other areas, including image analysis, and CANDLE can run locally while scaling efficiently on powerful supercomputers, including the NIH Biowulf system.

View a workshop presentation about CANDLE.

  • Benchmarks  

  • Documentation  

  • FTP site