The Center for Technical Operations Support primarily develops systems and software in support of data sharing, semantics, informatics, and scientific operations for the National Cancer Institute and other institutes of the National Institutes of Health, such as the National Institute of Allergy and Infectious Diseases.
Our work is driven by the problems we are asked to solve; because those problems are diverse, we do not commit to a single solution technology. We are, however, firmly focused on providing the best solution to our government clients.
This work falls broadly into three areas: application development, data and semantics management, and application support.
Creating powerful resources for impactful research
The applications we help develop and support range from data commons to event registration websites to an informatics site for the Serological Sciences Network. We also provide testing support for the configuration of administration tools and commercial off-the-shelf tools used within the Frederick National Laboratory.
To accomplish our work, we have a core group that works closely with staff from our many subcontractor teams. Our extensive use of subcontracts, from both academic and commercial groups, allows us to rapidly extend the perspectives and skills we need to solve the problems we are tasked with by the government.
Our projects range from Drupal-based information sites, tier 1 and 2 application support, oversight of subcontractor data generation activities, and maintenance of legacy Oracle clinical systems to complex, multimillion-dollar initiatives with many subprojects, such as the Cancer Research Data Commons and the Childhood Cancer Data Initiative.
We also provide development support for legacy systems built on Java, JavaScript, Drupal, Google Cloud Platform, Amazon Web Services (AWS), and relational and non-relational databases.
Advancing disease research through our state-of-the-art tools
Data sharing has long been an important resource for the research community, and with the data sharing policies adopted by the National Cancer Institute and others, the availability of data to the community will increase even further.
We have a critical role in helping build NCI’s Cancer Research Data Commons, which serves as a central data provider for genomic, proteomic, imaging, population science, immuno-oncology, comparative, and other data types. As part of this project, we developed the BENTO Framework, a state-of-the-art, cloud-based microservices platform built according to FAIR principles.
DevOps technologies
Given the heterogeneity of data and functionality across our projects, we evaluate each project separately to determine the optimal technology stack. As a result, we use both relational and non-relational databases, and we have used OpenSearch to improve query performance on very large cloud-based data sets. All these technologies are leveraged within a DataOps process we developed to consistently track data through all processing steps and maintain robustness, integrity, and reproducibility.
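As a concrete illustration of using OpenSearch for fast, filtered queries over large data sets, the sketch below builds an OpenSearch query DSL body with a filter and an aggregation. The index and field names (`primary_site`, `study_id`) are hypothetical, not taken from an actual project.

```python
# Minimal sketch of an OpenSearch query DSL body; field names are
# illustrative assumptions, not from a real index.
def build_case_query(primary_site: str, page_size: int = 10) -> dict:
    """Build a filtered query with a terms aggregation over studies."""
    return {
        "size": page_size,
        "query": {
            "bool": {
                # "filter" clauses are cached and skip scoring, which helps
                # performance on large data sets
                "filter": [
                    {"term": {"primary_site.keyword": primary_site}},
                ]
            }
        },
        "aggs": {
            # Count matching records per study
            "by_study": {"terms": {"field": "study_id.keyword"}}
        },
    }
```

A body like this would typically be passed to an OpenSearch client's search call against the target index.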
The first step in data sharing is understanding the data, so we have a dedicated data sciences team that leverages lessons learned from all our projects to keep every data project data focused. In addition to more classical data activities, such as data curation and transformation, the data team follows a DataOps model to ensure data is managed appropriately through all stages of the application development process. Moreover, the data science staff are part of the application development teams, ensuring we use the best technologies to deliver data to the relevant community.
Our DevOps processes consist of provisioning cloud-based environments; developing pipelines; and deploying, testing, and monitoring our applications in a highly secure and repeatable fashion to advance our mission of supporting cancer research.
Jenkins
We use Jenkins as the orchestrator for most of our DevOps workflows. Development and QA team members use it to kick off the build, deploy, and data load pipelines across all our applications.
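The shape of such a pipeline can be sketched as a declarative Jenkinsfile. The stage names, image tag, and helper scripts below are illustrative assumptions, not an actual project pipeline.

```groovy
// Minimal declarative Jenkinsfile sketch; commands and scripts are hypothetical.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Build a versioned container image for this run
                sh 'docker build -t example-app:${BUILD_NUMBER} .'
            }
        }
        stage('Deploy') {
            steps {
                // Hypothetical deployment script
                sh './deploy.sh example-app:${BUILD_NUMBER}'
            }
        }
        stage('Data Load') {
            steps {
                // Hypothetical data-load step run after deployment
                sh './load_data.sh'
            }
        }
    }
}
```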
GitHub
We use GitHub to host the source code repositories for our application, data operations, infrastructure provisioning, and configuration management assets. It serves as the source of truth for most of our execution activities.
Docker
Docker is a containerization technology that we use to encapsulate a working environment that runs on various AWS infrastructure platforms. The source code, along with its dependencies, is packaged into a Docker image and stored in a centralized container registry hosted on AWS.
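A minimal Dockerfile illustrates the packaging step described above. The base image, file paths, and entry point are illustrative assumptions, not an actual project's build.

```dockerfile
# Illustrative Dockerfile sketch; image, paths, and entry point are hypothetical.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application source into the image
COPY . .

CMD ["python", "app.py"]
```

The resulting image would then be tagged and pushed to a centralized registry such as Amazon ECR.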
Terraform
Terraform is an open-source infrastructure-as-code (IaC) tool that allows users to define and provision infrastructure using a declarative configuration language. With Terraform, we describe the components of our infrastructure, such as servers, networks, and databases, in configuration files.
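As a small example of that declarative style, the fragment below defines a single S3 bucket with the AWS provider. The bucket name, region, and tags are illustrative assumptions, not resources from an actual project.

```hcl
# Illustrative Terraform sketch; names and region are hypothetical.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# Declares the desired state; "terraform apply" reconciles AWS to match it
resource "aws_s3_bucket" "data_staging" {
  bucket = "example-data-staging-bucket"

  tags = {
    Project = "example-project"
  }
}
```

Because the file describes desired state rather than steps, the same configuration can be applied repeatedly to provision identical environments.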
Our capabilities and specializations
Cloud-based technologies
We use various Amazon Web Services (AWS) cloud technologies to develop powerful cloud-based platforms that make data easily accessible and computable, so that hypotheses can be rapidly tested against the very large data sets available. We also use AWS to serve our Drupal-based projects. Applications are typically architected with serverless managed services, and we operate at up to the FISMA Moderate level. Examples include the Index of NCI Studies and the CCDI Molecular Targets Program.
- AWS RDS
- AWS Lambda
- AWS OpenSearch
- AWS Fargate
- AWS ECS
- Terraform
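A serverless architecture of the kind described above typically centers on small Lambda handlers behind an API gateway. The sketch below is a minimal, hypothetical handler; the study identifiers and lookup logic are stand-ins for a real data-store query, not an actual service.

```python
import json


def lambda_handler(event, context):
    """Minimal AWS Lambda handler sketch for an API Gateway proxy request.

    The in-memory study list is a hypothetical stand-in for a real
    database or OpenSearch query.
    """
    params = event.get("queryStringParameters") or {}
    query = params.get("q", "")

    # Hypothetical data; a real handler would query a managed data store
    studies = ["NCI-2021-01234", "NCI-2022-05678"]
    matches = [s for s in studies if query in s]

    # API Gateway proxy integrations expect this response shape
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"query": query, "results": matches}),
    }
```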
Database technologies
In addition to the Cancer Research Data Commons and Childhood Cancer Data Catalog data repository projects, we have developed other systems for data sharing. These include the NCI Metathesaurus, a comprehensive biomedical terminology database providing broad, concept-based mapping of terms from more than 100 biomedical terminologies, with 7,500,000 terms mapped to 3,200,000 concepts representing their shared meanings.
Additionally, we developed EVS-SIP, which permits search and retrieval of terms contained in or across the data dictionaries or data models of repositories participating in the Cancer Research Data Commons and beyond.
- AWS Neptune
- Neo4j Graph DB
- AWS RDS
- Oracle
- MongoDB/AWS DocumentDB
- PostgreSQL
- MSSQL
- MySQL
Imaging and informatics for precision medicine
We oversee the Cancer Research Data Commons' Imaging Data Commons, a cloud-based repository of publicly available cancer imaging data co-located with analysis and exploration tools. The data includes radiology collections from the Cancer Imaging Archive and major NCI initiatives, such as the Cancer Genome Atlas Program, the Clinical Proteomic Tumor Analysis Consortium, the National Lung Screening Trial, and the Human Tumor Atlas Network.
We also provide programmatic support for the National Biomedical Imaging Archive, supporting the interoperability between images and genomic data.
- MedICI Challenge Management System for image analysis algorithm development and validation
- Standards such as BRIDG, CDISC, and DICOM