The Center for Technical Operations Support primarily develops systems and software in support of data sharing, semantics, informatics, and scientific operations for the National Cancer Institute and other institutes of the National Institutes of Health, such as the National Institute of Allergy and Infectious Diseases.
Our work is driven by the problems we are asked to solve; because those problems are diverse, we do not commit to a single solution technology. We are, however, firmly focused on providing the best solution to our government clients.
This work falls broadly into three areas: application development, data and semantics management, and application support.
Creating powerful resources for impactful research
The applications we help develop and support range from data commons to event registration websites to an informatics site for the Serological Sciences Network. We also provide testing support for the configuration of administration tools and commercial off-the-shelf tools used within the Frederick National Laboratory.
To accomplish our work, we have a core group that works closely with staff from our many subcontractor teams. Our extensive use of subcontracts, from both academic and commercial groups, allows us to rapidly extend the perspectives and skills we need to solve the problems we are tasked with by the government.
Our projects range from Drupal-based information sites, tier 1 and 2 application support, oversight of subcontractor data generation activities, and maintenance of legacy Oracle clinical systems to complex, multimillion-dollar initiatives with many subprojects, such as the Cancer Research Data Commons and the Childhood Cancer Data Initiative.
We also provide development support for legacy systems built on Java, JavaScript, Drupal, Google Cloud Platform, Amazon Web Services (AWS), and relational and non-relational databases.
Advancing disease research through our state-of-the-art tools
Data sharing has long been an important resource for the research community, and with the data sharing policies adopted by the National Cancer Institute and others, the availability of data to the community will increase even further.
We have a critical role in helping build NCI’s Cancer Research Data Commons, which serves as a central data provider for genomic, proteomic, imaging, population science, immuno-oncology, comparative, and other data types. As part of this project, we developed the BENTO Framework, a state-of-the-art, cloud-based microservices platform built according to FAIR principles.
DevOps technologies
Given the heterogeneity of data and functionality across our projects, we evaluate each project separately to determine the optimal technology stack. As a result, we use both relational and non-relational databases, and we have used OpenSearch to improve query performance on very large cloud-based data sets. All these technologies are leveraged within a DataOps process we developed to consistently track data through all processing steps and maintain robustness, integrity, and reproducibility.
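As a concrete illustration of using OpenSearch for fast, filtered queries over large data sets, the sketch below builds an OpenSearch query DSL body with a filter and an aggregation. The index and field names (`primary_site`, `study_id`) are hypothetical, not taken from an actual project.

```python
# Minimal sketch of an OpenSearch query DSL body; field names are
# illustrative assumptions, not from a real index.
def build_case_query(primary_site: str, page_size: int = 10) -> dict:
    """Build a filtered query with a terms aggregation over studies."""
    return {
        "size": page_size,
        "query": {
            "bool": {
                # "filter" clauses are cached and skip scoring, which helps
                # performance on large data sets
                "filter": [
                    {"term": {"primary_site.keyword": primary_site}},
                ]
            }
        },
        "aggs": {
            # Count matching records per study
            "by_study": {"terms": {"field": "study_id.keyword"}}
        },
    }
```

A body like this would typically be passed to an OpenSearch client's search call against the target index.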
The first step in data sharing is understanding the data, so we have a dedicated data sciences team that leverages lessons learned from all our projects to keep every data project data focused. In addition to more classical data activities, such as data curation and transformation, the data team follows a DataOps model to ensure data is managed appropriately through all stages of the application development process. Moreover, the data science staff are part of the application development teams, ensuring we use the best technologies to deliver data to the relevant community.
Our DevOps processes consist of provisioning cloud-based environments; developing pipelines; and deploying, testing, and monitoring our applications in a highly secure and repeatable fashion to advance our mission of supporting cancer research.
Jenkins
We use Jenkins as the orchestrator for most of our DevOps workflows. Development and QA team members use it to kick off the build, deploy, and data load pipelines across all our applications.
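The shape of such a pipeline can be sketched as a declarative Jenkinsfile. The stage names, image tag, and helper scripts below are illustrative assumptions, not an actual project pipeline.

```groovy
// Minimal declarative Jenkinsfile sketch; commands and scripts are hypothetical.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Build a versioned container image for this run
                sh 'docker build -t example-app:${BUILD_NUMBER} .'
            }
        }
        stage('Deploy') {
            steps {
                // Hypothetical deployment script
                sh './deploy.sh example-app:${BUILD_NUMBER}'
            }
        }
        stage('Data Load') {
            steps {
                // Hypothetical data-load step run after deployment
                sh './load_data.sh'
            }
        }
    }
}
```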
GitHub
We use GitHub to host the source code repositories for our application, data operations, infrastructure provisioning, and configuration management assets. It serves as the source of truth for most of our execution activities.
Docker
Docker is a containerization technology that we use to encapsulate a working environment that runs on various AWS infrastructure platforms. The source code, along with its dependencies, is packaged into a Docker image and stored in a centralized container registry hosted on AWS.
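A minimal Dockerfile illustrates the packaging step described above. The base image, file paths, and entry point are illustrative assumptions, not an actual project's build.

```dockerfile
# Illustrative Dockerfile sketch; image, paths, and entry point are hypothetical.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application source into the image
COPY . .

CMD ["python", "app.py"]
```

The resulting image would then be tagged and pushed to a centralized registry such as Amazon ECR.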
Terraform
Terraform is an open-source infrastructure-as-code (IaC) tool that allows users to define and provision infrastructure using a declarative configuration language. With Terraform, we describe the components of our infrastructure, such as servers, networks, and databases, in configuration files.
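As a small example of that declarative style, the fragment below defines a single S3 bucket with the AWS provider. The bucket name, region, and tags are illustrative assumptions, not resources from an actual project.

```hcl
# Illustrative Terraform sketch; names and region are hypothetical.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# Declares the desired state; "terraform apply" reconciles AWS to match it
resource "aws_s3_bucket" "data_staging" {
  bucket = "example-data-staging-bucket"

  tags = {
    Project = "example-project"
  }
}
```

Because the file describes desired state rather than steps, the same configuration can be applied repeatedly to provision identical environments.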
Our capabilities and specializations
Cloud-based technologies
We use various Amazon Web Services (AWS) cloud technologies to develop powerful cloud-based platforms that make data easily accessible and computable, so that hypotheses can be rapidly tested against the very large data sets available. We also use AWS to serve our Drupal-based projects. Applications are typically architected with serverless managed services, and we operate at up to the FISMA Moderate level. Examples include the Index of NCI Studies and the CCDI Molecular Targets Program.
- AWS RDS
- AWS Lambda
- AWS OpenSearch
- AWS Fargate
- AWS ECS
- Terraform
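A serverless architecture of the kind described above typically centers on small Lambda handlers behind an API gateway. The sketch below is a minimal, hypothetical handler; the study identifiers and lookup logic are stand-ins for a real data-store query, not an actual service.

```python
import json


def lambda_handler(event, context):
    """Minimal AWS Lambda handler sketch for an API Gateway proxy request.

    The in-memory study list is a hypothetical stand-in for a real
    database or OpenSearch query.
    """
    params = event.get("queryStringParameters") or {}
    query = params.get("q", "")

    # Hypothetical data; a real handler would query a managed data store
    studies = ["NCI-2021-01234", "NCI-2022-05678"]
    matches = [s for s in studies if query in s]

    # API Gateway proxy integrations expect this response shape
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"query": query, "results": matches}),
    }
```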
Database technologies
In addition to the Cancer Research Data Commons and Childhood Cancer Data Catalog data repository projects, we have developed other systems for data sharing. These include the NCI Metathesaurus, a comprehensive biomedical terminology database providing broad, concept-based mapping of terms from more than 100 biomedical terminologies, with 7,500,000 terms mapped to 3,200,000 concepts representing their shared meanings.
Additionally, we developed EVS-SIP, which permits search and retrieval of terms contained in or across the data dictionaries or data models of repositories participating in the Cancer Research Data Commons and beyond.
- AWS Neptune
- Neo4j Graph DB
- AWS RDS
- Oracle
- MongoDB/AWS DocumentDB
- PostgreSQL
- MSSQL
- MySQL
Imaging and informatics for precision medicine
We oversee the Cancer Research Data Commons' Imaging Data Commons, a cloud-based repository of publicly available cancer imaging data co-located with analysis and exploration tools. The data includes radiology collections from the Cancer Imaging Archive and major NCI initiatives, such as the Cancer Genome Atlas Program, the Clinical Proteomic Tumor Analysis Consortium, the National Lung Screening Trial, and the Human Tumor Atlas Network.
We also provide programmatic support for the National Biomedical Imaging Archive, supporting the interoperability between images and genomic data.
- MedICI Challenge Management System for image analysis algorithm development and validation
- Standards such as BRIDG, CDISC, and DICOM