The Cancer Imaging Informatics Laboratory is a support team that manages the Cancer Imaging Archive. Its objectives are to increase public availability of high-quality cancer imaging data sets for research, support National Institutes of Health data sharing requirements for the cancer imaging community, enhance reproducibility in research, and create a culture of open data sharing and collaboration among cancer imaging researchers.
Our laboratory also supports the development of new technologies and methodologies, such as clinical imaging data de-identification and curation, radiomics and image characterization, AI and deep learning, and integrative, multi-disciplinary data analysis (e.g. radiogenomics).
Managing the Cancer Imaging Archive
Cancer imaging research requires access to large, standardized, purpose-built imaging collections. Since 2010, the NCI Cancer Imaging Program has counted on our Cancer Imaging Informatics Laboratory to develop, manage, and support the Cancer Imaging Archive (TCIA) to fill the unmet needs of cross-disciplinary image researchers for open access to clinical images.
We provide project management, data curation, data submitter and community relations outreach, and subcontract management.
Each month, over 20,000 unique users visit the archive, where they find more than 200 datasets of computed tomography, magnetic resonance imaging, positron emission tomography, x-ray mammography, digitized histopathology slides, and radiation therapy planning imaging studies.
At least 1,800 peer-reviewed publications have been based on these TCIA-hosted data, and more are likely, as most of the collections are open and available for public use.
In addition to supporting the imaging components of major National Cancer Institute data collection initiatives, we lead an advisory group that prioritizes the curation and publishing of researcher-initiated proposals. Priority is based on how well the data sets fill data gaps in support of critical current research for a clinical need, novel or unique datasets, research reproducibility, and investigation of biological hypotheses or other proposed discoveries about the pathophysiological basis of cancer.
Submission and de-identification
The Cancer Imaging Archive provides full research-focused de-identification services and makes its tools and knowledge base available to the scientific community. Because the Cancer Imaging Archive contains a large repository of open-access clinical imaging data, protection of Protected Health Information while still preserving the scientific utility of the data is critical.
We have developed robust tools and extensive procedures to transmit, de-identify, and assess the quality of the medical images submitted to the archive, and we are staffed with curation experts who review and publish the submitted images. We routinely refine and test advanced, standards-based tools to enable more efficient de-identification of medical image data for public consumption.
Crediting data generators for data and for data reuse
We freely provide standards-based Digital Object Identifiers (DOIs) for each of the Cancer Imaging Archive’s data collections, and to researchers using customized data cohorts, to enhance research reproducibility and validation and to encourage data submissions from academic researchers.
The DOIs are frequently used to reference data in peer-reviewed publications, support data-use tracking, and provide authorship citations for use in academic CVs.
A resource for the global cancer imaging community
The Cancer Imaging Archive has become a vital resource known throughout the global cancer imaging community, having collected data from over 112 institutions and having served over 1.1 million users from 224 countries and regions. On average, users download more than 2.5 petabytes of data annually.
The archive is a data publisher and recommended repository for Nature, PLOS One, Medical Physics, Elsevier and other leading journals, and over 1,900 peer-reviewed publications that leverage TCIA data have been indexed.
We provide regular updates on social networks and host a wide variety of TCIA-centric sessions during annual meetings of the Radiological Society of North America to stimulate interest and cross-fertilize ideas. We publish a TCIA newsletter distributed to 8,000 recipients each month.
Imaging-proteogenomics research support
The Cancer Imaging Archive supports a research community that seeks to connect cancer phenotypes to genotypes. To accomplish this, the archive hosts data sets that connect clinical images with patient genomic data and proteomic data.
The archive is part of National Cancer Institute programs that are collecting medical and pathological images matched to proteomic, genomic, clinical, and pathological data.
We provide leadership, expertise, and imaging data support to National Institutes of Health program activities, including:
- Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) Network
- Cancer Moonshot Biobank
- The Cancer Genome Atlas (TCGA)
- Clinical Proteomic Tumor Analysis Consortium (CPTAC)
- Informatics Technology in Cancer Research
- National Clinical Trials Network (NCTN)
- National Lung Screening Trial (NLST) Data Portal
- Quantitative Imaging Network
As we work on expanding the Cancer Imaging Archive's offerings, we are also trying to expand data sharing capabilities in many of the initiatives we collaborate on.
The archive has been able to absorb and join images from both arms of the NLST from the American College of Radiology Imaging Network (ACRIN) and the Lung Screening Study group.
We are establishing first-of-its-kind enterprise clinical imaging de-identification and sharing systems, including digital pathology data sharing, within and between the APOLLO Network's collaborators. We are participating in National Cancer Institute efforts to create a Cancer Research Data Commons infrastructure.
Our team also participated in the National Institutes of Health’s COVID-19 pandemic response by providing researchers with five SARS-CoV-2 datasets.
Our capabilities and specializations
Supporting the imaging research community
Our team ensures the research community has the tools and components to use the archive of medical images to its fullest. This includes adding labeled elements to imaging datasets, which scientists can use to develop automated image-analysis approaches.
- Design and implement analysis and annotation projects
- Promote best practices for sharing scientific data within the research community
- Support imaging data sharing in National Cancer Institute grant research networks
Developing new technologies and methods
While managing this publicly accessible resource, we support innovation to enhance the Cancer Imaging Archive and its uses for researchers.
- Radiomics
- Image characterization
- Artificial intelligence and deep learning