Cancer Imaging Informatics Laboratory

The Cancer Imaging Informatics Laboratory is a support team that manages the Cancer Imaging Archive. Its objectives are to increase the public availability of high-quality cancer imaging data sets for research, support National Institutes of Health data sharing requirements for the cancer imaging community, enhance research reproducibility, and foster a culture of open data sharing and collaboration among cancer imaging researchers.

Our laboratory also supports the development of new technologies and methodologies, such as clinical imaging data de-identification and curation, radiomics and image characterization, AI and deep learning, and integrative, multi-disciplinary data analysis (e.g. radiogenomics).

John Freymann, Director (John.freymann@nih.gov)

Scan of patient with colorectal liver metastases

Researcher Resource

Access the Cancer Imaging Archive

The Cancer Imaging Archive de-identifies and hosts a substantial archive of medical images of cancer, which are accessible for download.

Access the resource

Managing the Cancer Imaging Archive

Cancer imaging research requires access to large, standardized, purpose-built imaging collections. Since 2010, the NCI Cancer Imaging Program has counted on our Cancer Imaging Informatics Laboratory to develop, manage, and support the Cancer Imaging Archive (TCIA) to fill the unmet needs of cross-disciplinary image researchers for open access to clinical images.

We provide project management, data curation, data submitter and community relations outreach, and subcontract management.

Each month, over 20,000 unique users visit the archive, where they find more than 200 datasets of computed tomography, magnetic resonance imaging, positron emission tomography, x-ray mammography, digitized histopathology slides, and radiation therapy planning imaging studies.

At least 1,800 peer-reviewed publications have been based on TCIA-hosted data, and the true number is likely higher because most of the collections are open and available for public use.

In addition to supporting the imaging components of major National Cancer Institute data collection initiatives, we lead an advisory group that prioritizes researcher-initiated proposals for curation and publication. The group ranks proposals by how well the data sets fill gaps in support of critical current research for a clinical need, novel or unique data sets, research reproducibility, and the investigation of biological hypotheses or other proposed discoveries about the pathophysiological basis of cancer.

200+ datasets
1,800 publication citations
20,000 unique users per month

The Cancer Imaging Archive

Submission and de-identification

The Cancer Imaging Archive provides full research-focused de-identification services and makes its tools and knowledge base available to the scientific community. Because the Cancer Imaging Archive contains a large repository of open-access clinical imaging data, safeguarding protected health information (PHI) while preserving the scientific utility of the data is critical.

We have developed robust tools and extensive procedures to transmit, de-identify, and assess the quality of medical images submitted to the archive, and our curation experts review and publish the submitted images. We routinely refine and test advanced, standards-based tools to enable more efficient de-identification of medical image data for public consumption.
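To make the idea of de-identification that still preserves scientific utility more concrete, here is a minimal Python sketch. It is not the archive's actual tooling or procedure. It assumes image headers are available as a plain dictionary keyed by DICOM attribute keywords, and the set of identifying keys, the `ANON-` prefix, and the salt value are all hypothetical choices made for illustration. Direct identifiers are dropped, the patient ID is replaced by a stable pseudonym so that studies from one patient remain linked, and technical attributes are left untouched.

```python
import hashlib

# Hypothetical, deliberately short list of direct identifiers.
# A real de-identification profile covers many more attributes.
IDENTIFYING_KEYS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "InstitutionName",
}


def pseudonym(patient_id: str, salt: str) -> str:
    """Derive a stable pseudonym so one patient's studies stay linked."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return "ANON-" + digest[:12]


def deidentify(header: dict, salt: str) -> dict:
    """Return a copy of the header without direct identifiers.

    The input dictionary is not modified. PatientID, if present,
    is replaced by a salted hash rather than removed.
    """
    cleaned = {k: v for k, v in header.items() if k not in IDENTIFYING_KEYS}
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonym(str(cleaned["PatientID"]), salt)
    return cleaned


# Made-up example header; no real patient data.
example = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "PatientBirthDate": "19700101",
    "Modality": "CT",
    "SliceThickness": "2.5",
}

result = deidentify(example, salt="project-secret")
print(result)
```

Attributes that matter for analysis, such as the modality and slice thickness, survive unchanged, while the name and birth date are gone. A production workflow would also need to handle dates, free-text fields, private tags, and identifiers burned into the pixel data, none of which this sketch attempts.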

Crediting data generators for data and for data reuse

We freely provide standards-based Digital Object Identifiers (DOIs) for each of the Cancer Imaging Archive’s data collections and for customized data cohorts assembled by researchers. The DOIs enhance research reproducibility and validation and encourage data submissions from academic researchers.

The DOIs are frequently used to reference data in peer-reviewed publications, support data-use tracking, and provide authorship citations for use in academic CVs.
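As a small illustration of how a DOI can be turned into a citable reference, the Python sketch below builds a resolver link through the public doi.org proxy and assembles a data citation string. The DOI, authors, and title are placeholders rather than a real collection, and the citation layout is an assumed generic style, not a format prescribed by the archive or by any journal.

```python
from urllib.parse import quote


def doi_url(doi: str) -> str:
    """Build a resolver URL for a DOI using the doi.org proxy."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


def data_citation(authors: str, year: int, title: str,
                  publisher: str, doi: str) -> str:
    """Assemble a simple data citation (assumed generic layout)."""
    return f"{authors} ({year}). {title} [Data set]. {publisher}. {doi_url(doi)}"


# Placeholder values for illustration only; this DOI does not exist.
citation = data_citation(
    authors="Example, A., & Sample, B.",
    year=2024,
    title="Hypothetical Imaging Collection",
    publisher="The Cancer Imaging Archive",
    doi="10.0000/example.collection.2024",
)
print(citation)
```

Because the DOI stays fixed even if a collection's landing page moves, citing the resolver link rather than a page URL keeps the reference stable in publications and CVs.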

Jupyter Notebooks

Identifying TCIA datasets of interest, downloading them

Training video geared towards beginner data scientists who have some basic experience working with Linux commands, APIs, Python, and Jupyter.

Python tutorial

Match tumor, organ segmentations to appropriate scans

Notebook focused on steps to identify an example segmentation file, find the corresponding reference series, and visualize them together.
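The matching step described above can be sketched in a few lines of Python. This is not the notebook's code. It assumes the metadata has already been read into plain dictionaries, and that each segmentation record carries the UID of the image series it was drawn on. DICOM segmentation objects typically record this in a referenced series sequence; the flattened key `ReferencedSeriesInstanceUID` used here is a hypothetical name for this sketch, and all UIDs are made up.

```python
def index_series(series_list):
    """Map each SeriesInstanceUID to its series record for fast lookup."""
    return {s["SeriesInstanceUID"]: s for s in series_list}


def match_segmentations(segmentations, series_list):
    """Pair each segmentation with the image series it references.

    Returns (matched, unmatched): matched is a list of
    (segmentation, reference_series) tuples, and unmatched holds
    segmentations whose reference series is not in series_list.
    """
    by_uid = index_series(series_list)
    matched, unmatched = [], []
    for seg in segmentations:
        ref = by_uid.get(seg["ReferencedSeriesInstanceUID"])
        if ref is None:
            unmatched.append(seg)
        else:
            matched.append((seg, ref))
    return matched, unmatched


# Made-up records standing in for metadata read from image files.
series = [
    {"SeriesInstanceUID": "1.2.3.1", "Modality": "CT"},
    {"SeriesInstanceUID": "1.2.3.2", "Modality": "MR"},
]
segs = [
    {"SeriesInstanceUID": "1.2.3.10", "Modality": "SEG",
     "ReferencedSeriesInstanceUID": "1.2.3.1"},
    {"SeriesInstanceUID": "1.2.3.11", "Modality": "SEG",
     "ReferencedSeriesInstanceUID": "1.2.3.99"},
]

matched, unmatched = match_segmentations(segs, series)
for seg, ref in matched:
    print(seg["SeriesInstanceUID"], "->", ref["SeriesInstanceUID"], ref["Modality"])
```

Keeping the unmatched list separate makes it easy to spot segmentations whose reference scans have not been downloaded yet, before any attempt to overlay and visualize them.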

A resource for the global cancer imaging community

The Cancer Imaging Archive has become a vital resource known throughout the global cancer imaging community, having collected data from over 112 institutions and having served over 1.1 million users from 224 countries and regions. On average, users download more than 2.5 petabytes of data annually.

The archive is a data publisher and recommended repository for Nature, PLOS One, Medical Physics, Elsevier and other leading journals, and over 1,900 peer-reviewed publications that leverage TCIA data have been indexed.

We provide regular updates on social networks and host a wide variety of TCIA-centric sessions during annual meetings of the Radiological Society of North America to stimulate interest and cross-fertilize ideas. We also publish a TCIA newsletter distributed to 8,000 recipients each month.

Imaging-proteogenomics research support

The Cancer Imaging Archive supports a research community that seeks to connect cancer phenotypes to genotypes. To accomplish this, the archive hosts data sets that connect clinical images with patient genomic data and proteomic data.

The archive is part of National Cancer Institute programs that are collecting medical and pathological images matched to proteomic, genomic, clinical, and pathological data.

We provide leadership, expertise, and imaging data support to National Institutes of Health program activities, including:

As we work on expanding the Cancer Imaging Archive's offerings, we are also trying to expand data sharing capabilities in many of the initiatives we collaborate on.

The archive has been able to absorb and join images from both arms of the National Lung Screening Trial (NLST) from the American College of Radiology Imaging Network (ACRIN) and the Lung Screening Study group.

We are establishing first-of-its-kind enterprise clinical imaging de-identification and sharing systems, including digital pathology data sharing, within and between the APOLLO Network's collaborators. We are also participating in National Cancer Institute efforts to create a Cancer Research Data Commons infrastructure.

Our team also participated in the National Institutes of Health’s COVID-19 pandemic response by providing researchers with five SARS-CoV-2 datasets.

Annotations, segmentations & classifications

New analysis results

We encourage the research community to publish their analyses of existing TCIA image collections.

Applying cancer resources to COVID-19

The Cancer Imaging Archive posts COVID-19 imaging data to benefit community

We leveraged our flagship program to share COVID-19 data as quickly as possible.

Our capabilities and specializations

Supporting the imaging research community

Our team ensures the research community has the tools and components to use the archive of medical images to its fullest. This includes adding labeled elements to imaging datasets, which scientists can use to develop automated image-analysis approaches.

Design and implement analysis and annotation projects
Promote best practices for sharing scientific data within the research community
Support imaging data sharing in National Cancer Institute grant research networks

Developing new technologies and methods

While managing the publicly accessible resource, we support innovation to enhance The Cancer Imaging Archive and its uses for researchers.

Radiomics
Image characterization
Artificial intelligence and deep learning

CODEX imaging of hepatocellular carcinoma

This high-dimensional data set allows studies into pathophysiological immune cell interactions for liver cancer.

Patient-derived xenograft of adenocarcinoma-pancreas

PDMR-Texture-Analysis

This collection has imaging data from 175 mice for researchers to develop algorithms using neural networks.

