Scientists from the National Cancer Institute and the Frederick National Laboratory for Cancer Research are urging medical professionals and researchers to contribute more demographically diverse images to a shared national cancer imaging database, so that it better represents the population at large and does not perpetuate health disparities when the data are analyzed.
This becomes even more important as artificial intelligence (AI) programs are trained on the database for potential use by doctors in diagnosing patients and by scientists to develop new treatments.
Data from The Cancer Imaging Archive (TCIA) have already been used in more than 3,000 peer-reviewed articles and are frequently used in studies that advance the accuracy and clinical relevance of cancer imaging through the development of AI algorithms designed to improve the detection, diagnosis, and treatment of cancer.
TCIA hosts a large collection of de-identified medical images of cancer with supporting data on patient outcomes, treatment details, genetic information, and other analyses, with most data available to the public. The archive is funded by NCI and managed by FNL. It relies on data contributed by researchers across the country and contains standard-of-care radiology and histopathology imaging from patients participating in multiple NCI-funded efforts to understand the molecular basis of cancer.
The NCI and FNL scientists wrote in May in Radiology: Imaging Cancer that the public health needs are two-fold: First, users need to be better educated about how to access demographically diverse data sets that are already included in the collection, and second, the broader scientific community needs to donate more of these data to this shared resource.
“We believe that by actively working toward improving discoverability of demographic information, engaging the community to submit new data to address these gaps, and providing educational materials to help users find these data in the current system, we can help improve the generalizability and applicability of AI tools developed using our data sets,” the NCI and FNL scientists wrote.
For their part, NCI and FNL will work to raise awareness of the health impact of disparities and encourage data contributions that help address them. They will also raise awareness among medical professionals about the value of obtaining demographically diverse data for making clinical decisions.
“Data sets should be representative of the real-world distribution of patients with cancer and include such demographic information along with the images,” said NCI’s Janet Eary, M.D., and Lalitha Shankar, M.D., Ph.D., and FNL’s John Freymann and Justin Kirby.
Their commentary came in response to a paper in the January 2024 issue of the same journal that raised concerns about patient demographics in the imaging archive and the use of those data to train AI software.
The paper by Aidan Dulaney and John Virostko, Ph.D., of the University of Texas at Austin analyzed TCIA demographic data as of April 2023 and compared it with the cancer population of the United States.
The scientists found that the median age of TCIA patients was 6.84 years lower than that of the U.S. cancer population and that the archive had more female than male patients. Underrepresented groups, when compared with the U.S. cancer population, included American Indians, Alaska Natives, African Americans, and Hispanic people.
On that basis, Dulaney and Virostko expressed concerns that any artificial intelligence software for radiology that was trained on TCIA data may not be well-suited for general clinical use.
Media Inquiries
Mary Ellen Hackett
Manager, Communications Office
301-401-8670