Computing collaboration makes massive microscopy analyses possible
Posted 5/11/2022 by Samuel Lopez
On any given day, microscopist Susan Lea, D.Phil., and her team at NCI at Frederick collect more data than can fit on any standard hard drive, creating a challenge for data storage and analysis.
Their average data collection is 5 terabytes, a quantity roughly 20 times the storage capacity of a 256-gigabyte laptop or the equivalent of storing about 1,250 movies.
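The comparison above can be checked with quick arithmetic. The sketch below is only an illustration; it assumes decimal storage units (1 terabyte = 1,000 gigabytes), and the variable names are made up for this example.

```python
# Assumption: decimal storage units (1 TB = 1,000 GB).
GB_PER_TB = 1000

collection_gb = 5 * GB_PER_TB   # average data collection cited in the article
laptop_gb = 256                 # laptop capacity cited in the article
movie_count = 1250              # number of movies cited in the article

laptop_ratio = collection_gb / laptop_gb     # laptops needed to hold one collection
gb_per_movie = collection_gb / movie_count   # movie size implied by the comparison

print(f"{laptop_ratio:.1f} laptops, {gb_per_movie:.0f} GB per movie")
# prints "19.5 laptops, 4 GB per movie"
```

In other words, "roughly 20 times" holds, and the movie comparison implies about 4 gigabytes per film.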
The data come from a state-of-the-art Titan Krios microscope, a towering instrument that Lea’s team uses for cryo-electron microscopy (cryo-EM). Their unit is tremendously powerful, capable of capturing images at atomic-scale resolution. Since joining NCI at Frederick in 2020, Lea and her team in the Center for Structural Biology, part of NCI’s Center for Cancer Research, have used it to study the 3D structures of important molecular targets.
But the raw cryo-EM data are only the first step in the process. Only through rigorous refinement of the data can microscopists obtain the sought-after high-resolution images for which the technology is known.
The enormous scale of collected data requires an equally robust analysis, but it isn’t practical to upload mountains of files to an off-site computing cluster for analysis, such as Biowulf, the National Institutes of Health’s Linux cluster of more than 105,000 processors in Bethesda.
“[I’d] just be sitting there all the time waiting for my multiple terabytes of data to copy from one place to another so I could do something to them,” said Lea, who is chief of the Center for Structural Biology in addition to leading her laboratory.
The analysis must happen in Frederick with as little file movement as possible. Now it can, thanks to a partnership with the Enterprise Information Technology Directorate (EIT) at the Frederick National Laboratory.
Upgrades elevate computing capability
Jay Knight, director of the IT Operations Group, leads EIT’s infrastructure efforts to support Lea’s cryo-EM work. The project began in the spring of 2020, eight months before Lea and her laboratory arrived in Frederick from the University of Oxford in the United Kingdom. The long lead time was necessary, Knight said, since preparations and initial upgrades to the IT network took a minimum of six months.
Knight’s team, specialists covering areas from high-performance computing to networking, installed new, tailored storage on the network. They increased the capacity for data upload to accommodate the volume generated by Lea’s laboratory. A series of firewall adjustments connected the Krios microscope to the network and created a secure location for its data.
Part of the computing environment had to be reconfigured for the analyses, with Frederick’s own newly resurrected high-performance computing cluster receiving an upgrade. All told, Knight said it was a complicated process with a “learning curve.”
“It’s good to learn new and better ways to do anything because we can now take that experience and spread it out to other workflows,” he said.
Partnership puts team at forefront
Though the initial framework is now in place, maintaining and enhancing the process is an ongoing endeavor. Lea, Knight and colleagues have met weekly since 2020 and remain in close communication about developing needs and opportunities. They continue to enhance the analytical capabilities for leading-edge cryo-EM without burdening the rest of the network.
“The computing [team] have been great, and they've been really interested in establishing a collaboration. I think that's what you need,” Lea said. “This isn't trivial computing to set up, and so we needed IT specialists who are interested in making it … work because you can't just go buy it off the shelf.”
Important contributions are coming from others, too. Hans Elmlund, Ph.D., a regular attendee at the weekly meetings and a senior investigator in the Center for Structural Biology, and his team designed the algorithm that analyzes the data from the Krios.
Their efforts, combined with EIT’s, made it possible for the microscope to upload its data and images while the hours-long scan is underway. This lets the computing cluster get a jump-start on the analysis while the rest of the scan finishes, providing crucial, faster feedback.
“When you stop your data collection session, not only do you know that you have collected enough data, but you also have capabilities to sort out the particles that are good from those that are of lower quality so that you can … generate the high-resolution 3D reconstruction straight after data collection is done,” Elmlund said. “We are in the forefront when it comes to this stream-processing aspect.”
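The stream-processing idea Elmlund describes, analyzing data as it arrives rather than after the scan ends, can be illustrated with a toy sketch. This is not the Elmlund team’s algorithm. The function names, the numeric quality scores and the threshold are all hypothetical, chosen only to show the pattern of sorting particles batch by batch while collection continues.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_scan(scores: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield particle quality scores in batches, as if arriving mid-scan (hypothetical)."""
    batch: List[float] = []
    for score in scores:
        batch.append(score)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def process_stream(
    batches: Iterable[List[float]], threshold: float
) -> Tuple[List[float], List[float]]:
    """Sort each batch as soon as it arrives instead of waiting for the full scan."""
    good: List[float] = []
    low_quality: List[float] = []
    for batch in batches:
        for score in batch:
            if score >= threshold:
                good.append(score)
            else:
                low_quality.append(score)
    return good, low_quality


# Made-up scores standing in for particles picked during a scan.
good, low_quality = process_stream(
    stream_scan([0.9, 0.2, 0.8, 0.4, 0.95], batch_size=2), threshold=0.5
)
print(good, low_quality)
```

Because the sorting happens per batch, the set of good particles is already in hand when the last batch arrives, which is the property the quote highlights.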
For Knight, it’s thrilling to work with such attentive and invested scientists. He believes close partnerships facilitate ways to enhance Frederick’s computing capacity.
While that helps in the short term, it pays even greater dividends in the future: As the improved computing environment becomes available to other scientists in Frederick, more computational work and analyses can occur in-house rather than through Bethesda’s Biowulf or elsewhere. This saves time and effort.
Each improvement also leads to more enhancements. Lea’s and Knight’s teams already have multiple projects planned.
“Based on our experience with Dr. Lea, … we would like to take that model and see if we can bring it out to the rest of the [Frederick] scientific community, start more partnerships,” Knight said. “We’re here to help.”
Image caption: The support staff for the Frederick FRCE high-performance computing environment, members of EIT’s partnership with Lea’s laboratory. From left: Doug O’Neal, Jonathan Dill, and Geifei Qian.