Columbia University students employ AI to predict chemical properties in hands-on drug discovery project

Seven Columbia University graduate students recently completed drug discovery projects at the Frederick National Laboratory for Cancer Research (FNL) as part of their fall 2021 semester. Since 2020, 40 Columbia University students have completed projects with the FNL for their master’s degree coursework.  

The projects are a partnership between Columbia University’s Industrial Engineering and Operations Research department and the FNL’s Biomedical Informatics and Data Science (BIDS) directorate. They give students an opportunity to work on real-world problems in biomedicine using artificial intelligence (AI) and machine learning.

“The ongoing collaboration with Columbia University continues to impress and remain a vital part of the FNL effort to prepare the next-generation cancer data science workforce,” said Eric Stahlberg, Ph.D., BIDS director. “It truly is amazing to have the talented Columbia students very quickly come up to speed in the essential biology under the mentorship of FNL scientists, dive into real drug discovery challenges, and deliver meaningful insights using AI and machine learning within a semester project.”  

Innovative projects  

This semester’s projects supported the Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium, a public–private partnership dedicated to accelerating drug discovery, of which the FNL is a co-founder. The students worked closely with FNL mentors Pinyi Lu, Ph.D., and Ryan Weil, Ph.D., and with Naomi Ohashi, technical project manager for the FNL and ATOM.

“Thanks to Columbia Professor Michael Robbins, we have been working with excellent Columbia students on our AI-based drug discovery projects,” said Ohashi. “I’m glad the students found that their data science knowledge could be applicable to cancer research.”   

The students learned how to generate machine-learning-ready datasets and gained hands-on experience in AI-based drug discovery. They developed a workflow that integrates structure-based and data-driven modeling approaches to accelerate the discovery of a novel compound that blocks the action of centromere-associated protein-E (CENP-E). Inhibiting CENP-E is promising as a cancer therapy, but known CENP-E inhibitors have exhibited limited efficacy, and none has gone beyond phase I clinical trials, so real-world data in humans are lacking. The students performed large-scale virtual screening and used the ATOM Modeling Pipeline (AMPL), an open-source software pipeline designed for such drug discovery efforts, to predict the chemical properties of different ligands. By the end of the semester, they had identified ligands that can inhibit CENP-E.

“This collaborative training opportunity helps build a future workforce with integrated expertise in data science and drug discovery,” said Lu. “The experience the students gained from this training will certainly transfer beyond their drug discovery projects.”