Student Blog Series: Uncovering Molecular Patterns Using AI


Cells rely on complex molecular machinery to perform essential functions such as energy production and protein synthesis. To understand how these molecular machines impact health and disease, scientists can study their organization by using state-of-the-art cryo-electron tomography (cryo-ET) technology to acquire high-resolution images of native cellular landscapes. This approach is exciting because it allows researchers to directly observe how and where proteins interact with each other within the cell.

However, locating and identifying proteins at physiological concentrations within crowded cellular environments poses significant technical challenges, and the imaging produces massive amounts of data.

In new research published on September 9, 2024, in Nature Methods, Duke Computer Science Ph.D. students Qinwen (Wendy) Huang and Ye Zhou developed MiLoPYP, a machine learning algorithm that can mine the interior of cells and help identify rare proteins found at low concentrations.

To cope with the sheer size of the data and the molecular crowdedness of cellular environments, MiLoPYP needed to merge information from terabytes of data, which is where the study hit a barrier: the amount and complexity of the data were simply overwhelming.

To address this, Huang and Zhou built a deep learning framework to quickly sift through large amounts of data in search of repeating patterns that would later be extracted and used to visualize proteins at molecular-level detail using nextPYP, another tool developed in the lab of Alberto Bartesaghi, associate professor of Computer Science.

After confirming that MiLoPYP could simultaneously identify the presence of multiple species, such as ribosomes and microtubules, the question became: what is the smallest population of a given molecular machine that MiLoPYP could uncover? To answer this, the research team analyzed data from native mammalian cells.

Surprisingly, MiLoPYP revealed a previously unreported minor population of ATP synthase, a miniature rotating motor responsible for cell energy production. These results further demonstrated MiLoPYP's ability to identify structural patterns that occur so infrequently that they would be easily missed.

Bartesaghi said, “MiLoPYP has been game-changing in our lab as we can now routinely search through thousands of cellular tomograms to uncover patterns and protein targets that we simply could not see before.”

The research team is excited to see how MiLoPYP can be applied to analyze other challenging cellular datasets. Bartesaghi and his lab work on developing methods to extract high-resolution information from cryo-ET data, and future research in his group will focus on exploring cellular landscapes in greater detail using the new tools.

Contact Information

Lead authors: Dr. Qinwen (Wendy) Huang and Dr. Ye Zhou

Corresponding author: Dr. Alberto Bartesaghi

Bartesaghi Lab, Department of Computer Science, Duke University

Reference: Huang et al., MiLoPYP: self-supervised molecular pattern mining and particle localization in situ, DOI: 10.1038/s41592-024-02403-6, Nature Methods (2024).