<aside> 🔖 Leveraging advanced computational techniques, we are now able to unlock new insights into cellular behavior, transforming drug discovery, disease modeling, and our understanding of complex biological processes.
</aside>
Our team, Morphologic AI, aims to use computer vision and deep learning techniques to better understand the workings of human cells. The goal of our project is to reveal hidden patterns in cellular structures, which could unlock new ways of understanding and manipulating how cells respond to diseases, drugs, and genetic changes.
Typically, researchers rely on fluorescent markers to study cellular behavior. However, techbio company Recursion has shown that machine learning (ML) models can predict cellular features from simple light microscopy images, with no fluorescent annotations needed. The catch? These "black-box" ML embeddings are powerful but hard to interpret biologically, which limits how useful they are to the broader biotech community. Interpretability, the ability to explain an ML model's results, is a major focus of ML research. We aim to make these cellular features more interpretable, which has been shown to improve prediction results and gives us a foundation to build on.
Our project tackles this interpretability problem by segmenting and analyzing individual cells from Recursion's HUVEC datasets. By mapping out biologically explainable morphological features, we can better understand how cells react to treatments and compare these insights with existing machine learning embeddings.
By examining the morphological effects of the billions of molecules in Recursion's datasets, we offer a new layer of understanding of drug clustering and of the critical form-function relationships that help create safer, more effective therapies. In the long run, we're building a pipeline that can embed and analyze cellular data across various cell types, diseases, and drug treatments, paving the way for the future of personalized medicine and biotech innovation.
This project doesn't just make cellular morphology more interpretable; it also opens the door for machine learning to make an even bigger impact on drug discovery by bridging the gap between computational models and biological understanding.
To produce biologically meaningful analysis from cellular imaging, we designed and implemented a comprehensive Python pipeline tailored for high-throughput processing and analysis of human umbilical vein endothelial cell (HUVEC) images.
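To make "biologically explainable morphological features" concrete, here is a minimal sketch of the kind of per-cell measurement such a pipeline might compute once cells have been segmented. It is not our actual implementation: the function name `cell_features`, the toy mask, and the choice of descriptors are illustrative assumptions, and it uses only the Python standard library so it runs anywhere. In practice, a library routine such as scikit-image's `regionprops` offers similar measurements on real label masks.

```python
from collections import defaultdict
from math import pi, sqrt


def cell_features(label_mask):
    """Compute simple shape descriptors for each labeled cell.

    label_mask: 2D list of ints, where 0 is background and k > 0 marks
    the pixels belonging to cell k (hypothetical segmentation output).
    Returns a dict mapping each cell label to its descriptors.
    """
    # Group pixel coordinates by cell label, skipping background.
    pixels = defaultdict(list)
    for r, row in enumerate(label_mask):
        for c, label in enumerate(row):
            if label > 0:
                pixels[label].append((r, c))

    features = {}
    for label, coords in pixels.items():
        rows = [r for r, _ in coords]
        cols = [c for _, c in coords]
        area = len(coords)  # pixel count
        height = max(rows) - min(rows) + 1
        width = max(cols) - min(cols) + 1
        features[label] = {
            "area": area,
            "centroid": (sum(rows) / area, sum(cols) / area),
            # Inclusive bounding box: (min_row, min_col, max_row, max_col).
            "bbox": (min(rows), min(cols), max(rows), max(cols)),
            # Fraction of the bounding box that the cell fills.
            "extent": area / (height * width),
            # Diameter of a circle with the same area as the cell.
            "equivalent_diameter": sqrt(4 * area / pi),
        }
    return features


# Toy 3x5 mask with two "cells" (labels 1 and 2); illustrative data only.
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
features = cell_features(toy_mask)
print(features[1]["area"], features[1]["extent"])  # 4 1.0
print(features[2]["area"], features[2]["extent"])  # 3 0.75
```

Each descriptor here has a direct geometric meaning (size, position, compactness), which is what distinguishes this style of feature from an opaque embedding dimension.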
We worked with three datasets, each presenting unique challenges in terms of scale and complexity: