I’m a PhD student at the University of Toronto working at the intersection of machine learning and genomics. I am passionate about the subject because the consequences of discovery are so significant: biology is all around us, and yet we understand so little of it.
Prior to starting my PhD, I was a computational biologist at Deep Genomics, where I worked on translating foundational research in machine learning and genomics into pre-clinical applications.
My PhD projects are in the areas of self-supervised contrastive training and data-efficient learning for biological sequences. I am broadly curious and would love to discuss new and ongoing projects.
Y Combinator (S25)
Co-founded blank.bio, where we are building RNA foundation models. We were accepted into the Y Combinator Summer 2025 batch.
bioRxiv (in review)
We present mRNABench, a comprehensive benchmarking suite for mature mRNA biology that evaluates the representational quality of embeddings from self-supervised nucleotide foundation models. *Denotes co-first authorship.
bioRxiv (in review)
Orthrus is a Mamba-based mature RNA foundation model pre-trained using a novel self-supervised contrastive learning objective with biological augmentations from splice isoforms and orthologous genes. *Denotes co-first authorship. †Denotes co-supervising authorship.
Advances in Neural Information Processing Systems (NeurIPS 2024)
We analyze the scaling behavior of message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs, demonstrating that GNNs benefit tremendously from increasing scale.
Advances in Neural Information Processing Systems (NeurIPS 2024)
We propose MolPhenix, a model that learns a joint latent space between molecular structures and cellular morphology, significantly improving zero-shot molecular retrieval for drug discovery. *Denotes co-first authorship.
NeurIPS 2024 Workshop - Foundation Models for Science
🏆 Best Paper Award
ICML 2023 Workshop on Computational Biology
Presented at the ICML Workshop on Computational Biology. *Denotes co-first authorship.
ICML 2022 Workshop on Pre-training
We evaluate minima sharpness by taking an adversarial step and measuring the change in loss.
Bioinformatics / ISMB 2022
Our publication was accepted to Bioinformatics, and I got to present our work at ISMB 2022.
University of Toronto
I started my PhD at the University of Toronto with Brendan Frey and Bo Wang!
NPJ Genomic Medicine
We demonstrated that a specific ATP7B variant, prevalent in Wilson disease, causes a splicing error leading to loss of function, clarifying its pathogenic mechanism.
In this work, we train a Mamba-based model to learn mature mRNA representations using contrastive learning. Our augmentations consist of orthologous and alternatively spliced transcripts.
We worked with folks from Recursion and Valence to create MolPhenix! It's a novel approach for learning the effects of molecules and their concentrations on cell morphology using large-scale phenomics data. To do that, we innovate on the contrastive learning objective with the S2L loss.
This is a small explainer on IsoCLR, or "Splicing Up Your Predictions with RNA Contrastive Learning"!
I'd like to briefly explain the motivation and the work behind our publication, "Concerto: a graph neural network approach for molecule carcinogenicity prediction".
My friend Erik Drysdale and I recorded a podcast covering advancements in genomics over the last 20 years, the state of predictive systems in molecular biology, and a little bit of personal history!
MLCB 2024 Oral Presentation
An oral presentation of the Orthrus paper at the Machine Learning in Computational Biology (MLCB) workshop.
GenBio Invited Talk (2024)
An invited talk at GenBio discussing MolPhenix and its applications in phenomolecular retrieval.
ISMB 2022 Oral Presentation
Pre-recorded presentation of the Concerto paper prepared for the ISMB 2022 conference.