I am a final year PhD student in EECS at MIT. I am currently applying for faculty positions in Statistics and Computer Science (as of Fall 2023). I am extremely fortunate to be advised by Caroline Uhler and David Sontag.
Ny research focuses on building the statistical and computational foundations for AI-driven decision-making in scientific applications, and I work on methods for causal structure learning, experimental design, and causal representation learning.
As a member of the Eric and Wendy Schmidt Center, my work is grounded by problems in cellular biology, e.g. predicting the effect of a drug in novel cancer cell lines.
Teaching and Organizing. I really enjoy teaching, mentoring, and organizing academic initiatives. Check out the lecture notes and recordings for the causality class that I developed for MIT’s 2023 January term, and join the ongoing talk series, Causality, Abstraction, Reasoning, and Extrapolation (CARE), that I am co-organizing.
Communication. I believe that communicating technical material is one of the most important and most challenging activities in a research career. As a member of the EECS Communication Lab, I coach researchers on how to communicate more effectively.
Software. I like writing good, easy-to-use code (when time permits 😅), and I am the main developer of causaldag, a Python package for creating, manipulating, and learning causal graphical models. Shoot me an email to chat about the package or my work!
MEng in Electrical Engineering and Computer Science, 2019
Massachusetts Institute of Technology
BSc in Electrical Engineering and Computer Science, 2018
Massachusetts Institute of Technology
Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. We here show that identifiability can still be achieved with unobserved causal variables, given a generalized notion of faithfulness. Our results guarantee that we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions, in the limit of infinite data. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.
The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our identifiability results into a practical method to recover the shared latent causal graph. Moreover, we study how multiple domains reduce errors in falsely detecting shared causal variables in the finite data setting.
An important problem across disciplines is the discovery of interventions that produce a desired outcome. When the space of possible interventions is large, making an exhaustive search infeasible, experimental design strategies are needed. In this context, encoding the causal relationships between the variables, and thus the effect of interventions on the system, is critical in order to identify desirable interventions efficiently. We develop an iterative causal method to identify optimal interventions, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean. We formulate an active learning strategy that uses the samples obtained so far from different interventions to update the belief about the underlying causal model, as well as to identify samples that are most informative about optimal interventions and thus should be acquired in the next batch. The approach employs a Bayesian update for the causal model and prioritizes interventions using a carefully designed, causally informed acquisition function. This acquisition function is evaluated in closed form, allowing for efficient optimization. The resulting algorithms are theoretically grounded with information-theoretic bounds and provable consistency results. We illustrate the method on both synthetic data and real-world biological data, namely gene expression data from Perturb-CITE-seq experiments, to identify optimal perturbations that induce a specific cell state transition; the proposed causal approach is observed to achieve better sample efficiency compared to several baselines. In both cases we observe that the causally informed acquisition function notably outperforms existing criteria allowing for optimal intervention design with significantly less experiments.
Causal disentanglement seeks a representation of data involving latent variables that relate to one another via a causal model. A representation is identifiable if both the latent model and the transformation from latent to observed variables are unique. In this paper, we study observed variables that are a linear transformation of a linear latent causal model. Data from interventions are necessary for identifiability - if one latent variable is missing an intervention, we show that there exist distinct models that cannot be distinguished. Conversely, we show that a single intervention on each latent variable is sufficient for identifiability. Our proof uses a generalization of the RQ decomposition of a matrix that replaces the usual orthogonal and upper triangular conditions with analogues depending on a partial order on the rows of the matrix, with partial order determined by a latent causal model. We corroborate our theoretical results with a method for causal disentanglement that accurately recovers a latent causal model.
Many real-world decision-making tasks require learning causal relationships between a set of variables. Typical causal discovery methods, however, require that all variables are observed, which might not be realistic in practice. Unfortunately, in the presence of latent confounding, recovering causal relationships from observational data without making additional assumptions is an ill-posed problem. Fortunately, in practice, additional structure among the confounders can be expected, one such example being pervasive confounding, which has been exploited for consistent causal estimation in the special case of linear causal models. In this paper, we provide a proof and consistent method to estimate causal relationships in the non-linear, pervasive confounding setting. The heart of our procedure relies on the ability to estimate the confounding variation through a simple spectral decomposition of the observed data matrix. We derive a DAG score function based on this insight, prove its consistency in recovering a correct ordering of the DAG, and empirically compare it to existing procedures. We show improved performance on both simulated and real datasets by explicitly accounting for both confounders and non-linear effects.
In this review, we discuss approaches for learning causal structure from data, also called causal discovery. In particular, we focus on approaches for learning directed acyclic graphs (DAGs) and various generalizations which allow for some variables to be unobserved in the available data. We devote special attention to two fundamental combinatorial aspects of causal structure learning. First, we discuss the structure of the search space over causal graphs. Second, we discuss the structure of equivalence classes over causal graphs, i.e., sets of graphs which represent what can be learned from observational data alone, and how these equivalence classes can be refined by adding interventional data.
Consider the problem of determining the effect of a drug on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable - given a large number of different actions (drugs) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the result for every pair while only having to perform experiments on a small subset of pairs. This task, which we label “causal imputation”, is a generalization of the causal transportability problem. In this paper, we provide two main contributions. First, we demonstrate the efficacy of the recently introduced synthetic interventions estimator on the task of causal imputation when applied to the prominent CMAP dataset. Second, we explain the demonstrated success of this estimator by introducing a generic linear structural causal model which accounts for the interaction between cell type and drug.
We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define “latent factor causal models” (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.
We study the problem of maximum likelihood estimation given one data sample (n=1) over Brownian Motion Tree Models (BMTMs), a class of Gaussian models on trees. BMTMs are often used as a null model in phylogenetics, where the one-sample regime is common. Specifically, we show that, almost surely, the one-sample BMTM maximum likelihood estimator (MLE) exists, is unique, and corresponds to a fully observed tree. Moreover, we provide a polynomial time algorithm for its exact computation. We also consider the MLE over all possible BMTM tree structures in the one-sample case and show that it exists almost surely, that it coincides with the MLE over diagonally dominant M-matrices, and that it admits a unique closed-form solution that corresponds to a path graph. Finally, we explore statistical properties of the one-sample BMTM MLE through numerical experiments.
Transforming a causal system from a given initial state to a desired target state is an important task permeating multiple fields including control theory, biology, and materials science. In causal models, such transformations can be achieved by performing a set of interventions. In this paper, we consider the problem of identifying a shift intervention that matches the desired mean of a system through active learning. We define the Markov equivalence class that is identifiable from shift interventions and propose two active learning strategies that are guaranteed to exactly match a desired mean. We then derive a worst-case lower bound for the number of interventions required and show that these strategies are optimal for certain classes of graphs. In particular, we show that our strategies may require exponentially fewer interventions than the previously considered approaches, which optimize for structure learning in the underlying causal graph. In line with our theoretical results, we also demonstrate experimentally that our proposed active learning strategies require fewer interventions compared to several baselines.