Hello! It seems you’ve found my digital time capsule.
While this site is not my primary hub, I’ve preserved it as a reminder of where it all began. If you’re looking for my up-to-date professional info, please see my LinkedIn Profile.
Thanks for stopping by—I hope you enjoyed this brief trip down memory lane!
I graduated with a Ph.D. in Electrical Engineering from Stanford in April 2020. I worked on statistics and machine learning with applications in biology and medicine, supervised by Professor Chiara Sabatti in the Departments of Statistics and Biomedical Data Science.
My projects have been highly interdisciplinary. I have also had multiple long-term collaborations with Professor Serafim Batzoglou in Computer Science and Professor Calvin Kuo in Medicine. I have been fortunate to apply my training in both engineering and statistics to new biotechnologies during my internships at 10x Genomics and Illumina.
I received my M.S. degree in Statistics from Stanford University and my B.S. degree in Electrical and Computer Engineering from Olin College of Engineering, where I worked with Professor Siddhartan Govindasamy, Professor Sarah Spence Adams, and Professor Denise Troxell. During my undergraduate studies, I was also a visiting research student with Professor Matthew McKay at Hong Kong University of Science and Technology.
I have had the good fortune to have many fruitful collaborations in academia and in industry. These projects would not have been possible without my extremely talented and hard-working collaborators.
The Gene Ontology (GO) is one of the most popular biological knowledge databases for discovery-based omics research. Yet, the complexity of its data structure, represented as a large Directed Acyclic Graph (DAG) with tens of thousands of nodes, and its continuously evolving annotations can limit our interpretation and exploration in many applications.
I designed new algorithms and developed AEGIS, a full-stack software tool that visualizes the GO within an interactive framework. Existing layout strategies determine the specificity of a GO term from the hierarchy alone. AEGIS overcomes this limitation by integrating both the GO structure and gene annotations into a new DAG layout. I also proposed an interactive visualization tailored to large DAGs by devising a focus-and-context framework. This framework lets users navigate the DAG at a fine-grained level while preserving the context of the entire DAG.
Website: AEGIS and its visualization applications
Paper: Methodology behind AEGIS
Report: Project on D3 visualization
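AEGIS places GO terms using both the graph structure and gene annotations. The exact layout algorithm is described in the paper and is not reproduced here. The sketch below only contrasts the two ingredients, under several assumptions:

- the toy DAG, the gene sets and the function names are hypothetical;
- structural depth is taken as the longest path from the root;
- annotation-based specificity is the standard information content, computed after propagating each gene to all ancestors of its term (the "true path rule").

In the toy graph, term B is shallower than term C but more specific by annotation. That kind of disagreement is what a structure-only layout misses.

```python
import math
from collections import defaultdict, deque

# Hypothetical toy GO-like DAG: term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["D"],
}

# Hypothetical direct gene annotations per term.
DIRECT = {
    "B": {"g5"},
    "C": {"g1", "g2", "g6", "g7"},
    "D": {"g3"},
    "E": {"g4"},
}


def topological_order(parents):
    """Kahn's algorithm: every term appears after all of its parents."""
    children = defaultdict(list)
    indeg = {t: len(ps) for t, ps in parents.items()}
    for t, ps in parents.items():
        for p in ps:
            children[p].append(t)
    queue = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order


def structural_depth(parents):
    """Longest path from the root: a purely structural notion of specificity."""
    depth = {}
    for t in topological_order(parents):
        depth[t] = max((depth[p] + 1 for p in parents[t]), default=0)
    return depth


def annotation_ic(parents, direct, root="root"):
    """Information content -log(n_t / n_root) after propagating genes upward."""
    genes = {t: set(direct.get(t, ())) for t in parents}
    # Reverse topological order visits every descendant before its ancestors.
    for t in reversed(topological_order(parents)):
        for p in parents[t]:
            genes[p] |= genes[t]
    total = len(genes[root])
    return {t: -math.log(len(g) / total) if g else math.inf
            for t, g in genes.items()}


depth = structural_depth(PARENTS)
ic = annotation_ic(PARENTS, DIRECT)
for term in topological_order(PARENTS):
    print(f"{term}: depth={depth[term]}, IC={ic[term]:.3f}")
```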
Single-cell technologies are a powerful means to measure gene expression levels of individual cells and to reveal previously unknown heterogeneity and functional diversity among cell populations. However, unknown biological variability and high measurement noise can make it difficult to analyze single-cell data.
Across multiple projects, I have developed new unsupervised learning (dimension reduction and clustering) techniques that are robust across sequencing platforms, analyzed over 50 data sets to build a comprehensive database, and contributed to a number of computational and statistical pipelines for new biotechnologies and applications. Many of our techniques also extend to other types of biological data. I have worked on single-cell applications spanning immunology, oncology, and stem cell biology.
Paper 1: The tumor immune microenvironment
Paper 2: Long-term expansion of primary hepatocytes
Paper 3: SIMLR: kernel-based similarity learning
Paper 4: Single-cell RNA-seq database
Paper 5: 10x Genomics droplet technology
Paper 6: Extended application to Hi-C data
Code 1: R, Python and MATLAB code for SIMLR
Code 2: R scripts for 10x droplet technology
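SIMLR learns a cell-to-cell similarity matrix by combining multiple Gaussian kernels and then clusters cells with it. The published method learns the kernel weights and the similarity jointly. The sketch below only illustrates the multi-kernel idea, under these simplifications:

- the kernel bandwidths, the neighbour count `k` and the toy points are hypothetical;
- the kernel weights are fixed and equal rather than learned;
- average-linkage agglomerative clustering stands in for the clustering step used in the paper.

For the real implementation, see the released SIMLR code.

```python
import math

# Hypothetical toy data: two well-separated groups of 2-D "cells".
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]


def pairwise_dist(points):
    """Euclidean distance matrix."""
    return [[math.dist(p, q) for q in points] for p in points]


def local_scale(dist, k):
    """Mean distance from each point to its k nearest neighbours (self excluded)."""
    scales = []
    for row in dist:
        nearest = sorted(row)[1:k + 1]  # index 0 is the point itself
        scales.append(sum(nearest) / len(nearest))
    return scales


def multi_kernel_similarity(points, sigmas=(0.5, 1.0, 2.0), k=3):
    """Equal-weight average of locally scaled Gaussian kernels.

    Assumption: weights are fixed at 1/len(sigmas); SIMLR learns them instead.
    """
    dist = pairwise_dist(points)
    mu = local_scale(dist, k)
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for s in sigmas:
        for i in range(n):
            for j in range(n):
                # The bandwidth adapts to local density around both points.
                eps = max(s * (mu[i] + mu[j]) / 2, 1e-12)
                sim[i][j] += math.exp(-dist[i][j] ** 2 / (2 * eps ** 2)) / len(sigmas)
    return sim


def average_linkage(sim, n_clusters):
    """Repeatedly merge the two clusters with the highest mean similarity."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


clusters = average_linkage(multi_kernel_similarity(POINTS), n_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Local scaling lets each kernel adapt to the density around each cell. Averaging several bandwidths avoids committing to a single one.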
The Genotype-Tissue Expression (GTEx) project aims to discover associations between individual genetic variation and gene expression across tissues. It comprises samples of 53 human tissue types from healthy, postmortem donors who were densely genotyped. These tissues are costly to acquire, so one of the main challenges is detecting signals from very limited sample sizes.
I developed systematic analysis pipelines for expression quantitative trait locus (eQTL) studies, where the goal is to identify genetic variants that are strongly associated with tissue-specific gene expression. My collaborators and I proposed a selective inference workflow that accounts for selection bias when analyzing a large number of genes. This workflow can provide confidence intervals with accurate coverage even at limited sample sizes. Additionally, I worked on gene function prediction problems with the same dataset as part of a data mining course project.
Paper: Selection-adjusted Effect-size Estimation
Report: Project report on gene function prediction
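The selection bias mentioned above is often called the winner's curse: an effect reported only because it passed a significance screen tends to be overestimated. One standard correction maximizes the likelihood conditional on the estimate having been selected. The sketch below is a one-dimensional version, under these assumptions:

- a single z-statistic with known unit variance;
- the statistic is reported only when |z| > c;
- the threshold `c` and the function names are hypothetical.

This is not necessarily the estimator in the paper, whose workflow handles many genes at once.

```python
import math


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))


def norm_sf(x):
    """Upper tail 1 - Phi(x), computed without cancellation."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def selection_prob(mu, c):
    """P(|Z| > c) for Z ~ N(mu, 1): the chance the estimate passes the screen."""
    return norm_sf(c - mu) + norm_cdf(-c - mu)


def conditional_loglik(mu, z, c):
    """Log-likelihood of z given that it was selected, up to a constant."""
    return -0.5 * (z - mu) ** 2 - math.log(selection_prob(mu, c))


def selection_adjusted_mle(z, c, tol=1e-8):
    """Maximize the conditional log-likelihood (concave in mu) by golden-section search."""
    if abs(z) <= c:
        raise ValueError("z was not selected: |z| must exceed c")
    lo, hi = -abs(z) - 10.0, abs(z) + 10.0
    g = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if conditional_loglik(a, z, c) < conditional_loglik(b, z, c):
            lo = a  # the maximum lies in [a, hi]
        else:
            hi = b  # the maximum lies in [lo, b]
    return (lo + hi) / 2


for z in (2.5, 4.0, 8.0):
    print(f"naive {z:.1f} -> adjusted {selection_adjusted_mle(z, c=2.0):.3f}")
```

Estimates that barely pass the threshold are shrunk strongly toward zero. Estimates far above it are left almost unchanged, because such values would have been selected anyway.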
I am grateful to my undergraduate mentors who patiently taught me everything from scratch and got me excited about research early on. This theoretical foundation built my intuition for many applied projects in other fields during my graduate studies.
The L(2,1)-labeling of a graph is an extension of the classical graph coloring problem. It assigns nonnegative integers to the vertices so that adjacent vertices receive labels differing by at least 2, and vertices at distance 2 receive distinct labels. It is motivated by assigning frequencies to transmitters in a radio network. By investigating a wide range of graph families and enumerating many example graphs, we found general labeling patterns and unique graph structures. In addition, we proved a conjecture on a related coloring problem.
Paper 1: Amalgamation Graphs
Paper 2: Edge-path Replacement Graphs
Paper 3: Injective Labeling of General Graphs
Paper 4: Cartesian Products of Complete Graphs
Paper 5: Generalized Flowers
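For small graphs, the λ-number (the smallest span of an L(2,1)-labeling) can be found by exhaustive search. This is one way to enumerate example graphs before looking for general patterns. The block below is a minimal backtracking sketch, assuming graphs are given as adjacency sets. The helper names are hypothetical, and the search is exponential, so it only suits tiny graphs. The results in the papers come from proofs, not from a search like this.

```python
def second_neighbours(adj):
    """Vertices at distance exactly 2: distinct, non-adjacent, sharing a neighbour."""
    near = {v: set() for v in adj}
    for nbrs in adj.values():
        for a in nbrs:
            for b in nbrs:
                if a != b and b not in adj[a]:
                    near[a].add(b)
    return near


def lambda_number(adj):
    """Smallest k such that an L(2,1)-labeling with labels 0..k exists."""
    vertices = sorted(adj)
    near = second_neighbours(adj)

    def extend(i, labels, k):
        if i == len(vertices):
            return True
        v = vertices[i]
        for x in range(k + 1):
            ok = (all(abs(x - labels[u]) >= 2 for u in adj[v] if u in labels)
                  and all(x != labels[u] for u in near[v] if u in labels))
            if ok:
                labels[v] = x
                if extend(i + 1, labels, k):
                    return True
                del labels[v]
        return False

    k = 0
    while not extend(0, {}, k):
        k += 1
    return k


def path(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def cycle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def complete(n):
    return {i: set(range(n)) - {i} for i in range(n)}


def star(leaves):
    adj = {0: set(range(1, leaves + 1))}
    adj.update({i: {0} for i in range(1, leaves + 1)})
    return adj


for name, g in [("P4", path(4)), ("P5", path(5)), ("C5", cycle(5)),
                ("K4", complete(4)), ("K1,3", star(3))]:
    print(name, lambda_number(g))
```

Known values make convenient sanity checks: λ(K_n) = 2(n - 1), λ(C_n) = 4, λ(P_n) = 4 for n ≥ 5, and a star with Δ leaves has λ = Δ + 1.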
Using stochastic geometry for the theoretical analysis of wireless networks, we statistically characterized the performance of multi-antenna receivers under more general network topologies than earlier analyses had covered at the time. In addition, I analyzed how carrier frequency offsets affect multi-user systems that use orthogonal frequency-division multiplexing (OFDM). I derived analytical results that quantify the impact of unsynchronized interferers.
Paper 1: Non-homogeneous Poisson Networks
Paper 2: Networks with Clustered Interferers
Paper 3: Multi-antenna OFDM Systems
Paper 4: Doubly Stochastic Networks
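As a reference point for the stochastic-geometry analysis, the simplest model has a closed form. Assume:

- interferers form a homogeneous Poisson point process of density λ;
- every link has Rayleigh fading;
- the path-loss exponent is α and there is no noise.

Then a link of length r succeeds (SIR > θ) with probability exp(-λπr²θ^δ Γ(1+δ)Γ(1-δ)), where δ = 2/α. The sketch below checks that formula by Monte Carlo simulation. The parameter values, the finite simulation disk and the function names are assumptions. This baseline is not the non-homogeneous, clustered, or multi-antenna model studied in the papers.

```python
import math
import random


def poisson_sample(mean, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_success(lam, r, theta, alpha=4.0, radius=20.0, trials=5000, seed=1):
    """Fraction of trials with SIR > theta for a receiver at the origin.

    The transmitter is at distance r; interferers form a homogeneous PPP of
    density lam inside a disk of the given radius; all links have Rayleigh
    fading (exponential power gains).
    """
    rng = random.Random(seed)
    mean_points = lam * math.pi * radius ** 2
    hits = 0
    for _ in range(trials):
        signal = rng.expovariate(1.0) * r ** (-alpha)
        interference = 0.0
        for _ in range(poisson_sample(mean_points, rng)):
            # Uniform point in the disk; 1 - random() lies in (0, 1], so d > 0.
            d = radius * math.sqrt(1.0 - rng.random())
            interference += rng.expovariate(1.0) * d ** (-alpha)
        if signal > theta * interference:
            hits += 1
    return hits / trials


def success_closed_form(lam, r, theta, alpha=4.0):
    """exp(-lam*pi*r^2*theta^delta*Gamma(1+delta)*Gamma(1-delta)), delta = 2/alpha."""
    delta = 2.0 / alpha
    return math.exp(-lam * math.pi * r ** 2 * theta ** delta
                    * math.gamma(1 + delta) * math.gamma(1 - delta))


sim = simulate_success(lam=0.05, r=1.0, theta=1.0)
exact = success_closed_form(lam=0.05, r=1.0, theta=1.0)
print(f"simulated {sim:.3f} vs closed form {exact:.3f}")
```

Truncating the network to a disk slightly underestimates interference. With α = 4 the omitted far-field contribution is tiny, so the two numbers should agree up to Monte Carlo error.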