Hello, world! I’m a software engineer focused on search and data-intensive systems. I currently work on code search and AI-assisted software development at Sourcegraph, and serve as an Apache Lucene committer and PMC member. I’m also the author of the generalized random forests (grf) package.
I love building software that’s empowering and joyful to use. I’m an open source enthusiast, and I shared my experience joining the open source search community in “Finding a home (and career) in the open source community”.
Before Sourcegraph, I worked at Elastic on the Elasticsearch search engine, and at Palantir Technologies. I hold an M.S. in Computer Science and a B.S. in Math from Stanford University.
Thanks to a new generation of machine learning models that can powerfully represent text as vectors, there’s been a surge of interest in vector-based semantic search. I led Elastic’s effort to introduce vector search in Lucene and Elasticsearch, helping extend these systems to become powerful “vector databases”.
Causal inference allows for determining the effect of an action on a larger system. The generalized random forests (grf) method combines insights from statistics and machine learning to enable causal analysis. I authored the grf software package, which won an inaugural Stanford Open Source Software prize for its research impact, quality, and dedication to open source principles.
During my Master’s degree I researched relation extraction and knowledge base population as part of the Stanford Natural Language Processing group. Our work focused on non-traditional supervision techniques, including multi-label learning, partial supervision, and methods to address labelling errors in training data.
⛰️ Backcountry cooking recipes. I’m an avid backpacker and enjoy finding creative ways to eat well outdoors.
📼 Digital mixtapes. A true child of the 90s, I love making playlists for friends and family.
🤓 Unicode. I’m the proud sponsor of Unicode characters μ and σ.