Hello, world! I’m a software engineer focused on search, database systems, and statistical computing. I work at Elastic on the Elasticsearch search engine, and serve as an Apache Lucene committer and PMC member. I’m also the author of the generalized random forests (grf) package.
I love building software that’s empowering and enjoyable to use. I’m an open source enthusiast, and I’ve shared my experience joining the open source search community in “Finding a home (and career) in the open source community”.
Before Elastic I worked at Palantir Technologies, where I led development for the federated search framework.
There’s been a surge of interest in vector search, thanks to a new generation of machine learning models that powerfully represent text and other content as vectors. I helped introduce k-nearest neighbor search in Lucene and Elasticsearch, opening up new possibilities for ranking and recommendations.
Causal inference aims to determine the effect of an action on a larger system. The generalized random forests (grf) method combines insights from statistics and machine learning to enable causal analysis. The associated software package is becoming a popular choice for social scientists investigating causal effects.
During my Master’s degree I researched relation extraction and knowledge base population as part of the Stanford Natural Language Processing group. Our work focused on non-traditional supervision techniques, including multi-label learning, partial supervision, and methods to address labelling errors in training data.