Skills Summary

TLDR? I play in a world-class multi-petabyte data warehouse, build open-source software, conduct interdisciplinary research, and love every second.

Data Science

I excel at leveraging my diverse skillset to build impactful data products, tools, and insights. In addition to technical skills in data engineering, statistics, and predictive analysis, my business experience and research expertise help me ask the right questions and provide thought leadership. I am dedicated to building data assets that are scalable, maintainable, and reproducible. I also use my design, writing, and public speaking skills to close the loop and communicate insights to both technical and non-technical stakeholders.

Software Development

I have honed my software development skills contributing to large proprietary code bases and medium-sized open-source projects. I am fluent in version control, unit testing, and continuous integration, and take the craft of writing readable, maintainable, and correct code very seriously. At Shopify, I've used these skills to build data pipelines and streaming services; at NetLab, I made key improvements to the open-source data integration toolkit recordlinkage by adding new features and making a major performance bottleneck up to 97 times faster.

Research Design and Implementation

I have three years' experience designing and implementing investigations of complex social phenomena. Past research includes how supervisors' race and gender affect workplace outcomes, and changes in the intellectual diversity of scientific research teams. I leverage my technical background to approach difficult questions in powerful and novel ways. Past research projects have involved analyzing social networks, topic modelling text, and multi-phase survey research.

Technical Skills

I know the tools of the trade, and think deeply when picking the right ones for the job.

Python

As an experienced Python programmer, I have contributed to large operational code bases, built upon medium-sized open-source toolkits, and developed small data processing toolkits from the ground up. As a data scientist at Shopify, I use Python daily to process and analyze data. In the past, I have contributed to recordlinkage, an open-source entity resolution toolkit.

Apache Spark

I am proficient in building scalable data pipelines with Apache Spark using the Spark RDD, DataFrame, DStream, Structured Streaming, and SparkML APIs. I have experience debugging high-volume data pipelines in a production environment. Through this work I have also gained basic familiarity with related technologies such as Apache Kafka, HDFS, and Google Cloud Services.

SQL

I am proficient in the use of SQL for data analysis and reporting, in the context of a world-class data warehouse powered by Google Cloud Services, Apache Spark, and Presto. Despite the language's limitations, I hold my SQL queries to the same high level of quality, maintainability, and readability I strive for in traditional programming languages.

R

I have three years' experience using R's powerful libraries for data analysis and data visualization. In addition to data management, data visualization with ggplot2, and classical statistical inference, I use R for structural social network analysis, textual topic modelling, and psychometrics.

scikit-learn + pandas

I have basic proficiency with scikit-learn, having used the toolkit to impute missing values; encode categorical data; train Random Forest and Logistic Regression models; cross-validate predictive models; and apply predictions. My scikit-learn skills are complemented by significant experience processing and analyzing data with pandas and numpy.

C++

I am proficient in developing small- and medium-sized object-oriented programs in C++. My academic training in C++ included a collaborative software development project, best practices for object-oriented software development, and core language features.

Scala

I have basic proficiency in Scala, having completed course-based assignments reviewing the fundamentals of functional and object-oriented programming in Scala.

MATLAB

I am proficient in MATLAB, having completed course-based assignments involving data interpolation, error analysis, image compression, signal processing, and computational linear algebra.

Miscellaneous

My other technical proficiencies include HTML, Git, Markdown, LaTeX, Bash, Zsh, and reStructuredText. My workflow features JetBrains PyCharm, JetBrains DataGrip, RStudio, and Atom.