Projects

Data Science Learning Tools

DataTutor is a project that uses interactive web apps to help data scientists understand and explore data science code written in R. The package currently implements a tool called Unravel (paper + talk), which lets data scientists interactively inspect, understand, and explore data wrangling code written with a fluent API such as the tidyverse.
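
To give a feel for what inspecting a fluent chain step by step involves, here is a minimal Python sketch. It is not Unravel's implementation (Unravel works on R code); the `unravel` function, the `cars` data, and the step labels are all hypothetical and only illustrate the idea of recording a summary after every step of a pipeline.

```python
from typing import Callable

Row = dict
Table = list[Row]
Step = tuple[str, Callable[[Table], Table]]


def unravel(table: Table, steps: list[Step]) -> list[dict]:
    """Apply each step in order and record a summary of every intermediate table."""
    trace = []
    current = table
    for label, fn in steps:
        current = fn(current)
        n_cols = len(current[0]) if current else 0
        trace.append({"step": label, "rows": len(current), "cols": n_cols})
    return trace


# Hypothetical data and pipeline, loosely mirroring filter() %>% mutate() %>% select().
cars = [
    {"model": "A", "mpg": 21.0, "cyl": 6},
    {"model": "B", "mpg": 32.4, "cyl": 4},
    {"model": "C", "mpg": 15.2, "cyl": 8},
]
pipeline = [
    ("filter(mpg > 20)", lambda t: [r for r in t if r["mpg"] > 20]),
    ("mutate(kpl = mpg * 0.425)", lambda t: [{**r, "kpl": r["mpg"] * 0.425} for r in t]),
    ("select(model, kpl)", lambda t: [{"model": r["model"], "kpl": r["kpl"]} for r in t]),
]

trace = unravel(cars, pipeline)
for entry in trace:
    print(entry)
```

Each entry in `trace` shows how the row and column counts change after a step, which is the kind of intermediate summary that makes a long chain easier to follow.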

pygradethis

pygradethis is a Python autograder that checks both code output and the code itself through static analysis. The package is inspired by Daniel Chen’s gradethis. It is designed to be minimal and can be used on its own for Python, or within learnr to check Python exercises through the gradethispython R wrapper package.
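
The difference between output checking and static checking can be sketched with the standard library alone. This is not the pygradethis API; `check_output`, `check_static`, and the `result` variable name are hypothetical, and the sketch assumes the exercise asks the student to store an answer in a variable.

```python
import ast


def check_output(student_src: str, expected, result_name: str = "result") -> bool:
    """Run the student's code in a fresh namespace and compare one variable to the expected value."""
    namespace: dict = {}
    exec(student_src, namespace)  # fine for a sketch; a real grader would sandbox this
    return namespace.get(result_name) == expected


def check_static(student_src: str, solution_src: str) -> bool:
    """Compare syntax trees, so whitespace and comments do not affect the verdict."""
    return ast.dump(ast.parse(student_src)) == ast.dump(ast.parse(solution_src))


solution = "result = sum([1, 2, 3])"
same_code = "result = sum( [1, 2, 3] )  # add them up"
other_code = "result = 1 + 2 + 3"

print(check_output(same_code, 6), check_static(same_code, solution))
print(check_output(other_code, 6), check_static(other_code, solution))
```

The second submission produces the right value but is structurally different from the solution, which is the case where a static check can give feedback that an output check alone cannot.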

py2r


An interactive learnr lesson that teaches Python programmers the basics of dataframe manipulation in R (base or tidyverse). Unlike other tutorials, it reinforces facts about R through interactive code exercises and relates them to Python code and output using reticulate. The tutorial also points out “gotchas” in making the transition and highlights the advantages of the tidyverse (dplyr). [Source]

Transfer Tutor


A small prototype web app that I built in JavaScript to study the usefulness of teaching R from the perspective of Python. Participants found it useful to relate R to Python, the language they were more familiar with, and found stepping through the code incrementally helpful. Note: You can skip all screens with Google Forms, which were used for the study itself.

Automatic WAT Discovery


This is an ongoing project to automatically discover inconsistencies in code behavior (WATs) between Python and R using a corpus of notebooks from Kaggle. The project uses rpy2 to parse simple one-liner snippets from the Python and R notebooks. Once parsed, the snippets for each language are executed against generated or provided data. Finally, the dataframe outputs of the Python and R snippets are compared using basic metrics such as row/column dimensions, cell values, and syntactic edit distance.
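
The comparison step might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: tables are represented as plain dictionaries of columns instead of real dataframes, the helper names are hypothetical, and the two outputs are written out by hand for the snippets `df.iloc[1:3]` (Python) and `df[1:3, ]` (R) applied to a made-up column `x = 10, 20, 30, 40`.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def shape(table: dict) -> tuple:
    """Return (rows, columns) for a column-oriented table: column name -> list of values."""
    n_rows = len(next(iter(table.values()))) if table else 0
    return (n_rows, len(table))


def compare_outputs(py_table: dict, r_table: dict, py_code: str, r_code: str) -> dict:
    """Summarise how two snippet outputs differ, plus how far apart the snippets look."""
    return {
        "same_shape": shape(py_table) == shape(r_table),
        "same_values": py_table == r_table,
        "syntactic_distance": edit_distance(py_code, r_code),
    }


# Hand-written outputs (assumed, not executed here): Python slices are 0-based
# and exclude the end, while R ranges are 1-based and include it.
py_out = {"x": [20, 30]}
r_out = {"x": [10, 20, 30]}

report = compare_outputs(py_out, r_out, "df.iloc[1:3]", "df[1:3, ]")
print(report)
```

A pair of snippets with a small syntactic distance but different shapes or values is a candidate WAT: the code looks nearly the same in both languages yet behaves differently.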