My research focuses on applying differential privacy to problems in economics, statistical hypothesis testing, and adaptive data analysis in machine learning. At a high level, a differentially private algorithm limits the sensitivity of its outcome to any individual entry in the input dataset. We can leverage the stability guarantees of differential privacy to answer new questions in areas outside theoretical computer science. This requires developing new differentially private algorithms as well as applying existing techniques in novel ways. The results of my research fall roughly into two main areas: 1) incorporating privacy as an additional constraint in an otherwise classical problem; 2) using privacy as a tool to solve new problems where privacy may not be a primary concern.

Typically in machine learning, an analyst wants to draw valid conclusions from a sample that generalize to the population it was drawn from. The practice of data analysis, however, has become an intrinsically adaptive process, in which a single dataset may be reused for several adaptively chosen analyses. Unfortunately, much of the existing theory of statistical inference assumes that each analysis is selected independently of the data. Applying the classical theory in this adaptive setting therefore leads to spurious results due to overfitting; indeed, this is suspected to be behind the prevalence of false discovery in empirical science [Simmons et al., 2011, Gelman and Loken, 2014, Wasserstein and Lazar, 2016]. In a surprising result, Dwork et al. [2015] showed that differential privacy can help preserve statistical validity in adaptive data analysis. Building on this connection, my coauthors and I show that differential privacy can be used to perform statistically valid, adaptive hypothesis tests [Rogers, Roth, Smith, Thakkar 2016].
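To make the sensitivity idea concrete, the canonical differentially private primitive is the Laplace mechanism: answer a numeric query after adding Laplace noise whose scale is the query's sensitivity divided by the privacy parameter epsilon. The sketch below is illustrative only (the function name and toy dataset are my own, not from the cited papers):

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return an epsilon-differentially private answer by adding
    Laplace(0, sensitivity / epsilon) noise to the true answer."""
    scale = sensitivity / epsilon
    # The difference of two independent Exponential(rate = 1/scale)
    # draws is distributed as Laplace(0, scale).
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_answer + noise

# Example: a counting query ("how many records satisfy a predicate?")
# has sensitivity 1, since changing one individual's record changes
# the count by at most 1.
data = [1, 0, 1, 1, 0, 1]
private_count = laplace_mechanism(sum(data), sensitivity=1.0, epsilon=0.5)
```

The noise scale grows as epsilon shrinks, which is exactly the trade-off between privacy and accuracy: a stronger privacy guarantee forces a noisier, and hence more stable, answer.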
We then explore natural extensions of differential privacy that are better suited to adaptive data analysis, including allowing the privacy parameters themselves to be chosen as a function of the outcomes of previously run analyses [Rogers, Roth, Ullman, Vadhan 2016].
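One way to picture adaptively chosen privacy parameters is a privacy "filter" that tracks the budget spent so far and halts any analysis that would exceed a global budget. The sketch below uses only basic composition (privacy losses add up) and hypothetical names; the filters studied in the work above are more sophisticated, handling (epsilon, delta) guarantees and advanced composition:

```python
class BasicPrivacyFilter:
    """Track cumulative privacy loss under basic composition:
    running an eps_i-differentially private analysis adds eps_i to
    the running total, and any request that would push the total
    past the global budget is refused. (A simplified sketch, not
    the construction from the cited paper.)"""

    def __init__(self, global_epsilon):
        self.global_epsilon = global_epsilon
        self.spent = 0.0

    def request(self, epsilon):
        """Return True and charge the budget if an epsilon-DP
        analysis may run; return False (halt) otherwise."""
        if self.spent + epsilon > self.global_epsilon:
            return False
        self.spent += epsilon
        return True

# The analyst may pick each epsilon adaptively, e.g. spending more
# budget on analyses whose earlier outcomes looked promising.
f = BasicPrivacyFilter(global_epsilon=1.0)
f.request(0.5)   # True  -- budget remaining: 0.5
f.request(0.4)   # True  -- budget remaining: 0.1
f.request(0.2)   # False -- would exceed the global budget
```

The subtlety the paper addresses is that when the parameters depend on earlier outcomes, the usual composition theorems no longer apply directly, so such filters must be justified with new arguments.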
For more on this work, see this video: https://www.microsoft.com/en-us/research/video/leveraging-privacy-data-analysis/