Research

My research sits at the intersection of statistical theory and computer science methodology and is part of the modern ascendancy of mining "big data" to produce fundamentally novel science from complicated datasets. Specifically, I seek to illuminate the role played by the nature and quantity of regularization as a tool for improved scientific understanding.

Through this lens, my research can be divided into four intersecting areas: (1) computational approximation methodology, (2) model selection, (3) high-dimensional and nonparametric theory, and (4) applications related to these. However, I find this categorization to be contrived and detrimental to sound statistical analysis. For this reason, the goal of my current and future work is to explore and exploit the connections between these areas rather than to approach them separately. My contributions in the first three categories have accordingly grown out of what I have identified as a pressing need to justify methodology as implemented in applications rather than in a vacuum devoid of empirical motivation. My research program seeks to generate statistical guarantees for the procedures that applied researchers use while also developing methodology for complicated, high-dimensional problems. I believe that this comprehensive perspective is necessary to produce innovative results. Within this context, much of my work involves what is referred to as regularization: the process of mathematically balancing complex but meaningful scientific models with a preference for simple fundamental structures.

I frequently collaborate with Darren Homrighausen on projects involving the predictive properties of machine learning methods such as the lasso; principal components analysis and the diffusion map; and sparse, additive state-space models. I also collaborate with George Loewenstein on a number of applied projects in behavioral science. We have investigated the relationship between purchases of caloric beverages and interventions intended to decrease such purchases, the association between sexual activity and health/happiness in married couples, and the effect of conflict of interest notification on academic medical referees.

Some of my research is supported by grants from the Institute for New Economic Thinking (Model Complexity and High-dimensional statistics in Macroeconomic Forecasting) and the National Science Foundation (Approximation-Regularization). For links to publications and working papers, see my research page and my (possibly outdated) CV.

- "Matrix sketching for alternating direction method of multipliers optimization" (pdf)
- Symposium on Statistics and Data Science (2018)

- "Statistical implications of (some) computational approximations" (pdf)
- University of Virginia, Department of Statistics (2018)

- "Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression" (pdf)
- Joint Statistical Meetings, Baltimore, MD (2017)

- "Compressed and penalized linear regression" (pdf)
- Indiana University, Department of Statistics (2017)
- Toyota Technological Institute at Chicago, Machine Learning Seminar (2017)
- Pontificia Universidad Católica del Perú, Department of Mathematics and Statistics (2017)

- "Estimating beta mixing coefficients with histograms" (pdf)
- American Mathematical Society Spring Central Sectional Meeting, Special Session on Dependence in Probability and Statistics (2017)

- "Approximation-regularization for analysis of large data sets" (pdf) (pdf)
- University of Louisville, Department of Bioinformatics and Biostatistics (2016)
- University of California, Davis, Department of Statistics (2016)
- Indiana University Schools of Public Health and Medicine, Department of Biostatistics (2016)
- National Center for Atmospheric Research (2016)

- "Risk estimation for high-dimensional lasso regression" (pdf)
- Joint Statistical Meetings (2016)

- "Approximate principal components analysis of large data sets" (pdf) (pdf)
- Yale University, Department of Statistics (2015)
- Indiana University SOIC, Intelligent & Interactive Systems (2015)
- Joint Statistical Meetings (2014)

- "Short course on the predictive viewpoint"
- "Statistical machine learning with structured data" (pdf)
- Indiana University, Department of Statistics (2014)

- "Clustering classical music performance" (pdf) (poster)
- Université Laval, Department of Mathematics and Statistics (2014)
- 15th IMS New Researchers Conference (2013)

- "The lasso, persistence, and cross-validation" (pdf) (poster)
- 30th International Conference on Machine Learning (ICML) (2013)

- "Nonparametric risk bounds for time series prediction" (pdf) (pdf) (pdf)
- Indiana University, Department of Economics (2012)
- Yahoo! Research (2012)
- George Mason University, Department of Statistics (2012)
- Bocconi University, Department of Decision Sciences (2012)
- Indiana University, Department of Statistics (2012)

- "Estimating beta mixing coefficients" (pdf)
- 14th International Conference on Artificial Intelligence and Statistics (AISTATS) (2011)
- American Statistical Association, Pittsburgh Chapter Annual Meeting (2011)

- "Spectral approximation methods: performance evaluations in clustering and classification" (pdf)
- Classification Society Annual Meeting (2011)

- "Generalization error bounds for state-space models: with an application to economic forecasting" (pdf)
- Joint Statistical Meetings (2010)

- Thesis: Generalization error bounds for state-space models
- Proposal

Perhaps somewhat outdated (pdf)