Teaching

My primary educational goal is to enhance statistical literacy and communication among my students. To me, statistical literacy means combining careful data analysis with scientific expertise to grapple with problems. Given the quantity of data available today, I want my students to learn how to engage in the type of quantitative reasoning and decision making necessary for understanding applications in the sciences, business, and public policy. Going beyond the didacticism of basic statistical literacy, I strive to inform students about the fundamental role statistics plays in the comprehension of real-world phenomena across disciplines and to motivate them to use this knowledge in their daily lives. Finally, I endeavor to cultivate curiosity in my students and to create an open and nurturing learning environment. This conception of my role as a pedagogue has informed the choices I have made in the design and implementation of my courses, and I work every day to inspire my students to develop a similar view. Beyond statistics, I feel strongly that it is important to connect with students who do not come to college with the same financial resources, cultural capital, and training as their more privileged peers.

As part of the Young Scholars Institute at the INET Hong Kong conference on April 2-3, 2013, Darren Homrighausen and I taught a course on statistical learning. The slides, code, data, and other materials from that class are available here. I retaught the course in New York on February 24-26, 2015. Those materials are available here.

This is a course in advanced data analysis. Building on the theory of linear models covered in S431, this course will focus on applying many types of data analysis methods to interesting datasets. The emphasis will be on dealing with and describing data rather than on particular methods per se.

The course combines analysis with methodology and computational aspects. It treats both the "art" of understanding unfamiliar data and the "science" of analyzing that data in terms of statistical properties. The focus will be on practical aspects of methodology and intuition to help students develop tools for selecting appropriate methods and approaches to problems in their own lives.

I expect that the material should be appropriate and interesting to students coming from either statistics or informatics/computer science. I will make an effort to provide necessary background so that lectures are accessible to both populations.

Statistical learning theory is a burgeoning research field at the intersection of probability, statistics, computer science, and optimization that studies the performance of computer algorithms for making predictions on the basis of training data.

The following topics will be covered: basics of statistical decision theory; concentration inequalities; supervised and unsupervised learning; empirical risk minimization; complexity-regularized estimation; generalization bounds for learning algorithms; VC dimension and Rademacher complexities; minimax lower bounds; online learning and optimization.

Along with the general theory, we will discuss a number of applications of statistical learning theory to signal processing, information theory, and adaptive control.

This course is intended to help Statistics Ph.D. students learn to read academic literature, write statistical papers, and present their findings orally. It is a required course leading to the data analysis qualifying exam, and is generally open only to statistics students.

This is a course in advanced machine learning. The term "statistical" in the title reflects the emphasis on statistical analysis and methodology, which is the predominant approach in modern machine learning.

The course combines methodology with theoretical foundations and computational aspects. It treats both the "art" of designing good learning algorithms and the "science" of analyzing an algorithm's statistical properties and performance guarantees. Theorems are presented together with practical aspects of methodology and intuition to help students develop tools for selecting appropriate methods and approaches to problems in their own research.

The course includes topics in statistical theory that are now becoming important for researchers in machine learning, including consistency, minimax estimation, and concentration of measure. It also presents topics in computation including elements of convex optimization, variational methods, randomized projection algorithms, and techniques for handling large data sets.

I expect that the material should be appropriate and interesting to students coming from either statistics or informatics/computer science. I will make an effort to provide necessary background so that lectures are accessible to both populations.

Mathematical introduction to major areas of probability theory and statistics including set theory and VC dimension, random variables, distribution functions, expectations, exponential families, inequalities and concentration of measure, convergence, estimation theory.

Mathematical introduction to major areas of statistical theory including hypothesis testing, confidence intervals, density estimation, nonparametric regression, classification, and prediction, with special emphasis on minimaxity, tuning parameter selection, and risk.

This course deals with the basic ideas and methods of analyzing data and making inferences about an unknown population based on information contained in a sample. We will study the formalisms behind frequently used statistical methods and develop a link between statistical theory and practice. In addition, we will emphasize the application of statistical methods and the interpretation and analysis of data. The course objectives are as follows:

- To introduce the basic ideas and methods that underlie the mathematical theory of statistics.
- To develop skills in applying statistical methods to problems in the sciences and the social sciences.
- To lay the foundations for upper-level classes in statistics, economics, engineering, etc.
- To develop a degree of comfort with statistical software.