S675: Statistical Learning and High-Dimensional Data Analysis
S675 explores a variety of methods for detecting structure in multivariate data sets. Major topics include dimension reduction (principal component analysis, multidimensional scaling, manifold learning), unsupervised learning (k-means clustering, spectral clustering), and supervised learning (linear discriminant analysis, support vector machines, nearest neighbor classification).
Semester(s) Offered: Fall
Class time: Monday, Wednesday, Friday: 1:25-2:15
Instructor: Michael Trosset - email@example.com
Other Contact(s): Michael Trosset - firstname.lastname@example.org
Sequence: S675 is not yet part of a sequence. The Department of Statistics and the School of Informatics & Computing hope to coordinate and organize their various courses on machine learning into a 4-course sequence.
Prerequisites: A course in linear algebra is essential. S675 makes extensive use of matrix notation and several matrix factorizations. Some familiarity with vector calculus is also assumed. STAT-S 710 provides more than sufficient background.
Algebra Required?: Used extensively throughout the course, including proofs and homework assignments.
Calculus Required?: Used primarily for concepts and derivations.
Recommended follow-up classes: The Department of Statistics sometimes offers more advanced courses on related topics, e.g., machine learning and model selection.
Substantive Orientation: Any discipline that is concerned with high-dimensional data. Such data can arise in various ways, often as multiple measurements on each of several objects/subjects, as in text mining or microarray experiments, but also as measurements of pairwise proximi
Statistical Orientation: Most of the methods studied in S675 do not assume probability models.
Books used: Lecture notes and journal articles provided by the instructor. See the S675 web page.
Applied/Theoretical: Intermediate. S675 is closely related to, but somewhat more theoretical than, SOIC-B 565 (Data Mining) and SOIC-B 555 (Machine Learning).
Software Used: R
How the software is used: Students write programs that implement the methods studied in S675. They use these programs and/or programs written by others to analyze data.
Problem Sets: Weekly.
Data Analysis: Yes, but primary emphasis is on understanding how the methods work. Small, synthetic data sets often serve this objective better than large, real data sets.
Presentations: Each student writes a paper on a topic related to the material discussed in class.
Keywords: machine learning, multivariate structure, dimension reduction, cluster analysis, classification.