## S675: Statistical Learning and High-Dimensional Data Analysis

### Class Description

S675 explores a variety of methods for detecting structure in multivariate data sets. Major topics include dimension reduction (principal component analysis, multidimensional scaling, manifold learning), unsupervised learning (k-means clustering, spectral clustering), and supervised learning (linear discriminant analysis, support vector machines, nearest neighbor classification).

### Class Information

**Semester(s) Offered:** Fall

**Class time:** Monday, Wednesday, Friday, 1:25-2:15

**Website:** http://mypage.iu.edu/~mtrosset/675.html

### Contact Information

**Instructor:** Michael Trosset (mtrosset@indiana.edu)

### Other Details

**Sequence:** S675 is not yet part of a sequence. The Department of Statistics and the School of Informatics & Computing hope to organize their various machine learning courses into a four-course sequence.

**Prerequisites:** A course in linear algebra is essential; S675 makes extensive use of matrix notation and several matrix factorizations. Some familiarity with vector calculus is also assumed. STAT-S 710 provides more than sufficient background.

**Algebra Required?** Linear algebra is used extensively throughout the course, including in proofs and homework assignments.

**Calculus Required?** Used primarily for concepts and derivations.

**Recommended follow-up classes:** The Department of Statistics sometimes offers more advanced courses on related topics, e.g., machine learning and model selection.

**Substantive Orientation:** Any discipline that is concerned with high-dimensional data. Such data can arise in various ways: often as multiple measurements on each of several objects/subjects, as in text mining or microarray experiments, but also as measurements of pairwise proximities.

**Statistical Orientation:** Most of the methods studied in S675 do not assume probability models.

**Books used:** Lecture notes and journal articles provided by the instructor. See the S675 web page.

**Applied/Theoretical:** Intermediate. S675 is closely related to, but somewhat more theoretical than, SOIC-B 565 (Data Mining) and SOIC-B 555 (Machine Learning).

**Software Used:** R

**How the software is used:** Students write programs that implement the methods studied in S675. They use these programs and/or programs written by others to analyze data.

**Problem Sets:** Weekly.

**Data Analysis:** Yes, but the primary emphasis is on understanding how the methods work. Small, synthetic data sets often serve this objective better than large, real data sets.

**Presentations:** Each student writes a paper on a topic related to those discussed in class.

**Exams:** No

**Keywords:** machine learning, multivariate structure, dimension reduction, cluster analysis, classification.