I529: Machine Learning in Bioinformatics
Machine learning techniques have been successful in analyzing biological data because they handle noisy data well and generalize beyond the training examples. In this class, we will learn the basics of probabilistic models and machine learning techniques. We will focus on probabilistic models (Markov models, hidden Markov models, and Bayesian networks) for biological sequence analysis and systems biology. Other machine learning techniques, such as Naive Bayes, neural networks, and support vector machines, will be covered only briefly.
Semester(s) Offered: Spring
Instructor: Haixu Tang
Other Contact(s): Haixu Tang
Prerequisites: INFO-I519 or equivalent.
Algebra Required?: Basics.
Calculus Required?: Basics.
Day(s) per week offered: Two lectures and one lab every week.
Books used: Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999.
Formal Computing Lab?: Yes
Software Used: Python, C/C++
How the software is used: For data analysis and visualization.
Problem Sets: 5 homework assignments, including 3-4 programming assignments.
Data Analysis: Several small projects and one group project requiring analysis of large biological datasets.
Presentations: Required for the final group project.
Exams: One midterm and one final.
Keywords: Bioinformatics, hidden Markov model, Bayesian network, Expectation-Maximization algorithm, MCMC.
Comments: Students are required to implement algorithms in C/C++ or Python.