Additional Information

This course covers about 75% of the following topics, depending on the year:

  • basic concepts (introduction to data mining, origins of data mining, data mining tasks, relational databases, transactional databases, data warehouses)
  • data (types of data, data quality, similarity metrics, summary statistics, data preprocessing: cleaning, normalization, reduction, transformation, integration)
  • data warehouse and OLAP technology for data mining (multidimensional data model and OLAP operations, warehouse architecture, implementations and relationship with data mining)
  • association rule mining (basic concepts: frequent itemset generation, rule generation, Apriori and FP-growth algorithms; advanced concepts: graph data, sequential algorithms (Bayesian classification, k-nearest neighbor, neural networks, classification and regression trees, support vector machines, ensemble methods, handling biased data, and class-imbalanced data))
  • clustering (partitioning methods: k-means and k-medoids; hierarchical methods: agglomerative/divisive clustering; density-based, graph-based, prototype-based, model-based clustering; clustering with constraints)
  • anomaly detection (statistical approaches to outlier detection, density-based, proximity-based, clustering-based techniques)
  • mining complex types of data (mining spatial, text, time-series and multimedia data, mining web data, mining graphs, mining streaming data)
  • human factors and social issues (ethics of data mining and social impacts, privacy-preserving data mining, user interfaces, data and result visualization)