CAP 5771 Principles of Data Mining [Spring 2015]

Announcements

The homepage is always under construction. Check the course description and syllabus below to decide if this course suits you. 

[January 14th, 2015]  Course materials (lecture notes, references and assignments) will be posted on Moodle.

[January 12th, 2015]  Notes on the class policy

  • No late homework will be accepted, and no make-up midterm exam will be given.
  • Each student should complete his/her homework and project assignments independently.
  • All students should turn in the assignments at the beginning of the on-campus class on the due date.
  • Academic misconduct will not be tolerated by the University, nor will it be tolerated in the classroom.

Instructor

Dr. Tao Li, Professor
School of Computing and Information Sciences
Florida International University

Office: ECS 365
Email: taoli AT cs.fiu.edu
Office Hours: Thursday 2:30pm-4:30pm or by appointment. If I am not in ECS 365, you can find me in ECS 251 or ECS 268.

 

TA

TBA

Meeting Time and Location

Tuesday: 7:50pm-10:30pm, GL139

Course Materials

  • [January 14th, 2015]  Course materials (lecture notes, references and assignments) will be posted on Moodle.

Course Description

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It has gradually matured as a discipline that merges ideas from statistics, machine learning, databases, and related fields. This course gives graduate-level students an introductory survey of the methodologies, technologies, mathematics and algorithms currently needed by people who do research in data mining or who need to apply data mining techniques to practical applications. Emphasis will be placed on both algorithmic and application issues.

Course Syllabus (Subject to revision)

  • Mathematical Background for Data Mining
    • Probability Theory
    • Information Theory
    • Basic Linear Algebra
    • Expectation-Maximization (EM)
  • Association Mining
    • Frequent Set Mining
    • Sequence Mining
  • Classification
    • Decision Tree Learning
    • Nearest Neighbor
    • Support Vector Machines
    • Bayesian Networks, Maximum Likelihood, Maximum Entropy
    • Feature Selection and Dimension Reduction
  • Clustering
    • Traditional approaches (e.g., K-means, hierarchical clustering)
    • Spectral Clustering, Matrix Factorization
    • Subspace Clustering
    • Co-clustering
  • Ensemble Methods
    • Classifier Combination
    • Cluster Combination

Prerequisites

Students need to know at least one programming language (e.g., C/C++, Java or Matlab). Students entering the class with basic knowledge of probability, statistics and algorithms will be at an advantage, but the class will be designed so that anyone with a basic mathematical background can catch up and fully participate.

Format and Grading

The course assignments include projects, written homework, paper discussions and presentations. Research projects will be designed to improve students' critical analysis and problem-solving skills. Class attendance is mandatory for regular students. In addition, occasional quizzes will be given in class. Evaluation will be a subjective process, but it will be primarily based on the students' understanding of the course material. For regular students, final grades will be calculated as follows:

Quizzes and Class Participation 10%
Midterm Exam 30%
Final Project 30%
Homework Assignments 30%

For online students, the final grades will be calculated as follows:

Midterm Exam 35%
Final Project 35%
Assignments 30%
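
As a rough illustration of how the weights above combine, here is a minimal Python sketch. The weights are taken from the two tables; the function name `weighted_grade`, the dictionary keys and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale. Since evaluation is partly subjective, treat this as arithmetic only, not as the official grading procedure.

```python
# Weights copied from the grading tables; the key names are illustrative only.
REGULAR_WEIGHTS = {
    "quizzes_participation": 0.10,
    "midterm": 0.30,
    "final_project": 0.30,
    "homework": 0.30,
}
ONLINE_WEIGHTS = {
    "midterm": 0.35,
    "final_project": 0.35,
    "assignments": 0.30,
}


def weighted_grade(scores, weights):
    """Return the weighted sum of component scores (each assumed to be 0-100)."""
    # Guard against a weight table that does not add up to 100%.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Every weighted component needs a score.
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Made-up scores for a regular (in-class) student.
example_scores = {
    "quizzes_participation": 90,
    "midterm": 80,
    "final_project": 85,
    "homework": 95,
}
print(weighted_grade(example_scores, REGULAR_WEIGHTS))
```

For an online student, the same function would be called with `ONLINE_WEIGHTS` and scores for the midterm, final project and assignments only.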

 

Policies on Assignments and Exams

  • No late homework will be accepted, and no make-up midterm exam will be given.
  • All online students must come to campus to take the midterm exam together with the regular in-class students, at the same time as the in-class exam.
  • Each student should complete his/her homework and project assignments independently.
  • All students should turn in the assignments at the beginning of the on-campus class on the due date.
  • Online students should watch all the lecture videos.
  • Academic misconduct will not be tolerated by the University, nor will it be tolerated in the classroom.

Textbooks and References

Textbook

  • Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005.

References

  • Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006, Second Edition.
  • Tom Mitchell. Machine Learning. McGraw Hill, 1997.
  • Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
  • Soumen Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at the FIU Library.

Additional reading material from top conferences and journals will be made available online or in class as required. Lecture notes will also be available online.

Code of Academic Integrity:

University Policies:

For academic misconduct, sexual harassment, religious holidays, and information on services for students with disabilities, see:

© 2015 Tao Li. All rights reserved. Last updated: