Fall 2015 -- CSCE 689: Data Mining



Meeting times and location: TR 08:00 am - 09:15 am, THOM 122

Course Description and Prerequisites:

It's the age of big data! We face great challenges in harnessing big data, making sense of it, and turning it into knowledge. This course covers the basic concepts, representative algorithms, and applications of data mining. With the rapid advance of computer and internet technologies, data accumulates in vast quantities and presents many challenges. Data will not turn into knowledge on its own, no matter how big it is or how long it is kept. Data mining is the process of finding the valuable nuggets among the mountains of data: it helps us understand interesting patterns buried in the data and adds value to what we are currently doing in many areas. We will review and examine current techniques and the theories behind them, and explore new and improved techniques for real-world data mining applications. The course is arranged to encourage active class participation, creative thinking, practical problem solving, exploration of novel ideas, and hands-on project development among the participants. A course project on a specific aspect of this emerging field will be assigned so that students can explore an issue in depth and gain unique data mining experience and insights.

This is a graduate-level course. While there are no official prerequisites, students will benefit from previous exposure to linear algebra and basic probability theory.

Learning Outcomes:

The goal of this course is to develop a comprehensive understanding of the fundamental issues, techniques, applications, and future directions of data mining. In particular, by the end of the semester students will be able to:

  • Understand the basic concepts, definitions, tools, and research issues of data mining;
  • Learn classical and state-of-the-art algorithms in data mining;
  • Understand popular applications of data mining;
  • Learn interesting future research directions and industrial opportunities in this field.

Instructor Information:

Name: Xia "Ben" Hu
Telephone number: 979-845-8873
Email address: hu@cse.tamu.edu
Office hours: TR 9:30 am - 11:30 am
Office location: 330B H.R. Bright Building

Textbook and/or Resource Material:

"Introduction to Data Mining"
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar
Addison Wesley

"Data Mining: Concepts and Techniques"
Jiawei Han and Micheline Kamber
Morgan Kaufmann Publishers

Grading Policies:

Grading Scale: A = 90-100%, B = 80-89%, C = 70-79%, D = 60-69%, F = below 60%.

The course grading policy is as follows:

  • Class participation and quizzes - 5%
  • Homework assignments - 20%
  • Paper Presentations - 15%
  • Project - 40%
  • Exams - 20%
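
As an illustration of how these weights combine, the following minimal Python sketch computes a final percentage and letter grade. The weights come from the list above; the component names, the sample scores, and the assumption that a score exactly on a cutoff earns the higher letter are hypothetical and not part of the official policy.

```python
# Weights are taken from the grading policy; the dictionary keys are
# hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation_quizzes": 0.05,
    "homework": 0.20,
    "paper_presentations": 0.15,
    "project": 0.40,
    "exams": 0.20,
}


def final_percentage(scores):
    """Return the weighted average of component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter, assuming each cutoff is inclusive."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical scores for one student, for illustration only.
example_scores = {
    "participation_quizzes": 100,
    "homework": 85,
    "paper_presentations": 90,
    "project": 88,
    "exams": 80,
}

total = final_percentage(example_scores)
print(f"Final percentage: {total:.1f}, letter grade: {letter_grade(total)}")
```

Because the project carries 40% of the grade, a 10-point change in the project score moves the final percentage by 4 points, twice the effect of the same change in homework or exams.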

Attendance and Make-up Policies:

Ten quizzes will be given at random times during the semester and will also be used to measure attendance. Five completed quizzes are required for full credit. As long as at least five quizzes are submitted successfully, no further documentation is needed; otherwise, an excused absence is required. If fewer than five quizzes are completed, one point will be deducted for each absence. The list of excused absences and the applicable rules can be found at http://student-rules.tamu.edu/rule07

Course Topics:

This course will mainly cover the following topics:

  • Patterns, Data
  • Attributes, Missing Values, Aggregation, Sampling, Feature Selection, Dimensionality Reduction
  • Rule-Based Classifiers, Decision Trees
  • Bayes' Rule, Naïve Bayes
  • SVM, Basic Optimization
  • Confusion Matrix
  • Similarity Metrics
  • K-means
  • Hierarchical Clustering: MIN (Single Linkage), MAX (Complete Linkage)
  • Homophily, Social Influence, Reciprocity
  • Community Detection
  • Conference papers

Other Pertinent Course Information:

Homework: In addition to some regular homework exercises (assignments and quizzes), students are encouraged to participate in classroom discussions and Q&A.

Project: Students are expected to work on some programming projects. We will discuss the format in our first class. The evaluation of the project consists of a progress report, a project presentation and/or demonstration, and a written report.

Americans with Disabilities Act (ADA):

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, in Cain Hall, Room B118, or call 845-1637. For additional information visit http://disability.tamu.edu

Academic Integrity:

"An Aggie does not lie, cheat, or steal, or tolerate those who do."

Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System. For additional information please visit: http://www.tamu.edu/aggiehonor/