CSCE 470: Information Storage and Retrieval (Fall 2020)

Instructor: Ruihong Huang

  • Location: Web-based
  • Time: T/TH 9:45 am - 11:00 am
  • TA: Arash Pakbin
  • Grader: Prathiksha Shivani Ranganath Prasad
  • Instructor Email: huangrh@cse.tamu.edu
  • Instructor Office: 402B HRBB (Web this year)
  • TA Email: a.pakbin@tamu.edu
  • TA Office: Unknown
  • Grader Email: prathiksha_prasad@tamu.edu
  • Credits: 3
  • Instructor Office Hours: Wed 9:00 am - 11:00 am, or by appointment
  • TA Office Hours: M/W/F 3:00 pm - 5:00 pm, or by appointment

Course Description

This course will cover the theory, design and implementation of text-based information retrieval systems, including algorithms and techniques at the core of modern search systems. Specifically, we will learn the key concepts and models relevant to information retrieval and storage, including efficient text indexing, Boolean and probabilistic retrieval models, retrieval evaluation, relevance feedback, document classification, learning to rank, document clustering and link analysis. We will implement key retrieval models on top of an open-source search engine system. Prerequisites: students should have had some exposure to basic probability, statistics, data structures and algorithms. You should be able to learn new software libraries on your own and design and develop functions on top of them.

Course Goal

Through this course, students will gain solid theoretical knowledge and enough practical experience to develop and diagnose their own search systems in the future. Upon successful completion of the course, a student will be able to (1) define and explain the key concepts and models relevant to information storage and retrieval; (2) implement important algorithms, and recognize and fix common problems in practice; (3) design and develop their own text search systems addressing customized information needs.

Evaluation Metrics

Two Programming Assignments: 30%
Four Written Assignments: 20%
Class Project: 25%
Participation (in class and in online Piazza discussions): 5%
Final Exam (December 7th, 8:00 am - 10:30 am): 20%

The grading policy is as follows:
90-100: A
80-89: B
70-79: C
60-69: D
<60: F
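
As a rough illustration of how the weights and cutoffs above combine, here is a minimal sketch. The component names and the function names are hypothetical, and the handling of fractional scores (no rounding up, so 89.5 stays a B) is an assumption that the syllabus does not state.

```python
# Weights in percent, copied from the list above.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "programming": 30,
    "written": 20,
    "project": 25,
    "participation": 5,
    "final_exam": 20,
}

# Lower cutoffs for each letter, from the grading policy above.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one course score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a course score to a letter grade.

    Assumption: fractional scores are not rounded up.
    """
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "programming": 90,
        "written": 85,
        "project": 80,
        "participation": 100,
        "final_exam": 75,
    }
    total = weighted_score(example)
    print(total, letter_grade(total))
```

With the example scores shown, the weighted total is 84, which falls in the 80-89 band.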

Important Dates

Final exam: Monday, December 7th, 8:00 am - 10:30 am

Attendance and Make-up Policies

Students should attend every class unless they have an accepted excuse. Please check Student Rule 7 (http://student-rules.tamu.edu/rule07) for details.

Homework Late Policies

For the programming/written homework assignments, you have a total of 5 late days that you can use during the semester. However, a single assignment can be submitted up to 2 days late only. For the purposes of the class, a late day is an indivisible 24-hour unit. Once you exhaust your 5 late days, we will not accept any late submissions. The late policy does not apply to project submissions.
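
The late-day arithmetic can be sketched as follows. This is only an illustration of the rules as written above; the function names are hypothetical, and measuring lateness in hours past the deadline is an assumption of the sketch.

```python
import math

TOTAL_LATE_DAYS = 5          # semester-wide budget, from the policy above
MAX_DAYS_PER_ASSIGNMENT = 2  # a single assignment can be at most 2 days late


def late_days_needed(hours_late):
    """Late days are indivisible 24-hour units, so any fraction rounds up."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def accept_submission(hours_late, days_used_so_far):
    """Return (accepted, days_used_after) for one homework submission."""
    needed = late_days_needed(hours_late)
    if needed > MAX_DAYS_PER_ASSIGNMENT:
        return False, days_used_so_far
    if days_used_so_far + needed > TOTAL_LATE_DAYS:
        return False, days_used_so_far
    return True, days_used_so_far + needed


if __name__ == "__main__":
    # One hour late costs a full late day.
    print(accept_submission(1, 0))
    # 30 hours late needs 2 days, but only 1 day is left in the budget.
    print(accept_submission(30, 4))
```

Under this reading, a submission that is even one hour late consumes a full late day, and a submission more than 48 hours late is rejected regardless of the remaining budget.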

Prerequisite

Students should have taken the course Data Structures and Algorithms (CSCE 221).

Textbook and Material

The primary textbook: Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. Relevant tutorials and papers will also be handed out during the class.

Academic Integrity

"An Aggie does not lie, cheat, or steal or tolerate those who do." For additional information, please visit: http://aggiehonor.tamu.edu.

Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.

Americans with Disabilities Act (ADA) Statement

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, currently located in the Disability Services building at the Student Services at White Creek complex on west campus or call 979-845-1637. For additional information, visit http://disability.tamu.edu.



Tentative Topics


Week  Topic                                  Material
1     Overview and indexing                  Book Chapters
2     The Boolean Retrieval model            Book Chapters
3     Probabilistic IR: Vector space model   Book Chapters
4     Probabilistic IR: BM25                 Book Chapters
5     Probabilistic IR: language models      Book Chapters
6     Document Classification                Book Chapters
7     Learning to Rank                       Book Chapters
8     Relevance Feedback                     Book Chapters
9     Retrieval Evaluation                   Book Chapters
10    Flat clustering: k-means               Book Chapters
11    Hierarchical clustering: HAC           Book Chapters
12    Link analysis                          Book Chapters
13    Trending topics
14    Project Presentations