CPSC 470/670: Information Storage and Retrieval
Spring Semester, 2007
Time and place: Tuesday/Thursday, 12:45-2:00 pm, Room 104 HRBB
Instructor: Dr. Frank Shipman
Office hours: HRBB 402B, TBA, or by appointment
Description of Course
Information retrieval (IR) covers issues of representation, storage,
and access to very large multimedia document collections. This course
covers the fundamental data structures, algorithms, and access methods
of current information storage and retrieval systems and relates the
various techniques to the design and evaluation of complete retrieval
systems delivered on the Internet and in digital libraries. Course
content includes algorithms for indexing, compressing, and querying
very large digital collections, as well as tools and techniques for
managing information services on the Internet.
Prerequisites
Students should be able to design and develop large Java programs and
learn new software libraries on their own.
Readings
Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto,
Addison Wesley and ACM Press
Collected journal and conference papers
Major Topics
Topics to be included in the course are:
- overview of information retrieval tasks,
- evaluation of information retrieval systems,
- collections and content types,
- query languages and IR models,
- relevance feedback and clustering,
- text processing and compression,
- interfaces for information retrieval, and
- searching the Web.
The class will include readings, homework assignments, exams, and projects.
Projects will be done in groups of 3-5 students, and larger groups are
expected to take on larger projects. Each student's project grade will
be influenced by his or her teamwork, as evaluated by the other members
of the project group. Each project must select a collection of
materials to provide, use Lucene (or Greenstone or ...) to index the
contents, and create a user interface for searching and browsing the
collection. Each group will develop an initial prototype, demonstrate
it to the class at the end of the semester, and plan an evaluation of
the prototype's success or failure.
Project topics must be approved by the instructor.
Grading
Grading will be based on reading and class participation, exams,
homework, and projects, plus a term paper for CPSC 670.
Component              CPSC 470    CPSC 670
class participation    10%         10%
exams                  45%         45%
homeworks              20%         10%
project                25%         25%
term paper             --          15%
Final Report Format
Your final project report should be 8-12 pages long and formatted
according to the ACM Conference Format. You can paste your text into
the template and use the paragraph styles it provides. An MS Word
template is available, and RTF and Maker Interchange File (MIF)
versions can be found on the ACM SIGCHI page.