CSCE 489: Natural Language Processing Foundation and Techniques (Fall 2018)

Instructor: Ruihong Huang

  • Location: HRBB 113
  • Time: TR 2:20-3:35 pm
  • Instructor Email: huangrh@cse.tamu.edu
  • Instructor Office: 402B HRBB
  • Credits: 3
  • Office Hours: Thursday 10-11 am or by appointment

NEWS
  • [08/28] The first meeting will be on 08/28!

Course Description

This is an especially exciting time to study Natural Language Processing (NLP), which aims to enable computers to understand and automatically process human language. This course focuses on NLP fundamentals, including language models, automatic syntactic and semantic processing, discourse, and pragmatics. It also introduces various applications of NLP, including information extraction, sentiment analysis, question answering, text summarization, and machine translation. Students will consolidate and practice their NLP knowledge and skills through programming assignments, in-class quizzes, and a final project.

Course Goal

Through this course, students will gain solid theoretical knowledge and enough practical experience to design and develop their own text processing applications in the future.

Evaluation Metrics

Expect frequent in-class quizzes (10 in total, roughly one per week), two programming assignments, an annotation assignment, a class project, and a final exam. In addition, active class participation will be rewarded and low participation penalized.

Ten in-class quizzes: 20%
Two Programming Assignments: 20%
Annotation Assignment: 10%
The Final Project: 30% (abstract: 5%, project progress report: 5%, presentation+final report+code+data: 20%)
Class participation: 5%
Final Exam (Dec. 12th, 1:00-3:00 pm): 15%

The grading policy is as follows:
90-100: A
80-89: B
70-79: C
60-69: D
<60: F
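
As a rough illustration of how the weights and letter bands above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the band boundaries apply to the unrounded weighted total; the syllabus does not say how fractional totals are handled, so that part is an assumption. The names WEIGHTS, weighted_total, letter_grade and the example scores are hypothetical.

```python
# Component weights taken from the evaluation breakdown (they sum to 100%).
WEIGHTS = {
    "quizzes": 0.20,
    "programming_assignments": 0.20,
    "annotation_assignment": 0.10,
    "final_project": 0.30,
    "participation": 0.05,
    "final_exam": 0.15,
}

# Lower bound of each letter band, highest first.
LETTER_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def weighted_total(scores):
    """Combine per-component scores (each 0-100) into a course total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a course total to a letter; no rounding is applied (assumption)."""
    for lower_bound, letter in LETTER_BANDS:
        if total >= lower_bound:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example_scores = {
    "quizzes": 85,
    "programming_assignments": 90,
    "annotation_assignment": 80,
    "final_project": 88,
    "participation": 100,
    "final_exam": 75,
}

total = weighted_total(example_scores)
print(round(total, 2), letter_grade(total))
```

The instructor's actual computation may differ, for example in how the participation adjustment or rounding is applied.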

Attendance and Make-up Policies

Every student should attend class unless they have an accepted excuse. Please see Student Rule 7 (http://student-rules.tamu.edu/rule07) for details.

Project

It is important that you work on a real NLP project so that you gain first-hand experience with basic text processing and learn to deal with the high complexity of human language in concrete applications. You are responsible for developing your own project idea; the instructor is available to discuss and shape the project if you like. The project should be semester-long in scope. By the end of the semester, you should submit your code and data for the project, write a project report of at most 8 pages, and prepare a class presentation.

Prerequisite

Students should have taken Data Structures and Algorithms (CSCE 221).

Textbook and Material

Required textbook: Daniel Jurafsky and James H. Martin, Speech and Language Processing, 2nd edition, Prentice Hall, 2008. Relevant tutorials and papers will also be handed out in class.

Academic Integrity

"An Aggie does not lie, cheat, or steal or tolerate those who do." For additional information, please visit: http://aggiehonor.tamu.edu.

Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.

Americans with Disabilities Act (ADA) Statement

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, currently located in the Disability Services building at the Student Services at White Creek complex on west campus or call 979-845-1637. For additional information, visit http://disability.tamu.edu.



Tentative Schedule


Date         Topic                                       Material  Notes

Introduction
08/28        Course Overview                             slides
08/30        Text Preprocessing and Regular Expressions  slides    Chap. 2, 3 of J&M

Classification
09/04        Text Classification                         slides
09/06        Naive Bayes                                 slides    p1 out
09/11        Discriminative Models: MaxEnt, Perceptron   slides

Language Modeling
09/13        Language Modeling                           slides    Sentence-level LM, Discourse Driven LM
09/18        Smoothing                                   slides    Chap. 4 of J&M; p1 due

Syntax
09/20        Intro to Part-of-Speech Tagging             slides    Chap. 5 of J&M
09/25        Sequence Models                             slides    HMM, CRF; project abstract due
09/27        Sequence Models cont.                       slides    p2 out
10/02        Parsing                                     slides    Chap. 13, 14 of J&M
10/04        Parsing cont.                               slides
10/09        Statistical Parsing                         slides    lexicalized PCFGs
10/11        Intro to Dependency Parsing                 slides    p2 due
10/16        mid-term review                                       annotation assignment out

Semantics
10/18        Intro to Semantics                          slides    Chap. 19.1-19.3 of J&M
10/23        Thesaurus-based Word Similarity             slides    Chap. 20.6 of J&M
10/25        Distributional Semantics                    slides    annotation assignment due; Chap. 20.7 of J&M; word vectors
10/30        Dense Vectors                               slides    word2vec
11/01        Semantic Role Labeling                      slides    Chap. 20.9 of J&M; SRL

Information Extraction
11/06        Intro to IE                                 slides    project progress report due
11/08        Relation Extraction                         slides    Chap. 22.2 of J&M
11/13        Coreference Resolution                      slides    Chap. 21 of J&M
11/15        Event Extraction                            slides    Chap. 22.3, 22.4 of J&M

Deep Learning
11/20        Deep Learning                               slides    deep learning for NLP
11/22        no class                                              Thanksgiving holiday! project due

Projects
11/27        Final Project Presentations                 slides
11/29        Final Project Presentations                 slides
12/04        Final Project Presentations                 slides

Final
12/06        Reading day, no class
12/12 (Wed)  Final Exam, 1:00-3:00 pm