CSCE-315: Programming Studio (Fall 2013)

Project 1: Database Management System

Due dates and updates

Here are the various due dates. See near the end for details (i.e., what you need to submit for each submission window.

Design documents (9/2)
DBMS engine code (9/9)
Parser code (9/16)
Integrated DBMS+Parser code (9/23)
Final project code + report (9/30)

FAQs and additional clarification:

For the DBMS engine code submission (9/9), You don't need to submit file IO functions since they require the parser.
You DO NOT need to write a general parser that takes in an arbitrary grammar. You just need to hard-code the pseudo-SQL grammar to implement your recursive descent parser.
Table implementation: Vector of vector vs. matrix? (you need to consider the tradeoffs).
DB app code: will be a mixture of C++ code (menu output, user input, etc.) + DML/query language code (called within the C++ code). Users of the DB app will not have direct access to the DML/query language.
Some example inputs for project 1 parser test
You should not assume that blank space is inserted between all tokens.
Coding style and commenting will be taken into account.
Read the "Deliverables" section carefully. For example, example output is required.

Team configuration

This project is a team project, ideal team size is three. If the number of students is not divisible by three, we can have a few teams with size close to three. The teams will be assigned by the instructor, based on the programming proficiency survey.

In a nutshell

This is a two-part project. In the first part, the task is a to implement a simple database management system (DBMS). In the second part, the task is to implement an example application that relies on the DBMS system developed in the first part for its data manipulation needs.

PHASE I: Specification of the DBMS

Database management systems are very complex pieces of software. They support concurrent use of a database, transactions, permission handling, query optimizations, logging, you name it. To be efficient, they utilize highly tuned algorithms developed during the course of decades. So obviously, for a four-week long project, we have to simplify a lot. We thus base our DBMS on relational algebra.

Relational algebra is a formal system for manipulating relations. It consists of only six primitive operations. Each of the operations take relations as arguments, and produce a relation as a result. The operations thus compose freely.

The upside of using relational algebra is that the implementation effort of the DBMS stays manageable. The downside is that queries tend to be more verbose and maybe a bit harder to construct than, say, with SQL.

Terminology:

Database: a collection of relations
Relation: a table with columns and rows
Attribute: a named column of a relation
Domain: the set of admissible values for one or more attributes
Tuple: a row of a relation (sequence of values, one for each attribute of a relation)

Relational algebra

The six operations of (the core of) relational algebra are:

Selection: select the tuples in a relation that satisfy a particular condition.
Projection: select a subset of the attributes in a relation.
Renaming: rename the attributes in a relation.
Set union: compute the union of two relations; the relations must be union-compatible.
Set difference: compute the set difference of two relations; the relations must be union-compatible.
Cross product: compute the Cartesian product of two relations.

Grammar

The communication with the DBMS takes place using a domain-specific language. The grammar of queries in this language is as follows.

query ::= relation-name <- expr ;

relation-name ::= identifier

identifier ::= alpha { ( alpha | digit ) }

alpha ::= a | … | z | A | … | Z | _

digit ::= 0 | … | 9

expr ::= atomic-expr
             | selection
             | projection
             | renaming
             | union
             | difference
             | product

atomic-expr ::= relation-name | ( expr )

selection ::= select ( condition ) atomic-expr

condition ::= conjunction { || conjunction }

conjunction ::= comparison { && comparison }

comparison ::= operand op operand
                     | ( condition )

op ::= == | != | < | > | <= | >=

operand ::= attribute-name | literal

attribute-name ::= identifier
literal ::= intentionally left unspecified

projection ::= project ( attribute-list ) atomic-expr

attribute-list ::= attribute-name { , attribute-name }

renaming ::= rename ( attribute-list ) atomic-expr

union ::= atomic-expr + atomic-expr

difference ::= atomic-expr - atomic-expr

product ::= atomic-expr * atomic-expr

Queries generated from the above grammar compute new relations based on existing relations. Queries can also name those new relations. We need, however, some ways to create some initial relations (constituting a database), update the relations within the database, store the results of queries back to the database, and delete tuples from relations. We use the following commands for these purposes:

command ::= ( open-cmd | close-cmd | write-cmd | exit-cmd | show-cmd | create-cmd | update-cmd | insert-cmd | delete-cmd ) ;

open-cmd ::== OPEN relation-name
close-cmd ::== CLOSE relation-name
write-cmd ::== WRITE relation-name
exit-cmd ::== EXIT
show-cmd ::== SHOW atomic-expr
create-cmd ::= CREATE TABLE relation-name ( typed-attribute-list ) PRIMARY KEY ( attribute-list )

update-cmd ::= UPDATE relation-name SET attribute-name = literal { , attribute-name = literal } WHERE condition

insert-cmd ::= INSERT INTO relation-name VALUES FROM ( literal { , literal } )
| INSERT INTO relation-name VALUES FROM RELATION expr

delete-cmd ::= DELETE FROM relation-name WHERE condition

typed-attribute-list ::= attribute-name type { , attribute-name type }
type ::= VARCHAR ( integer ) | INTEGER
integer ::= digit { digit }

A program in our data manipulation language (DML) is then defined as:

program ::= { ( query | command ) }

Example

CREATE TABLE animals (name VARCHAR(20), kind VARCHAR(8), years INTEGER) PRIMARY KEY (name, kind);

INSERT INTO animals VALUES FROM ("Joe", "cat", 4);
INSERT INTO animals VALUES FROM ("Spot", "dog", 10);
INSERT INTO animals VALUES FROM ("Snoopy", "dog", 3);
INSERT INTO animals VALUES FROM ("Tweety", "bird", 1);
INSERT INTO animals VALUES FROM ("Joe", "bird", 2);

SHOW animals;

dogs <- select (kind == "dog") animals;
old_dogs <- select (age > 10) dogs;

cats_or_dogs <- dogs + (select (kind == "cat") animals);

CREATE TABLE species (kind VARCHAR(10)) PRIMARY KEY (kind);

INSERT INTO species VALUES FROM RELATION project (kind) animals;

a <- rename (aname, akind) (project (name, kind) animals);
common_names <- project (name) (select (aname == name && akind != kind) (a * animals));
answer <- common_names;

SHOW answer;

WRITE animals;
CLOSE animals;

EXIT;

Note that we made a distinction between queries and commands in the grammar of the DML. The result of a query is a view. A view is not stored in the database. Rather, it is a temporary relation whose lifetime ends when a DML program finishes. So only the updates caused by the commands persist from one DML program execution to another.

The relations themselves should be saved in a file in plain ASCII text (DO NOT use binary), using the same DML described above (e.g., CREATE ... INSERT ... INSERT .... ). To make it simple, let us assume that each database file can only store one relation and the filename is the same as the relation name with the suffix ".db". To load a relation from a database file, use the OPEN command. Opening a nonexisting file will result in nothing. To create a new relation file, use the WRITE command (the filename will be by default "relationname.db"). To save all changes to the relation in a database file and close, use the CLOSE command.

To exit from the DML interpreter, use the EXIT command.

To print a certain relation or a view, use the SHOW command.

PHASE II: DB Application

The second phase of this project is to write a simple DB application written in the DML described above. The DB app will run on your DB engine. Since you will need to take user I/O and implement a custom control flow (conditional statements, loops, etc.), the DML described above alone is not enough. Instead of extending the DML to include such non-DB commands, you will use a host language (e.g., C++) to interact with your DBMS.

The host program will provide most of the user interface: displaying menus, taking user input, and showing results. Based on these user inputs, a custom query or command string can be generated and passed on to the DBMS to be parsed and executed..

optional (i.e., as needed): You may also need to retrieve the results of the queries to feed into the host language's control flow. The DBMS object can contain a member function to access the relations, views, and the attributes by their name (string).

This is what an example interaction might look like:

string name;
cin << name;

string query = string("") + 
               "answer <- project (age) ( select (kind == \"dog\" && name == " + name + ") animals )";

rdbms.execute(query);

if (rdbms.relation(relation_name).int_field(field_name) == 10) {
	...
}

The DB application itself will be of your own design, with the following requirements:

The application domain is open-ended: bank, web forum, warehouse app, point-of-sale, auto maintenance shop, etc., but it CANNOT BE one of the following: DVD or any kind of rental app, Library app, or PDA app.
The domain needs to be sufficiently complex, and should include at least three entities and two relation (a total of five relational tables, minimum). Thus, having a single table (e.g., a simple TODO list, etc.) is not acceptable.
The following command/query needs to be used at least once.
- open, close, write, exit, show
- create table, insert into, update, delete
- select, project, +, - , * (note: you have to think hard how to utilize all these in Phase II)
Provide a rough sketch of your application and its functionality in your design document. We will provide you with feedback in case it is too simple or too complex so that you can revise your plan early on.

As an example, here's a spec from the last semester. This was for a DVD rental shop. Think about what should be an entity and what should be a relation.

Customer data: USER-ID, Firstname, Lastname, Phone Number
DVD data: Inventory number, DVD-ID, Title
Rental log: USER-ID, Inventory number, Check out date, Due date
Operations: Add new customer, Add new DVD, Remove customer, Remove DVD, Update customer, Update DVD, List customer, Search customer (by name, phone), List DVD, Search DVD (by ID, title), Search inventory, Check out DVD, Check in DVD, Show rental list by customer, Show customer list by DVD-ID.
The user interface can be a simple scrolling text with CLI (command-line interface) that prints the menu and takes user input from the keyboard.

Note that most of the code in the host language will be user interface, as you will be able to use the DML for most of the required operations.

Deliverables and Requirements

Each team should set up a web page where relevant info are linked. We will maintain a single web page pointing to all team web pages.
Each team should maintain a development log (Google doc or similar) updated by the team members. Give access to the TA. This log will be graded. There is no designated format. We will check your daily progress. The log should be updated daily.
Major routines should include unit testing (each function individually tested by some automated means, using a set of tests).
Demo in the lab may be required.

Design documents: Follow the guidelines in Scott' Hackett's "How to Write an Effective Design Document" (Writing for a Peer Developer). Include all four sections described in the guide.
- The design document should cover both phase 1 (DB engine) and phase 2 (DB app).
- Phase 2 documents should include ER diagram and corresponding relation schema, besides other things.
DBMS engine code: This will be the core DB function library, without the parser.
- Upload your DBMS engine library code.
- You should include a test application that tests all functionality (function calls relating to all queries, all commands).
- Also submit the input and output (text file) from your test application.
- You may omit the file I/O code since it requires the parser.
- Theoretically, you should be able to build a DB app just using this library. (However, we will not do this -- you will use the DML command-line for the most part and not directly call the DBMS library functions from your final DB app.)
Parser code: Upload your parser code. It should be able to take input from the keyboard and accept or reject an arbitrary command or query: Accept syntactically correct command or query, and reject anything that violates the syntax.
- Upload your parser code.
- Also submit the input and output (text file) of your parser.
- Keep in mind how you will later integrate your parser code with the DB engine as you write the parser code.
Integrated DBMS + Parser code: Upload your completed DBMS system.
- The system should have a CLI (command-line interface) where you can type in the commands and queries and it should be fully functional as a DB.
- Include example inputs and actual outputs from your DBMS.
Final project code + DB application code + report: The DB application should be a stand-alone executable. The report should be
- an updated version of your design document
- post production notes (changes you had to make to your design and why, difficulties, solutions, lessons learned)
- individual work load distribution (percentage)
- Development log.
- Include a log of an example session with your DB application: show menu, take user input, application's output and messages, etc.
- The following should be submitted individually:
  - self-evaluation: rate your performance on the scale of 1 to 5, 5 being the best. Briefly discuss how you contributed to the project.
  - peer evaluation: rate all other team members on the scale of 1 to 5, 5 being best. For each member, briefly comment on their contribution.
  - The initial team will continue on with project 2 and 3, unless there is a serious issue with the current team. If you want to leave your team indicate so and provide justification in writing.

Extra Credit

If your team can find another team with which you can swap the Phase I code and build your DB app based on the other team's Phase I code, you will get 5 points extra credit (out of a total of 100) for project 1.

Submission

All submissions should be through ecampus.tamu.edu
Except for the peer evaluation (submit individually), only one person from your team may submit, as a designated submitter. All other submissions will be ignored. See course web page for information on late penalty.

Original concept/design/most of the text by Jaakko Järvi. Modifications by Yoonsuck Choe.