CSCE-315: Programming Studio (Spring 2010)

Project 1: Database Management System

Team configuration

This project is a team project, ideal team size is three. If the number of students is not divisible by three, we can have a few teams with size close to three. My wish is that teams emerge without intervention from the TA or the instructor, but if you need help in team assignments, let us know.

In a nutshell

This is a two-part project. In the first part, the task is a to implement a simple database management system (DBMS). In the second part, the task is to implement an example application that relies on the DBMS system developed in the first part for its data manipulation needs. However, each team will use a DBMS developed by another team.

Specification of the DBMS

Database management systems are very complex pieces of software. They support concurrent use of a database, transactions, permission handling, query optimizations, logging, you name it. To be efficient, they utilize highly tuned algorithms developed during the course of decades. So obviously, for a one-month long project, we have to simplify a lot. We thus base our DBMS on relational algebra.

Relational algebra is a formal system for manipulating relations. It consists of only six primitive operations. Each of the operations take relations as arguments, and produce a relation as a result. The operations thus compose freely.

The upside of using relational algebra is that the implementation effort of the DBMS stays manageable. The downside is that queries tend to be more verbose and maybe a bit harder to construct than, say, with SQL.

Terminology:

Database: a collection of relations
Relation: a table with columns and rows
Attribute: a named column of a relation
Domain: the set of admissible values for one or more attributes
Tuple: a row of a relation (sequence of values, one for each attribute of a relation)

Relational algebra

The six operations of (the core of) relational algebra are:

Selection: select the tuples in a relation that satisfy a particular condition.
Projection: select a subset of the attributes in a relation.
Renaming: rename the attributes in a relation.
Set union: compute the union of two relations; the relations must be union-compatible.
Set difference: compute the set difference of two relations; the relations must be union-compatible.
Cross product: compute the Cartesian product of two relations.

In addition to these operations, we include the natural join operation. The result of the natural join between relations R and S is the set of all combinations of tuples in R and S that are equal on their common attribute names. The common attributes only appear once in the result.

Natural join is expressible using the six fundamental operations, but a direct implementation for joins can reduce the need to use the (expensive) Cartesian product operation.

Grammar

The communication with the DBMS takes place using a domain-specific language. The grammar of queries in this language is as follows.

query ::= relation-name <- expr ;

relation-name ::= identifier

identifier ::= alpha { ( alpha | digit ) }

alpha ::= a | … | z | A | … | Z | _

digit ::= 0 | … | 9

expr ::= atomic-expr
             | selection
             | projection
             | renaming
             | union
             | difference
             | product
             | natural-join

atomic-expr ::= relation-name | ( expr )

selection ::= select ( condition ) atomic-expr

condition ::= conjunction { || conjunction }

conjunction ::= comparison { && comparison }

comparison ::= operand op operand
                     | ( condition )

op ::= == | != | < | > | <= | >=

operand ::= attribute-name | literal

attribute-name ::= identifier
literal ::= intentionally left unspecified

projection ::= project ( attribute-list ) atomic-expr

attribute-list ::= attribute-name { , attribute-name }

renaming ::= rename ( attribute-list ) atomic-expr

union ::= atomic-expr + atomic-expr

difference ::= atomic-expr - atomic-expr

product ::= atomic-expr * atomic-expr

natural-join ::= atomic-expr JOIN atomic-expr

Queries generated from the above grammar compute new relations based on existing relations. Queries can also name those new relations. We need, however, some ways to create some initial relations (constituting a database), update the relations within the database, store the results of queries back to the database, and delete tuples from relations. We use the following commands for these purposes:

command ::= ( create-cmd | update-cmd | insert-cmd | delete-cmd ) ;

create-cmd ::= CREATE TABLE relation-name ( typed-attribute-list ) PRIMARY KEY ( attribute-list )

update-cmd ::= UPDATE relation-name SET attribute-name = literal { , attribute-name = literal } WHERE condition

insert-cmd ::= INSERT INTO relation-name VALUES FROM ( literal { , literal } )
| INSERT INTO relation-name VALUES FROM RELATION expr

delete-cmd ::= DELETE FROM relation-name WHERE condition

typed-attribute-list ::= attribute-name type { , attribute-name type }
type ::= VARCHAR ( integer ) | INTEGER
integer ::= digit { digit }

A program in our data manipulation language (DML) is then defined as:

program ::= { ( query | command ) }

Example

CREATE TABLE animals (name VARCHAR(20), kind VARCHAR(8), years INTEGER) PRIMARY KEY (name, kind);

INSERT INTO animals VALUES FROM ("Joe", "cat", 4);
INSERT INTO animals VALUES FROM ("Spot", "dog", 10);
INSERT INTO animals VALUES FROM ("Snoopy", "dog", 3);
INSERT INTO animals VALUES FROM ("Tweety", "bird", 1);
INSERT INTO animals VALUES FROM ("Joe", "bird", 2);

dogs <- select (kind == "dog") animals;
old_dogs <- select (age > 10) dogs;

cats_or_dogs <- dogs + (select (kind == "cat") animals);

CREATE TABLE species (kind VARCHAR(10)) PRIMARY KEY (kind);

INSERT INTO species VALUES FROM RELATION project (kind) animals;

a <- rename (aname, akind) (project (name, kind) animals);
common_names <- project (name) (select (aname == name && akind != kind) (a * animals));
answer <- common_names;

Interaction between the data manipulation language and the host language

The use of the DBMS from a host program (written, for example, in C++) consists of first generating a DML program (as a string), and then sending the generated program to the DBMS to be executed. There should also be some mechanism to transfer the results of queries to variables of the host program. An example protocol would be, for example, that if a DML program has a relation named answer, that relation is accessible in the host program in some suitable way. For example, its contents could be serialized into a container of type vector<string>, or the function that executes a DML program could accept an STL output iterator as a parameter, and write the result into the sequence indicated by that iterator.

This is what an example interaction might look like:

string name;
cin << name;

string query = string("") + 
               "answer <- project (age) ( select (kind == \"dog\" && name == " + name + ") animals )";

std::vector<string> result;
dbms.execute(animal_database, query, std::back_inserter(result));

Note that we made a distinction between queries and commands in the grammar of the DML. The result of a query is a view. A view is not stored in the database. Rather, it is a temporary relation whose lifetime ends when a DML program finishes. So only the updates caused by the commands persist from one DML program execution to another.

Assignment in details: What should the project teams do

Phase 1:

Implement a DBMS that can support the DML specified above.
Design the interplay between the DBMS and the host language, and implement this functionality.
Design and thoroughly document the API of your DBMS library.
Your DBMS can be a C++ library, which you distribute in source format. Ideally, we'd like you to distribute the DBMS library as a dynamically linked library, but doing that cross-platform is a little tricky. So please distribute your library in source form, with the intention that files other than the necessary header files are not meant to be examined by a client of the library.
Write a concise design document that describes your design and implementation choices, and give short rationale for them.

Phase 2:

Obtain a specification of another team's DBMS.
Write an application, specified below, that uses that DBMS for its data manipulation and query needs.
In the role of a DBMS vendor, receive bug reports from your client, provide appropriate bug fixes or instructions to work around defects. Do this very promptly! Otherwise you are preventing the client team from making progress.
In the role of an application developer, using a DBMS, send bug reports to your DBMS vendor (if necessary). Only if your vendor becomes unresponsive, you are allowed to touch the DBMS sources yourself.

All communication between a DBMS vendor and its client should be by email, with a copy to your TA. Set up a google group and include the TA as a member. This way it is easier to keep track of the discussion thread. Each team should set up a separate google group for this purpose.

The Application: Small Library Manager

You should create an application maintaining a small library (of books). You will need to write routines to create the necessary tables, populate the tables (from a file and/or from user input), and query the tables in useful ways. You may make the application program as fancy as you want. But, at minimum, among the data you should keep in the database are things along the following lines:

Information that the application should track include:

Collection of books and information about the books: call number, title, author(s), publisher, year, edition, ISBN, condition, copy # (if more than one books of the same title, what is this book's unique copy #). Note: this should not include any checkout/return information!
Library clients and information about the individuals: name, ID number, address, phone, email, etc.
Book checkout information: client ID, book info (call number + copy #, etc.), check out date, due date.
Book return information: client ID, book info, original check out date, return date, condition of book upon return, etc.

The application should support most of the following functionality:

List of books checked out by a particular client. Count the number.
List all books with the same title.
List of clients who checked out the same book title. Count the number.
List of overdue books, in total, and by a certain client.
Search book for author, title, etc.
Add new books, remove lost books, remove trashed books
Checkout and return books.
Total number of clients who read (okay, checked out, at least) a certain book.
Generate report cards that involves selection of tuples and renaming of attributes.

Deliverables

Phase 1 design document and API specification, due Thursday, 9/23, 11:59pm.
Phase 1 DBMS library sources, due ~~Sunday 10/3~~Tuesday 10/5, 11:59pm.
Phase 2 Application sources (and user guide if necessary: include ER diagram and relation table design), due ~~Sunday 10/10~~Tuesday 10/12, 11:59pm.
Description of your experiences with your DBMS vendor, due Thursday 10/14, 11:59pm.
Description of your experiences as a DBMS vendor to a client, due Thursday 10/14, 11:59pm.

We may also ask for an evaluation of your efforts, as well as your perception of your teammates' efforts, which may be used for grading purposes. More details to follow.

Original concept/design/most of the text by Jaakko Järvi. Modifications by Yoonsuck Choe.