CSCE-315: Programming Studio (Spring 2011)

Project 1: Database Management System

Team configuration

This project is a team project, ideal team size is three. If the number of students is not divisible by three, we can have a few teams with size close to three. The teams will be assigned by the instructor, based on the programming proficiency survey.

In a nutshell

This is a two-part project. In the first part, the task is a to implement a simple database management system (DBMS). In the second part, the task is to implement an example application that relies on the DBMS system developed in the first part for its data manipulation needs. However, each team will use a DBMS developed by another team (this can change, depending on the maturity of the DBMS from the first part).

Specification of the DBMS

Database management systems are very complex pieces of software. They support concurrent use of a database, transactions, permission handling, query optimizations, logging, you name it. To be efficient, they utilize highly tuned algorithms developed during the course of decades. So obviously, for a one-month long project, we have to simplify a lot. We thus base our DBMS on relational algebra.

Relational algebra is a formal system for manipulating relations. It consists of only six primitive operations. Each of the operations take relations as arguments, and produce a relation as a result. The operations thus compose freely.

The upside of using relational algebra is that the implementation effort of the DBMS stays manageable. The downside is that queries tend to be more verbose and maybe a bit harder to construct than, say, with SQL.

Terminology:

Database
a collection of relations
Relation
a table with columns and rows
Attribute
a named column of a relation
Domain
the set of admissible values for one or more attributes
Tuple
a row of a relation (sequence of values, one for each attribute of a relation)

Relational algebra

The six operations of (the core of) relational algebra are:

  1. Selection: select the tuples in a relation that satisfy a particular condition.
  2. Projection: select a subset of the attributes in a relation.
  3. Renaming: rename the attributes in a relation.
  4. Set union: compute the union of two relations; the relations must be union-compatible.
  5. Set difference: compute the set difference of two relations; the relations must be union-compatible.
  6. Cross product: compute the Cartesian product of two relations.

In addition to these operations, we include the natural join operation. The result of the natural join between relations R and S is the set of all combinations of tuples in R and S that are equal on their common attribute names. The common attributes only appear once in the result.

Natural join is expressible using the six fundamental operations, but a direct implementation for joins can reduce the need to use the (expensive) Cartesian product operation.

Grammar

The communication with the DBMS takes place using a domain-specific language. The grammar of queries in this language is as follows.

query ::= relation-name <- expr ;

relation-name ::= identifier

identifier ::= alpha { ( alpha | digit ) }

alpha ::= a || z | A || Z | _

digit ::= 0 || 9

expr ::= atomic-expr
             | selection
             | projection
             | renaming
             | union
             | difference
             | product
             | natural-join

atomic-expr ::= relation-name | ( expr )

selection ::= select ( condition ) atomic-expr

condition ::= conjunction { || conjunction }

conjunction ::= comparison { && comparison }

comparison ::= operand op operand
                     | ( condition )

op ::= == | != | < | > | <= | >=

operand ::= attribute-name | literal

attribute-name ::= identifier
literal ::= intentionally left unspecified

projection ::= project ( attribute-list ) atomic-expr

attribute-list ::= attribute-name { , attribute-name }

renaming ::= rename ( attribute-list ) atomic-expr

union ::= atomic-expr + atomic-expr

difference ::= atomic-expr - atomic-expr

product ::= atomic-expr * atomic-expr

natural-join ::= atomic-expr JOIN atomic-expr

Queries generated from the above grammar compute new relations based on existing relations. Queries can also name those new relations. We need, however, some ways to create some initial relations (constituting a database), update the relations within the database, store the results of queries back to the database, and delete tuples from relations. We use the following commands for these purposes:

command ::= ( create-cmd | update-cmd | insert-cmd | delete-cmd ) ;

create-cmd ::= CREATE TABLE relation-name ( typed-attribute-list ) PRIMARY KEY ( attribute-list )

update-cmd ::= UPDATE relation-name SET attribute-name = literal { , attribute-name = literal } WHERE condition

insert-cmd ::= INSERT INTO relation-name VALUES FROM ( literal { , literal } )
                       | INSERT INTO relation-name VALUES FROM RELATION expr

delete-cmd ::= DELETE FROM relation-name WHERE condition

typed-attribute-list ::= attribute-name type { , attribute-name type }
type ::= VARCHAR ( integer ) | INTEGER
integer ::= digit { digit }

A program in our data manipulation language (DML) is then defined as:

program ::= { ( query | command ) }

Example

CREATE TABLE animals (name VARCHAR(20), kind VARCHAR(8), years INTEGER) PRIMARY KEY (name, kind);

INSERT INTO animals VALUES FROM ("Joe", "cat", 4);
INSERT INTO animals VALUES FROM ("Spot", "dog", 10);
INSERT INTO animals VALUES FROM ("Snoopy", "dog", 3);
INSERT INTO animals VALUES FROM ("Tweety", "bird", 1);
INSERT INTO animals VALUES FROM ("Joe", "bird", 2);

dogs <- select (kind == "dog") animals;
old_dogs <- select (age > 10) dogs;

cats_or_dogs <- dogs + (select (kind == "cat") animals);

CREATE TABLE species (kind VARCHAR(10)) PRIMARY KEY (kind);

INSERT INTO species VALUES FROM RELATION project (kind) animals;

a <- rename (aname, akind) (project (name, kind) animals);
common_names <- project (name) (select (aname == name && akind != kind) (a * animals));
answer <- common_names;

Interaction between the data manipulation language and the host language

The use of the DBMS from a host program (written, for example, in C++) consists of first generating a DML program (as a string), and then sending the generated program to the DBMS to be executed. There should also be some mechanism to transfer the results of queries to variables of the host program. An example protocol would be, for example, that if a DML program has a relation named answer, that relation is accessible in the host program in some suitable way. For example, its contents could be serialized into a container of type vector<string>, or the function that executes a DML program could accept an STL output iterator as a parameter, and write the result into the sequence indicated by that iterator.

This is what an example interaction might look like:

string name;
cin << name;

string query = string("") + 
               "answer <- project (age) ( select (kind == \"dog\" && name == " + name + ") animals )";

std::vector<string> result;
dbms.execute(animal_database, query, std::back_inserter(result));

Note that we made a distinction between queries and commands in the grammar of the DML. The result of a query is a view. A view is not stored in the database. Rather, it is a temporary relation whose lifetime ends when a DML program finishes. So only the updates caused by the commands persist from one DML program execution to another.

Assignment in details: What should the project teams do

Phase 1:

  1. Implement a DBMS that can support the DML specified above.
  2. Design the interplay between the DBMS and the host language, and implement this functionality.
  3. Design and thoroughly document the API of your DBMS library.

    Your DBMS can be a C++ library, which you distribute in source format. Ideally, we'd like you to distribute the DBMS library as a dynamically linked library, but doing that cross-platform is a little tricky. So please distribute your library in source form, with the intention that files other than the necessary header files are not meant to be examined by a client of the library.

  4. Write a concise design document that describes your design and implementation choices, and give short rationale for them.
*. We will provide a set of test commands and expected outputs. Your DBMS should process all of these commands correctly.

Phase 2:

  1. Obtain a specification of another team's DBMS.
  2. Write an application, specified below, that uses that DBMS for its data manipulation and query needs.
  3. In the role of a DBMS vendor, receive bug reports from your client, provide appropriate bug fixes or instructions to work around defects. Do this very promptly! Otherwise you are preventing the client team from making progress.
  4. In the role of an application developer, using a DBMS, send bug reports to your DBMS vendor (if necessary). Only if your vendor becomes unresponsive, you are allowed to touch the DBMS sources yourself.

All communication between a DBMS vendor and its client should be by email, with a copy to your TA. Set up a google group and include the TA as a member. This way it is easier to keep track of the discussion thread. Each team should set up a separate google group for this purpose.

The Application: Mini PIM (Personal Information Manager)

PalmOS (Palm, Inc.), the predecessor of the (now extinct) Linux-based WebOS (by HP), was an operating system developed mainly for use on PDAs (personal digital assistants). One distinctive characteristic of the PalmOS was that all the data were oragnized and stored in a database system, not on a file system. The four most basic applications that came with PalmOS was Calendar, Memo pad, Address book, and Todo list. Your task is to implement these four applications based on the DBMS from phase I.

These applications can interact in interesting ways, for example, calendar item could be linked with a ToDo item, and memos can be linked to a specific calendar item. Also, address book entries can be linked to specific calendar items. First, design four database schemas for the four applications, and then define relations to link them up. Note that not all pairs of applications need to be linked. The details (specific attributes, etc.) are up to you.

Required functionality includes the following (note: the user interface can be simple ASCII text with text input):

  • Display calendar (day view, week view, month view).
  • Calendar navigation (go forward, go backward, go to date).
  • Link memo item(s) to calendar item.
  • Link address book item(s) to calendar item.
  • Display memo list (sort by title, sort by creation date, show by category).
  • Display ToDo list (sort by name, sort by deadline, show by category, show not completed items, show all items).
  • Check/uncheck ToDo item.
  • Display address book item list (sort by name, show by category).
  • Address book navigation (go forward, go backward, go to lastname beginning with substring).
  • Link memo item(s) to address book item.
  • Common to all applications:
    • Create entry
    • Delete entry
    • Edit entry
    • Search entry (free text, and structured [attribute,value]).
The edit entry function (for all applications) can be really rudimentary. You do not need to program a CUI editor. Just display current content, and ask user to replace it with something new.

Deliverables

  • Phase 1 design document and API specification, due Friday, 9/9 Saturday, 9/10, 11:59pm.
  • Phase 1 DBMS library sources and Application DB schemas, due Wednesday, 9/21 Friday, 9/23, 11:59pm.
  • Phase 2 Application sources (and user guide if necessary: include ER diagram and relation table design), due Wednesday, 9/28, Sunday 10/2, 11:59pm.
  • Description of your experiences with your DBMS vendor, due Friday 9/30, Monday 10/3, 11:59pm.
  • Description of your experiences as a DBMS vendor to a client, due Friday 9/30, Monday 10/3, 11:59pm.

We may also ask for an evaluation of your efforts, as well as your perception of your teammates' efforts, which may be used for grading purposes. More details to follow.

Submission

All submissions should be through CSNET turnin.
Only one person from your team may submit, as a designated submitter. All other submissions will be ignored.

Original concept/design/most of the text by Jaakko Järvi. Modifications by Yoonsuck Choe.