CSCE-315: Programming Studio (Summer 2014)
Project 1: Database Management System
Due dates and updates
Here are the various due dates. See near the end for details (i.e., what you need to submit for each submission window.
- Design documents (due 6/4 Wednesday)
- Parser code (due 6/9 Monday)
- DB core function code (due 6/13 Friday)
- Final project code + report (due 6/16 Monday)
Any updated info about the project will also be posted here.
- Individual score formula corrected.
Team configuration
This project is a team project, ideal team size is four. If the number of students is not divisible by four, we can have a few teams with size close to four. The teams will be assigned by the instructor, based on the programming proficiency survey.
In a nutshell
This project consits of two parts. The task of the first part is to implement a simple database management system (DBMS). In the second pars, you will manually test the DBMS's basic functionality in a simple application domain.
Part I: Specification of the DBMS
Database management systems are very complex pieces of software. They support concurrent use of a database, transactions, permission handling, query optimizations, logging, you name it. To be efficient, they utilize highly tuned algorithms developed during the course of decades. So obviously, for a two-week long project, we have to simplify a lot. We thus base our DBMS on relational algebra.
Relational algebra is a formal system for manipulating relations. It consists of only six primitive operations. Each of the operations take relations as arguments, and produce a relation as a result. The operations thus compose freely.
The upside of using relational algebra is that the implementation effort of the DBMS stays manageable. The downside is that queries tend to be more verbose and maybe a bit harder to construct than, say, with SQL.
Terminology:
- Database
- a collection of relations
- Relation
- a table with columns and rows
- Attribute
- a named column of a relation
- Domain
- the set of admissible values for one or more attributes
- Tuple
- a row of a relation (sequence of values, one for each attribute of a relation)
Relational algebra
The six operations of (the core of) relational algebra are:
- Selection: select the tuples in a relation that satisfy a particular condition.
- Projection: select a subset of the attributes in a relation.
- Renaming: rename the attributes in a relation.
- Set union: compute the union of two relations; the relations must be union-compatible.
- Set difference: compute the set difference of two relations; the relations must be union-compatible.
- Cross product: compute the Cartesian product of two relations.
Grammar
The communication with the DBMS takes place using a domain-specific language. The grammar of queries in this language is as follows.
query ::= relation-name <-
expr ;
relation-name ::= identifier
identifier ::= alpha { ( alpha |
digit ) }
alpha ::= a
|
… |
z
|
A
|
… |
Z
|
_
digit ::= 0
|
… |
9
expr ::= atomic-expr
|
selection
|
projection
|
renaming
|
union
|
difference
|
product
atomic-expr ::= relation-name |
(
expr )
selection ::= select
(
condition )
atomic-expr
condition ::= conjunction { ||
conjunction }
conjunction ::= comparison { &&
comparison }
comparison ::= operand op operand
| (
condition )
op ::= ==
|
!=
|
<
|
>
|
<=
|
>=
operand ::= attribute-name |
literal
attribute-name ::= identifier
literal ::= intentionally left unspecified
projection ::= project
(
attribute-list )
atomic-expr
attribute-list ::= attribute-name { , attribute-name }
renaming ::= rename
(
attribute-list )
atomic-expr
union ::= atomic-expr +
atomic-expr
difference ::= atomic-expr -
atomic-expr
product ::= atomic-expr *
atomic-expr
Queries generated from the above grammar compute new relations based on existing relations. Queries can also name those new relations. We need, however, some ways to create some initial relations (constituting a database), update the relations within the database, store the results of queries back to the database, and delete tuples from relations. We use the following commands for these purposes:
command ::= ( open-cmd |
close-cmd |
write-cmd |
exit-cmd |
show-cmd |
create-cmd |
update-cmd |
insert-cmd |
delete-cmd ) ;
open-cmd ::== OPEN
relation-name
close-cmd ::== CLOSE
relation-name
write-cmd ::== WRITE
relation-name
exit-cmd ::== EXIT
show-cmd ::== SHOW
atomic-expr
create-cmd ::= CREATE TABLE
relation-name (
typed-attribute-list )
PRIMARY KEY
(
attribute-list )
update-cmd ::= UPDATE
relation-name SET
attribute-name =
literal { , attribute-name =
literal } WHERE
condition
insert-cmd ::= INSERT INTO
relation-name VALUES FROM
(
literal { , literal } )
|
INSERT INTO
relation-name VALUES FROM RELATION
expr
delete-cmd ::= DELETE FROM
relation-name WHERE
condition
typed-attribute-list ::= attribute-name type { , attribute-name type }
type ::= VARCHAR (
integer )
|
INTEGER
integer ::= digit { digit }
A program in our data manipulation language (DML) is then defined as:
program ::= { ( query |
command ) }
Example
CREATE TABLE animals (name VARCHAR(20), kind VARCHAR(8), years INTEGER) PRIMARY KEY (name, kind); INSERT INTO animals VALUES FROM ("Joe", "cat", 4); INSERT INTO animals VALUES FROM ("Spot", "dog", 10); INSERT INTO animals VALUES FROM ("Snoopy", "dog", 3); INSERT INTO animals VALUES FROM ("Tweety", "bird", 1); INSERT INTO animals VALUES FROM ("Joe", "bird", 2); SHOW animals; dogs <- select (kind == "dog") animals; old_dogs <- select (age > 10) dogs; cats_or_dogs <- dogs + (select (kind == "cat") animals); CREATE TABLE species (kind VARCHAR(10)) PRIMARY KEY (kind); INSERT INTO species VALUES FROM RELATION project (kind) animals; a <- rename (aname, akind) (project (name, kind) animals); common_names <- project (name) (select (aname == name && akind != kind) (a * animals)); answer <- common_names; SHOW answer; WRITE animals; CLOSE animals; EXIT;
Note that we made a distinction between queries and commands in the grammar of the DML. The result of a query is a view. A view is not stored in the database. Rather, it is a temporary relation whose lifetime ends when a DML program finishes. So only the updates caused by the commands persist from one DML program execution to another.
The relations themselves should be saved in a file in plain ASCII text, using the same DML described above (e.g., CREATE ... INSERT ... INSERT .... ). To make it simple, let us assume that each database file can only store one relation and the filename is the same as the relation name with the suffix ".db". To load a relation from a database file, use the OPEN command. Opening a nonexisting file will result in nothing. To add a new relation to a file, use the WRITE command (the filename will be by default "relationname.db"). To save all changes to the relation in a database file and close, use the CLOSE command.
To exit from the DML interpreter, use the EXIT command.
To print a certain relation or a view, use the SHOW command.
Part II: DB Application Testing
- Customer data: USER-ID, Firstname, Lastname, Phone Number, account balance
- DVD data: Inventory number, DVD-ID, Title
- Rental log: USER-ID, Inventory number, Check out date, Due date
- Operations: Add new customer, Add new DVD, Remove customer, Remove DVD, Update customer, Update DVD, List customer, Search customer (by name, phone), List DVD, Search DVD (by ID, title), Search inventory, Add credit to customer's account (add to current balance), Apply late penalty to customer's account (subtract from current balance), Check out DVD (don't allow if balance is negative), Check in DVD (apply penalty if late), Show rental list by customer, Show customers that have negative account balance. Show customer list by DVD-ID.
Deliverables and Requirements
- All teams must use github.tamu.edu. Give access to the TA and the Instructor.
- Each team must maintain a development log (wiki page in github.tamu.edu titled "Development log") updated by the team members. This log will be graded. There is no designated format, except that you need to time stamp, write down the name, and write a brief description of the activity. We will check your daily progress.
- Major routines should include unit testing.
- Demo in the lab may be required.
- Design documents: Follow the guidelines in Scott' Hackett's "How to Write an Effective Design Document" (Writing for a Peer Developer). Include all four sections described in the guide.
- Set up your design document ("Design document") as a wiki page in github.tamu.edu.
- The design document should cover both phase 1 (DB engine) and phase 2 (DB app testing).
- Documents for phase 2 should include ER diagram and corresponding relation schema, besides other things.
- Grading:
- 20%: all four sections included.
- 50%: Part I DBMS - parser, DB engine
- 30%: Part II DB app testing - ER diagram, relation schema, testing workflow (create, insert, ... )
- Parser code: Upload your parser code. It should be able to
accept or reject an arbitrary command or query: Accept syntactically correct
command or query, and reject anything that violates the syntax.
- 10%: layout, style, comments
- 30%: commands (open, close, ..., delete)
- 40%: queries (select, project, ..., product)
- 10%: condition (conjunction, comparison, operators, etc.)
- 10%: development log
- DB core function code:
Upload the core functions. These are C++ classes and functions that implement the core DB functionality. For example, function calls for creating a relation table data structure, etc. You don't need to link this with the parser just yet. You should be able to write a test program in C++ that directly calls the core DB functions create( ....), insert( ... ), etc.
- 10%: layout, style, comments
- 30%: commands (open, close, ..., delete)
- 40%: queries (select, project, ..., product)
- 10%: condition (conjunction, comparison, operators, etc.)
- 10%: development log
- Final project code + DB app test session log + report:
The DBMS should be a stand-alone executable that integrates the
parser and the DB core functions.
- DBMS engine should compile into a stand-alone executable application so that when you run the application, it works as a DBMS command shell, where you can type in any command or query.
- DB app test session log should include a detailed manual test of the DBMS, starting with the creation, insertion, etc. and the output of those commands. Set up a wiki page called "Test session log".
- post production notes (changes you had to make to your design and why, difficulties, solutions, lessons learned). Make it a wiki page "Post production notes".
- individual work load distribution (percentage, must add up to 100%). Include this in the "Post production notes".
- Formula for individual score calculation is as follows:
individual score = min(sqrt(your percentage/25)*team_score,110)
For example, if your contribution was 20% and your team score was 85, your individual score is min(sqrt(20/25)*85,110) = 76. Note that 25% is the baseline (equal contribution by all four members). If your contribution was 30% and your team score was 85, your individual score is min(sqrt(30/25)*85,110) = 93.
- Formula for individual score calculation is as follows:
- Development log (wiki page).
- Final Grading:
- 5%: Layout, style, comments
- 40%: DBMS engine: completeness, functionality
- 10%: Test session log: completeness, accuracy
- 5%: Post production notes
- 10%: Development log
- 30%: Weighted grades from earlier submissions (design doc, parser, DB core function).
Submission
- All submissions should be through ecampus.tamu.edu
- Design doc submission should be a single PDF file uploaded to eCampus. This will be a printout of your wiki page.
- First, fork your latest project into an archival branch named: Submission 1, Submission 2, and Submission 3, for the three code submissions, respectively.
- Use the "Download ZIP" feature in github and upload the resulting zip file.
- As for the documents (development log, etc.), we will check the github project.
- Late penalty is 1% per 2 hours. So, if you're late 1 day, you lose 12%.
Original concept/design/most of the text by Jaakko Järvi. Modifications by Yoonsuck Choe.