CSCE-315: Programming Studio (Fall 2015)

Project 1: Database Management System

Due dates and updates

Here are the various due dates. See near the end for details (i.e., what you need to submit for each submission window.

Design documents (due 9/8 Tuesday 11:59pm)
DB core function code (DB engine) (due 9/15 Tuesday)
Parser code (due 9/22 Tuesday)
Integrated Parser + DB engine (due 9/29 Tuesday)
Final project code + DB app + report (due 10/6 Tuesday)

Any updated info about the project will also be posted here.

The language to be used is C++. No other languages are allowed.
[9/2/2015] Part II details have been uploaded.

Team configuration

This project is a team project, ideal team size is four. If the number of students is not divisible by four, we can have a few teams with size close to four. The teams will be assigned by the instructor, based on the programming proficiency survey.

In a nutshell

This project consits of two parts. The task of the first part is to implement a simple database management system (DBMS). In the second part, you will write a DB application using your DBMS.

PART I: Specification of the DBMS

Database management systems are very complex pieces of software. They support concurrent use of a database, transactions, permission handling, query optimizations, logging, etc. To be efficient, they utilize highly tuned algorithms developed during the course of decades. So obviously, for a one-month long project, we have to simplify a lot. We thus base our DBMS on relational algebra.

Relational algebra is a formal system for manipulating relations. It consists of only six primitive operations. Each of the operations take relations as arguments, and produce a relation as a result. The operations thus compose freely.

The upside of using relational algebra is that the implementation effort of the DBMS stays manageable. The downside is that queries tend to be more verbose and maybe a bit harder to construct than, say, with SQL.

Terminology:

Database: a collection of relations
Relation: a table with columns and rows
Attribute: a named column of a relation
Domain: the set of admissible values for one or more attributes
Tuple: a row of a relation (sequence of values, one for each attribute of a relation)

Relational algebra

The six operations of (the core of) relational algebra are:

Selection: select the tuples in a relation that satisfy a particular condition.
Projection: select a subset of the attributes in a relation.
Renaming: rename the attributes in a relation.
Set union: compute the union of two relations; the relations must be union-compatible.
Set difference: compute the set difference of two relations; the relations must be union-compatible.
Cross product: compute the Cartesian product of two relations.

Grammar

The communication with the DBMS takes place using a domain-specific language. The grammar of queries in this language is as follows, described in Baccus-Naur Form (BNF). You will implement a Recursive Descent Parser.

query ::= relation-name <- expr ;

relation-name ::= identifier

identifier ::= alpha { ( alpha | digit ) }

alpha ::= a | … | z | A | … | Z | _

digit ::= 0 | … | 9

expr ::= atomic-expr
             | selection
             | projection
             | renaming
             | union
             | difference
             | product

atomic-expr ::= relation-name | ( expr )

selection ::= select ( condition ) atomic-expr

condition ::= conjunction { || conjunction }

conjunction ::= comparison { && comparison }

comparison ::= operand op operand
                     | ( condition )

op ::= == | != | < | > | <= | >=

operand ::= attribute-name | literal

attribute-name ::= identifier
literal ::= intentionally left unspecified (strings, numbers, etc.)

projection ::= project ( attribute-list ) atomic-expr

attribute-list ::= attribute-name { , attribute-name }

renaming ::= rename ( attribute-list ) atomic-expr

union ::= atomic-expr + atomic-expr

difference ::= atomic-expr - atomic-expr

product ::= atomic-expr * atomic-expr

Queries generated from the above grammar compute new relations based on existing relations. Queries can also name those new relations. We need, however, some ways to create some initial relations (constituting a database), update the relations within the database, store the results of queries back to the database, and delete tuples from relations. We use the following commands for these purposes:

command ::= ( open-cmd | close-cmd | save-cmd | exit-cmd | show-cmd | create-cmd | update-cmd | insert-cmd | delete-cmd ) ;

open-cmd ::== OPEN relation-name
close-cmd ::== CLOSE relation-name
save-cmd ::== SAVE relation-name
exit-cmd ::== EXIT
show-cmd ::== SHOW atomic-expr
create-cmd ::= CREATE TABLE relation-name ( typed-attribute-list ) PRIMARY KEY ( attribute-list )

update-cmd ::= UPDATE relation-name SET attribute-name = literal { , attribute-name = literal } WHERE condition

insert-cmd ::= INSERT INTO relation-name VALUES FROM ( literal { , literal } )
| INSERT INTO relation-name VALUES FROM RELATION expr

delete-cmd ::= DELETE FROM relation-name WHERE condition

typed-attribute-list ::= attribute-name type { , attribute-name type }
type ::= VARCHAR ( integer ) | INTEGER
integer ::= digit { digit }

A program in our data manipulation language (DML) is then defined as:

program ::= { ( query | command ) }

Example

CREATE TABLE animals (name VARCHAR(20), kind VARCHAR(8), years INTEGER) PRIMARY KEY (name, kind);

INSERT INTO animals VALUES FROM ("Joe", "cat", 4);
INSERT INTO animals VALUES FROM ("Spot", "dog", 10);
INSERT INTO animals VALUES FROM ("Snoopy", "dog", 3);
INSERT INTO animals VALUES FROM ("Tweety", "bird", 1);
INSERT INTO animals VALUES FROM ("Joe", "bird", 2);

SHOW animals;

dogs <- select (kind == "dog") animals;
old_dogs <- select (age > 10) dogs;

cats_or_dogs <- dogs + (select (kind == "cat") animals);

CREATE TABLE species (kind VARCHAR(10)) PRIMARY KEY (kind);

INSERT INTO species VALUES FROM RELATION project (kind) animals;

a <- rename (aname, akind) (project (name, kind) animals);
common_names <- project (name) (select (aname == name && akind != kind) (a * animals));
answer <- common_names;

SHOW answer;

SAVE animals;

CLOSE animals;

EXIT;

Note that we made a distinction between queries and commands in the grammar of the DML. The result of a query is a view. A view is not stored in the database. Rather, it is a temporary relation whose lifetime ends when a DML program finishes. So only the updates caused by the commands persist from one DML program execution to another.

The relations themselves should be saved in a file in plain ASCII text, using the same DML described above (e.g., CREATE ... INSERT ... INSERT .... ). To make it simple, let us assume that each database file can only store one relation and the filename is the same as the relation name with the suffix ".db". To load a relation from a database file, use the OPEN command. Opening a nonexisting file will result in nothing. To save a new relation or view to a file, use the SAVE command (the filename will be by default "relationname.db"). If you have a new view that you want to save to a file, you can use this SAVE command. To save all changes to the relation in an existing database file and close, use can use the CLOSE command. NOTE: You have to determine the specific behavior of OPEN, etc. For example, if you opened on db file, changed something, and opened the same db file again, does the db on file overwrite what's in memory?

To exit from the DML interpreter, use the EXIT command.

To print a certain relation or a view, use the SHOW command.

PART II: DB Application

Interfacing your DB and your host programming language

Since the basic DB language you developed in part I does not include:

control flow (conditonal statements, loops, etc.)
I/O (keyboard input, arbitrary output)
etc.

it is not possible to write a DB app just using the DB language from part I. To overcome this shortcoming, you will have to write a hybrid code where the main language is C++ (the host language). The host program will provide most of the user interface: displaying menus, taking user input, and showing results. Based on these user inputs, a custom query or command string can be generated and passed on to the DBMS to be parsed and executed.

You may also need to retrieve the results of the queries to feed into the host language's control flow. The DBMS object can contain a member function to access the relations, views, and the attributes by their name (string).

This is what an example interaction might look like:

string name;
cin << name;

// create DB command on the fly
string query = string("") + 
               "answer <- project (age) ( select (kind == \"dog\" && name == " + name + ") animals )";

// pass on the query
rdbms.execute(query);

if (rdbms.relation(relation_name).int_field(field_name) == 10) {
	...
}

The Application:

For part II, you will design and develop a stand-alone DB application.

The DB application will be an old-school BBS (Bulletin Board System) with a text-based user interface. This will be a dramatically simplified version of the real thing.

The basic operations of a BBS include the following.

* Admin (commonly called "sysop")

Admin user's username is "admin".
Account control: The admin can set or modify his/her profile -- name, password, phone number, fax number, postal address.
Board management: The admin can create or delete boards.
Menu management: The admin can create, edit, or delete menu items from the main menu, based on core BBS function (for regular users, bbs navigation, article navigation, messaging, user list viewing, etc.).
Board navigation: The admin can see a list of boards and navigate to a specific board.
Article management: The admin can post, edit, or delete articles of his/her own, and also do the same for any user on the system.
User management: The admin can create, edit, or delete user accounts, or suspend or reactivate a user account.
Group management: The admin can create, edit, or delete groups, or suspend or reactivate a group. The admin can appoint a group manager among the users. The admin and the group managers are allowed to enable/disable access by regular users to boards that they manage.
Message management: The admin can send or receive messages to or from users on the system.
BBS log management: User logins are logged. Username, time stamp, etc. The admin should be able to view the logs and delete items.

* Users

Assume only one user (admin or any user) has access to the system at a time.
Users can create an account and password, with which to log in at a later time. The admin has to approve a new account requests.
Users can view and edit their profile, including group membership.
The user can view the main menu of the BBS and select functions: messaging, board reading, managing account profile, view list of users, etc.
The user can navigate back to the main menu from any point in the BBS.
Users can view the list and navigate between different boards.
Witin each board, the user can view the list of articles and view selected articles.
Users can post, edit, or delete articles (owned by oneself) on the different boards.
Users can view the list of groups and request to join or leave the group.
Users can view a list of messages they received, and may reply or delete the message.
Users can view a list of articles posted by a specific user (including him/herself).

* Basic objects in the system

Accounting information for admin, users, and groups.
Each board has a name and a brief description. Certain boards may only be accessible to users in a specific group.
Each user has the same kind of profile info as the admin.
Each group has a name and a brief description, and one or more boards exclusively available to the group.
Each article has an author, time stamp (last modified), board membership, and article length. You may only keep the latest text (no need to keep track of edits).
Each message has a sender and a receiver, time stamp, and text.

With the basic setting above, design a BBS specialized on a certain topic. It could be about gaming, sports, health, politics, economy, pets, whatever. Populate your BBS with believable boards and groups. All navigation will be through a character-based, scrolling interface (basically a simple menu printed in text that scrolls up the screen). A mock-up would look like as below. Bold would be the user input.

************************************************
** Welcome to Pirate's Guild BBS (since 1975) **
**                                            **
** Dial up: 979-845-54XX                      **
************************************************

** Enter BBS **
1. Login
2. Request new account

Enter choice: 1

** Login **
Enter username: jacksparrow
Enter password: Arrrrrr

***************
** Main Menu **
***************

1. Your account
2. Boards
3. Messages
4. Groups

Enter choice: 2

****************
** Board Menu **
****************

1. List board
2. Go to board (enter shortcut [first few characters])
M. Back to main menu

Enter choice: 1

****************
** Board List **
****************

1. Looting and salvaging
2. Ship rigging and repair
3. Naval intelligence
4. Curse and hex
5. Maps
6. Navigation tips
7. Carribean hot spots (ports, pubs, and inns)
8. Sword fighting
9. Medicine
10. Weather
11. Trading
12. Pirates hall of shame
13. Grog brewing 
14. Arms and armor
15. Fashion
16. Explosives
17. Deck scrubbing
18. On-board dining

M. Back to main menu
B. Back to board menu

Enter choice: 4

***************************
** Board: Curse and Hex  **
***************************

1. [Elizabeth Swann]     09/02/2015 "Calipso is bad omen dude..."
2. [Davie Jones]         09/01/2015 "Has anyone seen Calipso?"
3. [Jack Sparrow]        08/31/2015 "How do I use this darn compass?"
4. [Will Turner]         08/30/2015 "I got this wooden box with a heart in it."
5. [Guybrush Threepwood] 10/01/1990 "Hmm, a voodoo root elixir..."
...

l. List articles (latest)
s. Search articles (by username)
p. Scroll up one article.
n. Scroll down one article.
P. Scroll up one page.
N. Scroll down one page.
num. Select numbered article to read.
M. Back to main menu
B. Back to board menu

Enter choice: 4

...

Deliverables and Requirements

All teams must use github.tamu.edu. Give access to the TA and the Instructor.
Each team must maintain a development log (wiki page in github.tamu.edu titled "Development log") updated by the team members. This log will be graded. There is no designated format, except that you need to time stamp, write down the name, and write a brief description of the activity. We will check your daily progress.
Major routines should include unit testing.
Demo in the lab may be required.

Design documents: Follow the guidelines in Scott' Hackett's "How to Write an Effective Design Document" (Writing for a Peer Developer). Include all four sections described in the guide.
- Set up your design document ("Design document") as a wiki page in github.tamu.edu.
- The design document should cover both phase 1 (DB engine) and phase 2 (DB app testing).
- Documents for phase 2 should include ER diagram and corresponding relation schema, besides other things.
- Grading:
  1. 20%: all four sections included.
  2. 50%: Part I DBMS - parser, DB engine
  3. 30%: Part II DB app testing - ER diagram, relation schema, testing workflow (create, insert, ... )
DB core function code: Upload the core functions. These are C++ classes and functions that implement the core DB functionality. For example, function calls for creating a relation table data structure, etc. You don't need to link this with the parser just yet. You should be able to write a test program in C++ that directly calls the core DB functions create( ....), insert( ... ), etc.
1. 10%: layout, style, comments
2. 30%: commands (open, close, ..., delete)
3. 40%: queries (select, project, ..., product)
4. 10%: condition (conjunction, comparison, operators, etc.)
5. 10%: development log
Parser code: Upload your parser code. It should be able to accept or reject an arbitrary command or query: Accept syntactically correct command or query, and reject anything that violates the syntax.
1. 10%: layout, style, comments
2. 30%: commands (open, close, ..., delete)
3. 40%: queries (select, project, ..., product)
4. 10%: condition (conjunction, comparison, operators, etc.)
5. 10%: development log
Parser+DB engine integrated code: Upload your integrated parser + DB engine. The DBMS engine should compile into a stand-alone executable application so that when you run the application, it works as a DBMS command shell, where you can type in any command or query. Full DB functionality is expected (including file I/O). A full test of the DB shell, based on manually entered commands, should be conducted and the results included in the submission.
1. 10%: layout, style, comments
2. 20%: commands (open, close, ..., delete)
3. 30%: queries (select, project, ..., product)
4. 10%: condition (conjunction, comparison, operators, etc.)
5. 20%: full test of all DB commands -- input and output log
6. 10%: development log
Final project code + DB app + DB app demo + report:
- DB app should compile into a stand-alone executable.
- DB app demo should include a detailed operation of the DB app: input and output log.
- post production notes (changes you had to make to your design and why, difficulties, solutions, lessons learned). Make it a wiki page "Post production notes".
- individual work load distribution (percentage, must add up to 100%). Include this in the "Post production notes".
  - Formula for individual score calculation is as follows:
    individual score = min(sqrt(your percentage/25)*team_score,110)
    For example, if your contribution was 20% and your team score was 85, your individual score is min(sqrt(20/25)*85,110) = 76. Note that 25% is the baseline (equal contribution by all four members). If your contribution was 30% and your team score was 85, your individual score is min(sqrt(30/25)*85,110) = 93.
- Development log (wiki page).
- Final Grading:
  1. 5%: Layout, style, comments
  2. 40%: DBMS engine: completeness, functionality
  3. 10%: Test session log: completeness, accuracy
  4. 5%: Post production notes
  5. 10%: Development log
  6. 30%: Weighted grades from earlier submissions (design doc, parser, DB core function): this gives you some chance to make up for previous blunders.

Submission

All submissions should be through ecampus.tamu.edu
Design doc submission should be a single PDF file uploaded to eCampus. This will be a printout of your wiki page.
First, fork your latest project into an archival branch named: Submission 1, Submission 2, and Submission 3, etc. for the code submissions, respectively.
Use the "Download ZIP" feature in github and upload the resulting zip file.
As for the documents (development log, etc.), we will check the github project.
Late penalty is 1% per 1 hour. So, if you're late 1 day, you lose 24%.

Original concept/design/most of the text by Jaakko Järvi. Modifications by Yoonsuck Choe.