Topics: redundancy, operations and queries on more than one table, views, "physical" and "logical" database.
Extend the pet shop database from Databases I to show which department each pet is in and who manages each department:
| petID | name | … | price | dept | deptMgr |
|---|---|---|---|---|---|
| P1 | (no change) | Canine | Fred | ||
| P2 | Feline | Maxine | |||
| P3 | Rodent | Harry | |||
| P4 | Rodent | Harry | |||
| P5 | Canine | Fred | |||
| P6 | Canine | Fred | |||
| P7 | Feline | Maxine | |||
| P8 | Avian | Harry | |||
What happens if …
How do we feel about these changes? What problems might result?
The cause of the problems is redundancy, the storage of the same information in multiple places. (Address/telephone changes)
The cure is to divide our single table database into multiple tables (at least two, possibly three). We then will have a new problem: our query must draw data from 2 or 3 tables. Before changing the pet database, let's see how we could operate on two tables to combine their information.
The product (×) and natural join ("bowtie" shape) operations.
Consider two tables, A and B:
| size |
|---|
| L |
| S |
| color |
|---|
| yellow |
| blue |
| red |
What are all the combinations of rows of A and B? The × operation gives us the answer: A × B =
| color | size |
|---|---|
| yellow | L |
| yellow | S |
| … (4 more rows) | |
But, frequently, × gives us more information than we want. Consider tables C and D:
| ID | size |
|---|---|
| shirt1 | L |
| dress2 | S |
| ID | color |
|---|---|
| shirt1 | yellow |
| dress2 | blue |
| socks3 | red |
Here is the product:
| C.ID | color | D.ID | size |
|---|---|---|---|
| shirt1 | yellow | shirt1 | L |
| shirt1 | yellow | dress2 | S |
| … (4 more rows) | |||
(Note that since we have ID columns from both tables, we rename them as C.ID and D.ID to avoid ambiguity.)
If we want to know what color and size each item is, we need to pick the rows where C.ID = D.ID:
| C.ID | color | D.ID | size |
|---|---|---|---|
| shirt1 | yellow | shirt1 | L |
| dress2 | blue | dress2 | S |
This combination of operations is so important that relational databases give us a new operation, called the natural join (bowtie symbol), that does this — and a little more: it keeps only one of the identical columns C.ID and D.ID and calls it just "ID":
| ID | color | size |
|---|---|---|
| shirt1 | yellow | L |
| dress2 | blue | S |
A × B is written in SQL as select * from A, B;
A natural join B is written in SQL as select * from A natural join B;
We now split our data into two tables: the pet table and the department table.
| deptID | deptName | deptMgr |
|---|---|---|
| DCAN | Canine | Fred |
| DFEL | Feline | Maxine |
| DROD | Rodent | Harry |
| DAVI | Avian | Harry |
| petID | name | … | price | deptID |
|---|---|---|---|---|
| P1 | (no change) | DCAN | ||
| P2 | DFEL | |||
| P3 | DROD | |||
| P4 | DROD | |||
| P5 | DCAN | |||
| P6 | DCAN | |||
| P7 | DFEL | |||
| P8 | DAVI | |||
Now it's easy to make those managerial changes….
Before going back to our original query, let's try just the × and natural join operations in SQL on our pet shop database:
The × query:
select * from pet, dept;
The natural join query:
select * from pet natural join dept;
Some database systems, including Microsoft Access (and Openoffice.org) do not implement the standard SQL operation natural join, so you must work around the omission. Here is one way:
select petID, name, breed, sex, weight, price,
dept.deptID as deptID, deptName, deptMgr
from pet, dept
where pet.deptID = dept.deptID;
Snyder covers another work-around, using inner join … on …, which may be less easy to understand.
Now that's a mouthful of typing, so to avoid complexity in our queries we can set up the natural join of our two tables as a view. Define the view "petDeptView" based on the query above.
A view is a "virtual relation" which is not physically stored as a table; instead it is stored as a query, which is recomputed whenever we need the view.
Back to our query, which departments have pets weighing 7 pounds or less? To answer that, We can use natural join —
select deptName from pet natural join dept where weight < 7;
or the petDeptView to answer:
select deptName from petDeptView where weight < 7;
Snyder distinguishes between the physical database — the collection of all it tables that are stored as data — and the logical database — which includes the views as well as the physical tables.