Definition: a min heap is a binary tree (ordinarily not a binary search tree!) satisfying:
The shape property: the tree is complete, i.e., perfect except that a contiguous group of rightmost nodes may be missing from the bottom level.
The order property: each node is less than or equal to each of its children.
(Recall: perfect means all leaves are on the same level, and all non-leaves have two children.)
We will use the word heap by itself to mean a min heap.
A max heap is like a min heap, except that each node is greater than or equal to its children.
The order property implies that the root is the minimum element in a min heap, or the maximum element in a max heap.
The shape property implies that the height of a heap of N nodes is Θ(log N).
(Defer the representation issues, pp. 370-371).
A priority queue is a sequence of objects ordered by "priority". (Drake takes the priority to be just the object itself.)
The operations on a priority queue are:
insert: add an object to the queue.
removeMin: remove and return the minimum object.
Heaps can be implemented so that both of these operations run in time Θ(log N).
Heap sort is a comparison sort with worst-case running time Θ(N log N).
Given a list of data to sort, create an empty heap. Insert the list elements one by one into the heap. Then, begin removing and outputting the minimum elements (roots) of the heap; continue doing so until the heap becomes empty.
Question: why is heap sort's running time Θ(N log N)?
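As a sketch of the procedure just described, here it is with java.util.PriorityQueue standing in for the heap (HeapSortDemo and heapSort are names made up for illustration, not Drake's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class HeapSortDemo {
    // Heap sort as described: N inserts, then N removals of the minimum.
    static List<Integer> heapSort(List<Integer> data) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // a min heap
        for (int x : data)
            heap.add(x);                  // N inserts, each O(log N)
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty())
            out.add(heap.remove());       // N removals of the root (minimum)
        return out;
    }
}
```

The comments hint at the answer to the question above: each of the 2N heap operations takes O(log N) time.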
Because of its shape, a heap can efficiently be represented as an array (or array list). Number the nodes in the order of level order traversal:
The numbers in parentheses are the node numbers. Then, store each node in an array, using the node number as the array index:
We also need to keep track of the heap's size, as in an ArrayList. In fact, we can just use an ArrayList.
To operate on this data structure as a heap, we need to be able to find the parent and children of any node. The following formulas relate the node number (array index) of a parent and its children, using 0-based node numbers: if a node is numbered I, its parent is numbered P(I), and its left and right children are numbered L(I) and R(I), where
P(I) = (I − 1) / 2 (integer division; the root, node 0, has no parent)
L(I) = 2I + 1
R(I) = 2I + 2
In both L(I) and R(I), the condition is that the result is within the heap's size; otherwise, the node has no such child.
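In Java, with 0-based node numbers, these become one-liners (HeapIndex is a hypothetical helper, not from Drake):

```java
public class HeapIndex {
    // 0-based node numbering, as when the heap is stored in an ArrayList.
    static int parent(int i) { return (i - 1) / 2; } // meaningful only for i > 0
    static int left(int i)   { return 2 * i + 1; }   // a child only if < heap size
    static int right(int i)  { return 2 * i + 2; }   // a child only if < heap size
}
```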
By clever coding, we can implement heap sort on an array (array list) using no extra space, by embedding the heap in the array itself: one part of the array holds the heap, and the rest holds the remaining data. The heap part is initially empty:
To removeMin(), we can almost just remove the root. The time for that would be Θ(1). But, then we no longer have a heap—nor even a tree!
To rebuild the heap, we move the rightmost element of the bottom level to root position. This restores the shape, but may violate the order property.
To restore order, we (call in the National Guard and) let node = root; compare node with its children, and if node is greater than either child, swap node with its smaller child. Repeat, with node in its new position, until node finds its proper place (i.e., node ≤ each child).
We now have the right shape and the right order.
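A minimal sketch of removeMin on a heap stored in an ArrayList, assuming 0-based node numbers (MinHeapRemove is a made-up name, not Drake's code):

```java
import java.util.ArrayList;

public class MinHeapRemove {
    static int removeMin(ArrayList<Integer> heap) {
        int min = heap.get(0);                       // the root is the minimum
        int last = heap.remove(heap.size() - 1);     // rightmost node, bottom level
        if (!heap.isEmpty()) {
            heap.set(0, last);                       // move it to the root: shape restored
            int i = 0;                               // sift down to restore order
            while (true) {
                int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
                if (left < heap.size() && heap.get(left) < heap.get(smallest))
                    smallest = left;
                if (right < heap.size() && heap.get(right) < heap.get(smallest))
                    smallest = right;
                if (smallest == i) break;            // node <= each child: done
                int tmp = heap.get(i);               // swap node with its smaller child
                heap.set(i, heap.get(smallest));
                heap.set(smallest, tmp);
                i = smallest;
            }
        }
        return min;
    }
}
```

Each iteration of the loop descends one level of the tree, so the loop runs O(log N) times.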
The new node initially goes in at the leftmost unoccupied position on the bottom level, as required by the shape property (creating a new level, if needed).
But this may violate the order property. To restore order, we compare the new node with its parent, and, if node < parent, swap them. Repeat with the node in its new position until it finds its place (node ≥ parent).
Question: How long does insert take, including the time to restore the heap?
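A corresponding sketch of insert, again on an ArrayList with 0-based node numbers (MinHeapInsert is a made-up name, not Drake's code):

```java
import java.util.ArrayList;

public class MinHeapInsert {
    static void insert(ArrayList<Integer> heap, int x) {
        heap.add(x);                     // leftmost unoccupied spot, bottom level
        int i = heap.size() - 1;
        while (i > 0) {                  // sift up to restore order
            int p = (i - 1) / 2;         // parent's node number
            if (heap.get(i) >= heap.get(p)) break;  // node >= parent: done
            int tmp = heap.get(i);       // node < parent: swap them
            heap.set(i, heap.get(p));
            heap.set(p, tmp);
            i = p;                       // repeat from the node's new position
        }
    }
}
```

The loop climbs one level per iteration, which hints at the answer to the question above.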
Two sets are disjoint if they have no elements in common. For example, every regular student in an undergraduate college must be either a freshman, sophomore, junior, or senior. So the four classes are disjoint sets.
We will defer consideration of this topic until chapter 15 on graphs.
Drake calls these "digital search trees",² but the more common term is trie. Edward Fredkin, the inventor of tries, took this name from the word "retrieve" and pronounces it like "tree", but most people say it like "try". Tries are also called prefix trees, which to me is a more descriptive name.
Tries are useful when the keys are sequences, such as strings. Note that comparing two strings of lengths M and N has worst-case time Θ(min(M, N)), whereas comparing two numbers (ints) is Θ(1). A trie is used in the Ghost game because we need to find words beginning with a given prefix (i.e., complete a partially spelled word) and to recognize when a word is complete. A trie can also be used in Scrabble, for similar reasons.
Main idea: keys are represented as paths through the tree instead of being stored within a node.
A dictionary of words is a good example:
A black node marks the end of a word, but there can be another word that continues from that node.
Start at the root. For each item in the key sequence, take the branch corresponding to that item, creating a new branch if needed. After the last item, mark the node (X).
Start at the root. For each item in the key sequence, take the branch corresponding to that item, but if the branch does not exist, immediately fail. On reaching the end of the key sequence, if there's a mark there, then succeed; otherwise fail.
To find all completions of a prefix, follow the lookup procedure to the end of the prefix, and then return the keys formed from all paths continuing through that point.
Note that these ideas could easily be extended to a map by associating the end-of-key marker with more data (instead of just true or false).
Drake uses a map at each node to associate key items (such as letters) with children (branches). Alternatively, we could use an array if the number of possible key items is small, but this might require more space.
Assuming a good hash table for the branch map in each node, lookup and insert are both Θ(L), where L is the maximum length of a search key. Why?
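As a sketch of the insert, lookup, and completion procedures above, with a HashMap as the branch map at each node (class and method names are mine, not Drake's):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Trie {
    // Each node represents one prefix; the map sends each next letter to a branch.
    private final Map<Character, Trie> children = new HashMap<>();
    private boolean endOfWord;   // the mark (a "black" node / X)

    // Insert: walk the key, creating branches as needed; mark the last node.
    void insert(String word) {
        Trie node = this;
        for (char c : word.toCharArray())
            node = node.children.computeIfAbsent(c, k -> new Trie());
        node.endOfWord = true;
    }

    // Lookup: walk the key; fail immediately on a missing branch;
    // succeed only if the final node is marked.
    boolean contains(String word) {
        Trie node = this;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.endOfWord;
    }

    // Completions: walk to the end of the prefix, then collect the keys
    // formed by every marked node reachable from that point.
    List<String> completions(String prefix) {
        Trie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of();
        }
        List<String> out = new ArrayList<>();
        node.collect(prefix, out);
        return out;
    }

    private void collect(String path, List<String> out) {
        if (endOfWord) out.add(path);
        for (Map.Entry<Character, Trie> e : children.entrySet())
            e.getValue().collect(path + e.getKey(), out);
    }
}
```

Both insert and contains visit at most L nodes and do O(1) expected work per node in the hash map, which is where the Θ(L) bounds come from.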
Suffix trees are like prefix trees, but the keys are stored in reverse order, making it easy to answer questions like what words end in "-lay"?
What if the keys are not sequences, but can be serialized with a maximum length? For example, 32-bit integers can be serialized into 32 bits. Wouldn't this allow us to use a trie with Θ(L) = Θ(1) time for insert and lookup time?
Why "digital"? Isn't every computer search tree digital, since computers are digital machines? Does he mean some analogy to the human hand, with the palm branching out to the fingers? But isn't every tree like that? Or is it because of Fredkin's interest in "digital physics" and "digital philosophy"? ↩