Version 5.2.21
(Drake, sec. 14.4)
A red-black tree (RBT) is a height-balanced binary search tree. Height-balanced means (roughly) that the left and right subtrees are "almost" the same height, in a way that guarantees the height of the whole tree is Θ(log N).
In addition to the order property required for all BSTs, a RBT has the following requirements:
Compare these with Drake's 3 conditions on pp. 389-390.
Note that while it is required that every subtree of a red-black tree is a binary search tree, it is not required that every subtree be a red-black tree. In fact, some subtrees will have red roots, which disqualifies them from being red-black trees in themselves.
The red and black conditions imply that the longest path from root to leaf is no more than twice the length of the shortest path from root to leaf.
Think about it. The shortest possible path would consist of all black nodes. The longest possible path would have to have alternating red and black nodes, since every red node has a black parent. Consequently, its length is double that of the shortest possible path. Of course, in a particular RBT, the shortest actual path might not be the shortest possible, and the longest actual path might not be the longest possible.
It is in this sense that a RBT is "balanced". It is (usually) not perfectly balanced, like a "perfect" binary tree. (Ah, so that's why those are called "perfect".) But it is reasonably well balanced, enough to guarantee the following:
Theorem: The height of a RBT containing N nodes is not more than 2 ceil(log(N + 1)), and is therefore Θ(log N).
(Proof omitted).
Search is the same as in BST.
Insertion is in some ways like a BST, and in some ways like a heap. As in a BST, the insertion is made (initially) at a leaf, the same place that would be found by the search procedure. As in a heap, this initial insertion may violate a required condition, and we make some adjustments to restore the condition.
The place for a new node to be inserted is always a leaf, and is found in the same way as for a plain BST.
And we always insert a red node.
Consequently, the insertion cannot violate the black property (inserting a red node does not change the number of black nodes on a path from root to leaf), and we only have to worry about the red property being violated.
Since the new node is red, if it has a red parent, we are in violation of the red property and must repair the damage.
Damage is repaired by a combination of tools, including color changes and "rotations". A rotation rearranges the positions of some of the nodes. For the details of the rotations and color changes, I will diverge from the textbook and cover a simpler method that works for purely functional red-black trees.
If the red condition is violated below the root, then there must be a black node with a red child and a red grandchild. There are four variations of this:
See diagram for details.3
| View SVG | View PNGInsert these values into an initially empty RBT: −8, 0, 6, −25, −17, −10, −15 (or some similar stream of values).
incrTest [1..15] or incrTest [1..31], for example.Deletion is again like in a BST, but the initial deletion of a node may violate either the red or the black condition, and we have to perform more adjustments to restore it to a red-black tree. We omit the details, which are complex.
Since the presentation has taken a functional approach, let's implement it in Haskell.
(Currently, this will not run on merlin because we do not have graphviz installed.)
It seems cruel to do this in Java, by comparison, but here it is. The Java implementation does show some more detail in the visualization---showing each step in rebuilding the tree after an insertion.
Since I have completely redone my Java implementation of functional binary trees and binary search trees, I am providing links to all the source files:
(Currently, this will not run on merlin because we do not have graphviz installed.)
Utilities
Empty subtrees in a red-black tree are sometimes considered as leaves with no data, and called "nils" or "nulls". This is silly and apt to be confusing, because null refers to no object. If a child pointer is null, then there is no child. Although I will not speak in this confusing way, you should know that some people, including Drake, do so. To complicate matters further, they usually say that these "null children" are black nodes. If we say that an empty subtree is a black node, then this adds one to the number of black nodes (the "black height") along every path from the root to an empty subtree. This doesn't change anything, of course: if all black heights were equal before adding 1, then they are all equal after adding 1.↩
Diagram made with Inkscape. Based on a diagram by Chris Okasaki in Purely Functional Data Structures.↩