Version 6.1.11
Handout:
This section surveys some of the useful classes for reading and writing "disk" files*, in the java.io package. (* or in Unix/Linux, just files, which could include any kind of device, disk or otherwise)
For text files:
PrintWriter (p. 468, buffered output with println)
PrintWriter pw = new PrintWriter("drive.txt");
The File class represents a file path (p.469, "driving directions" to locate a file in the file system):
File driveFile = new File("/home/james/drive.txt");
Creating the File object does not "open" the file, or even depend on its real existence.File exists method tells whether the file is actually there on the disk.File.separatorA Scanner with a File argument (p. 470), e.g.,
Scanner scanin = new Scanner(driveFile);
Caution: If you use a file name (a String) instead of a File object, you get a scanner that reads from the string — the file name.)
Essential recipe for processing files:
Close the file:
pw.close();
scanin.close();
Most files are buffered, and if they are not closed, the last outputs may not be written physically to disk.
Programs as data: The source code of a program, in a text file, can itself be data for another program, such as a compiler, editor, formatter, etc.
Binary files (p. 473)
ObjectInputStream, ObjectOutputStream read/write objects.FileInputStream, FileOutputStream connect the object streams to files.Serializable interface makes objects writable and readable. If there is a linked structure of objects, then writing the structure to disk requires traversing it and putting it into some "linear" form (kind of like topological sort, except it may be cyclic). This is called serializing the object.
The Java compiler makes this easy for programmers. Just declare the class implements Serializable. This peculiar interface declares no methods, so there are none to define! The compiler takes care of the details.
Evaluation of binary files
Good:
They are efficient to read and write. E.g., numbers are already in binary form, so they don't need to be converted.
Easy to create in Java — just declare that classes implement the Serializable interface.
Bad:
They're binary! They are not easily understood by a person using a text editor. They require the program which can interpret them, or another program that "knows" the same binary file format.
Fate of files written in proprietary, closed (secret) binary formats. Microsoft Word ".doc" files, Visual Basic "DOT NOT"
Getting even: send somebody a binary attachment that they lack the program to read.
In Java, a new version of the program, or even the same version after recompiling, may be unable to read the saved object file. This can be avoided by declaring a serial version ID. Example: SObject.java
This seems to work okay in Java 1.6.0, even without the serial version ID, however.
(Skip)
Basic idea of external sorting by a k-way merge sort:
Repeat until there is no more data to be sorted:
Merge file0 and file1, writing the result to file2. Note: merge requires very little memory: just the next item in each of the merged files (well, more realistically, there would be a buffer for each file, but anyway it's constant size).
Replace file0 with file2
Deliver file0 as the result; erase other temporary files.
Any good O.S. will provide a system sort command. Use it for sorting files, unless there's a good reason not to.
Examples of the Unix/Linux sort command This sorts text files (not binary files) in various ways:
Sort some.data by alphabetically comparing entire lines, writing the result as sorted.data:
sort some.data > sorted.data
Sort it numerically, using numbers at the beginning of each line, replacing the file:
sort -n some.data > some.data
Sort in reverse numerical order, writing the result to stdout:
sort -r -n some.data
Sort on the second field numerically, resolving ties by sorting on the fourth through fifth fields alphabetically; fields are delimited by space; "k" is for "key":
sort -k 2,2n -k 4,5 some.data
Like the preceding, but with fields delimited by commas:
sort -k 2,2n -k 4,5 -t ',' some.data
Sort on the 5th through 20th characters of the first field:
sort -k 1.5,1.20 some.data
Sort output of program1 and deliver as input to program 2:
program1 | sort | program2
For more about the sort command:
info sort
(More info needed: does 'sort' sort internally when there's sufficient RAM, and externally otherwise? Does it sort externally at all?)
Anyway, they are balanced search trees, but generalized beyond binary search trees..
An m-way search trees is a generalization of binary search tree (where m = 2):
Illustration, with m = 6: 
Order property:
[And is it true that an ordered list is 1-way search tree? Each non-leaf node would have to have one child in such a tree. But no, because each node also would be required to contain 0 keys and 0 values.]
Search in an m-way search tree is an extension of the search in a binary search tree.
B-trees:
(Note: our m is Drake's 2 m.)
Insert 8 into a B-tree of order 5. See the following figures.
(1a) Initial B-tree, before inserting 8.
(1b) After inserting 8 in the leaf (4, 5, 6, 7). The leaf is now too full.
(1c) After splitting the leaf into (4, 5), 6, and (7, 8), and absorbing 6 into its parent. Now the parent node is too full.
(1d) After splitting the parent node into (2, 3), 6, and (9, 10), and absorbing 6 into the root.
(1e) After splitting the root. The tree is now one level higher.
Removing data from a B-tree. To remove a key k and its value:
Search for the element E to be deleted. If it is in an internal mode (not a leaf), replace it with a leaf element R, which is either E's inorder successor or its inorder predecessor, as in BST delete. Delete R from its leaf. This reduces all deletions to deletion from a leaf node. This brings up case 2.
If E is in a leaf node N, remove it from the leaf, and then:
If N is the root, we are done. (Write out the new root node to disk, if it is not empty.)
If N is still adequate in size (≥ ceil(m/2) subtrees), we are done. (Write out to disk.)
Otherwise, if N has a left sibling, let L = the nearest left sibling of N. Let P = their parent.
If L is not minimal, perform a "rotation" in which the rightmost key and subtree are absorbed into P, while the key and subtree in P which just precede N descend into N itself. See Figure 2.
(The "vacuum principle" is at work here: deletion of the key from N creates a vacuum which sucks down a key from P, which creates another vacuum which sucks up a key from L.)
Otherwise (L exists but is minimal), combine ("merge") nodes L and N into a single node, "sucking down" the key in P which is "between" them. See Figure 3.
Otherwise, N has no left sibling, so it must have a right sibling — why? Let R = the nearest right sibling of N, and let P = their parent.
(The details of d.i and d.ii are omitted; they are symmetric to c.i and c.ii, interchanging "left" and "right".)
There are possible variations.
Delete F from a B-tree of order 5. See figures below.
(2a) Initial B-tree.
(2b) After removal of F. The removal sucks E down to fill the place of F, and sucks D up to take the place of E.
Delete F from a B-tree of order 5. See figures below.
(3a) Initial B-tree
(3b) After removing F. The removal sucks E down to fill the place of F. Since (A, D) is so light, it is sucked all the way up and down.
From the B-tree (of order 5? order 4?) shown below, delete these values, in order:
A B-tree with order ...
Red-black trees as B-trees
Convert a 2-3-4 tree to red-black. See figures below.
(5a) The initial 2-3-4 tree.
+-----+
|b7 |
| \ |
| r14|
+-+---+
__________/ | \_______________
/ | \
+---+ +---+ +----------+
|b2 | |b10| | b18 |
|\ | | | | / \ |
| r5| | | |r16 r21|
+-+-+ +---+ +---+--+---+
___/ / \ / \ _/ | \ \______
/ / \ / \ / | \ \
+--+ +----+ +--+ +----+ +--------+ +---+ +---+ +-----+ +---------+
|b1| |b3 | |b6| |b8 | | b12 | |b15| |b17| |b19 | | b23 |
| | | \ | | | | \ | | / \ | | | | | | \ | | / \ |
| | | r4| | | | r9| |r11 r13| | | | | | r20| |r22 r24|
+--+ +----+ +--+ +----+ +--------+ +---+ +---+ +-----+ +---------+
(5b) After replacing each 2-3-4 node with a small red-black tree.
b7
_______/ \__________________
/ \
b2 r14
/ \__ ___________/ \____
/ \ / \
b1 r5 b10 b18
_/ \ ___/ \_ __/ \_____
/ | / \ / \
b3 b6 b8 b12 r16 r21
\ \ / \ _/ \ ___/ \____
\ \ / \ / \ / \
r4 r9 r11 r13 b15 b17 b19 b23
\ _/ \_
\ / \
r20 r22 r24
(5c) The red-black tree that remains, after removing the 2-3-4 scaffolding.