CSCI-C243 Lab 4
Recursion: Quicksort

Due date: 2012 Nov 5
50 points

Improving Quicksort

Implement the improvement to quicksort described by Drake in Exercise 9.15, page 244. In the header comments, identify Peter Drake as the original author and yourself as the reviser of the program, because that's who did what. Use other comments to highlight the sections which you added or changed.

Test your improved quicksort to be sure it sorts correctly.

Provided Code

The Rand class defined in Rand.java can be used to generate randomly ordered array data.

Experiment

Using the experimental techniques presented in Chapter 7 and in lecture, including the Linux time command, measure the running times of the original (author-provided code) and improved quicksort methods:

  1. for randomly generated arrays of numbers;
  2. for arrays of numbers generated in ascending order.

Note: With sufficiently large N, an ascending order quicksort will overflow the stack. If that occurs, try to increase the stack size, and determine the maximum value of N which can be sorted. Report this information along with the stack size you used.

To set the stack size in the Sun java interpreter use the command option -Xss. For example, to set the stack size to 10 M (10,000,000):

$ java -Xss10m SortTest …

Hint: After you sort a randomly generated array, it's no longer in random order. Once it's sorted, it's sorted! If you're going to repeat the sorting in a loop, create a new array each time.

Use a variety of sizes. Take at least three observations for each size N and report the average of the observed times.

Be sure the lists are large enough to allow accurate timing. To ensure accurate measurement, your running times should be at least 1/2 second and may be up to 10 seconds or more; you may need quite a large array to get so slow a running time.

Try to determine the running time just to sort the array, not including things like array initialization and output. Array initialization may take a significant time for large arrays. If you are using the Java System.currentTimeMillis(), simply take the two time measurements immediately before after the sorting method is called. If you use the Linux time command, run the program without sorting (i.e., just initializing the array and whatever else the program does) for each size N, and subtract these times from the total observed times for size N.

Example. If the time to run the program without sorting for size N is 1 second, and the time to run the program with sorting is 10 seconds, then report 9 seconds as the sorting time.

  1. Make a table of the observed sorting times in this format:
    Improved Quicksort Running Times
    N Time(N) (seconds) Number of Observations
  2. Also, graph the running times, plotting N on the x-axis and the time on the y-axis.
  3. Answer the following questions:
    1. Do the running times agree with the prediction that the average running time for quicksort is Θ(N log N)?
    2. Do the running times agree with the prediction that the worst case running time for quicksort is Θ(N2), and that, for the original version, this worst case occurs when the list is already sorted?
    3. What differences do you observe between the running times of the two different implementations on similar data? For some interesting size(s) of N, answer by describing one method as "k times faster" or "100 k percent faster" than the other.

    Hint: To determine whether observed times agree with theoretical predictions, look at the ratios of running times for different sizes of N. For example, if the time is supposed to be quadratic, time(N) = Θ(N2), then for large N and for some constant k, time(N) = approximately k N2. So if N2 = 2 N1, then time(N1) = k N2, and time(N2) = time(2 N1) = k (2 N1)2 = 4 (N1)2. That is, if N2 / N1 = 2, then time(N2) / time(N1) = 4, since 4 = 22. Similarly, if the ratio of the sizes is 10 to 1, then the ratio of the times should be 100 to 1. The same ideas apply to Θ(N log N) times, although, of course, the formulas will differ.

What to Turn in

  1. A report which contains the table(s) and graph(s) and answers to the three questions. Don't just present raw numbers; wrap some prose around the data.
  2. A list of the names of all source files which are needed to compile and run the program, including source files provided by the instructor or the textbook author which you did not modify, but not including files from the Java distribution (i.e., standard Java libraries).
  3. Attach the source code of each file that you authored or modified.

Grading

Point Distribution
Item Points
Report: technical correctness 15
Report: narrative, English grammar, style, etc. 5
Source code: correctness 25
Source code: style 5
TOTAL 50

Revisions

Version 5.2, 2012 Oct 21. Revised dates.
Version 5.1, 2010 Oct 26.
Version 5, 2009 Oct 22.
Version 4, 2008 Oct 16. Added directions concerning stack overflow.
Version 3, 2008 Oct 6. Revised point allocation to make the total 50.