Module 01: Python

Revision history:
Version 7.1, 2011 Sep 1. Consolidated parts 1 and 2. Some parts still need updating for Python 3.
Version 7, 2009 Aug 31. These notes need updating for Python 3. Sorry!
Version 6, 2008 Sep 3. Added suggested exercises and solutions.
Version 5, 2008 Aug 25. Added recursive variant of an exercise in part 1C; revised solutions of part 1C.
Version 4, 2008 Aug 24. Added exercises and solutions.
Version 3, 2008 Aug 23. Split into two parts, added navigation links.
Version 2, 2007 Aug 27.

Reading

Reading: The Python Tutorial, Version 3, by Guido van Rossum.

This module is a review or an introduction, depending on whether you have previously studied Python.

Why Python?

Why Python for distributed programming? Python is widely regarded as a "scripting" language: programs (scripts) are easier to write than in languages like C and Java, which have to be compiled, but they run a whole lot slower because they are interpreted. Some would call Python an "agile" language, which means about the same thing but has a more positive spin.

The speed of a program is dominated by its slowest part. Some programs are CPU-bound, meaning that the CPU is busy almost all the time while other resources are somewhat idle. Some programs are I/O-bound, meaning that the CPU is relatively idle but the input and output devices work like crazy.

Typically, distributed (networked) applications are I/O-bound, because the network is usually slower than anything else. Even for an interpreted program, the speed at which the CPU executes statements far exceeds the speed of sending messages to a cooperating program on another machine. Data communication through the network is the bottleneck. (Of course, there are exceptions; some servers for distributed applications are CPU-bound.)

If you are trying to achieve speed, you get the most gain by speeding up what is slowest. Writing a distributed program in the fastest possible language (C, for example) will not speed it up appreciably if the program is I/O-bound. Conversely, writing it in a slow, interpreted language will not really hurt much. In most cases, writing a networked application in a "fast" language is probably a premature optimization.
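
A rough back-of-the-envelope calculation makes the point concrete. The sketch below assumes a hypothetical program whose running time splits into a CPU part and a network-wait part; the function name overall_speedup and all the numbers are made up for illustration, not measurements.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Return the whole-program speedup when only the CPU part gets faster.

    cpu_fraction -- fraction of running time spent on the CPU (0.0 to 1.0)
    cpu_speedup  -- factor by which the CPU part is sped up
    """
    io_fraction = 1.0 - cpu_fraction              # waiting time is unchanged
    new_time = io_fraction + cpu_fraction / cpu_speedup
    return 1.0 / new_time

# Hypothetical I/O-bound program: 10% CPU, 90% network wait.
# Making the CPU part 50 times faster barely helps.
print(round(overall_speedup(0.10, 50), 2))        # prints 1.11

# Hypothetical CPU-bound program: 90% CPU, 10% network wait.
print(round(overall_speedup(0.90, 50), 2))        # prints 8.47
```

Under these assumed proportions, rewriting the I/O-bound program in a language 50 times faster buys about 11%, while the same rewrite of the CPU-bound program makes it more than eight times faster.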

The other bottleneck is the effort required to write the program. One can write a program far more quickly in an agile language like Python than in a traditionally compiled language like C, C++, Java, or COBOL.

Finally, if you do need to write the program for maximum CPU efficiency, you'll probably write it in C. As it turns out, the Python APIs for distributed programming -- facilities like sockets, for example -- are very close to the C language APIs. So, much of what we learn in this course using Python is directly applicable to distributed programs written in C.
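
To give a feel for how closely the Python socket module follows the C calls socket(), bind(), listen(), connect(), accept(), recv(), and close(), here is a minimal sketch that echoes one short message over the loopback interface inside a single process. The function name echo_once is invented for this example, and the code assumes the message is small enough to arrive in a single recv() call, which a real program should not rely on.

```python
import socket

def echo_once(message):
    """Send message (bytes) to a local listener and return the echoed reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]

        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # On loopback the connection is queued by the OS, so connect()
            # returns before accept() is called.
            client.connect(("127.0.0.1", port))
            server_side, _address = listener.accept()
            try:
                client.sendall(message)            # sendall repeats send() until done
                received = server_side.recv(1024)  # assumes one recv() is enough
                server_side.sendall(received)      # echo it back
                return client.recv(1024)
            finally:
                server_side.close()
        finally:
            client.close()
    finally:
        listener.close()

print(echo_once(b"hello"))                    # prints b'hello'
```

Apart from sendall(), which is a Python convenience, each method here has a C function of the same name that takes the socket as its first argument.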

Python 2 and 3

There are incompatible changes between Python 2 and Python 3. Python 3 is a better language, but some software is not yet compatible with Python 3. The textbook, in spite of its recent revision, still uses Python 2. I am going to use Python 3 as much as possible in class, but it may not always be possible to do so.

Significant changes in Python 3 include:

Change                                              Python 2          Python 3
print is a function instead of a statement.         print 6, 7, 8     print(6, 7, 8)
Strings are Unicode by default.                     u'hello'          'hello'
Strings are not bytes, and socket I/O uses bytes.   'hello'           b'hello'
Major improvements in class definition and the object system.
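
The first three changes can be seen in a few lines of Python 3. This is only a small sketch; the variable names are arbitrary, and UTF-8 is assumed as the encoding.

```python
# print is an ordinary function in Python 3.
print(6, 7, 8)                      # prints 6 7 8

text = 'hello'                      # str: a sequence of Unicode characters
data = b'hello'                     # bytes: what a socket sends and receives

print(type(text).__name__)          # prints str
print(type(data).__name__)          # prints bytes
print(text == data)                 # prints False: str and bytes never compare equal

# Convert explicitly at the boundary between program and network.
encoded = text.encode('utf-8')      # str -> bytes, before sending
decoded = data.decode('utf-8')      # bytes -> str, after receiving
print(encoded == data)              # prints True
print(decoded == text)              # prints True
```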

For more information, see the Python Wiki article "Should I use Python 2 or Python 3 for my development activity?"

Highlights of the Python Tutorial

Sections 1–5

  1. Expressions and statements, control flow, functions.
  2. Input and output.
  3. You should be very familiar with strings, including string operations and methods, because you'll need this to parse the input when reading from a socket.
  4. You should be very familiar with lists, tuples, and dictionaries, because they are handy containers for data.
  5. Tuples are like lists, but immutable. Tuples are enclosed with ()'s; lists, with []'s. A one-tuple containing the string "horse" must be written with a trailing comma: ("horse",). If that is way too ugly, you can write a list and convert it to a tuple: tuple(["horse"]).
  6. The functions filter, map, and reduce (in Python 3, reduce must be imported from the functools module), and list comprehensions (Sec. 5.1.3), are handy for operating on lists.
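
Several of the points above can be tried out in a few lines of Python 3. The sketch below is illustrative only; the request line "GET /index.html" and all the variable names are made up for the example.

```python
from functools import reduce            # reduce is not a builtin in Python 3

# Parsing a line of text, as one might after reading from a socket.
line = "GET /index.html\r\n"
command, path = line.strip().split()
print(command, path)                    # prints GET /index.html

# The trailing comma, not the parentheses, makes a one-tuple.
one_tuple = ("horse",)
just_a_string = ("horse")
print(type(one_tuple).__name__)         # prints tuple
print(type(just_a_string).__name__)     # prints str
print(tuple(["horse"]) == one_tuple)    # prints True

# In Python 3, filter and map return iterators, so wrap them in list().
numbers = [1, 2, 3, 4, 5, 6]
evens = list(filter(lambda n: n % 2 == 0, numbers))
squares = list(map(lambda n: n * n, numbers))
total = reduce(lambda a, b: a + b, numbers)
print(evens)                            # prints [2, 4, 6]
print(squares)                          # prints [1, 4, 9, 16, 25, 36]
print(total)                            # prints 21

# A list comprehension combines the filter and map steps.
even_squares = [n * n for n in numbers if n % 2 == 0]
print(even_squares)                     # prints [4, 16, 36]
```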

Sections 6–12

  1. Errors and exceptions. Very important in network programming, because lots of things can go wrong (network errors).
  2. Modules and classes.

    You need to know about modules, but for this course, you do not need to know (very much) about classes, i.e., the object-oriented side of Python. Classes are wonderful, but for those new to Python, they can be confusing. There is very little use of classes in the textbook, and I will try to steer away from them in lecture. You are welcome to use classes in your homework, if you are comfortable with them, but it is not required.

  3. The Python Standard Library has plenty of goodies. It is worthwhile to have a general sense of what it provides, so that you don't have to re-invent it.
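
As an illustration of the first item above, here is a minimal sketch of catching a network error. The helper names try_connect and unused_local_port are invented for this example. The code assumes that a local port which was just bound and released has no listener, so the connection attempt is expected to fail.

```python
import socket

def try_connect(host, port, timeout=2.0):
    """Try to open a TCP connection; return (True, None) or (False, reason)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
    except OSError as err:
        # OSError covers refused connections, timeouts, unreachable hosts, etc.
        return False, str(err)
    else:
        return True, None
    finally:
        sock.close()                    # runs whether or not connect() succeeded

def unused_local_port():
    """Return a port number that (very probably) has no listener."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    port = probe.getsockname()[1]
    probe.close()
    return port

ok, reason = try_connect("127.0.0.1", unused_local_port())
print(ok)                               # expected False: nothing is listening
print(reason)                           # the exact message depends on the OS
```

Catching OSError rather than one specific subclass keeps the sketch short; a real program might handle ConnectionRefusedError and timeouts separately.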

Detailed Notes, Examples, and Suggested Exercises

  1. Numbers, functions, control, and I/O
  2. Executable scripts, strings, more on functions
  3. Data collections: lists, stacks, queues, list comprehensions, tuples, sequences, sets, dictionaries
  4. Modules, I/O, files, exception handling
  5. Classes: old and new-style
  6. The Python standard library