Reading: The Python Tutorial, Version 3, by Guido van Rossum.
Review or introduction, depending on whether you have previously studied Python.
Why Python for distributed programming? Python is widely regarded as a "scripting" language. This means that programs (scripts) are easier to write than in compiled languages like C and Java, but run a whole lot slower, because they are interpreted. Some would call Python an "agile" language, which means about the same thing but has a more positive spin.
The speed of a program is dominated by its slowest part. Some programs are CPU-bound, meaning that the CPU is busy almost all the time while other resources are somewhat idle. Some programs are I/O-bound, meaning that the CPU is relatively idle but the input and output devices work like crazy.
Typically, distributed (networked) applications are I/O-bound, because the network is usually slower than anything else. Even for an interpreted program, the speed at which the CPU executes statements far exceeds the speed of sending messages to a cooperating program on another machine. Data communication through the network is the bottleneck. (Of course, there are exceptions; some servers for distributed applications are CPU-bound.)
If you are trying to achieve speed, you get the most gain by speeding up what is slowest. Writing a distributed program in the fastest possible language (C, for example) will not speed it up appreciably if the program is I/O-bound. Conversely, writing it in a slow, interpreted language will not really hurt much. In most cases, writing a networked application in a "fast" language is probably a premature optimization.
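The reasoning above is usually summarized as Amdahl's law (a name the notes themselves do not use). Here is a minimal sketch with made-up numbers: assume a hypothetical I/O-bound program that spends 95% of its time waiting on the network and 5% executing Python statements, and compare making the CPU part 10 times faster against making the network part only 2 times faster.

```python
def overall_speedup(fraction_improved: float, factor: float) -> float:
    """Amdahl's law: overall speedup when only part of the run time gets faster.

    fraction_improved: share of the original run time spent in the part we speed up
    factor:            how many times faster that part becomes
    """
    # Normalize the original run time to 1.0; the untouched part keeps its time.
    new_time = (1.0 - fraction_improved) + fraction_improved / factor
    return 1.0 / new_time

# Hypothetical split: 5% CPU, 95% network waiting.
print(round(overall_speedup(0.05, 10), 3))  # CPU part 10x faster    -> 1.047
print(round(overall_speedup(0.95, 2), 3))   # network part 2x faster -> 1.905
```

Rewriting the CPU-bound 5% in a language ten times faster buys less than a 5% improvement overall, while even a modest gain on the slow part nearly doubles the speed.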
The other bottleneck is the effort required to write the program. One can write a program far more quickly in an agile language like Python than in a traditionally compiled language like C, C++, Java, or COBOL.
Finally, if you do need to write the program for maximum CPU efficiency, you'll probably write it in C. As it turns out, the Python APIs for distributed programming -- facilities like sockets, for example -- are very close to the C language APIs. So much of what we learn in this course using Python is directly applicable to distributed programs written in C.
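To illustrate how closely the Python socket API follows the C one, here is a minimal loopback echo sketch. The address 127.0.0.1, the 1024-byte buffer, and the helper name `echo_once` are arbitrary choices for illustration; the comments name the C calls each Python call corresponds to.

```python
import socket
import threading

def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and send back whatever bytes arrive."""
    conn, _addr = server_sock.accept()           # C: accept()
    with conn:
        data = conn.recv(1024)                   # C: recv()
        conn.sendall(data)                       # C: send(), repeated until all bytes go out

# Server side: socket() -> bind() -> listen(), the same sequence as in C.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))                    # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=echo_once, args=(server,))
t.start()

# Client side: socket() -> connect() -> send() -> recv().
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello")                     # socket I/O uses bytes, not str
    reply = client.recv(1024)                    # a tiny message on loopback arrives in one piece

t.join()
server.close()
print(reply)                                     # b'hello'
```

A robust program would loop on `recv()` until it has a complete message, exactly as a C program must; the sketch skips that to keep the correspondence with the C calls visible.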
There are incompatible changes between Python 2 and Python 3. Python 3 is a better language, but some software is not yet compatible with Python 3. The textbook, in spite of its recent revision, still uses Python 2. I am going to use Python 3 as much as possible in class, but it may not always be possible to do so.
Significant changes in Python 3 include:
| Change | Python 2 | Python 3 |
|---|---|---|
| `print` is a function instead of a statement. | `print 6, 7, 8` | `print(6, 7, 8)` |
| Strings are Unicode by default. | `u'hello'` | `'hello'` |
| Strings are not bytes, and socket I/O uses bytes. | `'hello'` | `b'hello'` |
| Major improvements in class definition and the object system. | | |
For more information, see the Python Wiki article "Should I use Python 2 or Python 3 for my development activity?"
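The first three rows of the table can be seen directly in a Python 3 interpreter. This short sketch uses only built-ins; the variable names are illustrative.

```python
# print is an ordinary function in Python 3, so it accepts keyword arguments.
print(6, 7, 8)                 # 6 7 8
print(6, 7, 8, sep=", ")       # 6, 7, 8

text = "hello"                 # str: a sequence of Unicode characters
data = text.encode("utf-8")    # bytes: what actually travels over a socket
print(type(text).__name__, type(data).__name__)  # str bytes
print(data)                    # b'hello'
print(data.decode("utf-8") == text)              # True

# Characters and bytes are not the same count once you leave ASCII.
word = "café"
print(len(word), len(word.encode("utf-8")))      # 4 5

# Mixing str and bytes raises TypeError in Python 3.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```

The practical rule for socket code is to encode a `str` just before sending and decode the received `bytes` just after receiving.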
A tuple with only one element needs a trailing comma: ("horse",).
If that is way too ugly, you can write a list and convert it to a tuple: tuple(["horse"]).
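The trailing comma matters because parentheses on their own only group an expression. A small sketch (variable names are illustrative):

```python
not_a_tuple = ("horse")        # parentheses alone just group: this is the str 'horse'
one_tuple = ("horse",)         # the trailing comma is what makes a tuple
also_tuple = "horse",          # the parentheses are optional; the comma is essential
from_list = tuple(["horse"])   # the list-conversion alternative

print(type(not_a_tuple).__name__)            # str
print(one_tuple == also_tuple == from_list)  # True
print(len(one_tuple), len(not_a_tuple))      # 1 5
```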
You need to know about modules, but for this course, you do not need to know (very much) about classes, i.e., the object-oriented side of Python. Classes are wonderful, but for those new to Python, they can be confusing. There is very little use of classes in the textbook, and I will try to steer away from them in lecture. You are welcome to use classes in your homework, if you are comfortable with them, but it is not required.