Rhodes and Goerzen, chapter 5
(p. 71) Review chr and ord functions, encode and decode methods, and bytes versus str (string) objects.
>>> ord('m')
109
>>> chr(109)
'm'
>>> 'hot potatoes'.encode('utf-8')
b'hot potatoes'
>>> b'hot potatoes'.decode('utf-8')
'hot potatoes'
(p. 72) The codecs module defines the standard encodings (see 6.6. codecs — Codec registry and base classes especially 6.6.6
(p. 73) Fixed and variable length encodings: fixed length encodings use the same number of bits for each character; variable length encodings do not. ASCII and UTF-32 are fixed length; UTF-8 and UTF-16 are variable length, usually using 8 or 16 bits respectively but sometimes using more.
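One way to see the difference is to encode a few characters and count the bytes. This is a minimal sketch: the sample characters and the helper name encoded_lengths are arbitrary choices for illustration, and the explicit little-endian codec names are used only to avoid the byte order mark that plain 'utf-16' and 'utf-32' prepend.

```python
# Sample characters with increasing code points (chosen for illustration).
samples = ['m', '\u00e9', '\u20ac', '\U0001f600']

def encoded_lengths(ch):
    """Return the number of bytes ch occupies in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(codec))
                 for codec in ('utf-8', 'utf-16-le', 'utf-32-le'))

for ch in samples:
    print(f'U+{ord(ch):04X}', encoded_lengths(ch))
# U+006D (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)
```

The last column never changes (fixed length), while the first two grow with the code point (variable length).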
This is only significant if we are dealing with binary data, and for many years the trend in network protocols has been away from binary protocols and towards text protocols.
(p. 73) Network byte order is the standard order for transmitting bytes through the network.
This goes back to the differences between hardware architectures: some architectures ("big-endian") store the most significant byte first, others ("little-endian") store it last. Network byte order is big-endian.
When necessary, use the struct module, or the functions of the socket module to convert to and from network byte order:
socket.ntohl(x) converts a 32-bit integer from network to host byte order; socket.htonl(x) converts from host to network byte order. socket.ntohs(x) and socket.htons(x) do the same for 16-bit integers.
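A short sketch of both approaches, assuming an arbitrary 32-bit sample value. In struct format strings, '!' selects network byte order and '<' selects little-endian order.

```python
import socket
import struct

value = 0x01020304   # arbitrary sample value

# '!' = network (big-endian) byte order; 'I' = unsigned 32-bit integer.
packed = struct.pack('!I', value)
print(packed)                           # b'\x01\x02\x03\x04'

# unpack always returns a tuple, even for a single field.
(unpacked,) = struct.unpack('!I', packed)
print(unpacked == value)                # True

# The same value in little-endian order, for comparison.
print(struct.pack('<I', value))         # b'\x04\x03\x02\x01'

# htonl followed by ntohl gives back the original value on any host.
print(socket.ntohl(socket.htonl(value)) == value)   # True
```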
(p. 75) Given that a message might be split into multiple packets, how do you know when you've received a complete message? Options include:
1. "Stream" in one direction: just recv until you get back an empty bytes object (b''), which means end of data. This precludes replying before you've received everything, so, in effect, there's just one message received.
2. "Stream" in both directions.
3. Fixed length messages: usually awkward, because messages contain varying amounts of information.
4. Delimiters that mark the end of each message: \r\n, ***, etc. Messages that end in a newline can be read with readline(). Multiline messages can be terminated by a sentinel (a common practice in file processing):
   ...\r\n
   ...\r\n
   END\r\n
5. Prefix the message with its length (as in HTTP):
   3 cat
   13 cats and dogs
6. For blocks of unknown length, send each chunk using the length-prefix method, then send a final chunk to mark the end (as in method 4, or 5 with length 0).
HTTP in fact uses a combination of methods 4, 5, and 6.
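The length-prefix option can be sketched as follows. This is one possible design, not the book's code: it assumes a 4-byte binary length header in network byte order rather than the textual length shown above, and the helper names recv_exactly, send_message, and recv_message are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct('!I')   # assumed header: 4-byte unsigned length

def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError('socket closed before the message was complete')
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)

def send_message(sock, payload):
    """Send the payload preceded by its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_message(sock):
    """Read one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

# Demonstration over a connected pair of local sockets.
a, b = socket.socketpair()
send_message(a, b'cat')
send_message(a, b'cats and dogs')
send_message(a, b'')               # zero-length message as an end marker
first = recv_message(b)
second = recv_message(b)
third = recv_message(b)
a.close()
b.close()
print(first, second, third)        # b'cat' b'cats and dogs' b''
```

The loop in recv_exactly is the important part: a single recv call may return only part of a message, whichever framing method is used.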
The pickle module provides Python object serialization: the ability to turn most Python objects into a sequence of bytes, which can then be transmitted through a socket. (Some objects, such as open sockets and files, cannot be pickled.)
The function dumps ("dump to string") produces a bytes representation of the object, and the function loads ("load from string") reconstructs a copy of the object from the bytes:
obj_bytes = pickle.dumps(obj)
obj_copy = pickle.loads(obj_bytes)
In Python 2, the pickle dumps and loads functions converted between Python objects and strings; hence the s in their function names. Here, as elsewhere, Python 3 makes a stricter distinction between bytes and string objects.
The dumps function can have a second argument, protocol. When these notes were written the allowed values were 0, 1, 2, or 3, and the constant HIGHEST_PROTOCOL meant protocol 3; later Python 3 releases added protocols 4 and 5, so check pickle.HIGHEST_PROTOCOL on your own installation. Using protocol 0 results in an ASCII string representation which is semi-readable by human beings (actually, a bytes object encoding an ASCII string, in Python 3). Using protocol 1 or higher results in a binary representation which is quite incomprehensible to humans, but a little more efficient for the computer and network. However, with the overhead of an XML wrapper and HTTP, it is still not going to be efficient compared to sockets.
Protocol 3 was the recommended and default protocol for Python 3 through version 3.7 (Python 3.8 changed the default to protocol 4); protocols 3 and higher cannot be unpickled by Python 2.
The loads function does not need to be told which protocol to use for unpickling.
obj_bytes_ascii = pickle.dumps(obj, 0)
obj_copy = pickle.loads(obj_bytes_ascii)
obj_bytes_bin = pickle.dumps(obj, 2)
obj_copy = pickle.loads(obj_bytes_bin)
Note that the "pickled" bytes object always ends in a period (p. 79).
Experiment with dumping and loading these values:
lst = [24, 'coyotes', 3.7]
d = {'observed': 24, 'missed': 10}
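One way to run this experiment is sketched below; the choice of protocols to try is arbitrary. It checks that each copy equals the original without being the same object, and that every pickle ends in a period.

```python
import pickle

lst = [24, 'coyotes', 3.7]
d = {'observed': 24, 'missed': 10}

for obj in (lst, d):
    for protocol in (0, 2, pickle.HIGHEST_PROTOCOL):
        data = pickle.dumps(obj, protocol)
        copy = pickle.loads(data)       # loads detects the protocol itself
        assert copy == obj              # same value...
        assert copy is not obj          # ...but a distinct object
        assert data.endswith(b'.')      # the pickle STOP opcode
        print(protocol, len(data), data[:20])
```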
JSON is JavaScript Object Notation, a popular lightweight format for data interchange.
Basic usage is similar to pickle, with dumps and loads functions:
>>> import json
>>> json.dumps(lst)
'[24, "coyotes", 3.7]'
>>> json.dumps(d)
'{"observed": 24, "missed": 10}'
>>> json.dumps(24)
'24'
>>> json.dumps('barrel of monkeys')
'"barrel of monkeys"'
Note that JSON encoding returns a string, not a bytes object, so it must be converted to bytes (for example with encode) before sending through a socket. But if you've wrapped your socket in a text-mode file-like object, then you can just write the string.
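Both options can be sketched over a local socket pair. The newline terminator after each JSON document is an assumption made for this example so that the receiver can use readline(); it is not part of JSON itself.

```python
import json
import socket

d = {'observed': 24, 'missed': 10}
a, b = socket.socketpair()

# Option 1: encode the JSON string to bytes explicitly.
a.sendall(json.dumps(d).encode('utf-8') + b'\n')

# Option 2: wrap the socket in a text-mode file object and write the string.
with a.makefile('w', encoding='utf-8') as out:
    out.write(json.dumps(d) + '\n')     # flushed when the wrapper closes

# The receiver reads one line per message and decodes it.
with b.makefile('r', encoding='utf-8') as infile:
    first = json.loads(infile.readline())
    second = json.loads(infile.readline())

a.close()
b.close()
print(first == d, second == d)          # True True
```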
Decoding is with loads:
>>> json.loads('[24, "coyotes", 3.7]')
[24, 'coyotes', 3.7]
Objects: Rather surprisingly, considering it's JavaScript Object Notation, the Python json module does not directly support encoding or decoding Python objects in general—i.e., if you define your own classes, you'll have to do extra work to encode their instances in JSON. (Is there an inherent problem here, because all JavaScript objects are really dictionaries, but in Python, dictionaries are just one type of object?)
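The extra work usually means converting instances to and from dictionaries. A minimal sketch, assuming a hypothetical Sighting class and a made-up "__sighting__" marker key; json.dumps accepts a default function for objects it cannot serialize, and json.loads accepts an object_hook that is called for every decoded dictionary.

```python
import json

class Sighting:
    """Hypothetical user-defined class, for illustration only."""
    def __init__(self, species, count):
        self.species = species
        self.count = count

def encode_sighting(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, Sighting):
        return {'__sighting__': True,
                'species': obj.species,
                'count': obj.count}
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')

def decode_sighting(dct):
    """Called by json.loads for every decoded JSON object (dict)."""
    if dct.get('__sighting__'):
        return Sighting(dct['species'], dct['count'])
    return dct

text = json.dumps(Sighting('coyotes', 24), default=encode_sighting)
print(text)   # {"__sighting__": true, "species": "coyotes", "count": 24}

copy = json.loads(text, object_hook=decode_sighting)
print(copy.species, copy.count)         # coyotes 24
```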
JSON does not support binary data, so you cannot use it (unaided) to send, for example, PNG image files. There is a related format, BSON, which does, but it has no Python standard library module.
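One common form of aid, using only the standard library, is to encode the binary data as base64 text before placing it in the JSON document. This is a sketch: the field names and the sample bytes are made up, and base64 makes the data roughly a third larger.

```python
import base64
import json

raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])   # arbitrary binary data

# b64encode returns bytes; decode to str so that json can carry it.
message = json.dumps({
    'name': 'tiny.bin',
    'data': base64.b64encode(raw).decode('ascii'),
})

# The receiver reverses both steps.
decoded = json.loads(message)
restored = base64.b64decode(decoded['data'])
print(restored == raw)                  # True
```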
XML is important, but complex. We'll peek at it in the chapter on RPC. XML is covered in much more detail in INFO I308, Information Representation.
Compression may be important, since the network is probably the bottleneck if you are sending or receiving large amounts of data. Python's zlib module implements the compression method used by gzip. Note that image, audio, and video data have their own specialized compressed formats; it's probably pointless to compress these again.
Example:
>>> import zlib
>>> longs = "a camel was thirsty.  " * 5000   # two spaces after the period: 22 characters
>>> longbs = longs.encode()
>>> z = zlib.compress(longbs)
>>> len(z)
315
An optional compression level can be specified, from 1 (least, fastest), through 6 (default), to 9 (greatest, slowest):
>>> z = zlib.compress(longbs, 1)
>>> len(z)
693
>>> z = zlib.compress(longbs, 9)
>>> len(z)
315
It can be decompressed:
>>> dbytes = zlib.decompress(z)
>>> len(dbytes)
110000
>>> len(dbytes.decode())
110000
Page 81 points out that zlib compression is "self-framing", i.e., if you send zlib compressed data followed by other data, the zlib decompressor can tell where its compressed data ends. The example on this page involves creating a decompressobj, using it to decompress, and checking whether it has unused data (its unused_data attribute). …
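The self-framing behavior can be sketched like this. It is not the book's example: the payload and the trailing bytes are made up, and the whole stream is fed to the decompressor in one call for brevity.

```python
import zlib

original = b'hot potatoes ' * 1000
stream = zlib.compress(original) + b'NEXT MESSAGE'   # compressed data, then other data

d = zlib.decompressobj()
text = d.decompress(stream)

print(d.eof)            # True: the end of the compressed stream was reached
print(text == original) # True
print(d.unused_data)    # b'NEXT MESSAGE'
leftover = d.unused_data
```

Whatever ends up in unused_data belongs to the next message and should be kept for further processing.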
p. 82 points out a few kinds of exceptions:
Higher-level libraries (higher than socket) may either let these exceptions pass through, or catch them and raise their own exceptions instead.
Exceptions can be handled with the try/except statement:
try:
    ...
except socket.gaierror as e:
    ...
except socket.error as e:
    ...
except:
    ...
finally:
    ...
The finally clause provides code that is executed at the end regardless of whether any exception occurred, typically to clean up, for example by closing your socket.
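A concrete version of this pattern is sketched below. The function name try_connect and the timeout value are assumptions for illustration. In Python 3, socket.error is an alias of OSError, and socket.gaierror is a subclass of it, so the more specific handler must come first.

```python
import socket

def try_connect(host, port, timeout=2.0):
    """Attempt a TCP connection and report the outcome as a short string."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return 'connected'
    except socket.gaierror as e:        # host name could not be resolved
        return f'name lookup failed: {e}'
    except OSError as e:                # refused, timed out, unreachable, ...
        return f'connection failed: {e}'
    finally:
        sock.close()                    # runs even though the try block returns

# Demonstration against a throwaway listener on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))           # port 0 asks the OS for any free port
server.listen(1)
port = server.getsockname()[1]
while_open = try_connect('127.0.0.1', port)
server.close()
print(while_open)                       # connected
```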
To write messages to the standard error stream:
import sys
sys.stderr.write("error message\n")
(The write method returns the number of characters written, since sys.stderr is a text stream.)