Version 1.11
Required: Rhodes and Goerzen, Chapter 8
This chapter deals with three topics: caches, specifically the use of memcached; message queues; and map-reduce. All relate to the common goal of ramping up our distributed applications to a very large scale; for example, a web site with millions of users.
ATW told me (November, 2011) that his company's main web server is hosted by Amazon, it has 64 GB RAM and 16 quad-core quad-core processors (so, 64 cores). Their web services uses a lot of PHP. He reports that when they started using memcached, it reduced their CPU usage by 50%.
Facebook is a more famous user of memcached (and claims to be the largest user). Several other prominent users are listed on the memcached home page.
Memcached is a distributed key-value store. That is, in Python terms, it's like a big dictionary, that's distributed in multiple nodes on the network. You can add key-value pairs, look them up, update them, and delete them.
What kinds of key-value pairs? SQL queries and their results, dynamically generated web pages—and anything else that would otherwise slow down your service. You can cache the results of function calls.
Basic set-up:
Details:
Risks:
See also Why should you not use Memcached?
A message queue provides asynchronous communication, in the way that email is asynchronous. A sender sends a message; the message goes into a queue, where it remains until the receiver retrieves it.
Message queuing systems are sometimes called message-oriented middleware (MOM) because, of course, they sit between the sender and the receiver.
As Rhodes points out, sending messages via a message queue can be point to point (single sender, single receiver), like a TCP message, but it needn't be. In the publish/subscribe model, a message could be delivered to multiple receivers based on their interests. Receivers can subscribe to particular topics (so each message must specify a topic) and receive all messages on that topic—like subscribing to an email list. Or, receivers can specify filters that select messages based on their content.
From the world of functional programming:
map f xs applies a function f to each element of a list xs and collects the list of results.
map square [2, 7, 20] would produce [4, 49, 400].reduce g init ys uses a function g to accumulate a value from the list ys, starting with init as the initial value.
reduce (+) 0 [4, 49, 400] would produce 453.These operations tend to distribute well to computers on a network. If xs contains 1600 numbers and you have 16 computing nodes, each node could compute square x for 100 of the xs. Once that is done, each node could also add up its 100 squared xs, and then we have only 16 numbers to be added to get the final result.
Scaling Facebook with OpenSource tools (You Tube)