diff --git a/.htaccess b/.htaccess
index 28a220d..b901de6 100644
--- a/.htaccess
+++ b/.htaccess
@@ -1,3 +1,3 @@
-FileETag MTime Size
-
-SetEnv dont-vary
+FileETag MTime Size
+
+SetEnv dont-vary
diff --git a/advanced-iterators.html b/advanced-iterators.html
index fb099d8..ee15c6d 100755
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -1,647 +1,647 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Advanced Iterators - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 8}
-mark{display:inline}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#advanced-iterators>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
-<h1>Advanced Iterators</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> Great fleas have little fleas upon their backs to bite &#8217;em,<br>And little fleas have lesser fleas, and so ad infinitum. <span class=u>&#x275E;</span><br>&mdash; Augustus De Morgan
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>Just as <a href=regular-expressions.html>regular expressions</a> put <a href=strings.html>strings</a> on steroids, the <code>itertools</code> module puts <a href=iterators.html>iterators</a> on steroids. But first, I want to show you a classic puzzle.
-
-<pre class=nd><code>HAWAII + IDAHO + IOWA + OHIO == STATES
-510199 + 98153 + 9301 + 3593 == 621246
-
-H = 5
-A = 1
-W = 0
-I = 9
-D = 8
-O = 3
-S = 6
-T = 2
-E = 4</code></pre>
-
-<p>Puzzles like this are called <i>cryptarithms</i> or <i>alphametics</i>. The letters spell out actual words, but if you replace each letter with a digit from <code>0&ndash;9</code>, it also &#8220;spells&#8221; an arithmetic equation. The trick is to figure out which letter maps to each digit. All the occurrences of each letter must map to the same digit, no digit can be repeated, and no &#8220;word&#8221; can start with the digit 0.
-
-<aside>The best-known alphametic puzzle is <code>SEND + MORE = MONEY</code>.</aside>
-
-<p>In this chapter, we&#8217;ll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles <em>in just 14 lines of code</em>.
-
-<p class=d>[<a href=examples/alphametics.py>download <code>alphametics.py</code></a>]
-<pre class=pp><code>import re
-import itertools
-
-def solve(puzzle):
-    words = re.findall('[A-Z]+', puzzle.upper())
-    unique_characters = set(''.join(words))
-    assert len(unique_characters) &lt;= 10, 'Too many letters'
-    first_letters = {word[0] for word in words}
-    n = len(first_letters)
-    sorted_characters = ''.join(first_letters) + \
-        ''.join(unique_characters - first_letters)
-    characters = tuple(ord(c) for c in sorted_characters)
-    digits = tuple(ord(c) for c in '0123456789')
-    zero = digits[0]
-    for guess in itertools.permutations(digits, len(characters)):
-        if zero not in guess[:n]:
-            equation = puzzle.translate(dict(zip(characters, guess)))
-            if eval(equation):
-                return equation
-
-if __name__ == '__main__':
-    import sys
-    for puzzle in sys.argv[1:]:
-        print(puzzle)
-        solution = solve(puzzle)
-        if solution:
-            print(solution)</code></pre>
-
-<p>You can run the program from the command line. On Linux, it would look like this. (These may take some time, depending on the speed of your computer, and there is no progress bar. Just be patient!)
-
-<pre class='nd screen'>
-<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 alphametics.py "HAWAII + IDAHO + IOWA + OHIO == STATES"</kbd>
-<samp>HAWAII + IDAHO + IOWA + OHIO == STATES
-510199 + 98153 + 9301 + 3593 == 621246</samp>
-<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 alphametics.py "I + LOVE + YOU == DORA"</kbd>
-<samp>I + LOVE + YOU == DORA
-1 + 2784 + 975 == 3760</samp>
-<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 alphametics.py "SEND + MORE == MONEY"</kbd>
-<samp>SEND + MORE == MONEY
-9567 + 1085 == 10652</samp></pre>
-
-<p class=a>&#x2042;
-
-<h2 id=re-findall>Finding all occurrences of a pattern</h2>
-
-<p>The first thing this alphametics solver does is find all the letters (A&ndash;Z) in the puzzle.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import re</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>re.findall('[0-9]+', '16 2-by-4s in rows of 8')</kbd>  <span class=u>&#x2460;</span></a>
-<samp class=pp>['16', '2', '4', '8']</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>re.findall('[A-Z]+', 'SEND + MORE == MONEY')</kbd>     <span class=u>&#x2461;</span></a>
-<samp class=pp>['SEND', 'MORE', 'MONEY']</samp></pre>
-<ol>
-<li>The <code>re</code> module is Python&#8217;s implementation of <a href=regular-expressions.html>regular expressions</a>. It has a nifty function called <code>findall()</code> which takes a regular expression pattern and a string, and finds all occurrences of the pattern within the string. In this case, the pattern matches sequences of digits. The <code>findall()</code> function returns a list of all the substrings that matched the pattern.
-<li>Here the regular expression pattern matches sequences of letters. Again, the return value is a list, and each item in the list is a string that matched the regular expression pattern.
-</ol>
-
-<p>Here&#8217;s another example that will stretch your brain a little.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>re.findall(' s.*? s', "The sixth sick sheikh's sixth sheep's sick.")</kbd>
-<samp class=pp>[' sixth s', " sheikh's s", " sheep's s"]</samp></pre>
-
-<aside>This is the <a href=http://en.wikipedia.org/wiki/Tongue-twister>hardest tongue twister</a> in the English language.</aside>
-
-<p>Surprised? The regular expression looks for a space, an <code>s</code>, and then the shortest possible series of any character (<code>.*?</code>), then a space, then another <code>s</code>. Well, looking at that input string, I see five matches:
-
-<ol>
-<li><code>The<mark> sixth s</mark>ick sheikh's sixth sheep's sick.</code>
-<li><code>The sixth<mark> sick s</mark>heikh's sixth sheep's sick.</code>
-<li><code>The sixth sick<mark> sheikh's s</mark>ixth sheep's sick.</code>
-<li><code>The sixth sick sheikh's<mark> sixth s</mark>heep's sick.</code>
-<li><code>The sixth sick sheikh's sixth<mark> sheep's s</mark>ick.</code>
-</ol>
-
-<p>But the <code>re.findall()</code> function only returned three matches. Specifically, it returned the first, the third, and the fifth. Why is that? Because <em>it doesn&#8217;t return overlapping matches</em>. The first match overlaps with the second, so the first is returned and the second is skipped. Then the third overlaps with the fourth, so the third is returned and the fourth is skipped. Finally, the fifth is returned. Three matches, not five.
-
-<p>This has nothing to do with the alphametics solver; I just thought it was interesting.
-
-<p class=a>&#x2042;
-
-<h2 id=unique-items>Finding the unique items in a sequence</h2>
-
-<p><a href=native-datatypes.html#sets>Sets</a> make it trivial to find the unique items in a sequence.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>a_list = ['The', 'sixth', 'sick', "sheik's", 'sixth', "sheep's", 'sick']</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>set(a_list)</kbd>                      <span class=u>&#x2460;</span></a>
-<samp class=pp>{'sixth', 'The', "sheep's", 'sick', "sheik's"}</samp>
-<samp class=p>>>> </samp><kbd class=pp>a_string = 'EAST IS EAST'</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>set(a_string)</kbd>                    <span class=u>&#x2461;</span></a>
-<samp class=pp>{'A', ' ', 'E', 'I', 'S', 'T'}</samp>
-<samp class=p>>>> </samp><kbd class=pp>words = ['SEND', 'MORE', 'MONEY']</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>''.join(words)</kbd>                   <span class=u>&#x2462;</span></a>
-<samp class=pp>'SENDMOREMONEY'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>set(''.join(words))</kbd>              <span class=u>&#x2463;</span></a>
-<samp class=pp>{'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}</samp></pre>
-<ol>
-<li>Given a list of several strings, the <code>set()</code> function will return a set of unique strings from the list. This makes sense if you think of it like a <code>for</code> loop. Take the first item from the list, put it in the set. Second. Third. Fourth. Fifth&nbsp;&mdash;&nbsp;wait, that&#8217;s in the set already, so it only gets listed once, because Python sets don&#8217;t allow duplicates. Sixth. Seventh&nbsp;&mdash;&nbsp;again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn&#8217;t even need to be sorted first.
-<li>The same technique works with strings, since a string is just a sequence of characters.
-<li>Given a list of strings, <code>''.join(<var>a_list</var>)</code> concatenates all the strings together into one.
-<li>So, given a list of strings, this line of code returns all the unique characters across all the strings, with no duplicates.
-</ol>
-
-<p>The alphametics solver uses this technique to build a set of all the unique characters in the puzzle.
-
-<pre class='nd pp'><code>unique_characters = set(''.join(words))</code></pre>
-
-<p>This set is later used to assign digits to characters as the solver iterates through the possible solutions.
-
-<p class=a>&#x2042;
-
-<h2 id=assert>Making assertions</h2>
-
-<p>Like many programming languages, Python has an <code>assert</code> statement. Here&#8217;s how it works.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>assert 1 + 1 == 2</kbd>                                     <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>assert 1 + 1 == 3</kbd>                                     <span class=u>&#x2461;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-AssertionError</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>assert 2 + 2 == 5, "Only for very large values of 2"</kbd>  <span class=u>&#x2462;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-AssertionError: Only for very large values of 2</samp></pre>
-<ol>
-<li>The <code>assert</code> statement is followed by any valid Python expression. In this case, the expression <code>1 + 1 == 2</code> evaluates to <code>True</code>, so the <code>assert</code> statement does nothing.
-<li>However, if the Python expression evaluates to <code>False</code>, the <code>assert</code> statement will raise an <code>AssertionError</code>.
-<li>You can also include a human-readable message that is printed if the <code>AssertionError</code> is raised.
-</ol>
-
-<p>Therefore, this line of code:
-
-<pre class='nd pp'><code>assert len(unique_characters) &lt;= 10, 'Too many letters'</code></pre>
-
-<p>&hellip;is equivalent to this:
-
-<pre class='nd pp'><code>if len(unique_characters) > 10:
-    raise AssertionError('Too many letters')</code></pre>
-
-<p>The alphametics solver uses this exact <code>assert</code> statement to bail out early if the puzzle contains more than ten unique letters. Since each letter is assigned a unique digit, and there are only ten digits, a puzzle with more than ten unique letters cannot possibly have a solution.
-
-<p class=a>&#x2042;
-
-<h2 id=generator-expressions>Generator expressions</h2>
-
-<p>A generator expression is like a <a href=generators.html>generator function</a> without the function.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>unique_characters = {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>gen = (ord(c) for c in unique_characters)</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>gen</kbd>                                        <span class=u>&#x2461;</span></a>
-<samp class=pp>&lt;generator object &lt;genexpr> at 0x00BADC10></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>next(gen)</kbd>                                  <span class=u>&#x2462;</span></a>
-<samp class=pp>69</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(gen)</kbd>
-<samp class=pp>68</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>tuple(ord(c) for c in unique_characters)</kbd>   <span class=u>&#x2463;</span></a>
-<samp class=pp>(69, 68, 77, 79, 78, 83, 82, 89)</samp></pre>
-<ol>
-<li>A generator expression is like an anonymous function that yields values. The expression itself looks like a <a href=comprehensions.html#listcomprehension>list comprehension</a>, but it&#8217;s wrapped in parentheses instead of square brackets.
-<li>The generator expression returns&hellip; an iterator.
-<li>Calling <code>next(<var>gen</var>)</code> returns the next value from the iterator.
-<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>. In these cases, you don&#8217;t need an extra set of parentheses&nbsp;&mdash;&nbsp;just pass the &#8220;bare&#8221; expression <code>ord(c) for c in unique_characters</code> to the <code>tuple()</code> function, and Python figures out that it&#8217;s a generator expression.
-</ol>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>Using a generator expression instead of a list comprehension can save both <abbr>CPU</abbr> and <abbr>RAM</abbr>. If you&#8217;re building a list just to throw it away (<i>e.g.</i> passing it to <code>tuple()</code> or <code>set()</code>), use a generator expression instead!
-</blockquote>
-
-<p>Here&#8217;s another way to accomplish the same thing, using a <a href=generators.html>generator function</a>:
-
-<pre class='nd pp'><code>def ord_map(a_string):
-    for c in a_string:
-        yield ord(c)
-
-gen = ord_map(unique_characters)</code></pre>
-
-<p>The generator expression is more compact but functionally equivalent.
-
-<p class=a>&#x2042;
-
-<h2 id=permutations>Calculating Permutations&hellip; The Lazy Way!</h2>
-
-<p>First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you&#8217;re doing. Here I&#8217;m talking about combinatorics, but if that doesn&#8217;t mean anything to you, don&#8217;t worry about it. As always, <a href=http://en.wikipedia.org/wiki/Permutation>Wikipedia is your friend</a>.)
-
-<p>The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller lists have the same size, which can be as small as 1 and as large as the total number of items. Oh, and nothing can be repeated. Mathematicians say things like &#8220;let&#8217;s find the permutations of 3 different items taken 2 at a time,&#8221; which means you have a sequence of 3 items and you want to find all the possible ordered pairs.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>                              <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>perms = itertools.permutations([1, 2, 3], 2)</kbd>  <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>                                   <span class=u>&#x2462;</span></a>
-<samp class=pp>(1, 2)</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>(1, 3)</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<a><samp class=pp>(2, 1)</samp>                                            <span class=u>&#x2463;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>(2, 3)</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>(3, 1)</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>(3, 2)</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>                                   <span class=u>&#x2464;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-StopIteration</samp></pre>
-<ol>
-<li>The <code>itertools</code> module has all kinds of fun stuff in it, including a <code>permutations()</code> function that does all the hard work of finding permutations.
-<li>The <code>permutations()</code> function takes a sequence (here a list of three integers) and a number, which is the number of items you want in each smaller group. The function returns an iterator, which you can use in a <code>for</code> loop or any old place that iterates. Here I&#8217;ll step through the iterator manually to show all the values.
-<li>The first permutation of <code>[1, 2, 3]</code> taken 2 at a time is <code>(1, 2)</code>.
-<li>Note that permutations are ordered: <code>(2, 1)</code> is different from <code>(1, 2)</code>.
-<li>That&#8217;s it! Those are all the permutations of <code>[1, 2, 3]</code> taken 2 at a time. Pairs like <code>(1, 1)</code> and <code>(2, 2)</code> never show up, because they contain repeats so they aren&#8217;t valid permutations. When there are no more permutations, the iterator raises a <code>StopIteration</code> exception.
-</ol>
-
-<aside>The <code>itertools</code> module has all kinds of fun stuff.</aside>
-
-<p>The <code>permutations()</code> function doesn&#8217;t have to take a list. It can take any sequence&nbsp;&mdash;&nbsp;even a string.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>perms = itertools.permutations('ABC', 3)</kbd>  <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<a><samp class=pp>('A', 'B', 'C')</samp>                               <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>('A', 'C', 'B')</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>('B', 'A', 'C')</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>('B', 'C', 'A')</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>('C', 'A', 'B')</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=pp>('C', 'B', 'A')</samp>
-<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-StopIteration</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.permutations('ABC', 3))</kbd>    <span class=u>&#x2462;</span></a>
-<samp class=pp>[('A', 'B', 'C'), ('A', 'C', 'B'),
- ('B', 'A', 'C'), ('B', 'C', 'A'),
- ('C', 'A', 'B'), ('C', 'B', 'A')]</samp></pre>
-<ol>
-<li>A string is just a sequence of characters. For the purposes of finding permutations, the string <code>'ABC'</code> is equivalent to the list <code>['A', 'B', 'C']</code>.
-<li>The first permutation of the 3 items <code>['A', 'B', 'C']</code>, taken 3 at a time, is <code>('A', 'B', 'C')</code>. There are five other permutations&nbsp;&mdash;&nbsp;the same three characters in every conceivable order.
-<li>Since the <code>permutations()</code> function always returns an iterator, an easy way to debug permutations is to pass that iterator to the built-in <code>list()</code> function to see all the permutations immediately.
-</ol>
-
-<p class=a>&#x2042;
-
-<h2 id=more-itertools>Other Fun Stuff in the <code>itertools</code> Module</h2>
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.product('ABC', '123'))</kbd>   <span class=u>&#x2460;</span></a>
-<samp class=pp>[('A', '1'), ('A', '2'), ('A', '3'), 
- ('B', '1'), ('B', '2'), ('B', '3'), 
- ('C', '1'), ('C', '2'), ('C', '3')]</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.combinations('ABC', 2))</kbd>  <span class=u>&#x2461;</span></a>
-<samp class=pp>[('A', 'B'), ('A', 'C'), ('B', 'C')]</samp></pre>
-<ol>
-<li>The <code>itertools.product()</code> function returns an iterator containing the Cartesian product of two sequences.
-<li>The <code>itertools.combinations()</code> function returns an iterator containing all the possible combinations of the given sequence of the given length. This is like the <code>itertools.permutations()</code> function, except combinations don&#8217;t include items that are duplicates of other items in a different order. So <code>itertools.permutations('ABC', 2)</code> will return both <code>('A', 'B')</code> and <code>('B', 'A')</code> (among others), but <code>itertools.combinations('ABC', 2)</code> will not return <code>('B', 'A')</code> because it is a duplicate of <code>('A', 'B')</code> in a different order.
-</ol>
-
-<p class=d>[<a href=examples/favorite-people.txt>download <code>favorite-people.txt</code></a>]
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>names = list(open('examples/favorite-people.txt', encoding='utf-8'))</kbd>  <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>names</kbd>
-<samp class=pp>['Dora\n', 'Ethan\n', 'Wesley\n', 'John\n', 'Anne\n',
-'Mike\n', 'Chris\n', 'Sarah\n', 'Alex\n', 'Lizzie\n']</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>names = [name.rstrip() for name in names]</kbd>                             <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>names</kbd>
-<samp class=pp>['Dora', 'Ethan', 'Wesley', 'John', 'Anne',
-'Mike', 'Chris', 'Sarah', 'Alex', 'Lizzie']</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>names = sorted(names)</kbd>                                                 <span class=u>&#x2462;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>names</kbd>
-<samp class=pp>['Alex', 'Anne', 'Chris', 'Dora', 'Ethan',
-'John', 'Lizzie', 'Mike', 'Sarah', 'Wesley']</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>names = sorted(names, key=len)</kbd>                                        <span class=u>&#x2463;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>names</kbd>
-<samp class=pp>['Alex', 'Anne', 'Dora', 'John', 'Mike',
-'Chris', 'Ethan', 'Sarah', 'Lizzie', 'Wesley']</samp></pre>
-<ol>
-<li>This idiom returns a list of the lines in a text file.
-<li>Unfortunately (for this example), the <code>list(open(<var>filename</var>))</code> idiom also includes the newline characters at the end of each line. This list comprehension uses the <code>rstrip()</code> string method to strip trailing whitespace from each line. (Strings also have an <code>lstrip()</code> method to strip leading whitespace, and a <code>strip()</code> method which strips both.)
-<li>The <code>sorted()</code> function takes a list and returns a new, sorted list, leaving the original unchanged. By default, it sorts alphabetically.
-<li>But the <code>sorted()</code> function can also take a function as the <var>key</var> parameter, and it sorts by that key. In this case, the sort function is <code>len()</code>, so it sorts by <code>len(<var>each item</var>)</code>. Shorter names come first, then longer, then longest.
-</ol>
-
-<p>What does this have to do with the <code>itertools</code> module? I&#8217;m glad you asked.
-
-<pre class=screen>
-&hellip;continuing from the previous interactive shell&hellip;
-<samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>groups = itertools.groupby(names, len)</kbd>  <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>groups</kbd>
-<samp class=pp>&lt;itertools.groupby object at 0x00BB20C0></samp>
-<samp class=p>>>> </samp><kbd class=pp>list(groups)</kbd>
-<samp class=pp>[(4, &lt;itertools._grouper object at 0x00BA8BF0>),
- (5, &lt;itertools._grouper object at 0x00BB4050>),
- (6, &lt;itertools._grouper object at 0x00BB4030>)]</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>groups = itertools.groupby(names, len)</kbd>   <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>for name_length, name_iter in groups:</kbd>    <span class=u>&#x2462;</span></a>
-<samp class=p>... </samp><kbd class=pp>    print('Names with {0:d} letters:'.format(name_length))</kbd>
-<samp class=p>... </samp><kbd class=pp>    for name in name_iter:</kbd>
-<samp class=p>... </samp><kbd class=pp>        print(name)</kbd>
-<samp class=p>... </samp>
-<samp>Names with 4 letters:
-Alex
-Anne
-Dora
-John
-Mike
-Names with 5 letters:
-Chris
-Ethan
-Sarah
-Names with 6 letters:
-Lizzie
-Wesley</samp></pre>
-<ol>
-<li>The <code>itertools.groupby()</code> function takes a sequence and a key function, and returns an iterator that generates pairs. Each pair contains the result of <code>key_function(<var>each item</var>)</code> and another iterator containing all the items that shared that key result.
-<li>Calling the <code>list()</code> function &#8220;exhausted&#8221; the iterator, <i>i.e.</i> you&#8217;ve already generated every item in the iterator to make the list. There&#8217;s no &#8220;reset&#8221; button on an iterator; you can&#8217;t just start over once you&#8217;ve exhausted it. If you want to loop through it again (say, in the upcoming <code>for</code> loop), you need to call <code>itertools.groupby()</code> again to create a new iterator.
-<li>In this example, given a list of names <em>already sorted by length</em>, <code>itertools.groupby(names, len)</code> will put all the 4-letter names in one iterator, all the 5-letter names in another iterator, and so on. The <code>groupby()</code> function is completely generic; it could group strings by first letter, numbers by their number of factors, or any other key function you can think of.
-</ol>
-<!-- YO DAWG, WE HEARD YOU LIKE LOOPING, SO WE PUT AN ITERATOR IN YOUR ITERATOR SO YOU CAN LOOP WHILE YOU LOOP. -->
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>The <code>itertools.groupby()</code> function only works if the input sequence is already sorted by the grouping function. In the example above, you grouped a list of names by the <code>len()</code> function. That only worked because the input list was already sorted by length.
-</blockquote>
-
-<p>Are you watching closely?
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>list(range(0, 3))</kbd>
-<samp class=pp>[0, 1, 2]</samp>
-<samp class=p>>>> </samp><kbd class=pp>list(range(10, 13))</kbd>
-<samp class=pp>[10, 11, 12]</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.chain(range(0, 3), range(10, 13)))</kbd>        <span class=u>&#x2460;</span></a>
-<samp class=pp>[0, 1, 2, 10, 11, 12]</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(zip(range(0, 3), range(10, 13)))</kbd>                    <span class=u>&#x2461;</span></a>
-<samp class=pp>[(0, 10), (1, 11), (2, 12)]</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(zip(range(0, 3), range(10, 14)))</kbd>                    <span class=u>&#x2462;</span></a>
-<samp class=pp>[(0, 10), (1, 11), (2, 12)]</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.zip_longest(range(0, 3), range(10, 14)))</kbd>  <span class=u>&#x2463;</span></a>
-<samp class=pp>[(0, 10), (1, 11), (2, 12), (None, 13)]</samp></pre>
-<ol>
-<li>The <code>itertools.chain()</code> function takes two iterators and returns an iterator that contains all the items from the first iterator, followed by all the items from the second iterator. (Actually, it can take any number of iterators, and it chains them all in the order they were passed to the function.)
-<li>The <code>zip()</code> function does something prosaic that turns out to be extremely useful: it takes any number of sequences and returns an iterator which returns tuples of the first items of each sequence, then the second items of each, then the third, and so on.
-<li>The <code>zip()</code> function stops at the end of the shortest sequence. <code>range(10, 14)</code> has 4 items (10, 11, 12, and 13), but <code>range(0, 3)</code> only has 3, so the <code>zip()</code> function returns an iterator of 3 items.
-<li>On the other hand, the <code>itertools.zip_longest()</code> function stops at the end of the <em>longest</em> sequence, inserting <code>None</code> values for items past the end of the shorter sequences.
-</ol>
-
-<p id=dict-zip>OK, that was all very interesting, but how does it relate to the alphametics solver? Here&#8217;s how:
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')</kbd>
-<samp class=p>>>> </samp><kbd class=pp>guess = ('1', '2', '0', '3', '4', '5', '6', '7')</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>tuple(zip(characters, guess))</kbd>  <span class=u>&#x2460;</span></a>
-<samp class=pp>(('S', '1'), ('M', '2'), ('E', '0'), ('D', '3'),
- ('O', '4'), ('N', '5'), ('R', '6'), ('Y', '7'))</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>dict(zip(characters, guess))</kbd>   <span class=u>&#x2461;</span></a>
-<samp class=pp>{'E': '0', 'D': '3', 'M': '2', 'O': '4',
- 'N': '5', 'S': '1', 'R': '6', 'Y': '7'}</samp></pre>
-<ol>
-<li>Given a list of letters and a list of digits (each represented here as 1-character strings), the <code>zip</code> function will create a pairing of letters and digits, in order.
-<li>Why is that cool? Because that data structure happens to be exactly the right structure to pass to the <code>dict()</code> function to create a dictionary that uses letters as keys and their associated digits as values. (This isn&#8217;t the only way to do it, of course. You could use a <a href=comprehensions.html#dictionarycomprehension>dictionary comprehension</a> to create the dictionary directly.) Although the printed representation of the dictionary may list the pairs in a different order (before Python 3.7, dictionaries had no guaranteed &#8220;order&#8221;; later versions preserve insertion order), you can see that each letter is associated with the digit, based on the ordering of the original <var>characters</var> and <var>guess</var> sequences.
-</ol>
-
-<p id=guess>The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution.
-
-<pre class='nd pp'><code>characters = tuple(ord(c) for c in sorted_characters)
-digits = tuple(ord(c) for c in '0123456789')
-...
-for guess in itertools.permutations(digits, len(characters)):
-    ...
-<mark>    equation = puzzle.translate(dict(zip(characters, guess)))</mark></code></pre>
-
-<p>But what is this <code>translate()</code> method? Ah, now you&#8217;re getting to the <em>really</em> fun part.
-
-<p class=a>&#x2042;
-
-<h2 id=string-translate>A New Kind Of String Manipulation</h2>
-
-<p>Python strings have many methods. You learned about some of those methods in <a href=strings.html>the Strings chapter</a>: <code>lower()</code>, <code>count()</code>, and <code>format()</code>. Now I want to introduce you to a powerful but little-known string manipulation technique: the <code>translate()</code> method.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>translation_table = {ord('A'): ord('O')}</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>translation_table</kbd>                         <span class=u>&#x2461;</span></a>
-<samp class=pp>{65: 79}</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>'MARK'.translate(translation_table)</kbd>       <span class=u>&#x2462;</span></a>
-<samp class=pp>'MORK'</samp></pre>
-<ol>
-<li>String translation starts with a translation table, which is just a dictionary that maps one character to another. Actually, &#8220;character&#8221; is incorrect&nbsp;&mdash;&nbsp;the translation table really maps one <em>byte</em> to another.
-<li>Remember, bytes in Python 3 are integers. The <code>ord()</code> function returns the <abbr>ASCII</abbr> value of a character, which, in the case of A&ndash;Z, is always a byte from 65 to 90.
-<li>The <code>translate()</code> method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, &#8220;translating&#8221; <code>MARK</code> to <code>MORK</code>.
-</ol>
-
-<aside>Now you&#8217;re getting to the <em>really</em> fun part.</aside>
-
-<p>What does this have to do with solving alphametic puzzles? As it turns out, everything.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>characters = tuple(ord(c) for c in 'SMEDONRY')</kbd>       <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>characters</kbd>
-<samp class=pp>(83, 77, 69, 68, 79, 78, 82, 89)</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>guess = tuple(ord(c) for c in '91570682')</kbd>            <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>guess</kbd>
-<samp class=pp>(57, 49, 53, 55, 48, 54, 56, 50)</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>translation_table = dict(zip(characters, guess))</kbd>     <span class=u>&#x2462;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>translation_table</kbd>
-<samp class=pp>{68: 55, 69: 53, 77: 49, 78: 54, 79: 48, 82: 56, 83: 57, 89: 50}</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>'SEND + MORE == MONEY'.translate(translation_table)</kbd>  <span class=u>&#x2463;</span></a>
-<samp class=pp>'9567 + 1085 == 10652'</samp></pre>
-<ol>
-<li>Using a <a href=#generator-expressions>generator expression</a>, we quickly compute the byte values for each character in a string. <var>characters</var> is an example of the value of <var>sorted_characters</var> in the <code>alphametics.solve()</code> function.
-<li>Using another generator expression, we quickly compute the byte values for each digit in this string. The result, <var>guess</var>, is of the form <a href=#guess>returned by the <code>itertools.permutations()</code> function</a> in the <code>alphametics.solve()</code> function.
-<li>This translation table is generated by <a href=#dict-zip>zipping <var>characters</var> and <var>guess</var> together</a> and building a dictionary from the resulting sequence of pairs. This is exactly what the <code>alphametics.solve()</code> function does inside the <code>for</code> loop.
-<li>Finally, we pass this translation table to the <code>translate()</code> method of the original puzzle string. This converts each letter in the string to the corresponding digit (based on the letters in <var>characters</var> and the digits in <var>guess</var>). The result is a valid Python expression, as a string.
-</ol>
-
-<p>That&#8217;s pretty impressive. But what can you do with a string that happens to be a valid Python expression?
-
-<p class=a>&#x2042;
-
-<h2 id=eval>Evaluating Arbitrary Strings As Python Expressions</h2>
-
-<p>This is the final piece of the puzzle (or rather, the final piece of the puzzle solver). After all that fancy string manipulation, we&#8217;re left with a string like <code>'9567 + 1085 == 10652'</code>. But that&#8217;s a string, and what good is a string? Enter <code>eval()</code>, the universal Python evaluation tool.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>eval('1 + 1 == 2')</kbd>
-<samp class=pp>True</samp>
-<samp class=p>>>> </samp><kbd class=pp>eval('1 + 1 == 3')</kbd>
-<samp class=pp>False</samp>
-<samp class=p>>>> </samp><kbd class=pp>eval('9567 + 1085 == 10652')</kbd>
-<samp class=pp>True</samp></pre>
-
-<p>But wait, there&#8217;s more! The <code>eval()</code> function isn&#8217;t limited to boolean expressions. It can handle <em>any</em> Python expression and returns <em>any</em> datatype.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>eval('"A" + "B"')</kbd>
-<samp class=pp>'AB'</samp>
-<samp class=p>>>> </samp><kbd class=pp>eval('"MARK".translate({65: 79})')</kbd>
-<samp class=pp>'MORK'</samp>
-<samp class=p>>>> </samp><kbd class=pp>eval('"AAAAA".count("A")')</kbd>
-<samp class=pp>5</samp>
-<samp class=p>>>> </samp><kbd class=pp>eval('["*"] * 5')</kbd>
-<samp class=pp>['*', '*', '*', '*', '*']</samp></pre>
-
-<p>But wait, that&#8217;s not all!
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>x = 5</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("x * 5")</kbd>         <span class=u>&#x2460;</span></a>
-<samp class=pp>25</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("pow(x, 2)")</kbd>     <span class=u>&#x2461;</span></a>
-<samp class=pp>25</samp>
-<samp class=p>>>> </samp><kbd class=pp>import math</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("math.sqrt(x)")</kbd>  <span class=u>&#x2462;</span></a>
-<samp class=pp>2.2360679774997898</samp></pre>
-<ol>
-<li>The expression that <code>eval()</code> takes can reference global variables defined outside the <code>eval()</code>. If called within a function, it can reference local variables too.
-<li>And functions.
-<li>And modules.
-</ol>
-
-<p>Hey, wait a minute&hellip;
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import subprocess</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd>                  <span class=u>&#x2460;</span></a>
-<samp class=pp>'Desktop         Library         Pictures \
- Documents       Movies          Public   \
- Music           Sites'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm /some/random/file')")</kbd>  <span class=u>&#x2461;</span></a></pre>
-<ol>
-<li>The <code>subprocess</code> module allows you to run arbitrary shell commands and get the result as a Python string.
-<li>Arbitrary shell commands can have permanent consequences.
-</ol>
-
-<p>It&#8217;s even worse than that, because there&#8217;s a global <code>__import__()</code> function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of <code>eval()</code>, you can construct a single expression that will wipe out all your files:
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')")</kbd>  <span class=u>&#x2460;</span></a></pre>
-<ol>
-<li>Now imagine the output of <code>'rm -rf ~'</code>. Actually there wouldn&#8217;t be any output, but you wouldn&#8217;t have any files left either.
-</ol>
-
-<p class=xxxl>eval() is EVIL
-
-<p>Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use <code>eval()</code> on trusted input. Of course, the trick is figuring out what&#8217;s &#8220;trusted.&#8221; But here&#8217;s something I know for certain: you should <b>NOT</b> take this alphametics solver and put it on the internet as a fun little web service. Don&#8217;t make the mistake of thinking, &#8220;Gosh, the function does a lot of string manipulation before getting a string to evaluate; <em>I can&#8217;t imagine</em> how someone could exploit that.&#8221; Someone <b>WILL</b> figure out how to sneak nasty executable code past all that string manipulation (<a href=http://www.securityfocus.com/blogs/746>stranger things have happened</a>), and then you can kiss your server goodbye.
-
-<p>But surely there&#8217;s <em>some</em> way to evaluate expressions safely? To put <code>eval()</code> in a sandbox where it can&#8217;t access or harm the outside world? Well, yes and no.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>x = 5</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("x * 5", {}, {})</kbd>               <span class=u>&#x2460;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-  File "&lt;string>", line 1, in &lt;module>
-NameError: name 'x' is not defined</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("x * 5", {"x": x}, {})</kbd>         <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>import math</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("math.sqrt(x)", {"x": x}, {})</kbd>  <span class=u>&#x2462;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-  File "&lt;string>", line 1, in &lt;module>
-NameError: name 'math' is not defined</samp></pre>
-<ol>
-<li>The second and third parameters passed to the <code>eval()</code> function act as the global and local namespaces for evaluating the expression. In this case, they are both empty, which means that when the string <code>"x * 5"</code> is evaluated, there is no reference to <var>x</var> in either the global or local namespace, so <code>eval()</code> throws an exception.
-<li>You can selectively include specific values in the global namespace by listing them individually. Then those&nbsp;&mdash;&nbsp;and only those&nbsp;&mdash;&nbsp;variables will be available during evaluation.
-<li>Even though you just imported the <code>math</code> module, you didn&#8217;t include it in the namespace passed to the <code>eval()</code> function, so the evaluation failed.
-</ol>
-
-<p>Gee, that was easy. Lemme make an alphametics web service now!
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("pow(5, 2)", {}, {})</kbd>                   <span class=u>&#x2460;</span></a>
-<samp class=pp>25</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('math').sqrt(5)", {}, {})</kbd>  <span class=u>&#x2461;</span></a>
-<samp class=pp>2.2360679774997898</samp></pre>
-<ol>
-<li>Even though you&#8217;ve passed empty dictionaries for the global and local namespaces, all of Python&#8217;s built-in functions are still available during evaluation. So <code>pow(5, 2)</code> works, because <code>5</code> and <code>2</code> are literals, and <code>pow()</code> is a built-in function.
-<li>Unfortunately (and if you don&#8217;t see why it&#8217;s unfortunate, read on), the <code>__import__()</code> function is also a built-in function, so it works too.
-</ol>
-
-<p>Yeah, that means you can still do nasty things, even if you explicitly set the global and local namespaces to empty dictionaries when calling <code>eval()</code>:
-
-<pre class='nd screen'><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')", {}, {})</kbd></pre>
-
-<p>Oops. I&#8217;m glad I didn&#8217;t make that alphametics web service. Is there <em>any</em> way to use <code>eval()</code> safely? Well, yes and no.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>eval("__import__('math').sqrt(5)",</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    {"__builtins__":None}, {})</kbd>          <span class=u>&#x2460;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-  File "&lt;string>", line 1, in &lt;module>
-NameError: name '__import__' is not defined</samp>
-<samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm -rf /')",</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    {"__builtins__":None}, {})</kbd>          <span class=u>&#x2461;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-  File "&lt;string>", line 1, in &lt;module>
-NameError: name '__import__' is not defined</samp></pre>
-<ol>
-<li>To evaluate untrusted expressions safely, you need to define a global namespace dictionary that maps <code>"__builtins__"</code> to <code>None</code>, the Python null value. Internally, the &#8220;built-in&#8221; functions are contained within a pseudo-module called <code>"__builtins__"</code>. This pseudo-module (<i>i.e.</i> the set of built-in functions) is made available to evaluated expressions unless you explicitly override it.
-<li>Be sure you&#8217;ve overridden <code>__builtins__</code>. Not <code>__builtin__</code>, <code>__built-ins__</code>, or some other variation that will work just fine but expose you to catastrophic risks.
-</ol>
-
-<p>So <code>eval()</code> is safe now? Well, yes and no.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>eval("2 ** 2147483647",</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    {"__builtins__":None}, {})</kbd>          <span class=u>&#x2460;</span></a>
-</pre>
-<ol>
-<li>Even without access to <code>__builtins__</code>, you can still launch a denial-of-service attack. For example, trying to raise <code>2</code> to the <code>2147483647</code><sup>th</sup> power will spike your server&#8217;s <abbr>CPU</abbr> utilization to 100% for quite some time. (If you&#8217;re trying this in the interactive shell, press <kbd>Ctrl-C</kbd> a few times to break out of it.) Technically this expression <em>will</em> return a value eventually, but in the meantime your server will be doing a whole lot of nothing.
-</ol>
-
-<p>In the end, it <em>is</em> possible to safely evaluate untrusted Python expressions, for some definition of &#8220;safe&#8221; that turns out not to be terribly useful in real life. It&#8217;s fine if you&#8217;re just playing around, and it&#8217;s fine if you only ever pass it trusted input. But anything else is just asking for trouble.
-
-<p class=a>&#x2042;
-
-<h2 id=alphametics-finale>Putting It All Together</h2>
-
-<p>To recap: this program solves alphametic puzzles by brute force, <i>i.e.</i> through an exhaustive search of all possible solutions. To do this, it&hellip;
-
-<ol>
-<li><a href=#re-findall>Finds all the letters in the puzzle</a> with the <code>re.findall()</code> function
-<li><a href=#unique-items>Find all the <em>unique</em> letters in the puzzle</a> with sets and the <code>set()</code> function
-<li><a href=#assert>Checks if there are more than 10 unique letters</a> (meaning the puzzle is definitely unsolvable) with an <code>assert</code> statement
-<li><a href=#generator-objects>Converts the letters to their ASCII equivalents</a> with a generator object
-<li><a href=#permutations>Calculates all the possible solutions</a> with the <code>itertools.permutations()</code> function
-<li><a href=#string-translate>Converts each possible solution to a Python expression</a> with the <code>translate()</code> string method
-<li><a href=#eval>Tests each possible solution by evaluating the Python expression</a> with the <code>eval()</code> function
-<li>Returns the first solution that evaluates to <code>True</code>
-</ol>
-
-<p>&hellip;in just 14 lines of code.
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-
-<ul>
-<li><a href=http://docs.python.org/3.1/library/itertools.html><code>itertools</code> module</a>
-<li><a href=http://www.doughellmann.com/PyMOTW/itertools/><code>itertools</code>&nbsp;&mdash;&nbsp;Iterator functions for efficient looping</a>
-<li><a href=http://blip.tv/file/1947373/>Watch Raymond Hettinger&#8217;s &#8220;Easy AI with Python&#8221; talk</a> at PyCon 2009
-<li><a href=http://code.activestate.com/recipes/576615/>Recipe 576615: Alphametics solver</a>, Raymond Hettinger&#8217;s original alphametics solver for Python 2
-<li><a href=http://code.activestate.com/recipes/users/178123/>More of Raymond Hettinger&#8217;s recipes</a> in the ActiveState Code repository
-<li><a href=http://en.wikipedia.org/wiki/Verbal_arithmetic>Alphametics on Wikipedia</a>
-<li><a href=http://www.tkcs-collins.com/truman/alphamet/index.shtml>Alphametics Index</a>, including <a href=http://www.tkcs-collins.com/truman/alphamet/alphamet.shtml>lots of puzzles</a> and <a href=http://www.tkcs-collins.com/truman/alphamet/alpha_gen.shtml>a generator to make your own</a>
-</ul>
-
-<p>Many thanks to Raymond Hettinger for agreeing to relicense his code so I could port it to Python 3 and use it as the basis for this chapter.
-
-<p class=v><a href=iterators.html rel=prev title='back to &#8220;Classes &amp; Iterators&#8221;'><span class=u>&#x261C;</span></a> <a href=unit-testing.html rel=next title='onward to &#8220;Unit Testing&#8221;'><span class=u>&#x261E;</span></a>
-
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Advanced Iterators - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 8}
+mark{display:inline}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#advanced-iterators>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
+<h1>Advanced Iterators</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> Great fleas have little fleas upon their backs to bite &#8217;em,<br>And little fleas have lesser fleas, and so ad infinitum. <span class=u>&#x275E;</span><br>&mdash; Augustus De Morgan
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>Just as <a href=regular-expressions.html>regular expressions</a> put <a href=strings.html>strings</a> on steroids, the <code>itertools</code> module puts <a href=iterators.html>iterators</a> on steroids. But first, I want to show you a classic puzzle.
+
+<pre class=nd><code>HAWAII + IDAHO + IOWA + OHIO == STATES
+510199 + 98153 + 9301 + 3593 == 621246
+
+H = 5
+A = 1
+W = 0
+I = 9
+D = 8
+O = 3
+S = 6
+T = 2
+E = 4</code></pre>
+
+<p>Puzzles like this are called <i>cryptarithms</i> or <i>alphametics</i>. The letters spell out actual words, but if you replace each letter with a digit from <code>0&ndash;9</code>, it also &#8220;spells&#8221; an arithmetic equation. The trick is to figure out which letter maps to each digit. All the occurrences of each letter must map to the same digit, no digit can be repeated, and no &#8220;word&#8221; can start with the digit 0.
+
+<aside>The most well-known alphametic puzzle is <code>SEND + MORE = MONEY</code>.</aside>
+
+<p>In this chapter, we&#8217;ll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles <em>in just 14 lines of code</em>.
+
+<p class=d>[<a href=examples/alphametics.py>download <code>alphametics.py</code></a>]
+<pre class=pp><code>import re
+import itertools
+
+def solve(puzzle):
+    words = re.findall('[A-Z]+', puzzle.upper())
+    unique_characters = set(''.join(words))
+    assert len(unique_characters) &lt;= 10, 'Too many letters'
+    first_letters = {word[0] for word in words}
+    n = len(first_letters)
+    sorted_characters = ''.join(first_letters) + \
+        ''.join(unique_characters - first_letters)
+    characters = tuple(ord(c) for c in sorted_characters)
+    digits = tuple(ord(c) for c in '0123456789')
+    zero = digits[0]
+    for guess in itertools.permutations(digits, len(characters)):
+        if zero not in guess[:n]:
+            equation = puzzle.translate(dict(zip(characters, guess)))
+            if eval(equation):
+                return equation
+
+if __name__ == '__main__':
+    import sys
+    for puzzle in sys.argv[1:]:
+        print(puzzle)
+        solution = solve(puzzle)
+        if solution:
+            print(solution)</code></pre>
+
+<p>You can run the program from the command line. On Linux, it would look like this. (These may take some time, depending on the speed of your computer, and there is no progress bar. Just be patient!)
+
+<pre class='nd screen'>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 alphametics.py "HAWAII + IDAHO + IOWA + OHIO == STATES"</kbd>
+<samp>HAWAII + IDAHO + IOWA + OHIO == STATES
+510199 + 98153 + 9301 + 3593 == 621246</samp>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 alphametics.py "I + LOVE + YOU == DORA"</kbd>
+<samp>I + LOVE + YOU == DORA
+1 + 2784 + 975 == 3760</samp>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 alphametics.py "SEND + MORE == MONEY"</kbd>
+<samp>SEND + MORE == MONEY
+9567 + 1085 == 10652</samp></pre>
+
+<p class=a>&#x2042;
+
+<h2 id=re-findall>Finding all occurrences of a pattern</h2>
+
+<p>The first thing this alphametics solver does is find all the letters (A&ndash;Z) in the puzzle.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import re</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>re.findall('[0-9]+', '16 2-by-4s in rows of 8')</kbd>  <span class=u>&#x2460;</span></a>
+<samp class=pp>['16', '2', '4', '8']</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>re.findall('[A-Z]+', 'SEND + MORE == MONEY')</kbd>     <span class=u>&#x2461;</span></a>
+<samp class=pp>['SEND', 'MORE', 'MONEY']</samp></pre>
+<ol>
+<li>The <code>re</code> module is Python&#8217;s implementation of <a href=regular-expressions.html>regular expressions</a>. It has a nifty function called <code>findall()</code> which takes a regular expression pattern and a string, and finds all occurrences of the pattern within the string. In this case, the pattern matches sequences of digits. The <code>findall()</code> function returns a list of all the substrings that matched the pattern.
+<li>Here the regular expression pattern matches sequences of letters. Again, the return value is a list, and each item in the list is a string that matched the regular expression pattern.
+</ol>
+
+<p>Here&#8217;s another example that will stretch your brain a little.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>re.findall(' s.*? s', "The sixth sick sheikh's sixth sheep's sick.")</kbd>
+<samp class=pp>[' sixth s', " sheikh's s", " sheep's s"]</samp></pre>
+
+<aside>This is the <a href=http://en.wikipedia.org/wiki/Tongue-twister>hardest tongue twister</a> in the English language.</aside>
+
+<p>Surprised? The regular expression looks for a space, an <code>s</code>, and then the shortest possible series of any character (<code>.*?</code>), then a space, then another <code>s</code>. Well, looking at that input string, I see five matches:
+
+<ol>
+<li><code>The<mark> sixth s</mark>ick sheikh's sixth sheep's sick.</code>
+<li><code>The sixth<mark> sick s</mark>heikh's sixth sheep's sick.</code>
+<li><code>The sixth sick<mark> sheikh's s</mark>ixth sheep's sick.</code>
+<li><code>The sixth sick sheikh's<mark> sixth s</mark>heep's sick.</code>
+<li><code>The sixth sick sheikh's sixth<mark> sheep's s</mark>ick.</code>
+</ol>
+
+<p>But the <code>re.findall()</code> function only returned three matches. Specifically, it returned the first, the third, and the fifth. Why is that? Because <em>it doesn&#8217;t return overlapping matches</em>. The first match overlaps with the second, so the first is returned and the second is skipped. Then the third overlaps with the fourth, so the third is returned and the fourth is skipped. Finally, the fifth is returned. Three matches, not five.
+
+<p>This has nothing to do with the alphametics solver; I just thought it was interesting.
+
+<p class=a>&#x2042;
+
+<h2 id=unique-items>Finding the unique items in a sequence</h2>
+
+<p><a href=native-datatypes.html#sets>Sets</a> make it trivial to find the unique items in a sequence.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>a_list = ['The', 'sixth', 'sick', "sheik's", 'sixth', "sheep's", 'sick']</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>set(a_list)</kbd>                      <span class=u>&#x2460;</span></a>
+<samp class=pp>{'sixth', 'The', "sheep's", 'sick', "sheik's"}</samp>
+<samp class=p>>>> </samp><kbd class=pp>a_string = 'EAST IS EAST'</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>set(a_string)</kbd>                    <span class=u>&#x2461;</span></a>
+<samp class=pp>{'A', ' ', 'E', 'I', 'S', 'T'}</samp>
+<samp class=p>>>> </samp><kbd class=pp>words = ['SEND', 'MORE', 'MONEY']</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>''.join(words)</kbd>                   <span class=u>&#x2462;</span></a>
+<samp class=pp>'SENDMOREMONEY'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>set(''.join(words))</kbd>              <span class=u>&#x2463;</span></a>
+<samp class=pp>{'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}</samp></pre>
+<ol>
+<li>Given a list of several strings, the <code>set()</code> function will return a set of unique strings from the list. This makes sense if you think of it like a <code>for</code> loop. Take the first item from the list, put it in the set. Second. Third. Fourth. Fifth&nbsp;&mdash;&nbsp;wait, that&#8217;s in the set already, so it only gets listed once, because Python sets don&#8217;t allow duplicates. Sixth. Seventh&nbsp;&mdash;&nbsp;again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn&#8217;t even need to be sorted first.
+<li>The same technique works with strings, since a string is just a sequence of characters.
+<li>Given a list of strings, <code>''.join(<var>a_list</var>)</code> concatenates all the strings together into one.
+<li>So, given a list of strings, this line of code returns all the unique characters across all the strings, with no duplicates.
+</ol>
+
+<p>The alphametics solver uses this technique to build a set of all the unique characters in the puzzle.
+
+<pre class='nd pp'><code>unique_characters = set(''.join(words))</code></pre>
+
+<p>This set is later used to assign digits to characters as the solver iterates through the possible solutions.
+
+<p class=a>&#x2042;
+
+<h2 id=assert>Making assertions</h2>
+
+<p>Like many programming languages, Python has an <code>assert</code> statement. Here&#8217;s how it works.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>assert 1 + 1 == 2</kbd>                                     <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>assert 1 + 1 == 3</kbd>                                     <span class=u>&#x2461;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+AssertionError</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>assert 2 + 2 == 5, "Only for very large values of 2"</kbd>  <span class=u>&#x2462;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+AssertionError: Only for very large values of 2</samp></pre>
+<ol>
+<li>The <code>assert</code> statement is followed by any valid Python expression. In this case, the expression <code>1 + 1 == 2</code> evaluates to <code>True</code>, so the <code>assert</code> statement does nothing.
+<li>However, if the Python expression evaluates to <code>False</code>, the <code>assert</code> statement will raise an <code>AssertionError</code>.
+<li>You can also include a human-readable message that is printed if the <code>AssertionError</code> is raised.
+</ol>
+
+<p>Therefore, this line of code:
+
+<pre class='nd pp'><code>assert len(unique_characters) &lt;= 10, 'Too many letters'</code></pre>
+
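+<p>&hellip;is equivalent to this (unless Python is run with the <code>-O</code> option, which strips <code>assert</code> statements):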
+
+<pre class='nd pp'><code>if len(unique_characters) > 10:
+    raise AssertionError('Too many letters')</code></pre>
+
+<p>The alphametics solver uses this exact <code>assert</code> statement to bail out early if the puzzle contains more than ten unique letters. Since each letter is assigned a unique digit, and there are only ten digits, a puzzle with more than ten unique letters cannot possibly have a solution.
+
+<p class=a>&#x2042;
+
+<h2 id=generator-expressions>Generator expressions</h2>
+
+<p>A generator expression is like a <a href=generators.html>generator function</a> without the function.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>unique_characters = {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>gen = (ord(c) for c in unique_characters)</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>gen</kbd>                                        <span class=u>&#x2461;</span></a>
+<samp class=pp>&lt;generator object &lt;genexpr> at 0x00BADC10></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>next(gen)</kbd>                                  <span class=u>&#x2462;</span></a>
+<samp class=pp>69</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(gen)</kbd>
+<samp class=pp>68</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>tuple(ord(c) for c in unique_characters)</kbd>   <span class=u>&#x2463;</span></a>
+<samp class=pp>(69, 68, 77, 79, 78, 83, 82, 89)</samp></pre>
+<ol>
+<li>A generator expression is like an anonymous function that yields values. The expression itself looks like a <a href=comprehensions.html#listcomprehension>list comprehension</a>, but it&#8217;s wrapped in parentheses instead of square brackets.
+<li>The generator expression returns&hellip; an iterator.
+<li>Calling <code>next(<var>gen</var>)</code> returns the next value from the iterator.
+<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>. In these cases, you don&#8217;t need an extra set of parentheses&nbsp;&mdash;&nbsp;just pass the &#8220;bare&#8221; expression <code>ord(c) for c in unique_characters</code> to the <code>tuple()</code> function, and Python figures out that it&#8217;s a generator expression.
+</ol>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>Using a generator expression instead of a list comprehension can save both <abbr>CPU</abbr> and <abbr>RAM</abbr>. If you&#8217;re building a list just to throw it away (<i>e.g.</i> passing it to <code>tuple()</code> or <code>set()</code>), use a generator expression instead!
+</blockquote>
+
+<p>Here&#8217;s another way to accomplish the same thing, using a <a href=generators.html>generator function</a>:
+
+<pre class='nd pp'><code>def ord_map(a_string):
+    for c in a_string:
+        yield ord(c)
+
+gen = ord_map(unique_characters)</code></pre>
+
+<p>The generator expression is more compact but functionally equivalent.
+
+<p class=a>&#x2042;
+
+<h2 id=permutations>Calculating Permutations&hellip; The Lazy Way!</h2>
+
+<p>First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you&#8217;re doing. Here I&#8217;m talking about combinatorics, but if that doesn&#8217;t mean anything to you, don&#8217;t worry about it. As always, <a href=http://en.wikipedia.org/wiki/Permutation>Wikipedia is your friend</a>.)
+
+<p>The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller lists have the same size, which can be as small as 1 and as large as the total number of items. Oh, and nothing can be repeated. Mathematicians say things like &#8220;let&#8217;s find the permutations of 3 different items taken 2 at a time,&#8221; which means you have a sequence of 3 items and you want to find all the possible ordered pairs.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>                              <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>perms = itertools.permutations([1, 2, 3], 2)</kbd>  <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>                                   <span class=u>&#x2462;</span></a>
+<samp class=pp>(1, 2)</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>(1, 3)</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<a><samp class=pp>(2, 1)</samp>                                            <span class=u>&#x2463;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>(2, 3)</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>(3, 1)</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>(3, 2)</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>                                   <span class=u>&#x2464;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+StopIteration</samp></pre>
+<ol>
+<li>The <code>itertools</code> module has all kinds of fun stuff in it, including a <code>permutations()</code> function that does all the hard work of finding permutations.
+<li>The <code>permutations()</code> function takes a sequence (here a list of three integers) and a number, which is the number of items you want in each smaller group. The function returns an iterator, which you can use in a <code>for</code> loop or any old place that iterates. Here I&#8217;ll step through the iterator manually to show all the values.
+<li>The first permutation of <code>[1, 2, 3]</code> taken 2 at a time is <code>(1, 2)</code>.
+<li>Note that permutations are ordered: <code>(2, 1)</code> is different than <code>(1, 2)</code>.
+<li>That&#8217;s it! Those are all the permutations of <code>[1, 2, 3]</code> taken 2 at a time. Pairs like <code>(1, 1)</code> and <code>(2, 2)</code> never show up, because they contain repeats so they aren&#8217;t valid permutations. When there are no more permutations, the iterator raises a <code>StopIteration</code> exception.
+</ol>
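+
+<p>How many permutations should you expect? Here&#8217;s a quick sketch, assuming the usual counting formula <var>n</var>!&nbsp;/&nbsp;(<var>n</var>&nbsp;&minus;&nbsp;<var>r</var>)! for <var>n</var> items taken <var>r</var> at a time. The <code>count_permutations()</code> helper is a made-up name for this illustration, not something the <code>itertools</code> module provides.
+
+<pre class='nd pp'><code>import itertools
+import math
+
+def count_permutations(n, r):
+    # Hypothetical helper: n! / (n - r)! ordered selections.
+    return math.factorial(n) // math.factorial(n - r)
+
+pairs = list(itertools.permutations([1, 2, 3], 2))
+print(len(pairs), count_permutations(3, 2))   # 6 6</code></pre>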
+
+<aside>The <code>itertools</code> module has all kinds of fun stuff.</aside>
+
+<p>The <code>permutations()</code> function doesn&#8217;t have to take a list. It can take any sequence&nbsp;&mdash;&nbsp;even a string.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>perms = itertools.permutations('ABC', 3)</kbd>  <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<a><samp class=pp>('A', 'B', 'C')</samp>                               <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>('A', 'C', 'B')</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>('B', 'A', 'C')</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>('B', 'C', 'A')</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>('C', 'A', 'B')</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=pp>('C', 'B', 'A')</samp>
+<samp class=p>>>> </samp><kbd class=pp>next(perms)</kbd>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+StopIteration</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.permutations('ABC', 3))</kbd>    <span class=u>&#x2462;</span></a>
+<samp class=pp>[('A', 'B', 'C'), ('A', 'C', 'B'),
+ ('B', 'A', 'C'), ('B', 'C', 'A'),
+ ('C', 'A', 'B'), ('C', 'B', 'A')]</samp></pre>
+<ol>
+<li>A string is just a sequence of characters. For the purposes of finding permutations, the string <code>'ABC'</code> is equivalent to the list <code>['A', 'B', 'C']</code>.
+<li>The first permutation of the 3 items <code>['A', 'B', 'C']</code>, taken 3 at a time, is <code>('A', 'B', 'C')</code>. There are five other permutations&nbsp;&mdash;&nbsp;the same three characters in every conceivable order.
+<li>Since the <code>permutations()</code> function always returns an iterator, an easy way to debug permutations is to pass that iterator to the built-in <code>list()</code> function to see all the permutations immediately.
+</ol>
+
+<p class=a>&#x2042;
+
+<h2 id=more-itertools>Other Fun Stuff in the <code>itertools</code> Module</h2>
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.product('ABC', '123'))</kbd>   <span class=u>&#x2460;</span></a>
+<samp class=pp>[('A', '1'), ('A', '2'), ('A', '3'), 
+ ('B', '1'), ('B', '2'), ('B', '3'), 
+ ('C', '1'), ('C', '2'), ('C', '3')]</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.combinations('ABC', 2))</kbd>  <span class=u>&#x2461;</span></a>
+<samp class=pp>[('A', 'B'), ('A', 'C'), ('B', 'C')]</samp></pre>
+<ol>
+<li>The <code>itertools.product()</code> function returns an iterator containing the Cartesian product of two sequences.
+<li>The <code>itertools.combinations()</code> function returns an iterator containing all the possible combinations of the given sequence of the given length. This is like the <code>itertools.permutations()</code> function, except combinations don&#8217;t include items that are duplicates of other items in a different order. So <code>itertools.permutations('ABC', 2)</code> will return both <code>('A', 'B')</code> and <code>('B', 'A')</code> (among others), but <code>itertools.combinations('ABC', 2)</code> will not return <code>('B', 'A')</code> because it is a duplicate of <code>('A', 'B')</code> in a different order.
+</ol>
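+
+<p>One way to see the relationship between the two is to filter the permutations yourself. This is only a sketch for illustration (it relies on the input already being in sorted order); in real code you would just call <code>itertools.combinations()</code>.
+
+<pre class='nd pp'><code>import itertools
+
+perms = itertools.permutations('ABC', 2)
+# Keep only the pairs whose items are already in sorted order,
+# which drops ('B', 'A') but keeps ('A', 'B').
+filtered = [p for p in perms if list(p) == sorted(p)]
+combos = list(itertools.combinations('ABC', 2))
+print(filtered == combos)   # True</code></pre>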
+
+<p class=d>[<a href=examples/favorite-people.txt>download <code>favorite-people.txt</code></a>]
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>names = list(open('examples/favorite-people.txt', encoding='utf-8'))</kbd>  <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>names</kbd>
+<samp class=pp>['Dora\n', 'Ethan\n', 'Wesley\n', 'John\n', 'Anne\n',
+'Mike\n', 'Chris\n', 'Sarah\n', 'Alex\n', 'Lizzie\n']</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>names = [name.rstrip() for name in names]</kbd>                             <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>names</kbd>
+<samp class=pp>['Dora', 'Ethan', 'Wesley', 'John', 'Anne',
+'Mike', 'Chris', 'Sarah', 'Alex', 'Lizzie']</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>names = sorted(names)</kbd>                                                 <span class=u>&#x2462;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>names</kbd>
+<samp class=pp>['Alex', 'Anne', 'Chris', 'Dora', 'Ethan',
+'John', 'Lizzie', 'Mike', 'Sarah', 'Wesley']</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>names = sorted(names, key=len)</kbd>                                        <span class=u>&#x2463;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>names</kbd>
+<samp class=pp>['Alex', 'Anne', 'Dora', 'John', 'Mike',
+'Chris', 'Ethan', 'Sarah', 'Lizzie', 'Wesley']</samp></pre>
+<ol>
+<li>This idiom returns a list of the lines in a text file.
+<li>Unfortunately (for this example), the <code>list(open(<var>filename</var>))</code> idiom also includes the newline character at the end of each line. This list comprehension uses the <code>rstrip()</code> string method to strip trailing whitespace from each line. (Strings also have an <code>lstrip()</code> method to strip leading whitespace, and a <code>strip()</code> method which strips both.)
+<li>The <code>sorted()</code> function takes a list and returns a new, sorted list. By default, it sorts alphabetically.
+<li>But the <code>sorted()</code> function can also take a function as the <var>key</var> parameter, and it sorts by that key. In this case, the key function is <code>len()</code>, so it sorts by <code>len(<var>each item</var>)</code>. Shorter names come first, then longer, then longest. Names of the same length stay in the order they were already in, which is why each group is still alphabetical.
+</ol>
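+
+<p>Outside the interactive shell, you would probably want the file closed when you&#8217;re done with it. Here is one possible sketch of the same steps wrapped in a function; <code>read_names()</code> is a hypothetical helper name, not part of this chapter&#8217;s example code.
+
+<pre class='nd pp'><code>def read_names(path):
+    # Hypothetical helper: read one name per line, sorted by length.
+    with open(path, encoding='utf-8') as name_file:
+        # Iterating over a file yields one line at a time.
+        names = [line.rstrip() for line in name_file]
+    # Sort alphabetically first; the sort by length is stable,
+    # so names of equal length stay in alphabetical order.
+    return sorted(sorted(names), key=len)</code></pre>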
+
+<p>What does this have to do with the <code>itertools</code> module? I&#8217;m glad you asked.
+
+<pre class=screen>
+&hellip;continuing from the previous interactive shell&hellip;
+<samp class=p>>>> </samp><kbd class=pp>import itertools</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>groups = itertools.groupby(names, len)</kbd>  <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>groups</kbd>
+<samp class=pp>&lt;itertools.groupby object at 0x00BB20C0></samp>
+<samp class=p>>>> </samp><kbd class=pp>list(groups)</kbd>
+<samp class=pp>[(4, &lt;itertools._grouper object at 0x00BA8BF0>),
+ (5, &lt;itertools._grouper object at 0x00BB4050>),
+ (6, &lt;itertools._grouper object at 0x00BB4030>)]</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>groups = itertools.groupby(names, len)</kbd>   <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>for name_length, name_iter in groups:</kbd>    <span class=u>&#x2462;</span></a>
+<samp class=p>... </samp><kbd class=pp>    print('Names with {0:d} letters:'.format(name_length))</kbd>
+<samp class=p>... </samp><kbd class=pp>    for name in name_iter:</kbd>
+<samp class=p>... </samp><kbd class=pp>        print(name)</kbd>
+<samp class=p>... </samp>
+<samp>Names with 4 letters:
+Alex
+Anne
+Dora
+John
+Mike
+Names with 5 letters:
+Chris
+Ethan
+Sarah
+Names with 6 letters:
+Lizzie
+Wesley</samp></pre>
+<ol>
+<li>The <code>itertools.groupby()</code> function takes a sequence and a key function, and returns an iterator that generates pairs. Each pair contains the result of <code>key_function(<var>each item</var>)</code> and another iterator containing all the items that shared that key result.
+<li>Calling the <code>list()</code> function &#8220;exhausted&#8221; the iterator, <i>i.e.</i> you&#8217;ve already generated every item in the iterator to make the list. There&#8217;s no &#8220;reset&#8221; button on an iterator; you can&#8217;t just start over once you&#8217;ve exhausted it. If you want to loop through it again (say, in the upcoming <code>for</code> loop), you need to call <code>itertools.groupby()</code> again to create a new iterator.
+<li>In this example, given a list of names <em>already sorted by length</em>, <code>itertools.groupby(names, len)</code> will put all the 4-letter names in one iterator, all the 5-letter names in another iterator, and so on. The <code>groupby()</code> function is completely generic; it could group strings by first letter, numbers by their number of factors, or any other key function you can think of.
+</ol>
+<!-- YO DAWG, WE HEARD YOU LIKE LOOPING, SO WE PUT AN ITERATOR IN YOUR ITERATOR SO YOU CAN LOOP WHILE YOU LOOP. -->
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>The <code>itertools.groupby()</code> function only works if the input sequence is already sorted by the grouping function. In the example above, you grouped a list of names by the <code>len()</code> function. That only worked because the input list was already sorted by length.
+</blockquote>
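+
+<p>To see why, here&#8217;s a small sketch with a shortened, unsorted list of names (the list itself is just sample data). The <code>groupby()</code> function starts a new group every time the key changes, so an unsorted list produces the same key more than once.
+
+<pre class='nd pp'><code>import itertools
+
+names = ['Dora', 'Ethan', 'Wesley', 'John', 'Anne']
+# Unsorted input: the key 4 shows up twice.
+unsorted_keys = [key for key, group in itertools.groupby(names, len)]
+print(unsorted_keys)   # [4, 5, 6, 4]
+
+# Sort by the same key function first, and each key appears once.
+by_length = sorted(names, key=len)
+grouped = {key: list(group) for key, group in itertools.groupby(by_length, len)}
+print(grouped)   # {4: ['Dora', 'John', 'Anne'], 5: ['Ethan'], 6: ['Wesley']}</code></pre>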
+
+<p>Are you watching closely?
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>list(range(0, 3))</kbd>
+<samp class=pp>[0, 1, 2]</samp>
+<samp class=p>>>> </samp><kbd class=pp>list(range(10, 13))</kbd>
+<samp class=pp>[10, 11, 12]</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.chain(range(0, 3), range(10, 13)))</kbd>        <span class=u>&#x2460;</span></a>
+<samp class=pp>[0, 1, 2, 10, 11, 12]</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(zip(range(0, 3), range(10, 13)))</kbd>                    <span class=u>&#x2461;</span></a>
+<samp class=pp>[(0, 10), (1, 11), (2, 12)]</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(zip(range(0, 3), range(10, 14)))</kbd>                    <span class=u>&#x2462;</span></a>
+<samp class=pp>[(0, 10), (1, 11), (2, 12)]</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(itertools.zip_longest(range(0, 3), range(10, 14)))</kbd>  <span class=u>&#x2463;</span></a>
+<samp class=pp>[(0, 10), (1, 11), (2, 12), (None, 13)]</samp></pre>
+<ol>
+<li>The <code>itertools.chain()</code> function takes two iterators and returns an iterator that contains all the items from the first iterator, followed by all the items from the second iterator. (Actually, it can take any number of iterators, and it chains them all in the order they were passed to the function.)
+<li>The <code>zip()</code> function does something prosaic that turns out to be extremely useful: it takes any number of sequences and returns an iterator which returns tuples of the first items of each sequence, then the second items of each, then the third, and so on.
+<li>The <code>zip()</code> function stops at the end of the shortest sequence. <code>range(10, 14)</code> has 4 items (10, 11, 12, and 13), but <code>range(0, 3)</code> only has 3, so the <code>zip()</code> function returns an iterator of 3 items.
+<li>On the other hand, the <code>itertools.zip_longest()</code> function stops at the end of the <em>longest</em> sequence, inserting <code>None</code> values for items past the end of the shorter sequences.
+</ol>
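+
+<p>If <code>None</code> isn&#8217;t the placeholder you want, <code>itertools.zip_longest()</code> also accepts a <var>fillvalue</var> argument. A short sketch, with <code>-1</code> chosen arbitrarily as the filler:
+
+<pre class='nd pp'><code>import itertools
+
+pairs = list(itertools.zip_longest(range(0, 3), range(10, 14), fillvalue=-1))
+print(pairs)   # [(0, 10), (1, 11), (2, 12), (-1, 13)]</code></pre>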
+
+<p id=dict-zip>OK, that was all very interesting, but how does it relate to the alphametics solver? Here&#8217;s how:
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')</kbd>
+<samp class=p>>>> </samp><kbd class=pp>guess = ('1', '2', '0', '3', '4', '5', '6', '7')</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>tuple(zip(characters, guess))</kbd>  <span class=u>&#x2460;</span></a>
+<samp class=pp>(('S', '1'), ('M', '2'), ('E', '0'), ('D', '3'),
+ ('O', '4'), ('N', '5'), ('R', '6'), ('Y', '7'))</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>dict(zip(characters, guess))</kbd>   <span class=u>&#x2461;</span></a>
+<samp class=pp>{'E': '0', 'D': '3', 'M': '2', 'O': '4',
+ 'N': '5', 'S': '1', 'R': '6', 'Y': '7'}</samp></pre>
+<ol>
+<li>Given a list of letters and a list of digits (each represented here as 1-character strings), the <code>zip</code> function will create a pairing of letters and digits, in order.
+<li>Why is that cool? Because that data structure happens to be exactly the right structure to pass to the <code>dict()</code> function to create a dictionary that uses letters as keys and their associated digits as values. (This isn&#8217;t the only way to do it, of course. You could use a <a href=comprehensions.html#dictionarycomprehension>dictionary comprehension</a> to create the dictionary directly.) Although the printed representation of the dictionary lists the pairs in a different order (dictionaries have no &#8220;order&#8221; per se), you can see that each letter is associated with the digit, based on the ordering of the original <var>characters</var> and <var>guess</var> sequences.
+</ol>
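+
+<p>For comparison, here is a sketch of the dictionary comprehension mentioned above, using the same sample data. Both spellings build the same dictionary.
+
+<pre class='nd pp'><code>characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
+guess = ('1', '2', '0', '3', '4', '5', '6', '7')
+
+by_zip = dict(zip(characters, guess))
+by_comprehension = {letter: digit for letter, digit in zip(characters, guess)}
+print(by_zip == by_comprehension)   # True</code></pre>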
+
+<p id=guess>The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution.
+
+<pre class='nd pp'><code>characters = tuple(ord(c) for c in sorted_characters)
+digits = tuple(ord(c) for c in '0123456789')
+...
+for guess in itertools.permutations(digits, len(characters)):
+    ...
+<mark>    equation = puzzle.translate(dict(zip(characters, guess)))</mark></code></pre>
+
+<p>But what is this <code>translate()</code> method? Ah, now you&#8217;re getting to the <em>really</em> fun part.
+
+<p class=a>&#x2042;
+
+<h2 id=string-translate>A New Kind Of String Manipulation</h2>
+
+<p>Python strings have many methods. You learned about some of those methods in <a href=strings.html>the Strings chapter</a>: <code>lower()</code>, <code>count()</code>, and <code>format()</code>. Now I want to introduce you to a powerful but little-known string manipulation technique: the <code>translate()</code> method.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>translation_table = {ord('A'): ord('O')}</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>translation_table</kbd>                         <span class=u>&#x2461;</span></a>
+<samp class=pp>{65: 79}</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>'MARK'.translate(translation_table)</kbd>       <span class=u>&#x2462;</span></a>
+<samp class=pp>'MORK'</samp></pre>
+<ol>
+<li>String translation starts with a translation table, which is just a dictionary that maps one character to another. Actually, &#8220;character&#8221; is not quite right: the table maps the Unicode code point of one character (an integer) to the code point of another.
+<li>The <code>ord()</code> function returns the Unicode code point of a character as an integer. In the case of A&ndash;Z, that is always a number from 65 to 90, the same as the character&#8217;s <abbr>ASCII</abbr> value.
+<li>The <code>translate()</code> method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, &#8220;translating&#8221; <code>MARK</code> to <code>MORK</code>.
+</ol>
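+
+<p>You don&#8217;t have to build the table by hand. The built-in <code>str.maketrans()</code> method builds the same kind of dictionary from two strings of equal length. The alphametics solver doesn&#8217;t use it, so treat this as an aside rather than part of the solver:
+
+<pre class='nd pp'><code>table = str.maketrans('A', 'O')
+print(table)                     # {65: 79}
+print('MARK'.translate(table))   # MORK
+
+# Two equal-length strings are paired up position by position.
+digits_for_letters = str.maketrans('SMEDONRY', '91570682')
+print('SEND + MORE == MONEY'.translate(digits_for_letters))
+# 9567 + 1085 == 10652</code></pre>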
+
+<aside>Now you&#8217;re getting to the <em>really</em> fun part.</aside>
+
+<p>What does this have to do with solving alphametic puzzles? As it turns out, everything.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>characters = tuple(ord(c) for c in 'SMEDONRY')</kbd>       <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>characters</kbd>
+<samp class=pp>(83, 77, 69, 68, 79, 78, 82, 89)</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>guess = tuple(ord(c) for c in '91570682')</kbd>            <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>guess</kbd>
+<samp class=pp>(57, 49, 53, 55, 48, 54, 56, 50)</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>translation_table = dict(zip(characters, guess))</kbd>     <span class=u>&#x2462;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>translation_table</kbd>
+<samp class=pp>{68: 55, 69: 53, 77: 49, 78: 54, 79: 48, 82: 56, 83: 57, 89: 50}</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>'SEND + MORE == MONEY'.translate(translation_table)</kbd>  <span class=u>&#x2463;</span></a>
+<samp class=pp>'9567 + 1085 == 10652'</samp></pre>
+<ol>
+<li>Using a <a href=#generator-expressions>generator expression</a>, we quickly compute the code point of each character in a string. <var>characters</var> is an example of the value of <var>sorted_characters</var> in the <code>alphametics.solve()</code> function.
+<li>Using another generator expression, we quickly compute the code point of each digit in this string. The result, <var>guess</var>, is of the form <a href=#guess>returned by the <code>itertools.permutations()</code> function</a> in the <code>alphametics.solve()</code> function.
+<li>This translation table is generated by <a href=#dict-zip>zipping <var>characters</var> and <var>guess</var> together</a> and building a dictionary from the resulting sequence of pairs. This is exactly what the <code>alphametics.solve()</code> function does inside the <code>for</code> loop.
+<li>Finally, we pass this translation table to the <code>translate()</code> method of the original puzzle string. This converts each letter in the string to the corresponding digit (based on the letters in <var>characters</var> and the digits in <var>guess</var>). The result is a valid Python expression, as a string.
+</ol>
+
+<p>That&#8217;s pretty impressive. But what can you do with a string that happens to be a valid Python expression?
+
+<p class=a>&#x2042;
+
+<h2 id=eval>Evaluating Arbitrary Strings As Python Expressions</h2>
+
+<p>This is the final piece of the puzzle (or rather, the final piece of the puzzle solver). After all that fancy string manipulation, we&#8217;re left with a string like <code>'9567 + 1085 == 10652'</code>. But that&#8217;s a string, and what good is a string? Enter <code>eval()</code>, the universal Python evaluation tool.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>eval('1 + 1 == 2')</kbd>
+<samp class=pp>True</samp>
+<samp class=p>>>> </samp><kbd class=pp>eval('1 + 1 == 3')</kbd>
+<samp class=pp>False</samp>
+<samp class=p>>>> </samp><kbd class=pp>eval('9567 + 1085 == 10652')</kbd>
+<samp class=pp>True</samp></pre>
+
+<p>But wait, there&#8217;s more! The <code>eval()</code> function isn&#8217;t limited to boolean expressions. It can handle <em>any</em> Python expression and returns <em>any</em> datatype.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>eval('"A" + "B"')</kbd>
+<samp class=pp>'AB'</samp>
+<samp class=p>>>> </samp><kbd class=pp>eval('"MARK".translate({65: 79})')</kbd>
+<samp class=pp>'MORK'</samp>
+<samp class=p>>>> </samp><kbd class=pp>eval('"AAAAA".count("A")')</kbd>
+<samp class=pp>5</samp>
+<samp class=p>>>> </samp><kbd class=pp>eval('["*"] * 5')</kbd>
+<samp class=pp>['*', '*', '*', '*', '*']</samp></pre>
+
+<p>But wait, that&#8217;s not all!
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>x = 5</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("x * 5")</kbd>         <span class=u>&#x2460;</span></a>
+<samp class=pp>25</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("pow(x, 2)")</kbd>     <span class=u>&#x2461;</span></a>
+<samp class=pp>25</samp>
+<samp class=p>>>> </samp><kbd class=pp>import math</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("math.sqrt(x)")</kbd>  <span class=u>&#x2462;</span></a>
+<samp class=pp>2.2360679774997898</samp></pre>
+<ol>
+<li>The expression that <code>eval()</code> takes can reference global variables defined outside the <code>eval()</code>. If called within a function, it can reference local variables too.
+<li>And functions.
+<li>And modules.
+</ol>
+
+<p>Hey, wait a minute&hellip;
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import subprocess</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd>                  <span class=u>&#x2460;</span></a>
+<samp class=pp>'Desktop         Library         Pictures \
+ Documents       Movies          Public   \
+ Music           Sites'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm /some/random/file')")</kbd>  <span class=u>&#x2461;</span></a></pre>
+<ol>
+<li>The <code>subprocess</code> module allows you to run arbitrary shell commands and get the result as a Python string.
+<li>Arbitrary shell commands can have permanent consequences.
+</ol>
+
+<p>It&#8217;s even worse than that, because there&#8217;s a global <code>__import__()</code> function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of <code>eval()</code>, you can construct a single expression that will wipe out all your files:
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')")</kbd>  <span class=u>&#x2460;</span></a></pre>
+<ol>
+<li>Now imagine the output of <code>'rm -rf ~'</code>. Actually there wouldn&#8217;t be any output, but you wouldn&#8217;t have any files left either.
+</ol>
+
+<p class=xxxl>eval() is EVIL
+
+<p>Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use <code>eval()</code> on trusted input. Of course, the trick is figuring out what&#8217;s &#8220;trusted.&#8221; But here&#8217;s something I know for certain: you should <b>NOT</b> take this alphametics solver and put it on the internet as a fun little web service. Don&#8217;t make the mistake of thinking, &#8220;Gosh, the function does a lot of string manipulation before getting a string to evaluate; <em>I can&#8217;t imagine</em> how someone could exploit that.&#8221; Someone <b>WILL</b> figure out how to sneak nasty executable code past all that string manipulation (<a href=http://www.securityfocus.com/blogs/746>stranger things have happened</a>), and then you can kiss your server goodbye.
+
+<p>But surely there&#8217;s <em>some</em> way to evaluate expressions safely? To put <code>eval()</code> in a sandbox where it can&#8217;t access or harm the outside world? Well, yes and no.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>x = 5</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("x * 5", {}, {})</kbd>               <span class=u>&#x2460;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+  File "&lt;string>", line 1, in &lt;module>
+NameError: name 'x' is not defined</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("x * 5", {"x": x}, {})</kbd>         <span class=u>&#x2461;</span></a>
+<samp class=pp>25</samp>
+<samp class=p>>>> </samp><kbd class=pp>import math</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("math.sqrt(x)", {"x": x}, {})</kbd>  <span class=u>&#x2462;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+  File "&lt;string>", line 1, in &lt;module>
+NameError: name 'math' is not defined</samp></pre>
+<ol>
+<li>The second and third parameters passed to the <code>eval()</code> function act as the global and local namespaces for evaluating the expression. In this case, they are both empty, which means that when the string <code>"x * 5"</code> is evaluated, there is no reference to <var>x</var> in either the global or local namespace, so <code>eval()</code> throws an exception.
+<li>You can selectively include specific values in the global namespace by listing them individually. Then those&nbsp;&mdash;&nbsp;and only those&nbsp;&mdash;&nbsp;variables will be available during evaluation.
+<li>Even though you just imported the <code>math</code> module, you didn&#8217;t include it in the namespace passed to the <code>eval()</code> function, so the evaluation failed.
+</ol>
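+
+<p>If you do want the expression to see the <code>math</code> module, list it in the namespace alongside <var>x</var>. A minimal sketch, with <code>25</code> picked as a sample value so the answer comes out round:
+
+<pre class='nd pp'><code>import math
+
+x = 25
+# Apart from the built-in functions, only the names listed in
+# this dictionary are visible to the evaluated expression.
+namespace = {"x": x, "math": math}
+result = eval("math.sqrt(x)", namespace, {})
+print(result)   # 5.0</code></pre>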
+
+<p>Gee, that was easy. Lemme make an alphametics web service now!
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("pow(5, 2)", {}, {})</kbd>                   <span class=u>&#x2460;</span></a>
+<samp class=pp>25</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('math').sqrt(5)", {}, {})</kbd>  <span class=u>&#x2461;</span></a>
+<samp class=pp>2.2360679774997898</samp></pre>
+<ol>
+<li>Even though you&#8217;ve passed empty dictionaries for the global and local namespaces, all of Python&#8217;s built-in functions are still available during evaluation. So <code>pow(5, 2)</code> works, because <code>5</code> and <code>2</code> are literals, and <code>pow()</code> is a built-in function.
+<li>Unfortunately (and if you don&#8217;t see why it&#8217;s unfortunate, read on), the <code>__import__()</code> function is also a built-in function, so it works too.
+</ol>
+
+<p>Yeah, that means you can still do nasty things, even if you explicitly set the global and local namespaces to empty dictionaries when calling <code>eval()</code>:
+
+<pre class='nd screen'><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')", {}, {})</kbd></pre>
+
+<p>Oops. I&#8217;m glad I didn&#8217;t make that alphametics web service. Is there <em>any</em> way to use <code>eval()</code> safely? Well, yes and no.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>eval("__import__('math').sqrt(5)",</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    {"__builtins__":None}, {})</kbd>          <span class=u>&#x2460;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+  File "&lt;string>", line 1, in &lt;module>
+NameError: name '__import__' is not defined</samp>
+<samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm -rf /')",</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    {"__builtins__":None}, {})</kbd>          <span class=u>&#x2461;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+  File "&lt;string>", line 1, in &lt;module>
+NameError: name '__import__' is not defined</samp></pre>
+<ol>
+<li>To evaluate untrusted expressions safely, you need to define a global namespace dictionary that maps <code>"__builtins__"</code> to <code>None</code>, the Python null value. Internally, the &#8220;built-in&#8221; functions are contained within a pseudo-module called <code>"__builtins__"</code>. This pseudo-module (<i>i.e.</i> the set of built-in functions) is made available to evaluated expressions unless you explicitly override it.
+<li>Be sure you&#8217;ve overridden <code>__builtins__</code>. Not <code>__builtin__</code>, <code>__built-ins__</code>, or some other variation that will work just fine but expose you to catastrophic risks.
+</ol>
+
+<p>So <code>eval()</code> is safe now? Well, yes and no.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>eval("2 ** 2147483647",</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    {"__builtins__":None}, {})</kbd>          <span class=u>&#x2460;</span></a>
+</pre>
+<ol>
+<li>Even without access to <code>__builtins__</code>, you can still launch a denial-of-service attack. For example, trying to raise <code>2</code> to the <code>2147483647</code><sup>th</sup> power will spike your server&#8217;s <abbr>CPU</abbr> utilization to 100% for quite some time. (If you&#8217;re trying this in the interactive shell, press <kbd>Ctrl-C</kbd> a few times to break out of it.) Technically this expression <em>will</em> return a value eventually, but in the meantime your server will be doing a whole lot of nothing.
+</ol>
+
+<p>In the end, it <em>is</em> possible to safely evaluate untrusted Python expressions, for some definition of &#8220;safe&#8221; that turns out not to be terribly useful in real life. It&#8217;s fine if you&#8217;re just playing around, and it&#8217;s fine if you only ever pass it trusted input. But anything else is just asking for trouble.
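+
+<p>For equations as simple as the ones in this chapter, you may not need <code>eval()</code> at all. The sketch below is an alternative, not what the alphametics solver does: it assumes the translated string contains only whole numbers, <code>+</code> signs, and a single <code>==</code>, and <code>check_equation()</code> is a hypothetical name.
+
+<pre class='nd pp'><code>def check_equation(equation):
+    # Hypothetical helper: handles only 'a + b + ... == c'.
+    left, right = equation.split('==')
+    # int() ignores the surrounding spaces and raises ValueError
+    # on anything that is not a plain integer.
+    total = sum(int(term) for term in left.split('+'))
+    return total == int(right)
+
+print(check_equation('9567 + 1085 == 10652'))   # True
+print(check_equation('1 + 1 == 3'))             # False</code></pre>
+
+<p>Anything that isn&#8217;t a number, such as a smuggled-in call to <code>__import__()</code>, makes <code>int()</code> raise a <code>ValueError</code> instead of being executed.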
+
+<p class=a>&#x2042;
+
+<h2 id=alphametics-finale>Putting It All Together</h2>
+
+<p>To recap: this program solves alphametic puzzles by brute force, <i>i.e.</i> through an exhaustive search of all possible solutions. To do this, it&hellip;
+
+<ol>
+<li><a href=#re-findall>Finds all the letters in the puzzle</a> with the <code>re.findall()</code> function
+<li><a href=#unique-items>Finds all the <em>unique</em> letters in the puzzle</a> with sets and the <code>set()</code> function
+<li><a href=#assert>Checks if there are more than 10 unique letters</a> (meaning the puzzle is definitely unsolvable) with an <code>assert</code> statement
+<li><a href=#generator-expressions>Converts the letters to their ASCII equivalents</a> with a generator expression
+<li><a href=#permutations>Calculates all the possible solutions</a> with the <code>itertools.permutations()</code> function
+<li><a href=#string-translate>Converts each possible solution to a Python expression</a> with the <code>translate()</code> string method
+<li><a href=#eval>Tests each possible solution by evaluating the Python expression</a> with the <code>eval()</code> function
+<li>Returns the first solution that evaluates to <code>True</code>
+</ol>
+
+<p>&hellip;in just 14 lines of code.
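+
+<p>The steps above can be sketched as follows. This is a reconstruction from the list, not the chapter&#8217;s actual <code>alphametics.solve()</code> function: the name <code>solve_sketch()</code>, the sorting of the letters, and the way leading zeros are skipped are all assumptions made for this illustration. And since it calls <code>eval()</code>, it should only ever see input you trust.
+
+<pre class='nd pp'><code>import itertools
+import re
+
+def solve_sketch(puzzle):
+    puzzle = puzzle.upper()
+    # Steps 1-2: find the words, then the unique letters.
+    words = re.findall('[A-Z]+', puzzle)
+    unique_characters = set(''.join(words))
+    # Step 3: more than 10 letters cannot map to 10 digits.
+    assert 10 >= len(unique_characters), 'Too many letters'
+    # Assumption: sort the letters so the search order is repeatable.
+    sorted_characters = ''.join(sorted(unique_characters))
+    first_letters = {word[0] for word in words}
+    # Step 4: convert letters and digits to integer code points.
+    characters = tuple(ord(c) for c in sorted_characters)
+    digits = tuple(ord(c) for c in '0123456789')
+    zero = ord('0')
+    # Step 5: try every assignment of distinct digits to letters.
+    for guess in itertools.permutations(digits, len(characters)):
+        table = dict(zip(characters, guess))
+        # Assumption: skip guesses that give a word a leading zero,
+        # which Python 3 rejects in an integer literal.
+        if any(table[ord(c)] == zero for c in first_letters):
+            continue
+        # Steps 6-7: translate, then evaluate (trusted input only).
+        equation = puzzle.translate(table)
+        if eval(equation):
+            # Step 8: return the first solution found.
+            return equation
+    return None
+
+print(solve_sketch('I + BB == ILL'))   # 1 + 99 == 100</code></pre>
+
+<p>The small puzzle keeps the search short (three letters means only 720 permutations). The classic <code>SEND + MORE == MONEY</code> has eight letters, so expect it to take noticeably longer.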
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+
+<ul>
+<li><a href=http://docs.python.org/3.1/library/itertools.html><code>itertools</code> module</a>
+<li><a href=http://www.doughellmann.com/PyMOTW/itertools/><code>itertools</code>&nbsp;&mdash;&nbsp;Iterator functions for efficient looping</a>
+<li><a href=http://blip.tv/file/1947373/>Watch Raymond Hettinger&#8217;s &#8220;Easy AI with Python&#8221; talk</a> at PyCon 2009
+<li><a href=http://code.activestate.com/recipes/576615/>Recipe 576615: Alphametics solver</a>, Raymond Hettinger&#8217;s original alphametics solver for Python 2
+<li><a href=http://code.activestate.com/recipes/users/178123/>More of Raymond Hettinger&#8217;s recipes</a> in the ActiveState Code repository
+<li><a href=http://en.wikipedia.org/wiki/Verbal_arithmetic>Alphametics on Wikipedia</a>
+<li><a href=http://www.tkcs-collins.com/truman/alphamet/index.shtml>Alphametics Index</a>, including <a href=http://www.tkcs-collins.com/truman/alphamet/alphamet.shtml>lots of puzzles</a> and <a href=http://www.tkcs-collins.com/truman/alphamet/alpha_gen.shtml>a generator to make your own</a>
+</ul>
+
+<p>Many thanks to Raymond Hettinger for agreeing to relicense his code so I could port it to Python 3 and use it as the basis for this chapter.
+
+<p class=v><a href=iterators.html rel=prev title='back to &#8220;Classes &amp; Iterators&#8221;'><span class=u>&#x261C;</span></a> <a href=unit-testing.html rel=next title='onward to &#8220;Unit Testing&#8221;'><span class=u>&#x261E;</span></a>
+
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/colophon.html b/colophon.html
index 0784e32..aa1df0d 100644
--- a/colophon.html
+++ b/colophon.html
@@ -1,87 +1,87 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<meta name=robots content=noindex>
-<title>Colophon - Dive Into Python 3</title>
-<link rel=stylesheet href=dip3.css>
-<style>
-h1:before,h2:before{content:''}
-.ss{float:right;margin:0 0 1.75em 1.75em}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<h1>Colophon</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> <i lang=fr>Je n&#8217;ai fait celle-ci plus longue que parce que je n&#8217;ai pas eu le loisir de la faire plus courte.</i><br>(I would have written a shorter letter, but I did not have the time.) <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Blaise_Pascal>Blaise Pascal</a>
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>This book, like all books, was a labor of love. Oh sure, I got paid the medium-sized bucks for it, but nobody writes technical books for the money. And since this book is available on the web as well as on paper, I spent a lot of time fiddling with webby stuff when I should have been writing.
-
-<p class='ss nm'><img src=i/openclipart.org_media_files_johnny_automatic_5261.png width=314 height=273 alt='[typewriter]'>
-
-<p>The online edition loads as efficiently as possible. Efficiency never happens by accident; I spent many hours making it so. Perhaps too many hours. Yes, almost certainly too many hours. Never underestimate the depths to which a procrastinating writer will sink.
-
-<p>I won&#8217;t bore you with all the details. Wait, yes&nbsp;&mdash;&nbsp;I will bore you with all the details. But here&#8217;s the short version.
-
-<ol>
-<li>HTML is minimized, then served <a href=http://httpd.apache.org/docs/trunk/mod/mod_deflate.html>compressed</a>.
-<li>Scripts and stylesheets are minimized by <a href=http://developer.yahoo.com/yui/compressor/>YUI Compressor</a> (and also served compressed).
-<li>Scripts are combined to reduce HTTP requests.
-<li>Stylesheets are combined and inlined to reduce HTTP requests.
-<li>Unused CSS selectors and properties are <a href=http://hg.diveintopython3.org/file/default/util/lesscss.py>removed on a page-by-page basis</a> with a little help from <a href=http://pyquery.org/>pyquery</a>.
-<li>HTTP caching and other server-side options are optimized based on advice from <a href=http://developer.yahoo.com/yslow/>YSlow</a> and <a href=http://code.google.com/speed/page-speed/>Page Speed</a>.
-<li>Pages use <a href=http://www.alanwood.net/unicode/unicode_samples.html>Unicode characters</a> in place of images wherever possible.
-<li>Images are optimized with <a href=http://optipng.sourceforge.net/>OptiPNG</a>.
-<li>The entire book was <a href=http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition>lovingly hand-authored in HTML 5</a> to avoid markup cruft.
-</ol>
-
-<p class=a>&#x2042;
-
-<h2 id=typography>Typography</h2>
-
-<p>vertical rhythm, best available ampersand, curly quotes/apostrophes, other stuff from webtypography.net
-
-<p class=a>&#x2042;
-
-<h2 id=graphics>Graphics</h2>
-
-<p>Unicode, callouts, font-family issues on Windows
-
-<p class=a>&#x2042;
-
-<h2 id=performance>Performance</h2>
-
-<p>"Dive Into History 2009 edition", minimizing CSS + JS + HTML, inline CSS, optimizing images
-
-<p class=a>&#x2042;
-
-<h2 id=fun>Fun stuff</h2>
-
-<p>Quotes, constrained writing(?), PapayaWhip
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-
-<ul>
-<li><a href=http://webtypography.net/toc/>The Elements of Typographic Style Applied to the Web</a>
-<li><a href=http://www.alistapart.com/articles/settingtypeontheweb>Setting Type on the Web to a Baseline Grid</a>
-<li><a href=http://24ways.org/2006/compose-to-a-vertical-rhythm>Compose to a Vertical Rhythm</a>
-<li><a href=http://simplebits.com/notebook/2008/08/14/ampersands.html>Use the Best Available Ampersand</a>
-<li><a href=http://alanwood.net/unicode/>Unicode Support in HTML, Fonts, and Web Browsers</a>
-<li><a href=http://developer.yahoo.com/yslow/>YSlow</a> for <a href=http://getfirebug.com/>Firebug</a>
-<li><a href=http://developer.yahoo.com/performance/rules.html>Best Practices for Speeding Up Your Web Site</a>
-<li><a href=http://stevesouders.com/hpws/rules.php>14 Rules for Faster-Loading Web Sites</a>
-<li><a href=http://developer.yahoo.com/yui/compressor/>YUI Compressor</a>
-<li><a href=http://code.google.com/speed/page-speed/>Google Page Speed</a>
-<li><a href=http://code.google.com/speed/page-speed/docs/using.html>Using Google Page Speed</a>
-<li><a href=http://optipng.sourceforge.net/>OptiPNG</a>
-</ul>
-
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/dip3.js></script>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<!DOCTYPE html>
+<meta charset=utf-8>
+<meta name=robots content=noindex>
+<title>Colophon - Dive Into Python 3</title>
+<link rel=stylesheet href=dip3.css>
+<style>
+h1:before,h2:before{content:''}
+.ss{float:right;margin:0 0 1.75em 1.75em}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<h1>Colophon</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> <i lang=fr>Je n&#8217;ai fait celle-ci plus longue que parce que je n&#8217;ai pas eu le loisir de la faire plus courte.</i><br>(I would have written a shorter letter, but I did not have the time.) <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Blaise_Pascal>Blaise Pascal</a>
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>This book, like all books, was a labor of love. Oh sure, I got paid the medium-sized bucks for it, but nobody writes technical books for the money. And since this book is available on the web as well as on paper, I spent a lot of time fiddling with webby stuff when I should have been writing.
+
+<p class='ss nm'><img src=i/openclipart.org_media_files_johnny_automatic_5261.png width=314 height=273 alt='[typewriter]'>
+
+<p>The online edition loads as efficiently as possible. Efficiency never happens by accident; I spent many hours making it so. Perhaps too many hours. Yes, almost certainly too many hours. Never underestimate the depths to which a procrastinating writer will sink.
+
+<p>I won&#8217;t bore you with all the details. Wait, yes&nbsp;&mdash;&nbsp;I will bore you with all the details. But here&#8217;s the short version.
+
+<ol>
+<li>HTML is minimized, then served <a href=http://httpd.apache.org/docs/trunk/mod/mod_deflate.html>compressed</a>.
+<li>Scripts and stylesheets are minimized by <a href=http://developer.yahoo.com/yui/compressor/>YUI Compressor</a> (and also served compressed).
+<li>Scripts are combined to reduce HTTP requests.
+<li>Stylesheets are combined and inlined to reduce HTTP requests.
+<li>Unused CSS selectors and properties are <a href=http://hg.diveintopython3.org/file/default/util/lesscss.py>removed on a page-by-page basis</a> with a little help from <a href=http://pyquery.org/>pyquery</a>.
+<li>HTTP caching and other server-side options are optimized based on advice from <a href=http://developer.yahoo.com/yslow/>YSlow</a> and <a href=http://code.google.com/speed/page-speed/>Page Speed</a>.
+<li>Pages use <a href=http://www.alanwood.net/unicode/unicode_samples.html>Unicode characters</a> in place of images wherever possible.
+<li>Images are optimized with <a href=http://optipng.sourceforge.net/>OptiPNG</a>.
+<li>The entire book was <a href=http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition>lovingly hand-authored in HTML 5</a> to avoid markup cruft.
+</ol>
+
+<p class=a>&#x2042;
+
+<h2 id=typography>Typography</h2>
+
+<p>vertical rhythm, best available ampersand, curly quotes/apostrophes, other stuff from webtypography.net
+
+<p class=a>&#x2042;
+
+<h2 id=graphics>Graphics</h2>
+
+<p>Unicode, callouts, font-family issues on Windows
+
+<p class=a>&#x2042;
+
+<h2 id=performance>Performance</h2>
+
+<p>"Dive Into History 2009 edition", minimizing CSS + JS + HTML, inline CSS, optimizing images
+
+<p class=a>&#x2042;
+
+<h2 id=fun>Fun stuff</h2>
+
+<p>Quotes, constrained writing(?), PapayaWhip
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+
+<ul>
+<li><a href=http://webtypography.net/toc/>The Elements of Typographic Style Applied to the Web</a>
+<li><a href=http://www.alistapart.com/articles/settingtypeontheweb>Setting Type on the Web to a Baseline Grid</a>
+<li><a href=http://24ways.org/2006/compose-to-a-vertical-rhythm>Compose to a Vertical Rhythm</a>
+<li><a href=http://simplebits.com/notebook/2008/08/14/ampersands.html>Use the Best Available Ampersand</a>
+<li><a href=http://alanwood.net/unicode/>Unicode Support in HTML, Fonts, and Web Browsers</a>
+<li><a href=http://developer.yahoo.com/yslow/>YSlow</a> for <a href=http://getfirebug.com/>Firebug</a>
+<li><a href=http://developer.yahoo.com/performance/rules.html>Best Practices for Speeding Up Your Web Site</a>
+<li><a href=http://stevesouders.com/hpws/rules.php>14 Rules for Faster-Loading Web Sites</a>
+<li><a href=http://developer.yahoo.com/yui/compressor/>YUI Compressor</a>
+<li><a href=http://code.google.com/speed/page-speed/>Google Page Speed</a>
+<li><a href=http://code.google.com/speed/page-speed/docs/using.html>Using Google Page Speed</a>
+<li><a href=http://optipng.sourceforge.net/>OptiPNG</a>
+</ul>
+
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/dip3.js></script>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
diff --git a/files.html b/files.html
index 474a5f2..f3edefc 100644
--- a/files.html
+++ b/files.html
@@ -1,607 +1,607 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Files - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 11}
-mark{display:inline}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#files>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
-<h1>Files</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> A nine mile walk is no joke, especially in the rain. <span class=u>&#x275E;</span><br>&mdash; Harry Kemelman, <cite>The Nine Mile Walk</cite>
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>My Windows laptop had 38,493 files before I installed a single application. Installing Python 3 added almost 3,000 files to that total. Files are the primary storage paradigm of every major operating system; the concept is so ingrained that most people would have trouble <a href=http://en.wikipedia.org/wiki/Computer_file#History>imagining an alternative</a>. Your computer is, metaphorically speaking, drowning in files.
-
-<h2 id=reading>Reading From Text Files</h2>
-
-<p>Before you can read from a file, you need to open it. Opening a file in Python couldn&#8217;t be easier:
-
-<pre class='nd pp'><code>a_file = open('examples/chinese.txt', encoding='utf-8')</code></pre>
-
-<p>Python has a built-in <code>open()</code> function, which takes a filename as an argument. Here the filename is <code class=pp>'examples/chinese.txt'</code>. There are five interesting things about this filename:
-
-<ol>
-<li>It&#8217;s not just the name of a file; it&#8217;s a combination of a directory path and a filename. A hypothetical file-opening function could have taken two arguments&nbsp;&mdash;&nbsp;a directory path and a filename&nbsp;&mdash;&nbsp;but the <code>open()</code> function only takes one. In Python, whenever you need a &#8220;filename,&#8221; you can include some or all of a directory path as well.
-<li>The directory path uses a forward slash, but I didn&#8217;t say what operating system I was using. Windows uses backward slashes to denote subdirectories, while Mac OS X and Linux use forward slashes. But in Python, forward slashes always Just Work, even on Windows.
-<li>The directory path does not begin with a slash or a drive letter, so it is called a <i>relative path</i>. Relative to what, you might ask? Patience, grasshopper.
-<li>It&#8217;s a string. All modern operating systems (even Windows!) use Unicode to store the names of files and directories. Python 3 fully supports non-<abbr>ASCII</abbr> pathnames.
-<li>It doesn&#8217;t need to be on your local disk. You might have a network drive mounted. That &#8220;file&#8221; might be a figment of <a href=http://en.wikipedia.org/wiki/Filesystem_in_Userspace>an entirely virtual filesystem</a>. If your computer considers it a file and can access it as a file, Python can open it.
-</ol>
-
-<p>But that call to the <code>open()</code> function didn&#8217;t stop at the filename. There&#8217;s another argument, called <code>encoding</code>. Oh dear, <a href=strings.html#boring-stuff>that sounds dreadfully familiar</a>.
-
-<h3 id=encoding>Character Encoding Rears Its Ugly Head</h3>
-
-<p>Bytes are bytes; <a href=strings.html#byte-arrays>characters are an abstraction</a>. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a &#8220;text file&#8221; from disk, how does Python convert that sequence of bytes into a sequence of characters? It decodes the bytes according to a specific character encoding algorithm and returns a sequence of Unicode characters (otherwise known as a string).
-
-<pre>
-# This example was created on Windows. Other platforms may
-# behave differently, for reasons outlined below.
-<samp class=p>>>> </samp><kbd class=pp>file = open('examples/chinese.txt')</kbd>
-<samp class=p>>>> </samp><kbd class=pp>a_string = file.read()</kbd>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
-    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
-UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: character maps to &lt;undefined></samp>
-<samp class=p>>>> </samp></pre>
-
-<aside>The default encoding is platform-dependent.</aside>
-
-<p>What just happened? You didn&#8217;t specify a character encoding, so Python is forced to use the default encoding. What&#8217;s the default encoding? If you look closely at the traceback, you can see that it&#8217;s dying in <code>cp1252.py</code>, meaning that Python is using CP-1252 as the default encoding here. (CP-1252 is a common encoding on computers running Microsoft Windows.) The CP-1252 character set doesn&#8217;t support the characters that are in this file, so the read fails with an ugly <code>UnicodeDecodeError</code>.
-
-<p>But wait, it&#8217;s worse than that! The default encoding is <em>platform-dependent</em>, so this code <em>might</em> work on your computer (if your default encoding is <abbr>UTF-8</abbr>), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252).
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>If you need to get the default character encoding, import the <code>locale</code> module and call <code>locale.getpreferredencoding()</code>. On my Windows laptop, it returns <code>'cp1252'</code>, but on my Linux box upstairs, it returns <code>'UTF8'</code>. I can&#8217;t even maintain consistency in my own house! Your results may be different (even on Windows) depending on which version of your operating system you have installed and how your regional/language settings are configured. This is why it&#8217;s so important to specify the encoding every time you open a file.
-
-</blockquote>
-
-<h3 id=file-objects>Stream Objects</h3>
-
-<p>So far, all we know is that Python has a built-in function called <code>open()</code>. The <code>open()</code> function returns a <i>stream object</i>, which has methods and attributes for getting information about and manipulating a stream of characters.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.name</kbd>                                              <span class=u>&#x2460;</span></a>
-<samp class=pp>'examples/chinese.txt'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.encoding</kbd>                                          <span class=u>&#x2461;</span></a>
-<samp class=pp>'utf-8'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.mode</kbd>                                              <span class=u>&#x2462;</span></a>
-<samp class=pp>'r'</samp></pre>
-<ol>
-<li>The <code>name</code> attribute reflects the name you passed in to the <code>open()</code> function when you opened the file. It is not normalized to an absolute pathname.
-<li>Likewise, the <code>encoding</code> attribute reflects the encoding you passed in to the <code>open()</code> function. If you didn&#8217;t specify the encoding when you opened the file (bad developer!) then the <code>encoding</code> attribute will reflect <code>locale.getpreferredencoding()</code>.
-<li>The <code>mode</code> attribute tells you in which mode the file was opened. You can pass an optional <var>mode</var> parameter to the <code>open()</code> function. You didn&#8217;t specify a mode when you opened this file, so Python defaults to <code>'r'</code>, which means &#8220;open for reading only, in text mode.&#8221; As you&#8217;ll see later in this chapter, the file mode serves several purposes; different modes let you write to a file, append to a file, or open a file in binary mode (in which you deal with bytes instead of strings).
-</ol>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>The <a href=http://docs.python.org/3.1/library/io.html#module-interface>documentation for the <code>open()</code> function</a> lists all the possible file modes.
-</blockquote>
-
-<h3 id=read>Reading Data From A Text File</h3>
-
-<p>After you open a file for reading, you&#8217;ll probably want to read from it at some point.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                                            <span class=u>&#x2460;</span></a>
-<samp class=pp>'Dive Into Python 是为有经验的程序员编写的一本 Python 书。\n'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                                            <span class=u>&#x2461;</span></a>
-<samp class=pp>''</samp></pre>
-<ol>
-<li>Once you open a file (with the correct encoding), reading from it is just a matter of calling the stream object&#8217;s <code>read()</code> method. The result is a string.
-<li>Perhaps somewhat surprisingly, reading the file again does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
-</ol>
-
-<aside>Always specify an <code>encoding</code> parameter when you open a file.</aside>
-
-<p>What if you want to re-read a file?
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                      <span class=u>&#x2460;</span></a>
-<samp class=pp>''</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                     <span class=u>&#x2461;</span></a>
-<samp class=pp>0</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(16)</kbd>                    <span class=u>&#x2462;</span></a>
-<samp class=pp>'Dive Into Python'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                     <span class=u>&#x2463;</span></a>
-<samp class=pp>' '</samp>
-<samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>
-<samp class=pp>'是'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                      <span class=u>&#x2464;</span></a>
-<samp class=pp>20</samp></pre>
-<ol>
-<li>Since you&#8217;re still at the end of the file, further calls to the stream object&#8217;s <code>read()</code> method simply return an empty string.
-<li>The <code>seek()</code> method moves to a specific byte position in a file.
-<li>The <code>read()</code> method can take an optional parameter, the number of characters to read.
-<li>If you like, you can even read one character at a time.
-<li>16 + 1 + 1 = &hellip; 20?
-</ol>
-
-<p>Let&#8217;s try that again.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(17)</kbd>                    <span class=u>&#x2460;</span></a>
-<samp class=pp>17</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                     <span class=u>&#x2461;</span></a>
-<samp class=pp>'是'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                      <span class=u>&#x2462;</span></a>
-<samp class=pp>20</samp></pre>
-<ol>
-<li>Move to the 17<sup>th</sup> byte.
-<li>Read one character.
-<li>Now you&#8217;re on the 20<sup>th</sup> byte.
-</ol>
-
-<p>Do you see it yet? The <code>seek()</code> and <code>tell()</code> methods always count <em>bytes</em>, but since you opened this file as text, the <code>read()</code> method counts <em>characters</em>. Chinese characters <a href=strings.html#boring-stuff>require multiple bytes to encode in <abbr>UTF-8</abbr></a>. The English characters in the file only require one byte each, so you might be misled into thinking that the <code>seek()</code> and <code>read()</code> methods are counting the same thing. But that&#8217;s only true for some characters.
-
-<p>But wait, it gets worse!
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(18)</kbd>                         <span class=u>&#x2460;</span></a>
-<samp class=pp>18</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                          <span class=u>&#x2461;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;pyshell#12>", line 1, in &lt;module>
-    a_file.read(1)
-  File "C:\Python31\lib\codecs.py", line 300, in decode
-    (result, consumed) = self._buffer_decode(data, self.errors, final)
-UnicodeDecodeError: 'utf8' codec can't decode byte 0x98 in position 0: unexpected code byte</samp></pre>
-<ol>
-<li>Move to the 18<sup>th</sup> byte and try to read one character.
-<li>Why does this fail? Because there isn&#8217;t a character at the 18<sup>th</sup> byte. The nearest character starts at the 17<sup>th</sup> byte (and goes for three bytes). Trying to read a character from the middle will fail with a <code>UnicodeDecodeError</code>.
-</ol>
-
-<h3 id=close>Closing Files</h3>
-
-<p>Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It&#8217;s important to close files as soon as you&#8217;re finished with them.
-
-<pre class='nd screen'>
-# continued from the previous example
-<samp class=p>>>> </samp><kbd class=pp>a_file.close()</kbd></pre>
-
-<p>Well <em>that</em> was anticlimactic.
-
-<p>The stream object <var>a_file</var> still exists; calling its <code>close()</code> method doesn&#8217;t destroy the object itself. But it&#8217;s not terribly useful.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                           <span class=u>&#x2460;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;pyshell#24>", line 1, in &lt;module>
-    a_file.read()
-ValueError: I/O operation on closed file.</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                          <span class=u>&#x2461;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;pyshell#25>", line 1, in &lt;module>
-    a_file.seek(0)
-ValueError: I/O operation on closed file.</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                           <span class=u>&#x2462;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;pyshell#26>", line 1, in &lt;module>
-    a_file.tell()
-ValueError: I/O operation on closed file.</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.close()</kbd>                          <span class=u>&#x2463;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.closed</kbd>                           <span class=u>&#x2464;</span></a>
-<samp class=pp>True</samp></pre>
-<ol>
-<li>You can&#8217;t read from a closed file; that raises a <code>ValueError</code> exception.
-<li>You can&#8217;t seek in a closed file either.
-<li>There&#8217;s no current position in a closed file, so the <code>tell()</code> method also fails.
-<li>Perhaps surprisingly, calling the <code>close()</code> method on a stream object whose file has been closed does <em>not</em> raise an exception. It&#8217;s just a no-op.
-<li>Closed stream objects do have one useful attribute: the <code>closed</code> attribute will confirm that the file is closed.
-</ol>
-
-<h3 id=with>Closing Files Automatically</h3>
-
-<aside><code>try..finally</code> is good. <code>with</code> is better.</aside>
-
-<p>Stream objects have an explicit <code>close()</code> method, but what happens if your code has a bug and crashes before you call <code>close()</code>? That file could theoretically stay open for much longer than necessary. While you&#8217;re debugging on your local computer, that&#8217;s not a big deal. On a production server, maybe it is.
-
-<p>Python 2 had a solution for this: the <code>try..finally</code> block. That still works in Python 3, and you may see it in other people&#8217;s code or in older code that was <a href=case-study-porting-chardet-to-python-3.html>ported to Python 3</a>. But Python 2.5 introduced a cleaner solution, which is now the preferred solution in Python 3: the <code>with</code> statement.
-
-<pre class='nd pp'><code>with open('examples/chinese.txt', encoding='utf-8') as a_file:
-    a_file.seek(17)
-    a_character = a_file.read(1)
-    print(a_character)</code></pre>
-
-<p>This code calls <code>open()</code>, but it never calls <code>a_file.close()</code>. The <code>with</code> statement starts a code block, like an <code>if</code> statement or a <code>for</code> loop. Inside this code block, you can use the variable <var>a_file</var> as the stream object returned from the call to <code>open()</code>. All the regular stream object methods are available&nbsp;&mdash;&nbsp;<code>seek()</code>, <code>read()</code>, whatever you need. When the <code>with</code> block ends, <em>Python calls <code>a_file.close()</code> automatically</em>.
-
-<p>Here&#8217;s the kicker: no matter how or when you exit the <code>with</code> block, Python will close that file&hellip; even if you &#8220;exit&#8221; it via an unhandled exception. That&#8217;s right, even if your code raises an exception and your entire program comes to a screeching halt, that file will get closed. Guaranteed.
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>In technical terms, the <code>with</code> statement creates a <dfn>runtime context</dfn>. In these examples, the stream object acts as a <dfn>context manager</dfn>. Python creates the stream object <var>a_file</var> and tells it that it is entering a runtime context. When the <code>with</code> code block is completed, Python tells the stream object that it is exiting the runtime context, and the stream object calls its own <code>close()</code> method. See <a href=special-method-names.html#context-managers>Appendix B, &#8220;Classes That Can Be Used in a <code>with</code> Block&#8221;</a> for details.
-</blockquote>
-
-<p>There&#8217;s nothing file-specific about the <code>with</code> statement; it&#8217;s just a generic framework for creating runtime contexts and telling objects that they&#8217;re entering and exiting a runtime context. If the object in question is a stream object, then it does useful file-like things (like closing the file automatically). But that behavior is defined in the stream object, not in the <code>with</code> statement. There are lots of other ways to use context managers that have nothing to do with files. You can even create your own, as you&#8217;ll see later in this chapter.
-
-<h3 id=for>Reading Data One Line At A Time</h3>
-
-<p>A &#8220;line&#8221; of a text file is just what you think it is&nbsp;&mdash;&nbsp;you type a few words and press <kbd>ENTER</kbd>, and now you&#8217;re on a new line. A line of text is a sequence of characters delimited by&hellip; what exactly? Well, it&#8217;s complicated, because text files can use several different characters to mark the end of a line. Every operating system has its own convention. Some use a carriage return character, others use a line feed character, and some use both characters at the end of every line.
-
-<p>Now breathe a sigh of relief, because <em>Python handles line endings automatically</em> by default. If you say, &#8220;I want to read this text file one line at a time,&#8221; Python will figure out which kind of line ending the text file uses, and it will all Just Work.
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>If you need fine-grained control over what&#8217;s considered a line ending, you can pass the optional <code>newline</code> parameter to the <code>open()</code> function. See <a href=http://docs.python.org/3.1/library/io.html#module-interface>the <code>open()</code> function documentation</a> for all the gory details.
-</blockquote>
-
-<p>So, how do you actually do it? Read a file one line at a time, that is. It&#8217;s so simple, it&#8217;s beautiful.
-
-<p class=d>[<a href=examples/oneline.py>download <code>oneline.py</code></a>]
-<pre class=pp><code>line_number = 0
-<a>with open('examples/favorite-people.txt', encoding='utf-8') as a_file:  <span class=u>&#x2460;</span></a>
-<a>    for a_line in a_file:                                               <span class=u>&#x2461;</span></a>
-        line_number += 1
-<a>        print('{:>4} {}'.format(line_number, a_line.rstrip()))          <span class=u>&#x2462;</span></a></code></pre>
-<ol>
-<li>Using <a href=#with>the <code>with</code> pattern</a>, you safely open the file and let Python close it for you.
-<li>To read a file one line at a time, use a <code>for</code> loop. That&#8217;s it. Besides having explicit methods like <code>read()</code>, <em>the stream object is also an <a href=iterators.html>iterator</a></em> which spits out a single line every time you ask for a value.
-<li>Using <a href=strings.html#formatting-strings>the <code>format()</code> string method</a>, you can print out the line number and the line itself. The format specifier <code>{:>4}</code> means &#8220;print this argument right-justified within 4 spaces.&#8221; The <var>a_line</var> variable contains the complete line, line-ending characters and all. The <code>rstrip()</code> string method removes the trailing whitespace, including those line-ending characters.
-</ol>
-
-<pre class=screen>
-<samp class=p>you@localhost:~/diveintopython3$ </samp><kbd class=pp>python3 examples/oneline.py</kbd>
-<samp>   1 Dora
-   2 Ethan
-   3 Wesley
-   4 John
-   5 Anne
-   6 Mike
-   7 Chris
-   8 Sarah
-   9 Alex
-  10 Lizzie</samp></pre>
-
-<blockquote class=pf>
-<p>Did you get this error?
-<pre class='nd screen'>
-<samp class=p>you@localhost:~/diveintopython3$ </samp><kbd class=pp>python3 examples/oneline.py</kbd>
-<samp class=traceback>Traceback (most recent call last):
-  File "examples/oneline.py", line 4, in &lt;module>
-    print('{:>4} {}'.format(line_number, a_line.rstrip()))
-ValueError: zero length field name in format</samp></pre>
-<p>If so, you&#8217;re probably using Python 3.0. You should really upgrade to Python 3.1.
-<p>Python 3.0 supported string formatting, but only with <a href=strings.html#formatting-strings>explicitly numbered format specifiers</a>. Python 3.1 allows you to omit the argument indexes in your format specifiers. Here is the Python 3.0-compatible version for comparison:
-<pre class='pp nd'><code>print('{<mark>0</mark>:>4} {<mark>1</mark>}'.format(line_number, a_line.rstrip()))</code></pre>
-</blockquote>
-
-<p class=a>&#x2042;
-
-<h2 id=writing>Writing to Text Files</h2>
-
-<aside>Just open a file and start writing.</aside>
-
-<p>You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, then you close the file.
-
-<p>To open a file for writing, use the <code>open()</code> function and specify the write mode. There are two file modes for writing:
-
-<ul>
-<li>&#8220;Write&#8221; mode will overwrite the file. Pass <code>mode='w'</code> to the <code>open()</code> function.
-<li>&#8220;Append&#8221; mode will add data to the end of the file. Pass <code>mode='a'</code> to the <code>open()</code> function.
-</ul>
-
-<p>Either mode will create the file automatically if it doesn&#8217;t already exist, so there&#8217;s never a need for any sort of fiddly &#8220;if the file doesn&#8217;t exist yet, create a new empty file just so you can open it for the first time&#8221; function. Just open a file and start writing.
-
-<p>You should always close a file as soon as you&#8217;re done writing to it, to release the file handle and ensure that the data is actually written to disk. As with reading data from a file, you can call the stream object&#8217;s <code>close()</code> method, or you can use the <code>with</code> statement and let Python close the file for you. I bet you can guess which technique I recommend.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>with open('test.log', mode='w', encoding='utf-8') as a_file:</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>... </samp><kbd class=pp>    a_file.write('test succeeded')</kbd>                            <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>with open('test.log', encoding='utf-8') as a_file:</kbd>
-<samp class=p>... </samp><kbd class=pp>    print(a_file.read())</kbd>                              
-<samp class=pp>test succeeded</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>with open('test.log', mode='a', encoding='utf-8') as a_file:</kbd>  <span class=u>&#x2462;</span></a>
-<samp class=p>... </samp><kbd class=pp>    a_file.write('and again')</kbd>
-<samp class=p>>>> </samp><kbd class=pp>with open('test.log', encoding='utf-8') as a_file:</kbd>
-<samp class=p>... </samp><kbd class=pp>    print(a_file.read())</kbd>                              
-<a><samp class=pp>test succeededand again</samp>                                           <span class=u>&#x2463;</span></a></pre>
-<ol>
-<li>You start boldly by creating the new file <code>test.log</code> (or overwriting the existing file), and opening the file for writing. The <code>mode='w'</code> parameter means open the file for writing. Yes, that&#8217;s all as dangerous as it sounds. I hope you didn&#8217;t care about the previous contents of that file (if any), because that data is gone now.
-<li>You can add data to the newly opened file with the <code>write()</code> method of the stream object returned by the <code>open()</code> function. After the <code>with</code> block ends, Python automatically closes the file.
-<li>That was so fun, let&#8217;s do it again. But this time, with <code>mode='a'</code> to append to the file instead of overwriting it. Appending will <em>never</em> harm the existing contents of the file.
-<li>Both the original line you wrote and the second line you appended are now in the file <code>test.log</code>. Also note that neither carriage returns nor line feeds are included. Since you didn&#8217;t write them explicitly to the file either time, the file doesn&#8217;t include them. You can write a carriage return with the <code>'\r'</code> character, and/or a line feed with the <code>'\n'</code> character. Since you didn&#8217;t do either, everything you wrote to the file ended up on one line.
-</ol>
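-
-<p>If you do want each piece of text on its own line, write the line feed yourself. The sketch below is an illustration under assumptions, not part of the original example: it writes to a scratch directory rather than the current directory, and the <var>log_path</var> and <var>lines</var> names are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical location for the log file, so nothing real gets overwritten.
log_path = os.path.join(tempfile.mkdtemp(), 'test.log')

# mode='w' creates (or overwrites) the file; '\n' ends the line explicitly.
with open(log_path, mode='w', encoding='utf-8') as a_file:
    a_file.write('test succeeded\n')

# mode='a' appends after the existing contents.
with open(log_path, mode='a', encoding='utf-8') as a_file:
    a_file.write('and again\n')

# Read the file back one line at a time, stripping the line endings.
with open(log_path, encoding='utf-8') as a_file:
    lines = [a_line.rstrip() for a_line in a_file]

print(lines)
```

-<p>This time the file contains two separate lines instead of one run-together line.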
-
-<h3 id=encoding-again>Character Encoding Again</h3>
-
-<p>Did you notice the <code>encoding</code> parameter that got passed in to the <code>open()</code> function while you were <a href=#writing>opening a file for writing</a>? It&#8217;s important; don&#8217;t ever leave it out! As you saw in the beginning of this chapter, files don&#8217;t contain <i>strings</i>, they contain <i>bytes</i>. Reading a &#8220;string&#8221; from a text file only works because you told Python what encoding to use to read a stream of bytes and convert it to a string. Writing text to a file presents the same problem in reverse. You can&#8217;t write characters to a file; <a href=strings.html#byte-arrays>characters are an abstraction</a>. In order to write to the file, Python needs to know how to convert your string into a sequence of bytes. The only way to be sure it&#8217;s performing the correct conversion is to specify the <code>encoding</code> parameter when you open the file for writing.
-
-<p class=a>&#x2042;
-
-<h2 id=binary>Binary Files</h2>
-
-<p class=ss><img src=examples/beauregard.jpg alt='my dog Beauregard' width=100 height=100>
-
-<p>Not all files contain text. Some of them contain pictures of my dog.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>an_image = open('examples/beauregard.jpg', mode='rb')</kbd>                <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>an_image.mode</kbd>                                                        <span class=u>&#x2461;</span></a>
-<samp class=pp>'rb'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>an_image.name</kbd>                                                        <span class=u>&#x2462;</span></a>
-<samp class=pp>'examples/beauregard.jpg'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>an_image.encoding</kbd>                                                    <span class=u>&#x2463;</span></a>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-AttributeError: '_io.BufferedReader' object has no attribute 'encoding'</samp></pre>
-<ol>
-<li>Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the <code>mode</code> parameter contains a <code>'b'</code> character.
-<li>The stream object you get from opening a file in binary mode has many of the same attributes, including <code>mode</code>, which reflects the <code>mode</code> parameter you passed into the <code>open()</code> function.
-<li>Binary stream objects also have a <code>name</code> attribute, just like text stream objects.
-<li>Here&#8217;s one difference, though: a binary stream object has no <code>encoding</code> attribute. That makes sense, right? You&#8217;re reading (or writing) bytes, not strings, so there&#8217;s no conversion for Python to do. What you get out of a binary file is exactly what you put into it, no conversion necessary.
-</ol>
-
-<p>Did I mention you&#8217;re reading bytes? Oh yes you are.
-
-<pre class=screen>
-# continued from the previous example
-<samp class=p>>>> </samp><kbd class=pp>an_image.tell()</kbd>
-<samp class=pp>0</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>data = an_image.read(3)</kbd>  <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>data</kbd>
-<samp class=pp>b'\xff\xd8\xff'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>type(data)</kbd>               <span class=u>&#x2461;</span></a>
-<samp class=pp>&lt;class 'bytes'></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>an_image.tell()</kbd>          <span class=u>&#x2462;</span></a>
-<samp class=pp>3</samp>
-<samp class=p>>>> </samp><kbd class=pp>an_image.seek(0)</kbd>
-<samp class=pp>0</samp>
-<samp class=p>>>> </samp><kbd class=pp>data = an_image.read()</kbd>
-<samp class=p>>>> </samp><kbd class=pp>len(data)</kbd>
-<samp class=pp>3150</samp></pre>
-<ol>
-<li>Like text files, you can read binary files a little bit at a time. But there&#8217;s a crucial difference&hellip;
-<li>&hellip;you&#8217;re reading bytes, not strings. Since you opened the file in binary mode, the <code>read()</code> method takes <em>the number of bytes to read</em>, not the number of characters.
-<li>That means that there&#8217;s never <a href=#read>an unexpected mismatch</a> between the number you passed into the <code>read()</code> method and the position index you get out of the <code>tell()</code> method. The <code>read()</code> method reads bytes, and the <code>seek()</code> and <code>tell()</code> methods track the number of bytes read. For binary files, they&#8217;ll always agree.
-</ol>
-
-<p class=a>&#x2042;
-
-<h2 id=file-like-objects>Stream Objects From Non-File Sources</h2>
-
-<aside>To read from a fake file, just call <code>read()</code>.</aside>
-
-<p>Imagine you&#8217;re writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it, and close it before exiting. But you shouldn&#8217;t do that. Instead, your <abbr>API</abbr> should take <em>an arbitrary stream object</em>.
-
-<p>In the simplest case, a stream object is anything with a <code>read()</code> method which takes an optional <var>size</var> parameter and returns a string. When called with no <var>size</var> parameter, the <code>read()</code> method should read everything there is to read from the input source and return all the data as a single value. When called with a <var>size</var> parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data.
-
-<p>That sounds exactly like the stream object you get from opening a real file. The difference is that <em>you&#8217;re not limiting yourself to real files</em>. The input source that&#8217;s being &#8220;read&#8221; could be anything: a web page, a string in memory, even the output of another program. As long as your functions take a stream object and simply call the object&#8217;s <code>read()</code> method, you can handle any input source that acts like a file, without specific code to handle each kind of input.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>a_string = 'PapayaWhip is the new black.'</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>import io</kbd>                                  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file = io.StringIO(a_string)</kbd>             <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                              <span class=u>&#x2462;</span></a>
-<samp class=pp>'PapayaWhip is the new black.'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                              <span class=u>&#x2463;</span></a>
-<samp class=pp>''</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                             <span class=u>&#x2464;</span></a>
-<samp class=pp>0</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(10)</kbd>                            <span class=u>&#x2465;</span></a>
-<samp class=pp>'PapayaWhip'</samp>
-<samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                       
-<samp class=pp>10</samp>
-<samp class=p>>>> </samp><kbd class=pp>a_file.seek(18)</kbd>
-<samp class=pp>18</samp>
-<samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>
-<samp class=pp>'new black.'</samp></pre>
-<ol>
-<li>The <code>io</code> module defines the <code>StringIO</code> class that you can use to treat a string in memory as a file.
-<li>To create a stream object out of a string, create an instance of the <code>io.StringIO()</code> class and pass it the string you want to use as your &#8220;file&#8221; data. Now you have a stream object, and you can do all sorts of stream-like things with it.
-<li>Calling the <code>read()</code> method &#8220;reads&#8221; the entire &#8220;file,&#8221; which in the case of a <code>StringIO</code> object simply returns the original string.
-<li>Just like a real file, calling the <code>read()</code> method again returns an empty string.
-<li>You can explicitly seek to the beginning of the string, just like seeking through a real file, by using the <code>seek()</code> method of the <code>StringIO</code> object.
-<li>You can also read the string in chunks, by passing a <var>size</var> parameter to the <code>read()</code> method.
-</ol>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span><code>io.StringIO</code> lets you treat a string as a text file. There&#8217;s also an <code>io.BytesIO</code> class, which lets you treat a byte array as a binary file.
-</blockquote>
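-
-<p>To make the &#8220;take an arbitrary stream object&#8221; advice concrete, here is a small sketch of a library-style function that only ever calls <code>read()</code>. The function name <code>count_chunks()</code> and its <var>size</var> default are hypothetical, chosen purely for illustration; the same function is then handed both an <code>io.StringIO</code> and an <code>io.BytesIO</code> object.

```python
import io

def count_chunks(stream, size=10):
    """Read `stream` to the end in pieces of at most `size`; return the number of pieces."""
    chunks = 0
    # read() returns an empty string (or empty bytes) at the end of the data,
    # and both of those are false in a boolean context.
    while stream.read(size):
        chunks += 1
    return chunks

a_string = 'PapayaWhip is the new black.'

text_stream = io.StringIO(a_string)                  # a fake text file
byte_stream = io.BytesIO(a_string.encode('utf-8'))   # a fake binary file

text_chunks = count_chunks(text_stream)
byte_chunks = count_chunks(byte_stream)
print(text_chunks, byte_chunks)
```

-<p>The function never needs to know whether it was given a real file, a string in memory, or a byte array; anything with a suitable <code>read()</code> method will do.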
-
-<h3 id=gzip>Handling Compressed Files</h3>
-
-<p>The Python standard library contains modules that support reading and writing compressed files. There are a number of different compression schemes; the two most popular on non-Windows systems are <a href=http://docs.python.org/3.1/library/gzip.html>gzip</a> and <a href=http://docs.python.org/3.1/library/bz2.html>bzip2</a>. (You may have also encountered <a href=http://docs.python.org/3.1/library/zipfile.html>PKZIP archives</a> and <a href=http://docs.python.org/3.1/library/tarfile.html>GNU Tar archives</a>. Python has modules for those, too.)
-
-<p>The <code>gzip</code> module lets you create a stream object for reading or writing a gzip-compressed file. The stream object it gives you supports the <code>read()</code> method (if you opened it for reading) or the <code>write()</code> method (if you opened it for writing). That means you can use the methods you&#8217;ve already learned for regular files to <em>directly read or write a gzip-compressed file</em>, without creating a temporary file to store the decompressed data.
-
-<p>As an added bonus, it supports the <code>with</code> statement too, so you can let Python automatically close your gzip-compressed file when you&#8217;re done with it.
-
-<pre class='nd screen'>
-<samp class=p>you@localhost:~$ </samp><kbd>python3</kbd>
-
-<samp class=p>>>> </samp><kbd class=pp>import gzip</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>with gzip.open('out.log.gz', mode='wb') as z_file:</kbd>                                      <span class=u>&#x2460;</span></a>
-<samp class=p>... </samp><kbd class=pp>  z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))</kbd>
-<samp class=p>... </samp>
-<samp class=p>>>> </samp><kbd class=pp>exit()</kbd>
-
-<a><samp class=p>you@localhost:~$ </samp><kbd>ls -l out.log.gz</kbd>                                                           <span class=u>&#x2461;</span></a>
-<samp>-rw-r--r--  1 mark mark    79 2009-07-19 14:29 out.log.gz</samp>
-<a><samp class=p>you@localhost:~$ </samp><kbd>gunzip out.log.gz</kbd>                                                          <span class=u>&#x2462;</span></a>
-<a><samp class=p>you@localhost:~$ </samp><kbd>cat out.log</kbd>                                                                <span class=u>&#x2463;</span></a>
-<samp>A nine mile walk is no joke, especially in the rain.</samp></pre>
-<ol>
-<li>You should always open gzipped files in binary mode. (Note the <code>'b'</code> character in the <code>mode</code> argument.)
-<li>I constructed this example on Linux. If you&#8217;re not familiar with the command line, this command is showing the &#8220;long listing&#8221; of the gzip-compressed file you just created in the Python Shell. This listing shows that the file exists (good), and that it is 79 bytes long. That&#8217;s actually larger than the string you started with! The gzip file format includes a fixed-length header that contains some metadata about the file, so it&#8217;s inefficient for extremely small files.
-<li>The <code>gunzip</code> command (pronounced &#8220;gee-unzip&#8221;) decompresses the file and stores the contents in a new file named the same as the compressed file but without the <code>.gz</code> file extension.
-<li>The <code>cat</code> command displays the contents of a file. This file contains the string you originally wrote directly to the compressed file <code>out.log.gz</code> from within the Python Shell.
-</ol>
-
-<blockquote class=pf>
-<p>Did you get this error?
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>with gzip.open('out.log.gz', mode='wb') as z_file:</kbd>
-<samp class=p>... </samp><kbd class=pp>        z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))</kbd>
-<samp class=p>... </samp>
-<samp class=traceback>Traceback (most recent call last):
- File "&lt;stdin>", line 1, in &lt;module>
-AttributeError: 'GzipFile' object has no attribute '__exit__'</samp></pre>
-<p>If so, you&#8217;re probably using Python 3.0. You should really upgrade to Python 3.1.
-<p>Python 3.0 had a <code>gzip</code> module, but it did not support using a gzipped-file object as a context manager. Python 3.1 added the ability to use gzipped-file objects in a <code>with</code> statement.
-</blockquote>
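-
-<p>You don&#8217;t need the <code>gunzip</code> command to get the data back; the <code>gzip</code> module can read what it wrote. This is a minimal round-trip sketch, assuming a scratch directory for the compressed file; the <var>gz_path</var>, <var>original</var>, and <var>restored</var> names are illustrative only.

```python
import gzip
import os
import tempfile

# Hypothetical location for the compressed file.
gz_path = os.path.join(tempfile.mkdtemp(), 'out.log.gz')
original = 'A nine mile walk is no joke, especially in the rain.'

# Binary mode: encode the string to bytes before writing.
with gzip.open(gz_path, mode='wb') as z_file:
    z_file.write(original.encode('utf-8'))

# Binary mode again: read() returns bytes, so decode them back to a string.
with gzip.open(gz_path, mode='rb') as z_file:
    restored = z_file.read().decode('utf-8')

print(restored == original)
```

-<p>Because the file was opened in binary mode both times, the encoding and decoding steps are explicit in your own code.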
-
-<p class=a>&#x2042;
-
-<h2 id=stdio>Standard Input, Output, and Error</h2>
-
-<aside><code>sys.stdin</code>, <code>sys.stdout</code>, <code>sys.stderr</code>.</aside>
-
-<p>Command-line gurus are already familiar with the concept of standard input, standard output, and standard error. This section is for the rest of you.
-
-<p>Standard output and standard error (commonly abbreviated <code>stdout</code> and <code>stderr</code>) are pipes that are built into every <abbr>UNIX</abbr>-like system, including Mac OS X and Linux. When you call the <code>print()</code> function, the thing you&#8217;re printing is sent to the <code>stdout</code> pipe. When your program crashes and prints out a traceback, it goes to the <code>stderr</code> pipe. By default, both of these pipes are just connected to the terminal window where you are working; when your program prints something, you see the output in your terminal window, and when a program crashes, you see the traceback in your terminal window too. In the graphical Python Shell, the <code>stdout</code> and <code>stderr</code> pipes default to your &#8220;Interactive Window&#8221;.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>for i in range(3):</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    print('PapayaWhip')</kbd>            <span class=u>&#x2460;</span></a>
-<samp>PapayaWhip
-PapayaWhip
-PapayaWhip</samp>
-<samp class=p>>>> </samp><kbd class=pp>import sys</kbd>
-<samp class=p>>>> </samp><kbd class=pp>for i in range(3):</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    sys.stdout.write('is the')</kbd>     <span class=u>&#x2461;</span></a>
-<samp>is theis theis the</samp>
-<samp class=p>>>> </samp><kbd class=pp>for i in range(3):</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    sys.stderr.write('new black')</kbd>  <span class=u>&#x2462;</span></a>
-<samp>new blacknew blacknew black</samp></pre>
-<ol>
-<li>The <code>print()</code> function, in a loop. Nothing surprising here.
-<li><code>stdout</code> is defined in the <code>sys</code> module, and it is a <a href=#file-like-objects>stream object</a>. Calling its <code>write()</code> method will print out whatever string you give it. In fact, this is what the <code>print()</code> function really does; it adds a newline to the end of the string you&#8217;re printing, and calls <code>sys.stdout.write()</code>.
-<li>In the simplest case, <code>sys.stdout</code> and <code>sys.stderr</code> send their output to the same place: the Python <abbr>IDE</abbr> (if you&#8217;re in one), or the terminal (if you&#8217;re running Python from the command line). Like standard output, standard error does not add newlines for you. If you want line breaks, you&#8217;ll need to write the newline characters yourself.
-</ol>
-
-<p><code>sys.stdout</code> and <code>sys.stderr</code> are stream objects, but they are write-only. Attempting to call their <code>read()</code> method will always raise an <code>IOError</code>.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>import sys</kbd>
-<samp class=p>>>> </samp><kbd class=pp>sys.stdout.read()</kbd>
-<samp class=traceback>Traceback (most recent call last):
-  File "&lt;stdin>", line 1, in &lt;module>
-IOError: not readable</samp></pre>
-
-<h3 id=redirect>Redirecting Standard Output</h3>
-
-<p><code>sys.stdout</code> and <code>sys.stderr</code> are stream objects, albeit ones that only support writing. But they&#8217;re not constants; they&#8217;re variables. That means you can assign them a new value&nbsp;&mdash;&nbsp;any other stream object&nbsp;&mdash;&nbsp;to redirect their output.
-
-<p class=d>[<a href=examples/stdout.py>download <code>stdout.py</code></a>]
-<pre class=pp><code>import sys
-
-class RedirectStdoutTo:
-    def __init__(self, out_new):
-        self.out_new = out_new
-
-    def __enter__(self):
-        self.out_old = sys.stdout
-        sys.stdout = self.out_new
-
-    def __exit__(self, *args):
-        sys.stdout = self.out_old
-
-print('A')
-with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
-    print('B')
-print('C')</code></pre>
-
-<p>Check this out:
-
-<pre class='nd screen'>
-<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 stdout.py</kbd>
-<samp>A
-C</samp>
-<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>cat out.log</kbd>
-<samp>B</samp></pre>
-
-<blockquote class=pf>
-<p>Did you get this error?
-<pre class='nd screen'>
-<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd class=pp>python3 stdout.py</kbd>
-<samp class=traceback>  File "stdout.py", line 15
-    with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
-                                                              ^
-SyntaxError: invalid syntax</samp></pre>
-<p>If so, you&#8217;re probably using Python 3.0. You should really upgrade to Python 3.1.
-<p>Python 3.0 supported the <code>with</code> statement, but each statement can only use one context manager. Python 3.1 allows you to chain multiple context managers in a single <code>with</code> statement.
-</blockquote>
-
-<p>Let&#8217;s take the last part first.
-
-<pre class=pp><code>print('A')
-with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
-    print('B')
-print('C')</code></pre>
-
-<p>That&#8217;s a complicated <code>with</code> statement. Let me rewrite it as something more recognizable.
-
-<pre class=pp><code>with open('out.log', mode='w', encoding='utf-8') as a_file:
-    with RedirectStdoutTo(a_file):
-        print('B')</code></pre>
-
-<p>As the rewrite shows, you actually have <em>two</em> <code>with</code> statements, one nested within the scope of the other. The &#8220;outer&#8221; <code>with</code> statement should be familiar by now: it opens a <abbr>UTF-8</abbr>-encoded text file named <code>out.log</code> for writing and assigns the stream object to a variable named <var>a_file</var>. But that&#8217;s not the only thing odd here. 
-<pre class='nd pp'><code>with RedirectStdoutTo(a_file):</code></pre>
-
-<p>Where&#8217;s the <code>as</code> clause? The <code>with</code> statement doesn&#8217;t actually require one. Just like you can call a function and ignore its return value, you can have a <code>with</code> statement that doesn&#8217;t assign the <code>with</code> context to a variable. In this case, you&#8217;re only interested in the side effects of the <code>RedirectStdoutTo</code> context.
-
-<p>What are those side effects? Take a look inside the <code>RedirectStdoutTo</code> class. This class is a custom <a href=special-method-names.html#context-managers>context manager</a>. Any class can be a context manager by defining two <a href=iterators.html#a-fibonacci-iterator>special methods</a>: <code>__enter__()</code> and <code>__exit__()</code>.
-
-<pre class=pp><code>class RedirectStdoutTo:
-<a>    def __init__(self, out_new):    <span class=u>&#x2460;</span></a>
-        self.out_new = out_new
-
-<a>    def __enter__(self):            <span class=u>&#x2461;</span></a>
-        self.out_old = sys.stdout
-        sys.stdout = self.out_new
-
-<a>    def __exit__(self, *args):      <span class=u>&#x2462;</span></a>
-        sys.stdout = self.out_old</code></pre>
-<ol>
-<li>The <code>__init__()</code> method is called immediately after an instance is created. It takes one parameter, the stream object that you want to use as standard output for the life of the context. This method just saves the stream object in an instance variable so other methods can use it later.
-<li>The <code>__enter__()</code> method is a <a href=iterators.html#a-fibonacci-iterator>special method</a>; Python calls it when entering a context (<i>i.e.</i> at the beginning of the <code>with</code> statement). This method saves the current value of <code>sys.stdout</code> in <var>self.out_old</var>, then redirects standard output by assigning <var>self.out_new</var> to <var>sys.stdout</var>.
-<li>The <code>__exit__()</code> method is another special method; Python calls it when exiting the context (<i>i.e.</i> at the end of the <code>with</code> statement). This method restores standard output to its original value by assigning the saved <var>self.out_old</var> value to <var>sys.stdout</var>.
-</ol>
-
-<p>Putting it all together:
-
-<pre class=pp><code>
-<a>print('A')                                                                             <span class=u>&#x2460;</span></a>
-<a>with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):  <span class=u>&#x2461;</span></a>
-<a>    print('B')                                                                         <span class=u>&#x2462;</span></a>
-<a>print('C')                                                                             <span class=u>&#x2463;</span></a></code></pre>
-<ol>
-<li>This will print to the <abbr>IDE</abbr> &#8220;Interactive Window&#8221; (or the terminal, if running the script from the command line).
-<li>This <a href=#with><code>with</code> statement</a> takes <em>a comma-separated list of contexts</em>. The comma-separated list acts like a series of nested <code>with</code> blocks. The first context listed is the &#8220;outer&#8221; block; the last one listed is the &#8220;inner&#8221; block. The first context opens a file; the second context redirects <code>sys.stdout</code> to the stream object that was created in the first context.
-<li>Because this <code>print()</code> function is executed within the context created by the <code>with</code> statement, it will not print to the screen; it will write to the file <code>out.log</code>.
-<li>The <code>with</code> code block is over. Python has told each context manager to do whatever it is they do upon exiting a context. The context managers form a last-in-first-out stack. Upon exiting, the second context changed <code>sys.stdout</code> back to its original value, then the first context closed the file named <code>out.log</code>. Since standard output has been restored to its original value, calling the <code>print()</code> function will once again print to the screen.
-</ol>
-
-<p>Redirecting standard error works exactly the same way, using <code>sys.stderr</code> instead of <code>sys.stdout</code>.
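-
-<p>As a sketch of that last point, here is the same context manager adapted for standard error. The class name <code>RedirectStderrTo</code> is hypothetical (it is not one of the book&#8217;s example files), and an in-memory <code>io.StringIO</code> object stands in for the log file so the captured text can be inspected directly.

```python
import io
import sys

class RedirectStderrTo:
    """Hypothetical counterpart of RedirectStdoutTo, for sys.stderr."""

    def __init__(self, err_new):
        self.err_new = err_new

    def __enter__(self):
        # Save the current stream, then swap in the replacement.
        self.err_old = sys.stderr
        sys.stderr = self.err_new

    def __exit__(self, *args):
        # Put the original stream back, however the block was exited.
        sys.stderr = self.err_old

buffer = io.StringIO()
original_stderr = sys.stderr

with RedirectStderrTo(buffer):
    # sys.stderr is looked up when this line runs, so it refers to the buffer.
    print('new black', file=sys.stderr)

captured = buffer.getvalue()
restored_ok = sys.stderr is original_stderr
print(repr(captured), restored_ok)
```

-<p>Later versions of Python ship ready-made equivalents of both classes as <code>contextlib.redirect_stdout()</code> and <code>contextlib.redirect_stderr()</code>, but writing one yourself is a good way to see what <code>__enter__()</code> and <code>__exit__()</code> actually do.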
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-
-<ul>
-<li><a href=http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files>Reading and writing files</a> in the Python.org tutorial
-<li><a href=http://docs.python.org/3.1/library/io.html><code>io</code> module</a>
-<li><a href=http://docs.python.org/3.1/library/stdtypes.html#file-objects>Stream objects</a>
-<li><a href=http://docs.python.org/3.1/library/stdtypes.html#context-manager-types>Context manager types</a>
-<li><a href=http://docs.python.org/3.1/library/sys.html#sys.stdout><code>sys.stdout</code> and <code>sys.stderr</code></a>
-<li><a href=http://en.wikipedia.org/wiki/Filesystem_in_Userspace><abbr>FUSE</abbr> on Wikipedia</a>
-</ul>
-
-<p class=v><a href=refactoring.html rel=prev title='back to &#8220;Refactoring&#8221;'><span class=u>&#x261C;</span></a> <a href=xml.html rel=next title='onward to &#8220;XML&#8221;'><span class=u>&#x261E;</span></a>
-
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Files - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 11}
+mark{display:inline}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#files>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
+<h1>Files</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> A nine mile walk is no joke, especially in the rain. <span class=u>&#x275E;</span><br>&mdash; Harry Kemelman, <cite>The Nine Mile Walk</cite>
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>My Windows laptop had 38,493 files before I installed a single application. Installing Python 3 added almost 3,000 files to that total. Files are the primary storage paradigm of every major operating system; the concept is so ingrained that most people would have trouble <a href=http://en.wikipedia.org/wiki/Computer_file#History>imagining an alternative</a>. Your computer is, metaphorically speaking, drowning in files.
+
+<h2 id=reading>Reading From Text Files</h2>
+
+<p>Before you can read from a file, you need to open it. Opening a file in Python couldn&#8217;t be easier:
+
+<pre class='nd pp'><code>a_file = open('examples/chinese.txt', encoding='utf-8')</code></pre>
+
+<p>Python has a built-in <code>open()</code> function, which takes a filename as an argument. Here the filename is <code class=pp>'examples/chinese.txt'</code>. There are five interesting things about this filename:
+
+<ol>
+<li>It&#8217;s not just the name of a file; it&#8217;s a combination of a directory path and a filename. A hypothetical file-opening function could have taken two arguments&nbsp;&mdash;&nbsp;a directory path and a filename&nbsp;&mdash;&nbsp;but the <code>open()</code> function only takes one. In Python, whenever you need a &#8220;filename,&#8221; you can include some or all of a directory path as well.
+<li>The directory path uses a forward slash, but I didn&#8217;t say what operating system I was using. Windows uses backward slashes to denote subdirectories, while Mac OS X and Linux use forward slashes. But in Python, forward slashes always Just Work, even on Windows.
+<li>The directory path does not begin with a slash or a drive letter, so it is called a <i>relative path</i>. Relative to what, you might ask? Patience, grasshopper.
+<li>It&#8217;s a string. All modern operating systems (even Windows!) use Unicode to store the names of files and directories. Python 3 fully supports non-<abbr>ASCII</abbr> pathnames.
+<li>It doesn&#8217;t need to be on your local disk. You might have a network drive mounted. That &#8220;file&#8221; might be a figment of <a href=http://en.wikipedia.org/wiki/Filesystem_in_Userspace>an entirely virtual filesystem</a>. If your computer considers it a file and can access it as a file, Python can open it.
+</ol>
+
+<p>But that call to the <code>open()</code> function didn&#8217;t stop at the filename. There&#8217;s another argument, called <code>encoding</code>. Oh dear, <a href=strings.html#boring-stuff>that sounds dreadfully familiar</a>.
+
+<h3 id=encoding>Character Encoding Rears Its Ugly Head</h3>
+
+<p>Bytes are bytes; <a href=strings.html#byte-arrays>characters are an abstraction</a>. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a &#8220;text file&#8221; from disk, how does Python convert that sequence of bytes into a sequence of characters? It decodes the bytes according to a specific character encoding algorithm and returns a sequence of Unicode characters (otherwise known as a string).
+
+<pre>
+# This example was created on Windows. Other platforms may
+# behave differently, for reasons outlined below.
+<samp class=p>>>> </samp><kbd class=pp>file = open('examples/chinese.txt')</kbd>
+<samp class=p>>>> </samp><kbd class=pp>a_string = file.read()</kbd>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
+    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
+UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: character maps to &lt;undefined></samp>
+<samp class=p>>>> </samp></pre>
+
+<aside>The default encoding is platform-dependent.</aside>
+
+<p>What just happened? You didn&#8217;t specify a character encoding, so Python is forced to use the default encoding. What&#8217;s the default encoding? If you look closely at the traceback, you can see that it&#8217;s dying in <code>cp1252.py</code>, meaning that Python is using CP-1252 as the default encoding here. (CP-1252 is a common encoding on computers running Microsoft Windows.) The CP-1252 character set doesn&#8217;t support the characters that are in this file, so the read fails with an ugly <code>UnicodeDecodeError</code>.
+
+<p>But wait, it&#8217;s worse than that! The default encoding is <em>platform-dependent</em>, so this code <em>might</em> work on your computer (if your default encoding is <abbr>UTF-8</abbr>), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252).
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>If you need to get the default character encoding, import the <code>locale</code> module and call <code>locale.getpreferredencoding()</code>. On my Windows laptop, it returns <code>'cp1252'</code>, but on my Linux box upstairs, it returns <code>'UTF8'</code>. I can&#8217;t even maintain consistency in my own house! Your results may be different (even on Windows) depending on which version of your operating system you have installed and how your regional/language settings are configured. This is why it&#8217;s so important to specify the encoding every time you open a file.
+
+</blockquote>
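+
+<p>To reproduce the failure above on any platform, here is a minimal sketch that decodes the same bytes twice, once as CP-1252 and once as <abbr>UTF-8</abbr>. It assumes the file contains the sentence shown later in this chapter; the variable names are only for illustration.
+
```python
# The sentence stored in examples/chinese.txt, as the UTF-8 bytes on disk.
text = 'Dive Into Python 是为有经验的程序员编写的一本 Python 书。\n'
raw_bytes = text.encode('utf-8')

# Decoding with the wrong encoding fails, just like the traceback above.
try:
    raw_bytes.decode('cp1252')
    failed_at = None
except UnicodeDecodeError as e:
    failed_at = e.start          # index of the first undecodable byte

# Decoding with the encoding the bytes were written in round-trips cleanly.
round_trip = raw_bytes.decode('utf-8')

print(failed_at)
print(round_trip == text)
```
+
+<p>Passing <code>encoding='utf-8'</code> to <code>open()</code> performs the second kind of decoding for you, regardless of what <code>locale.getpreferredencoding()</code> returns.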
+
+<h3 id=file-objects>Stream Objects</h3>
+
+<p>So far, all we know is that Python has a built-in function called <code>open()</code>. The <code>open()</code> function returns a <i>stream object</i>, which has methods and attributes for getting information about and manipulating a stream of characters.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.name</kbd>                                              <span class=u>&#x2460;</span></a>
+<samp class=pp>'examples/chinese.txt'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.encoding</kbd>                                          <span class=u>&#x2461;</span></a>
+<samp class=pp>'utf-8'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.mode</kbd>                                              <span class=u>&#x2462;</span></a>
+<samp class=pp>'r'</samp></pre>
+<ol>
+<li>The <code>name</code> attribute reflects the name you passed in to the <code>open()</code> function when you opened the file. It is not normalized to an absolute pathname.
+<li>Likewise, the <code>encoding</code> attribute reflects the encoding you passed in to the <code>open()</code> function. If you didn&#8217;t specify the encoding when you opened the file (bad developer!) then the <code>encoding</code> attribute will reflect <code>locale.getpreferredencoding()</code>.
+<li>The <code>mode</code> attribute tells you in which mode the file was opened. You can pass an optional <var>mode</var> parameter to the <code>open()</code> function. You didn&#8217;t specify a mode when you opened this file, so Python defaults to <code>'r'</code>, which means &#8220;open for reading only, in text mode.&#8221; As you&#8217;ll see later in this chapter, the file mode serves several purposes; different modes let you write to a file, append to a file, or open a file in binary mode (in which you deal with bytes instead of strings).
+</ol>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>The <a href=http://docs.python.org/3.1/library/io.html#module-interface>documentation for the <code>open()</code> function</a> lists all the possible file modes.
+</blockquote>
+
+<h3 id=read>Reading Data From A Text File</h3>
+
+<p>After you open a file for reading, you&#8217;ll probably want to read from it at some point.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                                            <span class=u>&#x2460;</span></a>
+<samp class=pp>'Dive Into Python 是为有经验的程序员编写的一本 Python 书。\n'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                                            <span class=u>&#x2461;</span></a>
+<samp class=pp>''</samp></pre>
+<ol>
+<li>Once you open a file (with the correct encoding), reading from it is just a matter of calling the stream object&#8217;s <code>read()</code> method. The result is a string.
+<li>Perhaps somewhat surprisingly, reading the file again does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
+</ol>
+
+<aside>Always specify an <code>encoding</code> parameter when you open a file.</aside>
+
+<p>What if you want to re-read a file?
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                      <span class=u>&#x2460;</span></a>
+<samp class=pp>''</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                     <span class=u>&#x2461;</span></a>
+<samp class=pp>0</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(16)</kbd>                    <span class=u>&#x2462;</span></a>
+<samp class=pp>'Dive Into Python'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                     <span class=u>&#x2463;</span></a>
+<samp class=pp>' '</samp>
+<samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>
+<samp class=pp>'是'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                      <span class=u>&#x2464;</span></a>
+<samp class=pp>20</samp></pre>
+<ol>
+<li>Since you&#8217;re still at the end of the file, further calls to the stream object&#8217;s <code>read()</code> method simply return an empty string.
+<li>The <code>seek()</code> method moves to a specific byte position in a file.
+<li>The <code>read()</code> method can take an optional parameter, the number of characters to read.
+<li>If you like, you can even read one character at a time.
+<li>16 + 1 + 1 = &hellip; 20?
+</ol>
+
+<p>Let&#8217;s try that again.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(17)</kbd>                    <span class=u>&#x2460;</span></a>
+<samp class=pp>17</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                     <span class=u>&#x2461;</span></a>
+<samp class=pp>'是'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                      <span class=u>&#x2462;</span></a>
+<samp class=pp>20</samp></pre>
+<ol>
+<li>Move to the 17<sup>th</sup> byte.
+<li>Read one character.
+<li>Now you&#8217;re on the 20<sup>th</sup> byte.
+</ol>
+
+<p>Do you see it yet? The <code>seek()</code> and <code>tell()</code> methods always count <em>bytes</em>, but since you opened this file as text, the <code>read()</code> method counts <em>characters</em>. Chinese characters <a href=strings.html#boring-stuff>require multiple bytes to encode in <abbr>UTF-8</abbr></a>. The English characters in the file only require one byte each, so you might be misled into thinking that the <code>seek()</code> and <code>read()</code> methods are counting the same thing. But that&#8217;s only true for some characters.
+
+<p>But wait, it gets worse!
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(18)</kbd>                         <span class=u>&#x2460;</span></a>
+<samp class=pp>18</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                          <span class=u>&#x2461;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;pyshell#12>", line 1, in &lt;module>
+    a_file.read(1)
+  File "C:\Python31\lib\codecs.py", line 300, in decode
+    (result, consumed) = self._buffer_decode(data, self.errors, final)
+UnicodeDecodeError: 'utf8' codec can't decode byte 0x98 in position 0: unexpected code byte</samp></pre>
+<ol>
+<li>Move to the 18<sup>th</sup> byte and try to read one character.
+<li>Why does this fail? Because there isn&#8217;t a character at the 18<sup>th</sup> byte. The nearest character starts at the 17<sup>th</sup> byte (and goes for three bytes). Trying to read a character from the middle will fail with a <code>UnicodeDecodeError</code>.
+</ol>
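+
+<p>The arithmetic behind those byte positions can be checked without touching a file at all. This is a rough sketch, assuming the same sentence as above; <code>byte_offset()</code> is a hypothetical helper written for this illustration, not part of Python.
+
```python
text = 'Dive Into Python 是为有经验的程序员编写的一本 Python 书。\n'
data = text.encode('utf-8')      # what the file holds on disk


def byte_offset(s, n_chars, encoding='utf-8'):
    """Return how many bytes the first n_chars characters of s occupy."""
    return len(s[:n_chars].encode(encoding))


# 16 + 1 + 1 characters were read, but together they span 20 bytes.
offset_after_18_chars = byte_offset(text, 18)

# Bytes 17, 18 and 19 form one complete character.
whole_character = data[17:20].decode('utf-8')

# Starting at byte 18 lands in the middle of that character.
try:
    data[18:21].decode('utf-8')
    mid_character_decodes = True
except UnicodeDecodeError:
    mid_character_decodes = False

print(offset_after_18_chars, whole_character, mid_character_decodes)
```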
+
+<h3 id=close>Closing Files</h3>
+
+<p>Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It&#8217;s important to close files as soon as you&#8217;re finished with them.
+
+<pre class='nd screen'>
+# continued from the previous example
+<samp class=p>>>> </samp><kbd class=pp>a_file.close()</kbd></pre>
+
+<p>Well <em>that</em> was anticlimactic.
+
+<p>The stream object <var>a_file</var> still exists; calling its <code>close()</code> method doesn&#8217;t destroy the object itself. But it&#8217;s not terribly useful.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                           <span class=u>&#x2460;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;pyshell#24>", line 1, in &lt;module>
+    a_file.read()
+ValueError: I/O operation on closed file.</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                          <span class=u>&#x2461;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;pyshell#25>", line 1, in &lt;module>
+    a_file.seek(0)
+ValueError: I/O operation on closed file.</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                           <span class=u>&#x2462;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;pyshell#26>", line 1, in &lt;module>
+    a_file.tell()
+ValueError: I/O operation on closed file.</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.close()</kbd>                          <span class=u>&#x2463;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.closed</kbd>                           <span class=u>&#x2464;</span></a>
+<samp class=pp>True</samp></pre>
+<ol>
+<li>You can&#8217;t read from a closed file; that raises a <code>ValueError</code> exception.
+<li>You can&#8217;t seek in a closed file either.
+<li>There&#8217;s no current position in a closed file, so the <code>tell()</code> method also fails.
+<li>Perhaps surprisingly, calling the <code>close()</code> method on a stream object whose file has been closed does <em>not</em> raise an exception. It&#8217;s just a no-op.
+<li>Closed stream objects do have one useful attribute: the <code>closed</code> attribute will confirm that the file is closed.
+</ol>
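+
+<p>In a script, as opposed to the interactive shell, you would catch that exception rather than read a traceback. A minimal sketch, assuming a throwaway temporary file stands in for <code>examples/chinese.txt</code>:
+
```python
import os
import tempfile

# Create a throwaway file so the example does not depend on examples/.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

a_file = open(path, encoding='utf-8')
a_file.close()

# Reading from the closed stream raises ValueError.
try:
    a_file.read()
    error_name = None
except ValueError as e:
    error_name = type(e).__name__

a_file.close()                # closing a second time is a harmless no-op
is_closed = a_file.closed

os.remove(path)
print(error_name, is_closed)
```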
+
+<h3 id=with>Closing Files Automatically</h3>
+
+<aside><code>try..finally</code> is good. <code>with</code> is better.</aside>
+
+<p>Stream objects have an explicit <code>close()</code> method, but what happens if your code has a bug and crashes before you call <code>close()</code>? That file could theoretically stay open for much longer than necessary. While you&#8217;re debugging on your local computer, that&#8217;s not a big deal. On a production server, maybe it is.
+
+<p>Python 2 had a solution for this: the <code>try..finally</code> block. That still works in Python 3, and you may see it in other people&#8217;s code or in older code that was <a href=case-study-porting-chardet-to-python-3.html>ported to Python 3</a>. But Python 2.5 introduced a cleaner solution, which is now the preferred solution in Python 3: the <code>with</code> statement.
+
+<pre class='nd pp'><code>with open('examples/chinese.txt', encoding='utf-8') as a_file:
+    a_file.seek(17)
+    a_character = a_file.read(1)
+    print(a_character)</code></pre>
+
+<p>This code calls <code>open()</code>, but it never calls <code>a_file.close()</code>. The <code>with</code> statement starts a code block, like an <code>if</code> statement or a <code>for</code> loop. Inside this code block, you can use the variable <var>a_file</var> as the stream object returned from the call to <code>open()</code>. All the regular stream object methods are available&nbsp;&mdash;&nbsp;<code>seek()</code>, <code>read()</code>, whatever you need. When the <code>with</code> block ends, <em>Python calls <code>a_file.close()</code> automatically</em>.
+
+<p>Here&#8217;s the kicker: no matter how or when you exit the <code>with</code> block, Python will close that file&hellip; even if you &#8220;exit&#8221; it via an unhandled exception. That&#8217;s right, even if your code raises an exception and your entire program comes to a screeching halt, that file will get closed. Guaranteed.
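+
+<p>For comparison, here is a small sketch of both spellings side by side. The temporary file and the simulated <code>RuntimeError</code> are assumptions made for the sake of a self-contained example.
+
```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# The older try..finally spelling: close() runs no matter what happens.
old_style = open(path, encoding='utf-8')
try:
    old_style.read()
finally:
    old_style.close()

# The with statement gives the same guarantee, even when the block raises.
try:
    with open(path, encoding='utf-8') as a_file:
        raise RuntimeError('simulated bug')
except RuntimeError:
    pass

closed_after_finally = old_style.closed
closed_after_exception = a_file.closed

os.remove(path)
print(closed_after_finally, closed_after_exception)
```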
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>In technical terms, the <code>with</code> statement creates a <dfn>runtime context</dfn>. In these examples, the stream object acts as a <dfn>context manager</dfn>. Python creates the stream object <var>a_file</var> and tells it that it is entering a runtime context. When the <code>with</code> code block is completed, Python tells the stream object that it is exiting the runtime context, and the stream object calls its own <code>close()</code> method. See <a href=special-method-names.html#context-managers>Appendix B, &#8220;Classes That Can Be Used in a <code>with</code> Block&#8221;</a> for details.
+</blockquote>
+
+<p>There&#8217;s nothing file-specific about the <code>with</code> statement; it&#8217;s just a generic framework for creating runtime contexts and telling objects that they&#8217;re entering and exiting a runtime context. If the object in question is a stream object, then it does useful file-like things (like closing the file automatically). But that behavior is defined in the stream object, not in the <code>with</code> statement. There are lots of other ways to use context managers that have nothing to do with files. You can even create your own, as you&#8217;ll see later in this chapter.
+
+<h3 id=for>Reading Data One Line At A Time</h3>
+
+<p>A &#8220;line&#8221; of a text file is just what you think it is&nbsp;&mdash;&nbsp;you type a few words and press <kbd>ENTER</kbd>, and now you&#8217;re on a new line. A line of text is a sequence of characters delimited by&hellip; what exactly? Well, it&#8217;s complicated, because text files can use several different characters to mark the end of a line. Every operating system has its own convention. Some use a carriage return character, others use a line feed character, and some use both characters at the end of every line.
+
+<p>Now breathe a sigh of relief, because <em>Python handles line endings automatically</em> by default. If you say, &#8220;I want to read this text file one line at a time,&#8221; Python will figure out which kind of line ending the text file uses and it will all Just Work.
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>If you need fine-grained control over what&#8217;s considered a line ending, you can pass the optional <code>newline</code> parameter to the <code>open()</code> function. See <a href=http://docs.python.org/3.1/library/io.html#module-interface>the <code>open()</code> function documentation</a> for all the gory details.
+</blockquote>
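+
+<p>As a rough illustration of what the <code>newline</code> parameter changes, the sketch below writes one temporary file that mixes all three conventions and reads it back twice. The file contents are made up for the example.
+
```python
import os
import tempfile

# One file, three different line-ending conventions.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as raw:
    raw.write(b'one\r\ntwo\rthree\n')

# Default behavior: every line ending is translated to '\n'.
with open(path, encoding='utf-8') as a_file:
    translated = list(a_file)

# newline='' still splits on every convention but leaves the endings alone.
with open(path, encoding='utf-8', newline='') as a_file:
    untranslated = list(a_file)

os.remove(path)
print(translated)
print(untranslated)
```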
+
+<p>So, how do you actually do it? Read a file one line at a time, that is. It&#8217;s so simple, it&#8217;s beautiful.
+
+<p class=d>[<a href=examples/oneline.py>download <code>oneline.py</code></a>]
+<pre class=pp><code>line_number = 0
+<a>with open('examples/favorite-people.txt', encoding='utf-8') as a_file:  <span class=u>&#x2460;</span></a>
+<a>    for a_line in a_file:                                               <span class=u>&#x2461;</span></a>
+        line_number += 1
+<a>        print('{:>4} {}'.format(line_number, a_line.rstrip()))          <span class=u>&#x2462;</span></a></code></pre>
+<ol>
+<li>Using <a href=#with>the <code>with</code> pattern</a>, you safely open the file and let Python close it for you.
+<li>To read a file one line at a time, use a <code>for</code> loop. That&#8217;s it. Besides having explicit methods like <code>read()</code>, <em>the stream object is also an <a href=iterators.html>iterator</a></em> which spits out a single line every time you ask for a value.
+<li>Using <a href=strings.html#formatting-strings>the <code>format()</code> string method</a>, you can print out the line number and the line itself. The format specifier <code>{:>4}</code> means &#8220;print this argument right-justified within 4 spaces.&#8221; The <var>a_line</var> variable contains the complete line, line ending and all. The <code>rstrip()</code> string method removes the trailing whitespace, including the line-ending characters.
+</ol>
+
+<pre class=screen>
+<samp class=p>you@localhost:~/diveintopython3$ </samp><kbd class=pp>python3 examples/oneline.py</kbd>
+<samp>   1 Dora
+   2 Ethan
+   3 Wesley
+   4 John
+   5 Anne
+   6 Mike
+   7 Chris
+   8 Sarah
+   9 Alex
+  10 Lizzie</samp></pre>
+
+<blockquote class=pf>
+<p>Did you get this error?
+<pre class='nd screen'>
+<samp class=p>you@localhost:~/diveintopython3$ </samp><kbd class=pp>python3 examples/oneline.py</kbd>
+<samp class=traceback>Traceback (most recent call last):
+  File "examples/oneline.py", line 4, in &lt;module>
+    print('{:>4} {}'.format(line_number, a_line.rstrip()))
+ValueError: zero length field name in format</samp></pre>
+<p>If so, you&#8217;re probably using Python 3.0. You should really upgrade to Python 3.1.
+<p>Python 3.0 supported string formatting, but only with <a href=strings.html#formatting-strings>explicitly numbered format specifiers</a>. Python 3.1 allows you to omit the argument indexes in your format specifiers. Here is the Python 3.0-compatible version for comparison:
+<pre class='pp nd'><code>print('{<mark>0</mark>:>4} {<mark>1</mark>}'.format(line_number, a_line.rstrip()))</code></pre>
+</blockquote>
+
+<p class=a>&#x2042;
+
+<h2 id=writing>Writing to Text Files</h2>
+
+<aside>Just open a file and start writing.</aside>
+
+<p>You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, then you close the file.
+
+<p>To open a file for writing, use the <code>open()</code> function and specify the write mode. There are two file modes for writing:
+
+<ul>
+<li>&#8220;Write&#8221; mode will overwrite the file. Pass <code>mode='w'</code> to the <code>open()</code> function.
+<li>&#8220;Append&#8221; mode will add data to the end of the file. Pass <code>mode='a'</code> to the <code>open()</code> function.
+</ul>
+
+<p>Either mode will create the file automatically if it doesn&#8217;t already exist, so there&#8217;s never a need for any sort of fiddly &#8220;if the file doesn&#8217;t exist yet, create a new empty file just so you can open it for the first time&#8221; function. Just open a file and start writing.
+
+<p>You should always close a file as soon as you&#8217;re done writing to it, to release the file handle and ensure that the data is actually written to disk. As with reading data from a file, you can call the stream object&#8217;s <code>close()</code> method, or you can use the <code>with</code> statement and let Python close the file for you. I bet you can guess which technique I recommend.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>with open('test.log', mode='w', encoding='utf-8') as a_file:</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>... </samp><kbd class=pp>    a_file.write('test succeeded')</kbd>                            <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>with open('test.log', encoding='utf-8') as a_file:</kbd>
+<samp class=p>... </samp><kbd class=pp>    print(a_file.read())</kbd>                              
+<samp class=pp>test succeeded</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>with open('test.log', mode='a', encoding='utf-8') as a_file:</kbd>  <span class=u>&#x2462;</span></a>
+<samp class=p>... </samp><kbd class=pp>    a_file.write('and again')</kbd>
+<samp class=p>>>> </samp><kbd class=pp>with open('test.log', encoding='utf-8') as a_file:</kbd>
+<samp class=p>... </samp><kbd class=pp>    print(a_file.read())</kbd>                              
+<a><samp class=pp>test succeededand again</samp>                                           <span class=u>&#x2463;</span></a></pre>
+<ol>
+<li>You start boldly by creating the new file <code>test.log</code> (or overwriting the existing file), and opening the file for writing. The <code>mode='w'</code> parameter means open the file for writing. Yes, that&#8217;s every bit as dangerous as it sounds. I hope you didn&#8217;t care about the previous contents of that file (if any), because that data is gone now.
+<li>You can add data to the newly opened file with the <code>write()</code> method of the stream object returned by the <code>open()</code> function. After the <code>with</code> block ends, Python automatically closes the file.
+<li>That was so fun, let&#8217;s do it again. But this time, with <code>mode='a'</code> to append to the file instead of overwriting it. Appending will <em>never</em> harm the existing contents of the file.
+<li>Both the original line you wrote and the second line you appended are now in the file <code>test.log</code>. Also note that neither carriage returns nor line feeds are included. Since you didn&#8217;t write them explicitly to the file either time, the file doesn&#8217;t include them. You can write a carriage return with the <code>'\r'</code> character, and/or a line feed with the <code>'\n'</code> character. Since you didn&#8217;t do either, everything you wrote to the file ended up on one line.
+</ol>
+
+<h3 id=encoding-again>Character Encoding Again</h3>
+
+<p>Did you notice the <code>encoding</code> parameter that got passed in to the <code>open()</code> function while you were <a href=#writing>opening a file for writing</a>? It&#8217;s important; don&#8217;t ever leave it out! As you saw in the beginning of this chapter, files don&#8217;t contain <i>strings</i>, they contain <i>bytes</i>. Reading a &#8220;string&#8221; from a text file only works because you told Python what encoding to use to read a stream of bytes and convert it to a string. Writing text to a file presents the same problem in reverse. You can&#8217;t write characters to a file; <a href=strings.html#byte-arrays>characters are an abstraction</a>. In order to write to the file, Python needs to know how to convert your string into a sequence of bytes. The only way to be sure it&#8217;s performing the correct conversion is to specify the <code>encoding</code> parameter when you open the file for writing.
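+
+<p>To make that concrete, here is a minimal sketch showing that one and the same character becomes different bytes, or no bytes at all, depending on the encoding. The choice of <code>utf-16-le</code> and <code>ascii</code> as comparison encodings is only for illustration.
+
```python
a_string = '是'

utf8_bytes = a_string.encode('utf-8')        # three bytes
utf16_bytes = a_string.encode('utf-16-le')   # two bytes

# Some encodings cannot represent the character at all.
try:
    a_string.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8_bytes, utf16_bytes, ascii_ok)
```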
+
+<p class=a>&#x2042;
+
+<h2 id=binary>Binary Files</h2>
+
+<p class=ss><img src=examples/beauregard.jpg alt='my dog Beauregard' width=100 height=100>
+
+<p>Not all files contain text. Some of them contain pictures of my dog.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>an_image = open('examples/beauregard.jpg', mode='rb')</kbd>                <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>an_image.mode</kbd>                                                        <span class=u>&#x2461;</span></a>
+<samp class=pp>'rb'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>an_image.name</kbd>                                                        <span class=u>&#x2462;</span></a>
+<samp class=pp>'examples/beauregard.jpg'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>an_image.encoding</kbd>                                                    <span class=u>&#x2463;</span></a>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+AttributeError: '_io.BufferedReader' object has no attribute 'encoding'</samp></pre>
+<ol>
+<li>Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the <code>mode</code> parameter contains a <code>'b'</code> character.
+<li>The stream object you get from opening a file in binary mode has many of the same attributes, including <code>mode</code>, which reflects the <code>mode</code> parameter you passed into the <code>open()</code> function.
+<li>Binary stream objects also have a <code>name</code> attribute, just like text stream objects.
+<li>Here&#8217;s one difference, though: a binary stream object has no <code>encoding</code> attribute. That makes sense, right? You&#8217;re reading (or writing) bytes, not strings, so there&#8217;s no conversion for Python to do. What you get out of a binary file is exactly what you put into it, no conversion necessary.
+</ol>
+
+<p>Did I mention you&#8217;re reading bytes? Oh yes you are.
+
+<pre class=screen>
+# continued from the previous example
+<samp class=p>>>> </samp><kbd class=pp>an_image.tell()</kbd>
+<samp class=pp>0</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>data = an_image.read(3)</kbd>  <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>data</kbd>
+<samp class=pp>b'\xff\xd8\xff'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>type(data)</kbd>               <span class=u>&#x2461;</span></a>
+<samp class=pp>&lt;class 'bytes'></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>an_image.tell()</kbd>          <span class=u>&#x2462;</span></a>
+<samp class=pp>3</samp>
+<samp class=p>>>> </samp><kbd class=pp>an_image.seek(0)</kbd>
+<samp class=pp>0</samp>
+<samp class=p>>>> </samp><kbd class=pp>data = an_image.read()</kbd>
+<samp class=p>>>> </samp><kbd class=pp>len(data)</kbd>
+<samp class=pp>3150</samp></pre>
+<ol>
+<li>Like text files, you can read binary files a little bit at a time. But there&#8217;s a crucial difference&hellip;
+<li>&hellip;you&#8217;re reading bytes, not strings. Since you opened the file in binary mode, the <code>read()</code> method takes <em>the number of bytes to read</em>, not the number of characters.
+<li>That means that there&#8217;s never <a href=#read>an unexpected mismatch</a> between the number you passed into the <code>read()</code> method and the position index you get out of the <code>tell()</code> method. The <code>read()</code> method reads bytes, and the <code>seek()</code> and <code>tell()</code> methods track the number of bytes read. For binary files, they&#8217;ll always agree.
+</ol>
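+
+<p>Here is a minimal sketch of the same idea that you can run without the image file. The temporary file name <code>sample.bin</code> and the <var>payload</var> bytes are made up for illustration; only the first three bytes mimic the output shown above.

```python
import os
import tempfile

# Hypothetical sample data: three header-like bytes followed by ten more.
payload = b'\xff\xd8\xff' + bytes(range(10))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.bin')
    with open(path, mode='wb') as out_file:
        out_file.write(payload)

    with open(path, mode='rb') as in_file:
        head = in_file.read(3)       # ask for 3 bytes, get 3 bytes
        position = in_file.tell()    # the position is counted in bytes too
        in_file.seek(0)              # back to the start
        everything = in_file.read()  # no size argument: read to the end

print(head, position, len(everything))
```

+<p>Because the file was opened in binary mode, the size passed to <code>read()</code> and the position reported by <code>tell()</code> are both byte counts, so they agree.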
+
+<p class=a>&#x2042;
+
+<h2 id=file-like-objects>Stream Objects From Non-File Sources</h2>
+
+<aside>To read from a fake file, just call <code>read()</code>.</aside>
+
+<p>Imagine you&#8217;re writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it, and close it before exiting. But you shouldn&#8217;t do that. Instead, your <abbr>API</abbr> should take <em>an arbitrary stream object</em>.
+
+<p>In the simplest case, a stream object is anything with a <code>read()</code> method which takes an optional <var>size</var> parameter and returns a string. When called with no <var>size</var> parameter, the <code>read()</code> method should read everything there is to read from the input source and return all the data as a single value. When called with a <var>size</var> parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data.
+
+<p>That sounds exactly like the stream object you get from opening a real file. The difference is that <em>you&#8217;re not limiting yourself to real files</em>. The input source that&#8217;s being &#8220;read&#8221; could be anything: a web page, a string in memory, even the output of another program. As long as your functions take a stream object and simply call the object&#8217;s <code>read()</code> method, you can handle any input source that acts like a file, without specific code to handle each kind of input.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>a_string = 'PapayaWhip is the new black.'</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>import io</kbd>                                  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file = io.StringIO(a_string)</kbd>             <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                              <span class=u>&#x2462;</span></a>
+<samp class=pp>'PapayaWhip is the new black.'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                              <span class=u>&#x2463;</span></a>
+<samp class=pp>''</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                             <span class=u>&#x2464;</span></a>
+<samp class=pp>0</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(10)</kbd>                            <span class=u>&#x2465;</span></a>
+<samp class=pp>'PapayaWhip'</samp>
+<samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                       
+<samp class=pp>10</samp>
+<samp class=p>>>> </samp><kbd class=pp>a_file.seek(18)</kbd>
+<samp class=pp>18</samp>
+<samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>
+<samp class=pp>'new black.'</samp></pre>
+<ol>
+<li>The <code>io</code> module defines the <code>StringIO</code> class that you can use to treat a string in memory as a file.
+<li>To create a stream object out of a string, create an instance of the <code>io.StringIO()</code> class and pass it the string you want to use as your &#8220;file&#8221; data. Now you have a stream object, and you can do all sorts of stream-like things with it.
+<li>Calling the <code>read()</code> method &#8220;reads&#8221; the entire &#8220;file,&#8221; which in the case of a <code>StringIO</code> object simply returns the original string.
+<li>Just like a real file, calling the <code>read()</code> method again returns an empty string.
+<li>You can explicitly seek to the beginning of the string, just like seeking through a real file, by using the <code>seek()</code> method of the <code>StringIO</code> object.
+<li>You can also read the string in chunks, by passing a <var>size</var> parameter to the <code>read()</code> method.
+</ol>
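+
+<p>To make the library advice concrete, here is a small sketch of functions that accept an arbitrary stream object. The names <code>count_words()</code> and <code>read_in_chunks()</code> are hypothetical, invented for this example; the only thing they assume about their argument is a <code>read()</code> method with an optional <var>size</var> parameter.

```python
import io

def count_words(stream):
    """Count whitespace-separated words in anything that has a read() method."""
    return len(stream.read().split())

def read_in_chunks(stream, size=10):
    """Yield successive chunks until read() returns an empty value."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk

word_count = count_words(io.StringIO('PapayaWhip is the new black.'))
chunks = list(read_in_chunks(io.StringIO('PapayaWhip is the new black.')))
print(word_count)
print(chunks)
```

+<p>Neither function knows or cares that the &#8220;file&#8221; is really a string in memory. Passing the result of <code>open()</code> instead should work the same way.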
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span><code>io.StringIO</code> lets you treat a string as a text file. There&#8217;s also an <code>io.BytesIO</code> class, which lets you treat a byte array as a binary file.
+</blockquote>
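+
+<p>A short sketch of <code>io.BytesIO</code> in action. The four bytes used here are arbitrary sample data, not taken from a real file.

```python
import io

# Arbitrary sample bytes standing in for the contents of a binary file.
fake_binary_file = io.BytesIO(b'\xff\xd8\xff\xe0')

first_two = fake_binary_file.read(2)   # read() returns bytes, not a string
position = fake_binary_file.tell()     # position is measured in bytes
rest = fake_binary_file.read()         # picks up where the last read stopped

print(first_two, position, rest)
```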
+
+<h3 id=gzip>Handling Compressed Files</h3>
+
+<p>The Python standard library contains modules that support reading and writing compressed files. There are a number of different compression schemes; the two most popular on non-Windows systems are <a href=http://docs.python.org/3.1/library/gzip.html>gzip</a> and <a href=http://docs.python.org/3.1/library/bz2.html>bzip2</a>. (You may have also encountered <a href=http://docs.python.org/3.1/library/zipfile.html>PKZIP archives</a> and <a href=http://docs.python.org/3.1/library/tarfile.html>GNU Tar archives</a>. Python has modules for those, too.)
+
+<p>The <code>gzip</code> module lets you create a stream object for reading or writing a gzip-compressed file. The stream object it gives you supports the <code>read()</code> method (if you opened it for reading) or the <code>write()</code> method (if you opened it for writing). That means you can use the methods you&#8217;ve already learned for regular files to <em>directly read or write a gzip-compressed file</em>, without creating a temporary file to store the decompressed data.
+
+<p>As an added bonus, it supports the <code>with</code> statement too, so you can let Python automatically close your gzip-compressed file when you&#8217;re done with it.
+
+<pre class='nd screen'>
+<samp class=p>you@localhost:~$ </samp><kbd>python3</kbd>
+
+<samp class=p>>>> </samp><kbd class=pp>import gzip</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>with gzip.open('out.log.gz', mode='wb') as z_file:</kbd>                                      <span class=u>&#x2460;</span></a>
+<samp class=p>... </samp><kbd class=pp>  z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))</kbd>
+<samp class=p>... </samp>
+<samp class=p>>>> </samp><kbd class=pp>exit()</kbd>
+
+<a><samp class=p>you@localhost:~$ </samp><kbd>ls -l out.log.gz</kbd>                                                           <span class=u>&#x2461;</span></a>
+<samp>-rw-r--r--  1 mark mark    79 2009-07-19 14:29 out.log.gz</samp>
+<a><samp class=p>you@localhost:~$ </samp><kbd>gunzip out.log.gz</kbd>                                                          <span class=u>&#x2462;</span></a>
+<a><samp class=p>you@localhost:~$ </samp><kbd>cat out.log</kbd>                                                                <span class=u>&#x2463;</span></a>
+<samp>A nine mile walk is no joke, especially in the rain.</samp></pre>
+<ol>
+<li>You should always open gzipped files in binary mode. (Note the <code>'b'</code> character in the <code>mode</code> argument.)
+<li>I constructed this example on Linux. If you&#8217;re not familiar with the command line, this command is showing the &#8220;long listing&#8221; of the gzip-compressed file you just created in the Python Shell. This listing shows that the file exists (good), and that it is 79 bytes long. That&#8217;s actually larger than the string you started with! The gzip file format includes a fixed-length header that contains some metadata about the file, so it&#8217;s inefficient for extremely small files.
+<li>The <code>gunzip</code> command (pronounced &#8220;gee-unzip&#8221;) decompresses the file and stores the contents in a new file named the same as the compressed file but without the <code>.gz</code> file extension.
+<li>The <code>cat</code> command displays the contents of a file. This file contains the string you originally wrote directly to the compressed file <code>out.log.gz</code> from within the Python Shell.
+</ol>
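+
+<p>You don&#8217;t need <code>gunzip</code> to get the data back; the same module can read what it wrote. The following sketch writes the sentence to a gzip-compressed file in a temporary directory (an assumption made so the example cleans up after itself) and reads it back through <code>gzip.open()</code>.

```python
import gzip
import os
import tempfile

message = 'A nine mile walk is no joke, especially in the rain.'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.log.gz')

    # Write: encode the string to bytes, since the file is opened in binary mode.
    with gzip.open(path, mode='wb') as z_file:
        z_file.write(message.encode('utf-8'))

    compressed_size = os.path.getsize(path)

    # Read: the stream hands back decompressed bytes, which you decode yourself.
    with gzip.open(path, mode='rb') as z_file:
        round_trip = z_file.read().decode('utf-8')

print(round_trip)
print(compressed_size, 'bytes on disk for', len(message), 'characters')
```

+<p>The exact compressed size may vary with the filename and Python version, so the sketch prints it rather than assuming a number.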
+
+<blockquote class=pf>
+<p>Did you get this error?
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>with gzip.open('out.log.gz', mode='wb') as z_file:</kbd>
+<samp class=p>... </samp><kbd class=pp>        z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))</kbd>
+<samp class=p>... </samp>
+<samp class=traceback>Traceback (most recent call last):
+ File "&lt;stdin>", line 1, in &lt;module>
+AttributeError: 'GzipFile' object has no attribute '__exit__'</samp></pre>
+<p>If so, you&#8217;re probably using Python 3.0. You should really upgrade to Python 3.1.
+<p>Python 3.0 had a <code>gzip</code> module, but it did not support using a gzipped-file object as a context manager. Python 3.1 added the ability to use gzipped-file objects in a <code>with</code> statement.
+</blockquote>
+
+<p class=a>&#x2042;
+
+<h2 id=stdio>Standard Input, Output, and Error</h2>
+
+<aside><code>sys.stdin</code>, <code>sys.stdout</code>, <code>sys.stderr</code>.</aside>
+
+<p>Command-line gurus are already familiar with the concept of standard input, standard output, and standard error. This section is for the rest of you.
+
+<p>Standard output and standard error (commonly abbreviated <code>stdout</code> and <code>stderr</code>) are pipes that are built into every <abbr>UNIX</abbr>-like system, including Mac OS X and Linux. When you call the <code>print()</code> function, the thing you&#8217;re printing is sent to the <code>stdout</code> pipe. When your program crashes and prints out a traceback, it goes to the <code>stderr</code> pipe. By default, both of these pipes are just connected to the terminal window where you are working; when your program prints something, you see the output in your terminal window, and when a program crashes, you see the traceback in your terminal window too. In the graphical Python Shell, the <code>stdout</code> and <code>stderr</code> pipes default to your &#8220;Interactive Window&#8221;.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>for i in range(3):</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    print('PapayaWhip')</kbd>        <span class=u>&#x2460;</span></a>
+<samp>PapayaWhip
+PapayaWhip
+PapayaWhip</samp>
+<samp class=p>>>> </samp><kbd class=pp>import sys</kbd>
+<samp class=p>>>> </samp><kbd class=pp>for i in range(3):</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    sys.stdout.write('is the')</kbd> <span class=u>&#x2461;</span></a>
+<samp>is theis theis the</samp>
+<samp class=p>>>> </samp><kbd class=pp>for i in range(3):</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    sys.stderr.write('new black')</kbd>  <span class=u>&#x2462;</span></a>
+<samp>new blacknew blacknew black</samp></pre>
+<ol>
+<li>The <code>print()</code> function, in a loop. Nothing surprising here.
+<li><code>stdout</code> is defined in the <code>sys</code> module, and it is a <a href=#file-like-objects>stream object</a>. Calling its <code>write()</code> method will print out whatever string you give it. In fact, this is what the <code>print()</code> function really does; by default, it adds a newline to the end of the string you&#8217;re printing, and calls <code>sys.stdout.write</code>.
+<li>In the simplest case, <code>sys.stdout</code> and <code>sys.stderr</code> send their output to the same place: the Python <abbr>IDE</abbr> (if you&#8217;re in one), or the terminal (if you&#8217;re running Python from the command line). Like standard output, standard error does not add newlines for you. If you want newlines, you&#8217;ll need to write newline characters yourself.
+</ol>
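+
+<p>The relationship between <code>print()</code> and <code>write()</code> can be sketched with the <code>end</code> and <code>file</code> arguments of <code>print()</code>. The <var>buffer</var> variable is an in-memory stream used here only so the result can be inspected; it is not part of the original example.

```python
import io
import sys

buffer = io.StringIO()

for i in range(3):
    # end='' suppresses the newline that print() normally appends,
    # so this behaves like buffer.write('is the').
    print('is the', end='', file=buffer)

print(file=buffer)                    # prints nothing but the newline

# file=sys.stderr sends the text to standard error instead of standard output.
print('new black', file=sys.stderr)

captured = buffer.getvalue()
print(repr(captured))
```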
+
+<p><code>sys.stdout</code> and <code>sys.stderr</code> are stream objects, but they are write-only. Attempting to call their <code>read()</code> method will always raise an <code>IOError</code>.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>import sys</kbd>
+<samp class=p>>>> </samp><kbd class=pp>sys.stdout.read()</kbd>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in &lt;module>
+IOError: not readable</samp></pre>
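+
+<p>The same thing happens with any write-only stream. The sketch below uses a file opened for writing in a temporary directory, as a stand-in for <code>sys.stdout</code>, because the behavior of <code>sys.stdout</code> itself depends on where your program is running. In more recent Python 3 releases the exception is reported as <code>io.UnsupportedOperation</code>, which is a subclass of <code>OSError</code> (and <code>IOError</code> is an alias for <code>OSError</code> since Python 3.3), so catching <code>OSError</code> covers both spellings.

```python
import io
import os
import tempfile

caught = None

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'write_only.txt')
    with open(path, mode='w', encoding='utf-8') as a_file:
        can_read = a_file.readable()   # ask the stream before trying
        try:
            a_file.read()
        except OSError as error:
            caught = error             # keep a reference for inspection

print(can_read)
print(type(caught).__name__)
```

+<p>Calling <code>readable()</code> first lets a library function reject a write-only stream with a clearer error of its own.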
+
+<h3 id=redirect>Redirecting Standard Output</h3>
+
+<p><code>sys.stdout</code> and <code>sys.stderr</code> are stream objects, albeit ones that only support writing. But they&#8217;re not constants; they&#8217;re variables. That means you can assign them a new value&nbsp;&mdash;&nbsp;any other stream object&nbsp;&mdash;&nbsp;to redirect their output.
+
+<p class=d>[<a href=examples/stdout.py>download <code>stdout.py</code></a>]
+<pre class=pp><code>import sys
+
+class RedirectStdoutTo:
+    def __init__(self, out_new):
+        self.out_new = out_new
+
+    def __enter__(self):
+        self.out_old = sys.stdout
+        sys.stdout = self.out_new
+
+    def __exit__(self, *args):
+        sys.stdout = self.out_old
+
+print('A')
+with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
+    print('B')
+print('C')</code></pre>
+
+<p>Check this out:
+
+<pre class='nd screen'>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 stdout.py</kbd>
+<samp>A
+C</samp>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>cat out.log</kbd>
+<samp>B</samp></pre>
+
+<blockquote class=pf>
+<p>Did you get this error?
+<pre class='nd screen'>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd class=pp>python3 stdout.py</kbd>
+<samp class=traceback>  File "stdout.py", line 15
+    with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
+                                                              ^
+SyntaxError: invalid syntax</samp></pre>
+<p>If so, you&#8217;re probably using Python 3.0. You should really upgrade to Python 3.1.
+<p>Python 3.0 supported the <code>with</code> statement, but each statement can only use one context manager. Python 3.1 allows you to chain multiple context managers in a single <code>with</code> statement.
+</blockquote>
+
+<p>Let&#8217;s take the last part first.
+
+<pre class=pp><code>print('A')
+with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
+    print('B')
+print('C')</code></pre>
+
+<p>That&#8217;s a complicated <code>with</code> statement. Let me rewrite it as something more recognizable.
+
+<pre class=pp><code>with open('out.log', mode='w', encoding='utf-8') as a_file:
+    with RedirectStdoutTo(a_file):
+        print('B')</code></pre>
+
+<p>As the rewrite shows, you actually have <em>two</em> <code>with</code> statements, one nested within the scope of the other. The &#8220;outer&#8221; <code>with</code> statement should be familiar by now: it opens a <abbr>UTF-8</abbr>-encoded text file named <code>out.log</code> for writing and assigns the stream object to a variable named <var>a_file</var>. But that&#8217;s not the only thing odd here. 
+<pre class='nd pp'><code>with RedirectStdoutTo(a_file):</code></pre>
+
+<p>Where&#8217;s the <code>as</code> clause? The <code>with</code> statement doesn&#8217;t actually require one. Just like you can call a function and ignore its return value, you can have a <code>with</code> statement that doesn&#8217;t assign the <code>with</code> context to a variable. In this case, you&#8217;re only interested in the side effects of the <code>RedirectStdoutTo</code> context.
+
+<p>What are those side effects? Take a look inside the <code>RedirectStdoutTo</code> class. This class is a custom <a href=special-method-names.html#context-managers>context manager</a>. Any class can be a context manager by defining two <a href=iterators.html#a-fibonacci-iterator>special methods</a>: <code>__enter__()</code> and <code>__exit__()</code>.
+
+<pre class=pp><code>class RedirectStdoutTo:
+<a>    def __init__(self, out_new):    <span class=u>&#x2460;</span></a>
+        self.out_new = out_new
+
+<a>    def __enter__(self):            <span class=u>&#x2461;</span></a>
+        self.out_old = sys.stdout
+        sys.stdout = self.out_new
+
+<a>    def __exit__(self, *args):      <span class=u>&#x2462;</span></a>
+        sys.stdout = self.out_old</code></pre>
+<ol>
+<li>The <code>__init__()</code> method is called immediately after an instance is created. It takes one parameter, the stream object that you want to use as standard output for the life of the context. This method just saves the stream object in an instance variable so other methods can use it later.
+<li>The <code>__enter__()</code> method is a <a href=iterators.html#a-fibonacci-iterator>special method</a>; Python calls it when entering a context (<i>i.e.</i> at the beginning of the <code>with</code> statement). This method saves the current value of <code>sys.stdout</code> in <var>self.out_old</var>, then redirects standard output by assigning <var>self.out_new</var> to <var>sys.stdout</var>.
+<li>The <code>__exit__()</code> method is another special method; Python calls it when exiting the context (<i>i.e.</i> at the end of the <code>with</code> statement), even if the code block raised an exception. This method restores standard output to its original value by assigning the saved <var>self.out_old</var> value to <var>sys.stdout</var>.
+</ol>
+
+<p>Putting it all together:
+
+<pre class=pp><code>
+<a>print('A')                                                                             <span class=u>&#x2460;</span></a>
+<a>with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):  <span class=u>&#x2461;</span></a>
+<a>    print('B')                                                                         <span class=u>&#x2462;</span></a>
+<a>print('C')                                                                             <span class=u>&#x2463;</span></a></code></pre>
+<ol>
+<li>This will print to the <abbr>IDE</abbr> &#8220;Interactive Window&#8221; (or the terminal, if running the script from the command line).
+<li>This <a href=#with><code>with</code> statement</a> takes <em>a comma-separated list of contexts</em>. The comma-separated list acts like a series of nested <code>with</code> blocks. The first context listed is the &#8220;outer&#8221; block; the last one listed is the &#8220;inner&#8221; block. The first context opens a file; the second context redirects <code>sys.stdout</code> to the stream object that was created in the first context.
+<li>Because this <code>print()</code> function is executed with the context created by the <code>with</code> statement, it will not print to the screen; it will write to the file <code>out.log</code>.
+<li>The <code>with</code> code block is over. Python has told each context manager to do whatever it is they do upon exiting a context. The context managers form a last-in-first-out stack. Upon exiting, the second context changed <code>sys.stdout</code> back to its original value, then the first context closed the file named <code>out.log</code>. Since standard output has been restored to its original value, calling the <code>print()</code> function will once again print to the screen.
+</ol>
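+
+<p>If you are running Python 3.4 or later (an assumption; this chapter was written for Python 3.1), the standard library ships a ready-made context manager, <code>contextlib.redirect_stdout()</code>, that does the same job as the hand-written class. This sketch redirects into an in-memory stream instead of <code>out.log</code> so the result is easy to inspect.

```python
import contextlib
import io

buffer = io.StringIO()

print('A')                                # goes to the screen
with contextlib.redirect_stdout(buffer):
    print('B')                            # goes to the in-memory stream
print('C')                                # goes to the screen again

captured = buffer.getvalue()
print(repr(captured))
```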
+
+<p>Redirecting standard error works exactly the same way, using <code>sys.stderr</code> instead of <code>sys.stdout</code>.
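+
+<p>As a sketch, here is what that could look like. The class name <code>RedirectStderrTo</code> is hypothetical; it simply mirrors <code>RedirectStdoutTo</code> with <code>sys.stderr</code> swapped in.

```python
import io
import sys

class RedirectStderrTo:
    """Hypothetical counterpart of RedirectStdoutTo for standard error."""

    def __init__(self, err_new):
        self.err_new = err_new

    def __enter__(self):
        self.err_old = sys.stderr      # remember the current stream
        sys.stderr = self.err_new      # redirect

    def __exit__(self, *args):
        sys.stderr = self.err_old      # restore, even after an exception

buffer = io.StringIO()
original = sys.stderr

with RedirectStderrTo(buffer):
    sys.stderr.write('new black\n')

restored = sys.stderr is original
captured = buffer.getvalue()
print(repr(captured), restored)
```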
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+
+<ul>
+<li><a href=http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files>Reading and writing files</a> in the Python.org tutorial
+<li><a href=http://docs.python.org/3.1/library/io.html><code>io</code> module</a>
+<li><a href=http://docs.python.org/3.1/library/stdtypes.html#file-objects>Stream objects</a>
+<li><a href=http://docs.python.org/3.1/library/stdtypes.html#context-manager-types>Context manager types</a>
+<li><a href=http://docs.python.org/3.1/library/sys.html#sys.stdout><code>sys.stdout</code> and <code>sys.stderr</code></a>
+<li><a href=http://en.wikipedia.org/wiki/Filesystem_in_Userspace><abbr>FUSE</abbr> on Wikipedia</a>
+</ul>
+
+<p class=v><a href=refactoring.html rel=prev title='back to &#8220;Refactoring&#8221;'><span class=u>&#x261C;</span></a> <a href=xml.html rel=next title='onward to &#8220;XML&#8221;'><span class=u>&#x261E;</span></a>
+
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/generators.html b/generators.html
index 1f965fe..1b01e47 100755
--- a/generators.html
+++ b/generators.html
@@ -1,418 +1,418 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Closures &amp; Generators - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 6}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#generators>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
-<h1>Closures <i class=baa>&amp;</i> Generators</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> My spelling is Wobbly. It&#8217;s good spelling but it Wobbles, and the letters get in the wrong places. <span class=u>&#x275E;</span><br>&mdash; Winnie-the-Pooh
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>Having grown up the son of a librarian and an English major, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, &#8220;borrows&#8221; is the wrong word; &#8220;pillages&#8221; is more like it. Or perhaps &#8220;assimilates&#8221;&nbsp;&mdash;&nbsp;like the Borg. Yes, I like that.
-<p class=c><code>We are the Borg. Your linguistic and etymological distinctiveness will be added to our own. Resistance is futile.</code>
-<p>In this chapter, you&#8217;re going to learn about plural nouns. Also, functions that return other functions, advanced regular expressions, and generators. But first, let&#8217;s talk about how to make plural nouns. (If you haven&#8217;t read <a href=regular-expressions.html>the chapter on regular expressions</a>, now would be a good time. This chapter assumes you understand the basics of regular expressions, and it quickly descends into more advanced uses.)
-<p>If you grew up in an English-speaking country or learned English in a formal school setting, you&#8217;re probably familiar with the basic rules:
-<ul>
-<li>If a word ends in S, X, or Z, add ES. <i>Bass</i> becomes <i>basses</i>, <i>fax</i> becomes <i>faxes</i>, and <i>waltz</i> becomes <i>waltzes</i>.
-<li>If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What&#8217;s a noisy H? One that gets combined with other letters to make a sound that you can hear. So <i>coach</i> becomes <i>coaches</i> and <i>rash</i> becomes <i>rashes</i>, because you can hear the CH and SH sounds when you say them. But <i>cheetah</i> becomes <i>cheetahs</i>, because the H is silent.
-<li>If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S. So <i>vacancy</i> becomes <i>vacancies</i>, but <i>day</i> becomes <i>days</i>.
-<li>If all else fails, just add S and hope for the best.
-</ul>
-<p>(I know, there are a lot of exceptions. <i>Man</i> becomes <i>men</i> and <i>woman</i> becomes <i>women</i>, but <i>human</i> becomes <i>humans</i>. <i>Mouse</i> becomes <i>mice</i> and <i>louse</i> becomes <i>lice</i>, but <i>house</i> becomes <i>houses</i>. <i>Knife</i> becomes <i>knives</i> and <i>wife</i> becomes <i>wives</i>, but <i>lowlife</i> becomes <i>lowlifes</i>. And don&#8217;t even get me started on words that are their own plural, like <i>sheep</i>, <i>deer</i>, and <i>haiku</i>.)
-<p>Other languages, of course, are completely different.
-<p>Let&#8217;s design a Python library that automatically pluralizes English nouns. We&#8217;ll start with just these four rules, but keep in mind that you&#8217;ll inevitably need to add more.
-<p class=a>&#x2042;
-
-<h2 id=i-know>I Know, Let&#8217;s Use Regular Expressions!</h2>
-<p>So you&#8217;re looking at words, which, at least in English, means you&#8217;re looking at strings of characters. You have rules that say you need to find different combinations of characters, then do different things to them. This sounds like a job for regular expressions!
-<p class=d>[<a href=examples/plural1.py>download <code>plural1.py</code></a>]
-<pre class=pp><code>import re
-
-def plural(noun):          
-<a>    if re.search('[sxz]$', noun):             <span class=u>&#x2460;</span></a>
-<a>        return re.sub('$', 'es', noun)        <span class=u>&#x2461;</span></a>
-    elif re.search('[^aeioudgkprt]h$', noun):
-        return re.sub('$', 'es', noun)       
-    elif re.search('[^aeiou]y$', noun):      
-        return re.sub('y$', 'ies', noun)     
-    else:
-        return noun + 's'</code></pre>
-<ol>
-<li>This is a regular expression, but it uses a syntax you didn&#8217;t see in <a href=regular-expressions.html><i>Regular Expressions</i></a>. The square brackets mean &#8220;match exactly one of these characters.&#8221; So <code>[sxz]</code> means &#8220;<code>s</code>, or <code>x</code>, or <code>z</code>&#8221;, but only one of them. The <code>$</code> should be familiar; it matches the end of string. Combined, this regular expression tests whether <var>noun</var> ends with <code>s</code>, <code>x</code>, or <code>z</code>.
-<li>This <code>re.sub()</code> function performs regular expression-based string substitutions.
-</ol>
-
-<p>Let&#8217;s look at regular expression substitutions in more detail.
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import re</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>re.search('[abc]', 'Mark')</kbd>    <span class=u>&#x2460;</span></a>
-&lt;_sre.SRE_Match object at 0x001C1FA8>
-<a><samp class=p>>>> </samp><kbd class=pp>re.sub('[abc]', 'o', 'Mark')</kbd>  <span class=u>&#x2461;</span></a>
-<samp class=pp>'Mork'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>re.sub('[abc]', 'o', 'rock')</kbd>  <span class=u>&#x2462;</span></a>
-<samp class=pp>'rook'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>re.sub('[abc]', 'o', 'caps')</kbd>  <span class=u>&#x2463;</span></a>
-<samp class=pp>'oops'</samp></pre>
-<ol>
-<li>Does the string <code>Mark</code> contain <code>a</code>, <code>b</code>, or <code>c</code>? Yes, it contains <code>a</code>.
-<li>OK, now find <code>a</code>, <code>b</code>, or <code>c</code>, and replace it with <code>o</code>. <code>Mark</code> becomes <code>Mork</code>.
-<li>The same function turns <code>rock</code> into <code>rook</code>.
-<li>You might think this would turn <code>caps</code> into <code>oaps</code>, but it doesn&#8217;t. <code>re.sub</code> replaces <em>all</em> of the matches, not just the first one. So this regular expression turns <code>caps</code> into <code>oops</code>, because both the <code>c</code> and the <code>a</code> get turned into <code>o</code>.
-</ol>
-
-<p>And now, back to the <code>plural()</code> function&hellip;
-
-<pre class=pp><code>def plural(noun):          
-    if re.search('[sxz]$', noun):            
-<a>        return re.sub('$', 'es', noun)         <span class=u>&#x2460;</span></a>
-<a>    elif re.search('[^aeioudgkprt]h$', noun):  <span class=u>&#x2461;</span></a>
-        return re.sub('$', 'es', noun)
-<a>    elif re.search('[^aeiou]y$', noun):        <span class=u>&#x2462;</span></a>
-        return re.sub('y$', 'ies', noun)     
-    else:
-        return noun + 's'</code></pre>
-<ol>
-<li>Here, you&#8217;re replacing the end of the string (matched by <code>$</code>) with the string <code>es</code>. In other words, adding <code>es</code> to the string. You could accomplish the same thing with string concatenation, for example <code>noun + 'es'</code>, but I chose to use regular expressions for each rule, for reasons that will become clear later in the chapter.
-<li>Look closely, this is another new variation. The <code>^</code> as the first character inside the square brackets means something special: negation. <code>[^abc]</code> means &#8220;any single character <em>except</em> <code>a</code>, <code>b</code>, or <code>c</code>&#8221;. So <code>[^aeioudgkprt]</code> means any character except <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, <code>u</code>, <code>d</code>, <code>g</code>, <code>k</code>, <code>p</code>, <code>r</code>, or <code>t</code>. Then that character needs to be followed by <code>h</code>, followed by end of string. You&#8217;re looking for words that end in H where the H can be heard.
-<li>Same pattern here: match words that end in Y, where the character before the Y is <em>not</em> <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>. You&#8217;re looking for words that end in Y that sounds like I.
-</ol>
-
-<p>Let&#8217;s look at negation regular expressions in more detail.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import re</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'vacancy')</kbd>  <span class=u>&#x2460;</span></a>
-&lt;_sre.SRE_Match object at 0x001C1FA8>
-<a><samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'boy')</kbd>      <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp>
-<samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'day')</kbd>
-<samp class=p>>>> </samp>
-<a><samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'pita')</kbd>     <span class=u>&#x2462;</span></a>
-<samp class=p>>>> </samp></pre>
-<ol>
-<li><code>vacancy</code> matches this regular expression, because it ends in <code>cy</code>, and <code>c</code> is not <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>.
-<li><code>boy</code> does not match, because it ends in <code>oy</code>, and you specifically said that the character before the <code>y</code> could not be <code>o</code>. <code>day</code> does not match, because it ends in <code>ay</code>.
-<li><code>pita</code> does not match, because it does not end in <code>y</code>.
-</ol>
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>re.sub('y$', 'ies', 'vacancy')</kbd>               <span class=u>&#x2460;</span></a>
-<samp class=pp>'vacancies'</samp>
-<samp class=p>>>> </samp><kbd class=pp>re.sub('y$', 'ies', 'agency')</kbd>
-<samp class=pp>'agencies'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>re.sub('([^aeiou])y$', r'\1ies', 'vacancy')</kbd>  <span class=u>&#x2461;</span></a>
-<samp class=pp>'vacancies'</samp></pre>
-<ol>
-<li>This regular expression turns <code>vacancy</code> into <code>vacancies</code> and <code>agency</code> into <code>agencies</code>, which is what you wanted. Note that it would also turn <code>boy</code> into <code>boies</code>, but that will never happen in the function because you did that <code>re.search</code> first to find out whether you should do this <code>re.sub</code>.
-<li>Just in passing, I want to point out that it is possible to combine these two regular expressions (one to find out if the rule applies, and another to actually apply it) into a single regular expression. Here&#8217;s what that would look like. Most of it should look familiar: you&#8217;re using a remembered group, which you learned in <a href=regular-expressions.html#phonenumbers>Case study: Parsing Phone Numbers</a>. The group is used to remember the character before the letter <code>y</code>. Then in the substitution string, you use a new syntax, <code>\1</code>, which means &#8220;hey, that first group you remembered? put it right here.&#8221; In this case, you remember the <code>c</code> before the <code>y</code>; when you do the substitution, you substitute <code>c</code> in place of <code>c</code>, and <code>ies</code> in place of <code>y</code>. (If you have more than one remembered group, you can use <code>\2</code> and <code>\3</code> and so on.)
-</ol>
-<p>Regular expression substitutions are extremely powerful, and the <code>\1</code> syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder to read, and it doesn&#8217;t directly map to the way you first described the pluralizing rules. You originally laid out rules like &#8220;if the word ends in S, X, or Z, then add ES&#8221;. If you look at this function, you have two lines of code that say &#8220;if the word ends in S, X, or Z, then add ES&#8221;. It doesn&#8217;t get much more direct than that.
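-
-<p>As a minimal sketch of the combined form described above, a single <code>re.sub()</code> call can both test and apply the rule. The <code>plural_y()</code> name is made up for this illustration and is not part of the chapter&#8217;s example files. When the pattern does not match, <code>re.sub()</code> returns the string unchanged:
-
```python
import re

def plural_y(noun):
    # Hypothetical helper: one regular expression both checks for a
    # non-vowel character before the final "y" and performs the replacement.
    # \1 re-inserts whatever character the first group remembered.
    return re.sub('([^aeiou])y$', r'\1ies', noun)

print(plural_y('vacancy'))  # the "c" is remembered and put back
print(plural_y('boy'))      # no match, so the word comes back as-is
```
-
-<p>One consequence of this form is that the caller can no longer tell &#8220;the rule applied&#8221; from &#8220;the rule did not apply&#8221; without comparing the result to the input, which is part of why the chapter keeps the two steps separate.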
-
-<p class=a>&#x2042;
-
-<h2 id=a-list-of-functions>A List Of Functions</h2>
-
-<p>Now you&#8217;re going to add a level of abstraction. You started by defining a list of rules: if this, do that, otherwise go to the next rule. Let&#8217;s temporarily complicate part of the program so you can simplify another part.
-
-<p class=d>[<a href=examples/plural2.py>download <code>plural2.py</code></a>]
-<pre class=pp><code>import re
-
-def match_sxz(noun):
-    return re.search('[sxz]$', noun)
-
-def apply_sxz(noun):
-    return re.sub('$', 'es', noun)
-
-def match_h(noun):
-    return re.search('[^aeioudgkprt]h$', noun)
-
-def apply_h(noun):
-    return re.sub('$', 'es', noun)
-
-<a>def match_y(noun):                             <span class=u>&#x2460;</span></a>
-    return re.search('[^aeiou]y$', noun)
-        
-<a>def apply_y(noun):                             <span class=u>&#x2461;</span></a>
-    return re.sub('y$', 'ies', noun)
-
-def match_default(noun):
-    return True
-
-def apply_default(noun):
-    return noun + 's'
-
-<a>rules = ((match_sxz, apply_sxz),               <span class=u>&#x2462;</span></a>
-         (match_h, apply_h),
-         (match_y, apply_y),
-         (match_default, apply_default)
-         )
-
-def plural(noun):           
-<a>    for matches_rule, apply_rule in rules:       <span class=u>&#x2463;</span></a>
-        if matches_rule(noun):
-            return apply_rule(noun)</code></pre>
-<ol>
-<li>Now, each match rule is its own function which returns the results of calling the <code>re.search()</code> function.
-<li>Each apply rule is also its own function which calls the <code>re.sub()</code> function to apply the appropriate pluralization rule.
-<li>Instead of having one function (<code>plural()</code>) with multiple rules, you have the <code>rules</code> data structure, which is a sequence of pairs of functions.
-<li>Since the rules have been broken out into a separate data structure, the new <code>plural()</code> function can be reduced to a few lines of code. Using a <code>for</code> loop, you can pull out the match and apply rules two at a time (one match, one apply) from the <var>rules</var> structure. On the first iteration of the <code>for</code> loop, <var>matches_rule</var> will get <code>match_sxz</code>, and <var>apply_rule</var> will get <code>apply_sxz</code>. On the second iteration (assuming you get that far), <var>matches_rule</var> will be assigned <code>match_h</code>, and <var>apply_rule</var> will be assigned <code>apply_h</code>. The function is guaranteed to return something eventually, because the final match rule (<code>match_default</code>) simply returns <code>True</code>, meaning the corresponding apply rule (<code>apply_default</code>) will always be applied.
-</ol>
-
-<aside>The &#8220;rules&#8221; variable is a sequence of pairs of functions.</aside>
-<p>The reason this technique works is that <a href=your-first-python-program.html#everythingisanobject>everything in Python is an object</a>, including functions. The <var>rules</var> data structure contains functions&nbsp;&mdash;&nbsp;not names of functions, but actual function objects. When they get assigned in the <code>for</code> loop, then <var>matches_rule</var> and <var>apply_rule</var> are actual functions that you can call. On the first iteration of the <code>for</code> loop, this is equivalent to calling <code>match_sxz(noun)</code>, and if it returns a match, calling <code>apply_sxz(noun)</code>.
-
-<p>If this additional level of abstraction is confusing, try unrolling the function to see the equivalence. The entire <code>for</code> loop is equivalent to the following:
-
-<pre class='nd pp'><code>
-def plural(noun):
-    if match_sxz(noun):
-        return apply_sxz(noun)
-    if match_h(noun):
-        return apply_h(noun)
-    if match_y(noun):
-        return apply_y(noun)
-    if match_default(noun):
-        return apply_default(noun)</code></pre>
-
-<p>The benefit here is that the <code>plural()</code> function is now simplified. It takes a sequence of rules, defined elsewhere, and iterates through them in a generic fashion.
-
-<ol>
-<li>Get a match rule
-<li>Does it match? Then call the apply rule and return the result.
-<li>No match? Go to step 1.
-</ol>
-
-<p>The rules could be defined anywhere, in any way. The <code>plural()</code> function doesn&#8217;t care.
-
-<p>Now, was adding this level of abstraction worth it? Well, not yet. Let&#8217;s consider what it would take to add a new rule to the function. In the first example, it would require adding an <code>if</code> statement to the <code>plural()</code> function. In this second example, it would require adding two functions, <code>match_foo()</code> and <code>apply_foo()</code>, and then updating the <var>rules</var> sequence to specify where in the order the new match and apply functions should be called relative to the other rules.
-
-<p>But this is really just a stepping stone to the next section. Let&#8217;s move on&hellip;
-
-<p class=a>&#x2042;
-
-<h2 id=a-list-of-patterns>A List Of Patterns</h2>
-
-<p>Defining separate named functions for each match and apply rule isn&#8217;t really necessary. You never call them directly; you add them to the <var>rules</var> sequence and call them through there. Furthermore, each function follows one of two patterns. All the match functions call <code>re.search()</code>, and all the apply functions call <code>re.sub()</code>. Let&#8217;s factor out the patterns so that defining new rules can be easier.
-
-<p class=d>[<a href=examples/plural3.py>download <code>plural3.py</code></a>]
-<pre class=pp><code>import re
-
-def build_match_and_apply_functions(pattern, search, replace):
-<a>    def matches_rule(word):                                     <span class=u>&#x2460;</span></a>
-        return re.search(pattern, word)
-<a>    def apply_rule(word):                                       <span class=u>&#x2461;</span></a>
-        return re.sub(search, replace, word)
-<a>    return (matches_rule, apply_rule)                           <span class=u>&#x2462;</span></a></code></pre>
-<ol>
-<li><code>build_match_and_apply_functions()</code> is a function that builds other functions dynamically. It takes <var>pattern</var>, <var>search</var> and <var>replace</var>, then defines a <code>matches_rule()</code> function which calls <code>re.search()</code> with the <var>pattern</var> that was passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>matches_rule()</code> function you&#8217;re building. Whoa.
-<li>Building the apply function works the same way. The apply function is a function that takes one parameter, and calls <code>re.sub()</code> with the <var>search</var> and <var>replace</var> parameters that were passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>apply_rule()</code> function you&#8217;re building. A dynamically built function that uses the values of outside parameters this way is called a <em>closure</em>. You&#8217;re essentially defining constants within the apply function you&#8217;re building: it takes one parameter (<var>word</var>), but it then acts on that plus two other values (<var>search</var> and <var>replace</var>) which were set when you defined the apply function.
-<li>Finally, the <code>build_match_and_apply_functions()</code> function returns a tuple of two values: the two functions you just created. The constants you defined within those functions (<var>pattern</var> within the <code>matches_rule()</code> function, and <var>search</var> and <var>replace</var> within the <code>apply_rule()</code> function) stay with those functions, even after you return from <code>build_match_and_apply_functions()</code>. That&#8217;s insanely cool.
-</ol>
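-
-<p>The following sketch shows the closure behavior in isolation. It reuses the <code>build_match_and_apply_functions()</code> function exactly as defined above; the variable names on the left-hand side are borrowed from the previous section purely for illustration. Each pair of functions keeps its own copy of the arguments it was built with, even though the builder has long since returned.
-
```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

# Build two independent pairs; each pair closes over its own arguments.
match_sxz, apply_sxz = build_match_and_apply_functions('[sxz]$', '$', 'es')
match_y, apply_y = build_match_and_apply_functions('[^aeiou]y$', 'y$', 'ies')

# The builder has already returned, yet each inner function still sees
# the pattern it was built with.
print(bool(match_sxz('fax')), apply_sxz('fax'))
print(bool(match_y('fax')), bool(match_y('vacancy')), apply_y('vacancy'))

# The remembered variable names are visible through introspection.
print(match_sxz.__code__.co_freevars)
```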
-
-<p>If this is incredibly confusing (and it should be, this is weird stuff), it may become clearer when you see how to use it.
-
-<pre class=pp><code><a>patterns = \                                                        <span class=u>&#x2460;</span></a>
-  (
-    ('[sxz]$',           '$',  'es'),
-    ('[^aeioudgkprt]h$', '$',  'es'),
-    ('(qu|[^aeiou])y$',  'y$', 'ies'),
-<a>    ('$',                '$',  's')                                 <span class=u>&#x2461;</span></a>
-  )
-<a>rules = [build_match_and_apply_functions(pattern, search, replace)  <span class=u>&#x2462;</span></a>
-         for (pattern, search, replace) in patterns]</code></pre>
-<ol>
-<li>Our pluralization &#8220;rules&#8221; are now defined as a tuple of tuples of <em>strings</em> (not functions). The first string in each group is the regular expression pattern that you would use in <code>re.search()</code> to see if this rule matches. The second and third strings in each group are the search and replace expressions you would use in <code>re.sub()</code> to actually apply the rule to turn a noun into its plural.
-<li>There&#8217;s a slight change here, in the fallback rule. In the previous example, the <code>match_default()</code> function simply returned <code>True</code>, meaning that if none of the more specific rules matched, the code would simply add an <code>s</code> to the end of the given word. This example does something functionally equivalent. The final regular expression asks whether the word has an end (<code>$</code> matches the end of a string). Of course, every string has an end, even an empty string, so this expression always matches. Thus, it serves the same purpose as the <code>match_default()</code> function that always returned <code>True</code>: it ensures that if no more specific rule matches, the code adds an <code>s</code> to the end of the given word.
-<li>This line is magic. It takes the sequence of strings in <var>patterns</var> and turns them into a sequence of functions. How? By &#8220;mapping&#8221; the strings to the <code>build_match_and_apply_functions()</code> function. That is, it takes each triplet of strings and calls the <code>build_match_and_apply_functions()</code> function with those three strings as arguments. The <code>build_match_and_apply_functions()</code> function returns a tuple of two functions. This means that <var>rules</var> ends up being functionally equivalent to the previous example: a list of tuples, where each tuple is a pair of functions. The first function is the match function that calls <code>re.search()</code>, and the second function is the apply function that calls <code>re.sub()</code>.
-</ol>
-
-<p>Rounding out this version of the script is the main entry point, the <code>plural()</code> function.
-
-<pre class=pp><code>def plural(noun):
-<a>    for matches_rule, apply_rule in rules:  <span class=u>&#x2460;</span></a>
-        if matches_rule(noun):
-            return apply_rule(noun)</code></pre>
-<ol>
-<li>Since the <var>rules</var> list is the same as the previous example (really, it is), it should come as no surprise that the <code>plural()</code> function hasn&#8217;t changed at all. It&#8217;s completely generic; it takes a list of rule functions and calls them in order. It doesn&#8217;t care how the rules are defined. In the previous example, they were defined as separate named functions. Now they are built dynamically by calling the <code>build_match_and_apply_functions()</code> function on each triplet of plain strings. It doesn&#8217;t matter; the <code>plural()</code> function still works the same way.
-</ol>
-
-<p class=a>&#x2042;
-
-<h2 id=a-file-of-patterns>A File Of Patterns</h2>
-
-<p>You&#8217;ve factored out all the duplicate code and added enough abstractions so that the pluralization rules are defined in a list of strings. The next logical step is to take these strings and put them in a separate file, where they can be maintained separately from the code that uses them.
-
-<p>First, let&#8217;s create a text file that contains the rules you want. No fancy data structures, just whitespace-delimited strings in three columns. Let&#8217;s call it <code>plural4-rules.txt</code>.
-
-<p class=d>[<a href=examples/plural4-rules.txt>download <code>plural4-rules.txt</code></a>]
-<pre class='nd pp'><code>[sxz]$               $    es
-[^aeioudgkprt]h$     $    es
-[^aeiou]y$          y$    ies
-$                    $    s</code></pre>
-
-<p>Now let&#8217;s see how you can use this rules file.
-
-<p class=d>[<a href=examples/plural4.py>download <code>plural4.py</code></a>]
-<pre class=pp><code>import re
-
-<a>def build_match_and_apply_functions(pattern, search, replace):  <span class=u>&#x2460;</span></a>
-    def matches_rule(word):
-        return re.search(pattern, word)
-    def apply_rule(word):
-        return re.sub(search, replace, word)
-    return (matches_rule, apply_rule)
-
-rules = []
-<a>with open('plural4-rules.txt', encoding='utf-8') as pattern_file:  <span class=u>&#x2461;</span></a>
-<a>    for line in pattern_file:                                      <span class=u>&#x2462;</span></a>
-<a>        pattern, search, replace = line.split(None, 3)             <span class=u>&#x2463;</span></a>
-<a>        rules.append(build_match_and_apply_functions(              <span class=u>&#x2464;</span></a>
-                pattern, search, replace))</code></pre>
-<ol>
-<li>The <code>build_match_and_apply_functions()</code> function has not changed. You&#8217;re still using closures to build two functions dynamically that use variables defined in the outer function.
-<li>The global <code>open()</code> function opens a file and returns a file object. In this case, the file we&#8217;re opening contains the pattern strings for pluralizing nouns. The <code>with</code> statement creates what&#8217;s called a <i>context</i>: when the <code>with</code> block ends, Python will automatically close the file, even if an exception is raised inside the <code>with</code> block. You&#8217;ll learn more about <code>with</code> blocks and file objects in the <a href=files.html>Files</a> chapter.
-<li>The <code>for line in &lt;fileobject></code> idiom reads data from the open file, one line at a time, and assigns the text to the <var>line</var> variable. You&#8217;ll learn more about reading from files in the <a href=files.html>Files</a> chapter.
-<li>Each line in the file really has three values, but they&#8217;re separated by whitespace (tabs or spaces, it makes no difference). To split it out, use the <code>split()</code> string method. The first argument to the <code>split()</code> method is <code>None</code>, which means &#8220;split on any run of whitespace.&#8221; The second argument is <code>3</code>, which means &#8220;split on whitespace at most 3 times, then leave the rest of the line alone.&#8221; A line like <code>[sxz]$ $ es</code> will be broken up into the list <code>['[sxz]$', '$', 'es']</code>, which means that <var>pattern</var> will get <code>'[sxz]$'</code>, <var>search</var> will get <code>'$'</code>, and <var>replace</var> will get <code>'es'</code>. That&#8217;s a lot of power in one little line of code.
-<li>Finally, you pass <code>pattern</code>, <code>search</code>, and <code>replace</code> to the <code>build_match_and_apply_functions()</code> function, which returns a tuple of functions. You append this tuple to the <var>rules</var> list, and <var>rules</var> ends up storing the list of match and apply functions that the <code>plural()</code> function expects.
-</ol>
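-
-<p>To see the <code>split()</code> step on its own, here is a small sketch that parses a few lines in the same three-column format. It uses <code>io.StringIO</code> as a stand-in for the open rules file so that it runs without any files on disk. The chapter&#8217;s rules file contains no blank lines; the length check that skips them is an assumption added here, since a blank line splits into an empty list and the three-way unpacking would otherwise raise <code>ValueError</code>.
-
```python
import io

# io.StringIO stands in for the open rules file in this sketch.
sample = io.StringIO(
    "[sxz]$               $    es\n"
    "\n"
    "[^aeiou]y$          y$    ies\n"
)

parsed = []
for line in sample:
    fields = line.split(None, 3)
    if len(fields) != 3:
        # Assumption: skip blank or malformed lines rather than let the
        # three-way unpacking raise ValueError.
        continue
    pattern, search, replace = fields
    parsed.append((pattern, search, replace))

print(parsed)
```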
-
-<p>The improvement here is that you&#8217;ve completely separated the pluralization rules into an external file, so it can be maintained separately from the code that uses it. Code is code, data is data, and life is good.
-
-<p class=a>&#x2042;
-
-<h2 id=generators>Generators</h2>
-
-<p>Wouldn&#8217;t it be grand to have a generic <code>plural()</code> function that parses the rules file? Get rules, check for a match, apply appropriate transformation, go to next rule. That&#8217;s all the <code>plural()</code> function has to do, and that&#8217;s all the <code>plural()</code> function should do.
-
-<p class=d>[<a href=examples/plural5.py>download <code>plural5.py</code></a>]
-<pre class='nd pp'><code>def rules(rules_filename):
-    with open(rules_filename, encoding='utf-8') as pattern_file:
-        for line in pattern_file:
-            pattern, search, replace = line.split(None, 3)
-            yield build_match_and_apply_functions(pattern, search, replace)
-
-def plural(noun, rules_filename='plural5-rules.txt'):
-    for matches_rule, apply_rule in rules(rules_filename):
-        if matches_rule(noun):
-            return apply_rule(noun)
-    raise ValueError('no matching rule for {0}'.format(noun))</code></pre>
-
-<p>How the heck does <em>that</em> work? Let&#8217;s look at an interactive example first.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>def make_counter(x):</kbd>
-<samp class=p>... </samp><kbd class=pp>    print('entering make_counter')</kbd>
-<samp class=p>... </samp><kbd class=pp>    while True:</kbd>
-<a><samp class=p>... </samp><kbd class=pp>        yield x</kbd>                    <span class=u>&#x2460;</span></a>
-<samp class=p>... </samp><kbd class=pp>        print('incrementing x')</kbd>
-<samp class=p>... </samp><kbd class=pp>        x = x + 1</kbd>
-<samp class=p>... </samp>
-<a><samp class=p>>>> </samp><kbd class=pp>counter = make_counter(2)</kbd>          <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>counter</kbd>                            <span class=u>&#x2462;</span></a>
-&lt;generator object at 0x001C9C10>
-<a><samp class=p>>>> </samp><kbd class=pp>next(counter)</kbd>                      <span class=u>&#x2463;</span></a>
-<samp>entering make_counter
-2</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>next(counter)</kbd>                      <span class=u>&#x2464;</span></a>
-<samp>incrementing x
-3</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>next(counter)</kbd>                      <span class=u>&#x2465;</span></a>
-<samp>incrementing x
-4</samp></pre>
-<ol>
-<li>The presence of the <code>yield</code> keyword in <code>make_counter</code> means that this is not a normal function. It is a special kind of function which generates values one at a time. You can think of it as a resumable function. Calling it will return a <i>generator</i> that can be used to generate successive values of <var>x</var>.
-<li>To create an instance of the <code>make_counter</code> generator, just call it like any other function. Note that this does not actually execute the function code. You can tell this because the first line of the <code>make_counter()</code> function calls <code>print()</code>, but nothing has been printed yet.
-<li>The <code>make_counter()</code> function returns a generator object.
-<li>The <code>next()</code> function takes a generator object and returns its next value. The first time you call <code>next()</code> with the <var>counter</var> generator, it executes the code in <code>make_counter()</code> up to the first <code>yield</code> statement, then returns the value that was yielded. In this case, that will be <code>2</code>, because you originally created the generator by calling <code>make_counter(2)</code>.
-<li>Repeatedly calling <code>next()</code> with the same generator object resumes exactly where it left off and continues until it hits the next <code>yield</code> statement. All variables, local state, <i class=baa>&amp;</i>c. are saved on <code>yield</code> and restored on <code>next()</code>. The next line of code waiting to be executed calls <code>print()</code>, which prints <samp>incrementing x</samp>. After that, it executes the statement <code>x = x + 1</code>. Then it loops through the <code>while</code> loop again, and the first thing it hits is the statement <code>yield x</code>, which saves the state of everything and returns the current value of <var>x</var> (now <code>3</code>).
-<li>The third time you call <code>next(counter)</code>, you do all the same things again, but this time <var>x</var> is now <code>4</code>.
-</ol>
-
-<p>Since <code>make_counter</code> sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing <var>x</var> and spitting out values. But let&#8217;s look at more productive uses of generators instead.
-
-<h3 id=a-fibonacci-generator>A Fibonacci Generator</h3>
-
-<aside>&#8220;yield&#8221; pauses a function. &#8220;next()&#8221; resumes where it left off.</aside>
-
-<p class=d>[<a href=examples/fibonacci.py>download <code>fibonacci.py</code></a>]
-<pre class=pp><code>def fib(max):
-<a>    a, b = 0, 1          <span class=u>&#x2460;</span></a>
-    while a &lt; max:
-<a>        yield a          <span class=u>&#x2461;</span></a>
-<a>        a, b = b, a + b  <span class=u>&#x2462;</span></a></code></pre>
-<ol>
-<li>The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with <code>0</code> and <code>1</code>, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: <var>a</var> starts at <code>0</code>, and <var>b</var> starts at <code>1</code>.
-<li><var>a</var> is the current number in the sequence, so yield it.
-<li><var>b</var> is the next number in the sequence, so assign that to <var>a</var>, but also calculate the next value (<code>a + b</code>) and assign that to <var>b</var> for later use. Note that this happens in parallel; if <var>a</var> is <code>3</code> and <var>b</var> is <code>5</code>, then <code>a, b = b, a + b</code> will set <var>a</var> to <code>5</code> (the previous value of <var>b</var>) and <var>b</var> to <code>8</code> (the sum of the previous values of <var>a</var> and <var>b</var>).
-</ol>
-
-<p>So you have a function that spits out successive Fibonacci numbers. Sure, you could do that with recursion, but this way is easier to read. Also, it works well with <code>for</code> loops.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>from fibonacci import fib</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>for n in fib(1000):</kbd>      <span class=u>&#x2460;</span></a>
-<a><samp class=p>... </samp><kbd class=pp>    print(n, end=' ')</kbd>    <span class=u>&#x2461;</span></a>
-<samp class=pp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>list(fib(1000))</kbd>          <span class=u>&#x2462;</span></a>
-<samp class=pp>[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]</samp></pre>
-<ol>
-<li>You can use a generator like <code>fib()</code> in a <code>for</code> loop directly. The <code>for</code> loop will automatically call the <code>next()</code> function to get values from the <code>fib()</code> generator and assign them to the <code>for</code> loop index variable (<var>n</var>).
-<li>Each time through the <code>for</code> loop, <var>n</var> gets a new value from the <code>yield</code> statement in <code>fib()</code>, and all you have to do is print it out. Once <code>fib()</code> runs out of numbers (<var>a</var> is no longer less than <var>max</var>, which in this case is <code>1000</code>), then the <code>for</code> loop exits gracefully.
-<li>This is a useful idiom: pass a generator to the <code>list()</code> function, and it will iterate through the entire generator (just like the <code>for</code> loop in the previous example) and return a list of all the values.
-</ol>
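-
-<p>As a rough sketch of what the <code>for</code> loop is doing on your behalf, the loop below drives the same <code>fib()</code> generator by hand. The detail that a finished generator signals the end by raising <code>StopIteration</code> is standard Python behavior rather than something shown in the example above; the smaller limit of <code>20</code> is chosen only to keep the output short.
-
```python
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

# Roughly what "for n in fib(20)" does behind the scenes.
values = []
gen = fib(20)
while True:
    try:
        n = next(gen)
    except StopIteration:
        # The generator function returned, so there are no more values.
        break
    values.append(n)

print(values)
```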
-
-<h3 id=a-plural-rule-generator>A Plural Rule Generator</h3>
-
-<p>Let&#8217;s go back to <code>plural5.py</code> and see how this version of the <code>plural()</code> function works.
-
-<pre class=pp><code>def rules(rules_filename):
-    with open(rules_filename, encoding='utf-8') as pattern_file:
-        for line in pattern_file:
-<a>            pattern, search, replace = line.split(None, 3)                   <span class=u>&#x2460;</span></a>
-<a>            yield build_match_and_apply_functions(pattern, search, replace)  <span class=u>&#x2461;</span></a>
-
-def plural(noun, rules_filename='plural5-rules.txt'):
-<a>    for matches_rule, apply_rule in rules(rules_filename):                   <span class=u>&#x2462;</span></a>
-        if matches_rule(noun):
-            return apply_rule(noun)
-    raise ValueError('no matching rule for {0}'.format(noun))</code></pre>
-<ol>
-<li>No magic here. Remember that the lines of the rules file have three values separated by whitespace, so you use <code>line.split(None, 3)</code> to get the three &#8220;columns&#8221; and assign them to three local variables.
-<li><em>And then you yield.</em> What do you yield? Two functions, built dynamically with your old friend, <code>build_match_and_apply_functions()</code>, which is identical to the previous examples. In other words, <code>rules()</code> is a generator that spits out match and apply functions <em>on demand</em>.
-<li>Since <code>rules()</code> is a generator, you can use it directly in a <code>for</code> loop. The first time through the <code>for</code> loop, you will call the <code>rules()</code> function, which will open the pattern file, read the first line, dynamically build a match function and an apply function from the patterns on that line, and yield the dynamically built functions. The second time through the <code>for</code> loop, you will pick up exactly where you left off in <code>rules()</code> (which was in the middle of the <code>for line in pattern_file</code> loop). The first thing it will do is read the next line of the file (which is still open), dynamically build another match and apply function based on the patterns on that line in the file, and yield the two functions.
-</ol>
-
-<p>What have you gained over stage 4? Startup time. In stage 4, when you imported the <code>plural4</code> module, it read the entire patterns file and built a list of all the possible rules, before you could even think about calling the <code>plural()</code> function. With generators, you can do everything lazily: you read the first rule and create functions and try them, and if that works you don&#8217;t ever read the rest of the file or create any other functions.
-
-<p>What have you lost? Performance! Every time you call the <code>plural()</code> function, the <code>rules()</code> generator starts over from the beginning&nbsp;&mdash;&nbsp;which means re-opening the patterns file and reading from the beginning, one line at a time.
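-
-<p>Both effects can be observed with a small instrumented sketch. It is not the chapter&#8217;s <code>plural5.py</code>: the rules live in a string read through <code>io.StringIO</code> instead of a file, <code>rules()</code> takes the file object directly, and the <code>lines_read</code> list is a hypothetical addition that records every line the generator actually consumes.
-
```python
import io
import re

RULES_TEXT = (
    "[sxz]$               $    es\n"
    "[^aeioudgkprt]h$     $    es\n"
    "[^aeiou]y$          y$    ies\n"
    "$                    $    s\n"
)

lines_read = []  # hypothetical instrumentation: every line consumed so far

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

def rules(pattern_file):
    for line in pattern_file:
        lines_read.append(line)
        pattern, search, replace = line.split(None, 3)
        yield build_match_and_apply_functions(pattern, search, replace)

def plural(noun):
    # A fresh StringIO per call plays the role of re-opening the file.
    for matches_rule, apply_rule in rules(io.StringIO(RULES_TEXT)):
        if matches_rule(noun):
            return apply_rule(noun)
    raise ValueError('no matching rule for {0}'.format(noun))

first = plural('fax')          # matches the first rule
after_first = len(lines_read)
second = plural('day')         # falls through to the last rule
after_second = len(lines_read)

print(first, after_first)
print(second, after_second)
```
-
-<p>The first call stops after reading a single line, which is the lazy behavior. The second call starts again from the top of the rules and reads all four lines, including the one already parsed during the first call, which is the repeated work.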
-
-<p>What if you could have the best of both worlds: minimal startup cost (don&#8217;t execute any code on <code>import</code>), <em>and</em> maximum performance (don&#8217;t build the same functions over and over again)? Oh, and you still want to keep the rules in a separate file (because code is code and data is data), just as long as you never have to read the same line twice.
-
-<p>To do that, you&#8217;ll need to build your own iterator. But before you do <em>that</em>, you need to learn about Python classes.
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-<ul>
-<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
-<li><a href=http://effbot.org/zone/python-with-statement.htm>Understanding Python&#8217;s &#8220;with&#8221; statement</a>
-<li><a href=http://ynniv.com/blog/2007/08/closures-in-python.html>Closures in Python</a>
-<li><a href=http://en.wikipedia.org/wiki/Fibonacci_number>Fibonacci numbers</a>
-<li><a href=http://www2.gsu.edu/~wwwesl/egw/crump.htm>English Irregular Plural Nouns</a>
-</ul>
-
-<p class=v><a href=regular-expressions.html rel=prev title='back to &#8220;Regular Expressions&#8221;'><span class=u>&#x261C;</span></a> <a href=iterators.html rel=next title='onward to &#8220;Classes &amp; Iterators&#8221;'><span class=u>&#x261E;</span></a>
-
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Closures &amp; Generators - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 6}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#generators>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
+<h1>Closures <i class=baa>&amp;</i> Generators</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> My spelling is Wobbly. It&#8217;s good spelling but it Wobbles, and the letters get in the wrong places. <span class=u>&#x275E;</span><br>&mdash; Winnie-the-Pooh
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>Having grown up the son of a librarian and an English major, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, &#8220;borrows&#8221; is the wrong word; &#8220;pillages&#8221; is more like it. Or perhaps &#8220;assimilates&#8221;&nbsp;&mdash;&nbsp;like the Borg. Yes, I like that.
+<p class=c><code>We are the Borg. Your linguistic and etymological distinctiveness will be added to our own. Resistance is futile.</code>
+<p>In this chapter, you&#8217;re going to learn about plural nouns. Also, functions that return other functions, advanced regular expressions, and generators. But first, let&#8217;s talk about how to make plural nouns. (If you haven&#8217;t read <a href=regular-expressions.html>the chapter on regular expressions</a>, now would be a good time. This chapter assumes you understand the basics of regular expressions, and it quickly descends into more advanced uses.)
+<p>If you grew up in an English-speaking country or learned English in a formal school setting, you&#8217;re probably familiar with the basic rules:
+<ul>
+<li>If a word ends in S, X, or Z, add ES. <i>Bass</i> becomes <i>basses</i>, <i>fax</i> becomes <i>faxes</i>, and <i>waltz</i> becomes <i>waltzes</i>.
+<li>If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What&#8217;s a noisy H? One that gets combined with other letters to make a sound that you can hear. So <i>coach</i> becomes <i>coaches</i> and <i>rash</i> becomes <i>rashes</i>, because you can hear the CH and SH sounds when you say them. But <i>cheetah</i> becomes <i>cheetahs</i>, because the H is silent.
+<li>If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S. So <i>vacancy</i> becomes <i>vacancies</i>, but <i>day</i> becomes <i>days</i>.
+<li>If all else fails, just add S and hope for the best.
+</ul>
+<p>(I know, there are a lot of exceptions. <i>Man</i> becomes <i>men</i> and <i>woman</i> becomes <i>women</i>, but <i>human</i> becomes <i>humans</i>. <i>Mouse</i> becomes <i>mice</i> and <i>louse</i> becomes <i>lice</i>, but <i>house</i> becomes <i>houses</i>. <i>Knife</i> becomes <i>knives</i> and <i>wife</i> becomes <i>wives</i>, but <i>lowlife</i> becomes <i>lowlifes</i>. And don&#8217;t even get me started on words that are their own plural, like <i>sheep</i>, <i>deer</i>, and <i>haiku</i>.)
+<p>Other languages, of course, are completely different.
+<p>Let&#8217;s design a Python library that automatically pluralizes English nouns. We&#8217;ll start with just these four rules, but keep in mind that you&#8217;ll inevitably need to add more.
+<p class=a>&#x2042;
+
+<h2 id=i-know>I Know, Let&#8217;s Use Regular Expressions!</h2>
+<p>So you&#8217;re looking at words, which, at least in English, means you&#8217;re looking at strings of characters. You have rules that say you need to find different combinations of characters, then do different things to them. This sounds like a job for regular expressions!
+<p class=d>[<a href=examples/plural1.py>download <code>plural1.py</code></a>]
+<pre class=pp><code>import re
+
+def plural(noun):          
+<a>    if re.search('[sxz]$', noun):             <span class=u>&#x2460;</span></a>
+<a>        return re.sub('$', 'es', noun)        <span class=u>&#x2461;</span></a>
+    elif re.search('[^aeioudgkprt]h$', noun):
+        return re.sub('$', 'es', noun)       
+    elif re.search('[^aeiou]y$', noun):      
+        return re.sub('y$', 'ies', noun)     
+    else:
+        return noun + 's'</code></pre>
+<ol>
+<li>This is a regular expression, but it uses a syntax you didn&#8217;t see in <a href=regular-expressions.html><i>Regular Expressions</i></a>. The square brackets mean &#8220;match exactly one of these characters.&#8221; So <code>[sxz]</code> means &#8220;<code>s</code>, or <code>x</code>, or <code>z</code>&#8221;, but only one of them. The <code>$</code> should be familiar; it matches the end of string. Combined, this regular expression tests whether <var>noun</var> ends with <code>s</code>, <code>x</code>, or <code>z</code>.
+<li>This <code>re.sub()</code> function performs regular expression-based string substitutions.
+</ol>
+
+<p>Let&#8217;s look at regular expression substitutions in more detail.
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import re</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>re.search('[abc]', 'Mark')</kbd>    <span class=u>&#x2460;</span></a>
+&lt;_sre.SRE_Match object at 0x001C1FA8>
+<a><samp class=p>>>> </samp><kbd class=pp>re.sub('[abc]', 'o', 'Mark')</kbd>  <span class=u>&#x2461;</span></a>
+<samp class=pp>'Mork'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>re.sub('[abc]', 'o', 'rock')</kbd>  <span class=u>&#x2462;</span></a>
+<samp class=pp>'rook'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>re.sub('[abc]', 'o', 'caps')</kbd>  <span class=u>&#x2463;</span></a>
+<samp class=pp>'oops'</samp></pre>
+<ol>
+<li>Does the string <code>Mark</code> contain <code>a</code>, <code>b</code>, or <code>c</code>? Yes, it contains <code>a</code>.
+<li>OK, now find <code>a</code>, <code>b</code>, or <code>c</code>, and replace it with <code>o</code>. <code>Mark</code> becomes <code>Mork</code>.
+<li>The same function turns <code>rock</code> into <code>rook</code>.
+<li>You might think this would turn <code>caps</code> into <code>oaps</code>, but it doesn&#8217;t. <code>re.sub</code> replaces <em>all</em> of the matches, not just the first one. So this regular expression turns <code>caps</code> into <code>oops</code>, because both the <code>c</code> and the <code>a</code> get turned into <code>o</code>.
+</ol>
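+<p>If you only want the first match replaced, <code>re.sub()</code> accepts an optional <code>count</code> argument. Here is a minimal sketch using the same strings as above; the variable names are made up for illustration.

```python
import re

# By default, re.sub() replaces every match in the string.
replace_all = re.sub('[abc]', 'o', 'caps')

# count=1 limits the substitution to the first match only.
replace_first = re.sub('[abc]', 'o', 'caps', count=1)

print(replace_all)    # oops
print(replace_first)  # oaps
```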
+
+<p>And now, back to the <code>plural()</code> function&hellip;
+
+<pre class=pp><code>def plural(noun):          
+    if re.search('[sxz]$', noun):            
+<a>        return re.sub('$', 'es', noun)         <span class=u>&#x2460;</span></a>
+<a>    elif re.search('[^aeioudgkprt]h$', noun):  <span class=u>&#x2461;</span></a>
+        return re.sub('$', 'es', noun)
+<a>    elif re.search('[^aeiou]y$', noun):        <span class=u>&#x2462;</span></a>
+        return re.sub('y$', 'ies', noun)     
+    else:
+        return noun + 's'</code></pre>
+<ol>
+<li>Here, you&#8217;re replacing the end of the string (matched by <code>$</code>) with the string <code>es</code>. In other words, adding <code>es</code> to the string. You could accomplish the same thing with string concatenation, for example <code>noun + 'es'</code>, but I chose to use regular expressions for each rule, for reasons that will become clear later in the chapter.
+<li>Look closely, this is another new variation. The <code>^</code> as the first character inside the square brackets means something special: negation. <code>[^abc]</code> means &#8220;any single character <em>except</em> <code>a</code>, <code>b</code>, or <code>c</code>&#8221;. So <code>[^aeioudgkprt]</code> means any character except <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, <code>u</code>, <code>d</code>, <code>g</code>, <code>k</code>, <code>p</code>, <code>r</code>, or <code>t</code>. Then that character needs to be followed by <code>h</code>, followed by end of string. You&#8217;re looking for words that end in H where the H can be heard.
+<li>Same pattern here: match words that end in Y, where the character before the Y is <em>not</em> <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>. You&#8217;re looking for words that end in Y that sounds like I.
+</ol>
+
+<p>Let&#8217;s look at negation regular expressions in more detail.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import re</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'vacancy')</kbd>  <span class=u>&#x2460;</span></a>
+&lt;_sre.SRE_Match object at 0x001C1FA8>
+<a><samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'boy')</kbd>      <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp>
+<samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'day')</kbd>
+<samp class=p>>>> </samp>
+<a><samp class=p>>>> </samp><kbd class=pp>re.search('[^aeiou]y$', 'pita')</kbd>     <span class=u>&#x2462;</span></a>
+<samp class=p>>>> </samp></pre>
+<ol>
+<li><code>vacancy</code> matches this regular expression, because it ends in <code>cy</code>, and <code>c</code> is not <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>.
+<li><code>boy</code> does not match, because it ends in <code>oy</code>, and you specifically said that the character before the <code>y</code> could not be <code>o</code>. <code>day</code> does not match, because it ends in <code>ay</code>.
+<li><code>pita</code> does not match, because it does not end in <code>y</code>.
+</ol>
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>re.sub('y$', 'ies', 'vacancy')</kbd>               <span class=u>&#x2460;</span></a>
+<samp class=pp>'vacancies'</samp>
+<samp class=p>>>> </samp><kbd class=pp>re.sub('y$', 'ies', 'agency')</kbd>
+<samp class=pp>'agencies'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>re.sub('([^aeiou])y$', r'\1ies', 'vacancy')</kbd>  <span class=u>&#x2461;</span></a>
+<samp class=pp>'vacancies'</samp></pre>
+<ol>
+<li>This regular expression turns <code>vacancy</code> into <code>vacancies</code> and <code>agency</code> into <code>agencies</code>, which is what you wanted. Note that it would also turn <code>boy</code> into <code>boies</code>, but that will never happen in the function because you did that <code>re.search</code> first to find out whether you should do this <code>re.sub</code>.
+<li>Just in passing, I want to point out that it is possible to combine these two regular expressions (one to find out if the rule applies, and another to actually apply it) into a single regular expression. Here&#8217;s what that would look like. Most of it should look familiar: you&#8217;re using a remembered group, which you learned in <a href=regular-expressions.html#phonenumbers>Case study: Parsing Phone Numbers</a>. The group is used to remember the character before the letter <code>y</code>. Then in the substitution string, you use a new syntax, <code>\1</code>, which means &#8220;hey, that first group you remembered? put it right here.&#8221; In this case, you remember the <code>c</code> before the <code>y</code>; when you do the substitution, you substitute <code>c</code> in place of <code>c</code>, and <code>ies</code> in place of <code>y</code>. (If you have more than one remembered group, you can use <code>\2</code> and <code>\3</code> and so on.)
+</ol>
+<p>Regular expression substitutions are extremely powerful, and the <code>\1</code> syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder to read, and it doesn&#8217;t directly map to the way you first described the pluralizing rules. You originally laid out rules like &#8220;if the word ends in S, X, or Z, then add ES&#8221;. If you look at this function, you have two lines of code that say &#8220;if the word ends in S, X, or Z, then add ES&#8221;. It doesn&#8217;t get much more direct than that.
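+<p>For comparison, here is a rough sketch of what the Y rule might look like if you did collapse it into a single substitution. The <code>pluralize_y()</code> name is hypothetical and is not part of <code>plural1.py</code>.

```python
import re

def pluralize_y(noun):
    # Hypothetical helper, not part of plural1.py.
    # The group remembers the character before the final y;
    # \1 puts that character back in front of 'ies'.
    return re.sub('([^aeiou])y$', r'\1ies', noun)

print(pluralize_y('vacancy'))  # vacancies
print(pluralize_y('boy'))      # boy (no match, so nothing is replaced)
```

+<p>When the pattern does not match, <code>re.sub()</code> returns the string unchanged, so a caller would still need some other way to know whether the rule applied.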
+
+<p class=a>&#x2042;
+
+<h2 id=a-list-of-functions>A List Of Functions</h2>
+
+<p>Now you&#8217;re going to add a level of abstraction. You started by defining a list of rules: if this, do that, otherwise go to the next rule. Let&#8217;s temporarily complicate part of the program so you can simplify another part.
+
+<p class=d>[<a href=examples/plural2.py>download <code>plural2.py</code></a>]
+<pre class=pp><code>import re
+
+def match_sxz(noun):
+    return re.search('[sxz]$', noun)
+
+def apply_sxz(noun):
+    return re.sub('$', 'es', noun)
+
+def match_h(noun):
+    return re.search('[^aeioudgkprt]h$', noun)
+
+def apply_h(noun):
+    return re.sub('$', 'es', noun)
+
+<a>def match_y(noun):                             <span class=u>&#x2460;</span></a>
+    return re.search('[^aeiou]y$', noun)
+        
+<a>def apply_y(noun):                             <span class=u>&#x2461;</span></a>
+    return re.sub('y$', 'ies', noun)
+
+def match_default(noun):
+    return True
+
+def apply_default(noun):
+    return noun + 's'
+
+<a>rules = ((match_sxz, apply_sxz),               <span class=u>&#x2462;</span></a>
+         (match_h, apply_h),
+         (match_y, apply_y),
+         (match_default, apply_default)
+         )
+
+def plural(noun):           
+<a>    for matches_rule, apply_rule in rules:       <span class=u>&#x2463;</span></a>
+        if matches_rule(noun):
+            return apply_rule(noun)</code></pre>
+<ol>
+<li>Now, each match rule is its own function which returns the results of calling the <code>re.search()</code> function.
+<li>Each apply rule is also its own function which calls the <code>re.sub()</code> function to apply the appropriate pluralization rule.
+<li>Instead of having one function (<code>plural()</code>) with multiple rules, you have the <code>rules</code> data structure, which is a sequence of pairs of functions.
+<li>Since the rules have been broken out into a separate data structure, the new <code>plural()</code> function can be reduced to a few lines of code. Using a <code>for</code> loop, you can pull out the match and apply rules two at a time (one match, one apply) from the <var>rules</var> structure. On the first iteration of the <code>for</code> loop, <var>matches_rule</var> will get <code>match_sxz</code>, and <var>apply_rule</var> will get <code>apply_sxz</code>. On the second iteration (assuming you get that far), <var>matches_rule</var> will be assigned <code>match_h</code>, and <var>apply_rule</var> will be assigned <code>apply_h</code>. The function is guaranteed to return something eventually, because the final match rule (<code>match_default</code>) simply returns <code>True</code>, meaning the corresponding apply rule (<code>apply_default</code>) will always be applied.
+</ol>
+
+<aside>The &#8220;rules&#8221; variable is a sequence of pairs of functions.</aside>
+<p>The reason this technique works is that <a href=your-first-python-program.html#everythingisanobject>everything in Python is an object</a>, including functions. The <var>rules</var> data structure contains functions&nbsp;&mdash;&nbsp;not names of functions, but actual function objects. When they get assigned in the <code>for</code> loop, then <var>matches_rule</var> and <var>apply_rule</var> are actual functions that you can call. On the first iteration of the <code>for</code> loop, this is equivalent to calling <code>match_sxz(noun)</code>, and if it returns a match, calling <code>apply_sxz(noun)</code>.
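+<p>A tiny illustration of the same idea, separate from the pluralization code: the functions <code>shout()</code> and <code>whisper()</code> below are invented for this sketch. They are stored in a tuple and called through a loop variable, just as the rule functions are.

```python
def shout(word):
    return word.upper() + '!'

def whisper(word):
    return word.lower() + '...'

# A tuple of function objects, not strings that name functions.
speakers = (shout, whisper)

# Each time through the loop, speak *is* one of the functions above.
results = [speak('Hello') for speak in speakers]
print(results)  # ['HELLO!', 'hello...']
```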
+
+<p>If this additional level of abstraction is confusing, try unrolling the function to see the equivalence. The entire <code>for</code> loop is equivalent to the following:
+
+<pre class='nd pp'><code>
+def plural(noun):
+    if match_sxz(noun):
+        return apply_sxz(noun)
+    if match_h(noun):
+        return apply_h(noun)
+    if match_y(noun):
+        return apply_y(noun)
+    if match_default(noun):
+        return apply_default(noun)</code></pre>
+
+<p>The benefit here is that the <code>plural()</code> function is now simplified. It takes a sequence of rules, defined elsewhere, and iterates through them in a generic fashion.
+
+<ol>
+<li>Get a match rule
+<li>Does it match? Then call the apply rule and return the result.
+<li>No match? Go to step 1.
+</ol>
+
+<p>The rules could be defined anywhere, in any way. The <code>plural()</code> function doesn&#8217;t care.
+
+<p>Now, was adding this level of abstraction worth it? Well, not yet. Let&#8217;s consider what it would take to add a new rule to the function. In the first example, it would require adding an <code>if</code> statement to the <code>plural()</code> function. In this second example, it would require adding two functions, <code>match_foo()</code> and <code>apply_foo()</code>, and then updating the <var>rules</var> sequence to specify where in the order the new match and apply functions should be called relative to the other rules.
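+<p>To make that cost concrete, here is a sketch of adding one rule in this style. The rule itself (words ending in IS change to ES, as in <i>analysis</i> and <i>analyses</i>) is a hypothetical addition with exceptions of its own, and only two of the original rules are repeated to keep the sketch short.

```python
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

# Hypothetical new rule: analysis -> analyses.
def match_is(noun):
    return re.search('is$', noun)

def apply_is(noun):
    return re.sub('is$', 'es', noun)

# Order matters: the new rule has to come before the S/X/Z rule,
# otherwise 'analysis' would match '[sxz]$' first.
rules = ((match_is, apply_is),
         (match_sxz, apply_sxz),
         (match_default, apply_default))

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('analysis'))  # analyses
print(plural('fax'))       # faxes
print(plural('dog'))       # dogs
```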
+
+<p>But this is really just a stepping stone to the next section. Let&#8217;s move on&hellip;
+
+<p class=a>&#x2042;
+
+<h2 id=a-list-of-patterns>A List Of Patterns</h2>
+
+<p>Defining separate named functions for each match and apply rule isn&#8217;t really necessary. You never call them directly; you add them to the <var>rules</var> sequence and call them through there. Furthermore, each function follows one of two patterns. All the match functions call <code>re.search()</code>, and all the apply functions call <code>re.sub()</code>. Let&#8217;s factor out the patterns so that defining new rules can be easier.
+
+<p class=d>[<a href=examples/plural3.py>download <code>plural3.py</code></a>]
+<pre class=pp><code>import re
+
+def build_match_and_apply_functions(pattern, search, replace):
+<a>    def matches_rule(word):                                     <span class=u>&#x2460;</span></a>
+        return re.search(pattern, word)
+<a>    def apply_rule(word):                                       <span class=u>&#x2461;</span></a>
+        return re.sub(search, replace, word)
+<a>    return (matches_rule, apply_rule)                           <span class=u>&#x2462;</span></a></code></pre>
+<ol>
+<li><code>build_match_and_apply_functions()</code> is a function that builds other functions dynamically. It takes <var>pattern</var>, <var>search</var> and <var>replace</var>, then defines a <code>matches_rule()</code> function which calls <code>re.search()</code> with the <var>pattern</var> that was passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>matches_rule()</code> function you&#8217;re building. Whoa.
+<li>Building the apply function works the same way. The apply function is a function that takes one parameter, and calls <code>re.sub()</code> with the <var>search</var> and <var>replace</var> parameters that were passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>apply_rule()</code> function you&#8217;re building. This technique of using the values of outside parameters within a dynamic function is called <em>closures</em>. You&#8217;re essentially defining constants within the apply function you&#8217;re building: it takes one parameter (<var>word</var>), but it then acts on that plus two other values (<var>search</var> and <var>replace</var>) which were set when you defined the apply function.
+<li>Finally, the <code>build_match_and_apply_functions()</code> function returns a tuple of two values: the two functions you just created. The constants you defined within those functions (<var>pattern</var> within the <code>matches_rule()</code> function, and <var>search</var> and <var>replace</var> within the <code>apply_rule()</code> function) stay with those functions, even after you return from <code>build_match_and_apply_functions()</code>. That&#8217;s insanely cool.
+</ol>
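+<p>One way to convince yourself that the values really do stay with the functions is to build a single pair by hand and look inside it. This is only a sketch: the names <code>matches_y</code>, <code>apply_y</code> and <code>captured</code> are made up here, and <code>__closure__</code> and <code>co_freevars</code> are CPython introspection attributes you would not normally need.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

# Build one match/apply pair for the Y rule.
matches_y, apply_y = build_match_and_apply_functions('[^aeiou]y$', 'y$', 'ies')

print(bool(matches_y('vacancy')))  # True
print(apply_y('vacancy'))          # vacancies
print(bool(matches_y('day')))      # False

# Peek at the values the apply function carries around with it.
captured = dict(zip(apply_y.__code__.co_freevars,
                    (cell.cell_contents for cell in apply_y.__closure__)))
print(captured == {'search': 'y$', 'replace': 'ies'})  # True
```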
+
+<p>If this is incredibly confusing (and it should be, this is weird stuff), it may become clearer when you see how to use it.
+
+<pre class=pp><code><a>patterns = \                                                        <span class=u>&#x2460;</span></a>
+  (
+    ('[sxz]$',           '$',  'es'),
+    ('[^aeioudgkprt]h$', '$',  'es'),
+    ('(qu|[^aeiou])y$',  'y$', 'ies'),
+<a>    ('$',                '$',  's')                                 <span class=u>&#x2461;</span></a>
+  )
+<a>rules = [build_match_and_apply_functions(pattern, search, replace)  <span class=u>&#x2462;</span></a>
+         for (pattern, search, replace) in patterns]</code></pre>
+<ol>
+<li>Our pluralization &#8220;rules&#8221; are now defined as a tuple of tuples of <em>strings</em> (not functions). The first string in each group is the regular expression pattern that you would use in <code>re.search()</code> to see if this rule matches. The second and third strings in each group are the search and replace expressions you would use in <code>re.sub()</code> to actually apply the rule to turn a noun into its plural.
+<li>There&#8217;s a slight change here, in the fallback rule. In the previous example, the <code>match_default()</code> function simply returned <code>True</code>, meaning that if none of the more specific rules matched, the code would simply add an <code>s</code> to the end of the given word. This example does something functionally equivalent. The final regular expression asks whether the word has an end (<code>$</code> matches the end of a string). Of course, every string has an end, even an empty string, so this expression always matches. Thus, it serves the same purpose as the <code>match_default()</code> function that always returned <code>True</code>: it ensures that if no more specific rule matches, the code adds an <code>s</code> to the end of the given word.
+<li>This line is magic. It takes the sequence of strings in <var>patterns</var> and turns them into a sequence of functions. How? By &#8220;mapping&#8221; the strings to the <code>build_match_and_apply_functions()</code> function. That is, it takes each triplet of strings and calls the <code>build_match_and_apply_functions()</code> function with those three strings as arguments. The <code>build_match_and_apply_functions()</code> function returns a tuple of two functions. This means that <var>rules</var> ends up being functionally equivalent to the previous example: a list of tuples, where each tuple is a pair of functions. The first function is the match function that calls <code>re.search()</code>, and the second function is the apply function that calls <code>re.sub()</code>.
+</ol>
+
+<p>Rounding out this version of the script is the main entry point, the <code>plural()</code> function.
+
+<pre class=pp><code>def plural(noun):
+<a>    for matches_rule, apply_rule in rules:  <span class=u>&#x2460;</span></a>
+        if matches_rule(noun):
+            return apply_rule(noun)</code></pre>
+<ol>
+<li>Since the <var>rules</var> list is the same as the previous example (really, it is), it should come as no surprise that the <code>plural()</code> function hasn&#8217;t changed at all. It&#8217;s completely generic; it takes a list of rule functions and calls them in order. It doesn&#8217;t care how the rules are defined. In the previous example, they were defined as separate named functions. Now they are built dynamically by calling the <code>build_match_and_apply_functions()</code> function on each group of plain strings in <var>patterns</var>. It doesn&#8217;t matter; the <code>plural()</code> function still works the same way.
+</ol>
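+<p>Putting the three pieces of this version together gives a script you can run end to end. The sketch below only assembles the code shown above; the list of test nouns at the bottom is an addition for illustration.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

patterns = (
    ('[sxz]$',           '$',  'es'),
    ('[^aeioudgkprt]h$', '$',  'es'),
    ('(qu|[^aeiou])y$',  'y$', 'ies'),
    ('$',                '$',  's'),
)
rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

# Sample nouns chosen for this sketch, one or two per rule.
for noun in ('fax', 'coach', 'cheetah', 'vacancy', 'soliloquy', 'day'):
    print(noun, '->', plural(noun))
```

+<p>The <code>qu</code> alternative in the third pattern is what lets <i>soliloquy</i> become <i>soliloquies</i>; with plain <code>[^aeiou]y$</code>, the <code>u</code> before the <code>y</code> would block the match and the word would fall through to the default rule.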
+
+<p class=a>&#x2042;
+
+<h2 id=a-file-of-patterns>A File Of Patterns</h2>
+
+<p>You&#8217;ve factored out all the duplicate code and added enough abstractions so that the pluralization rules are defined in a list of strings. The next logical step is to take these strings and put them in a separate file, where they can be maintained separately from the code that uses them.
+
+<p>First, let&#8217;s create a text file that contains the rules you want. No fancy data structures, just whitespace-delimited strings in three columns. Let&#8217;s call it <code>plural4-rules.txt</code>.
+
+<p class=d>[<a href=examples/plural4-rules.txt>download <code>plural4-rules.txt</code></a>]
+<pre class='nd pp'><code>[sxz]$               $    es
+[^aeioudgkprt]h$     $    es
+[^aeiou]y$          y$    ies
+$                    $    s</code></pre>
+
+<p>Now let&#8217;s see how you can use this rules file.
+
+<p class=d>[<a href=examples/plural4.py>download <code>plural4.py</code></a>]
+<pre class=pp><code>import re
+
+<a>def build_match_and_apply_functions(pattern, search, replace):  <span class=u>&#x2460;</span></a>
+    def matches_rule(word):
+        return re.search(pattern, word)
+    def apply_rule(word):
+        return re.sub(search, replace, word)
+    return (matches_rule, apply_rule)
+
+rules = []
+<a>with open('plural4-rules.txt', encoding='utf-8') as pattern_file:  <span class=u>&#x2461;</span></a>
+<a>    for line in pattern_file:                                      <span class=u>&#x2462;</span></a>
+<a>        pattern, search, replace = line.split(None, 3)             <span class=u>&#x2463;</span></a>
+<a>        rules.append(build_match_and_apply_functions(              <span class=u>&#x2464;</span></a>
+                pattern, search, replace))</code></pre>
+<ol>
+<li>The <code>build_match_and_apply_functions()</code> function has not changed. You&#8217;re still using closures to build two functions dynamically that use variables defined in the outer function.
+<li>The global <code>open()</code> function opens a file and returns a file object. In this case, the file we&#8217;re opening contains the pattern strings for pluralizing nouns. The <code>with</code> statement creates what&#8217;s called a <i>context</i>: when the <code>with</code> block ends, Python will automatically close the file, even if an exception is raised inside the <code>with</code> block. You&#8217;ll learn more about <code>with</code> blocks and file objects in the <a href=files.html>Files</a> chapter.
+<li>The <code>for line in &lt;fileobject></code> idiom reads data from the open file, one line at a time, and assigns the text to the <var>line</var> variable. You&#8217;ll learn more about reading from files in the <a href=files.html>Files</a> chapter.
+<li>Each line in the file has three values separated by whitespace (tabs or spaces, it makes no difference). To split it out, use the <code>split()</code> string method. The first argument to the <code>split()</code> method is <code>None</code>, which means &#8220;split on any run of whitespace.&#8221; The second argument is <code>3</code>, which means &#8220;split on whitespace at most 3 times, then leave the rest of the line alone.&#8221; Since each line has only three values, the limit is never reached, and the trailing newline is discarded along with the rest of the whitespace. A line like <code>[sxz]$ $ es</code> will be broken up into the list <code>['[sxz]$', '$', 'es']</code>, which means that <var>pattern</var> will get <code>'[sxz]$'</code>, <var>search</var> will get <code>'$'</code>, and <var>replace</var> will get <code>'es'</code>. That&#8217;s a lot of power in one little line of code.
+<li>Finally, you pass <code>pattern</code>, <code>search</code>, and <code>replace</code> to the <code>build_match_and_apply_functions()</code> function, which returns a tuple of functions. You append this tuple to the <var>rules</var> list, and <var>rules</var> ends up storing the list of match and apply functions that the <code>plural()</code> function expects.
+</ol>
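+<p>The loop above assumes every line in the rules file is well formed; a blank line or a line with a stray fourth column would make the three-way assignment fail with a <code>ValueError</code>. If you wanted to be more forgiving, one possible approach is sketched below. Skipping blank lines and the wording of the error message are assumptions of this sketch, not something <code>plural4.py</code> does, and an in-memory <code>io.StringIO</code> object stands in for the real file so the example runs on its own.

```python
import io
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

# Stand-in for plural4-rules.txt, with one blank line added on purpose.
rules_text = """\
[sxz]$               $    es

[^aeioudgkprt]h$     $    es
[^aeiou]y$          y$    ies
$                    $    s
"""

rules = []
with io.StringIO(rules_text) as pattern_file:
    for line_number, line in enumerate(pattern_file, start=1):
        fields = line.split()   # no limit: split on every run of whitespace
        if not fields:
            continue            # assumption: silently skip blank lines
        if len(fields) != 3:
            raise ValueError('line {0}: expected 3 columns, got {1}'.format(
                line_number, len(fields)))
        rules.append(build_match_and_apply_functions(*fields))

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(len(rules))        # 4
print(plural('rash'))    # rashes
```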
+
+<p>The improvement here is that you&#8217;ve completely separated the pluralization rules into an external file, so it can be maintained separately from the code that uses it. Code is code, data is data, and life is good.
+
+<p class=a>&#x2042;
+
+<h2 id=generators>Generators</h2>
+
+<p>Wouldn&#8217;t it be grand to have a generic <code>plural()</code> function that parses the rules file? Get rules, check for a match, apply appropriate transformation, go to next rule. That&#8217;s all the <code>plural()</code> function has to do, and that&#8217;s all the <code>plural()</code> function should do.
+
+<p class=d>[<a href=examples/plural5.py>download <code>plural5.py</code></a>]
+<pre class='nd pp'><code>def rules(rules_filename):
+    with open(rules_filename, encoding='utf-8') as pattern_file:
+        for line in pattern_file:
+            pattern, search, replace = line.split(None, 3)
+            yield build_match_and_apply_functions(pattern, search, replace)
+
+def plural(noun, rules_filename='plural5-rules.txt'):
+    for matches_rule, apply_rule in rules(rules_filename):
+        if matches_rule(noun):
+            return apply_rule(noun)
+    raise ValueError('no matching rule for {0}'.format(noun))</code></pre>
+
+<p>How the heck does <em>that</em> work? Let&#8217;s look at an interactive example first.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>def make_counter(x):</kbd>
+<samp class=p>... </samp><kbd class=pp>    print('entering make_counter')</kbd>
+<samp class=p>... </samp><kbd class=pp>    while True:</kbd>
+<a><samp class=p>... </samp><kbd class=pp>        yield x</kbd>                    <span class=u>&#x2460;</span></a>
+<samp class=p>... </samp><kbd class=pp>        print('incrementing x')</kbd>
+<samp class=p>... </samp><kbd class=pp>        x = x + 1</kbd>
+<samp class=p>... </samp>
+<a><samp class=p>>>> </samp><kbd class=pp>counter = make_counter(2)</kbd>          <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>counter</kbd>                            <span class=u>&#x2462;</span></a>
+&lt;generator object make_counter at 0x001C9C10>
+<a><samp class=p>>>> </samp><kbd class=pp>next(counter)</kbd>                      <span class=u>&#x2463;</span></a>
+<samp>entering make_counter
+2</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>next(counter)</kbd>                      <span class=u>&#x2464;</span></a>
+<samp>incrementing x
+3</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>next(counter)</kbd>                      <span class=u>&#x2465;</span></a>
+<samp>incrementing x
+4</samp></pre>
+<ol>
+<li>The presence of the <code>yield</code> keyword in <code>make_counter</code> means that this is not a normal function. It is a special kind of function which generates values one at a time. You can think of it as a resumable function. Calling it will return a <i>generator</i> that can be used to generate successive values of <var>x</var>.
+<li>To create an instance of the <code>make_counter</code> generator, just call it like any other function. Note that this does not actually execute the function code. You can tell this because the first line of the <code>make_counter()</code> function calls <code>print()</code>, but nothing has been printed yet.
+<li>The <code>make_counter()</code> function returns a generator object.
+<li>The <code>next()</code> function takes a generator object and returns its next value. The first time you call <code>next()</code> with the <var>counter</var> generator, it executes the code in <code>make_counter()</code> up to the first <code>yield</code> statement, then returns the value that was yielded. In this case, that will be <code>2</code>, because you originally created the generator by calling <code>make_counter(2)</code>.
+<li>Repeatedly calling <code>next()</code> with the same generator object resumes exactly where it left off and continues until it hits the next <code>yield</code> statement. All variables, local state, <i class=baa>&amp;</i>c. are saved on <code>yield</code> and restored on <code>next()</code>. The next line of code waiting to be executed calls <code>print()</code>, which prints <samp>incrementing x</samp>. After that, it executes the statement <code>x = x + 1</code>. Then it loops through the <code>while</code> loop again, and the first thing it hits is the statement <code>yield x</code>, which saves the state of everything and returns the current value of <var>x</var> (now <code>3</code>).
+<li>The third time you call <code>next(counter)</code>, you do all the same things again, but this time <var>x</var> is now <code>4</code>.
+</ol>
+
+<p>Since <code>make_counter</code> sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing <var>x</var> and spitting out values. But let&#8217;s look at more productive uses of generators instead.
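+<p>If you ever do need a fixed number of values from an infinite generator like this one, the standard library&#8217;s <code>itertools.islice()</code> function can cut it off for you. A minimal sketch, using a copy of <code>make_counter()</code> with the <code>print()</code> calls removed; the <var>first_five</var> name is made up.

```python
import itertools

def make_counter(x):
    # Same generator as above, minus the print() calls.
    while True:
        yield x
        x = x + 1

# islice() stops asking for values after five of them,
# so the infinite loop inside make_counter() never runs away.
first_five = list(itertools.islice(make_counter(2), 5))
print(first_five)  # [2, 3, 4, 5, 6]
```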
+
+<h3 id=a-fibonacci-generator>A Fibonacci Generator</h3>
+
+<aside>&#8220;yield&#8221; pauses a function. &#8220;next()&#8221; resumes where it left off.</aside>
+
+<p class=d>[<a href=examples/fibonacci.py>download <code>fibonacci.py</code></a>]
+<pre class=pp><code>def fib(max):
+<a>    a, b = 0, 1          <span class=u>&#x2460;</span></a>
+    while a &lt; max:
+<a>        yield a          <span class=u>&#x2461;</span></a>
+<a>        a, b = b, a + b  <span class=u>&#x2462;</span></a></code></pre>
+<ol>
+<li>The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with <code>0</code> and <code>1</code>, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: <var>a</var> starts at <code>0</code>, and <var>b</var> starts at <code>1</code>.
+<li><var>a</var> is the current number in the sequence, so yield it.
+<li><var>b</var> is the next number in the sequence, so assign that to <var>a</var>, but also calculate the next value (<code>a + b</code>) and assign that to <var>b</var> for later use. Note that this happens in parallel; if <var>a</var> is <code>3</code> and <var>b</var> is <code>5</code>, then <code>a, b = b, a + b</code> will set <var>a</var> to <code>5</code> (the previous value of <var>b</var>) and <var>b</var> to <code>8</code> (the sum of the previous values of <var>a</var> and <var>b</var>).
+</ol>
+
+<p>So you have a function that spits out successive Fibonacci numbers. Sure, you could do that with recursion, but this way is easier to read. Also, it works well with <code>for</code> loops.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>from fibonacci import fib</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>for n in fib(1000):</kbd>      <span class=u>&#x2460;</span></a>
+<a><samp class=p>... </samp><kbd class=pp>    print(n, end=' ')</kbd>    <span class=u>&#x2461;</span></a>
+<samp class=pp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>list(fib(1000))</kbd>          <span class=u>&#x2462;</span></a>
+<samp class=pp>[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]</samp></pre>
+<ol>
+<li>You can use a generator like <code>fib()</code> in a <code>for</code> loop directly. The <code>for</code> loop will automatically call the <code>next()</code> function to get values from the <code>fib()</code> generator and assign them to the <code>for</code> loop index variable (<var>n</var>).
+<li>Each time through the <code>for</code> loop, <var>n</var> gets a new value from the <code>yield</code> statement in <code>fib()</code>, and all you have to do is print it out. Once <code>fib()</code> runs out of numbers (<var>a</var> becomes bigger than <var>max</var>, which in this case is <code>1000</code>), then the <code>for</code> loop exits gracefully.
+<li>This is a useful idiom: pass a generator to the <code>list()</code> function, and it will iterate through the entire generator (just like the <code>for</code> loop in the previous example) and return a list of all the values.
+</ol>
+
+<h3 id=a-plural-rule-generator>A Plural Rule Generator</h3>
+
+<p>Let&#8217;s go back to <code>plural5.py</code> and see how this version of the <code>plural()</code> function works.
+
+<pre class=pp><code>def rules(rules_filename):
+    with open(rules_filename, encoding='utf-8') as pattern_file:
+        for line in pattern_file:
+<a>            pattern, search, replace = line.split(None, 3)                   <span class=u>&#x2460;</span></a>
+<a>            yield build_match_and_apply_functions(pattern, search, replace)  <span class=u>&#x2461;</span></a>
+
+def plural(noun, rules_filename='plural5-rules.txt'):
+<a>    for matches_rule, apply_rule in rules(rules_filename):                   <span class=u>&#x2462;</span></a>
+        if matches_rule(noun):
+            return apply_rule(noun)
+    raise ValueError('no matching rule for {0}'.format(noun))</code></pre>
+<ol>
+<li>No magic here. Remember that the lines of the rules file have three values separated by whitespace, so you use <code>line.split(None, 3)</code> to get the three &#8220;columns&#8221; and assign them to three local variables.
+<li><em>And then you yield.</em> What do you yield? Two functions, built dynamically with your old friend, <code>build_match_and_apply_functions()</code>, which is identical to the previous examples. In other words, <code>rules()</code> is a generator that spits out match and apply functions <em>on demand</em>.
+<li>Since <code>rules()</code> is a generator, you can use it directly in a <code>for</code> loop. The first time through the <code>for</code> loop, you will call the <code>rules()</code> function, which will open the pattern file, read the first line, dynamically build a match function and an apply function from the patterns on that line, and yield the dynamically built functions. The second time through the <code>for</code> loop, you will pick up exactly where you left off in <code>rules()</code> (which was in the middle of the <code>for line in pattern_file</code> loop). The first thing it will do is read the next line of the file (which is still open), dynamically build another match and apply function based on the patterns on that line in the file, and yield the two functions.
+</ol>
+
+<p>What have you gained over stage 4? Startup time. In stage 4, when you imported the <code>plural4</code> module, it read the entire patterns file and built a list of all the possible rules, before you could even think about calling the <code>plural()</code> function. With generators, you can do everything lazily: you read the first rule and create functions and try them, and if that works you don&#8217;t ever read the rest of the file or create any other functions.
+
+<p>What have you lost? Performance! Every time you call the <code>plural()</code> function, the <code>rules()</code> generator starts over from the beginning&nbsp;&mdash;&nbsp;which means re-opening the patterns file and reading from the beginning, one line at a time.
+
+<p>What if you could have the best of both worlds: minimal startup cost (don&#8217;t execute any code on <code>import</code>), <em>and</em> maximum performance (don&#8217;t build the same functions over and over again)? Oh, and you still want to keep the rules in a separate file (because code is code and data is data), just as long as you never have to read the same line twice.
+
+<p>To do that, you&#8217;ll need to build your own iterator. But before you do <em>that</em>, you need to learn about Python classes.
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+<ul>
+<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
+<li><a href=http://effbot.org/zone/python-with-statement.htm>Understanding Python&#8217;s &#8220;with&#8221; statement</a>
+<li><a href=http://ynniv.com/blog/2007/08/closures-in-python.html>Closures in Python</a>
+<li><a href=http://en.wikipedia.org/wiki/Fibonacci_number>Fibonacci numbers</a>
+<li><a href=http://www2.gsu.edu/~wwwesl/egw/crump.htm>English Irregular Plural Nouns</a>
+</ul>
+
+<p class=v><a href=regular-expressions.html rel=prev title='back to &#8220;Regular Expressions&#8221;'><span class=u>&#x261C;</span></a> <a href=iterators.html rel=next title='onward to &#8220;Classes &amp; Iterators&#8221;'><span class=u>&#x261E;</span></a>
+
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/http-web-services.html b/http-web-services.html
index 6518ab4..435d631 100755
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -1,1003 +1,1003 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>HTTP Web Services - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 14}
-mark{display:inline}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#http-web-services>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
-<h1>HTTP Web Services</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> A ruffled mind makes a restless pillow. <span class=u>&#x275E;</span><br>&mdash; Charlotte Bront&euml;
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>Philosophically, I can describe HTTP web services in 12 words: exchanging data with remote servers using nothing but the operations of <abbr>HTTP</abbr>. If you want to get data from the server, use <abbr>HTTP</abbr> <code>GET</code>. If you want to send new data to the server, use <abbr>HTTP</abbr> <code>POST</code>. Some more advanced <abbr>HTTP</abbr> web service <abbr>API</abbr>s also allow creating, modifying, and deleting data, using <abbr>HTTP</abbr> <code>PUT</code> and <abbr>HTTP</abbr> <code>DELETE</code>. That&#8217;s it. No registries, no envelopes, no wrappers, no tunneling. The &#8220;verbs&#8221; built into the <abbr>HTTP</abbr> protocol (<code>GET</code>, <code>POST</code>, <code>PUT</code>, and <code>DELETE</code>) map directly to application-level operations for retrieving, creating, modifying, and deleting data.
-
-<p>The main advantage of this approach is simplicity, and its simplicity has proven popular. Data&nbsp;&mdash;&nbsp;usually <a href=xml.html><abbr>XML</abbr></a> or <a href=serializing.html#json><abbr>JSON</abbr></a>&nbsp;&mdash;&nbsp;can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an <abbr>HTTP</abbr> library for downloading it. Debugging is also easier; because each resource in an <abbr>HTTP</abbr> web service has a unique address (in the form of a <abbr>URL</abbr>), you can load it in your web browser and immediately see the raw data.
-
-<p>Examples of <abbr>HTTP</abbr> web services:
-<ul>
-<li><a href=http://code.google.com/apis/gdata/>Google Data <abbr>API</abbr>s</a> allow you to interact with a wide variety of Google services, including <a href=http://www.blogger.com/>Blogger</a> and <a href=http://www.youtube.com/>YouTube</a>.
-<li><a href=http://www.flickr.com/services/api/>Flickr Services</a> allow you to upload and download photos from <a href=http://www.flickr.com/>Flickr</a>.
-<li><a href=http://apiwiki.twitter.com/>Twitter <abbr>API</abbr></a> allows you to publish status updates on <a href=http://twitter.com/>Twitter</a>.
-<li><a href='http://www.programmableweb.com/apis/directory/1?sort=mashups'>&hellip;and many more</a>
-</ul>
-
-<p>Python 3 comes with two different libraries for interacting with <abbr>HTTP</abbr> web services:
-
-<ul>
-<li><a href=http://docs.python.org/3.1/library/http.client.html><code>http.client</code></a> is a low-level library that implements <a href=http://www.w3.org/Protocols/rfc2616/rfc2616.html><abbr>RFC</abbr> 2616</a>, the <abbr>HTTP</abbr> protocol.
-<li><a href=http://docs.python.org/3.1/library/urllib.request.html><code>urllib.request</code></a> is an abstraction layer built on top of <code>http.client</code>. It provides a standard <abbr>API</abbr> for accessing both <abbr>HTTP</abbr> and <abbr>FTP</abbr> servers, automatically follows <abbr>HTTP</abbr> redirects, and handles some common forms of <abbr>HTTP</abbr> authentication.
-</ul>
-
-<p>So which one should you use? Neither of them. Instead, you should use <a href=http://code.google.com/p/httplib2/><code>httplib2</code></a>, an open source third-party library that implements <abbr>HTTP</abbr> more fully than <code>http.client</code> but provides a better abstraction than <code>urllib.request</code>.
-
-<p>To understand why <code>httplib2</code> is the right choice, you first need to understand <abbr>HTTP</abbr>.
-
-<p class=a>&#x2042;
-
-<h2 id=http-features>Features of HTTP</h2>
-
-<p>There are five important features which all <abbr>HTTP</abbr> clients should support.
-
-<h3 id=caching>Caching</h3>
-
-<p>The most important thing to understand about any type of web service is that network access is incredibly expensive. I don&#8217;t mean &#8220;dollars and cents&#8221; expensive (although bandwidth ain&#8217;t free). I mean that it takes an extraordinarily long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, <i>latency</i> (the time it takes to send a request and start retrieving data in a response) can still be higher than you anticipated. A router misbehaves, a packet is dropped, an intermediate proxy is under attack&nbsp;&mdash;&nbsp;there&#8217;s <a href=http://isc.sans.org/>never a dull moment</a> on the public internet, and there may be nothing you can do about it.
-
-<aside><code>Cache-Control: max-age</code> means &#8220;don&#8217;t bug me until next week.&#8221;</aside>
-
-<p><abbr>HTTP</abbr> is designed with caching in mind. There is an entire class of devices (called &#8220;caching proxies&#8221;) whose only job is to sit between you and the rest of the world and minimize network access. Your company or <abbr>ISP</abbr> almost certainly maintains caching proxies, even if you&#8217;re unaware of them. They work because caching is built into the <abbr>HTTP</abbr> protocol.
-
-<p>Here&#8217;s a concrete example of how caching works. You visit <a href=http://diveintomark.org/><code>diveintomark.org</code></a> in your browser. That page includes a background image, <a href=http://wearehugh.com/m.jpg><code>wearehugh.com/m.jpg</code></a>. When your browser downloads that image, the server includes the following <abbr>HTTP</abbr> headers:
-
-<pre class=nd><code>HTTP/1.1 200 OK
-Date: Sun, 31 May 2009 17:14:04 GMT
-Server: Apache
-Last-Modified: Fri, 22 Aug 2008 04:28:16 GMT
-ETag: "3075-ddc8d800"
-Accept-Ranges: bytes
-Content-Length: 12405
-<mark>Cache-Control: max-age=31536000, public</mark>
-<mark>Expires: Mon, 31 May 2010 17:14:04 GMT</mark>
-Connection: close
-Content-Type: image/jpeg</code></pre>
-
-<p>The <code>Cache-Control</code> and <code>Expires</code> headers tell your browser (and any caching proxies between you and the server) that this image can be cached for up to a year. <em>A year!</em> And if, in the next year, you visit another page which also includes a link to this image, your browser will load the image from its cache <em>without generating any network activity whatsoever</em>.
-
-<p>But wait, it gets better. Let&#8217;s say your browser purges the image from your local cache for some reason. Maybe it ran out of disk space; maybe you manually cleared the cache. Whatever. But the <abbr>HTTP</abbr> headers said that this data could be cached by public caching proxies. (Technically, the important thing is what the headers <em>don&#8217;t</em> say; the <code>Cache-Control</code> header doesn&#8217;t have the <code>private</code> keyword, so this data is cacheable by default.) Caching proxies are designed to have tons of storage space, probably far more than your local browser has allocated.
-
-<p>If your company or <abbr>ISP</abbr> maintains a caching proxy, the proxy may still have the image cached. When you visit <code>diveintomark.org</code> again, your browser will look in its local cache for the image, but it won&#8217;t find it, so it will make a network request to try to download it from the remote server. But if the caching proxy still has a copy of the image, it will intercept that request and serve the image from <em>its</em> cache. That means that your request will never reach the remote server; in fact, it will never leave your company&#8217;s network. That makes for a faster download (fewer network hops) and saves your company money (less data being downloaded from the outside world).
-
-<p><abbr>HTTP</abbr> caching only works when everybody does their part. On one side, servers need to send the correct headers in their response. On the other side, clients need to understand and respect those headers before they request the same data twice. The proxies in the middle are not a panacea; they can only be as smart as the servers and clients allow them to be.
-
-<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support caching, but <code>httplib2</code> does.
-
-<h3 id=last-modified>Last-Modified Checking</h3>
-
-<p>Some data never changes, while other data changes all the time. In between, there is a vast field of data that <em>might</em> have changed, but hasn&#8217;t. CNN.com&#8217;s feed is updated every few minutes, but my weblog&#8217;s feed may not change for days or weeks at a time. In the latter case, I don&#8217;t want to tell clients to cache my feed for weeks at a time, because then when I do actually post something, people may not read it for weeks (because they&#8217;re respecting my cache headers which said &#8220;don&#8217;t bother checking this feed for weeks&#8221;). On the other hand, I don&#8217;t want clients downloading my entire feed once an hour if it hasn&#8217;t changed!
-
-<aside><code>304: Not Modified</code> means &#8220;same shit, different day.&#8221;</aside>
-
-<p><abbr>HTTP</abbr> has a solution to this, too. When you request data for the first time, the server can send back a <code>Last-Modified</code> header. This is exactly what it sounds like: the date that the data was last changed. That background image referenced from <code>diveintomark.org</code> included a <code>Last-Modified</code> header.
-
-<pre class=nd><code>HTTP/1.1 200 OK
-Date: Sun, 31 May 2009 17:14:04 GMT
-Server: Apache
-<mark>Last-Modified: Fri, 22 Aug 2008 04:28:16 GMT</mark>
-ETag: "3075-ddc8d800"
-Accept-Ranges: bytes
-Content-Length: 12405
-Cache-Control: max-age=31536000, public
-Expires: Mon, 31 May 2010 17:14:04 GMT
-Connection: close
-Content-Type: image/jpeg
-</code></pre>
-
-<p>When you request the same data a second (or third or fourth) time, you can send an <code>If-Modified-Since</code> header with your request, with the date you got back from the server last time. If the data has changed since then, then the server ignores the <code>If-Modified-Since</code> header and just gives you the new data with a <code>200</code> status code. But if the data <em>hasn&#8217;t</em> changed since then, the server sends back a special <abbr>HTTP</abbr> <code>304</code> status code, which means &#8220;this data hasn&#8217;t changed since the last time you asked for it.&#8221; You can test this on the command line, using <a href=http://curl.haxx.se/>curl</a>:
-
-<pre class='nd screen'>
-<samp class=p>you@localhost:~$ </samp><kbd>curl -I <mark>-H "If-Modified-Since: Fri, 22 Aug 2008 04:28:16 GMT"</mark> http://wearehugh.com/m.jpg</kbd>
-<samp>HTTP/1.1 304 Not Modified
-Date: Sun, 31 May 2009 18:04:39 GMT
-Server: Apache
-Connection: close
-ETag: "3075-ddc8d800"
-Expires: Mon, 31 May 2010 18:04:39 GMT
-Cache-Control: max-age=31536000, public</samp></pre>
-
-<p>Why is this an improvement?  Because when the server sends a <code>304</code>, <em>it doesn&#8217;t re-send the data</em>. All you get is the status code. Even after your cached copy has expired, last-modified checking ensures that you won&#8217;t download the same data twice if it hasn&#8217;t changed. (As an extra bonus, this <code>304</code> response also includes caching headers. Proxies will keep a copy of data even after it officially &#8220;expires,&#8221; in the hopes that the data hasn&#8217;t <em>really</em> changed and the next request responds with a <code>304</code> status code and updated cache information.)
-
-<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support last-modified date checking, but <code>httplib2</code> does.
-
-<h3 id=etags>ETag Checking</h3>
-
-<p>ETags are an alternate way to accomplish the same thing as the <a href=#last-modified>last-modified checking</a>. With ETags, the server sends a hash code in an <code>ETag</code> header along with the data you requested. (Exactly how this hash is determined is entirely up to the server. The only requirement is that it changes when the data changes.) That background image referenced from <code>diveintomark.org</code> had an <code>ETag</code> header.
-
-<pre class=nd><code>HTTP/1.1 200 OK
-Date: Sun, 31 May 2009 17:14:04 GMT
-Server: Apache
-Last-Modified: Fri, 22 Aug 2008 04:28:16 GMT
-<mark>ETag: "3075-ddc8d800"</mark>
-Accept-Ranges: bytes
-Content-Length: 12405
-Cache-Control: max-age=31536000, public
-Expires: Mon, 31 May 2010 17:14:04 GMT
-Connection: close
-Content-Type: image/jpeg
-</code></pre>
-
-<aside><code>ETag</code> means &#8220;there&#8217;s nothing new under the sun.&#8221;</aside>
-
-<p>The second time you request the same data, you include the ETag hash in an <code>If-None-Match</code> header of your request. If the data hasn&#8217;t changed, the server will send you back a <code>304</code> status code. As with the last-modified date checking, the server sends back <em>only</em> the <code>304</code> status code; it doesn&#8217;t send you the same data a second time. By including the ETag hash in your second request, you&#8217;re telling the server that there&#8217;s no need to re-send the same data if it still matches this hash, since <a href=#caching>you still have the data from the last time</a>.
-
-<p>Again with the <kbd>curl</kbd>:
-
-<pre class='nd screen'>
-<a><samp class=p>you@localhost:~$ </samp><kbd>curl -I <mark>-H "If-None-Match: \"3075-ddc8d800\""</mark> http://wearehugh.com/m.jpg</kbd>  <span class=u>&#x2460;</span></a>
-<samp>HTTP/1.1 304 Not Modified
-Date: Sun, 31 May 2009 18:04:39 GMT
-Server: Apache
-Connection: close
-ETag: "3075-ddc8d800"
-Expires: Mon, 31 May 2010 18:04:39 GMT
-Cache-Control: max-age=31536000, public</samp></pre>
-<ol>
-<li>ETags are commonly enclosed in quotation marks, but <em>the quotation marks are part of the value</em>. That means you need to send the quotation marks back to the server in the <code>If-None-Match</code> header.
-</ol>
-
-<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support ETags, but <code>httplib2</code> does.
-
-<h3 id=compression>Compression</h3>
-
-<p>When you talk about <abbr>HTTP</abbr> web services, you&#8217;re almost always talking about moving text-based data back and forth over the wire. Maybe it&#8217;s <abbr>XML</abbr>, maybe it&#8217;s <abbr>JSON</abbr>, maybe it&#8217;s just <a href=strings.html#boring-stuff title='there ain&#8217;t no such thing as plain text'>plain text</a>. Regardless of the format, text compresses well. The example feed in <a href=xml.html>the XML chapter</a> is 3070 bytes uncompressed, but would be 941 bytes after gzip compression. That&#8217;s just 30% of the original size!
-
-<p><abbr>HTTP</abbr> supports <a href=http://www.iana.org/assignments/http-parameters>several compression algorithms</a>. The two most common types are <a href=http://www.ietf.org/rfc/rfc1952.txt>gzip</a> and <a href=http://www.ietf.org/rfc/rfc1951.txt>deflate</a>. When you request a resource over <abbr>HTTP</abbr>, you can ask the server to send it in compressed format. You include an <code>Accept-Encoding</code> header in your request that lists which compression algorithms you support. If the server supports any of the same algorithms, it will send you back compressed data (with a <code>Content-Encoding</code> header that tells you which algorithm it used). Then it&#8217;s up to you to decompress the data.
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>Important tip for server-side developers: make sure that the compressed version of a resource has a different <a href=#etags>ETag</a> than the uncompressed version. Otherwise, caching proxies will get confused and may serve the compressed version to clients that can&#8217;t handle it. Read the discussion of <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=39727">Apache bug 39727</a> for more details on this subtle issue.
-</blockquote>
-
-<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support compression, but <code>httplib2</code> does.
-
-<h3 id=redirects>Redirects</h3>
-
-<p><a href=http://www.w3.org/Provider/Style/URI>Cool <abbr>URI</abbr>s don&#8217;t change</a>, but many <abbr>URI</abbr>s are seriously uncool. Web sites get reorganized, pages move to new addresses. Even web services can reorganize. A syndicated feed at <code>http://example.com/index.xml</code> might be moved to <code>http://example.com/xml/atom.xml</code>. Or an entire domain might move, as an organization expands and reorganizes; <code>http://www.example.com/index.xml</code> becomes <code>http://server-farm-1.example.com/index.xml</code>.
-
-<aside><code>Location</code> means &#8220;look over there!&#8221;</aside>
-
-<p>Every time you request any kind of resource from an <abbr>HTTP</abbr> server, the server includes a status code in its response. Status code <code>200</code> means &#8220;everything&#8217;s normal, here&#8217;s the page you asked for&#8221;. Status code <code>404</code> means &#8220;page not found&#8221;. (You&#8217;ve probably seen 404 errors while browsing the web.) Status codes in the 300&#8217;s indicate some form of redirection.
-
-<p><abbr>HTTP</abbr> has several different ways of signifying that a resource has moved. The two most common techniques are status codes <code>302</code> and <code>301</code>. Status code <code>302</code> is a <i>temporary redirect</i>; it means &#8220;oops, that got moved over here temporarily&#8221; (and then gives the temporary address in a <code>Location</code> header). Status code <code>301</code> is a <i>permanent redirect</i>; it means &#8220;oops, that got moved permanently&#8221; (and then gives the new address in a <code>Location</code> header). If you get a <code>302</code> status code and a new address, the <abbr>HTTP</abbr> specification says you should use the new address to get what you asked for, but the next time you want to access the same resource, you should retry the old address. But if you get a <code>301</code> status code and a new address, you&#8217;re supposed to use the new address from then on.
-
-<p>The <code>urllib.request</code> module automatically &#8220;follows&#8221; redirects when it receives the appropriate status code from the <abbr>HTTP</abbr> server, but it doesn&#8217;t tell you that it did so. You&#8217;ll end up getting the data you asked for, but you&#8217;ll never know that the underlying library &#8220;helpfully&#8221; followed a redirect for you. So you&#8217;ll continue pounding away at the old address, and each time you&#8217;ll get redirected to the new address, and each time the <code>urllib.request</code> module will &#8220;helpfully&#8221; follow the redirect. In other words, it treats permanent redirects the same as temporary redirects. That means two round trips instead of one, which is bad for the server and bad for you.
-
-<p><code>httplib2</code> handles permanent redirects for you. Not only will it tell you that a permanent redirect occurred, it will keep track of them locally and automatically rewrite redirected <abbr>URL</abbr>s before requesting them.
-
-<p class=a>&#x2042;
-
-<h2 id=dont-try-this-at-home>How Not To Fetch Data Over HTTP</h2>
-
-<p>Let&#8217;s say you want to download a resource over <abbr>HTTP</abbr>, such as <a href=xml.html>an Atom feed</a>. Since it&#8217;s a feed, you&#8217;re not just going to download it once; you&#8217;re going to download it over and over again. (Most feed readers will check for changes once an hour.) Let&#8217;s do it the quick-and-dirty way first, and then see how you can do better.
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>import urllib.request</kbd>
-<samp class=p>>>> </samp><kbd class=pp>a_url = 'http://diveintopython3.org/examples/feed.xml'</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>data = urllib.request.urlopen(a_url).read()</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>type(data)</kbd>                                   <span class=u>&#x2461;</span></a>
-<samp class=pp>&lt;class 'bytes'></samp>
-<samp class=p>>>> </samp><kbd class=pp>print(data)</kbd>
-<samp class=pp>&lt;?xml version='1.0' encoding='utf-8'?>
-&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
-  &lt;title>dive into mark&lt;/title>
-  &lt;subtitle>currently between addictions&lt;/subtitle>
-  &lt;id>tag:diveintomark.org,2001-07-29:/&lt;/id>
-  &lt;updated>2009-03-27T21:56:07Z&lt;/updated>
-  &lt;link rel='alternate' type='text/html' href='http://diveintomark.org/'/>
-  &hellip;
-</samp></pre>
-<ol>
-<li>Downloading anything over <abbr>HTTP</abbr> is incredibly easy in Python; in fact, it&#8217;s a one-liner. The <code>urllib.request</code> module has a handy <code>urlopen()</code> function that takes the address of the page you want, and returns a file-like object that you can just <code>read()</code> from to get the full contents of the page. It just can&#8217;t get any easier.
-<li>The <code>urlopen().read()</code> method always returns <a href=strings.html#byte-arrays>a <code>bytes</code> object, not a string</a>. Remember, bytes are bytes; characters are an abstraction. <abbr>HTTP</abbr> servers don&#8217;t deal in abstractions. If you request a resource, you get bytes. If you want it as a string, you&#8217;ll need to <a href=http://feedparser.org/docs/character-encoding.html>determine the character encoding</a> and explicitly convert it to a string.
-</ol>
-
-<p>So what&#8217;s wrong with this? For a quick one-off during testing or development, there&#8217;s nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis (<i>e.g.</i> requesting this feed once an hour), then you&#8217;re being inefficient, and you&#8217;re being rude.
-
-<p class=a>&#x2042;
-
-<h2 id=whats-on-the-wire>What&#8217;s On The Wire?</h2>
-
-<p>To see why this is inefficient and rude, let&#8217;s turn on the debugging features of Python&#8217;s <abbr>HTTP</abbr> library and see what&#8217;s being sent &#8220;on the wire&#8221; (<i>i.e.</i> over the network).
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>from http.client import HTTPConnection</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>HTTPConnection.debuglevel = 1</kbd>                                       <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>from urllib.request import urlopen</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>response = urlopen('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2461;</span></a>
-<samp><a>send: b'GET /examples/feed.xml HTTP/1.1                                 <span class=u>&#x2462;</span></a>
-<a>Host: diveintopython3.org                                               <span class=u>&#x2463;</span></a>
-<a>Accept-Encoding: identity                                               <span class=u>&#x2464;</span></a>
-<a>User-Agent: Python-urllib/3.1'                                          <span class=u>&#x2465;</span></a>
-Connection: close
-reply: 'HTTP/1.1 200 OK'
-&hellip;further debugging information omitted&hellip;</samp></pre>
-<ol>
-<li>As I mentioned at the beginning of the chapter, <code>urllib.request</code> relies on another standard Python library, <code>http.client</code>. Normally you don&#8217;t need to touch <code>http.client</code> directly. (The <code>urllib.request</code> module imports it automatically.) But we import it here so we can toggle the debugging flag on the <code>HTTPConnection</code> class that <code>urllib.request</code> uses to connect to the <abbr>HTTP</abbr> server.
-<li>Now that the debugging flag is set, information on the <abbr>HTTP</abbr> request and response is printed out in real time. As you can see, when you request the Atom feed, the <code>urllib.request</code> module sends five lines to the server.
-<li>The first line specifies the <abbr>HTTP</abbr> verb you&#8217;re using, and the path of the resource (minus the domain name).
-<li>The second line specifies the domain name from which we&#8217;re requesting this feed.
-<li>The third line specifies the compression algorithms that the client supports. As I mentioned earlier, <a href=#compression><code>urllib.request</code> does not support compression</a> by default.
-<li>The fourth line specifies the name of the library that is making the request. By default, this is <code>Python-urllib</code> plus a version number. Both <code>urllib.request</code> and <code>httplib2</code> support changing the user agent, simply by adding a <code>User-Agent</code> header to the request (which will override the default value).
-</ol>
-
-<aside>We&#8217;re downloading 3070 bytes when we could have just downloaded 941.</aside>
-
-<p>Now let&#8217;s look at what the server sent back in its response.
-
-<pre class=screen>
-# continued from previous example
-<a><samp class=p>>>> </samp><kbd class=pp>print(response.headers.as_string())</kbd>        <span class=u>&#x2460;</span></a>
-<samp><a>Date: Sun, 31 May 2009 19:23:06 GMT            <span class=u>&#x2461;</span></a>
-Server: Apache
-<a>Last-Modified: Sun, 31 May 2009 06:39:55 GMT   <span class=u>&#x2462;</span></a>
-<a>ETag: "bfe-93d9c4c0"                           <span class=u>&#x2463;</span></a>
-Accept-Ranges: bytes
-<a>Content-Length: 3070                           <span class=u>&#x2464;</span></a>
-<a>Cache-Control: max-age=86400                   <span class=u>&#x2465;</span></a>
-Expires: Mon, 01 Jun 2009 19:23:06 GMT
-Vary: Accept-Encoding
-Connection: close
-Content-Type: application/xml</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>data = response.read()</kbd>                     <span class=u>&#x2466;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>len(data)</kbd>
-<samp class=pp>3070</samp></pre>
-<ol>
-<li>The <var>response</var> returned from the <code>urllib.request.urlopen()</code> function contains all the <abbr>HTTP</abbr> headers the server sent back. It also contains methods to download the actual data; we&#8217;ll get to that in a minute.
-<li>The server tells you when it handled your request.
-<li>This response includes a <a href=#last-modified><code>Last-Modified</code></a> header.
-<li>This response includes an <a href=#etags><code>ETag</code></a> header.
-<li>The data is 3070 bytes long. Notice what <em>isn&#8217;t</em> here: a <code>Content-Encoding</code> header. Your request stated that you only accept uncompressed data (<code>Accept-Encoding: identity</code>), and sure enough, this response contains uncompressed data.
-<li>This response includes caching headers that state that this feed can be cached for up to 24 hours (86400 seconds).
-<li>And finally, download the actual data by calling <code>response.read()</code>. As you can tell from the <code>len()</code> function, this downloads all 3070 bytes at once.
-</ol>
-
-<p>As you can see, this code is already inefficient: it asked for (and received) uncompressed data. I know for a fact that this server supports <a href=#compression>gzip compression</a>, but <abbr>HTTP</abbr> compression is opt-in. We didn&#8217;t ask for it, so we didn&#8217;t get it. That means we&#8217;re downloading 3070 bytes when we could have just downloaded 941. Bad dog, no biscuit.
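-
-<p>Opting in is a one-header change. As a minimal sketch (not part of the session above), the following builds a request that advertises gzip support and overrides the default user agent. The <code>User-Agent</code> value is a made-up placeholder, and nothing goes over the wire until you pass the request to <code>urlopen()</code>. Keep in mind that <code>urllib.request</code> will not decompress the response for you.
-

```python
import urllib.request

# Build (but do not send) a request that opts in to gzip compression.
# The User-Agent string is a hypothetical placeholder, not a real client.
request = urllib.request.Request(
    'http://diveintopython3.org/examples/feed.xml',
    headers={
        'Accept-Encoding': 'gzip',
        'User-Agent': 'ExampleFeedReader/1.0',
    },
)

# Request stores header names normalized with str.capitalize(),
# so look them up as 'Accept-encoding' and 'User-agent'.
print(request.get_header('Accept-encoding'))
print(request.get_header('User-agent'))
```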
-
-<p>But wait, it gets worse! To see just how inefficient this code is, let&#8217;s request the same feed a second time.
-
-<pre class='nd screen'>
-# continued from the <a href=#whats-on-the-wire>previous example</a>
-<samp class=p>>>> </samp><kbd class=pp>response2 = urlopen('http://diveintopython3.org/examples/feed.xml')</kbd>
-<samp>send: b'GET /examples/feed.xml HTTP/1.1
-Host: diveintopython3.org
-Accept-Encoding: identity
-User-Agent: Python-urllib/3.1
-Connection: close'
-reply: 'HTTP/1.1 200 OK'
-&hellip;further debugging information omitted&hellip;</samp></pre>
-
-<p>Notice anything peculiar about this request? It hasn&#8217;t changed! It&#8217;s exactly the same as the first request. No sign of <a href=#last-modified><code>If-Modified-Since</code> headers</a>. No sign of <a href=#etags><code>If-None-Match</code> headers</a>. No respect for the caching headers. Still no compression.
-
-<p>And what happens when you do the same thing twice? You get the same response. Twice.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>print(response2.headers.as_string())</kbd>     <span class=u>&#x2460;</span></a>
-<samp>Date: Mon, 01 Jun 2009 03:58:00 GMT
-Server: Apache
-Last-Modified: Sun, 31 May 2009 22:51:11 GMT
-ETag: "bfe-255ef5c0"
-Accept-Ranges: bytes
-Content-Length: 3070
-Cache-Control: max-age=86400
-Expires: Tue, 02 Jun 2009 03:58:00 GMT
-Vary: Accept-Encoding
-Connection: close
-Content-Type: application/xml</samp>
-<samp class=p>>>> </samp><kbd class=pp>data2 = response2.read()</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>len(data2)</kbd>                               <span class=u>&#x2461;</span></a>
-<samp class=pp>3070</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>data2 == data</kbd>                            <span class=u>&#x2462;</span></a>
-<samp class=pp>True</samp></pre>
-<ol>
-<li>The server is still sending the same array of &#8220;smart&#8221; headers: <code>Cache-Control</code> and <code>Expires</code> to allow caching, <code>Last-Modified</code> and <code>ETag</code> to enable &#8220;not-modified&#8221; tracking. Even the <code>Vary: Accept-Encoding</code> header hints that the server would support compression, if only you would ask for it. But you didn&#8217;t.
-<li>Once again, fetching this data downloads the whole 3070 bytes&hellip;
-<li>&hellip;the exact same 3070 bytes you downloaded last time.
-</ol>
-
-<p><abbr>HTTP</abbr> is designed to work better than this. <code>urllib</code> speaks <abbr>HTTP</abbr> like I speak Spanish&nbsp;&mdash;&nbsp;enough to get by in a jam, but not enough to hold a conversation. <abbr>HTTP</abbr> is a conversation. It&#8217;s time to upgrade to a library that speaks <abbr>HTTP</abbr> fluently.
-
-<p class=a>&#x2042;
-
-<h2 id=introducing-httplib2>Introducing <code>httplib2</code></h2>
-
-<p>Before you can use <code>httplib2</code>, you&#8217;ll need to install it. Visit <a href=http://code.google.com/p/httplib2/><code>code.google.com/p/httplib2/</code></a> and download the latest version. <code>httplib2</code> is available for Python 2.x and Python 3.x; make sure you get the Python 3 version, named something like <code>httplib2-python3-0.5.0.zip</code>.
-
-<p>Unzip the archive, open a terminal window, and go to the newly created <code>httplib2</code> directory. On Windows, open the <code>Start</code> menu, select <code>Run...</code>, type <kbd>cmd.exe</kbd> and press <kbd>ENTER</kbd>.
-
-<pre class=screen>
-<samp class=p>c:\Users\pilgrim\Downloads> </samp><kbd><mark>dir</mark></kbd>
-<samp> Volume in drive C has no label.
- Volume Serial Number is DED5-B4F8
-
- Directory of c:\Users\pilgrim\Downloads
-
-07/28/2009  12:36 PM    &lt;DIR>          .
-07/28/2009  12:36 PM    &lt;DIR>          ..
-07/28/2009  12:36 PM    &lt;DIR>          httplib2-python3-0.5.0
-07/28/2009  12:33 PM            18,997 httplib2-python3-0.5.0.zip
-               1 File(s)         18,997 bytes
-               3 Dir(s)  61,496,684,544 bytes free</samp>
-
-<samp class=p>c:\Users\pilgrim\Downloads> </samp><kbd><mark>cd httplib2-python3-0.5.0</mark></kbd>
-<samp class=p>c:\Users\pilgrim\Downloads\httplib2-python3-0.5.0> </samp><kbd><mark>c:\python31\python.exe setup.py install</mark></kbd>
-<samp>running install
-running build
-running build_py
-running install_lib
-creating c:\python31\Lib\site-packages\httplib2
-copying build\lib\httplib2\iri2uri.py -> c:\python31\Lib\site-packages\httplib2
-copying build\lib\httplib2\__init__.py -> c:\python31\Lib\site-packages\httplib2
-byte-compiling c:\python31\Lib\site-packages\httplib2\iri2uri.py to iri2uri.pyc
-byte-compiling c:\python31\Lib\site-packages\httplib2\__init__.py to __init__.pyc
-running install_egg_info
-Writing c:\python31\Lib\site-packages\httplib2-python3_0.5.0-py3.1.egg-info</samp></pre>
-
-<p>On Mac OS X, run the <code>Terminal.app</code> application in your <code>/Applications/Utilities/</code> folder. On Linux, run the <code>Terminal</code> application, which is usually in your <code>Applications</code> menu under <code>Accessories</code> or <code>System</code>.
-
-<pre class=screen>
-<samp class=p>you@localhost:~/Desktop$ </samp><kbd><mark>unzip httplib2-python3-0.5.0.zip</mark></kbd>
-<samp>Archive:  httplib2-python3-0.5.0.zip
-  inflating: httplib2-python3-0.5.0/README
-  inflating: httplib2-python3-0.5.0/setup.py
-  inflating: httplib2-python3-0.5.0/PKG-INFO
-  inflating: httplib2-python3-0.5.0/httplib2/__init__.py
-  inflating: httplib2-python3-0.5.0/httplib2/iri2uri.py</samp>
-<samp class=p>you@localhost:~/Desktop$ </samp><kbd><mark>cd httplib2-python3-0.5.0/</mark></kbd>
-<samp class=p>you@localhost:~/Desktop/httplib2-python3-0.5.0$ </samp><kbd><mark>sudo python3 setup.py install</mark></kbd>
-<samp>running install
-running build
-running build_py
-creating build
-creating build/lib.linux-x86_64-3.1
-creating build/lib.linux-x86_64-3.1/httplib2
-copying httplib2/iri2uri.py -> build/lib.linux-x86_64-3.1/httplib2
-copying httplib2/__init__.py -> build/lib.linux-x86_64-3.1/httplib2
-running install_lib
-creating /usr/local/lib/python3.1/dist-packages/httplib2
-copying build/lib.linux-x86_64-3.1/httplib2/iri2uri.py -> /usr/local/lib/python3.1/dist-packages/httplib2
-copying build/lib.linux-x86_64-3.1/httplib2/__init__.py -> /usr/local/lib/python3.1/dist-packages/httplib2
-byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/iri2uri.py to iri2uri.pyc
-byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/__init__.py to __init__.pyc
-running install_egg_info
-Writing /usr/local/lib/python3.1/dist-packages/httplib2-python3_0.5.0.egg-info</samp></pre>
-
-<p>To use <code>httplib2</code>, create an instance of the <code>httplib2.Http</code> class.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>                                                    <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>response.status</kbd>                                                                <span class=u>&#x2462;</span></a>
-<samp class=pp>200</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>content[:52]</kbd>                                                                   <span class=u>&#x2463;</span></a>
-<samp class=pp>b"&lt;?xml version='1.0' encoding='utf-8'?>\r\n&lt;feed xmlns="</samp>
-<samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>
-<samp class=pp>3070</samp></pre>
-<ol>
-<li>The primary interface to <code>httplib2</code> is the <code>Http</code> object. For reasons you&#8217;ll see in the next section, you should always pass a directory name when you create an <code>Http</code> object. The directory does not need to exist; <code>httplib2</code> will create it if necessary.
-<li>Once you have an <code>Http</code> object, retrieving data is as simple as calling the <code>request()</code> method with the address of the data you want. This will issue an <abbr>HTTP</abbr> <code>GET</code> request for that <abbr>URL</abbr>. (Later in this chapter, you&#8217;ll see how to issue other <abbr>HTTP</abbr> requests, like <code>POST</code>.)
-<li>The <code>request()</code> method returns two values. The first is an <code>httplib2.Response</code> object, which contains all the <abbr>HTTP</abbr> headers the server returned. For example, a <code>status</code> code of <code>200</code> indicates that the request was successful.
-<li>The <var>content</var> variable contains the actual data that was returned by the <abbr>HTTP</abbr> server. The data is returned as <a href=strings.html#byte-arrays>a <code>bytes</code> object, not a string</a>. If you want it as a string, you&#8217;ll need to <a href=http://feedparser.org/docs/character-encoding.html>determine the character encoding</a> and convert it yourself.
-</ol>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>You probably only need one <code>httplib2.Http</code> object. There are valid reasons for creating more than one, but you should only do so if you know why you need them. &#8220;I need to request data from two different <abbr>URL</abbr>s&#8221; is not a valid reason. Re-use the <code>Http</code> object and just call the <code>request()</code> method twice.
-</blockquote>
-
-<h3 id=why-bytes>A Short Digression To Explain Why <code>httplib2</code> Returns Bytes Instead of Strings</h3>
-
-<p>Bytes. Strings. What a pain. Why can&#8217;t <code>httplib2</code> &#8220;just&#8221; do the conversion for you? Well, it&#8217;s complicated, because the rules for determining the character encoding are specific to what kind of resource you&#8217;re requesting. How could <code>httplib2</code> know what kind of resource you&#8217;re requesting? It&#8217;s usually listed in the <code>Content-Type</code> <abbr>HTTP</abbr> header, but that&#8217;s an optional feature of <abbr>HTTP</abbr> and not all <abbr>HTTP</abbr> servers include it. If that header is not included in the <abbr>HTTP</abbr> response, it&#8217;s left up to the client to guess. (This is commonly called &#8220;content sniffing,&#8221; and it&#8217;s never perfect.)
-
-<p>If you know what sort of resource you&#8217;re expecting (an <abbr>XML</abbr> document in this case), perhaps you could &#8220;just&#8221; pass the returned <code>bytes</code> object to the <a href=xml.html#xml-parse><code>xml.etree.ElementTree.parse()</code> function</a>. That&#8217;ll work as long as the <abbr>XML</abbr> document includes information on its own character encoding (as this one does), but that&#8217;s an optional feature and not all <abbr>XML</abbr> documents do that. If an <abbr>XML</abbr> document doesn&#8217;t include encoding information, the client is supposed to look at the enclosing transport&nbsp;&mdash;&nbsp;<i>i.e.</i> the <code>Content-Type</code> <abbr>HTTP</abbr> header, which can include a <code>charset</code> parameter.
-
-<p class=ss><a style=border:0 href=http://www.cafepress.com/feedparser><img src=http://feedparser.org/img/feedparser.jpg alt="[I support RFC 3023 t-shirt]" width=150 height=150></a>
-
-<p>But it&#8217;s worse than that. Now character encoding information can be in two places: within the <abbr>XML</abbr> document itself, and within the <code>Content-Type</code> <abbr>HTTP</abbr> header. If the information is in <em>both</em> places, which one wins? According to <a href=http://www.ietf.org/rfc/rfc3023.txt>RFC 3023</a> (I swear I am not making this up), if the media type given in the <code>Content-Type</code> <abbr>HTTP</abbr> header is <code>application/xml</code>, <code>application/xml-dtd</code>, <code>application/xml-external-parsed-entity</code>, or any one of the subtypes of <code>application/xml</code> such as <code>application/atom+xml</code> or <code>application/rss+xml</code> or even <code>application/rdf+xml</code>, then the encoding is
-
-<ol>
-<li>the encoding given in the <code>charset</code> parameter of the <code>Content-Type</code> <abbr>HTTP</abbr> header, or
-<li>the encoding given in the <code>encoding</code> attribute of the <abbr>XML</abbr> declaration within the document, or
-<li><abbr>UTF-8</abbr>
-</ol>
-
-<p>On the other hand, if the media type given in the <code>Content-Type</code> <abbr>HTTP</abbr> header is <code>text/xml</code>, <code>text/xml-external-parsed-entity</code>, or a subtype like <code>text/AnythingAtAll+xml</code>, then the encoding attribute of the <abbr>XML</abbr> declaration within the document is ignored completely, and the encoding is
-
-<ol>
-<li>the encoding given in the charset parameter of the <code>Content-Type</code> <abbr>HTTP</abbr> header, or
-<li><code>us-ascii</code>
-</ol>
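-
-<p>The two precedence lists above can be sketched as a single function. This is an illustration of the rules as described here, not code from <code>httplib2</code> or any other library; the name <code>xml_encoding</code> is made up, and the sketch assumes the <abbr>XML</abbr> declaration (if any) is readable as <abbr>ASCII</abbr> at the very start of the body, so it ignores byte order marks and UTF-16 documents.
-

```python
import re
from email.message import Message

# Matches the encoding attribute of an XML declaration at the start of a body.
XML_DECLARATION = re.compile(
    rb'^<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

APPLICATION_SUBTYPES = ('xml', 'xml-dtd', 'xml-external-parsed-entity')
TEXT_SUBTYPES = ('xml', 'xml-external-parsed-entity')


def xml_encoding(content_type, body):
    """Return the encoding for an XML body, or None for non-XML media types.

    content_type is the raw Content-Type header value; body is bytes.
    """
    header = Message()
    header['Content-Type'] = content_type
    maintype = header.get_content_maintype()
    subtype = header.get_content_subtype()
    charset = header.get_param('charset')

    if maintype == 'application' and (
            subtype in APPLICATION_SUBTYPES or subtype.endswith('+xml')):
        if charset:                      # 1. charset parameter wins
            return charset.lower()
        declared = XML_DECLARATION.match(body)
        if declared:                     # 2. then the XML declaration
            return declared.group(1).decode('ascii').lower()
        return 'utf-8'                   # 3. then UTF-8

    if maintype == 'text' and (
            subtype in TEXT_SUBTYPES or subtype.endswith('+xml')):
        # The XML declaration is ignored completely for text/* types.
        return charset.lower() if charset else 'us-ascii'

    return None


body = b"<?xml version='1.0' encoding='utf-8'?>\r\n<feed xmlns="
print(xml_encoding('application/xml', body))
print(xml_encoding('text/xml', body))
```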
-
-<p>And that&#8217;s just for <abbr>XML</abbr> documents. For <abbr>HTML</abbr> documents, web browsers have constructed such <a type=application/pdf href=http://www.adambarth.com/papers/2009/barth-caballero-song.pdf>byzantine rules for content-sniffing</a> [<abbr>PDF</abbr>] that <a href='http://www.google.com/search?q=barth+content-type+processing+model'>we&#8217;re still trying to figure them all out</a>.
-
-<p>&#8220;<a href=http://code.google.com/p/httplib2/source/checkout>Patches welcome</a>.&#8221;
-
-<h3 id=httplib2-caching>How <code>httplib2</code> Handles Caching</h3>
-
-<p>Remember in the previous section when I said you should always create an <code>httplib2.Http</code> object with a directory name? Caching is the reason.
-
-<pre class=screen>
-# continued from the <a href=#introducing-httplib2>previous example</a>
-<a><samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>response2.status</kbd>                                                                 <span class=u>&#x2461;</span></a>
-<samp class=pp>200</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>content2[:52]</kbd>                                                                    <span class=u>&#x2462;</span></a>
-<samp class=pp>b"&lt;?xml version='1.0' encoding='utf-8'?>\r\n&lt;feed xmlns="</samp>
-<samp class=p>>>> </samp><kbd class=pp>len(content2)</kbd>
-<samp class=pp>3070</samp></pre>
-<ol>
-<li>This shouldn&#8217;t be terribly surprising. It&#8217;s the same thing you did last time, except you&#8217;re putting the result into two new variables.
-<li>The <abbr>HTTP</abbr> <code>status</code> is once again <code>200</code>, just like last time.
-<li>The downloaded content is the same as last time, too.
-</ol>
-
-<p>So&hellip; who cares? Quit your Python interactive shell and relaunch it with a new session, and I&#8217;ll show you.
-
-<pre class=screen>
-# NOT continued from previous example!
-# Please exit out of the interactive shell
-# and launch a new one.
-<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>                                                        <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>                                                    <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2462;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>                                                                   <span class=u>&#x2463;</span></a>
-<samp class=pp>3070</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.status</kbd>                                                                <span class=u>&#x2464;</span></a>
-<samp class=pp>200</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.fromcache</kbd>                                                             <span class=u>&#x2465;</span></a>
-<samp class=pp>True</samp></pre>
-<ol>
-<li>Let&#8217;s turn on debugging and see <a href=#whats-on-the-wire>what&#8217;s on the wire</a>. This is the <code>httplib2</code> equivalent of turning on debugging in <code>http.client</code>. <code>httplib2</code> will print all the data being sent to the server and some key information being sent back.
-<li>Create an <code>httplib2.Http</code> object with the same directory name as before.
-<li>Request the same <abbr>URL</abbr> as before. <em>Nothing appears to happen.</em> More precisely, nothing gets sent to the server, and nothing gets returned from the server. There is absolutely no network activity whatsoever.
-<li>Yet we did &#8220;receive&#8221; some data&nbsp;&mdash;&nbsp;in fact, we received all of it.
-<li>We also &#8220;received&#8221; an <abbr>HTTP</abbr> status code indicating that the &#8220;request&#8221; was successful.
-<li>Here&#8217;s the rub: this &#8220;response&#8221; was generated from <code>httplib2</code>&#8217;s local cache. That directory name you passed in when you created the <code>httplib2.Http</code> object&nbsp;&mdash;&nbsp;that directory holds <code>httplib2</code>&#8217;s cache of all the operations it&#8217;s ever performed.
-</ol>
-
-<aside>What&#8217;s on the wire? Absolutely nothing.</aside>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>If you want to turn on <code>httplib2</code> debugging, you need to set a module-level constant (<code>httplib2.debuglevel</code>), then create a new <code>httplib2.Http</code> object. If you want to turn off debugging, you need to change the same module-level constant, then create a new <code>httplib2.Http</code> object.
-</blockquote>
-
-<p>You previously requested the data at this <abbr>URL</abbr>. That request was successful (<code>status: 200</code>). That response included not only the feed data, but also a set of <a href=#caching>caching headers</a> that told anyone who was listening that they could cache this resource for up to 24 hours (<code>Cache-Control: max-age=86400</code>, which is 24 hours measured in seconds). <code>httplib2</code> understands and respects those caching headers, and it stored the previous response in the <code>.cache</code> directory (which you passed in when you created the <code>Http</code> object). That cache hasn&#8217;t expired yet, so the second time you request the data at this <abbr>URL</abbr>, <code>httplib2</code> simply returns the cached result without ever hitting the network.
-
-<p>I say &#8220;simply,&#8221; but obviously there is a lot of complexity hidden behind that simplicity. <code>httplib2</code> handles <abbr>HTTP</abbr> caching <em>automatically</em> and <em>by default</em>. If for some reason you need to know whether a response came from the cache, you can check <code>response.fromcache</code>. Otherwise, it Just Works.
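-
-<p>To give a feel for the decision being made, here is a deliberately simplified freshness check. It is a sketch under stated assumptions, not <code>httplib2</code>&#8217;s actual implementation: the function name <code>is_fresh</code> is made up, header names are assumed to be lowercase, and only <code>max-age</code> is considered (real caches also weigh <code>Expires</code>, <code>no-cache</code>, and more).
-

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, now):
    """Return True if a cached response is still within its max-age.

    headers: dict of lowercase response header names to values.
    now: timezone-aware datetime to compare against.
    """
    match = re.search(r'max-age=(\d+)', headers.get('cache-control', ''))
    if match is None or 'date' not in headers:
        return False  # no freshness information: treat as stale
    stored_at = parsedate_to_datetime(headers['date'])
    return now - stored_at < timedelta(seconds=int(match.group(1)))


# Header values taken from the feed response shown in this chapter.
cached = {
    'date': 'Tue, 02 Jun 2009 00:40:26 GMT',
    'cache-control': 'max-age=86400',
}
twelve_hours_later = datetime(2009, 6, 2, 12, 40, 26, tzinfo=timezone.utc)
two_days_later = datetime(2009, 6, 4, 0, 40, 26, tzinfo=timezone.utc)
print(is_fresh(cached, twelve_hours_later))
print(is_fresh(cached, two_days_later))
```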
-
-<p id=bypass-the-cache>Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing <kbd>F5</kbd> refreshes the current page, but pressing <kbd>Ctrl+F5</kbd> bypasses the cache and re-requests the current page from the remote server. You might think &#8220;oh, I&#8217;ll just delete the data from my local cache, then request it again.&#8221; You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They&#8217;re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid.
-
-<p>Instead of manipulating your local cache and hoping for the best, you should use the features of <abbr>HTTP</abbr> to ensure that your request actually reaches the remote server.
-
-<pre class=screen>
-# continued from the previous example
-<samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml',</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    headers={'cache-control':'no-cache'})</kbd>  <span class=u>&#x2460;</span></a>
-<samp><a>connect: (diveintopython3.org, 80)             <span class=u>&#x2461;</span></a>
-send: b'GET /examples/feed.xml HTTP/1.1
-Host: diveintopython3.org
-user-agent: Python-httplib2/$Rev: 259 $
-accept-encoding: deflate, gzip
-cache-control: no-cache'
-reply: 'HTTP/1.1 200 OK'
-&hellip;further debugging information omitted&hellip;</samp>
-<samp class=p>>>> </samp><kbd class=pp>response2.status</kbd>
-<samp class=pp>200</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response2.fromcache</kbd>                        <span class=u>&#x2462;</span></a>
-<samp class=pp>False</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>print(dict(response2.items()))</kbd>             <span class=u>&#x2463;</span></a>
-<samp class=pp>{'status': '200',
- 'content-length': '3070',
- 'content-location': 'http://diveintopython3.org/examples/feed.xml',
- 'accept-ranges': 'bytes',
- 'expires': 'Wed, 03 Jun 2009 00:40:26 GMT',
- 'vary': 'Accept-Encoding',
- 'server': 'Apache',
- 'last-modified': 'Sun, 31 May 2009 22:51:11 GMT',
- 'connection': 'close',
- '-content-encoding': 'gzip',
- 'etag': '"bfe-255ef5c0"',
- 'cache-control': 'max-age=86400',
- 'date': 'Tue, 02 Jun 2009 00:40:26 GMT',
- 'content-type': 'application/xml'}</samp></pre>
-<ol>
-<li><code>httplib2</code> allows you to add arbitrary <abbr>HTTP</abbr> headers to any outgoing request. In order to bypass <em>all</em> caches (not just your local disk cache, but also any caching proxies between you and the remote server), add a <code>cache-control: no-cache</code> header to the <var>headers</var> dictionary.
-<li>Now you see <code>httplib2</code> initiating a network request. <code>httplib2</code> understands and respects caching headers <em>in both directions</em>&nbsp;&mdash;&nbsp;as part of the incoming response <em>and as part of the outgoing request</em>. It noticed that you added the <code>no-cache</code> header, so it bypassed its local cache altogether and then had no choice but to hit the network to request the data.
-<li>This response was <em>not</em> generated from your local cache. You knew that, of course, because you saw the debugging information on the outgoing request. But it&#8217;s nice to have that programmatically verified.
-<li>The request succeeded; you downloaded the entire feed again from the remote server. Of course, the server also sent back a full complement of <abbr>HTTP</abbr> headers along with the feed data. That includes caching headers, which <code>httplib2</code> uses to update its local cache, in the hopes of avoiding network access the <em>next</em> time you request this feed. Everything about <abbr>HTTP</abbr> caching is designed to maximize cache hits and minimize network access. Even though you bypassed the cache this time, the remote server would really appreciate it if you would cache the result for next time.
-</ol>
-
-<h3 id=httplib2-etags>How <code>httplib2</code> Handles <code>Last-Modified</code> and <code>ETag</code> Headers</h3>
-
-<p>The <code>Cache-Control</code> and <code>Expires</code> <a href=#caching>caching headers</a> are called <i>freshness indicators</i>. They tell caches in no uncertain terms that you can completely avoid all network access until the cache expires. And that&#8217;s exactly the behavior you saw <a href=#httplib2-caching>in the previous section</a>: given a freshness indicator, <code>httplib2</code> <em>does not generate a single byte of network activity</em> to serve up cached data (unless you explicitly <a href=#bypass-the-cache>bypass the cache</a>, of course).
-
-<p>But what about the case where the data <em>might</em> have changed, but hasn&#8217;t? <abbr>HTTP</abbr> defines <a href=#last-modified><code>Last-Modified</code></a> and <a href=#etags><code>ETag</code></a> headers for this purpose. These headers are called <i>validators</i>. If the local cache is no longer fresh, a client can send the validators with the next request to see if the data has actually changed. If the data hasn&#8217;t changed, the server sends back a <code>304</code> status code <em>and no data</em>. So there&#8217;s still a round-trip over the network, but you end up downloading fewer bytes.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
-<samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>
-<samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/')</kbd>  <span class=u>&#x2460;</span></a>
-<samp>connect: (diveintopython3.org, 80)
-send: b'GET / HTTP/1.1
-Host: diveintopython3.org
-accept-encoding: deflate, gzip
-user-agent: Python-httplib2/$Rev: 259 $'
-reply: 'HTTP/1.1 200 OK'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>print(dict(response.items()))</kbd>                                 <span class=u>&#x2461;</span></a>
-<samp class=pp>{'-content-encoding': 'gzip',
- 'accept-ranges': 'bytes',
- 'connection': 'close',
- 'content-length': '6657',
- 'content-location': 'http://diveintopython3.org/',
- 'content-type': 'text/html',
- 'date': 'Tue, 02 Jun 2009 03:26:54 GMT',
-<mark> 'etag': '"7f806d-1a01-9fb97900"',</mark>
-<mark> 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',</mark>
- 'server': 'Apache',
- 'status': '200',
- 'vary': 'Accept-Encoding,User-Agent'}</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>                                                  <span class=u>&#x2462;</span></a>
-<samp class=pp>6657</samp></pre>
-<ol>
-<li>Instead of the feed, this time we&#8217;re going to download the site&#8217;s home page, which is <abbr>HTML</abbr>. Since this is the first time you&#8217;ve ever requested this page, <code>httplib2</code> has little to work with, and it sends out a minimum of headers with the request.
-<li>The response contains a multitude of <abbr>HTTP</abbr> headers&hellip; but no caching information. However, it does include both an <code>ETag</code> and <code>Last-Modified</code> header.
-<li>At the time I constructed this example, this page was 6657 bytes. It&#8217;s probably changed since then, but don&#8217;t worry about it.
-</ol>
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/')</kbd>  <span class=u>&#x2460;</span></a>
-<samp>connect: (diveintopython3.org, 80)
-send: b'GET / HTTP/1.1
-Host: diveintopython3.org
-<a>if-none-match: "7f806d-1a01-9fb97900"                             <span class=u>&#x2461;</span></a>
-<a>if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT                  <span class=u>&#x2462;</span></a>
-accept-encoding: deflate, gzip
-user-agent: Python-httplib2/$Rev: 259 $'
-<a>reply: 'HTTP/1.1 304 Not Modified'                                <span class=u>&#x2463;</span></a></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.fromcache</kbd>                                            <span class=u>&#x2464;</span></a>
-<samp class=pp>True</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.status</kbd>                                               <span class=u>&#x2465;</span></a>
-<samp class=pp>200</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.dict['status']</kbd>                                       <span class=u>&#x2466;</span></a>
-<samp class=pp>'304'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>                                                  <span class=u>&#x2467;</span></a>
-<samp class=pp>6657</samp></pre>
-<ol>
-<li>You request the same page again, with the same <code>Http</code> object (and the same local cache).
-<li><code>httplib2</code> sends the <code>ETag</code> validator back to the server in the <code>If-None-Match</code> header.
-<li><code>httplib2</code> also sends the <code>Last-Modified</code> validator back to the server in the <code>If-Modified-Since</code> header.
-<li>The server looked at these validators, looked at the page you requested, and determined that the page has not changed since you last requested it, so it sends back a <code>304</code> status code <em>and no data</em>.
-<li>Back on the client, <code>httplib2</code> notices the <code>304</code> status code and loads the content of the page from its cache.
-<li>This might be a bit confusing. There are really <em>two</em> status codes&nbsp;&mdash;&nbsp;<code>304</code> (returned from the server this time, which caused <code>httplib2</code> to look in its cache), and <code>200</code> (returned from the server <em>last time</em>, and stored in <code>httplib2</code>&#8217;s cache along with the page data). <code>response.status</code> returns the status from the cache.
-<li>If you want the raw status code returned from the server, you can get that by looking in <code>response.dict</code>, which is a dictionary of the actual headers returned from the server.
-<li>However, you still get the data in the <var>content</var> variable. Generally, you don&#8217;t need to know why a response was served from the cache. (You may not even care that it was served from the cache at all, and that&#8217;s fine too. <code>httplib2</code> is smart enough to let you act dumb.) By the time the <code>request()</code> method returns to the caller, <code>httplib2</code> has already updated its cache and returned the data to you.
-</ol>
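-
-<p>The validator round trip described above boils down to two small steps: copy the stored validators into conditional request headers, then fall back to the cached body when the server answers <code>304</code>. The sketch below illustrates that logic only; the function names are hypothetical and this is not how <code>httplib2</code> is structured internally.
-

```python
def conditional_headers(cached_headers):
    """Build conditional request headers from a cached response's validators."""
    headers = {}
    if 'etag' in cached_headers:
        headers['if-none-match'] = cached_headers['etag']
    if 'last-modified' in cached_headers:
        headers['if-modified-since'] = cached_headers['last-modified']
    return headers


def choose_body(server_status, server_body, cached_body):
    """A 304 carries no data, so the cached copy is what the caller gets."""
    return cached_body if server_status == 304 else server_body


# Validators taken from the home page response shown in this chapter.
cached = {
    'etag': '"7f806d-1a01-9fb97900"',
    'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
}
print(conditional_headers(cached))
print(choose_body(304, b'', b'<!DOCTYPE html>'))
```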
-
-<h3 id=httplib2-compression>How <code>httplib2</code> Handles Compression</h3>
-
-<aside>&#8220;We have both kinds of music, country AND western.&#8221;</aside>
-
-<p><abbr>HTTP</abbr> supports <a href=#compression>several types of compression</a>; the two most common types are gzip and deflate. <code>httplib2</code> supports both of these.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/')</kbd>
-<samp>connect: (diveintopython3.org, 80)
-send: b'GET / HTTP/1.1
-Host: diveintopython3.org
-<a>accept-encoding: deflate, gzip                          <span class=u>&#x2460;</span></a>
-user-agent: Python-httplib2/$Rev: 259 $'
-reply: 'HTTP/1.1 200 OK'</samp>
-<samp class=p>>>> </samp><kbd class=pp>print(dict(response.items()))</kbd>
-<samp class=pp><a>{'-content-encoding': 'gzip',                           <span class=u>&#x2461;</span></a>
- 'accept-ranges': 'bytes',
- 'connection': 'close',
- 'content-length': '6657',
- 'content-location': 'http://diveintopython3.org/',
- 'content-type': 'text/html',
- 'date': 'Tue, 02 Jun 2009 03:26:54 GMT',
- 'etag': '"7f806d-1a01-9fb97900"',
- 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
- 'server': 'Apache',
- 'status': '304',
- 'vary': 'Accept-Encoding,User-Agent'}</samp></pre>
-<ol>
-<li>Every time <code>httplib2</code> sends a request, it includes an <code>Accept-Encoding</code> header to tell the server that it can handle either <code>deflate</code> or <code>gzip</code> compression.
-<li>In this case, the server has responded with a gzip-compressed payload. By the time the <code>request()</code> method returns, <code>httplib2</code> has already decompressed the body of the response and placed it in the <var>content</var> variable. If you&#8217;re curious about whether or not the response was compressed, you can check <var>response['-content-encoding']</var>; otherwise, don&#8217;t worry about it.
-</ol>
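<p>To make the decompression step concrete, here is a minimal sketch of what undoing a <code>Content-Encoding</code> involves, using only the standard library. The function name <code>decode_body()</code> is hypothetical, and this is not <code>httplib2</code>&#8217;s actual code; it only illustrates the idea under the assumption that the body arrives as a single block of bytes.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the Content-Encoding applied to an HTTP response body.

    Hypothetical helper for illustration; not part of httplib2.
    """
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # The HTTP spec means zlib-wrapped deflate data...
            return zlib.decompress(body)
        except zlib.error:
            # ...but some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # Anything else (e.g. 'identity') is passed through untouched.
    return body


page = b"<h1>Dive Into Python 3</h1>" * 10
compressed = gzip.compress(page)
print(len(compressed) < len(page))                 # the payload shrank on the wire
print(decode_body(compressed, "gzip") == page)     # and round-trips intact
```

<p>With <code>httplib2</code> you never need to write this yourself; the sketch only shows why the <var>content</var> you get back is already readable.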
-
-<h3 id=httplib2-redirects>How <code>httplib2</code> Handles Redirects</h3>
-
-<p><abbr>HTTP</abbr> defines <a href=#redirects>two kinds of redirects</a>: temporary and permanent. There&#8217;s nothing special to do with temporary redirects except follow them, which <code>httplib2</code> does automatically.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
-<samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>
-<samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed-302.xml')</kbd>  <span class=u>&#x2460;</span></a>
-<samp>connect: (diveintopython3.org, 80)
-<a>send: b'GET /examples/feed-302.xml HTTP/1.1                                            <span class=u>&#x2461;</span></a>
-Host: diveintopython3.org
-accept-encoding: deflate, gzip
-user-agent: Python-httplib2/$Rev: 259 $'
-<a>reply: 'HTTP/1.1 302 Found'                                                            <span class=u>&#x2462;</span></a>
-<a>send: b'GET /examples/feed.xml HTTP/1.1                                                <span class=u>&#x2463;</span></a>
-Host: diveintopython3.org
-accept-encoding: deflate, gzip
-user-agent: Python-httplib2/$Rev: 259 $'
-reply: 'HTTP/1.1 200 OK'</samp></pre>
-<ol>
-<li>There is no feed at this <abbr>URL</abbr>. I&#8217;ve set up my server to issue a temporary redirect to the correct address.
-<li>There&#8217;s the request.
-<li>And there&#8217;s the response: <code>302 Found</code>. Not shown here, this response also includes a <code>Location</code> header that points to the real <abbr>URL</abbr>.
-<li><code>httplib2</code> immediately turns around and &#8220;follows&#8221; the redirect by issuing another request for the <abbr>URL</abbr> given in the <code>Location</code> header: <code>http://diveintopython3.org/examples/feed.xml</code>
-</ol>
-
-<p>&#8220;Following&#8221; a redirect is nothing more than this example shows. <code>httplib2</code> sends a request for the <abbr>URL</abbr> you asked for. The server comes back with a response that says &#8220;No no, look over there instead.&#8221; <code>httplib2</code> sends another request for the new <abbr>URL</abbr>.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>response</kbd>                                                          <span class=u>&#x2460;</span></a>
-<samp class=pp>{'status': '200',
- 'content-length': '3070',
-<a> 'content-location': 'http://diveintopython3.org/examples/feed.xml',  <span class=u>&#x2461;</span></a>
- 'accept-ranges': 'bytes',
- 'expires': 'Thu, 04 Jun 2009 02:21:41 GMT',
- 'vary': 'Accept-Encoding',
- 'server': 'Apache',
- 'last-modified': 'Wed, 03 Jun 2009 02:20:15 GMT',
- 'connection': 'close',
-<a> '-content-encoding': 'gzip',                                         <span class=u>&#x2462;</span></a>
- 'etag': '"bfe-4cbbf5c0"',
-<a> 'cache-control': 'max-age=86400',                                    <span class=u>&#x2463;</span></a>
- 'date': 'Wed, 03 Jun 2009 02:21:41 GMT',
- 'content-type': 'application/xml'}</samp></pre>
-<ol>
-<li>The <var>response</var> you get back from this single call to the <code>request()</code> method is the response from the final <abbr>URL</abbr>.
-<li><code>httplib2</code> adds the final <abbr>URL</abbr> to the <var>response</var> dictionary, as <code>content-location</code>. This is not a header that came from the server; it&#8217;s specific to <code>httplib2</code>.
-<li>Apropos of nothing, this feed is <a href=#httplib2-compression>compressed</a>.
-<li>And cacheable. (This is important, as you&#8217;ll see in a minute.)
-</ol>
-
-<p>The <var>response</var> you get back gives you information about the <em>final</em> <abbr>URL</abbr>. What if you want more information about the intermediate <abbr>URL</abbr>s, the ones that eventually redirected to the final <abbr>URL</abbr>? <code>httplib2</code> lets you do that, too.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>response.previous</kbd>                                                     <span class=u>&#x2460;</span></a>
-<samp class=pp>{'status': '302',
- 'content-length': '228',
- 'content-location': 'http://diveintopython3.org/examples/feed-302.xml',
- 'expires': 'Thu, 04 Jun 2009 02:21:41 GMT',
- 'server': 'Apache',
- 'connection': 'close',
- 'location': 'http://diveintopython3.org/examples/feed.xml',
- 'cache-control': 'max-age=86400',
- 'date': 'Wed, 03 Jun 2009 02:21:41 GMT',
- 'content-type': 'text/html; charset=iso-8859-1'}</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>type(response)</kbd>                                                        <span class=u>&#x2461;</span></a>
-<samp class=pp>&lt;class 'httplib2.Response'></samp>
-<samp class=p>>>> </samp><kbd class=pp>type(response.previous)</kbd>
-<samp class=pp>&lt;class 'httplib2.Response'></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.previous.previous</kbd>                                            <span class=u>&#x2462;</span></a>
-<samp class=p>>>></samp></pre>
-<ol>
-<li>The <var>response.previous</var> attribute holds a reference to the previous response object that <code>httplib2</code> followed to get to the current response object.
-<li>Both <var>response</var> and <var>response.previous</var> are <code>httplib2.Response</code> objects.
-<li>That means you can check <var>response.previous.previous</var> to follow the redirect chain backwards even further. (Scenario: one <abbr>URL</abbr> redirects to a second <abbr>URL</abbr> which redirects to a third <abbr>URL</abbr>. It could happen!) In this case, we&#8217;ve already reached the beginning of the redirect chain, so the attribute is <code>None</code>.
-</ol>
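<p>Walking the whole chain can be sketched as a short loop over <code>previous</code>. So that the example runs without a network connection, it uses a hypothetical stand-in class, <code>FakeResponse</code>, that mimics only the two things the loop relies on: dictionary-style header access and a <code>previous</code> attribute. The helper name <code>redirect_chain()</code> is also an assumption, not part of <code>httplib2</code>.

```python
class FakeResponse(dict):
    """Hypothetical stand-in for httplib2.Response: headers plus .previous."""

    def __init__(self, headers, previous=None):
        super().__init__(headers)
        self.previous = previous


def redirect_chain(response):
    """Return (status, url) pairs, oldest response first."""
    chain = []
    while response is not None:
        chain.append((response['status'], response['content-location']))
        # Step back one hop; None marks the start of the chain.
        response = getattr(response, 'previous', None)
    chain.reverse()
    return chain


first = FakeResponse({
    'status': '302',
    'content-location': 'http://diveintopython3.org/examples/feed-302.xml',
})
final = FakeResponse({
    'status': '200',
    'content-location': 'http://diveintopython3.org/examples/feed.xml',
}, previous=first)

for status, url in redirect_chain(final):
    print(status, url)
```

<p>Passing a real <var>response</var> from <code>h.request()</code> to the same function should work the same way, assuming every response in the chain carries a <code>content-location</code> key as the ones shown above do.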
-
-<p>What happens if you request the same <abbr>URL</abbr> again?
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed-302.xml')</kbd>  <span class=u>&#x2460;</span></a>
-<samp>connect: (diveintopython3.org, 80)
-<a>send: b'GET /examples/feed-302.xml HTTP/1.1                                              <span class=u>&#x2461;</span></a>
-Host: diveintopython3.org
-accept-encoding: deflate, gzip
-user-agent: Python-httplib2/$Rev: 259 $'
-<a>reply: 'HTTP/1.1 302 Found'                                                              <span class=u>&#x2462;</span></a></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>content2 == content</kbd>                                                                  <span class=u>&#x2463;</span></a>
-<samp class=pp>True</samp></pre>
-<ol>
-<li>Same <abbr>URL</abbr>, same <code>httplib2.Http</code> object (and therefore the same cache).
-<li>The <code>302</code> response was not cached, so <code>httplib2</code> sends another request for the same <abbr>URL</abbr>.
-<li>Once again, the server responds with a <code>302</code>. But notice what <em>didn&#8217;t</em> happen: there wasn&#8217;t ever a second request for the final <abbr>URL</abbr>, <code>http://diveintopython3.org/examples/feed.xml</code>. That response was cached (remember the <code>Cache-Control</code> header that you saw in the previous example). Once <code>httplib2</code> received the <code>302 Found</code> code, <em>it checked its cache before issuing another request</em>. The cache contained a fresh copy of <code>http://diveintopython3.org/examples/feed.xml</code>, so there was no need to re-request it.
-<li>By the time the <code>request()</code> method returns, it has read the feed data from the cache and returned it. Of course, it&#8217;s the same as the data you received last time.
-</ol>
-
-<p>In other words, you don&#8217;t have to do anything special for temporary redirects. <code>httplib2</code> will follow them automatically, and the fact that one <abbr>URL</abbr> redirects to another has no bearing on <code>httplib2</code>&#8217;s support for compression, caching, <code>ETags</code>, or any of the other features of <abbr>HTTP</abbr>.
-
-<p>Permanent redirects are just as simple.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed-301.xml')</kbd>  <span class=u>&#x2460;</span></a>
-<samp>connect: (diveintopython3.org, 80)
-send: b'GET /examples/feed-301.xml HTTP/1.1
-Host: diveintopython3.org
-accept-encoding: deflate, gzip
-user-agent: Python-httplib2/$Rev: 259 $'
-<a>reply: 'HTTP/1.1 301 Moved Permanently'                                                <span class=u>&#x2461;</span></a></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>response.fromcache</kbd>                                                                 <span class=u>&#x2462;</span></a>
-<samp class=pp>True</samp></pre>
-<ol>
-<li>Once again, this <abbr>URL</abbr> doesn&#8217;t really exist. I&#8217;ve set up my server to issue a permanent redirect to <code>http://diveintopython3.org/examples/feed.xml</code>.
-<li>And here it is: status code <code>301</code>. But again, notice what <em>didn&#8217;t</em> happen: there was no request to the redirect <abbr>URL</abbr>. Why not? Because it&#8217;s already cached locally.
-<li><code>httplib2</code> &#8220;followed&#8221; the redirect right into its cache.
-</ol>
-
-<p>But wait! There&#8217;s more!
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed-301.xml')</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>response2.fromcache</kbd>                                                                  <span class=u>&#x2461;</span></a>
-<samp class=pp>True</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>content2 == content</kbd>                                                                  <span class=u>&#x2462;</span></a>
-<samp class=pp>True</samp>
-</pre>
-<ol>
-<li>Here&#8217;s the difference between temporary and permanent redirects: once <code>httplib2</code> follows a permanent redirect, all further requests for that <abbr>URL</abbr> will transparently be rewritten to the target <abbr>URL</abbr> <em>without hitting the network for the original <abbr>URL</abbr></em>. Remember, debugging is still turned on, yet there is no output of network activity whatsoever.
-<li>Yep, this response was retrieved from the local cache.
-<li>Yep, you got the entire feed (from the cache).
-</ol>
-
-<p><abbr>HTTP</abbr>. It works.
-
-<p class=a>&#x2042;
-
-<h2 id=beyond-get>Beyond HTTP GET</h2>
-
-<p><abbr>HTTP</abbr> web services are not limited to <code>GET</code> requests. What if you want to create something new? Whenever you post a comment on a discussion forum, update your weblog, publish your status on a microblogging service like <a href=http://twitter.com/>Twitter</a> or <a href=http://identi.ca/>Identi.ca</a>, you&#8217;re probably already using <abbr>HTTP</abbr> <code>POST</code>.
-
-<p>Twitter and Identi.ca both offer a simple <abbr>HTTP</abbr>-based <abbr>API</abbr> for publishing and updating your status in 140 characters or less. Let&#8217;s look at <a href=http://laconi.ca/trac/wiki/TwitterCompatibleAPI>Identi.ca&#8217;s <abbr>API</abbr> documentation</a> for updating your status:
-
-<blockquote class=pf>
-<p><b>Identi.ca <abbr>REST</abbr> <abbr>API</abbr> Method: statuses/update</b><br>
-Updates the authenticating user&#8217;s status.  Requires the <code>status</code> parameter specified below.  Request must be a <code>POST</code>.
-
-<dl>
-<dt><abbr>URL</abbr>
-<dd><code>https://identi.ca/api/statuses/update.<i><var>format</var></i></code>
-<dt>Formats
-<dd><code>xml</code>, <code>json</code>, <code>rss</code>, <code>atom</code>
-<dt><abbr>HTTP</abbr> Method(s)
-<dd><code>POST</code>
-<dt>Requires Authentication
-<dd>true
-<dt>Parameters
-<dd><code>status</code>. Required. The text of your status update. <abbr>URL</abbr>-encode as necessary.
-</dl>
-</blockquote>
-
-<p>How does this work? To publish a new message on Identi.ca, you need to issue an <abbr>HTTP</abbr> <code>POST</code> request to <code>https://identi.ca/api/statuses/update.<i>format</i></code>. (The <var>format</var> bit is not part of the <abbr>URL</abbr>; you replace it with the data format you want the server to return in response to your request. So if you want a response in <abbr>XML</abbr>, you would post the request to <code>https://identi.ca/api/statuses/update.xml</code>.) The request needs to include a parameter called <code>status</code>, which contains the text of your status update. And the request needs to be authenticated.
-
-<p>Authenticated? Sure. To update your status on Identi.ca, you need to prove who you are. Identi.ca is not a wiki; only you can update your own status. Identi.ca uses <a href=http://en.wikipedia.org/wiki/Basic_access_authentication><abbr>HTTP</abbr> Basic Authentication</a> (<i>a.k.a.</i> <a href=http://www.ietf.org/rfc/rfc2617.txt>RFC 2617</a>) over <abbr>SSL</abbr> to provide secure but easy-to-use authentication. <code>httplib2</code> supports both <abbr>SSL</abbr> and <abbr>HTTP</abbr> Basic Authentication, so this part is easy.
-
-<p>A <code>POST</code> request is different from a <code>GET</code> request, because it includes a <i>payload</i>. The payload is the data you want to send to the server. The one piece of data that this <abbr>API</abbr> method <em>requires</em> is <code>status</code>, and it should be <i><abbr>URL</abbr>-encoded</i>. This is a very simple serialization format that takes a set of key-value pairs (<i>i.e.</i> a <a href=native-datatypes.html#dictionaries>dictionary</a>) and transforms it into a string.
-
-<pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>from urllib.parse import urlencode</kbd>              <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>data = {'status': 'Test update from Python 3'}</kbd>  <span class=u>&#x2461;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>urlencode(data)</kbd>                                 <span class=u>&#x2462;</span></a>
-<samp>'status=Test+update+from+Python+3'</samp></pre>
-<ol>
-<li>Python comes with a utility function to <abbr>URL</abbr>-encode a dictionary: <code>urllib.parse.urlencode()</code>.
-<li>This is the sort of dictionary that the Identi.ca <abbr>API</abbr> is looking for. It contains one key, <code>status</code>, whose value is the text of a single status update.
-<li>This is what the <abbr>URL</abbr>-encoded string looks like. This is the <i>payload</i> that will be sent &#8220;on the wire&#8221; to the Identi.ca <abbr>API</abbr> server in your <abbr>HTTP</abbr> <code>POST</code> request.
-</ol>
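<p>The test message above contains nothing but letters, digits, and spaces, so the encoding looks almost trivial. A status with punctuation or non-<abbr>ASCII</abbr> characters shows more of what <code>urlencode()</code> does. The sample text below is made up for illustration; <code>parse_qs()</code> is the standard library function that reverses the encoding.

```python
from urllib.parse import urlencode, parse_qs

# A made-up status containing characters that are not safe in a payload.
data = {'status': 'Caf\u00e9 & cr\u00e8me: 100% Python 3'}

payload = urlencode(data)      # non-ASCII text is encoded as UTF-8 by default
print(payload)

# Spaces become '+', and reserved characters such as '&' and '%'
# are percent-escaped so they cannot be mistaken for delimiters.
print(' ' in payload, '%26' in payload)

# Decoding the payload gives back the original text.
# parse_qs() maps each key to a *list* of values.
decoded = parse_qs(payload)
print(decoded['status'][0] == data['status'])
```

<p>This is why the <abbr>API</abbr> documentation says &#8220;<abbr>URL</abbr>-encode as necessary&#8221;: an unescaped <code>&amp;</code> in the status text would otherwise look like the start of a second parameter.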
-
-<p>
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>from urllib.parse import urlencode</kbd>
-<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
-<samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>
-<samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>
-<samp class=p>>>> </samp><kbd class=pp>data = {'status': 'Test update from Python 3'}</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>h.add_credentials('diveintomark', '<var>MY_SECRET_PASSWORD</var>', 'identi.ca')</kbd>    <span class=u>&#x2460;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>resp, content = h.request('https://identi.ca/api/statuses/update.xml',</kbd>
-<a><samp class=p>... </samp><kbd class=pp>    'POST',</kbd>                                                             <span class=u>&#x2461;</span></a>
-<a><samp class=p>... </samp><kbd class=pp>    urlencode(data),</kbd>                                                    <span class=u>&#x2462;</span></a>
-<a><samp class=p>... </samp><kbd class=pp>    headers={'Content-Type': 'application/x-www-form-urlencoded'})</kbd>      <span class=u>&#x2463;</span></a></pre>
-<ol>
-<li>This is how <code>httplib2</code> handles authentication. Store your username and password with the <code>add_credentials()</code> method. When <code>httplib2</code> tries to issue the request, the server will respond with a <code>401 Unauthorized</code> status code, and it will list which authentication methods it supports (in the <code>WWW-Authenticate</code> header). <code>httplib2</code> will automatically construct an <code>Authorization</code> header and re-request the <abbr>URL</abbr>.
-<li>The second parameter is the type of <abbr>HTTP</abbr> request, in this case <code>POST</code>.
-<li>The third parameter is the <i>payload</i> to send to the server. We&#8217;re sending the <abbr>URL</abbr>-encoded dictionary with a status message.
-<li>Finally, we need to tell the server that the payload is <abbr>URL</abbr>-encoded data.
-</ol>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>The third parameter to the <code>add_credentials()</code> method is the domain in which the credentials are valid. You should always specify this! If you leave out the domain and later reuse the <code>httplib2.Http</code> object on a different authenticated site, <code>httplib2</code> might end up leaking one site&#8217;s username and password to the other site.
-</blockquote>
-
-<p>This is what goes over the wire:
-
-<pre class=screen>
-# continued from the previous example
-<samp>send: b'POST /api/statuses/update.xml HTTP/1.1
-Host: identi.ca
-Accept-Encoding: identity
-Content-Length: 32
-content-type: application/x-www-form-urlencoded
-user-agent: Python-httplib2/$Rev: 259 $
-
-status=Test+update+from+Python+3'
-<a>reply: 'HTTP/1.1 401 Unauthorized'                        <span class=u>&#x2460;</span></a>
-<a>send: b'POST /api/statuses/update.xml HTTP/1.1            <span class=u>&#x2461;</span></a>
-Host: identi.ca
-Accept-Encoding: identity
-Content-Length: 32
-content-type: application/x-www-form-urlencoded
-<a>authorization: Basic SECRET_HASH_CONSTRUCTED_BY_HTTPLIB2  <span class=u>&#x2462;</span></a>
-user-agent: Python-httplib2/$Rev: 259 $
-
-status=Test+update+from+Python+3'
-<a>reply: 'HTTP/1.1 200 OK'                                  <span class=u>&#x2463;</span></a></samp></pre>
-<ol>
-<li>After the first request, the server responds with a <code>401 Unauthorized</code> status code. <code>httplib2</code> will never send authentication headers unless the server explicitly asks for them. This is how the server asks for them.
-<li><code>httplib2</code> immediately turns around and requests the same <abbr>URL</abbr> a second time.
-<li>This time, it includes the username and password that you added with the <code>add_credentials()</code> method.
-<li>It worked!
-</ol>
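<p>The placeholder in the <code>authorization</code> header deserves a closer look. Under <abbr>HTTP</abbr> Basic Authentication the value is not really a hash: it is the username and password joined by a colon and Base64-encoded, which anyone can reverse. A minimal sketch, assuming <abbr>UTF-8</abbr> credentials (the function name <code>basic_auth_header()</code> is hypothetical, and this is not <code>httplib2</code>&#8217;s own code):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic auth.

    Hypothetical helper for illustration; assumes UTF-8 credentials.
    """
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


# The example credentials used in RFC 2617 itself.
header = basic_auth_header("Aladdin", "open sesame")
print(header)

# Base64 is an encoding, not encryption: decoding recovers the password.
token = header.split(" ", 1)[1]
print(base64.b64decode(token).decode("utf-8"))
```

<p>Because the credentials are this easy to recover, Basic Authentication is only reasonable over <abbr>SSL</abbr>, which is how the Identi.ca <abbr>API</abbr> is accessed in this example.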
-
-<p>What does the server send back after a successful request? That depends entirely on the web service <abbr>API</abbr>. In some protocols (like the <a href=http://www.ietf.org/rfc/rfc5023.txt>Atom Publishing Protocol</a>), the server sends back a <code>201 Created</code> status code and the location of the newly created resource in the <code>Location</code> header. Identi.ca sends back a <code>200 OK</code> and an <abbr>XML</abbr> document containing information about the newly created resource.
-
-<pre class=screen>
-# continued from the previous example
-<a><samp class=p>>>> </samp><kbd class=pp>print(content.decode('utf-8'))</kbd>                             <span class=u>&#x2460;</span></a>
-<samp class=pp>&lt;?xml version="1.0" encoding="UTF-8"?>
-&lt;status>
-<a> &lt;text>Test update from Python 3&lt;/text>                        <span class=u>&#x2461;</span></a>
- &lt;truncated>false&lt;/truncated>
- &lt;created_at>Wed Jun 10 03:53:46 +0000 2009&lt;/created_at>
- &lt;in_reply_to_status_id>&lt;/in_reply_to_status_id>
- &lt;source>api&lt;/source>
-<a> &lt;id>5131472&lt;/id>                                              <span class=u>&#x2462;</span></a>
- &lt;in_reply_to_user_id>&lt;/in_reply_to_user_id>
- &lt;in_reply_to_screen_name>&lt;/in_reply_to_screen_name>
- &lt;favorited>false&lt;/favorited>
- &lt;user>
-  &lt;id>3212&lt;/id>
-  &lt;name>Mark Pilgrim&lt;/name>
-  &lt;screen_name>diveintomark&lt;/screen_name>
-  &lt;location>27502, US&lt;/location>
-  &lt;description>tech writer, husband, father&lt;/description>
-  &lt;profile_image_url>http://avatar.identi.ca/3212-48-20081216000626.png&lt;/profile_image_url>
-  &lt;url>http://diveintomark.org/&lt;/url>
-  &lt;protected>false&lt;/protected>
-  &lt;followers_count>329&lt;/followers_count>
-  &lt;profile_background_color>&lt;/profile_background_color>
-  &lt;profile_text_color>&lt;/profile_text_color>
-  &lt;profile_link_color>&lt;/profile_link_color>
-  &lt;profile_sidebar_fill_color>&lt;/profile_sidebar_fill_color>
-  &lt;profile_sidebar_border_color>&lt;/profile_sidebar_border_color>
-  &lt;friends_count>2&lt;/friends_count>
-  &lt;created_at>Wed Jul 02 22:03:58 +0000 2008&lt;/created_at>
-  &lt;favourites_count>30768&lt;/favourites_count>
-  &lt;utc_offset>0&lt;/utc_offset>
-  &lt;time_zone>UTC&lt;/time_zone>
-  &lt;profile_background_image_url>&lt;/profile_background_image_url>
-  &lt;profile_background_tile>false&lt;/profile_background_tile>
-  &lt;statuses_count>122&lt;/statuses_count>
-  &lt;following>false&lt;/following>
-  &lt;notifications>false&lt;/notifications>
-&lt;/user>
-&lt;/status></samp></pre>
-<ol>
-<li>Remember, the data returned by <code>httplib2</code> is always <a href=strings.html#byte-arrays>bytes</a>, not a string. To convert it to a string, you need to decode it using the proper character encoding. Identi.ca&#8217;s <abbr>API</abbr> always returns results in <abbr>UTF-8</abbr>, so that part is easy.
-<li>There&#8217;s the text of the status message we just published.
-<li>There&#8217;s the unique identifier for the new status message. Identi.ca uses this to construct a <abbr>URL</abbr> for viewing the message on the web.
-</ol>
-
-<p>And here it is:
-
-<p class=c><img class=fr src=i/identica-screenshot.png alt="screenshot showing published status message on Identi.ca" width=740 height=449>
-
-<p class=a>&#x2042;
-
-<h2 id=beyond-post>Beyond HTTP POST</h2>
-
-<p><abbr>HTTP</abbr> isn&#8217;t limited to <code>GET</code> and <code>POST</code>. Those are certainly the most common types of requests, especially in web browsers. But web service <abbr>API</abbr>s can go beyond <code>GET</code> and <code>POST</code>, and <code>httplib2</code> is ready.
-
-<pre class=screen>
-# continued from the previous example
-<samp class=p>>>> </samp><kbd class=pp>from xml.etree import ElementTree as etree</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>tree = etree.fromstring(content)</kbd>                                          <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>status_id = tree.findtext('id')</kbd>                                           <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>status_id</kbd>
-<samp class=pp>'5131472'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>url = 'https://identi.ca/api/statuses/destroy/{0}.xml'.format(status_id)</kbd>  <span class=u>&#x2462;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>resp, deleted_content = h.request(url, 'DELETE')</kbd>                          <span class=u>&#x2463;</span></a></pre>
-<ol>
-<li>The server returned <abbr>XML</abbr>, right? You know <a href=xml.html#xml-parse>how to parse <abbr>XML</abbr></a>.
-<li>The <code>findtext()</code> method finds the first instance of the given expression and extracts its text content. In this case, we&#8217;re just looking for an <code>&lt;id></code> element.
-<li>Based on the text content of the <code>&lt;id></code> element, we can construct a <abbr>URL</abbr> to delete the status message we just published.
-<li>To delete a message, you simply issue an <abbr>HTTP</abbr> <code>DELETE</code> request to that <abbr>URL</abbr>.
-</ol>
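<p>One detail worth seeing offline: the response contains <em>two</em> <code>&lt;id></code> elements, one for the status and one for the user. The sketch below uses a trimmed copy of the <abbr>XML</abbr> shown earlier (most elements omitted) to show why <code>findtext('id')</code> picks the right one. No request is sent.

```python
from xml.etree import ElementTree as etree

# A trimmed copy of the server response shown earlier.
content = b"""<?xml version="1.0" encoding="UTF-8"?>
<status>
 <text>Test update from Python 3</text>
 <id>5131472</id>
 <user>
  <id>3212</id>
  <screen_name>diveintomark</screen_name>
 </user>
</status>"""

tree = etree.fromstring(content)

# A bare tag name only matches direct children of <status>,
# so this is the status ID, not the user ID nested under <user>.
status_id = tree.findtext('id')
print(status_id)

# A path reaches nested elements.
print(tree.findtext('user/id'))

# A missing element gives None rather than raising an exception.
print(tree.findtext('geo'))

url = 'https://identi.ca/api/statuses/destroy/{0}.xml'.format(status_id)
print(url)
```

<p>Checking for <code>None</code> before building the <abbr>URL</abbr> would be a sensible precaution in real code, since an error response from the server may not contain an <code>&lt;id></code> element at all.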
-
-<p>This is what goes over the wire:
-
-<pre class=screen>
-<samp><a>send: b'DELETE /api/statuses/destroy/5131472.xml HTTP/1.1      <span class=u>&#x2460;</span></a>
-Host: identi.ca
-Accept-Encoding: identity
-user-agent: Python-httplib2/$Rev: 259 $
-
-'
-<a>reply: 'HTTP/1.1 401 Unauthorized'                             <span class=u>&#x2461;</span></a>
-<a>send: b'DELETE /api/statuses/destroy/5131472.xml HTTP/1.1      <span class=u>&#x2462;</span></a>
-Host: identi.ca
-Accept-Encoding: identity
-<a>authorization: Basic SECRET_HASH_CONSTRUCTED_BY_HTTPLIB2       <span class=u>&#x2463;</span></a>
-user-agent: Python-httplib2/$Rev: 259 $
-
-'
-<a>reply: 'HTTP/1.1 200 OK'                                       <span class=u>&#x2464;</span></a></samp>
-<samp class=p>>>> </samp><kbd class=pp>resp.status</kbd>
-<samp class=pp>200</samp></pre>
-<ol>
-<li>&#8220;Delete this status message.&#8221;
-<li>&#8220;I&#8217;m sorry, Dave, I&#8217;m afraid I can&#8217;t do that.&#8221;
-<li>&#8220;Unauthorized<span class=u title='interrobang!'>&#8253;</span> Hmmph. Delete this status message, <em>please</em>&hellip;
-<li>&hellip;and here&#8217;s my username and password.&#8221;
-<li>&#8220;Consider it done!&#8221;
-</ol>
-
-<p>And just like that, poof, it&#8217;s gone.
-
-<p class=c><img class=fr src=i/identica-deleted.png alt="screenshot showing deleted message on Identi.ca" width=740 height=449>
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-
-<p><code>httplib2</code>:
-
-<ul>
-<li><a href=http://code.google.com/p/httplib2/><code>httplib2</code> project page</a>
-<li><a href=http://code.google.com/p/httplib2/wiki/ExamplesPython3>More <code>httplib2</code> code examples</a>
-<li><a href=http://www.xml.com/pub/a/2006/02/01/doing-http-caching-right-introducing-httplib2.html>Doing <abbr>HTTP</abbr> Caching Right: Introducing <code>httplib2</code></a>
-<li><a href=http://www.xml.com/pub/a/2006/03/29/httplib2-http-persistence-and-authentication.html><code>httplib2</code>: <abbr>HTTP</abbr> Persistence and Authentication</a>
-</ul>
-
-<p><abbr>HTTP</abbr> caching:
-
-<ul>
-<li><a href=http://www.mnot.net/cache_docs/><abbr>HTTP</abbr> Caching Tutorial</a> by Mark Nottingham
-<li><a href=http://code.google.com/p/doctype/wiki/ArticleHttpCaching>How to control caching with <abbr>HTTP</abbr> headers</a> on Google Doctype
-</ul>
-
-<p><abbr>RFC</abbr>s:
-
-<ul>
-<li><a href=http://www.ietf.org/rfc/rfc2616.txt>RFC 2616: <abbr>HTTP</abbr></a>
-<li><a href=http://www.ietf.org/rfc/rfc2617.txt>RFC 2617: <abbr>HTTP</abbr> Basic Authentication</a>
-<li><a href=http://www.ietf.org/rfc/rfc1951.txt>RFC 1951: deflate compression</a>
-<li><a href=http://www.ietf.org/rfc/rfc1952.txt>RFC 1952: gzip compression</a>
-</ul>
-
-<p class=v><a rel=prev href=serializing.html title='back to &#8220;Serializing Python Objects&#8221;'><span class=u>&#x261C;</span></a> <a rel=next href=case-study-porting-chardet-to-python-3.html title='onward to &#8220;Case Study: Porting chardet to Python 3&#8221;'><span class=u>&#x261E;</span></a>
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>HTTP Web Services - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 14}
+mark{display:inline}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#http-web-services>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
+<h1>HTTP Web Services</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> A ruffled mind makes a restless pillow. <span class=u>&#x275E;</span><br>&mdash; Charlotte Bront&euml;
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>Philosophically, I can describe HTTP web services in 12 words: exchanging data with remote servers using nothing but the operations of <abbr>HTTP</abbr>. If you want to get data from the server, use <abbr>HTTP</abbr> <code>GET</code>. If you want to send new data to the server, use <abbr>HTTP</abbr> <code>POST</code>. Some more advanced <abbr>HTTP</abbr> web service <abbr>API</abbr>s also allow creating, modifying, and deleting data, using <abbr>HTTP</abbr> <code>PUT</code> and <abbr>HTTP</abbr> <code>DELETE</code>. That&#8217;s it. No registries, no envelopes, no wrappers, no tunneling. The &#8220;verbs&#8221; built into the <abbr>HTTP</abbr> protocol (<code>GET</code>, <code>POST</code>, <code>PUT</code>, and <code>DELETE</code>) map directly to application-level operations for retrieving, creating, modifying, and deleting data.
+
+<p>The main advantage of this approach is simplicity, and its simplicity has proven popular. Data&nbsp;&mdash;&nbsp;usually <a href=xml.html><abbr>XML</abbr></a> or <a href=serializing.html#json><abbr>JSON</abbr></a>&nbsp;&mdash;&nbsp;can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an <abbr>HTTP</abbr> library for downloading it. Debugging is also easier; because each resource in an <abbr>HTTP</abbr> web service has a unique address (in the form of a <abbr>URL</abbr>), you can load it in your web browser and immediately see the raw data.
+
+<p>Examples of <abbr>HTTP</abbr> web services:
+<ul>
+<li><a href=http://code.google.com/apis/gdata/>Google Data <abbr>API</abbr>s</a> allow you to interact with a wide variety of Google services, including <a href=http://www.blogger.com/>Blogger</a> and <a href=http://www.youtube.com/>YouTube</a>.
+<li><a href=http://www.flickr.com/services/api/>Flickr Services</a> allow you to upload and download photos from <a href=http://www.flickr.com/>Flickr</a>.
+<li><a href=http://apiwiki.twitter.com/>Twitter <abbr>API</abbr></a> allows you to publish status updates on <a href=http://twitter.com/>Twitter</a>.
+<li><a href='http://www.programmableweb.com/apis/directory/1?sort=mashups'>&hellip;and many more</a>
+</ul>
+
+<p>Python 3 comes with two different libraries for interacting with <abbr>HTTP</abbr> web services:
+
+<ul>
+<li><a href=http://docs.python.org/3.1/library/http.client.html><code>http.client</code></a> is a low-level library that implements <a href=http://www.w3.org/Protocols/rfc2616/rfc2616.html><abbr>RFC</abbr> 2616</a>, the <abbr>HTTP</abbr> protocol.
+<li><a href=http://docs.python.org/3.1/library/urllib.request.html><code>urllib.request</code></a> is an abstraction layer built on top of <code>http.client</code>. It provides a standard <abbr>API</abbr> for accessing both <abbr>HTTP</abbr> and <abbr>FTP</abbr> servers, automatically follows <abbr>HTTP</abbr> redirects, and handles some common forms of <abbr>HTTP</abbr> authentication.
+</ul>
+
+<p>So which one should you use? Neither of them. Instead, you should use <a href=http://code.google.com/p/httplib2/><code>httplib2</code></a>, an open source third-party library that implements <abbr>HTTP</abbr> more fully than <code>http.client</code> but provides a better abstraction than <code>urllib.request</code>.
+
+<p>To understand why <code>httplib2</code> is the right choice, you first need to understand <abbr>HTTP</abbr>.
+
+<p class=a>&#x2042;
+
+<h2 id=http-features>Features of HTTP</h2>
+
+<p>There are five important features which all <abbr>HTTP</abbr> clients should support.
+
+<h3 id=caching>Caching</h3>
+
+<p>The most important thing to understand about any type of web service is that network access is incredibly expensive. I don&#8217;t mean &#8220;dollars and cents&#8221; expensive (although bandwidth ain&#8217;t free). I mean that it takes an extraordinarily long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, <i>latency</i> (the time it takes to send a request and start retrieving data in a response) can still be higher than you anticipated. A router misbehaves, a packet is dropped, an intermediate proxy is under attack&nbsp;&mdash;&nbsp;there&#8217;s <a href=http://isc.sans.org/>never a dull moment</a> on the public internet, and there may be nothing you can do about it.
+
+<aside><code>Cache-Control: max-age</code> means &#8220;don&#8217;t bug me until next week.&#8221;</aside>
+
+<p><abbr>HTTP</abbr> is designed with caching in mind. There is an entire class of devices (called &#8220;caching proxies&#8221;) whose only job is to sit between you and the rest of the world and minimize network access. Your company or <abbr>ISP</abbr> almost certainly maintains caching proxies, even if you&#8217;re unaware of them. They work because caching is built into the <abbr>HTTP</abbr> protocol.
+
+<p>Here&#8217;s a concrete example of how caching works. You visit <a href=http://diveintomark.org/><code>diveintomark.org</code></a> in your browser. That page includes a background image, <a href=http://wearehugh.com/m.jpg><code>wearehugh.com/m.jpg</code></a>. When your browser downloads that image, the server includes the following <abbr>HTTP</abbr> headers:
+
+<pre class=nd><code>HTTP/1.1 200 OK
+Date: Sun, 31 May 2009 17:14:04 GMT
+Server: Apache
+Last-Modified: Fri, 22 Aug 2008 04:28:16 GMT
+ETag: "3075-ddc8d800"
+Accept-Ranges: bytes
+Content-Length: 12405
+<mark>Cache-Control: max-age=31536000, public</mark>
+<mark>Expires: Mon, 31 May 2010 17:14:04 GMT</mark>
+Connection: close
+Content-Type: image/jpeg</code></pre>
+
+<p>The <code>Cache-Control</code> and <code>Expires</code> headers tell your browser (and any caching proxies between you and the server) that this image can be cached for up to a year. <em>A year!</em> And if, in the next year, you visit another page which also includes a link to this image, your browser will load the image from its cache <em>without generating any network activity whatsoever</em>.
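+
+<p>To make &#8220;up to a year&#8221; concrete, here is a minimal sketch that pulls the <code>max-age</code> value out of a <code>Cache-Control</code> header and converts it to days. The helper name <code>max_age_seconds()</code> is made up for this illustration, and the parsing is deliberately simplistic; a real cache has to weigh other directives as well.

```python
def max_age_seconds(cache_control):
    """Return the max-age value in seconds, or None if the header has none."""
    for directive in cache_control.split(','):
        # Each directive is either a bare keyword ('public')
        # or a name=value pair ('max-age=31536000').
        name, _, value = directive.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            return int(value)
    return None


seconds = max_age_seconds('max-age=31536000, public')
print(seconds, 'seconds =', seconds // 86400, 'days')
```

+<p>With the header shown above, this works out to 365 days. By the same arithmetic, <code>max-age=86400</code> is exactly one day.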
+
+<p>But wait, it gets better. Let&#8217;s say your browser purges the image from your local cache for some reason. Maybe it ran out of disk space; maybe you manually cleared the cache. Whatever. But the <abbr>HTTP</abbr> headers said that this data could be cached by public caching proxies. (Technically, the important thing is what the headers <em>don&#8217;t</em> say; the <code>Cache-Control</code> header doesn&#8217;t have the <code>private</code> keyword, so this data is cacheable by default.) Caching proxies are designed to have tons of storage space, probably far more than your local browser has allocated.
+
+<p>If your company or <abbr>ISP</abbr> maintains a caching proxy, the proxy may still have the image cached. When you visit <code>diveintomark.org</code> again, your browser will look in its local cache for the image, but it won&#8217;t find it, so it will make a network request to try to download it from the remote server. But if the caching proxy still has a copy of the image, it will intercept that request and serve the image from <em>its</em> cache. That means that your request will never reach the remote server; in fact, it will never leave your company&#8217;s network. That makes for a faster download (fewer network hops) and saves your company money (less data being downloaded from the outside world).
+
+<p><abbr>HTTP</abbr> caching only works when everybody does their part. On one side, servers need to send the correct headers in their response. On the other side, clients need to understand and respect those headers before they request the same data twice. The proxies in the middle are not a panacea; they can only be as smart as the servers and clients allow them to be.
+
+<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support caching, but <code>httplib2</code> does.
+
+<h3 id=last-modified>Last-Modified Checking</h3>
+
+<p>Some data never changes, while other data changes all the time. In between, there is a vast field of data that <em>might</em> have changed, but hasn&#8217;t. CNN.com&#8217;s feed is updated every few minutes, but my weblog&#8217;s feed may not change for days or weeks at a time. In the latter case, I don&#8217;t want to tell clients to cache my feed for weeks at a time, because then when I do actually post something, people may not read it for weeks (because they&#8217;re respecting my cache headers which said &#8220;don&#8217;t bother checking this feed for weeks&#8221;). On the other hand, I don&#8217;t want clients downloading my entire feed once an hour if it hasn&#8217;t changed!
+
+<aside><code>304: Not Modified</code> means &#8220;same shit, different day.&#8221;</aside>
+
+<p><abbr>HTTP</abbr> has a solution to this, too. When you request data for the first time, the server can send back a <code>Last-Modified</code> header. This is exactly what it sounds like: the date that the data was last changed. That background image referenced from <code>diveintomark.org</code> included a <code>Last-Modified</code> header.
+
+<pre class=nd><code>HTTP/1.1 200 OK
+Date: Sun, 31 May 2009 17:14:04 GMT
+Server: Apache
+<mark>Last-Modified: Fri, 22 Aug 2008 04:28:16 GMT</mark>
+ETag: "3075-ddc8d800"
+Accept-Ranges: bytes
+Content-Length: 12405
+Cache-Control: max-age=31536000, public
+Expires: Mon, 31 May 2010 17:14:04 GMT
+Connection: close
+Content-Type: image/jpeg
+</code></pre>
+
+<p>When you request the same data a second (or third or fourth) time, you can send an <code>If-Modified-Since</code> header with your request, with the date you got back from the server last time. If the data has changed since then, then the server ignores the <code>If-Modified-Since</code> header and just gives you the new data with a <code>200</code> status code. But if the data <em>hasn&#8217;t</em> changed since then, the server sends back a special <abbr>HTTP</abbr> <code>304</code> status code, which means &#8220;this data hasn&#8217;t changed since the last time you asked for it.&#8221; You can test this on the command line, using <a href=http://curl.haxx.se/>curl</a>:
+
+<pre class='nd screen'>
+<samp class=p>you@localhost:~$ </samp><kbd>curl -I <mark>-H "If-Modified-Since: Fri, 22 Aug 2008 04:28:16 GMT"</mark> http://wearehugh.com/m.jpg</kbd>
+<samp>HTTP/1.1 304 Not Modified
+Date: Sun, 31 May 2009 18:04:39 GMT
+Server: Apache
+Connection: close
+ETag: "3075-ddc8d800"
+Expires: Mon, 31 May 2010 18:04:39 GMT
+Cache-Control: max-age=31536000, public</samp></pre>
+
+<p>Why is this an improvement?  Because when the server sends a <code>304</code>, <em>it doesn&#8217;t re-send the data</em>. All you get is the status code. Even after your cached copy has expired, last-modified checking ensures that you won&#8217;t download the same data twice if it hasn&#8217;t changed. (As an extra bonus, this <code>304</code> response also includes caching headers. Proxies will keep a copy of data even after it officially &#8220;expires,&#8221; in the hopes that the data hasn&#8217;t <em>really</em> changed and the next request responds with a <code>304</code> status code and updated cache information.)
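+
+<p>The same conditional request can be sent by hand from Python. The sketch below is one way to do it with <code>urllib.request</code>, under the assumption that you saved the <code>Last-Modified</code> value from an earlier response yourself; the function names <code>build_conditional_request()</code> and <code>fetch_if_modified()</code> are invented for this example. Note that <code>urlopen()</code> reports a <code>304</code> by raising <code>urllib.error.HTTPError</code>, so the &#8220;nothing changed&#8221; case has to be caught as an exception.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified):
    """Build (but do not send) a GET request carrying If-Modified-Since."""
    request = urllib.request.Request(url)
    request.add_header('If-Modified-Since', last_modified)
    return request


def fetch_if_modified(url, last_modified):
    """Return the new body as bytes, or None if the server answers 304."""
    request = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not modified: the copy you already have is still good.
            return None
        raise
```

+<p>Calling <code>fetch_if_modified('http://wearehugh.com/m.jpg', 'Fri, 22 Aug 2008 04:28:16 GMT')</code> would make the same request as the <kbd>curl</kbd> command above (with <code>GET</code> instead of <code>HEAD</code>). Keeping track of the date between requests is still entirely up to you.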
+
+<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support last-modified date checking, but <code>httplib2</code> does.
+
+<h3 id=etags>ETag Checking</h3>
+
+<p>ETags are an alternate way to accomplish the same thing as the <a href=#last-modified>last-modified checking</a>. With ETags, the server sends a hash code in an <code>ETag</code> header along with the data you requested. (Exactly how this hash is determined is entirely up to the server. The only requirement is that it changes when the data changes.) That background image referenced from <code>diveintomark.org</code> had an <code>ETag</code> header.
+
+<pre class=nd><code>HTTP/1.1 200 OK
+Date: Sun, 31 May 2009 17:14:04 GMT
+Server: Apache
+Last-Modified: Fri, 22 Aug 2008 04:28:16 GMT
+<mark>ETag: "3075-ddc8d800"</mark>
+Accept-Ranges: bytes
+Content-Length: 12405
+Cache-Control: max-age=31536000, public
+Expires: Mon, 31 May 2010 17:14:04 GMT
+Connection: close
+Content-Type: image/jpeg
+</code></pre>
+
+<aside><code>ETag</code> means &#8220;there&#8217;s nothing new under the sun.&#8221;</aside>
+
+<p>The second time you request the same data, you include the ETag hash in an <code>If-None-Match</code> header of your request. If the data hasn&#8217;t changed, the server will send you back a <code>304</code> status code. As with the last-modified date checking, the server sends back <em>only</em> the <code>304</code> status code; it doesn&#8217;t send you the same data a second time. By including the ETag hash in your second request, you&#8217;re telling the server that there&#8217;s no need to re-send the same data if it still matches this hash, since <a href=#caching>you still have the data from the last time</a>.
+
+<p>Again with the <kbd>curl</kbd>:
+
+<pre class='nd screen'>
+<a><samp class=p>you@localhost:~$ </samp><kbd>curl -I <mark>-H "If-None-Match: \"3075-ddc8d800\""</mark> http://wearehugh.com/m.jpg</kbd>  <span class=u>&#x2460;</span></a>
+<samp>HTTP/1.1 304 Not Modified
+Date: Sun, 31 May 2009 18:04:39 GMT
+Server: Apache
+Connection: close
+ETag: "3075-ddc8d800"
+Expires: Mon, 31 May 2010 18:04:39 GMT
+Cache-Control: max-age=31536000, public</samp></pre>
+<ol>
+<li>ETags are commonly enclosed in quotation marks, but <em>the quotation marks are part of the value</em>. That means you need to send the quotation marks back to the server in the <code>If-None-Match</code> header.
+</ol>
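+
+<p>To see why the quotation marks matter, here is a simplified sketch of the comparison a server might make. The function <code>status_for()</code> is hypothetical and handles only a single ETag; real servers also accept lists of ETags and the <code>*</code> wildcard in <code>If-None-Match</code>.

```python
def status_for(request_headers, current_etag):
    """Return 304 if the client's If-None-Match matches the ETag, else 200."""
    if request_headers.get('If-None-Match') == current_etag:
        return 304
    return 200


# The quotation marks are part of the ETag value itself.
current_etag = '"3075-ddc8d800"'

print(status_for({'If-None-Match': '"3075-ddc8d800"'}, current_etag))
print(status_for({'If-None-Match': '3075-ddc8d800'}, current_etag))
print(status_for({}, current_etag))
```

+<p>Only the first request, which sends the ETag back with its quotation marks intact, gets a <code>304</code>. The other two fall through to a full <code>200</code> response.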
+
+<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support ETags, but <code>httplib2</code> does.
+
+<h3 id=compression>Compression</h3>
+
+<p>When you talk about <abbr>HTTP</abbr> web services, you&#8217;re almost always talking about moving text-based data back and forth over the wire. Maybe it&#8217;s <abbr>XML</abbr>, maybe it&#8217;s <abbr>JSON</abbr>, maybe it&#8217;s just <a href=strings.html#boring-stuff title='there ain&#8217;t no such thing as plain text'>plain text</a>. Regardless of the format, text compresses well. The example feed in <a href=xml.html>the XML chapter</a> is 3070 bytes uncompressed, but would be 941 bytes after gzip compression. That&#8217;s just 31% of the original size!
+
+<p><abbr>HTTP</abbr> supports <a href=http://www.iana.org/assignments/http-parameters>several compression algorithms</a>. The two most common types are <a href=http://www.ietf.org/rfc/rfc1952.txt>gzip</a> and <a href=http://www.ietf.org/rfc/rfc1951.txt>deflate</a>. When you request a resource over <abbr>HTTP</abbr>, you can ask the server to send it in compressed format. You include an <code>Accept-encoding</code> header in your request that lists which compression algorithms you support. If the server supports any of the same algorithms, it will send you back compressed data (with a <code>Content-encoding</code> header that tells you which algorithm it used). Then it&#8217;s up to you to decompress the data.
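+
+<p>The exchange can be simulated locally with Python&#8217;s <code>gzip</code> module. This is only a sketch of both halves of the conversation: the payload is made-up repetitive text, the two header dictionaries stand in for a real request and response, and no network access is involved.

```python
import gzip

# Hypothetical, repetitive text payload standing in for a feed.
line = '{"title": "dive into mark", "updated": "2009-03-27T21:56:07Z"}\n'
body = (line * 50).encode('utf-8')

# Client side: announce which compression algorithms you can handle.
request_headers = {'Accept-Encoding': 'gzip'}

# Server side: compress the body and say which algorithm was used.
compressed = gzip.compress(body)
response_headers = {'Content-Encoding': 'gzip'}

# Client side again: decompressing is your job.
if response_headers.get('Content-Encoding') == 'gzip':
    received = gzip.decompress(compressed)
else:
    received = compressed

print(len(body), 'bytes uncompressed,', len(compressed), 'bytes gzipped')
```

+<p>The exact savings depend on the data; highly repetitive text like this sample compresses far better than a typical feed would.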
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>Important tip for server-side developers: make sure that the compressed version of a resource has a different <a href=#etags>ETag</a> than the uncompressed version. Otherwise, caching proxies will get confused and may serve the compressed version to clients that can&#8217;t handle it. Read the discussion of <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=39727">Apache bug 39727</a> for more details on this subtle issue.
+</blockquote>
+
+<p>Python&#8217;s <abbr>HTTP</abbr> libraries do not support compression, but <code>httplib2</code> does.
+
+<h3 id=redirects>Redirects</h3>
+
+<p><a href=http://www.w3.org/Provider/Style/URI>Cool <abbr>URI</abbr>s don&#8217;t change</a>, but many <abbr>URI</abbr>s are seriously uncool. Web sites get reorganized, pages move to new addresses. Even web services can reorganize. A syndicated feed at <code>http://example.com/index.xml</code> might be moved to <code>http://example.com/xml/atom.xml</code>. Or an entire domain might move, as an organization expands and reorganizes; <code>http://www.example.com/index.xml</code> becomes <code>http://server-farm-1.example.com/index.xml</code>.
+
+<aside><code>Location</code> means &#8220;look over there!&#8221;</aside>
+
+<p>Every time you request any kind of resource from an <abbr>HTTP</abbr> server, the server includes a status code in its response. Status code <code>200</code> means &#8220;everything&#8217;s normal, here&#8217;s the page you asked for&#8221;. Status code <code>404</code> means &#8220;page not found&#8221;. (You&#8217;ve probably seen 404 errors while browsing the web.) Status codes in the 300&#8217;s indicate some form of redirection.
+
+<p><abbr>HTTP</abbr> has several different ways of signifying that a resource has moved. The two most common techniques are status codes <code>302</code> and <code>301</code>. Status code <code>302</code> is a <i>temporary redirect</i>; it means &#8220;oops, that got moved over here temporarily&#8221; (and then gives the temporary address in a <code>Location</code> header). Status code <code>301</code> is a <i>permanent redirect</i>; it means &#8220;oops, that got moved permanently&#8221; (and then gives the new address in a <code>Location</code> header). If you get a <code>302</code> status code and a new address, the <abbr>HTTP</abbr> specification says you should use the new address to get what you asked for, but the next time you want to access the same resource, you should retry the old address. But if you get a <code>301</code> status code and a new address, you&#8217;re supposed to use the new address from then on.
+
+<p>The <code>urllib.request</code> module automatically &#8220;follows&#8221; redirects when it receives the appropriate status code from the <abbr>HTTP</abbr> server, but it doesn&#8217;t tell you that it did so. You&#8217;ll end up getting the data you asked for, but you&#8217;ll never know that the underlying library &#8220;helpfully&#8221; followed a redirect for you. So you&#8217;ll continue pounding away at the old address, and each time you&#8217;ll get redirected to the new address, and each time the <code>urllib.request</code> module will &#8220;helpfully&#8221; follow the redirect. In other words, it treats permanent redirects the same as temporary redirects. That means two round trips instead of one, which is bad for the server and bad for you.
+
+<p><code>httplib2</code> handles permanent redirects for you. Not only will it tell you that a permanent redirect occurred, it will keep track of them locally and automatically rewrite redirected <abbr>URL</abbr>s before requesting them.
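+
+<p>The bookkeeping involved in remembering permanent redirects can be sketched in a few lines. This illustrates the idea only and is not how <code>httplib2</code> is actually implemented; the class <code>RedirectMemory</code> and its methods are invented for this example.

```python
class RedirectMemory:
    """Remember permanent (301) redirects and rewrite URLs before requesting."""

    def __init__(self):
        self._permanent = {}

    def record(self, status, requested_url, location):
        # Only a 301 means "use the new address from now on".
        # A 302 is temporary, so the old address is retried next time.
        if status == 301:
            self._permanent[requested_url] = location

    def resolve(self, url):
        # Follow remembered redirects, guarding against loops.
        seen = set()
        while url in self._permanent and url not in seen:
            seen.add(url)
            url = self._permanent[url]
        return url


memory = RedirectMemory()
memory.record(301, 'http://example.com/index.xml', 'http://example.com/xml/atom.xml')
memory.record(302, 'http://example.com/other.xml', 'http://example.com/tmp/other.xml')

print(memory.resolve('http://example.com/index.xml'))
print(memory.resolve('http://example.com/other.xml'))
```

+<p>The first address is rewritten to its new location without touching the network, which saves a round trip on every later request. The second was only a temporary redirect, so it is left alone.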
+
+<p class=a>&#x2042;
+
+<h2 id=dont-try-this-at-home>How Not To Fetch Data Over HTTP</h2>
+
+<p>Let&#8217;s say you want to download a resource over <abbr>HTTP</abbr>, such as <a href=xml.html>an Atom feed</a>. Since it&#8217;s a feed, you&#8217;re not just going to download it once; you&#8217;re going to download it over and over again. (Most feed readers will check for changes once an hour.) Let&#8217;s do it the quick-and-dirty way first, and then see how you can do better.
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>import urllib.request</kbd>
+<samp class=p>>>> </samp><kbd class=pp>a_url = 'http://diveintopython3.org/examples/feed.xml'</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>data = urllib.request.urlopen(a_url).read()</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>type(data)</kbd>                                   <span class=u>&#x2461;</span></a>
+<samp class=pp>&lt;class 'bytes'></samp>
+<samp class=p>>>> </samp><kbd class=pp>print(data)</kbd>
+<samp class=pp>&lt;?xml version='1.0' encoding='utf-8'?>
+&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
+  &lt;title>dive into mark&lt;/title>
+  &lt;subtitle>currently between addictions&lt;/subtitle>
+  &lt;id>tag:diveintomark.org,2001-07-29:/&lt;/id>
+  &lt;updated>2009-03-27T21:56:07Z&lt;/updated>
+  &lt;link rel='alternate' type='text/html' href='http://diveintomark.org/'/>
+  &hellip;
+</samp></pre>
+<ol>
+<li>Downloading anything over <abbr>HTTP</abbr> is incredibly easy in Python; in fact, it&#8217;s a one-liner. The <code>urllib.request</code> module has a handy <code>urlopen()</code> function that takes the address of the page you want, and returns a file-like object that you can just <code>read()</code> from to get the full contents of the page. It just can&#8217;t get any easier.
+<li>The <code>urlopen().read()</code> method always returns <a href=strings.html#byte-arrays>a <code>bytes</code> object, not a string</a>. Remember, bytes are bytes; characters are an abstraction. <abbr>HTTP</abbr> servers don&#8217;t deal in abstractions. If you request a resource, you get bytes. If you want it as a string, you&#8217;ll need to <a href=http://feedparser.org/docs/character-encoding.html>determine the character encoding</a> and explicitly convert it to a string.
+</ol>
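+
+<p>Converting those bytes to a string might look something like the sketch below. It assumes the server declares a charset in its <code>Content-Type</code> header and falls back to <abbr>UTF-8</abbr> otherwise; that fallback is a guess, not a rule, and an <abbr>XML</abbr> document may also declare its own encoding. The helper <code>decode_body()</code> is hypothetical. The <code>headers</code> attribute of the object returned by <code>urlopen()</code> is a subclass of <code>email.message.Message</code>, so a plain <code>Message</code> stands in for it here.

```python
from email.message import Message


def decode_body(data, headers, fallback='utf-8'):
    """Decode bytes using the charset named in Content-Type, if there is one."""
    charset = headers.get_content_charset() or fallback
    return data.decode(charset)


# Stand-ins for the headers of two different responses.
latin1_headers = Message()
latin1_headers['Content-Type'] = 'text/plain; charset=iso-8859-1'

bare_headers = Message()
bare_headers['Content-Type'] = 'application/xml'

print(decode_body(b'caf\xe9', latin1_headers))
print(decode_body(b'caf\xc3\xa9', bare_headers))
```

+<p>Both calls decode to the same four-character string, even though the bytes on the wire were different.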
+
+<p>So what&#8217;s wrong with this? For a quick one-off during testing or development, there&#8217;s nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis (<i>e.g.</i> requesting this feed once an hour), then you&#8217;re being inefficient, and you&#8217;re being rude.
+
+<p class=a>&#x2042;
+
+<h2 id=whats-on-the-wire>What&#8217;s On The Wire?</h2>
+
+<p>To see why this is inefficient and rude, let&#8217;s turn on the debugging features of Python&#8217;s <abbr>HTTP</abbr> library and see what&#8217;s being sent &#8220;on the wire&#8221; (<i>i.e.</i> over the network).
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>from http.client import HTTPConnection</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>HTTPConnection.debuglevel = 1</kbd>                                       <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>from urllib.request import urlopen</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>response = urlopen('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2461;</span></a>
+<samp><a>send: b'GET /examples/feed.xml HTTP/1.1                                 <span class=u>&#x2462;</span></a>
+<a>Host: diveintopython3.org                                               <span class=u>&#x2463;</span></a>
+<a>Accept-Encoding: identity                                               <span class=u>&#x2464;</span></a>
+<a>User-Agent: Python-urllib/3.1'                                          <span class=u>&#x2465;</span></a>
+Connection: close
+reply: 'HTTP/1.1 200 OK'
+&hellip;further debugging information omitted&hellip;</samp></pre>
+<ol>
+<li>As I mentioned at the beginning of the chapter, <code>urllib.request</code> relies on another standard Python library, <code>http.client</code>. Normally you don&#8217;t need to touch <code>http.client</code> directly. (The <code>urllib.request</code> module imports it automatically.) But we import it here so we can toggle the debugging flag on the <code>HTTPConnection</code> class that <code>urllib.request</code> uses to connect to the <abbr>HTTP</abbr> server.
+<li>Now that the debugging flag is set, information on the <abbr>HTTP</abbr> request and response is printed out in real time. As you can see, when you request the Atom feed, the <code>urllib.request</code> module sends five lines to the server.
+<li>The first line specifies the <abbr>HTTP</abbr> verb you&#8217;re using, and the path of the resource (minus the domain name).
+<li>The second line specifies the domain name from which we&#8217;re requesting this feed.
+<li>The third line specifies the compression algorithms that the client supports. As I mentioned earlier, <a href=#compression><code>urllib.request</code> does not support compression</a> by default.
+<li>The fourth line specifies the name of the library that is making the request. By default, this is <code>Python-urllib</code> plus a version number. Both <code>urllib.request</code> and <code>httplib2</code> support changing the user agent, simply by adding a <code>User-Agent</code> header to the request (which will override the default value).
+</ol>
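+
+<p>Overriding the default user agent in <code>urllib.request</code> could look like this. The product token <code>my-feed-reader/1.0</code> is made up for illustration, and the request is only built, not sent.

```python
import urllib.request

# 'my-feed-reader/1.0' is a made-up product token for illustration.
request = urllib.request.Request(
    'http://diveintopython3.org/examples/feed.xml',
    headers={'User-Agent': 'my-feed-reader/1.0'},
)

# Request normalizes header names with str.capitalize(), hence 'User-agent'.
print(request.get_header('User-agent'))
```

+<p>Passing this <code>Request</code> object to <code>urlopen()</code> in place of the bare <abbr>URL</abbr> sends your header in place of the default <code>Python-urllib</code> one.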
+
+<aside>We&#8217;re downloading 3070 bytes when we could have just downloaded 941.</aside>
+
+<p>Now let&#8217;s look at what the server sent back in its response.
+
+<pre class=screen>
+# continued from previous example
+<a><samp class=p>>>> </samp><kbd class=pp>print(response.headers.as_string())</kbd>        <span class=u>&#x2460;</span></a>
+<samp><a>Date: Sun, 31 May 2009 19:23:06 GMT            <span class=u>&#x2461;</span></a>
+Server: Apache
+<a>Last-Modified: Sun, 31 May 2009 06:39:55 GMT   <span class=u>&#x2462;</span></a>
+<a>ETag: "bfe-93d9c4c0"                           <span class=u>&#x2463;</span></a>
+Accept-Ranges: bytes
+<a>Content-Length: 3070                           <span class=u>&#x2464;</span></a>
+<a>Cache-Control: max-age=86400                   <span class=u>&#x2465;</span></a>
+Expires: Mon, 01 Jun 2009 19:23:06 GMT
+Vary: Accept-Encoding
+Connection: close
+Content-Type: application/xml</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>data = response.read()</kbd>                     <span class=u>&#x2466;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>len(data)</kbd>
+<samp class=pp>3070</samp></pre>
+<ol>
+<li>The <var>response</var> returned from the <code>urllib.request.urlopen()</code> function contains all the <abbr>HTTP</abbr> headers the server sent back. It also contains methods to download the actual data; we&#8217;ll get to that in a minute.
+<li>The server tells you when it handled your request.
+<li>This response includes a <a href=#last-modified><code>Last-Modified</code></a> header.
+<li>This response includes an <a href=#etags><code>ETag</code></a> header.
+<li>The data is 3070 bytes long. Notice what <em>isn&#8217;t</em> here: a <code>Content-encoding</code> header. Your request stated that you only accept uncompressed data (<code>Accept-encoding: identity</code>), and sure enough, this response contains uncompressed data.
+<li>This response includes caching headers that state that this feed can be cached for up to 24 hours (86400 seconds).
+<li>And finally, download the actual data by calling <code>response.read()</code>. As you can tell from the <code>len()</code> function, this downloads all 3070 bytes at once.
+</ol>
+
+<p>As you can see, this code is already inefficient: it asked for (and received) uncompressed data. I know for a fact that this server supports <a href=#compression>gzip compression</a>, but <abbr>HTTP</abbr> compression is opt-in. We didn&#8217;t ask for it, so we didn&#8217;t get it. That means we&#8217;re downloading 3070 bytes when we could have just downloaded 941. Bad dog, no biscuit.
+
+<p>But wait, it gets worse! To see just how inefficient this code is, let&#8217;s request the same feed a second time.
+
+<pre class='nd screen'>
+# continued from the <a href=#whats-on-the-wire>previous example</a>
+<samp class=p>>>> </samp><kbd class=pp>response2 = urlopen('http://diveintopython3.org/examples/feed.xml')</kbd>
+<samp>send: b'GET /examples/feed.xml HTTP/1.1
+Host: diveintopython3.org
+Accept-Encoding: identity
+User-Agent: Python-urllib/3.1'
+Connection: close
+reply: 'HTTP/1.1 200 OK'
+&hellip;further debugging information omitted&hellip;</samp></pre>
+
+<p>Notice anything peculiar about this request? It hasn&#8217;t changed! It&#8217;s exactly the same as the first request. No sign of <a href=#last-modified><code>If-Modified-Since</code> headers</a>. No sign of <a href=#etags><code>If-None-Match</code> headers</a>. No respect for the caching headers. Still no compression.
+
+<p>And what happens when you do the same thing twice? You get the same response. Twice.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>print(response2.headers.as_string())</kbd>     <span class=u>&#x2460;</span></a>
+<samp>Date: Mon, 01 Jun 2009 03:58:00 GMT
+Server: Apache
+Last-Modified: Sun, 31 May 2009 22:51:11 GMT
+ETag: "bfe-255ef5c0"
+Accept-Ranges: bytes
+Content-Length: 3070
+Cache-Control: max-age=86400
+Expires: Tue, 02 Jun 2009 03:58:00 GMT
+Vary: Accept-Encoding
+Connection: close
+Content-Type: application/xml</samp>
+<samp class=p>>>> </samp><kbd class=pp>data2 = response2.read()</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>len(data2)</kbd>                               <span class=u>&#x2461;</span></a>
+<samp class=pp>3070</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>data2 == data</kbd>                            <span class=u>&#x2462;</span></a>
+<samp class=pp>True</samp></pre>
+<ol>
+<li>The server is still sending the same array of &#8220;smart&#8221; headers: <code>Cache-Control</code> and <code>Expires</code> to allow caching, <code>Last-Modified</code> and <code>ETag</code> to enable &#8220;not-modified&#8221; tracking. Even the <code>Vary: Accept-Encoding</code> header hints that the server would support compression, if only you would ask for it. But you didn&#8217;t.
+<li>Once again, fetching this data downloads the whole 3070 bytes&hellip;
+<li>&hellip;the exact same 3070 bytes you downloaded last time.
+</ol>
+
+<p><abbr>HTTP</abbr> is designed to work better than this. <code>urllib</code> speaks <abbr>HTTP</abbr> like I speak Spanish&nbsp;&mdash;&nbsp;enough to get by in a jam, but not enough to hold a conversation. <abbr>HTTP</abbr> is a conversation. It&#8217;s time to upgrade to a library that speaks <abbr>HTTP</abbr> fluently.
+
+<p class=a>&#x2042;
+
+<h2 id=introducing-httplib2>Introducing <code>httplib2</code></h2>
+
+<p>Before you can use <code>httplib2</code>, you&#8217;ll need to install it. Visit <a href=http://code.google.com/p/httplib2/><code>code.google.com/p/httplib2/</code></a> and download the latest version. <code>httplib2</code> is available for Python 2.x and Python 3.x; make sure you get the Python 3 version, named something like <code>httplib2-python3-0.5.0.zip</code>.
+
+<p>Unzip the archive, open a terminal window, and go to the newly created <code>httplib2</code> directory. On Windows, open the <code>Start</code> menu, select <code>Run...</code>, type <kbd>cmd.exe</kbd> and press <kbd>ENTER</kbd>.
+
+<pre class=screen>
+<samp class=p>c:\Users\pilgrim\Downloads> </samp><kbd><mark>dir</mark></kbd>
+<samp> Volume in drive C has no label.
+ Volume Serial Number is DED5-B4F8
+
+ Directory of c:\Users\pilgrim\Downloads
+
+07/28/2009  12:36 PM    &lt;DIR>          .
+07/28/2009  12:36 PM    &lt;DIR>          ..
+07/28/2009  12:36 PM    &lt;DIR>          httplib2-python3-0.5.0
+07/28/2009  12:33 PM            18,997 httplib2-python3-0.5.0.zip
+               1 File(s)         18,997 bytes
+               3 Dir(s)  61,496,684,544 bytes free</samp>
+
+<samp class=p>c:\Users\pilgrim\Downloads> </samp><kbd><mark>cd httplib2-python3-0.5.0</mark></kbd>
+<samp class=p>c:\Users\pilgrim\Downloads\httplib2-python3-0.5.0> </samp><kbd><mark>c:\python31\python.exe setup.py install</mark></kbd>
+<samp>running install
+running build
+running build_py
+running install_lib
+creating c:\python31\Lib\site-packages\httplib2
+copying build\lib\httplib2\iri2uri.py -> c:\python31\Lib\site-packages\httplib2
+copying build\lib\httplib2\__init__.py -> c:\python31\Lib\site-packages\httplib2
+byte-compiling c:\python31\Lib\site-packages\httplib2\iri2uri.py to iri2uri.pyc
+byte-compiling c:\python31\Lib\site-packages\httplib2\__init__.py to __init__.pyc
+running install_egg_info
+Writing c:\python31\Lib\site-packages\httplib2-python3_0.5.0-py3.1.egg-info</samp></pre>
+
+<p>On Mac OS X, run the <code>Terminal.app</code> application in your <code>/Applications/Utilities/</code> folder. On Linux, run the <code>Terminal</code> application, which is usually in your <code>Applications</code> menu under <code>Accessories</code> or <code>System</code>.
+
+<pre class=screen>
+<samp class=p>you@localhost:~/Desktop$ </samp><kbd><mark>unzip httplib2-python3-0.5.0.zip</mark></kbd>
+<samp>Archive:  httplib2-python3-0.5.0.zip
+  inflating: httplib2-python3-0.5.0/README
+  inflating: httplib2-python3-0.5.0/setup.py
+  inflating: httplib2-python3-0.5.0/PKG-INFO
+  inflating: httplib2-python3-0.5.0/httplib2/__init__.py
+  inflating: httplib2-python3-0.5.0/httplib2/iri2uri.py</samp>
+<samp class=p>you@localhost:~/Desktop$ </samp><kbd><mark>cd httplib2-python3-0.5.0/</mark></kbd>
+<samp class=p>you@localhost:~/Desktop/httplib2-python3-0.5.0$ </samp><kbd><mark>sudo python3 setup.py install</mark></kbd>
+<samp>running install
+running build
+running build_py
+creating build
+creating build/lib.linux-x86_64-3.1
+creating build/lib.linux-x86_64-3.1/httplib2
+copying httplib2/iri2uri.py -> build/lib.linux-x86_64-3.1/httplib2
+copying httplib2/__init__.py -> build/lib.linux-x86_64-3.1/httplib2
+running install_lib
+creating /usr/local/lib/python3.1/dist-packages/httplib2
+copying build/lib.linux-x86_64-3.1/httplib2/iri2uri.py -> /usr/local/lib/python3.1/dist-packages/httplib2
+copying build/lib.linux-x86_64-3.1/httplib2/__init__.py -> /usr/local/lib/python3.1/dist-packages/httplib2
+byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/iri2uri.py to iri2uri.pyc
+byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/__init__.py to __init__.pyc
+running install_egg_info
+Writing /usr/local/lib/python3.1/dist-packages/httplib2-python3_0.5.0.egg-info</samp></pre>
+
+<p>To use <code>httplib2</code>, create an instance of the <code>httplib2.Http</code> class.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>                                                    <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>response.status</kbd>                                                                <span class=u>&#x2462;</span></a>
+<samp class=pp>200</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>content[:52]</kbd>                                                                   <span class=u>&#x2463;</span></a>
+<samp class=pp>b"&lt;?xml version='1.0' encoding='utf-8'?>\r\n&lt;feed xmlns="</samp>
+<samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>
+<samp class=pp>3070</samp></pre>
+<ol>
+<li>The primary interface to <code>httplib2</code> is the <code>Http</code> object. For reasons you&#8217;ll see in the next section, you should always pass a directory name when you create an <code>Http</code> object. The directory does not need to exist; <code>httplib2</code> will create it if necessary.
+<li>Once you have an <code>Http</code> object, retrieving data is as simple as calling the <code>request()</code> method with the address of the data you want. This will issue an <abbr>HTTP</abbr> <code>GET</code> request for that <abbr>URL</abbr>. (Later in this chapter, you&#8217;ll see how to issue other <abbr>HTTP</abbr> requests, like <code>POST</code>.)
+<li>The <code>request()</code> method returns two values. The first is an <code>httplib2.Response</code> object, which contains all the <abbr>HTTP</abbr> headers the server returned. For example, a <code>status</code> code of <code>200</code> indicates that the request was successful.
+<li>The <var>content</var> variable contains the actual data that was returned by the <abbr>HTTP</abbr> server. The data is returned as <a href=strings.html#byte-arrays>a <code>bytes</code> object, not a string</a>. If you want it as a string, you&#8217;ll need to <a href=http://feedparser.org/docs/character-encoding.html>determine the character encoding</a> and convert it yourself.
+</ol>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>You probably only need one <code>httplib2.Http</code> object. There are valid reasons for creating more than one, but you should only do so if you know why you need them. &#8220;I need to request data from two different <abbr>URL</abbr>s&#8221; is not a valid reason. Re-use the <code>Http</code> object and just call the <code>request()</code> method twice.
+</blockquote>
+
+<h3 id=why-bytes>A Short Digression To Explain Why <code>httplib2</code> Returns Bytes Instead of Strings</h3>
+
+<p>Bytes. Strings. What a pain. Why can&#8217;t <code>httplib2</code> &#8220;just&#8221; do the conversion for you? Well, it&#8217;s complicated, because the rules for determining the character encoding are specific to what kind of resource you&#8217;re requesting. How could <code>httplib2</code> know what kind of resource you&#8217;re requesting? It&#8217;s usually listed in the <code>Content-Type</code> <abbr>HTTP</abbr> header, but that&#8217;s an optional feature of <abbr>HTTP</abbr> and not all <abbr>HTTP</abbr> servers include it. If that header is not included in the <abbr>HTTP</abbr> response, it&#8217;s left up to the client to guess. (This is commonly called &#8220;content sniffing,&#8221; and it&#8217;s never perfect.)
+
+<p>If you know what sort of resource you&#8217;re expecting (an <abbr>XML</abbr> document in this case), perhaps you could &#8220;just&#8221; pass the returned <code>bytes</code> object to the <a href=xml.html#xml-parse><code>xml.etree.ElementTree.parse()</code> function</a>. That&#8217;ll work as long as the <abbr>XML</abbr> document includes information on its own character encoding (as this one does), but that&#8217;s an optional feature and not all <abbr>XML</abbr> documents do that. If an <abbr>XML</abbr> document doesn&#8217;t include encoding information, the client is supposed to look at the enclosing transport&nbsp;&mdash;&nbsp;<i>i.e.</i> the <code>Content-Type</code> <abbr>HTTP</abbr> header, which can include a <code>charset</code> parameter.
+
+<p class=ss><a style=border:0 href=http://www.cafepress.com/feedparser><img src=http://feedparser.org/img/feedparser.jpg alt="[I support RFC 3023 t-shirt]" width=150 height=150></a>
+
+<p>But it&#8217;s worse than that. Now character encoding information can be in two places: within the <abbr>XML</abbr> document itself, and within the <code>Content-Type</code> <abbr>HTTP</abbr> header. If the information is in <em>both</em> places, which one wins? According to <a href=http://www.ietf.org/rfc/rfc3023.txt>RFC 3023</a> (I swear I am not making this up), if the media type given in the <code>Content-Type</code> <abbr>HTTP</abbr> header is <code>application/xml</code>, <code>application/xml-dtd</code>, <code>application/xml-external-parsed-entity</code>, or any one of the subtypes of <code>application/xml</code> such as <code>application/atom+xml</code> or <code>application/rss+xml</code> or even <code>application/rdf+xml</code>, then the encoding is
+
+<ol>
+<li>the encoding given in the <code>charset</code> parameter of the <code>Content-Type</code> <abbr>HTTP</abbr> header, or
+<li>the encoding given in the <code>encoding</code> attribute of the <abbr>XML</abbr> declaration within the document, or
+<li><abbr>UTF-8</abbr>
+</ol>
+
+<p>On the other hand, if the media type given in the <code>Content-Type</code> <abbr>HTTP</abbr> header is <code>text/xml</code>, <code>text/xml-external-parsed-entity</code>, or a subtype like <code>text/AnythingAtAll+xml</code>, then the encoding attribute of the <abbr>XML</abbr> declaration within the document is ignored completely, and the encoding is
+
+<ol>
+<li>the encoding given in the charset parameter of the <code>Content-Type</code> <abbr>HTTP</abbr> header, or
+<li><code>us-ascii</code>
+</ol>
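+
+<p>To make the two precedence lists concrete, here is a minimal sketch of them in pure Python. The function name <code>xml_encoding()</code> is made up for this illustration and is not part of <code>httplib2</code>. It assumes you already know the resource is <abbr>XML</abbr>, and it ignores byte order marks and other details of the <abbr>RFC</abbr>.

```python
import re

def xml_encoding(content_type, body):
    """Pick an encoding for an XML response (hypothetical helper).

    content_type: the raw Content-Type header value, or None if absent.
    body: the response body as a bytes object.
    """
    media_type, charset = '', None
    if content_type:
        parts = content_type.split(';')
        media_type = parts[0].strip().lower()
        for param in parts[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'charset':
                charset = value.strip().strip('"\'') or None

    # the encoding attribute of the XML declaration, if there is one
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', body)
    declared = match.group(1).decode('ascii') if match else None

    is_text_xml = media_type.startswith('text/') and (
        media_type in ('text/xml', 'text/xml-external-parsed-entity')
        or media_type.endswith('+xml'))
    if is_text_xml:
        # text/* types: the XML declaration is ignored completely
        return charset or 'us-ascii'
    # application/* types: header first, then the declaration, then UTF-8
    return charset or declared or 'utf-8'
```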
+
+<p>And that&#8217;s just for <abbr>XML</abbr> documents. For <abbr>HTML</abbr> documents, web browsers have constructed such <a type=application/pdf href=http://www.adambarth.com/papers/2009/barth-caballero-song.pdf>byzantine rules for content-sniffing</a> [<abbr>PDF</abbr>] that <a href='http://www.google.com/search?q=barth+content-type+processing+model'>we&#8217;re still trying to figure them all out</a>.
+
+<p>&#8220;<a href=http://code.google.com/p/httplib2/source/checkout>Patches welcome</a>.&#8221;
+
+<h3 id=httplib2-caching>How <code>httplib2</code> Handles Caching</h3>
+
+<p>Remember in the previous section when I said you should always create an <code>httplib2.Http</code> object with a directory name? Caching is the reason.
+
+<pre class=screen>
+# continued from the <a href=#introducing-httplib2>previous example</a>
+<a><samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>response2.status</kbd>                                                                 <span class=u>&#x2461;</span></a>
+<samp class=pp>200</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>content2[:52]</kbd>                                                                    <span class=u>&#x2462;</span></a>
+<samp class=pp>b"&lt;?xml version='1.0' encoding='utf-8'?>\r\n&lt;feed xmlns="</samp>
+<samp class=p>>>> </samp><kbd class=pp>len(content2)</kbd>
+<samp class=pp>3070</samp></pre>
+<ol>
+<li>This shouldn&#8217;t be terribly surprising. It&#8217;s the same thing you did last time, except you&#8217;re putting the result into two new variables.
+<li>The <abbr>HTTP</abbr> <code>status</code> is once again <code>200</code>, just like last time.
+<li>The downloaded content is the same as last time, too.
+</ol>
+
+<p>So&hellip; who cares? Quit your Python interactive shell and relaunch it with a new session, and I&#8217;ll show you.
+
+<pre class=screen>
+# NOT continued from previous example!
+# Please exit out of the interactive shell
+# and launch a new one.
+<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>                                                        <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>                                                    <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd>  <span class=u>&#x2462;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>                                                                   <span class=u>&#x2463;</span></a>
+<samp class=pp>3070</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.status</kbd>                                                                <span class=u>&#x2464;</span></a>
+<samp class=pp>200</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.fromcache</kbd>                                                             <span class=u>&#x2465;</span></a>
+<samp class=pp>True</samp></pre>
+<ol>
+<li>Let&#8217;s turn on debugging and see <a href=#whats-on-the-wire>what&#8217;s on the wire</a>. This is the <code>httplib2</code> equivalent of turning on debugging in <code>http.client</code>. <code>httplib2</code> will print all the data being sent to the server and some key information being sent back.
+<li>Create an <code>httplib2.Http</code> object with the same directory name as before.
+<li>Request the same <abbr>URL</abbr> as before. <em>Nothing appears to happen.</em> More precisely, nothing gets sent to the server, and nothing gets returned from the server. There is absolutely no network activity whatsoever.
+<li>Yet we did &#8220;receive&#8221; some data&nbsp;&mdash;&nbsp;in fact, we received all of it.
+<li>We also &#8220;received&#8221; an <abbr>HTTP</abbr> status code indicating that the &#8220;request&#8221; was successful.
+<li>Here&#8217;s the rub: this &#8220;response&#8221; was generated from <code>httplib2</code>&#8217;s local cache. That directory name you passed in when you created the <code>httplib2.Http</code> object&nbsp;&mdash;&nbsp;that directory holds <code>httplib2</code>&#8217;s cache of all the operations it&#8217;s ever performed.
+</ol>
+
+<aside>What&#8217;s on the wire? Absolutely nothing.</aside>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>If you want to turn on <code>httplib2</code> debugging, you need to set a module-level constant (<code>httplib2.debuglevel</code>), then create a new <code>httplib2.Http</code> object. If you want to turn off debugging, you need to change the same module-level constant, then create a new <code>httplib2.Http</code> object.
+</blockquote>
+
+<p>You previously requested the data at this <abbr>URL</abbr>. That request was successful (<code>status: 200</code>). That response included not only the feed data, but also a set of <a href=#caching>caching headers</a> that told anyone who was listening that they could cache this resource for up to 24 hours (<code>Cache-Control: max-age=86400</code>, which is 24 hours measured in seconds). <code>httplib2</code> understands and respects those caching headers, and it stored the previous response in the <code>.cache</code> directory (which you passed in when you created the <code>Http</code> object). That cache hasn&#8217;t expired yet, so the second time you request the data at this <abbr>URL</abbr>, <code>httplib2</code> simply returns the cached result without ever hitting the network.
+
+<p>I say &#8220;simply,&#8221; but obviously there is a lot of complexity hidden behind that simplicity. <code>httplib2</code> handles <abbr>HTTP</abbr> caching <em>automatically</em> and <em>by default</em>. If for some reason you need to know whether a response came from the cache, you can check <code>response.fromcache</code>. Otherwise, it Just Works.
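+
+<p>The freshness check itself is simple arithmetic. The following is only a sketch of the idea, not <code>httplib2</code>&#8217;s actual code: the <code>is_fresh()</code> function is hypothetical, it looks only at <code>max-age</code> and the <code>Date</code> header, and it ignores <code>Expires</code> and every other caching directive.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_fresh(headers, now):
    """Return True if a cached response can be reused without the network.

    headers: a dictionary of lowercase response headers from the cache.
    now: a timezone-aware datetime to compare against.
    """
    max_age = None
    for directive in headers.get('cache-control', '').split(','):
        name, _, value = directive.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            max_age = int(value)
    if max_age is None or 'date' not in headers:
        return False          # no freshness information, so ask the server
    age = now - parsedate_to_datetime(headers['date'])
    return age < timedelta(seconds=max_age)
```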
+
+<p id=bypass-the-cache>Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing <kbd>F5</kbd> refreshes the current page, but pressing <kbd>Ctrl+F5</kbd> bypasses the cache and re-requests the current page from the remote server. You might think &#8220;oh, I&#8217;ll just delete the data from my local cache, then request it again.&#8221; You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They&#8217;re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid.
+
+<p>Instead of manipulating your local cache and hoping for the best, you should use the features of <abbr>HTTP</abbr> to ensure that your request actually reaches the remote server.
+
+<pre class=screen>
+# continued from the previous example
+<samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml',</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    headers={'cache-control':'no-cache'})</kbd>  <span class=u>&#x2460;</span></a>
+<samp><a>connect: (diveintopython3.org, 80)             <span class=u>&#x2461;</span></a>
+send: b'GET /examples/feed.xml HTTP/1.1
+Host: diveintopython3.org
+user-agent: Python-httplib2/$Rev: 259 $
+accept-encoding: deflate, gzip
+cache-control: no-cache'
+reply: 'HTTP/1.1 200 OK'
+&hellip;further debugging information omitted&hellip;</samp>
+<samp class=p>>>> </samp><kbd class=pp>response2.status</kbd>
+<samp class=pp>200</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response2.fromcache</kbd>                        <span class=u>&#x2462;</span></a>
+<samp class=pp>False</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>print(dict(response2.items()))</kbd>             <span class=u>&#x2463;</span></a>
+<samp class=pp>{'status': '200',
+ 'content-length': '3070',
+ 'content-location': 'http://diveintopython3.org/examples/feed.xml',
+ 'accept-ranges': 'bytes',
+ 'expires': 'Wed, 03 Jun 2009 00:40:26 GMT',
+ 'vary': 'Accept-Encoding',
+ 'server': 'Apache',
+ 'last-modified': 'Sun, 31 May 2009 22:51:11 GMT',
+ 'connection': 'close',
+ '-content-encoding': 'gzip',
+ 'etag': '"bfe-255ef5c0"',
+ 'cache-control': 'max-age=86400',
+ 'date': 'Tue, 02 Jun 2009 00:40:26 GMT',
+ 'content-type': 'application/xml'}</samp></pre>
+<ol>
+<li><code>httplib2</code> allows you to add arbitrary <abbr>HTTP</abbr> headers to any outgoing request. In order to bypass <em>all</em> caches (not just your local disk cache, but also any caching proxies between you and the remote server), add a <code>cache-control</code> header with the value <code>no-cache</code> to the <var>headers</var> dictionary.
+<li>Now you see <code>httplib2</code> initiating a network request. <code>httplib2</code> understands and respects caching headers <em>in both directions</em>&nbsp;&mdash;&nbsp;as part of the incoming response <em>and as part of the outgoing request</em>. It noticed that you added the <code>no-cache</code> header, so it bypassed its local cache altogether and then had no choice but to hit the network to request the data.
+<li>This response was <em>not</em> generated from your local cache. You knew that, of course, because you saw the debugging information on the outgoing request. But it&#8217;s nice to have that programmatically verified.
+<li>The request succeeded; you downloaded the entire feed again from the remote server. Of course, the server also sent back a full complement of <abbr>HTTP</abbr> headers along with the feed data. That includes caching headers, which <code>httplib2</code> uses to update its local cache, in the hopes of avoiding network access the <em>next</em> time you request this feed. Everything about <abbr>HTTP</abbr> caching is designed to maximize cache hits and minimize network access. Even though you bypassed the cache this time, the remote server would really appreciate it if you would cache the result for next time.
+</ol>
+
+<h3 id=httplib2-etags>How <code>httplib2</code> Handles <code>Last-Modified</code> and <code>ETag</code> Headers</h3>
+
+<p>The <code>Cache-Control</code> and <code>Expires</code> <a href=#caching>caching headers</a> are called <i>freshness indicators</i>. They tell caches in no uncertain terms that you can completely avoid all network access until the cache expires. And that&#8217;s exactly the behavior you saw <a href=#httplib2-caching>in the previous section</a>: given a freshness indicator, <code>httplib2</code> <em>does not generate a single byte of network activity</em> to serve up cached data (unless you explicitly <a href=#bypass-the-cache>bypass the cache</a>, of course).
+
+<p>But what about the case where the data <em>might</em> have changed, but hasn&#8217;t? <abbr>HTTP</abbr> defines <a href=#last-modified><code>Last-Modified</code></a> and <a href=#etags><code>ETag</code></a> headers for this purpose. These headers are called <i>validators</i>. If the local cache is no longer fresh, a client can send the validators with the next request to see if the data has actually changed. If the data hasn&#8217;t changed, the server sends back a <code>304</code> status code <em>and no data</em>. So there&#8217;s still a round-trip over the network, but you end up downloading fewer bytes.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
+<samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>
+<samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/')</kbd>  <span class=u>&#x2460;</span></a>
+<samp>connect: (diveintopython3.org, 80)
+send: b'GET / HTTP/1.1
+Host: diveintopython3.org
+accept-encoding: deflate, gzip
+user-agent: Python-httplib2/$Rev: 259 $'
+reply: 'HTTP/1.1 200 OK'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>print(dict(response.items()))</kbd>                                 <span class=u>&#x2461;</span></a>
+<samp class=pp>{'-content-encoding': 'gzip',
+ 'accept-ranges': 'bytes',
+ 'connection': 'close',
+ 'content-length': '6657',
+ 'content-location': 'http://diveintopython3.org/',
+ 'content-type': 'text/html',
+ 'date': 'Tue, 02 Jun 2009 03:26:54 GMT',
+<mark> 'etag': '"7f806d-1a01-9fb97900"',</mark>
+<mark> 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',</mark>
+ 'server': 'Apache',
+ 'status': '200',
+ 'vary': 'Accept-Encoding,User-Agent'}</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>                                                  <span class=u>&#x2462;</span></a>
+<samp class=pp>6657</samp></pre>
+<ol>
+<li>Instead of the feed, this time we&#8217;re going to download the site&#8217;s home page, which is <abbr>HTML</abbr>. Since this is the first time you&#8217;ve ever requested this page, <code>httplib2</code> has little to work with, and it sends out a minimum of headers with the request.
+<li>The response contains a multitude of <abbr>HTTP</abbr> headers&hellip; but no caching information. However, it does include both an <code>ETag</code> and <code>Last-Modified</code> header.
+<li>At the time I constructed this example, this page was 6657 bytes. It&#8217;s probably changed since then, but don&#8217;t worry about it.
+</ol>
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/')</kbd>  <span class=u>&#x2460;</span></a>
+<samp>connect: (diveintopython3.org, 80)
+send: b'GET / HTTP/1.1
+Host: diveintopython3.org
+<a>if-none-match: "7f806d-1a01-9fb97900"                             <span class=u>&#x2461;</span></a>
+<a>if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT                  <span class=u>&#x2462;</span></a>
+accept-encoding: deflate, gzip
+user-agent: Python-httplib2/$Rev: 259 $'
+<a>reply: 'HTTP/1.1 304 Not Modified'                                <span class=u>&#x2463;</span></a></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.fromcache</kbd>                                            <span class=u>&#x2464;</span></a>
+<samp class=pp>True</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.status</kbd>                                               <span class=u>&#x2465;</span></a>
+<samp class=pp>200</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.dict['status']</kbd>                                       <span class=u>&#x2466;</span></a>
+<samp class=pp>'304'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>len(content)</kbd>                                                  <span class=u>&#x2467;</span></a>
+<samp class=pp>6657</samp></pre>
+<ol>
+<li>You request the same page again, with the same <code>Http</code> object (and the same local cache).
+<li><code>httplib2</code> sends the <code>ETag</code> validator back to the server in the <code>If-None-Match</code> header.
+<li><code>httplib2</code> also sends the <code>Last-Modified</code> validator back to the server in the <code>If-Modified-Since</code> header.
+<li>The server looked at these validators, looked at the page you requested, and determined that the page has not changed since you last requested it, so it sends back a <code>304</code> status code <em>and no data</em>.
+<li>Back on the client, <code>httplib2</code> notices the <code>304</code> status code and loads the content of the page from its cache.
+<li>This might be a bit confusing. There are really <em>two</em> status codes&nbsp;&mdash;&nbsp;<code>304</code> (returned from the server this time, which caused <code>httplib2</code> to look in its cache), and <code>200</code> (returned from the server <em>last time</em>, and stored in <code>httplib2</code>&#8217;s cache along with the page data). <code>response.status</code> returns the status from the cache.
+<li>If you want the raw status code returned from the server, you can get that by looking in <code>response.dict</code>, which is a dictionary of the actual headers returned from the server.
+<li>However, you still get the data in the <var>content</var> variable. Generally, you don&#8217;t need to know why a response was served from the cache. (You may not even care that it was served from the cache at all, and that&#8217;s fine too. <code>httplib2</code> is smart enough to let you act dumb.) By the time the <code>request()</code> method returns to the caller, <code>httplib2</code> has already updated its cache and returned the data to you.
+</ol>
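+
+<p>If you are curious what that round trip amounts to, here is a rough sketch of the two halves. Both function names are invented for this example; <code>httplib2</code> does all of this for you, and its real implementation handles many more cases.

```python
def validator_headers(cached_headers):
    """Build conditional request headers from a cached response's validators."""
    headers = {}
    if 'etag' in cached_headers:
        headers['if-none-match'] = cached_headers['etag']
    if 'last-modified' in cached_headers:
        headers['if-modified-since'] = cached_headers['last-modified']
    return headers

def resolve(status, body, cached_body):
    """Return (content, fromcache) once the server has answered."""
    if status == 304:
        # Not Modified: the server sent no data, so reuse the cached copy
        return cached_body, True
    return body, False
```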
+
+<h3 id=httplib2-compression>How <code>httplib2</code> Handles Compression</h3>
+
+<aside>&#8220;We have both kinds of music, country AND western.&#8221;</aside>
+
+<p><abbr>HTTP</abbr> supports <a href=#compression>several types of compression</a>; the two most common types are gzip and deflate. <code>httplib2</code> supports both of these.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/')</kbd>
+<samp>connect: (diveintopython3.org, 80)
+send: b'GET / HTTP/1.1
+Host: diveintopython3.org
+<a>accept-encoding: deflate, gzip                          <span class=u>&#x2460;</span></a>
+user-agent: Python-httplib2/$Rev: 259 $'
+reply: 'HTTP/1.1 200 OK'</samp>
+<samp class=p>>>> </samp><kbd class=pp>print(dict(response.items()))</kbd>
+<samp class=pp><a>{'-content-encoding': 'gzip',                           <span class=u>&#x2461;</span></a>
+ 'accept-ranges': 'bytes',
+ 'connection': 'close',
+ 'content-length': '6657',
+ 'content-location': 'http://diveintopython3.org/',
+ 'content-type': 'text/html',
+ 'date': 'Tue, 02 Jun 2009 03:26:54 GMT',
+ 'etag': '"7f806d-1a01-9fb97900"',
+ 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
+ 'server': 'Apache',
+ 'status': '304',
+ 'vary': 'Accept-Encoding,User-Agent'}</samp></pre>
+<ol>
+<li>Every time <code>httplib2</code> sends a request, it includes an <code>Accept-Encoding</code> header to tell the server that it can handle either <code>deflate</code> or <code>gzip</code> compression.
+<li>In this case, the server has responded with a gzip-compressed payload. By the time the <code>request()</code> method returns, <code>httplib2</code> has already decompressed the body of the response and placed it in the <var>content</var> variable. If you&#8217;re curious about whether or not the response was compressed, you can check <var>response['-content-encoding']</var>; otherwise, don&#8217;t worry about it.
+</ol>
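+
+<p>Decompression is the kind of thing you could do by hand with the standard library if you had to. This sketch uses a made-up <code>decode_body()</code> function to show the idea. It is an illustration, not <code>httplib2</code>&#8217;s own code. The fallback for <code>deflate</code> is there because some servers send a raw deflate stream without the zlib wrapper.

```python
import gzip
import zlib

def decode_body(body, content_encoding):
    """Undo gzip or deflate compression on a response body (bytes)."""
    if content_encoding == 'gzip':
        return gzip.decompress(body)
    if content_encoding == 'deflate':
        try:
            return zlib.decompress(body)                    # zlib-wrapped
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)   # raw deflate
    return body                                             # not compressed
```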
+
+<h3 id=httplib2-redirects>How <code>httplib2</code> Handles Redirects</h3>
+
+<p><abbr>HTTP</abbr> defines <a href=#redirects>two kinds of redirects</a>: temporary and permanent. There&#8217;s nothing special to do with temporary redirects except follow them, which <code>httplib2</code> does automatically.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
+<samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>
+<samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed-302.xml')</kbd>  <span class=u>&#x2460;</span></a>
+<samp>connect: (diveintopython3.org, 80)
+<a>send: b'GET /examples/feed-302.xml HTTP/1.1                                            <span class=u>&#x2461;</span></a>
+Host: diveintopython3.org
+accept-encoding: deflate, gzip
+user-agent: Python-httplib2/$Rev: 259 $'
+<a>reply: 'HTTP/1.1 302 Found'                                                            <span class=u>&#x2462;</span></a>
+<a>send: b'GET /examples/feed.xml HTTP/1.1                                                <span class=u>&#x2463;</span></a>
+Host: diveintopython3.org
+accept-encoding: deflate, gzip
+user-agent: Python-httplib2/$Rev: 259 $'
+reply: 'HTTP/1.1 200 OK'</samp></pre>
+<ol>
+<li>There is no feed at this <abbr>URL</abbr>. I&#8217;ve set up my server to issue a temporary redirect to the correct address.
+<li>There&#8217;s the request.
+<li>And there&#8217;s the response: <code>302 Found</code>. Not shown here, this response also includes a <code>Location</code> header that points to the real <abbr>URL</abbr>.
+<li><code>httplib2</code> immediately turns around and &#8220;follows&#8221; the redirect by issuing another request for the <abbr>URL</abbr> given in the <code>Location</code> header: <code>http://diveintopython3.org/examples/feed.xml</code>
+</ol>
+
+<p>&#8220;Following&#8221; a redirect is nothing more than this example shows. <code>httplib2</code> sends a request for the <abbr>URL</abbr> you asked for. The server comes back with a response that says &#8220;No no, look over there instead.&#8221; <code>httplib2</code> sends another request for the new <abbr>URL</abbr>.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>response</kbd>                                                          <span class=u>&#x2460;</span></a>
+<samp class=pp>{'status': '200',
+ 'content-length': '3070',
+<a> 'content-location': 'http://diveintopython3.org/examples/feed.xml',  <span class=u>&#x2461;</span></a>
+ 'accept-ranges': 'bytes',
+ 'expires': 'Thu, 04 Jun 2009 02:21:41 GMT',
+ 'vary': 'Accept-Encoding',
+ 'server': 'Apache',
+ 'last-modified': 'Wed, 03 Jun 2009 02:20:15 GMT',
+ 'connection': 'close',
+<a> '-content-encoding': 'gzip',                                         <span class=u>&#x2462;</span></a>
+ 'etag': '"bfe-4cbbf5c0"',
+<a> 'cache-control': 'max-age=86400',                                    <span class=u>&#x2463;</span></a>
+ 'date': 'Wed, 03 Jun 2009 02:21:41 GMT',
+ 'content-type': 'application/xml'}</samp></pre>
+<ol>
+<li>The <var>response</var> you get back from this single call to the <code>request()</code> method is the response from the final <abbr>URL</abbr>.
+<li><code>httplib2</code> adds the final <abbr>URL</abbr> to the <var>response</var> dictionary, as <code>content-location</code>. This is not a header that came from the server; it&#8217;s specific to <code>httplib2</code>.
+<li>Apropos of nothing, this feed is <a href=#httplib2-compression>compressed</a>.
+<li>And cacheable. (This is important, as you&#8217;ll see in a minute.)
+</ol>
+
+<p>The <var>response</var> you get back gives you information about the <em>final</em> <abbr>URL</abbr>. What if you want more information about the intermediate <abbr>URL</abbr>s, the ones that eventually redirected to the final <abbr>URL</abbr>? <code>httplib2</code> lets you do that, too.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>response.previous</kbd>                                                     <span class=u>&#x2460;</span></a>
+<samp class=pp>{'status': '302',
+ 'content-length': '228',
+ 'content-location': 'http://diveintopython3.org/examples/feed-302.xml',
+ 'expires': 'Thu, 04 Jun 2009 02:21:41 GMT',
+ 'server': 'Apache',
+ 'connection': 'close',
+ 'location': 'http://diveintopython3.org/examples/feed.xml',
+ 'cache-control': 'max-age=86400',
+ 'date': 'Wed, 03 Jun 2009 02:21:41 GMT',
+ 'content-type': 'text/html; charset=iso-8859-1'}</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>type(response)</kbd>                                                        <span class=u>&#x2461;</span></a>
+<samp class=pp>&lt;class 'httplib2.Response'></samp>
+<samp class=p>>>> </samp><kbd class=pp>type(response.previous)</kbd>
+<samp class=pp>&lt;class 'httplib2.Response'></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.previous.previous</kbd>                                            <span class=u>&#x2462;</span></a>
+<samp class=p>>>></samp></pre>
+<ol>
+<li>The <var>response.previous</var> attribute holds a reference to the previous response object that <code>httplib2</code> followed to get to the current response object.
+<li>Both <var>response</var> and <var>response.previous</var> are <code>httplib2.Response</code> objects.
+<li>That means you can check <var>response.previous.previous</var> to follow the redirect chain backwards even further. (Scenario: one <abbr>URL</abbr> redirects to a second <abbr>URL</abbr> which redirects to a third <abbr>URL</abbr>. It could happen!) In this case, we&#8217;ve already reached the beginning of the redirect chain, so the attribute is <code>None</code>.
+</ol>
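+
+<p>Walking the chain is a short loop. In this sketch, <code>redirect_chain()</code> is a hypothetical helper, and <code>FakeResponse</code> is a stand-in that lets you try it without a network connection. It assumes only what you saw above: a response acts like a dictionary, has a <code>content-location</code> key, and has a <code>previous</code> attribute that is <code>None</code> at the start of the chain.

```python
def redirect_chain(response):
    """Return every URL visited, oldest first, by following .previous."""
    urls = []
    while response is not None:
        urls.append(response['content-location'])
        response = getattr(response, 'previous', None)
    urls.reverse()
    return urls

class FakeResponse(dict):
    """Stand-in for httplib2.Response, for trying this offline."""
    previous = None

first = FakeResponse({'status': '302',
    'content-location': 'http://diveintopython3.org/examples/feed-302.xml'})
final = FakeResponse({'status': '200',
    'content-location': 'http://diveintopython3.org/examples/feed.xml'})
final.previous = first
```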
+
+<p>What happens if you request the same <abbr>URL</abbr> again?
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed-302.xml')</kbd>  <span class=u>&#x2460;</span></a>
+<samp>connect: (diveintopython3.org, 80)
+<a>send: b'GET /examples/feed-302.xml HTTP/1.1                                              <span class=u>&#x2461;</span></a>
+Host: diveintopython3.org
+accept-encoding: deflate, gzip
+user-agent: Python-httplib2/$Rev: 259 $'
+<a>reply: 'HTTP/1.1 302 Found'                                                              <span class=u>&#x2462;</span></a></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>content2 == content</kbd>                                                                  <span class=u>&#x2463;</span></a>
+<samp class=pp>True</samp></pre>
+<ol>
+<li>Same <abbr>URL</abbr>, same <code>httplib2.Http</code> object (and therefore the same cache).
+<li>The <code>302</code> response was not cached, so <code>httplib2</code> sends another request for the same <abbr>URL</abbr>.
+<li>Once again, the server responds with a <code>302</code>. But notice what <em>didn&#8217;t</em> happen: there wasn&#8217;t ever a second request for the final <abbr>URL</abbr>, <code>http://diveintopython3.org/examples/feed.xml</code>. That response was cached (remember the <code>Cache-Control</code> header that you saw in the previous example). Once <code>httplib2</code> received the <code>302 Found</code> code, <em>it checked its cache before issuing another request</em>. The cache contained a fresh copy of <code>http://diveintopython3.org/examples/feed.xml</code>, so there was no need to re-request it.
+<li>By the time the <code>request()</code> method returns, it has read the feed data from the cache and returned it. Of course, it&#8217;s the same as the data you received last time.
+</ol>
+
+<p>In other words, you don&#8217;t have to do anything special for temporary redirects. <code>httplib2</code> will follow them automatically, and the fact that one <abbr>URL</abbr> redirects to another has no bearing on <code>httplib2</code>&#8217;s support for compression, caching, <code>ETags</code>, or any of the other features of <abbr>HTTP</abbr>.
+
+<p>Permanent redirects are just as simple.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>response, content = h.request('http://diveintopython3.org/examples/feed-301.xml')</kbd>  <span class=u>&#x2460;</span></a>
+<samp>connect: (diveintopython3.org, 80)
+send: b'GET /examples/feed-301.xml HTTP/1.1
+Host: diveintopython3.org
+accept-encoding: deflate, gzip
+user-agent: Python-httplib2/$Rev: 259 $'
+<a>reply: 'HTTP/1.1 301 Moved Permanently'                                                <span class=u>&#x2461;</span></a></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>response.fromcache</kbd>                                                                 <span class=u>&#x2462;</span></a>
+<samp class=pp>True</samp></pre>
+<ol>
+<li>Once again, this <abbr>URL</abbr> doesn&#8217;t really exist. I&#8217;ve set up my server to issue a permanent redirect to <code>http://diveintopython3.org/examples/feed.xml</code>.
+<li>And here it is: status code <code>301</code>. But again, notice what <em>didn&#8217;t</em> happen: there was no request to the redirect <abbr>URL</abbr>. Why not? Because it&#8217;s already cached locally.
+<li><code>httplib2</code> &#8220;followed&#8221; the redirect right into its cache.
+</ol>
+
+<p>But wait! There&#8217;s more!
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>response2, content2 = h.request('http://diveintopython3.org/examples/feed-301.xml')</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>response2.fromcache</kbd>                                                                  <span class=u>&#x2461;</span></a>
+<samp class=pp>True</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>content2 == content</kbd>                                                                  <span class=u>&#x2462;</span></a>
+<samp class=pp>True</samp>
+</pre>
+<ol>
+<li>Here&#8217;s the difference between temporary and permanent redirects: once <code>httplib2</code> follows a permanent redirect, all further requests for that <abbr>URL</abbr> will transparently be rewritten to the target <abbr>URL</abbr> <em>without hitting the network for the original <abbr>URL</abbr></em>. Remember, debugging is still turned on, yet there is no output of network activity whatsoever.
+<li>Yep, this response was retrieved from the local cache.
+<li>Yep, you got the entire feed (from the cache).
+</ol>
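+
+<p>To make the difference between the two kinds of redirect concrete, here is a minimal sketch of the bookkeeping involved. It is a toy model, not <code>httplib2</code>&#8217;s actual implementation: <code>ToyClient</code>, <code>FAKE_SERVER</code>, and the <code>example.org</code> <abbr>URL</abbr>s are all made up for illustration, and the &#8220;network&#8221; is just a dictionary.
+

```python
# Toy model of the redirect behaviour described above.
# NOT httplib2's implementation: ToyClient, FAKE_SERVER and the
# example.org URLs are hypothetical names used only for illustration.

FAKE_SERVER = {
    'http://example.org/feed-302.xml': (302, 'http://example.org/feed.xml'),
    'http://example.org/feed-301.xml': (301, 'http://example.org/feed.xml'),
    'http://example.org/feed.xml': (200, b'<feed/>'),
}


class ToyClient:
    def __init__(self):
        self.body_cache = {}    # URL -> cached body of a 200 response
        self.permanent = {}     # URL -> target of a remembered 301
        self.network_log = []   # every URL that "hit the network"

    def request(self, url):
        # A remembered permanent redirect is rewritten before any network use.
        while url in self.permanent:
            url = self.permanent[url]
        while True:
            # Check the cache before issuing a request for this URL.
            if url in self.body_cache:
                return self.body_cache[url]
            self.network_log.append(url)
            status, value = FAKE_SERVER[url]
            if status == 200:
                self.body_cache[url] = value
                return value
            if status == 301:
                # Only permanent redirects are remembered.
                self.permanent[url] = value
            url = value  # follow the redirect (301 or 302)


client = ToyClient()
client.request('http://example.org/feed-302.xml')  # hits feed-302, then feed.xml
client.request('http://example.org/feed-302.xml')  # hits feed-302 again, body from cache
client.request('http://example.org/feed-301.xml')  # hits feed-301, body from cache
log_before = list(client.network_log)
content = client.request('http://example.org/feed-301.xml')  # no network at all
```

+<p>In this sketch the temporary redirect is asked for again on every request, while the permanent one is remembered, so the second request for it never touches the (pretend) network.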
+
+<p><abbr>HTTP</abbr>. It works.
+
+<p class=a>&#x2042;
+
+<h2 id=beyond-get>Beyond HTTP GET</h2>
+
+<p><abbr>HTTP</abbr> web services are not limited to <code>GET</code> requests. What if you want to create something new? Whenever you post a comment on a discussion forum, update your weblog, or publish your status on a microblogging service like <a href=http://twitter.com/>Twitter</a> or <a href=http://identi.ca/>Identi.ca</a>, you&#8217;re probably already using <abbr>HTTP</abbr> <code>POST</code>.
+
+<p>Twitter and Identi.ca both offer a simple <abbr>HTTP</abbr>-based <abbr>API</abbr> for publishing and updating your status in 140 characters or less. Let&#8217;s look at <a href=http://laconi.ca/trac/wiki/TwitterCompatibleAPI>Identi.ca&#8217;s <abbr>API</abbr> documentation</a> for updating your status:
+
+<blockquote class=pf>
+<p><b>Identi.ca <abbr>REST</abbr> <abbr>API</abbr> Method: statuses/update</b><br>
+Updates the authenticating user&#8217;s status.  Requires the <code>status</code> parameter specified below.  Request must be a <code>POST</code>.
+
+<dl>
+<dt><abbr>URL</abbr>
+<dd><code>https://identi.ca/api/statuses/update.<i><var>format</var></i></code>
+<dt>Formats
+<dd><code>xml</code>, <code>json</code>, <code>rss</code>, <code>atom</code>
+<dt><abbr>HTTP</abbr> Method(s)
+<dd><code>POST</code>
+<dt>Requires Authentication
+<dd>true
+<dt>Parameters
+<dd><code>status</code>. Required. The text of your status update. <abbr>URL</abbr>-encode as necessary.
+</dl>
+</blockquote>
+
+<p>How does this work? To publish a new message on Identi.ca, you need to issue an <abbr>HTTP</abbr> <code>POST</code> request to <code>https://identi.ca/api/statuses/update.<i>format</i></code>. (The <var>format</var> bit is not literally part of the <abbr>URL</abbr>; you replace it with the data format you want the server to return in response to your request. So if you want a response in <abbr>XML</abbr>, you would post the request to <code>https://identi.ca/api/statuses/update.xml</code>.) The request needs to include a parameter called <code>status</code>, which contains the text of your status update. And the request needs to be authenticated.
+
+<p>Authenticated? Sure. To update your status on Identi.ca, you need to prove who you are. Identi.ca is not a wiki; only you can update your own status. Identi.ca uses <a href=http://en.wikipedia.org/wiki/Basic_access_authentication><abbr>HTTP</abbr> Basic Authentication</a> (<i>a.k.a.</i> <a href=http://www.ietf.org/rfc/rfc2617.txt>RFC 2617</a>) over <abbr>SSL</abbr> to provide secure but easy-to-use authentication. <code>httplib2</code> supports both <abbr>SSL</abbr> and <abbr>HTTP</abbr> Basic Authentication, so this part is easy.
+
+<p>A <code>POST</code> request is different from a <code>GET</code> request, because it includes a <i>payload</i>. The payload is the data you want to send to the server. The one piece of data that this <abbr>API</abbr> method <em>requires</em> is <code>status</code>, and it should be <i><abbr>URL</abbr>-encoded</i>. This is a very simple serialization format that takes a set of key-value pairs (<i>i.e.</i> a <a href=native-datatypes.html#dictionaries>dictionary</a>) and transforms it into a string.
+
+<pre class=screen>
+<a><samp class=p>>>> </samp><kbd class=pp>from urllib.parse import urlencode</kbd>              <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>data = {'status': 'Test update from Python 3'}</kbd>  <span class=u>&#x2461;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>urlencode(data)</kbd>                                 <span class=u>&#x2462;</span></a>
+<samp>'status=Test+update+from+Python+3'</samp></pre>
+<ol>
+<li>Python comes with a utility function to <abbr>URL</abbr>-encode a dictionary: <code>urllib.parse.urlencode()</code>.
+<li>This is the sort of dictionary that the Identi.ca <abbr>API</abbr> is looking for. It contains one key, <code>status</code>, whose value is the text of a single status update.
+<li>This is what the <abbr>URL</abbr>-encoded string looks like. This is the <i>payload</i> that will be sent &#8220;on the wire&#8221; to the Identi.ca <abbr>API</abbr> server in your <abbr>HTTP</abbr> <code>POST</code> request.
+</ol>
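+
+<p>The <abbr>API</abbr> documentation says to &#8220;<abbr>URL</abbr>-encode as necessary,&#8221; so it may help to see what happens to characters that are not safe to send as-is. The following is a small sketch using only the standard library; the second status text is made up for illustration.
+

```python
from urllib.parse import urlencode, parse_qs

# The payload from the example above: spaces become plus signs.
payload = urlencode({'status': 'Test update from Python 3'})

# A made-up status with a non-ASCII character and an ampersand.
# Text is encoded as UTF-8 by default, then percent-escaped.
tricky = urlencode({'status': 'caf\u00e9 & tea'})

# parse_qs() reverses the encoding; each key maps to a list of values.
decoded = parse_qs(tricky)

print(payload)   # status=Test+update+from+Python+3
print(tricky)    # status=caf%C3%A9+%26+tea
print(decoded)   # {'status': ['café & tea']}
```

+<p>Escaping the ampersand matters: left alone, it would be read as the separator between two key-value pairs.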
+
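+<p>Now send that payload to the Identi.ca <abbr>API</abbr> in an authenticated <code>POST</code> request: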
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>from urllib.parse import urlencode</kbd>
+<samp class=p>>>> </samp><kbd class=pp>import httplib2</kbd>
+<samp class=p>>>> </samp><kbd class=pp>httplib2.debuglevel = 1</kbd>
+<samp class=p>>>> </samp><kbd class=pp>h = httplib2.Http('.cache')</kbd>
+<samp class=p>>>> </samp><kbd class=pp>data = {'status': 'Test update from Python 3'}</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>h.add_credentials('diveintomark', '<var>MY_SECRET_PASSWORD</var>', 'identi.ca')</kbd>    <span class=u>&#x2460;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>resp, content = h.request('https://identi.ca/api/statuses/update.xml',</kbd>
+<a><samp class=p>... </samp><kbd class=pp>    'POST',</kbd>                                                             <span class=u>&#x2461;</span></a>
+<a><samp class=p>... </samp><kbd class=pp>    urlencode(data),</kbd>                                                    <span class=u>&#x2462;</span></a>
+<a><samp class=p>... </samp><kbd class=pp>    headers={'Content-Type': 'application/x-www-form-urlencoded'})</kbd>      <span class=u>&#x2463;</span></a></pre>
+<ol>
+<li>This is how <code>httplib2</code> handles authentication. Store your username and password with the <code>add_credentials()</code> method. When <code>httplib2</code> tries to issue the request, the server will respond with a <code>401 Unauthorized</code> status code, and it will list which authentication methods it supports (in the <code>WWW-Authenticate</code> header). <code>httplib2</code> will automatically construct an <code>Authorization</code> header and re-request the <abbr>URL</abbr>.
+<li>The second parameter is the type of <abbr>HTTP</abbr> request, in this case <code>POST</code>.
+<li>The third parameter is the <i>payload</i> to send to the server. We&#8217;re sending the <abbr>URL</abbr>-encoded dictionary with a status message.
+<li>Finally, we need to tell the server that the payload is <abbr>URL</abbr>-encoded data.
+</ol>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>The third parameter to the <code>add_credentials()</code> method is the domain in which the credentials are valid. You should always specify this! If you leave out the domain and later reuse the <code>httplib2.Http</code> object on a different authenticated site, <code>httplib2</code> might end up leaking one site&#8217;s username and password to the other site.
+</blockquote>
+
+<p>This is what goes over the wire:
+
+<pre class=screen>
+# continued from the previous example
+<samp>send: b'POST /api/statuses/update.xml HTTP/1.1
+Host: identi.ca
+Accept-Encoding: identity
+Content-Length: 32
+content-type: application/x-www-form-urlencoded
+user-agent: Python-httplib2/$Rev: 259 $
+
+status=Test+update+from+Python+3'
+<a>reply: 'HTTP/1.1 401 Unauthorized'                        <span class=u>&#x2460;</span></a>
+<a>send: b'POST /api/statuses/update.xml HTTP/1.1            <span class=u>&#x2461;</span></a>
+Host: identi.ca
+Accept-Encoding: identity
+Content-Length: 32
+content-type: application/x-www-form-urlencoded
+<a>authorization: Basic SECRET_HASH_CONSTRUCTED_BY_HTTPLIB2  <span class=u>&#x2462;</span></a>
+user-agent: Python-httplib2/$Rev: 259 $
+
+status=Test+update+from+Python+3'
+<a>reply: 'HTTP/1.1 200 OK'                                  <span class=u>&#x2463;</span></a></samp></pre>
+<ol>
+<li>After the first request, the server responds with a <code>401 Unauthorized</code> status code. <code>httplib2</code> will never send authentication headers unless the server explicitly asks for them. This is how the server asks for them.
+<li><code>httplib2</code> immediately turns around and requests the same <abbr>URL</abbr> a second time.
+<li>This time, it includes the username and password that you added with the <code>add_credentials()</code> method.
+<li>It worked!
+</ol>
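+
+<p>Two of the headers in that second request can be reproduced by hand, which shows there is nothing magic in them. This is a sketch of what RFC 2617 specifies for Basic Authentication, not a copy of <code>httplib2</code>&#8217;s code; the helper name <code>basic_auth_header</code> is hypothetical, and the username and password are the example pair from the <abbr>RFC</abbr> itself, not real credentials.
+

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username, password):
    """Build an Authorization header value for HTTP Basic Authentication.

    Hypothetical helper for illustration: the value is just
    base64("username:password"), which is an encoding, not encryption.
    """
    raw = '{0}:{1}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


# The payload is sent as bytes; Content-Length is its length in bytes.
body = urlencode({'status': 'Test update from Python 3'}).encode('ascii')

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Content-Length': str(len(body)),
    'Authorization': basic_auth_header('Aladdin', 'open sesame'),
}

print(headers['Content-Length'])   # 32
print(headers['Authorization'])    # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Anyone who sees the header can recover the password:
print(base64.b64decode('QWxhZGRpbjpvcGVuIHNlc2FtZQ=='))  # b'Aladdin:open sesame'
```

+<p>Because the header is trivially reversible, Basic Authentication is only safe over an encrypted connection, which is why the request goes to an <code>https://</code> <abbr>URL</abbr>.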
+
+<p>What does the server send back after a successful request? That depends entirely on the web service <abbr>API</abbr>. In some protocols (like the <a href=http://www.ietf.org/rfc/rfc5023.txt>Atom Publishing Protocol</a>), the server sends back a <code>201 Created</code> status code and the location of the newly created resource in the <code>Location</code> header. Identi.ca sends back a <code>200 OK</code> and an <abbr>XML</abbr> document containing information about the newly created resource.
+
+<pre class=screen>
+# continued from the previous example
+<a><samp class=p>>>> </samp><kbd class=pp>print(content.decode('utf-8'))</kbd>                             <span class=u>&#x2460;</span></a>
+<samp class=pp>&lt;?xml version="1.0" encoding="UTF-8"?>
+&lt;status>
+<a> &lt;text>Test update from Python 3&lt;/text>                        <span class=u>&#x2461;</span></a>
+ &lt;truncated>false&lt;/truncated>
+ &lt;created_at>Wed Jun 10 03:53:46 +0000 2009&lt;/created_at>
+ &lt;in_reply_to_status_id>&lt;/in_reply_to_status_id>
+ &lt;source>api&lt;/source>
+<a> &lt;id>5131472&lt;/id>                                              <span class=u>&#x2462;</span></a>
+ &lt;in_reply_to_user_id>&lt;/in_reply_to_user_id>
+ &lt;in_reply_to_screen_name>&lt;/in_reply_to_screen_name>
+ &lt;favorited>false&lt;/favorited>
+ &lt;user>
+  &lt;id>3212&lt;/id>
+  &lt;name>Mark Pilgrim&lt;/name>
+  &lt;screen_name>diveintomark&lt;/screen_name>
+  &lt;location>27502, US&lt;/location>
+  &lt;description>tech writer, husband, father&lt;/description>
+  &lt;profile_image_url>http://avatar.identi.ca/3212-48-20081216000626.png&lt;/profile_image_url>
+  &lt;url>http://diveintomark.org/&lt;/url>
+  &lt;protected>false&lt;/protected>
+  &lt;followers_count>329&lt;/followers_count>
+  &lt;profile_background_color>&lt;/profile_background_color>
+  &lt;profile_text_color>&lt;/profile_text_color>
+  &lt;profile_link_color>&lt;/profile_link_color>
+  &lt;profile_sidebar_fill_color>&lt;/profile_sidebar_fill_color>
+  &lt;profile_sidebar_border_color>&lt;/profile_sidebar_border_color>
+  &lt;friends_count>2&lt;/friends_count>
+  &lt;created_at>Wed Jul 02 22:03:58 +0000 2008&lt;/created_at>
+  &lt;favourites_count>30768&lt;/favourites_count>
+  &lt;utc_offset>0&lt;/utc_offset>
+  &lt;time_zone>UTC&lt;/time_zone>
+  &lt;profile_background_image_url>&lt;/profile_background_image_url>
+  &lt;profile_background_tile>false&lt;/profile_background_tile>
+  &lt;statuses_count>122&lt;/statuses_count>
+  &lt;following>false&lt;/following>
+  &lt;notifications>false&lt;/notifications>
+&lt;/user>
+&lt;/status></samp></pre>
+<ol>
+<li>Remember, the data returned by <code>httplib2</code> is always <a href=strings.html#byte-arrays>bytes</a>, not a string. To convert it to a string, you need to decode it using the proper character encoding. Identi.ca&#8217;s <abbr>API</abbr> always returns results in <abbr>UTF-8</abbr>, so that part is easy.
+<li>There&#8217;s the text of the status message we just published.
+<li>There&#8217;s the unique identifier for the new status message. Identi.ca uses this to construct a <abbr>URL</abbr> for viewing the message on the web.
+</ol>
+
+<p>And here it is:
+
+<p class=c><img class=fr src=i/identica-screenshot.png alt="screenshot showing published status message on Identi.ca" width=740 height=449>
+
+<p class=a>&#x2042;
+
+<h2 id=beyond-post>Beyond HTTP POST</h2>
+
+<p><abbr>HTTP</abbr> isn&#8217;t limited to <code>GET</code> and <code>POST</code>. Those are certainly the most common types of requests, especially in web browsers. But web service <abbr>API</abbr>s can go beyond <code>GET</code> and <code>POST</code>, and <code>httplib2</code> is ready.
+
+<pre class=screen>
+# continued from the previous example
+<samp class=p>>>> </samp><kbd class=pp>from xml.etree import ElementTree as etree</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>tree = etree.fromstring(content)</kbd>                                          <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>status_id = tree.findtext('id')</kbd>                                           <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>status_id</kbd>
+<samp class=pp>'5131472'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>url = 'https://identi.ca/api/statuses/destroy/{0}.xml'.format(status_id)</kbd>  <span class=u>&#x2462;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>resp, deleted_content = h.request(url, 'DELETE')</kbd>                          <span class=u>&#x2463;</span></a></pre>
+<ol>
+<li>The server returned <abbr>XML</abbr>, right? You know <a href=xml.html#xml-parse>how to parse <abbr>XML</abbr></a>.
+<li>The <code>findtext()</code> method finds the first instance of the given expression and extracts its text content. In this case, we&#8217;re just looking for an <code>&lt;id></code> element.
+<li>Based on the text content of the <code>&lt;id></code> element, we can construct a <abbr>URL</abbr> to delete the status message we just published.
+<li>To delete a message, you simply issue an <abbr>HTTP</abbr> <code>DELETE</code> request to that <abbr>URL</abbr>.
+</ol>
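+
+<p>The response contains two <code>&lt;id></code> elements, one for the status and one for the user, so it is worth checking which one <code>findtext('id')</code> returns. The sketch below runs offline against a trimmed-down copy of the response shown earlier; the variable names <code>sample</code> and <code>user_id</code> are assumptions for illustration.
+

```python
from xml.etree import ElementTree as etree

# A trimmed-down copy of the server's response, as bytes
# (which is what httplib2 hands back as the content).
sample = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<status>'
          b'<text>Test update from Python 3</text>'
          b'<id>5131472</id>'
          b'<user><id>3212</id></user>'
          b'</status>')

tree = etree.fromstring(sample)

# A bare tag name only matches direct children of the root element,
# so this is the status id, not the user's id.
status_id = tree.findtext('id')

# A path is needed to reach the nested element.
user_id = tree.findtext('user/id')

url = 'https://identi.ca/api/statuses/destroy/{0}.xml'.format(status_id)

print(status_id)  # 5131472
print(user_id)    # 3212
print(url)        # https://identi.ca/api/statuses/destroy/5131472.xml
```
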
+
+<p>This is what goes over the wire:
+
+<pre class=screen>
+<samp><a>send: b'DELETE /api/statuses/destroy/5131472.xml HTTP/1.1      <span class=u>&#x2460;</span></a>
+Host: identi.ca
+Accept-Encoding: identity
+user-agent: Python-httplib2/$Rev: 259 $
+
+'
+<a>reply: 'HTTP/1.1 401 Unauthorized'                             <span class=u>&#x2461;</span></a>
+<a>send: b'DELETE /api/statuses/destroy/5131472.xml HTTP/1.1      <span class=u>&#x2462;</span></a>
+Host: identi.ca
+Accept-Encoding: identity
+<a>authorization: Basic SECRET_HASH_CONSTRUCTED_BY_HTTPLIB2       <span class=u>&#x2463;</span></a>
+user-agent: Python-httplib2/$Rev: 259 $
+
+'
+<a>reply: 'HTTP/1.1 200 OK'                                       <span class=u>&#x2464;</span></a></samp>
+<samp class=p>>>> </samp><kbd class=pp>resp.status</kbd>
+<samp class=pp>200</samp></pre>
+<ol>
+<li>&#8220;Delete this status message.&#8221;
+<li>&#8220;I&#8217;m sorry, Dave, I&#8217;m afraid I can&#8217;t do that.&#8221;
+<li>&#8220;Unauthorized<span class=u title='interrobang!'>&#8253;</span> Hmmph. Delete this status message, <em>please</em>&hellip;
+<li>&hellip;and here&#8217;s my username and password.&#8221;
+<li>&#8220;Consider it done!&#8221;
+</ol>
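+
+<p>The same challenge-and-retry conversation happens for every <abbr>HTTP</abbr> method, not just <code>POST</code> and <code>DELETE</code>. Here is a minimal sketch of that conversation with a pretend server. Everything in it is an assumption made for illustration (<code>fake_server</code>, <code>request_with_retry</code>, the <code>someuser</code>/<code>secret</code> credentials); it is not how <code>httplib2</code> is written, only the shape of the exchange.
+

```python
import base64

# Hypothetical credentials the pretend server will accept.
EXPECTED = 'Basic ' + base64.b64encode(b'someuser:secret').decode('ascii')


def fake_server(method, path, headers):
    """Pretend server protected by HTTP Basic Authentication."""
    if headers.get('authorization') != EXPECTED:
        # Refuse, and say which authentication scheme is supported.
        return 401, {'www-authenticate': 'Basic realm="toy"'}
    return 200, {}


def request_with_retry(method, path, username, password):
    """Send a request; on a Basic challenge, retry once with credentials."""
    wire = []  # (method, sent_authorization_header, status) per attempt
    headers = {}
    status, reply = fake_server(method, path, headers)
    wire.append((method, False, status))
    if status == 401 and reply.get('www-authenticate', '').startswith('Basic'):
        raw = '{0}:{1}'.format(username, password).encode('utf-8')
        headers['authorization'] = 'Basic ' + base64.b64encode(raw).decode('ascii')
        status, reply = fake_server(method, path, headers)
        wire.append((method, True, status))
    return status, wire


status, wire = request_with_retry(
    'DELETE', '/api/statuses/destroy/5131472.xml', 'someuser', 'secret')
bad_status, bad_wire = request_with_retry(
    'DELETE', '/api/statuses/destroy/5131472.xml', 'someuser', 'wrong')

print(wire)        # [('DELETE', False, 401), ('DELETE', True, 200)]
print(bad_status)  # 401
```

+<p>Note that the credentials are never sent on the first attempt; they only go out after the server has asked for them.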
+
+<p>And just like that, poof, it&#8217;s gone.
+
+<p class=c><img class=fr src=i/identica-deleted.png alt="screenshot showing deleted message on Identi.ca" width=740 height=449>
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+
+<p><code>httplib2</code>:
+
+<ul>
+<li><a href=http://code.google.com/p/httplib2/><code>httplib2</code> project page</a>
+<li><a href=http://code.google.com/p/httplib2/wiki/ExamplesPython3>More <code>httplib2</code> code examples</a>
+<li><a href=http://www.xml.com/pub/a/2006/02/01/doing-http-caching-right-introducing-httplib2.html>Doing <abbr>HTTP</abbr> Caching Right: Introducing <code>httplib2</code></a>
+<li><a href=http://www.xml.com/pub/a/2006/03/29/httplib2-http-persistence-and-authentication.html><code>httplib2</code>: <abbr>HTTP</abbr> Persistence and Authentication</a>
+</ul>
+
+<p><abbr>HTTP</abbr> caching:
+
+<ul>
+<li><a href=http://www.mnot.net/cache_docs/><abbr>HTTP</abbr> Caching Tutorial</a> by Mark Nottingham
+<li><a href=http://code.google.com/p/doctype/wiki/ArticleHttpCaching>How to control caching with <abbr>HTTP</abbr> headers</a> on Google Doctype
+</ul>
+
+<p><abbr>RFC</abbr>s:
+
+<ul>
+<li><a href=http://www.ietf.org/rfc/rfc2616.txt>RFC 2616: <abbr>HTTP</abbr></a>
+<li><a href=http://www.ietf.org/rfc/rfc2617.txt>RFC 2617: <abbr>HTTP</abbr> Basic Authentication</a>
+<li><a href=http://www.ietf.org/rfc/rfc1951.txt>RFC 1951: deflate compression</a>
+<li><a href=http://www.ietf.org/rfc/rfc1952.txt>RFC 1952: gzip compression</a>
+</ul>
+
+<p class=v><a rel=prev href=serializing.html title='back to &#8220;Serializing Python Objects&#8221;'><span class=u>&#x261C;</span></a> <a rel=next href=case-study-porting-chardet-to-python-3.html title='onward to &#8220;Case Study: Porting chardet to Python 3&#8221;'><span class=u>&#x261E;</span></a>
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/installing-python.html b/installing-python.html
index 4793167..59df064 100755
--- a/installing-python.html
+++ b/installing-python.html
@@ -1,364 +1,364 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Installing Python - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 0}
-.i{list-style:none;margin:0;padding:0}
-#which{padding-top:1.75em}
-h2,.i>li{clear:both}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#installing-python>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=novice>&#x2666;&#x2662;&#x2662;&#x2662;&#x2662;</span>
-<h1>Installing Python</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> <i lang=la>Tempora mutantur nos et mutamur in illis.</i> (Times change, and we change with them.) <span class=u>&#x275E;</span><br>&mdash; ancient Roman proverb
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>Before you can start programming in Python 3, you need to install it. Or do you?
-
-<h2 id=which>Which Python Is Right For You?</h2>
-
-<p>If you're using an account on a hosted server, your <abbr>ISP</abbr> may have already installed Python 3. If you&#8217;re running Linux at home, you may already have Python 3, too. Most popular GNU/Linux distributions come with Python 2 in the default installation; a small but growing number of distributions also include Python 3. Mac OS X includes a command-line version of Python 2, but as of this writing it does not include Python 3. Microsoft Windows does not come with any version of Python. But don&#8217;t despair! You can point-and-click your way through installing Python, regardless of what operating system you have.
-
-<p>The easiest way to check for Python 3 on your Linux or Mac OS X system is to get to a command line. On Linux, look in your <b><code>Applications</code></b> menu for a program called <b><code>Terminal</code></b>. (It may be in a submenu like <b><code>Accessories</code></b> or <b><code>System</code></b>.) On Mac OS X, there is an application called <b><code>Terminal.app</code></b> in your <code>/Application/Utilities/</code> folder.
-
-<p>Once you&#8217;re at a command line prompt, just type <kbd>python3</kbd> (all lowercase, no spaces) and see what happens. On my home Linux system, Python 3 is already installed, and this command gets me into the <i>Python <dfn>interactive shell</dfn></i>.
-
-<pre class='nd screen'>
-<samp class=p>mark@atlantis:~$ </samp><kbd>python3</kbd>
-<samp>Python 3.0.1+ (r301:69556, Apr 15 2009, 17:25:52)
-[GCC 4.3.3] on linux2
-Type "help", "copyright", "credits" or "license" for more information.
->>></samp></pre>
-
-<p>(Type <kbd>exit()</kbd> and press <kbd>ENTER</kbd> to exit the Python interactive shell.)
-
-<p>My <a href=http://cornerhost.com/>web hosting provider</a> also runs Linux and provides command-line access, but my server does not have Python 3 installed. (Boo!)
-
-<pre class='nd screen'>
-<samp class=p>mark@manganese:~$ </samp><kbd>python3</kbd>
-<samp>bash: python3: command not found</samp></pre>
-
-<p>So back to the question that started this section, &#8220;Which Python is right for you?&#8221;  Whichever one runs on the computer you already have.
-
-<p>[Read on for Windows instructions, or skip to <a href=#macosx>Installing on Mac OS X</a>, <a href=#ubuntu>Installing on Ubuntu Linux</a>, or <a href=#other>Installing on Other Platforms</a>.]
-
-<p class=a>&#x2042;
-
-<h2 id=windows>Installing on Microsoft Windows</h2>
-
-<p>Windows comes in two architectures these days: 32-bit and 64-bit. Of course, there are lots of different <i>versions</i> of Windows&nbsp;&mdash;&nbsp;XP, Vista, Windows 7&nbsp;&mdash;&nbsp;but Python runs on all of them. The more important distinction is 32-bit v. 64-bit. If you have no idea what architecture you&#8217;re running, it&#8217;s probably 32-bit.
-
-<p>Visit <a href=http://python.org/download/><code>python.org/download/</code></a> and download the appropriate Python 3 Windows installer for your architecture. Your choices will look something like this:
-
-<ul>
-<li><b>Python 3.1 Windows installer</b> (Windows binary&nbsp;&mdash;&nbsp;does not include source)
-<li><b>Python 3.1 Windows AMD64 installer</b> (Windows AMD64 binary&nbsp;&mdash;&nbsp;does not include source)
-</ul>
-
-<p>I don&#8217;t want to include direct download links here, because minor updates of Python happen all the time and I don&#8217;t want to be responsible for you missing important updates. You should always install the most recent version of Python 3.x unless you have some esoteric reason not to.
-
-<ol class=i>
-<li>
-<p class='ss nm'><img src=i/win-install-0-security-warning.png width=409 height=309 alt='[Windows dialog: open file security warning]'>
-<p>Once your download is complete, double-click the <code>.msi</code> file. Windows will pop up a security alert, since you&#8217;re about to be running executable code. The official Python installer is digitally signed by the <a href=http://www.python.org/psf/>Python Software Foundation</a>, the non-profit corporation that oversees Python development. Don&#8217;t accept imitations!
-<p>Click the <code>Run</code> button to launch the Python 3 installer.
-
-<li>
-<p class='ss nm'><img src=i/win-install-1-all-users-or-just-me.png width=499 height=432 alt='[Python installer: select whether to install Python 3.1 for all users of this computer]'>
-<p>The first question the installer will ask you is whether you want to install Python 3 for all users or just for you. The default choice is &#8220;install for all users,&#8221; which is the best choice unless you have a good reason to choose otherwise. (One possible reason why you would want to &#8220;install just for me&#8221; is that you are installing Python on your company&#8217;s computer and you don&#8217;t have administrative rights on your Windows account. But then, why are you installing Python without permission from your company&#8217;s Windows administrator? Don&#8217;t get me in trouble here!)
-<p>Click the <code>Next</code> button to accept your choice of installation type.
-
-<li>
-<p class='ss nm'><img src=i/win-install-2-destination-directory.png width=499 height=432 alt='[Python installer: select destination directory]'>
-<p>Next, the installer will prompt you to choose a destination directory. The default for all versions of Python 3.1.x is <code>C:\Python31\</code>, which should work well for most users unless you have a specific reason to change it. If you maintain a separate drive letter for installing applications, you can browse to it using the embedded controls, or simply type the pathname in the box below. You are not limited to installing Python on the <code>C:</code> drive; you can install it on any drive, in any folder.
-<p>Click the <code>Next</code> button to accept your choice of destination directory.
-
-<li>
-<p class='ss nm'><img src=i/win-install-3-customize.png width=499 height=432 alt='[Python installer: customize Python 3.1]'>
-<p>The next page looks complicated, but it&#8217;s not really. Like many installers, you have the option not to install every single component of Python 3. If disk space is especially tight, you can exclude certain components.
-<ul>
-<li><b>Register Extensions</b> allows you to double-click Python scripts (<code>.py</code> files) and run them. Recommended but not required. (This option doesn&#8217;t require any disk space, so there is little point in excluding it.)
-<li><b>Tcl/Tk</b> is the graphics library used by the Python Shell, which you will use throughout this book. I strongly recommend keeping this option.
-<li><b>Documentation</b> installs a help file that contains much of the information on <a href=http://docs.python.org/><code>docs.python.org</code></a>. Recommended if you are on dialup or have limited Internet access.
-<li><b>Utility Scripts</b> includes the <code>2to3.py</code> script which you&#8217;ll learn about <a href=case-study-porting-chardet-to-python-3.html>later in this book</a>. Required if you want to learn about migrating existing Python 2 code to Python 3. If you have no existing Python 2 code, you can skip this option.
-<li><b>Test Suite</b> is a collection of scripts used to test the Python interpreter itself. We will not use it in this book, nor have I ever used it in the course of programming in Python. Completely optional.
-</ul>
-
-<li>
-<p class='ss nm'><img src=i/win-install-3a-disk-usage.png width=499 height=432 alt='[Python installer: disk space requirements]'>
-<p>If you&#8217;re unsure how much disk space you have, click the <code>Disk Usage</code> button. The installer will list your drive letters, compute how much space is available on each drive, and calculate how much would be left after installation.
-<p>Click the <code>OK</code> button to return to the &#8220;Customizing Python&#8221; page.
-
-<li>
-<p class='ss nm'><img src=i/win-install-3b-test-suite.png width=499 height=432 alt='[Python installer: removing Test Suite option will save 7908KB on your hard drive]'>
-<p>If you decide to exclude an option, select the drop-down button before the option and select &#8220;Entire feature will be unavailable.&#8221; For example, excluding the test suite will save you a whopping 7908<abbr>KB</abbr> of disk space.
-<p>Click the <code>Next</code> button to accept your choice of options.
-
-<li>
-<p class='ss nm'><img src=i/win-install-4-copying.png width=499 height=432 alt='[Python installer: progress meter]'>
-<p>The installer will copy all the necessary files to your chosen destination directory. (This happens so quickly, I had to try it three times to even get a screenshot of it!)
-
-<li>
-<p class='ss nm'><img src=i/win-install-5-finish.png width=499 height=432 alt='[Python installer: installation completed. Special Windows thanks to Mark Hammond, without whose years of freely shared Windows expertise, Python for Windows would still be Python for DOS.]'>
-<p>Click the <code>Finish</code> button to exit the installer.
-
-<li>
-<p class='ss nm'><img src=i/win-interactive-shell.png width=677 height=715 alt='[Windows Python Shell, a graphical interactive shell for Python]'>
-<p>In your <code>Start</code> menu, there should be a new item called <code>Python 3.1</code>. Within that, there is a program called <abbr>IDLE</abbr>. Select this item to run the interactive Python Shell.
-
-</ol>
-
-<p>[Skip to <a href=#idle>using the Python Shell</a>]
-
-<p class=a>&#x2042;
-
-<h2 id=macosx>Installing on Mac OS X</h2>
-
-<p>All modern Macintosh computers use the Intel chip (like most Windows PCs). Older Macs used PowerPC chips. You don&#8217;t need to understand the difference, because there&#8217;s just one Mac Python installer for all Macs.
-
-<p>Visit <a href=http://python.org/download/><code>python.org/download/</code></a> and download the Mac installer. It will be called something like <b>Python 3.1 Mac Installer Disk Image</b>, although the version number may vary. Be sure to download version 3.x, not 2.x.
-
-<ol class=i>
-
-<li>
-<p class='ss nm'><img src=i/mac-install-0-dmg-contents.png width=752 height=438 alt='[contents of Python installer disk image]'>
-<p>Your browser should automatically mount the disk image and open a Finder window to show you the contents. (If this doesn&#8217;t happen, you&#8217;ll need to find the disk image in your downloads folder and double-click to mount it. It will be named something like <code>python-3.1.dmg</code>.) The disk image contains a number of text files (<code>Build.txt</code>, <code>License.txt</code>, <code>ReadMe.txt</code>), and the actual installer package, <code>Python.mpkg</code>.
-<p>Double-click the <code>Python.mpkg</code> installer package to launch the Mac Python installer.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-1-welcome.png width=622 height=442 alt='[Python installer: welcome screen]'>
-<p>The first page of the installer gives a brief description of Python itself, then refers you to the <code>ReadMe.txt</code> file (which you didn&#8217;t read, did you?) for more details.
-<p>Click the <code>Continue</code> button to move along.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-2-information.png width=622 height=442 alt='[Python installer: information about supported architectures, disk space, and acceptable destination folders]'>
-<p>The next page actually contains some important information: Python requires Mac OS X 10.3 or later. If you are still running Mac OS X 10.2, you should really upgrade. Apple no longer provides security updates for your operating system, and your computer is probably at risk if you ever go online. Also, you can&#8217;t run Python 3.
-<p>Click the <code>Continue</code> button to advance.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-3-license.png width=622 height=442 alt='[Python installer: software license agreement]'>
-<p>Like all good installers, the Python installer displays the software license agreement. Python is open source, and its license is <a href=http://opensource.org/licenses/>approved by the Open Source Initiative</a>. Python has had a number of owners and sponsors throughout its history, each of which has left its mark on the software license. But the end result is this: Python is open source, and you may use it on any platform, for any purpose, without fee or obligation of reciprocity.
-<p>Click the <code>Continue</code> button once again.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-4-license-dialog.png width=622 height=442 alt='[Python installer: dialog to accept license agreement]'>
-<p>Due to quirks in the standard Apple installer framework, you must &#8220;agree&#8221; to the software license in order to complete the installation. Since Python is open source, you are really &#8220;agreeing&#8221; that the license is granting you additional rights, rather than taking them away.
-<p>Click the <code>Agree</code> button to continue.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-5-standard-install.png width=622 height=442 alt='[Python installer: standard install screen]'>
-<p>The next screen allows you to change your install location. You <strong>must</strong> install Python on your boot drive, but due to limitations of the installer, it does not enforce this. In truth, I have never had the need to change the install location.
-<p>From this screen, you can also customize the installation to exclude certain features. If you want to do this, click the <code>Customize</code> button; otherwise click the <code>Install</code> button.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-6-custom-install.png width=622 height=442 alt='[Python installer: custom install screen]'>
-<p>If you choose a Custom Install, the installer will present you with the following list of features:
-<ul>
-<li><b>Python Framework</b>. This is the guts of Python, and is both selected and disabled because it must be installed.
-<li><b>GUI Applications</b> includes IDLE, the graphical Python Shell which you will use throughout this book. I strongly recommend keeping this option selected.
-<li><b>UNIX command-line tools</b> includes the command-line <code>python3</code> application. I strongly recommend keeping this option, too.
-<li><b>Python Documentation</b> contains much of the information on <a href=http://docs.python.org/><code>docs.python.org</code></a>. Recommended if you are on dialup or have limited Internet access.
-<li><b>Shell profile updater</b> controls whether to update your shell profile (used in <code>Terminal.app</code>) to ensure that this version of Python is on the search path of your shell. You probably don&#8217;t need to change this.
-<li><b>Fix system Python</b> should not be changed. (It tells your Mac to use Python 3 as the default Python for all scripts, including built-in system scripts from Apple. This would be very bad, since most of those scripts are written for Python 2, and they would fail to run properly under Python 3.)
-</ul>
-<p>Click the <code>Install</code> button to continue.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-7-admin-password.png width=622 height=457 alt='[Python installer: dialog to enter administrative password]'>
-<p>Because it installs system-wide frameworks and binaries in <code>/usr/local/bin/</code>, the installer will ask you for an administrative password. There is no way to install Mac Python without administrator privileges.
-<p>Click the <code>OK</code> button to begin the installation.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-8-progress.png width=622 height=442 alt='[Python installer: progress meter]'>
-<p>The installer will display a progress meter while it installs the features you&#8217;ve selected.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-9-succeeded.png width=622 height=442 alt='[Python installer: install succeeded]'>
-<p>Assuming all went well, the installer will give you a big green checkmark to tell you that the installation completed successfully.
-<p>Click the <code>Close</code> button to exit the installer.
-
-<li>
-<p class='ss nm'><img src=i/mac-install-10-application-folder.png width=488 height=482 alt='[contents of /Applications/Python 3.1/ folder]'>
-<p>Assuming you didn&#8217;t change the install location, you can find the newly installed files in the <code>Python 3.1</code> folder within your <code>/Applications</code> folder. The most important piece is <abbr>IDLE</abbr>, the graphical Python Shell.
-<p>Double-click <abbr>IDLE</abbr> to launch the Python Shell.
-
-<li>
-<p class='ss nm'><img src=i/mac-interactive-shell.png width=522 height=538 alt='[Mac Python Shell, a graphical interactive shell for Python]'>
-<p>The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book will assume that you can find your way into the Python Shell.
-
-</ol>
-
-<p>[Skip to <a href=#idle>using the Python Shell</a>]
-
-<p class=a>&#x2042;
-
-<h2 id=ubuntu>Installing on Ubuntu Linux</h2>
-
-<p>Modern Linux distributions are backed by vast repositories of precompiled applications, ready to install. The exact details vary by distribution. In Ubuntu Linux, the easiest way to install Python 3 is through the <code>Add/Remove</code> application in your <code>Applications</code> menu.
-
-<ol class=i>
-<li>
-<p class='ss nm'><img src=i/ubu-install-0-add-remove-programs.png width=920 height=473 alt='[Add/Remove: Canonical-maintained applications]'>
-<p>When you first launch the <code>Add/Remove</code> application, it will show you a list of preselected applications in different categories. Some are already installed; most are not. Because the repository contains over 10,000 applications, there are different filters you can apply to see small parts of the repository. The default filter is &#8220;Canonical-maintained applications,&#8221; which is a small subset of the total number of applications that are officially supported by Canonical, the company that creates and maintains Ubuntu Linux.
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-1-all-open-source-applications.png width=920 height=473 alt='[Add/Remove: all open source applications]'>
-<p>Python 3 is not maintained by Canonical, so the first step is to drop down this filter menu and select &#8220;All Open Source applications.&#8221;
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-2-search-python-3.png width=920 height=473 alt='[Add/Remove: search for Python 3]'>
-<p>Once you&#8217;ve widened the filter to include all open source applications, use the Search box immediately after the filter menu to search for <kbd>Python 3</kbd>.
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-3-select-python-3.png width=920 height=473 alt='[Add/Remove: select Python 3.0 package]'>
-<p>Now the list of applications narrows to just those matching <kbd>Python 3</kbd>. You&#8217;re going to check two packages. The first is <code>Python (v3.0)</code>. This contains the Python interpreter itself.
-<li>
-<p class='ss nm'><img src=i/ubu-install-4-select-idle.png width=920 height=473 alt='[Add/Remove: select IDLE for Python 3.0 package]'>
-<p>The second package you want is immediately above: <code>IDLE (using Python-3.0)</code>. This is a graphical Python Shell that you will use throughout this book.
-<p>After you&#8217;ve checked those two packages, click the <code>Apply Changes</code> button to continue.
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-5-apply-changes.png width=635 height=364 alt='[Add/Remove: apply changes]'>
-<p>The package manager will ask you to confirm that you want to add both <code>IDLE (using Python-3.0)</code> and <code>Python (v3.0)</code>.
-<p>Click the <code>Apply</code> button to continue.
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-6-download-progress.png width=287 height=211 alt='[Add/Remove: download progress meter]'>
-<p>The package manager will show you a progress meter while it downloads the necessary packages from Canonical&#8217;s Internet repository.
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-7-install-progress.png width=486 height=258 alt='[Add/Remove: installation progress meter]'>
-<p>Once the packages are downloaded, the package manager will automatically begin installing them.
-
-<li>
-<p class='ss nm'><img src=i/ubu-install-8-success.png width=591 height=296 alt='[Add/Remove: new applications have been installed]'>
-<p>If all went well, the package manager will confirm that both packages were successfully installed. From here, you can double-click <abbr>IDLE</abbr> to launch the Python Shell, or click the <code>Close</code> button to exit the package manager.
-<p>You can always relaunch the Python Shell by going to your <code>Applications</code> menu, then the <code>Programming</code> submenu, and selecting <abbr>IDLE</abbr>.
-
-<li>
-<p class='ss nm'><img src=i/ubu-interactive-shell.png width=679 height=687 alt='[Linux Python Shell, a graphical interactive shell for Python]'>
-<p>The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book will assume that you can find your way into the Python Shell.
-
-</ol>
-
-<p>[Skip to <a href=#idle>using the Python Shell</a>]
-
-<p class=a>&#x2042;
-
-<h2 id=other>Installing on Other Platforms</h2>
-
-<p>Python 3 is available on a number of different platforms. In particular, it is available in virtually every Linux, <abbr>BSD</abbr>, and Solaris-based distribution. For example, RedHat Linux uses the <code>yum</code> package manager; FreeBSD has its <a href=http://www.freebsd.org/ports/>ports and packages collection</a>; Solaris has <code>pkgadd</code> and friends. A quick web search for <code>Python 3</code> + <i>your operating system</i> will tell you whether a Python 3 package is available, and how to install it.
-
-<p class=a>&#x2042;
-
-<h2 id=idle>Using The Python Shell</h2>
-
-<p>The Python Shell is where you can explore Python syntax, get interactive help on commands, and debug short programs. The graphical Python Shell (named <abbr>IDLE</abbr>) also contains a decent text editor that supports Python syntax coloring and integrates with the Python Shell. If you don&#8217;t already have a favorite text editor, you should give <abbr>IDLE</abbr> a try.
-
-<p>First things first. The Python Shell itself is an amazing interactive playground. Throughout this book, you&#8217;ll see examples like this:
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>1 + 1</kbd>
-<samp class=pp>2</samp></pre>
-
-<p>The three angle brackets, <samp class=p>>>></samp>, denote the Python Shell prompt. Don&#8217;t type that part. That&#8217;s just to let you know that this example is meant to be followed in the Python Shell.
-
-<p><kbd class=pp>1 + 1</kbd> is the part you type. You can type any valid Python expression or command in the Python Shell. Don&#8217;t be shy; it won&#8217;t bite! The worst that will happen is you&#8217;ll get an error message. Commands get executed immediately (once you press <kbd>ENTER</kbd>); expressions get evaluated immediately, and the Python Shell prints out the result.
-
-<p><samp class=pp>2</samp> is the result of evaluating this expression. As it happens, <kbd class=pp>1 + 1</kbd> is a valid Python expression. The result, of course, is <samp class=pp>2</samp>.
-
-<p>Let&#8217;s try another one.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>print('Hello world!')</kbd>
-<samp>Hello world!</samp>
-</pre>
-
-<p>Pretty simple, no? But there&#8217;s lots more you can do in the Python shell. If you ever get stuck&nbsp;&mdash;&nbsp;you can&#8217;t remember a command, or you can&#8217;t remember the proper arguments to pass a certain function&nbsp;&mdash;&nbsp;you can get interactive help in the Python Shell. Just type <kbd>help</kbd> and press <kbd>ENTER</kbd>.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd>help</kbd>
-<samp>Type help() for interactive help, or help(object) for help about object.</samp></pre>
-
-<p>There are two modes of help. You can get help about a single object, which just prints out the documentation and returns you to the Python Shell prompt. You can also enter <i>help mode</i>, where instead of evaluating Python expressions, you just type keywords or command names and it will print out whatever it knows about that command.
-
-<p>To enter the interactive help mode, type <kbd>help()</kbd> and press <kbd>ENTER</kbd>.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>help()</kbd>
-<samp>Welcome to Python 3.0!  This is the online help utility.
-
-If this is your first time using Python, you should definitely check out
-the tutorial on the Internet at http://docs.python.org/tutorial/.
-
-Enter the name of any module, keyword, or topic to get help on writing
-Python programs and using Python modules.  To quit this help utility and
-return to the interpreter, just type "quit".
-
-To get a list of available modules, keywords, or topics, type "modules",
-"keywords", or "topics".  Each module also comes with a one-line summary
-of what it does; to list the modules whose summaries contain a given word
-such as "spam", type "modules spam".
-</samp>
-<samp class=p>help> </samp></pre>
-
-<p>Note how the prompt changes from <samp class=p>>>></samp> to <samp class=p>help></samp>. This reminds you that you&#8217;re in the interactive help mode. Now you can enter any keyword, command, module name, function name&nbsp;&mdash;&nbsp;pretty much anything Python understands&nbsp;&mdash;&nbsp;and read documentation on it.
-
-<pre class=screen>
-<a><samp class=p>help> </samp><kbd class=pp>print</kbd>                                                                 <span class=u>&#x2460;</span></a>
-<samp>Help on built-in function print in module builtins:
-
-print(...)
-    print(value, ..., sep=' ', end='\n', file=sys.stdout)
-    
-    Prints the values to a stream, or to sys.stdout by default.
-    Optional keyword arguments:
-    file: a file-like object (stream); defaults to the current sys.stdout.
-    sep:  string inserted between values, default a space.
-    end:  string appended after the last value, default a newline.
-</samp>
-<a><samp class=p>help> </samp><kbd class=pp>PapayaWhip</kbd>                                                            <span class=u>&#x2461;</span></a>
-<samp>no Python documentation found for 'PapayaWhip'
-</samp>
-<a><samp class=p>help> </samp><kbd class=pp>quit</kbd>                                                                  <span class=u>&#x2462;</span></a>
-<samp>
-You are now leaving help and returning to the Python interpreter.
-If you want to ask for help on a particular object directly from the
-interpreter, you can type "help(object)".  Executing "help('string')"
-has the same effect as typing a particular string at the help> prompt.</samp>
-<a><samp class=p>>>> </samp>                                                                        <span class=u>&#x2463;</span></a></pre>
-<ol>
-<li>To get documentation on the <code>print()</code> function, just type <kbd>print</kbd> and press <kbd>ENTER</kbd>. The interactive help mode will display something akin to a man page: the function name, a brief synopsis, the function&#8217;s arguments and their default values, and so on. If the documentation seems opaque to you, don&#8217;t panic. You&#8217;ll learn more about all these concepts in the next few chapters.
-<li>Of course, the interactive help mode doesn&#8217;t know everything. If you type something that isn&#8217;t a Python command, module, function, or other built-in keyword, the interactive help mode will just shrug its virtual shoulders.
-<li>To quit the interactive help mode, type <kbd>quit</kbd> and press <kbd>ENTER</kbd>.
-<li>The prompt changes back to <samp class=p>>>></samp> to signal that you&#8217;ve left the interactive help mode and returned to the Python Shell.
-</ol>
-
-<p><abbr>IDLE</abbr>, the graphical Python Shell, also includes a Python-aware text editor.
-
-<p class=a>&#x2042;
-
-<h2 id=editors>Python Editors and IDEs</h2>
-
-<p><abbr>IDLE</abbr> is not the only game in town when it comes to writing programs in Python. While it&#8217;s useful to get started with learning the language itself, many developers prefer other text editors or Integrated Development Environments (<abbr>IDE</abbr>s). I won&#8217;t cover them here, but the Python community maintains <a href=http://wiki.python.org/moin/PythonEditors>a list of Python-aware editors</a> that covers a wide range of supported platforms and software licenses.
-
-<p>You might also want to check out the <a href=http://wiki.python.org/moin/IntegratedDevelopmentEnvironments>list of Python-aware <abbr>IDE</abbr>s</a>, although few of them support Python 3 yet. One that does is <a href=http://pydev.sourceforge.net/>PyDev</a>, a plugin for <a href=http://eclipse.org/>Eclipse</a> that turns Eclipse into a full-fledged Python <abbr>IDE</abbr>. Both Eclipse and PyDev are cross-platform and open source.
-
-<p>On the commercial front, there is ActiveState&#8217;s <a href=http://www.activestate.com/komodo/>Komodo <abbr>IDE</abbr></a>. It has per-user licensing, but students can get a discount, and a free time-limited trial version is available.
-
-<p>I&#8217;ve been programming in Python for nine years, and I edit my Python programs in <a href=http://www.gnu.org/software/emacs/>GNU Emacs</a> and debug them in the command-line Python Shell. There&#8217;s no right or wrong way to develop in Python. Find a way that works for you!
-
-<p class=v><a href=whats-new.html rel=prev title='back to &#8220;What&#8217;s New In Dive Into Python 3&#8221;'><span class=u>&#x261C;</span></a> <a href=your-first-python-program.html rel=next title='onward to &#8220;Your First Python Program&#8221;'><span class=u>&#x261E;</span></a>
-
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Installing Python - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 0}
+.i{list-style:none;margin:0;padding:0}
+#which{padding-top:1.75em}
+h2,.i>li{clear:both}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#installing-python>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=novice>&#x2666;&#x2662;&#x2662;&#x2662;&#x2662;</span>
+<h1>Installing Python</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> <i lang=la>Tempora mutantur nos et mutamur in illis.</i> (Times change, and we change with them.) <span class=u>&#x275E;</span><br>&mdash; ancient Roman proverb
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>Before you can start programming in Python 3, you need to install it. Or do you?
+
+<h2 id=which>Which Python Is Right For You?</h2>
+
+<p>If you&#8217;re using an account on a hosted server, your <abbr>ISP</abbr> may have already installed Python 3. If you&#8217;re running Linux at home, you may already have Python 3, too. Most popular GNU/Linux distributions come with Python 2 in the default installation; a small but growing number of distributions also include Python 3. Mac OS X includes a command-line version of Python 2, but as of this writing it does not include Python 3. Microsoft Windows does not come with any version of Python. But don&#8217;t despair! You can point-and-click your way through installing Python, regardless of what operating system you have.
+
+<p>The easiest way to check for Python 3 on your Linux or Mac OS X system is to get to a command line. On Linux, look in your <b><code>Applications</code></b> menu for a program called <b><code>Terminal</code></b>. (It may be in a submenu like <b><code>Accessories</code></b> or <b><code>System</code></b>.) On Mac OS X, there is an application called <b><code>Terminal.app</code></b> in your <code>/Applications/Utilities/</code> folder.
+
+<p>Once you&#8217;re at a command line prompt, just type <kbd>python3</kbd> (all lowercase, no spaces) and see what happens. On my home Linux system, Python 3 is already installed, and this command gets me into the <i>Python <dfn>interactive shell</dfn></i>.
+
+<pre class='nd screen'>
+<samp class=p>mark@atlantis:~$ </samp><kbd>python3</kbd>
+<samp>Python 3.0.1+ (r301:69556, Apr 15 2009, 17:25:52)
+[GCC 4.3.3] on linux2
+Type "help", "copyright", "credits" or "license" for more information.
+>>></samp></pre>
+
+<p>(Type <kbd>exit()</kbd> and press <kbd>ENTER</kbd> to exit the Python interactive shell.)
+
+<p>My <a href=http://cornerhost.com/>web hosting provider</a> also runs Linux and provides command-line access, but my server does not have Python 3 installed. (Boo!)
+
+<pre class='nd screen'>
+<samp class=p>mark@manganese:~$ </samp><kbd>python3</kbd>
+<samp>bash: python3: command not found</samp></pre>
+
+<p>So back to the question that started this section, &#8220;Which Python is right for you?&#8221;  Whichever one runs on the computer you already have.
+
+<p>[Read on for Windows instructions, or skip to <a href=#macosx>Installing on Mac OS X</a>, <a href=#ubuntu>Installing on Ubuntu Linux</a>, or <a href=#other>Installing on Other Platforms</a>.]
+
+<p class=a>&#x2042;
+
+<h2 id=windows>Installing on Microsoft Windows</h2>
+
+<p>Windows comes in two architectures these days: 32-bit and 64-bit. Of course, there are lots of different <i>versions</i> of Windows&nbsp;&mdash;&nbsp;XP, Vista, Windows 7&nbsp;&mdash;&nbsp;but Python runs on all of them. The more important distinction is 32-bit v. 64-bit. If you have no idea what architecture you&#8217;re running, it&#8217;s probably 32-bit.
+
+<p>Visit <a href=http://python.org/download/><code>python.org/download/</code></a> and download the appropriate Python 3 Windows installer for your architecture. Your choices will look something like this:
+
+<ul>
+<li><b>Python 3.1 Windows installer</b> (Windows binary&nbsp;&mdash;&nbsp;does not include source)
+<li><b>Python 3.1 Windows AMD64 installer</b> (Windows AMD64 binary&nbsp;&mdash;&nbsp;does not include source)
+</ul>
+
+<p>I don&#8217;t want to include direct download links here, because minor updates of Python happen all the time and I don&#8217;t want to be responsible for you missing important updates. You should always install the most recent version of Python 3.x unless you have some esoteric reason not to.
+
+<ol class=i>
+<li>
+<p class='ss nm'><img src=i/win-install-0-security-warning.png width=409 height=309 alt='[Windows dialog: open file security warning]'>
+<p>Once your download is complete, double-click the <code>.msi</code> file. Windows will pop up a security alert, since you&#8217;re about to be running executable code. The official Python installer is digitally signed by the <a href=http://www.python.org/psf/>Python Software Foundation</a>, the non-profit corporation that oversees Python development. Don&#8217;t accept imitations!
+<p>Click the <code>Run</code> button to launch the Python 3 installer.
+
+<li>
+<p class='ss nm'><img src=i/win-install-1-all-users-or-just-me.png width=499 height=432 alt='[Python installer: select whether to install Python 3.1 for all users of this computer]'>
+<p>The first question the installer will ask you is whether you want to install Python 3 for all users or just for you. The default choice is &#8220;install for all users,&#8221; which is the best choice unless you have a good reason to choose otherwise. (One possible reason why you would want to &#8220;install just for me&#8221; is that you are installing Python on your company&#8217;s computer and you don&#8217;t have administrative rights on your Windows account. But then, why are you installing Python without permission from your company&#8217;s Windows administrator? Don&#8217;t get me in trouble here!)
+<p>Click the <code>Next</code> button to accept your choice of installation type.
+
+<li>
+<p class='ss nm'><img src=i/win-install-2-destination-directory.png width=499 height=432 alt='[Python installer: select destination directory]'>
+<p>Next, the installer will prompt you to choose a destination directory. The default for all versions of Python 3.1.x is <code>C:\Python31\</code>, which should work well for most users unless you have a specific reason to change it. If you maintain a separate drive letter for installing applications, you can browse to it using the embedded controls, or simply type the pathname in the box below. You are not limited to installing Python on the <code>C:</code> drive; you can install it on any drive, in any folder.
+<p>Click the <code>Next</code> button to accept your choice of destination directory.
+
+<li>
+<p class='ss nm'><img src=i/win-install-3-customize.png width=499 height=432 alt='[Python installer: customize Python 3.1]'>
+<p>The next page looks complicated, but it&#8217;s not really. Like many installers, this one gives you the option not to install every single component of Python 3. If disk space is especially tight, you can exclude certain components.
+<ul>
+<li><b>Register Extensions</b> allows you to double-click Python scripts (<code>.py</code> files) and run them. Recommended but not required. (This option doesn&#8217;t require any disk space, so there is little point in excluding it.)
+<li><b>Tcl/Tk</b> is the graphics library used by the Python Shell, which you will use throughout this book. I strongly recommend keeping this option.
+<li><b>Documentation</b> installs a help file that contains much of the information on <a href=http://docs.python.org/><code>docs.python.org</code></a>. Recommended if you are on dialup or have limited Internet access.
+<li><b>Utility Scripts</b> includes the <code>2to3.py</code> script which you&#8217;ll learn about <a href=case-study-porting-chardet-to-python-3.html>later in this book</a>. Required if you want to learn about migrating existing Python 2 code to Python 3. If you have no existing Python 2 code, you can skip this option.
+<li><b>Test Suite</b> is a collection of scripts used to test the Python interpreter itself. We will not use it in this book, nor have I ever used it in the course of programming in Python. Completely optional.
+</ul>
+
+<li>
+<p class='ss nm'><img src=i/win-install-3a-disk-usage.png width=499 height=432 alt='[Python installer: disk space requirements]'>
+<p>If you&#8217;re unsure how much disk space you have, click the <code>Disk Usage</code> button. The installer will list your drive letters, compute how much space is available on each drive, and calculate how much would be left after installation.
+<p>Click the <code>OK</code> button to return to the &#8220;Customizing Python&#8221; page.
+
+<li>
+<p class='ss nm'><img src=i/win-install-3b-test-suite.png width=499 height=432 alt='[Python installer: removing Test Suite option will save 7908KB on your hard drive]'>
+<p>If you decide to exclude an option, select the drop-down button before the option and select &#8220;Entire feature will be unavailable.&#8221; For example, excluding the test suite will save you a whopping 7908<abbr>KB</abbr> of disk space.
+<p>Click the <code>Next</code> button to accept your choice of options.
+
+<li>
+<p class='ss nm'><img src=i/win-install-4-copying.png width=499 height=432 alt='[Python installer: progress meter]'>
+<p>The installer will copy all the necessary files to your chosen destination directory. (This happens so quickly, I had to try it three times to even get a screenshot of it!)
+
+<li>
+<p class='ss nm'><img src=i/win-install-5-finish.png width=499 height=432 alt='[Python installer: installation completed. Special Windows thanks to Mark Hammond, without whose years of freely shared Windows expertise, Python for Windows would still be Python for DOS.]'>
+<p>Click the <code>Finish</code> button to exit the installer.
+
+<li>
+<p class='ss nm'><img src=i/win-interactive-shell.png width=677 height=715 alt='[Windows Python Shell, a graphical interactive shell for Python]'>
+<p>In your <code>Start</code> menu, there should be a new item called <code>Python 3.1</code>. Within that, there is a program called <abbr>IDLE</abbr>. Select this item to run the interactive Python Shell.
+
+</ol>
+
+<p>[Skip to <a href=#idle>using the Python Shell</a>]
+
+<p class=a>&#x2042;
+
+<h2 id=macosx>Installing on Mac OS X</h2>
+
+<p>All modern Macintosh computers use the Intel chip (like most Windows PCs). Older Macs used PowerPC chips. You don&#8217;t need to understand the difference, because there&#8217;s just one Mac Python installer for all Macs.
+
+<p>Visit <a href=http://python.org/download/><code>python.org/download/</code></a> and download the Mac installer. It will be called something like <b>Python 3.1 Mac Installer Disk Image</b>, although the version number may vary. Be sure to download version 3.x, not 2.x.
+
+<ol class=i>
+
+<li>
+<p class='ss nm'><img src=i/mac-install-0-dmg-contents.png width=752 height=438 alt='[contents of Python installer disk image]'>
+<p>Your browser should automatically mount the disk image and open a Finder window to show you the contents. (If this doesn&#8217;t happen, you&#8217;ll need to find the disk image in your downloads folder and double-click to mount it. It will be named something like <code>python-3.1.dmg</code>.) The disk image contains a number of text files (<code>Build.txt</code>, <code>License.txt</code>, <code>ReadMe.txt</code>), and the actual installer package, <code>Python.mpkg</code>.
+<p>Double-click the <code>Python.mpkg</code> installer package to launch the Mac Python installer.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-1-welcome.png width=622 height=442 alt='[Python installer: welcome screen]'>
+<p>The first page of the installer gives a brief description of Python itself, then refers you to the <code>ReadMe.txt</code> file (which you didn&#8217;t read, did you?) for more details.
+<p>Click the <code>Continue</code> button to move along.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-2-information.png width=622 height=442 alt='[Python installer: information about supported architectures, disk space, and acceptable destination folders]'>
+<p>The next page actually contains some important information: Python requires Mac OS X 10.3 or later. If you are still running Mac OS X 10.2, you should really upgrade. Apple no longer provides security updates for your operating system, and your computer is probably at risk if you ever go online. Also, you can&#8217;t run Python 3.
+<p>Click the <code>Continue</code> button to advance.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-3-license.png width=622 height=442 alt='[Python installer: software license agreement]'>
+<p>Like all good installers, the Python installer displays the software license agreement. Python is open source, and its license is <a href=http://opensource.org/licenses/>approved by the Open Source Initiative</a>. Python has had a number of owners and sponsors throughout its history, each of which has left its mark on the software license. But the end result is this: Python is open source, and you may use it on any platform, for any purpose, without fee or obligation of reciprocity.
+<p>Click the <code>Continue</code> button once again.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-4-license-dialog.png width=622 height=442 alt='[Python installer: dialog to accept license agreement]'>
+<p>Due to quirks in the standard Apple installer framework, you must &#8220;agree&#8221; to the software license in order to complete the installation. Since Python is open source, you are really &#8220;agreeing&#8221; that the license is granting you additional rights, rather than taking them away.
+<p>Click the <code>Agree</code> button to continue.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-5-standard-install.png width=622 height=442 alt='[Python installer: standard install screen]'>
+<p>The next screen allows you to change your install location. You <strong>must</strong> install Python on your boot drive, but due to its limitations, the installer does not enforce this. In truth, I have never had the need to change the install location.
+<p>From this screen, you can also customize the installation to exclude certain features. If you want to do this, click the <code>Customize</code> button; otherwise click the <code>Install</code> button.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-6-custom-install.png width=622 height=442 alt='[Python installer: custom install screen]'>
+<p>If you choose a Custom Install, the installer will present you with the following list of features:
+<ul>
+<li><b>Python Framework</b>. This is the guts of Python, and is both selected and disabled because it must be installed.
+<li><b>GUI Applications</b> includes IDLE, the graphical Python Shell which you will use throughout this book. I strongly recommend keeping this option selected.
+<li><b>UNIX command-line tools</b> includes the command-line <code>python3</code> application. I strongly recommend keeping this option, too.
+<li><b>Python Documentation</b> contains much of the information on <a href=http://docs.python.org/><code>docs.python.org</code></a>. Recommended if you are on dialup or have limited Internet access.
+<li><b>Shell profile updater</b> controls whether to update your shell profile (used in <code>Terminal.app</code>) to ensure that this version of Python is on the search path of your shell. You probably don&#8217;t need to change this.
+<li><b>Fix system Python</b> should not be changed. (It tells your Mac to use Python 3 as the default Python for all scripts, including built-in system scripts from Apple. This would be very bad, since most of those scripts are written for Python 2, and they would fail to run properly under Python 3.)
+</ul>
+<p>Click the <code>Install</code> button to continue.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-7-admin-password.png width=622 height=457 alt='[Python installer: dialog to enter administrative password]'>
+<p>Because it installs system-wide frameworks and binaries in <code>/usr/local/bin/</code>, the installer will ask you for an administrative password. There is no way to install Mac Python without administrator privileges.
+<p>Click the <code>OK</code> button to begin the installation.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-8-progress.png width=622 height=442 alt='[Python installer: progress meter]'>
+<p>The installer will display a progress meter while it installs the features you&#8217;ve selected.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-9-succeeded.png width=622 height=442 alt='[Python installer: install succeeded]'>
+<p>Assuming all went well, the installer will give you a big green checkmark to tell you that the installation completed successfully.
+<p>Click the <code>Close</code> button to exit the installer.
+
+<li>
+<p class='ss nm'><img src=i/mac-install-10-application-folder.png width=488 height=482 alt='[contents of /Applications/Python 3.1/ folder]'>
+<p>Assuming you didn&#8217;t change the install location, you can find the newly installed files in the <code>Python 3.1</code> folder within your <code>/Applications</code> folder. The most important piece is <abbr>IDLE</abbr>, the graphical Python Shell.
+<p>Double-click <abbr>IDLE</abbr> to launch the Python Shell.
+
+<li>
+<p class='ss nm'><img src=i/mac-interactive-shell.png width=522 height=538 alt='[Mac Python Shell, a graphical interactive shell for Python]'>
+<p>The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book will assume that you can find your way into the Python Shell.
+
+</ol>
+
+<p>[Skip to <a href=#idle>using the Python Shell</a>]
+
+<p class=a>&#x2042;
+
+<h2 id=ubuntu>Installing on Ubuntu Linux</h2>
+
+<p>Modern Linux distributions are backed by vast repositories of precompiled applications, ready to install. The exact details vary by distribution. In Ubuntu Linux, the easiest way to install Python 3 is through the <code>Add/Remove</code> application in your <code>Applications</code> menu.
+
+<ol class=i>
+<li>
+<p class='ss nm'><img src=i/ubu-install-0-add-remove-programs.png width=920 height=473 alt='[Add/Remove: Canonical-maintained applications]'>
+<p>When you first launch the <code>Add/Remove</code> application, it will show you a list of preselected applications in different categories. Some are already installed; most are not. Because the repository contains over 10,000 applications, there are different filters you can apply to see small parts of the repository. The default filter is &#8220;Canonical-maintained applications,&#8221; which shows only the small subset of applications that are officially supported by Canonical, the company that creates and maintains Ubuntu Linux.
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-1-all-open-source-applications.png width=920 height=473 alt='[Add/Remove: all open source applications]'>
+<p>Python 3 is not maintained by Canonical, so the first step is to drop down this filter menu and select &#8220;All Open Source applications.&#8221;
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-2-search-python-3.png width=920 height=473 alt='[Add/Remove: search for Python 3]'>
+<p>Once you&#8217;ve widened the filter to include all open source applications, use the Search box immediately after the filter menu to search for <kbd>Python 3</kbd>.
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-3-select-python-3.png width=920 height=473 alt='[Add/Remove: select Python 3.0 package]'>
+<p>Now the list of applications narrows to just those matching <kbd>Python 3</kbd>. You&#8217;re going to check two packages. The first is <code>Python (v3.0)</code>. This contains the Python interpreter itself.
+<li>
+<p class='ss nm'><img src=i/ubu-install-4-select-idle.png width=920 height=473 alt='[Add/Remove: select IDLE for Python 3.0 package]'>
+<p>The second package you want is immediately above: <code>IDLE (using Python-3.0)</code>. This is a graphical Python Shell that you will use throughout this book.
+<p>After you&#8217;ve checked those two packages, click the <code>Apply Changes</code> button to continue.
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-5-apply-changes.png width=635 height=364 alt='[Add/Remove: apply changes]'>
+<p>The package manager will ask you to confirm that you want to add both <code>IDLE (using Python-3.0)</code> and <code>Python (v3.0)</code>.
+<p>Click the <code>Apply</code> button to continue.
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-6-download-progress.png width=287 height=211 alt='[Add/Remove: download progress meter]'>
+<p>The package manager will show you a progress meter while it downloads the necessary packages from Canonical&#8217;s Internet repository.
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-7-install-progress.png width=486 height=258 alt='[Add/Remove: installation progress meter]'>
+<p>Once the packages are downloaded, the package manager will automatically begin installing them.
+
+<li>
+<p class='ss nm'><img src=i/ubu-install-8-success.png width=591 height=296 alt='[Add/Remove: new applications have been installed]'>
+<p>If all went well, the package manager will confirm that both packages were successfully installed. From here, you can double-click <abbr>IDLE</abbr> to launch the Python Shell, or click the <code>Close</code> button to exit the package manager.
+<p>You can always relaunch the Python Shell by going to your <code>Applications</code> menu, then the <code>Programming</code> submenu, and selecting <abbr>IDLE</abbr>.
+
+<li>
+<p class='ss nm'><img src=i/ubu-interactive-shell.png width=679 height=687 alt='[Linux Python Shell, a graphical interactive shell for Python]'>
+<p>The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book will assume that you can find your way into the Python Shell.
+
+</ol>
+
+<p>[Skip to <a href=#idle>using the Python Shell</a>]
+
+<p class=a>&#x2042;
+
+<h2 id=other>Installing on Other Platforms</h2>
+
+<p>Python 3 is available on a number of different platforms. In particular, it is available in virtually every Linux, <abbr>BSD</abbr>, and Solaris-based distribution. For example, Red Hat Linux uses the <code>yum</code> package manager; FreeBSD has its <a href=http://www.freebsd.org/ports/>ports and packages collection</a>; Solaris has <code>pkgadd</code> and friends. A quick web search for <code>Python 3</code> + <i>your operating system</i> will tell you whether a Python 3 package is available, and how to install it.
+
+<p class=a>&#x2042;
+
+<h2 id=idle>Using The Python Shell</h2>
+
+<p>The Python Shell is where you can explore Python syntax, get interactive help on commands, and debug short programs. The graphical Python Shell (named <abbr>IDLE</abbr>) also contains a decent text editor that supports Python syntax coloring and integrates with the Python Shell. If you don&#8217;t already have a favorite text editor, you should give <abbr>IDLE</abbr> a try.
+
+<p>First things first. The Python Shell itself is an amazing interactive playground. Throughout this book, you&#8217;ll see examples like this:
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>1 + 1</kbd>
+<samp class=pp>2</samp></pre>
+
+<p>The three angle brackets, <samp class=p>>>></samp>, denote the Python Shell prompt. Don&#8217;t type that part. That&#8217;s just to let you know that this example is meant to be followed in the Python Shell.
+
+<p><kbd class=pp>1 + 1</kbd> is the part you type. You can type any valid Python expression or command in the Python Shell. Don&#8217;t be shy; it won&#8217;t bite! The worst that will happen is you&#8217;ll get an error message. Commands get executed immediately (once you press <kbd>ENTER</kbd>); expressions get evaluated immediately, and the Python Shell prints out the result.
+
+<p><samp class=pp>2</samp> is the result of evaluating this expression. As it happens, <kbd class=pp>1 + 1</kbd> is a valid Python expression. The result, of course, is <samp class=pp>2</samp>.
+
+<p>Let&#8217;s try another one.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>print('Hello world!')</kbd>
+<samp>Hello world!</samp>
+</pre>
+
+<p>Pretty simple, no? But there&#8217;s lots more you can do in the Python Shell. If you ever get stuck&nbsp;&mdash;&nbsp;you can&#8217;t remember a command, or you can&#8217;t remember the proper arguments to pass to a certain function&nbsp;&mdash;&nbsp;you can get interactive help in the Python Shell. Just type <kbd>help</kbd> and press <kbd>ENTER</kbd>.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd>help</kbd>
+<samp>Type help() for interactive help, or help(object) for help about object.</samp></pre>
+
+<p>There are two modes of help. You can get help about a single object, which just prints out the documentation and returns you to the Python Shell prompt. You can also enter <i>help mode</i>, where instead of evaluating Python expressions, you just type keywords or command names and it will print out whatever it knows about that command.
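+
+<p>As a minimal sketch of the first mode, the snippet below asks for help on a single object. The choice of <code>len</code> is only an illustration, and the exact wording of the output may differ between Python versions.
+

```python
# Help on a single object: prints the documentation for len()
# and returns straight to the >>> prompt (no help> mode is entered).
help(len)

# The text comes from the object's docstring, which you can
# also read directly as an ordinary string.
summary = len.__doc__
print(summary)
```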
+
+<p>To enter the interactive help mode, type <kbd>help()</kbd> and press <kbd>ENTER</kbd>.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>help()</kbd>
+<samp>Welcome to Python 3.0!  This is the online help utility.
+
+If this is your first time using Python, you should definitely check out
+the tutorial on the Internet at http://docs.python.org/tutorial/.
+
+Enter the name of any module, keyword, or topic to get help on writing
+Python programs and using Python modules.  To quit this help utility and
+return to the interpreter, just type "quit".
+
+To get a list of available modules, keywords, or topics, type "modules",
+"keywords", or "topics".  Each module also comes with a one-line summary
+of what it does; to list the modules whose summaries contain a given word
+such as "spam", type "modules spam".
+</samp>
+<samp class=p>help> </samp></pre>
+
+<p>Note how the prompt changes from <samp class=p>>>></samp> to <samp class=p>help></samp>. This reminds you that you&#8217;re in the interactive help mode. Now you can enter any keyword, command, module name, function name&nbsp;&mdash;&nbsp;pretty much anything Python understands&nbsp;&mdash;&nbsp;and read documentation on it.
+
+<pre class=screen>
+<a><samp class=p>help> </samp><kbd class=pp>print</kbd>                                                                 <span class=u>&#x2460;</span></a>
+<samp>Help on built-in function print in module builtins:
+
+print(...)
+    print(value, ..., sep=' ', end='\n', file=sys.stdout)
+    
+    Prints the values to a stream, or to sys.stdout by default.
+    Optional keyword arguments:
+    file: a file-like object (stream); defaults to the current sys.stdout.
+    sep:  string inserted between values, default a space.
+    end:  string appended after the last value, default a newline.
+</samp>
+<a><samp class=p>help> </samp><kbd class=pp>PapayaWhip</kbd>                                                            <span class=u>&#x2461;</span></a>
+<samp>no Python documentation found for 'PapayaWhip'
+</samp>
+<a><samp class=p>help> </samp><kbd class=pp>quit</kbd>                                                                  <span class=u>&#x2462;</span></a>
+<samp>
+You are now leaving help and returning to the Python interpreter.
+If you want to ask for help on a particular object directly from the
+interpreter, you can type "help(object)".  Executing "help('string')"
+has the same effect as typing a particular string at the help> prompt.</samp>
+<a><samp class=p>>>> </samp>                                                                        <span class=u>&#x2463;</span></a></pre>
+<ol>
+<li>To get documentation on the <code>print()</code> function, just type <kbd>print</kbd> and press <kbd>ENTER</kbd>. The interactive help mode will display something akin to a man page: the function name, a brief synopsis, the function&#8217;s arguments and their default values, and so on. If the documentation seems opaque to you, don&#8217;t panic. You&#8217;ll learn more about all these concepts in the next few chapters.
+<li>Of course, the interactive help mode doesn&#8217;t know everything. If you type something that isn&#8217;t a Python command, module, function, or other built-in keyword, the interactive help mode will just shrug its virtual shoulders.
+<li>To quit the interactive help mode, type <kbd>quit</kbd> and press <kbd>ENTER</kbd>.
+<li>The prompt changes back to <samp class=p>>>></samp> to signal that you&#8217;ve left the interactive help mode and returned to the Python Shell.
+</ol>
+
+<p><abbr>IDLE</abbr>, the graphical Python Shell, also includes a Python-aware text editor.
+
+<p class=a>&#x2042;
+
+<h2 id=editors>Python Editors and IDEs</h2>
+
+<p><abbr>IDLE</abbr> is not the only game in town when it comes to writing programs in Python. While it&#8217;s useful to get started with learning the language itself, many developers prefer other text editors or Integrated Development Environments (<abbr>IDE</abbr>s). I won&#8217;t cover them here, but the Python community maintains <a href=http://wiki.python.org/moin/PythonEditors>a list of Python-aware editors</a> that covers a wide range of supported platforms and software licenses.
+
+<p>You might also want to check out the <a href=http://wiki.python.org/moin/IntegratedDevelopmentEnvironments>list of Python-aware <abbr>IDE</abbr>s</a>, although few of them support Python 3 yet. One that does is <a href=http://pydev.sourceforge.net/>PyDev</a>, a plugin for <a href=http://eclipse.org/>Eclipse</a> that turns Eclipse into a full-fledged Python <abbr>IDE</abbr>. Both Eclipse and PyDev are cross-platform and open source.
+
+<p>On the commercial front, there is ActiveState&#8217;s <a href=http://www.activestate.com/komodo/>Komodo <abbr>IDE</abbr></a>. It has per-user licensing, but students can get a discount, and a free time-limited trial version is available.
+
+<p>I&#8217;ve been programming in Python for nine years, and I edit my Python programs in <a href=http://www.gnu.org/software/emacs/>GNU Emacs</a> and debug them in the command-line Python Shell. There&#8217;s no right or wrong way to develop in Python. Find a way that works for you!
+
+<p class=v><a href=whats-new.html rel=prev title='back to &#8220;What&#8217;s New In Dive Into Python 3&#8221;'><span class=u>&#x261C;</span></a> <a href=your-first-python-program.html rel=next title='onward to &#8220;Your First Python Program&#8221;'><span class=u>&#x261E;</span></a>
+
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/iterators.html b/iterators.html
index 4b4a3f5..8da4842 100755
--- a/iterators.html
+++ b/iterators.html
@@ -1,394 +1,394 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Classes &amp; Iterators - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 7}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#iterators>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
-<h1>Classes <i class=baa>&amp;</i> Iterators</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> East is East, and West is West, and never the twain shall meet. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Rudyard_Kipling>Rudyard Kipling</a>
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>Iterators are the &#8220;secret sauce&#8221; of Python 3. They&#8217;re everywhere, underlying everything, always just out of sight. <a href=comprehensions.html>Comprehensions</a> are just a simple form of <i>iterators</i>. Generators are just a simple form of <i>iterators</i>. A function that <code>yield</code>s values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that.
-
-<p>Remember <a href=generators.html#a-fibonacci-generator>the Fibonacci generator</a>? Here it is as a built-from-scratch iterator:
-
-<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
-<pre class=pp><code>class Fib:
-    '''iterator that yields numbers in the Fibonacci sequence'''
-
-    def __init__(self, max):
-        self.max = max
-
-    def __iter__(self):
-        self.a = 0
-        self.b = 1
-        return self
-
-    def __next__(self):
-        fib = self.a
-        if fib > self.max:
-            raise StopIteration
-        self.a, self.b = self.b, self.a + self.b
-        return fib</code></pre>
-
-<p>Let&#8217;s take that one line at a time.
-
-<pre class='nd pp'><code>class Fib:</code></pre>
-
-<p><code>class</code>? What&#8217;s a class?
-
-<p class=a>&#x2042;
-
-<h2 id=defining-classes>Defining Classes</h2>
-
-<p>Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you&#8217;ve defined.
-
-<p>Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word <code>class</code>, followed by the class name. Technically, that&#8217;s all that&#8217;s required, since a class doesn&#8217;t need to inherit from any other class.
-
-<pre class=pp><code><a>class PapayaWhip:  <span class=u>&#x2460;</span></a>
-<a>    pass           <span class=u>&#x2461;</span></a></code></pre>
-<ol>
-<li>The name of this class is <code>PapayaWhip</code>, and it doesn&#8217;t inherit from any other class. Class names are usually capitalized, <code>EachWordLikeThis</code>, but this is only a convention, not a requirement.
-<li>You probably guessed this, but everything in a class is indented, just like the code within a function, <code>if</code> statement, <code>for</code> loop, or any other block of code. The first line not indented is outside the class.
-</ol>
-
-<p>This <code>PapayaWhip</code> class doesn&#8217;t define any methods or attributes, but syntactically, there needs to be something in the definition, thus the <code>pass</code> statement. This is a Python reserved word that just means &#8220;move along, nothing to see here&#8221;. It&#8217;s a statement that does nothing, and it&#8217;s a good placeholder when you&#8217;re stubbing out functions or classes.
-
-<blockquote class='note compare java'>
-<p><span class=u>&#x261E;</span>The <code>pass</code> statement in Python is like an empty set of curly braces (<code>{}</code>) in Java or C.
-</blockquote>
-
-<p>Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don&#8217;t have explicit constructors and destructors. Although it&#8217;s not required, Python classes <em>can</em> have something similar to a constructor: the <code>__init__()</code> method.
-
-<h3 id=init-method>The <code>__init__()</code> Method</h3>
-
-<p>This example shows the initialization of the <code>Fib</code> class using the <code>__init__</code> method.
-
-<pre class=pp><code>class Fib:
-<a>    '''iterator that yields numbers in the Fibonacci sequence'''  <span class=u>&#x2460;</span></a>
-
-<a>    def __init__(self, max):                                      <span class=u>&#x2461;</span></a></code></pre>
-<ol>
-<li>Classes can (and should) have <code>docstring</code>s too, just like modules and functions.
-<li>The <code>__init__()</code> method is called immediately after an instance of the class is created. It would be tempting&nbsp;&mdash;&nbsp;but technically incorrect&nbsp;&mdash;&nbsp;to call this the &#8220;constructor&#8221; of the class. It&#8217;s tempting, because it looks like a C++ constructor (by convention, the <code>__init__()</code> method is the first method defined for the class), acts like one (it&#8217;s the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the <code>__init__()</code> method is called, and you already have a valid reference to the new instance of the class.
-</ol>
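-
-<p>One way to see that the object already exists before <code>__init__()</code> runs is to trace the calls. This is only a sketch: the <code>Tracer</code> class and the <var>events</var> list are hypothetical names invented for the illustration, and <code>__new__()</code> is the special method Python calls to actually construct the instance.
-

```python
events = []  # records the order in which Python calls each hook


class Tracer:
    '''hypothetical class, used only to trace instance creation'''

    def __new__(cls, label):
        events.append('__new__')       # the object is constructed here
        return super().__new__(cls)

    def __init__(self, label):
        events.append('__init__')      # self already exists at this point
        self.label = label


t = Tracer('demo')
print(events)
```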
-
-<p>The first argument of every class method, including the <code>__init__()</code> method, is always a reference to the current instance of the class. By convention, this argument is named <var>self</var>. This argument fills the role of the reserved word <code>this</code> in <abbr>C++</abbr> or Java, but <var>self</var> is not a reserved word in Python, merely a naming convention. Nonetheless, please don&#8217;t call it anything but <var>self</var>; this is a very strong convention.
-
-<p>In the <code>__init__()</code> method, <var>self</var> refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify <var>self</var> explicitly when defining the method, you do <em>not</em> specify it when calling the method; Python will add it for you automatically.
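-
-<p>The following sketch shows what &#8220;Python will add it for you automatically&#8221; means in practice. The <code>Counter</code> class is a hypothetical example, not part of the book&#8217;s code; the two calls at the end are equivalent ways of invoking the same method.
-

```python
class Counter:
    '''hypothetical class used to show how self is passed'''

    def __init__(self, start):
        self.count = start            # self is the newly created object

    def increment(self):
        self.count += 1               # self is the instance the method was called on
        return self.count


c = Counter(10)
a = c.increment()           # usual form: Python passes c as self
b = Counter.increment(c)    # explicit form: you pass the instance yourself
print(a, b)
```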
-
-<p class=a>&#x2042;
-
-<h2 id=instantiating-classes>Instantiating Classes</h2>
-
-<p>Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the <code>__init__()</code> method requires. The return value will be the newly created object.
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import fibonacci2</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>fib = fibonacci2.Fib(100)</kbd>  <span class=u>&#x2460;</span></a>
-<a><samp class=p>>>> </samp><kbd class=pp>fib</kbd>                        <span class=u>&#x2461;</span></a>
-<samp class=pp>&lt;fibonacci2.Fib object at 0x00DB8810></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>fib.__class__</kbd>              <span class=u>&#x2462;</span></a>
-<samp class=pp>&lt;class 'fibonacci2.Fib'></samp>
-<a><samp class=p>>>> </samp><kbd class=pp>fib.__doc__</kbd>                <span class=u>&#x2463;</span></a>
-<samp class=pp>'iterator that yields numbers in the Fibonacci sequence'</samp></pre>
-<ol>
-<li>You are creating an instance of the <code>Fib</code> class (defined in the <code>fibonacci2</code> module) and assigning the newly created instance to the variable <var>fib</var>. You are passing one parameter, <code>100</code>, which will end up as the <var>max</var> argument in <code>Fib</code>&#8217;s <code>__init__()</code> method.
-<li><var>fib</var> is now an instance of the <code>Fib</code> class.
-<li>Every class instance has a built-in attribute, <code>__class__</code>, which is the object&#8217;s class. Java programmers may be familiar with the <code>Class</code> class, which contains methods like <code>getName()</code> and <code>getSuperclass()</code> to get metadata information about an object. In Python, this kind of metadata is available through attributes, but the idea is the same.
-<li>You can access the instance&#8217;s <code>docstring</code> just as with a function or a module. All instances of a class share the same <code>docstring</code>.
-</ol>
-
-<blockquote class='note compare java'>
-<p><span class=u>&#x261E;</span>In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit <code>new</code> operator like there is in <abbr>C++</abbr> or Java.
-</blockquote>
-
-<p class=a>&#x2042;
-
-<h2 id=instance-variables>Instance Variables</h2>
-
-<p>On to the next line:
-
-<pre class=pp><code>class Fib:
-    def __init__(self, max):
-<a>        self.max = max        <span class=u>&#x2460;</span></a></code></pre>
-<ol>
-<li>What is <var>self.max</var>? It&#8217;s an instance variable. It is completely separate from <var>max</var>, which was passed into the <code>__init__()</code> method as an argument. <var>self.max</var> is &#8220;global&#8221; to the instance. That means that you can access it from other methods.
-</ol>
-
-<pre class=pp><code>class Fib:
-    def __init__(self, max):
-<a>        self.max = max        <span class=u>&#x2460;</span></a>
-    .
-    .
-    .
-    def __next__(self):
-        fib = self.a
-<a>        if fib > self.max:    <span class=u>&#x2461;</span></a></code></pre>
-<ol>
-<li><var>self.max</var> is defined in the <code>__init__()</code> method&hellip;
-<li>&hellip;and referenced in the <code>__next__()</code> method.
-</ol>
-
-<p>Instance variables are specific to one instance of a class. For example, if you create two <code>Fib</code> instances with different maximum values, they will each remember their own values.
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>import fibonacci2</kbd>
-<samp class=p>>>> </samp><kbd class=pp>fib1 = fibonacci2.Fib(100)</kbd>
-<samp class=p>>>> </samp><kbd class=pp>fib2 = fibonacci2.Fib(200)</kbd>
-<samp class=p>>>> </samp><kbd class=pp>fib1.max</kbd>
-<samp class=pp>100</samp>
-<samp class=p>>>> </samp><kbd class=pp>fib2.max</kbd>
-<samp class=pp>200</samp></pre>
-
-<p class=a>&#x2042;
-
-<h2 id=a-fibonacci-iterator>A Fibonacci Iterator</h2>
-
-<p><em>Now</em> you&#8217;re ready to learn how to build an iterator. An iterator is just a class that defines an <code>__iter__()</code> method and a <code>__next__()</code> method.
-
-<aside class=ots>
-All three of these class methods, <code>__init__</code>, <code>__iter__</code>, and <code>__next__</code>, begin and end with a pair of underscore (<code>_</code>) characters. Why is that? There&#8217;s nothing magical about it, but it usually indicates that these are &#8220;<dfn>special methods</dfn>.&#8221; The only thing &#8220;special&#8221; about special methods is that they aren&#8217;t called directly; Python calls them when you use some other syntax on the class or an instance of the class. <a href=special-method-names.html>More about special methods</a>.
-</aside>
-
-<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
-<pre class=pp><code><a>class Fib:                                        <span class=u>&#x2460;</span></a>
-<a>    def __init__(self, max):                      <span class=u>&#x2461;</span></a>
-        self.max = max
-
-<a>    def __iter__(self):                           <span class=u>&#x2462;</span></a>
-        self.a = 0
-        self.b = 1
-        return self
-
-<a>    def __next__(self):                           <span class=u>&#x2463;</span></a>
-        fib = self.a
-        if fib > self.max:
-<a>            raise StopIteration                   <span class=u>&#x2464;</span></a>
-        self.a, self.b = self.b, self.a + self.b
-<a>        return fib                                <span class=u>&#x2465;</span></a></code></pre>
-<ol>
-<li>To build an iterator from scratch, <code>fib</code> needs to be a class, not a function.
-<li>&#8220;Calling&#8221; <code>Fib(max)</code> is really creating an instance of this class and calling its <code>__init__()</code> method with <var>max</var>. The <code>__init__()</code> method saves the maximum value as an instance variable so other methods can refer to it later.
-<li>The <code>__iter__()</code> method is called whenever someone calls <code>iter(fib)</code>. (As you&#8217;ll see in a minute, a <code>for</code> loop will call this automatically, but you can also call it yourself manually.) After performing beginning-of-iteration initialization (in this case, resetting <code>self.a</code> and <code>self.b</code>, our two counters), the <code>__iter__()</code> method can return any object that implements a <code>__next__()</code> method. In this case (and in most cases), <code>__iter__()</code> simply returns <var>self</var>, since this class implements its own <code>__next__()</code> method.
-<li>The <code>__next__()</code> method is called whenever someone calls <code>next()</code> on an iterator of an instance of a class. That will make more sense in a minute.
-<li>When the <code>__next__()</code> method raises a <code>StopIteration</code> exception, this signals to the caller that the iteration is exhausted. Unlike most exceptions, this is not an error; it&#8217;s a normal condition that just means that the iterator has no more values to generate. If the caller is a <code>for</code> loop, it will notice this <code>StopIteration</code> exception and gracefully exit the loop. (In other words, it will swallow the exception.) This little bit of magic is actually the key to using iterators in <code>for</code> loops.
-<li>To spit out the next value, an iterator&#8217;s <code>__next__()</code> method simply <code>return</code>s the value. Do not use <code>yield</code> here; that&#8217;s a bit of syntactic sugar that only applies when you&#8217;re using generators. Here you&#8217;re creating your own iterator from scratch; use <code>return</code> instead.
-</ol>
-
-<p>Thoroughly confused yet? Excellent. Let&#8217;s see how to call this iterator:
-
-<pre class='nd screen'>
-<samp class=p>>>> </samp><kbd class=pp>from fibonacci2 import Fib</kbd>
-<samp class=p>>>> </samp><kbd class=pp>for n in Fib(1000):</kbd>
-<samp class=p>... </samp><kbd class=pp>    print(n, end=' ')</kbd>
-<samp class=pp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp></pre>
-
-<p>Why, it&#8217;s exactly the same! Byte for byte identical to how you called <a href=generators.html#a-fibonacci-generator>Fibonacci-as-a-generator</a> (modulo one capital letter). But how?
-
-<p>There&#8217;s a bit of magic involved in <code>for</code> loops. Here&#8217;s what happens:
-
-<ul>
-<li>The <code>for</code> loop calls <code>Fib(1000)</code>, as shown. This returns an instance of the <code>Fib</code> class. Call this <var>fib_inst</var>.
-<li>Secretly, and quite cleverly, the <code>for</code> loop calls <code>iter(fib_inst)</code>, which returns an iterator object. Call this <var>fib_iter</var>. In this case, <var>fib_iter</var> == <var>fib_inst</var>, because the <code>__iter__()</code> method returns <var>self</var>, but the <code>for</code> loop doesn&#8217;t know (or care) about that.
-<li>To &#8220;loop through&#8221; the iterator, the <code>for</code> loop calls <code>next(fib_iter)</code>, which calls the <code>__next__()</code> method on the <code>fib_iter</code> object, which does the next-Fibonacci-number calculations and returns a value. The <code>for</code> loop takes this value and assigns it to <var>n</var>, then executes the body of the <code>for</code> loop for that value of <var>n</var>.
-<li>How does the <code>for</code> loop know when to stop? I&#8217;m glad you asked! When <code>next(fib_iter)</code> raises a <code>StopIteration</code> exception, the <code>for</code> loop will swallow the exception and gracefully exit. (Any other exception will pass through and be raised as usual.) And where have you seen a <code>StopIteration</code> exception? In the <code>__next__()</code> method, of course!
-</ul>
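The steps a <code>for</code> loop performs can be spelled out by hand. The following is a minimal sketch, not how the interpreter is actually implemented: it repeats the <code>Fib</code> class from above so it runs on its own, and <code>loop_by_hand()</code> is a hypothetical helper name introduced only for this illustration.

```python
class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


def loop_by_hand(iterable):
    '''Hypothetical helper: fetch values the way a for loop would.'''
    results = []
    fib_iter = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            n = next(fib_iter)       # calls fib_iter.__next__()
        except StopIteration:        # the signal a for loop swallows
            break
        results.append(n)            # stands in for the loop body
    return results


values = loop_by_hand(Fib(1000))
print(values)
```

Any exception other than <code>StopIteration</code> would pass straight through the <code>try</code> block here, just as it would pass through a real <code>for</code> loop.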
-
-<p class=a>&#x2042;
-
-<h2 id=a-plural-rule-iterator>A Plural Rule Iterator</h2>
-
-<aside>iter(f) calls f.__iter__<br>next(f) calls f.__next__</aside>
-<p>Now it&#8217;s time for the finale. Let&#8217;s rewrite the <a href=generators.html>plural rules generator</a> as an iterator.
-
-<p class=d>[<a href=examples/plural6.py>download <code>plural6.py</code></a>]
-<pre class=pp><code>class LazyRules:
-    rules_filename = 'plural6-rules.txt'
-
-    def __init__(self):
-        self.pattern_file = open(self.rules_filename, encoding='utf-8')
-        self.cache = []
-
-    def __iter__(self):
-        self.cache_index = 0
-        return self
-
-    def __next__(self):
-        self.cache_index += 1
-        if len(self.cache) >= self.cache_index:
-            return self.cache[self.cache_index - 1]
-
-        if self.pattern_file.closed:
-            raise StopIteration
-
-        line = self.pattern_file.readline()
-        if not line:
-            self.pattern_file.close()
-            raise StopIteration
-
-        pattern, search, replace = line.split(None, 3)
-        funcs = build_match_and_apply_functions(
-            pattern, search, replace)
-        self.cache.append(funcs)
-        return funcs
-
-rules = LazyRules()</code></pre>
-
-<p>So this is a class that implements <code>__iter__()</code> and <code>__next__()</code>, so it can be used as an iterator. Then, you instantiate the class and assign it to <var>rules</var>. This happens just once, on import.
-
-<p>Let&#8217;s take the class one bite at a time.
-
-<pre class=pp><code>class LazyRules:
-    rules_filename = 'plural6-rules.txt'
-
-    def __init__(self):
-<a>        self.pattern_file = open(self.rules_filename, encoding='utf-8')  <span class=u>&#x2460;</span></a>
-<a>        self.cache = []                                                  <span class=u>&#x2461;</span></a></code></pre>
-<ol>
-<li>When we instantiate the <code>LazyRules</code> class, open the pattern file but don&#8217;t read anything from it. (That comes later.)
-<li>After opening the patterns file, initialize the cache. You&#8217;ll use this cache later (in the <code>__next__()</code> method) as you read lines from the pattern file.
-</ol>
-
-<p>Before we continue, let&#8217;s take a closer look at <var>rules_filename</var>. It&#8217;s not defined within the <code>__iter__()</code> method. In fact, it&#8217;s not defined within <em>any</em> method. It&#8217;s defined at the class level. It&#8217;s a <i>class variable</i>, and although you can access it just like an instance variable (<var>self.rules_filename</var>), it is shared across all instances of the <code>LazyRules</code> class.
-
-<pre class=screen>
-<samp class=p>>>> </samp><kbd class=pp>import plural6</kbd>
-<samp class=p>>>> </samp><kbd class=pp>r1 = plural6.LazyRules()</kbd>
-<samp class=p>>>> </samp><kbd class=pp>r2 = plural6.LazyRules()</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>r1.rules_filename</kbd>                               <span class=u>&#x2460;</span></a>
-<samp class=pp>'plural6-rules.txt'</samp>
-<samp class=p>>>> </samp><kbd class=pp>r2.rules_filename</kbd>
-<samp class=pp>'plural6-rules.txt'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>r2.rules_filename = 'r2-override.txt'</kbd>           <span class=u>&#x2461;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>r2.rules_filename</kbd>
-<samp class=pp>'r2-override.txt'</samp>
-<samp class=p>>>> </samp><kbd class=pp>r1.rules_filename</kbd>
-<samp class=pp>'plural6-rules.txt'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>r2.__class__.rules_filename</kbd>                     <span class=u>&#x2462;</span></a>
-<samp class=pp>'plural6-rules.txt'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>r2.__class__.rules_filename = 'papayawhip.txt'</kbd>  <span class=u>&#x2463;</span></a>
-<samp class=p>>>> </samp><kbd class=pp>r1.rules_filename</kbd>
-<samp class=pp>'papayawhip.txt'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>r2.rules_filename</kbd>                               <span class=u>&#x2464;</span></a>
-<samp class=pp>'r2-override.txt'</samp></pre>
-<ol>
-<li>Each instance of the class inherits the <var>rules_filename</var> attribute with the value defined by the class.
-<li>Changing the attribute&#8217;s value in one instance does not affect other instances&hellip;
-<li>&hellip;nor does it change the class attribute. You can access the class attribute (as opposed to an individual instance&#8217;s attribute) by using the special <code>__class__</code> attribute to access the class itself.
-<li>If you change the class attribute, all instances that are still inheriting that value (like <var>r1</var> here) will be affected.
-<li>Instances that have overridden that attribute (like <var>r2</var> here) will not be affected.
-</ol>
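One way to see why the two instances behave differently is to look at where each value is stored. This is a small sketch using a hypothetical stand-in class, <code>LazyRulesStub</code>, which has the same class variable but never opens a file. It relies on the built-in <code>vars()</code> function, which returns an instance&#8217;s own attribute dictionary.

```python
class LazyRulesStub:
    '''Hypothetical stand-in for LazyRules; it never opens a file.'''
    rules_filename = 'plural6-rules.txt'


r1 = LazyRulesStub()
r2 = LazyRulesStub()

# Assigning through an instance creates an attribute on that instance only.
r2.rules_filename = 'r2-override.txt'
r1_has_own = 'rules_filename' in vars(r1)   # r1 still relies on the class
r2_has_own = 'rules_filename' in vars(r2)   # r2 now shadows the class value

# Changing the class attribute is visible through r1, but not through r2.
LazyRulesStub.rules_filename = 'papayawhip.txt'
seen_by_r1 = r1.rules_filename
seen_by_r2 = r2.rules_filename

# Deleting the instance attribute exposes the class attribute again.
del r2.rules_filename
seen_by_r2_after_del = r2.rules_filename
```

Attribute lookup checks the instance first and falls back to the class, which is why the override on <var>r2</var> hides later changes to the class attribute.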
-
-<p>And now back to our show.
-
-<pre class=pp><code><a>    def __iter__(self):       <span class=u>&#x2460;</span></a>
-        self.cache_index = 0
-<a>        return self           <span class=u>&#x2461;</span></a>
-</code></pre>
-<ol>
-<li>The <code>__iter__()</code> method will be called every time someone&nbsp;&mdash;&nbsp;say, a <code>for</code> loop&nbsp;&mdash;&nbsp;calls <code>iter(rules)</code>.
-<li>The one thing that every <code>__iter__()</code> method must do is return an iterator. In this case, it returns <var>self</var>, which signals that this class defines a <code>__next__()</code> method which will take care of returning values throughout the iteration.
-</ol>
-
-<pre class=pp><code><a>    def __next__(self):                                 <span class=u>&#x2460;</span></a>
-        .
-        .
-        .
-        pattern, search, replace = line.split(None, 3)
-<a>        funcs = build_match_and_apply_functions(        <span class=u>&#x2461;</span></a>
-            pattern, search, replace)
-<a>        self.cache.append(funcs)                        <span class=u>&#x2462;</span></a>
-        return funcs</code></pre>
-<ol>
-<li>The <code>__next__()</code> method gets called whenever someone&nbsp;&mdash;&nbsp;say, a <code>for</code> loop&nbsp;&mdash;&nbsp;calls <code>next(rules)</code>. This method will only make sense if we start at the end and work backwards. So let&#8217;s do that.
-<li>The last part of this function should look familiar, at least. The <code>build_match_and_apply_functions()</code> function hasn&#8217;t changed; it&#8217;s the same as it ever was.
-<li>The only difference is that, before returning the match and apply functions (which are stored in the tuple <var>funcs</var>), we&#8217;re going to save them in <code>self.cache</code>.
-</ol>
-
-<p>Moving backwards&hellip;
-
-<pre class=pp><code>    def __next__(self):
-        .
-        .
-        .
-<a>        line = self.pattern_file.readline()  <span class=u>&#x2460;</span></a>
-<a>        if not line:                         <span class=u>&#x2461;</span></a>
-            self.pattern_file.close()
-<a>            raise StopIteration              <span class=u>&#x2462;</span></a>
-        .
-        .
-        .</code></pre>
-<ol>
-<li>A bit of advanced file trickery here. The <code>readline()</code> method (note: singular, not the plural <code>readlines()</code>) reads exactly one line from an open file. Specifically, the next line. (<em>File objects are iterators too! It&#8217;s iterators all the way down&hellip;</em>)
-<li>If there was a line for <code>readline()</code> to read, <var>line</var> will not be an empty string. Even if the file contained a blank line, <var>line</var> would end up as the one-character string <code>'\n'</code> (a newline). If <var>line</var> is really an empty string, that means there are no more lines to read from the file.
-<li>When we reach the end of the file, we should close the file and raise the magic <code>StopIteration</code> exception. Remember, we got to this point because we needed a match and apply function for the next rule. The next rule comes from the next line of the file&hellip; but there is no next line! Therefore, we have no value to return. The iteration is over. (<span class=u>&#x266B;</span> The party&#8217;s over&hellip; <span class=u>&#x266B;</span>)
-</ol>
-
-<p>Moving backwards all the way to the start of the <code>__next__()</code> method&hellip;
-
-<pre class=pp><code>    def __next__(self):
-        self.cache_index += 1
-        if len(self.cache) >= self.cache_index:
-<a>            return self.cache[self.cache_index - 1]     <span class=u>&#x2460;</span></a>
-
-        if self.pattern_file.closed:
-<a>            raise StopIteration                         <span class=u>&#x2461;</span></a>
-        .
-        .
-        .</code></pre>
-<ol>
-<li><code>self.cache</code> will be a list of the functions we need to match and apply individual rules. (At least <em>that</em> should sound familiar!) <code>self.cache_index</code> keeps track of which cached item we should return next. If we haven&#8217;t exhausted the cache yet (<i>i.e.</i> if the length of <code>self.cache</code> is at least <code>self.cache_index</code>, which was just incremented), then we have a cache hit! Hooray! We can return the match and apply functions from the cache instead of building them from scratch.
-<li>On the other hand, if we don&#8217;t get a hit from the cache, <em>and</em> the file object has been closed (which could happen, further down the method, as you saw in the previous code snippet), then there&#8217;s nothing more we can do. If the file is closed, it means we&#8217;ve exhausted it&nbsp;&mdash;&nbsp;we&#8217;ve already read through every line from the pattern file, and we&#8217;ve already built and cached the match and apply functions for each pattern. The file is exhausted; the cache is exhausted; I&#8217;m exhausted. Wait, what? Hang in there, we&#8217;re almost done.
-</ol>
-
-<p>Putting it all together, here&#8217;s what happens:
-
-<ul>
-<li>When the module is imported, it creates a single instance of the <code>LazyRules</code> class, called <var>rules</var>, which opens the pattern file but does not read from it.
-<li>When asked for the first match and apply function, it checks its cache but finds the cache is empty. So it reads a single line from the pattern file, builds the match and apply functions from those patterns, and caches them.
-<li>Let&#8217;s say, for the sake of argument, that the very first rule matched. If so, no further match and apply functions are built, and no further lines are read from the pattern file.
-<li>Furthermore, for the sake of argument, suppose that the caller calls the <code>plural()</code> function <em>again</em> to pluralize a different word. The <code>for</code> loop in the <code>plural()</code> function will call <code>iter(rules)</code>, which will reset the cache index but will not reset the open file object.
-<li>The first time through, the <code>for</code> loop will ask for a value from <var>rules</var>, which will invoke its <code>__next__()</code> method. This time, however, the cache is primed with a single pair of match and apply functions, corresponding to the patterns in the first line of the pattern file. Since they were built and cached in the course of pluralizing the previous word, they&#8217;re retrieved from the cache. The cache index increments, and the open file is never touched.
-<li>Let&#8217;s say, for the sake of argument, that the first rule does <em>not</em> match this time around. So the <code>for</code> loop comes around again and asks for another value from <var>rules</var>. This invokes the <code>__next__()</code> method a second time. This time, the cache is exhausted&nbsp;&mdash;&nbsp;it only contained one item, and we&#8217;re asking for a second&nbsp;&mdash;&nbsp;so the <code>__next__()</code> method continues. It reads another line from the open file, builds match and apply functions out of the patterns, and caches them.
-<li>This read-build-and-cache process will continue as long as the rules being read from the pattern file don&#8217;t match the word we&#8217;re trying to pluralize. If we do find a matching rule before the end of the file, we simply use it and stop, with the file still open. The file pointer will stay wherever we stopped reading, waiting for the next <code>readline()</code> command. In the meantime, the cache now has more items in it, and if we start all over again trying to pluralize a new word, each of those items in the cache will be tried before reading the next line from the pattern file.
-</ul>
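The sequence above can be observed directly. The sketch below makes several assumptions: <code>LazyRulesDemo</code> is a hypothetical variant of <code>LazyRules</code> that takes an already-open file object and counts the lines it reads, the rules come from an in-memory <code>io.StringIO</code> instead of <code>plural6-rules.txt</code>, and the four sample rules are illustrative only. <code>build_match_and_apply_functions()</code> is repeated so that the block runs on its own.

```python
import io
import re


def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)

    def apply_rule(word):
        return re.sub(search, replace, word)

    return (matches_rule, apply_rule)


class LazyRulesDemo:
    '''Hypothetical variant of LazyRules that counts lines read.'''

    def __init__(self, pattern_file):
        self.pattern_file = pattern_file   # assumed to be open already
        self.cache = []
        self.lines_read = 0                # added for this demonstration

    def __iter__(self):
        self.cache_index = 0
        return self

    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
            return self.cache[self.cache_index - 1]

        if self.pattern_file.closed:
            raise StopIteration

        line = self.pattern_file.readline()
        if not line:
            self.pattern_file.close()
            raise StopIteration

        self.lines_read += 1
        pattern, search, replace = line.split(None, 3)
        funcs = build_match_and_apply_functions(pattern, search, replace)
        self.cache.append(funcs)
        return funcs


# Illustrative rules only; the real ones live in the pattern file.
RULES_TEXT = (
    '[sxz]$ $ es\n'
    '[^aeioudgkprt]h$ $ es\n'
    '[^aeiou]y$ y$ ies\n'
    '$ $ s\n'
)
rules = LazyRulesDemo(io.StringIO(RULES_TEXT))


def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)
    raise ValueError('no matching rule found for {0}'.format(noun))


first = plural('fax')          # first rule matches, so one line is read
reads_after_first = rules.lines_read
second = plural('vacancy')     # rule 1 comes from the cache, rules 2 and 3 are read
reads_after_second = rules.lines_read
third = plural('fax')          # served entirely from the cache
reads_after_third = rules.lines_read
```

The line counter only grows when a word needs a rule that has not been built yet, which is the behavior the list above describes.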
-
-<p>We have achieved pluralization nirvana.
-
-<ol>
-<li><strong>Minimal startup cost.</strong> The only thing that happens on <code>import</code> is instantiating a single class and opening a file (but not reading from it).
-<li><strong>Maximum performance.</strong> The previous example would read through the file and build functions dynamically every time you wanted to pluralize a word. This version will cache functions as soon as they&#8217;re built, and in the worst case, it will only read through the pattern file once, no matter how many words you pluralize.
-<li><strong>Separation of code and data.</strong> All the patterns are stored in a separate file. Code is code, and data is data, and never the twain shall meet.
-</ol>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>Is this really nirvana? Well, yes and no. Here&#8217;s something to consider with the <code>LazyRules</code> example: the pattern file is opened (during <code>__init__()</code>), and it remains open until the final rule is reached. Python will eventually close the file when it exits, or after the last instance of the <code>LazyRules</code> class is destroyed, but still, that could be a <em>long</em> time. If this class is part of a long-running Python process, the Python interpreter may never exit, and the <code>LazyRules</code> object may never get destroyed.
-<p>There are ways around this. Instead of opening the file during <code>__init__()</code> and leaving it open while you read rules one line at a time, you could open the file, read all the rules, and immediately close the file. Or you could open the file, read one rule, save the file position with the <a href=files.html#read><code>tell()</code> method</a>, close the file, and later re-open it and use the <a href=files.html#read><code>seek()</code> method</a> to continue reading where you left off. Or you could not worry about it and just leave the file open, like this example code does. Programming is design, and design is all about trade-offs and constraints. Leaving a file open too long might be a problem; making your code more complicated might be a problem. Which one is the bigger problem depends on your development team, your application, and your runtime environment.
-</blockquote>
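The <code>tell()</code> and <code>seek()</code> alternative mentioned in the note might look roughly like this. It is only a sketch under stated assumptions: <code>RuleLineReader</code> and <code>read_next_line()</code> are hypothetical names, the rules are written to a temporary file so the example is self-contained, and building the match and apply functions is left out to keep the focus on the file handling.

```python
import os
import tempfile


class RuleLineReader:
    '''Hypothetical reader that reopens the file for every line.'''

    def __init__(self, filename):
        self.filename = filename
        self.position = 0        # where the next read should start
        self.exhausted = False

    def read_next_line(self):
        if self.exhausted:
            return None
        # The file is open only for the duration of this block.
        with open(self.filename, encoding='utf-8') as pattern_file:
            pattern_file.seek(self.position)
            line = pattern_file.readline()
            self.position = pattern_file.tell()
        if not line:
            self.exhausted = True
            return None
        return line


# Write two illustrative rules to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile(
        mode='w', encoding='utf-8', suffix='.txt', delete=False) as tmp:
    tmp.write('[sxz]$ $ es\n$ $ s\n')
    tmp_name = tmp.name

reader = RuleLineReader(tmp_name)
first_line = reader.read_next_line()
second_line = reader.read_next_line()
third_line = reader.read_next_line()   # None: nothing left to read

os.remove(tmp_name)
```

The trade-off is the one the note describes: no file handle stays open between calls, at the cost of reopening the file for every rule and carrying extra state in the class.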
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-<ul>
-<li><a href=http://docs.python.org/3.1/library/stdtypes.html#iterator-types>Iterator types</a>
-<li><a href=http://www.python.org/dev/peps/pep-0234/>PEP 234: Iterators</a>
-<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
-<li><a href=http://www.dabeaz.com/generators/>Generator Tricks for Systems Programmers</a>
-</ul>
-
-<p class=v><a href=generators.html rel=prev title='back to &#8220;Closures &amp; Generators&#8221;'><span class=u>&#x261C;</span></a> <a href=advanced-iterators.html rel=next title='onward to &#8220;Advanced Iterators&#8221;'><span class=u>&#x261E;</span></a>
-
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Classes &amp; Iterators - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 7}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#iterators>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
+<h1>Classes <i class=baa>&amp;</i> Iterators</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> East is East, and West is West, and never the twain shall meet. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Rudyard_Kipling>Rudyard Kipling</a>
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>Iterators are the &#8220;secret sauce&#8221; of Python 3. They&#8217;re everywhere, underlying everything, always just out of sight. <a href=comprehensions.html>Comprehensions</a> are just a simple form of <i>iterators</i>. Generators are just a simple form of <i>iterators</i>. A function that <code>yield</code>s values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that.
+
+<p>Remember <a href=generators.html#a-fibonacci-generator>the Fibonacci generator</a>? Here it is as a built-from-scratch iterator:
+
+<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
+<pre class=pp><code>class Fib:
+    '''iterator that yields numbers in the Fibonacci sequence'''
+
+    def __init__(self, max):
+        self.max = max
+
+    def __iter__(self):
+        self.a = 0
+        self.b = 1
+        return self
+
+    def __next__(self):
+        fib = self.a
+        if fib > self.max:
+            raise StopIteration
+        self.a, self.b = self.b, self.a + self.b
+        return fib</code></pre>
+
+<p>Let&#8217;s take that one line at a time.
+
+<pre class='nd pp'><code>class Fib:</code></pre>
+
+<p><code>class</code>? What&#8217;s a class?
+
+<p class=a>&#x2042;
+
+<h2 id=defining-classes>Defining Classes</h2>
+
+<p>Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you&#8217;ve defined.
+
+<p>Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word <code>class</code>, followed by the class name. Technically, that&#8217;s all that&#8217;s required, since a class doesn&#8217;t need to inherit from any other class.
+
+<pre class=pp><code><a>class PapayaWhip:  <span class=u>&#x2460;</span></a>
+<a>    pass           <span class=u>&#x2461;</span></a></code></pre>
+<ol>
+<li>The name of this class is <code>PapayaWhip</code>, and it doesn&#8217;t inherit from any other class. Class names are usually capitalized, <code>EachWordLikeThis</code>, but this is only a convention, not a requirement.
+<li>You probably guessed this, but everything in a class is indented, just like the code within a function, <code>if</code> statement, <code>for</code> loop, or any other block of code. The first line not indented is outside the class.
+</ol>
+
+<p>This <code>PapayaWhip</code> class doesn&#8217;t define any methods or attributes, but syntactically, there needs to be something in the definition, thus the <code>pass</code> statement. This is a Python reserved word that just means &#8220;move along, nothing to see here&#8221;. It&#8217;s a statement that does nothing, and it&#8217;s a good placeholder when you&#8217;re stubbing out functions or classes.
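As a quick illustration of <code>pass</code> as a placeholder, here is a minimal sketch. The function name <code>not_written_yet()</code> is hypothetical and exists only for this example.

```python
class PapayaWhip:
    pass                    # the class body must contain at least one statement


def not_written_yet():
    pass                    # stubbed-out function: does nothing when called


whip = PapayaWhip()         # an empty class can still be instantiated
result = not_written_yet()  # a function that falls off the end returns None
```

Both definitions are complete as far as Python is concerned, so the rest of a module can be written and imported before the stubs are filled in.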
+
+<blockquote class='note compare java'>
+<p><span class=u>&#x261E;</span>The <code>pass</code> statement in Python is like an empty set of curly braces (<code>{}</code>) in Java or C.
+</blockquote>
+
+<p>Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don&#8217;t have explicit constructors and destructors. Although it&#8217;s not required, Python classes <em>can</em> have something similar to a constructor: the <code>__init__()</code> method.
+
+<h3 id=init-method>The <code>__init__()</code> Method</h3>
+
+<p>This example shows the initialization of the <code>Fib</code> class using the <code>__init__</code> method.
+
+<pre class=pp><code>class Fib:
+<a>    '''iterator that yields numbers in the Fibonacci sequence'''  <span class=u>&#x2460;</span></a>
+
+<a>    def __init__(self, max):                                      <span class=u>&#x2461;</span></a></code></pre>
+<ol>
+<li>Classes can (and should) have <code>docstring</code>s too, just like modules and functions.
+<li>The <code>__init__()</code> method is called immediately after an instance of the class is created. It would be tempting&nbsp;&mdash;&nbsp;but technically incorrect&nbsp;&mdash;&nbsp;to call this the &#8220;constructor&#8221; of the class. It&#8217;s tempting, because it looks like a C++ constructor (by convention, the <code>__init__()</code> method is the first method defined for the class), acts like one (it&#8217;s the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the <code>__init__()</code> method is called, and you already have a valid reference to the new instance of the class.
+</ol>
+
+<p>The first argument of every class method, including the <code>__init__()</code> method, is always a reference to the current instance of the class. By convention, this argument is named <var>self</var>. This argument fills the role of the reserved word <code>this</code> in <abbr>C++</abbr> or Java, but <var>self</var> is not a reserved word in Python, merely a naming convention. Nonetheless, please don&#8217;t call it anything but <var>self</var>; this is a very strong convention.
+
+<p>In the <code>__init__()</code> method, <var>self</var> refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify <var>self</var> explicitly when defining the method, you do <em>not</em> specify it when calling the method; Python will add it for you automatically.
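To make the automatic <var>self</var> argument concrete, here is a small sketch. The <code>describe()</code> method is hypothetical and is not part of <code>fibonacci2.py</code>; it is added only to have an ordinary method to call.

```python
class Fib:
    def __init__(self, max):
        self.max = max      # self is the newly created instance

    def describe(self):     # hypothetical method, for illustration only
        return 'Fibonacci numbers up to {0}'.format(self.max)


fib = Fib(100)

# Usual form: Python supplies fib as self.
implicit = fib.describe()

# Equivalent form: look the function up on the class and pass the instance.
explicit = Fib.describe(fib)
```

The two calls do the same work. The first form is the one you would normally write.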
+
+<p class=a>&#x2042;
+
+<h2 id=instantiating-classes>Instantiating Classes</h2>
+
+<p>Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the <code>__init__()</code> method requires. The return value will be the newly created object.
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import fibonacci2</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>fib = fibonacci2.Fib(100)</kbd>  <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>fib</kbd>                        <span class=u>&#x2461;</span></a>
+<samp class=pp>&lt;fibonacci2.Fib object at 0x00DB8810></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>fib.__class__</kbd>              <span class=u>&#x2462;</span></a>
+<samp class=pp>&lt;class 'fibonacci2.Fib'></samp>
+<a><samp class=p>>>> </samp><kbd class=pp>fib.__doc__</kbd>                <span class=u>&#x2463;</span></a>
+<samp class=pp>'iterator that yields numbers in the Fibonacci sequence'</samp></pre>
+<ol>
+<li>You are creating an instance of the <code>Fib</code> class (defined in the <code>fibonacci2</code> module) and assigning the newly created instance to the variable <var>fib</var>. You are passing one parameter, <code>100</code>, which will end up as the <var>max</var> argument in <code>Fib</code>&#8217;s <code>__init__()</code> method.
+<li><var>fib</var> is now an instance of the <code>Fib</code> class.
+<li>Every class instance has a built-in attribute, <code>__class__</code>, which is the object&#8217;s class. Java programmers may be familiar with the <code>Class</code> class, which contains methods like <code>getName()</code> and <code>getSuperclass()</code> to get metadata about an object. In Python, this kind of metadata is available through attributes, but the idea is the same.
+<li>You can access the instance&#8217;s <code>docstring</code> just as with a function or a module. All instances of a class share the same <code>docstring</code>.
+</ol>
+
+<blockquote class='note compare java'>
+<p><span class=u>&#x261E;</span>In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit <code>new</code> operator like there is in <abbr>C++</abbr> or Java.
+</blockquote>
+
+<p class=a>&#x2042;
+
+<h2 id=instance-variables>Instance Variables</h2>
+
+<p>On to the next line:
+
+<pre class=pp><code>class Fib:
+    def __init__(self, max):
+<a>        self.max = max        <span class=u>&#x2460;</span></a></code></pre>
+<ol>
+<li>What is <var>self.max</var>? It&#8217;s an instance variable. It is completely separate from <var>max</var>, which was passed into the <code>__init__()</code> method as an argument. <var>self.max</var> is &#8220;global&#8221; to the instance. That means that you can access it from other methods.
+</ol>
+
+<pre class=pp><code>class Fib:
+    def __init__(self, max):
+<a>        self.max = max        <span class=u>&#x2460;</span></a>
+    .
+    .
+    .
+    def __next__(self):
+        fib = self.a
+<a>        if fib > self.max:    <span class=u>&#x2461;</span></a></code></pre>
+<ol>
+<li><var>self.max</var> is defined in the <code>__init__()</code> method&hellip;
+<li>&hellip;and referenced in the <code>__next__()</code> method.
+</ol>
+
+<p>Instance variables are specific to one instance of a class. For example, if you create two <code>Fib</code> instances with different maximum values, they will each remember their own values.
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>import fibonacci2</kbd>
+<samp class=p>>>> </samp><kbd class=pp>fib1 = fibonacci2.Fib(100)</kbd>
+<samp class=p>>>> </samp><kbd class=pp>fib2 = fibonacci2.Fib(200)</kbd>
+<samp class=p>>>> </samp><kbd class=pp>fib1.max</kbd>
+<samp class=pp>100</samp>
+<samp class=p>>>> </samp><kbd class=pp>fib2.max</kbd>
+<samp class=pp>200</samp></pre>
+
+<p class=a>&#x2042;
+
+<h2 id=a-fibonacci-iterator>A Fibonacci Iterator</h2>
+
+<p><em>Now</em> you&#8217;re ready to learn how to build an iterator. An iterator is just a class that defines an <code>__iter__()</code> method and a <code>__next__()</code> method.
+
+<aside class=ots>
+All three of these class methods, <code>__init__</code>, <code>__iter__</code>, and <code>__next__</code>, begin and end with a pair of underscore (<code>_</code>) characters. Why is that? There&#8217;s nothing magical about it, but it usually indicates that these are &#8220;<dfn>special methods</dfn>.&#8221; The only thing &#8220;special&#8221; about special methods is that they aren&#8217;t called directly; Python calls them when you use some other syntax on the class or an instance of the class. <a href=special-method-names.html>More about special methods</a>.
+</aside>
+
+<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
+<pre class=pp><code><a>class Fib:                                        <span class=u>&#x2460;</span></a>
+<a>    def __init__(self, max):                      <span class=u>&#x2461;</span></a>
+        self.max = max
+
+<a>    def __iter__(self):                           <span class=u>&#x2462;</span></a>
+        self.a = 0
+        self.b = 1
+        return self
+
+<a>    def __next__(self):                           <span class=u>&#x2463;</span></a>
+        fib = self.a
+        if fib > self.max:
+<a>            raise StopIteration                   <span class=u>&#x2464;</span></a>
+        self.a, self.b = self.b, self.a + self.b
+<a>        return fib                                <span class=u>&#x2465;</span></a></code></pre>
+<ol>
+<li>To build an iterator from scratch, <code>Fib</code> needs to be a class, not a function.
+<li>&#8220;Calling&#8221; <code>Fib(max)</code> is really creating an instance of this class and calling its <code>__init__()</code> method with <var>max</var>. The <code>__init__()</code> method saves the maximum value as an instance variable so other methods can refer to it later.
+<li>The <code>__iter__()</code> method is called whenever someone calls <code>iter(fib)</code>. (As you&#8217;ll see in a minute, a <code>for</code> loop will call this automatically, but you can also call it yourself manually.) After performing beginning-of-iteration initialization (in this case, resetting <code>self.a</code> and <code>self.b</code>, our two counters), the <code>__iter__()</code> method can return any object that implements a <code>__next__()</code> method. In this case (and in most cases), <code>__iter__()</code> simply returns <var>self</var>, since this class implements its own <code>__next__()</code> method.
+<li>The <code>__next__()</code> method is called whenever someone calls <code>next()</code> on the iterator that <code>__iter__()</code> returned. That will make more sense in a minute.
+<li>When the <code>__next__()</code> method raises a <code>StopIteration</code> exception, this signals to the caller that the iteration is exhausted. Unlike most exceptions, this is not an error; it&#8217;s a normal condition that just means that the iterator has no more values to generate. If the caller is a <code>for</code> loop, it will notice this <code>StopIteration</code> exception and gracefully exit the loop. (In other words, it will swallow the exception.) This little bit of magic is actually the key to using iterators in <code>for</code> loops.
+<li>To spit out the next value, an iterator&#8217;s <code>__next__()</code> method simply <code>return</code>s the value. Do not use <code>yield</code> here; that&#8217;s a bit of syntactic sugar that only applies when you&#8217;re using generators. Here you&#8217;re creating your own iterator from scratch; use <code>return</code> instead.
+</ol>
+
+<p>Thoroughly confused yet? Excellent. Let&#8217;s see how to call this iterator:
+
+<pre class='nd screen'>
+<samp class=p>>>> </samp><kbd class=pp>from fibonacci2 import Fib</kbd>
+<samp class=p>>>> </samp><kbd class=pp>for n in Fib(1000):</kbd>
+<samp class=p>... </samp><kbd class=pp>    print(n, end=' ')</kbd>
+<samp class=pp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp></pre>
+
+<p>Why, it&#8217;s exactly the same! Byte for byte identical to how you called <a href=generators.html#a-fibonacci-generator>Fibonacci-as-a-generator</a> (modulo one capital letter). But how?
+
+<p>There&#8217;s a bit of magic involved in <code>for</code> loops. Here&#8217;s what happens:
+
+<ul>
+<li>The <code>for</code> loop calls <code>Fib(1000)</code>, as shown. This returns an instance of the <code>Fib</code> class. Call this <var>fib_inst</var>.
+<li>Secretly, and quite cleverly, the <code>for</code> loop calls <code>iter(fib_inst)</code>, which returns an iterator object. Call this <var>fib_iter</var>. In this case, <var>fib_iter</var> == <var>fib_inst</var>, because the <code>__iter__()</code> method returns <var>self</var>, but the <code>for</code> loop doesn&#8217;t know (or care) about that.
+<li>To &#8220;loop through&#8221; the iterator, the <code>for</code> loop calls <code>next(fib_iter)</code>, which calls the <code>__next__()</code> method on the <code>fib_iter</code> object, which does the next-Fibonacci-number calculations and returns a value. The <code>for</code> loop takes this value and assigns it to <var>n</var>, then executes the body of the <code>for</code> loop for that value of <var>n</var>.
+<li>How does the <code>for</code> loop know when to stop? I&#8217;m glad you asked! When <code>next(fib_iter)</code> raises a <code>StopIteration</code> exception, the <code>for</code> loop will swallow the exception and gracefully exit. (Any other exception will pass through and be raised as usual.) And where have you seen a <code>StopIteration</code> exception? In the <code>__next__()</code> method, of course!
+</ul>
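+
+<p>The steps above can be sketched in plain Python. This is only an approximation of what a <code>for</code> loop does, not its actual implementation; <code>manual_for()</code> is a hypothetical helper invented for this illustration, and the <code>Fib</code> class is repeated from <code>fibonacci2.py</code> so the sketch runs on its own.
+
```python
class Fib:
    """Same class as fibonacci2.py, repeated so this sketch is self-contained."""
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


def manual_for(iterable):
    """Hypothetical helper: fetch values the way a for loop would."""
    collected = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            n = next(iterator)       # calls iterator.__next__()
        except StopIteration:
            break                    # swallow the exception and leave the loop
        collected.append(n)          # stands in for the body of the loop
    return collected


values = manual_for(Fib(20))
```
+
+<p>Any exception other than <code>StopIteration</code> is not caught by the <code>try</code> block, so it passes through to the caller, just as it would in a real <code>for</code> loop.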
+
+<p class=a>&#x2042;
+
+<h2 id=a-plural-rule-iterator>A Plural Rule Iterator</h2>
+
+<aside>iter(f) calls f.__iter__<br>next(f) calls f.__next__</aside>
+<p>Now it&#8217;s time for the finale. Let&#8217;s rewrite the <a href=generators.html>plural rules generator</a> as an iterator.
+
+<p class=d>[<a href=examples/plural6.py>download <code>plural6.py</code></a>]
+<pre class=pp><code>class LazyRules:
+    rules_filename = 'plural6-rules.txt'
+
+    def __init__(self):
+        self.pattern_file = open(self.rules_filename, encoding='utf-8')
+        self.cache = []
+
+    def __iter__(self):
+        self.cache_index = 0
+        return self
+
+    def __next__(self):
+        self.cache_index += 1
+        if len(self.cache) >= self.cache_index:
+            return self.cache[self.cache_index - 1]
+
+        if self.pattern_file.closed:
+            raise StopIteration
+
+        line = self.pattern_file.readline()
+        if not line:
+            self.pattern_file.close()
+            raise StopIteration
+
+        pattern, search, replace = line.split(None, 3)
+        funcs = build_match_and_apply_functions(
+            pattern, search, replace)
+        self.cache.append(funcs)
+        return funcs
+
+rules = LazyRules()</code></pre>
+
+<p>So this is a class that implements <code>__iter__()</code> and <code>__next__()</code>, so it can be used as an iterator. Then, you instantiate the class and assign it to <var>rules</var>. This happens just once, on import.
+
+<p>Let&#8217;s take the class one bite at a time.
+
+<pre class=pp><code>class LazyRules:
+    rules_filename = 'plural6-rules.txt'
+
+    def __init__(self):
+<a>        self.pattern_file = open(self.rules_filename, encoding='utf-8')  <span class=u>&#x2460;</span></a>
+<a>        self.cache = []                                                  <span class=u>&#x2461;</span></a></code></pre>
+<ol>
+<li>When we instantiate the <code>LazyRules</code> class, open the pattern file but don&#8217;t read anything from it. (That comes later.)
+<li>After opening the patterns file, initialize the cache. You&#8217;ll use this cache later (in the <code>__next__()</code> method) as you read lines from the pattern file.
+</ol>
+
+<p>Before we continue, let&#8217;s take a closer look at <var>rules_filename</var>. It&#8217;s not defined within the <code>__init__()</code> method. In fact, it&#8217;s not defined within <em>any</em> method. It&#8217;s defined at the class level. It&#8217;s a <i>class variable</i>, and although you can access it just like an instance variable (<var>self.rules_filename</var>), it is shared across all instances of the <code>LazyRules</code> class.
+
+<pre class=screen>
+<samp class=p>>>> </samp><kbd class=pp>import plural6</kbd>
+<samp class=p>>>> </samp><kbd class=pp>r1 = plural6.LazyRules()</kbd>
+<samp class=p>>>> </samp><kbd class=pp>r2 = plural6.LazyRules()</kbd>
+<a><samp class=p>>>> </samp><kbd class=pp>r1.rules_filename</kbd>                               <span class=u>&#x2460;</span></a>
+<samp class=pp>'plural6-rules.txt'</samp>
+<samp class=p>>>> </samp><kbd class=pp>r2.rules_filename</kbd>
+<samp class=pp>'plural6-rules.txt'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>r2.rules_filename = 'r2-override.txt'</kbd>           <span class=u>&#x2461;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>r2.rules_filename</kbd>
+<samp class=pp>'r2-override.txt'</samp>
+<samp class=p>>>> </samp><kbd class=pp>r1.rules_filename</kbd>
+<samp class=pp>'plural6-rules.txt'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>r2.__class__.rules_filename</kbd>                     <span class=u>&#x2462;</span></a>
+<samp class=pp>'plural6-rules.txt'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>r2.__class__.rules_filename = 'papayawhip.txt'</kbd>  <span class=u>&#x2463;</span></a>
+<samp class=p>>>> </samp><kbd class=pp>r1.rules_filename</kbd>
+<samp class=pp>'papayawhip.txt'</samp>
+<a><samp class=p>>>> </samp><kbd class=pp>r2.rules_filename</kbd>                               <span class=u>&#x2464;</span></a>
+<samp class=pp>'r2-override.txt'</samp></pre>
+<ol>
+<li>Each instance of the class inherits the <var>rules_filename</var> attribute with the value defined by the class.
+<li>Changing the attribute&#8217;s value in one instance does not affect other instances&hellip;
+<li>&hellip;nor does it change the class attribute. You can access the class attribute (as opposed to an individual instance&#8217;s attribute) by using the special <code>__class__</code> attribute to access the class itself.
+<li>If you change the class attribute, all instances that are still inheriting that value (like <var>r1</var> here) will be affected.
+<li>Instances that have overridden that attribute (like <var>r2</var> here) will not be affected.
+</ol>
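+
+<p>One way to see where each value lives is to inspect the instance&#8217;s own attributes with the built-in <code>vars()</code> function. The sketch below uses a stripped-down stand-in for <code>LazyRules</code> that keeps only the class variable (no file handling), so treat it as an illustration rather than the real class.
+
```python
class LazyRules:
    # Stripped-down stand-in: only the class variable, no file handling.
    rules_filename = 'plural6-rules.txt'


r1 = LazyRules()
r2 = LazyRules()

# Neither instance stores the attribute itself; lookups fall back to the class.
inherited = 'rules_filename' in vars(r2)      # False

# Assigning through the instance creates an attribute on r2 only.
r2.rules_filename = 'r2-override.txt'
overridden = 'rules_filename' in vars(r2)     # True

# Same effect as r2.__class__.rules_filename = 'papayawhip.txt'
LazyRules.rules_filename = 'papayawhip.txt'

r1_value = r1.rules_filename   # still inherited, so it follows the class
r2_value = r2.rules_filename   # shadowed by r2's own attribute
```
+
+<p>The instance attribute on <var>r2</var> shadows the class attribute; it does not replace it, which is why changing the class attribute later still reaches <var>r1</var>.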
+
+<p>And now back to our show.
+
+<pre class=pp><code><a>    def __iter__(self):       <span class=u>&#x2460;</span></a>
+        self.cache_index = 0
+<a>        return self           <span class=u>&#x2461;</span></a>
+</code></pre>
+<ol>
+<li>The <code>__iter__()</code> method will be called every time someone&nbsp;&mdash;&nbsp;say, a <code>for</code> loop&nbsp;&mdash;&nbsp;calls <code>iter(rules)</code>.
+<li>The one thing that every <code>__iter__()</code> method must do is return an iterator. In this case, it returns <var>self</var>, which signals that this class defines a <code>__next__()</code> method which will take care of returning values throughout the iteration.
+</ol>
+
+<pre class=pp><code><a>    def __next__(self):                                 <span class=u>&#x2460;</span></a>
+        .
+        .
+        .
+        pattern, search, replace = line.split(None, 3)
+<a>        funcs = build_match_and_apply_functions(        <span class=u>&#x2461;</span></a>
+            pattern, search, replace)
+<a>        self.cache.append(funcs)                        <span class=u>&#x2462;</span></a>
+        return funcs</code></pre>
+<ol>
+<li>The <code>__next__()</code> method gets called whenever someone&nbsp;&mdash;&nbsp;say, a <code>for</code> loop&nbsp;&mdash;&nbsp;calls <code>next(rules)</code>. This method will only make sense if we start at the end and work backwards. So let&#8217;s do that.
+<li>The last part of this function should look familiar, at least. The <code>build_match_and_apply_functions()</code> function hasn&#8217;t changed; it&#8217;s the same as it ever was.
+<li>The only difference is that, before returning the match and apply functions (which are stored in the tuple <var>funcs</var>), we&#8217;re going to save them in <code>self.cache</code>.
+</ol>
+
+<p>Moving backwards&hellip;
+
+<pre class=pp><code>    def __next__(self):
+        .
+        .
+        .
+<a>        line = self.pattern_file.readline()  <span class=u>&#x2460;</span></a>
+<a>        if not line:                         <span class=u>&#x2461;</span></a>
+            self.pattern_file.close()
+<a>            raise StopIteration              <span class=u>&#x2462;</span></a>
+        .
+        .
+        .</code></pre>
+<ol>
+<li>A bit of advanced file trickery here. The <code>readline()</code> method (note: singular, not the plural <code>readlines()</code>) reads exactly one line from an open file. Specifically, the next line. (<em>File objects are iterators too! It&#8217;s iterators all the way down&hellip;</em>)
+<li>If there was a line for <code>readline()</code> to read, <var>line</var> will not be an empty string. Even if the file contained a blank line, <var>line</var> would end up as the one-character string <code>'\n'</code> (a newline). If <var>line</var> is really an empty string, that means there are no more lines to read from the file.
+<li>When we reach the end of the file, we should close the file and raise the magic <code>StopIteration</code> exception. Remember, we got to this point because we needed a match and apply function for the next rule. The next rule comes from the next line of the file&hellip; but there is no next line! Therefore, we have no value to return. The iteration is over. (<span class=u>&#x266B;</span> The party&#8217;s over&hellip; <span class=u>&#x266B;</span>)
+</ol>
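+
+<p>The difference between a blank line and the end of the file is easy to check. In this small sketch, an <code>io.StringIO</code> object stands in for an open text file, and its contents are made up for the illustration.
+
```python
import io

# io.StringIO behaves like an open text file; the contents are made up.
pattern_file = io.StringIO('first line\n\nlast line\n')

lines = []
while True:
    line = pattern_file.readline()
    if not line:              # '' means end of file, not a blank line
        break
    lines.append(line)        # a blank line arrives as '\n', which is truthy

after_eof = pattern_file.readline()   # calling again at end of file is harmless
```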
+
+<p>Moving backwards all the way to the start of the <code>__next__()</code> method&hellip;
+
+<pre class=pp><code>    def __next__(self):
+        self.cache_index += 1
+        if len(self.cache) >= self.cache_index:
+<a>            return self.cache[self.cache_index - 1]     <span class=u>&#x2460;</span></a>
+
+        if self.pattern_file.closed:
+<a>            raise StopIteration                         <span class=u>&#x2461;</span></a>
+        .
+        .
+        .</code></pre>
+<ol>
+<li><code>self.cache</code> will be a list of the functions we need to match and apply individual rules. (At least <em>that</em> should sound familiar!) <code>self.cache_index</code> keeps track of which cached item we should return next. If we haven&#8217;t exhausted the cache yet (<i>i.e.</i> if the length of <code>self.cache</code> is greater than or equal to the freshly incremented <code>self.cache_index</code>), then we have a cache hit! Hooray! We can return the match and apply functions from the cache instead of building them from scratch.
+<li>On the other hand, if we don&#8217;t get a hit from the cache, <em>and</em> the file object has been closed (which could happen, further down the method, as you saw in the previous code snippet), then there&#8217;s nothing more we can do. If the file is closed, it means we&#8217;ve exhausted it&nbsp;&mdash;&nbsp;we&#8217;ve already read through every line from the pattern file, and we&#8217;ve already built and cached the match and apply functions for each pattern. The file is exhausted; the cache is exhausted; I&#8217;m exhausted. Wait, what? Hang in there, we&#8217;re almost done.
+</ol>
+
+<p>Putting it all together, here&#8217;s what happens, and when:
+
+<ul>
+<li>When the module is imported, it creates a single instance of the <code>LazyRules</code> class, called <var>rules</var>, which opens the pattern file but does not read from it.
+<li>When asked for the first match and apply function, it checks its cache but finds the cache is empty. So it reads a single line from the pattern file, builds the match and apply functions from those patterns, and caches them.
+<li>Let&#8217;s say, for the sake of argument, that the very first rule matched. If so, no further match and apply functions are built, and no further lines are read from the pattern file.
+<li>Furthermore, for the sake of argument, suppose that the caller calls the <code>plural()</code> function <em>again</em> to pluralize a different word. The <code>for</code> loop in the <code>plural()</code> function will call <code>iter(rules)</code>, which will reset the cache index but will not reset the open file object.
+<li>The first time through, the <code>for</code> loop will ask for a value from <var>rules</var>, which will invoke its <code>__next__()</code> method. This time, however, the cache is primed with a single pair of match and apply functions, corresponding to the patterns in the first line of the pattern file. Since they were built and cached in the course of pluralizing the previous word, they&#8217;re retrieved from the cache. The cache index increments, and the open file is never touched.
+<li>Let&#8217;s say, for the sake of argument, that the first rule does <em>not</em> match this time around. So the <code>for</code> loop comes around again and asks for another value from <var>rules</var>. This invokes the <code>__next__()</code> method a second time. This time, the cache is exhausted&nbsp;&mdash;&nbsp;it only contained one item, and we&#8217;re asking for a second&nbsp;&mdash;&nbsp;so the <code>__next__()</code> method continues. It reads another line from the open file, builds match and apply functions out of the patterns, and caches them.
+<li>This read-build-and-cache process will continue as long as the rules being read from the pattern file don&#8217;t match the word we&#8217;re trying to pluralize. If we do find a matching rule before the end of the file, we simply use it and stop, with the file still open. The file pointer will stay wherever we stopped reading, waiting for the next <code>readline()</code> command. In the meantime, the cache now has more items in it, and if we start all over again trying to pluralize a new word, each of those items in the cache will be tried before reading the next line from the pattern file.
+</ul>
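+
+<p>The walkthrough above can be replayed in a few lines. This is a sketch under several assumptions: the file object is passed to <code>__init__()</code> instead of being opened there, an <code>io.StringIO</code> with made-up rules stands in for <code>plural6-rules.txt</code>, <code>build_match_and_apply_functions()</code> is replaced by a stand-in that returns the raw fields, and <code>lines_read</code> is a hypothetical counter added only to show when the file is touched.
+
```python
import io


def build_match_and_apply_functions(pattern, search, replace):
    # Stand-in for the real helper; returning the raw fields
    # keeps this sketch self-contained.
    return (pattern, search, replace)


class LazyRules:
    def __init__(self, pattern_file):
        # Assumption: the file object is passed in instead of opened here.
        self.pattern_file = pattern_file
        self.cache = []
        self.lines_read = 0   # hypothetical counter, only for this demonstration

    def __iter__(self):
        self.cache_index = 0
        return self

    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
            return self.cache[self.cache_index - 1]

        if self.pattern_file.closed:
            raise StopIteration

        line = self.pattern_file.readline()
        if not line:
            self.pattern_file.close()
            raise StopIteration

        self.lines_read += 1
        pattern, search, replace = line.split(None, 3)
        funcs = build_match_and_apply_functions(pattern, search, replace)
        self.cache.append(funcs)
        return funcs


# Made-up rules in the "pattern search replace" layout that split() expects.
rules = LazyRules(io.StringIO('[sxz]$ $ es\n[^aeioudgkprt]h$ $ es\ny$ y$ ies\n'))

first = next(iter(rules))             # empty cache: reads line 1
reads_after_first_word = rules.lines_read

second_word = iter(rules)             # resets the cache index, not the file
again = next(second_word)             # served from the cache
another = next(second_word)           # cache exhausted: reads line 2
reads_after_second_word = rules.lines_read
```
+
+<p>Looping over <var>rules</var> to the end after this reads the one remaining line and closes the file; every later loop is served entirely from the cache.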
+
+<p>We have achieved pluralization nirvana.
+
+<ol>
+<li><strong>Minimal startup cost.</strong> The only thing that happens on <code>import</code> is instantiating a single class and opening a file (but not reading from it).
+<li><strong>Maximum performance.</strong> The previous example would read through the file and build functions dynamically every time you wanted to pluralize a word. This version will cache functions as soon as they&#8217;re built, and in the worst case, it will only read through the pattern file once, no matter how many words you pluralize.
+<li><strong>Separation of code and data.</strong> All the patterns are stored in a separate file. Code is code, and data is data, and never the twain shall meet.
+</ol>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>Is this really nirvana? Well, yes and no. Here&#8217;s something to consider with the <code>LazyRules</code> example: the pattern file is opened (during <code>__init__()</code>), and it remains open until the final rule is reached. Python will eventually close the file when it exits, or after the last instance of the <code>LazyRules</code> class is destroyed, but still, that could be a <em>long</em> time. If this class is part of a long-running Python process, the Python interpreter may never exit, and the <code>LazyRules</code> object may never get destroyed.
+<p>There are ways around this. Instead of opening the file during <code>__init__()</code> and leaving it open while you read rules one line at a time, you could open the file, read all the rules, and immediately close the file. Or you could open the file, read one rule, save the file position with the <a href=files.html#read><code>tell()</code> method</a>, close the file, and later re-open it and use the <a href=files.html#read><code>seek()</code> method</a> to continue reading where you left off. Or you could not worry about it and just leave the file open, like this example code does. Programming is design, and design is all about trade-offs and constraints. Leaving a file open too long might be a problem; making your code more complicated might be a problem. Which one is the bigger problem depends on your development team, your application, and your runtime environment.
+</blockquote>
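+
+<p>As a rough sketch of the second alternative (read one rule, remember the position, close the file, and pick up again later), consider the following. <code>read_one_rule()</code> is a hypothetical helper, not part of <code>plural6.py</code>, and the temporary file with made-up rules exists only so the example can run anywhere.
+
```python
import os
import tempfile


def read_one_rule(filename, position=0):
    """Hypothetical helper: read one line starting at position.

    Returns the line and the position to resume from. The file is
    closed again before the function returns.
    """
    with open(filename, encoding='utf-8') as pattern_file:
        pattern_file.seek(position)       # continue where the last call stopped
        line = pattern_file.readline()
        return line, pattern_file.tell()  # tell() reports the resume position


# Write a throwaway rules file with made-up contents.
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as f:
    f.write('[sxz]$ $ es\ny$ y$ ies\n')
    rules_path = f.name

first, position = read_one_rule(rules_path)
second, position = read_one_rule(rules_path, position)
third, position = read_one_rule(rules_path, position)   # '' at end of file

os.remove(rules_path)
```
+
+<p>The trade-off is visible even in this small sketch: the file is never left open between calls, but every call pays for an <code>open()</code> and a <code>seek()</code>, and the caller has to carry the position around.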
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+<ul>
+<li><a href=http://docs.python.org/3.1/library/stdtypes.html#iterator-types>Iterator types</a>
+<li><a href=http://www.python.org/dev/peps/pep-0234/>PEP 234: Iterators</a>
+<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
+<li><a href=http://www.dabeaz.com/generators/>Generator Tricks for Systems Programmers</a>
+</ul>
+
+<p class=v><a href=generators.html rel=prev title='back to &#8220;Closures &amp; Generators&#8221;'><span class=u>&#x261C;</span></a> <a href=advanced-iterators.html rel=next title='onward to &#8220;Advanced Iterators&#8221;'><span class=u>&#x261E;</span></a>
+
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/j/.htaccess b/j/.htaccess
index 35a1445..3c593e3 100644
--- a/j/.htaccess
+++ b/j/.htaccess
@@ -1,4 +1,4 @@
-FileETag MTime Size
-
-ExpiresActive On
-ExpiresDefault "access plus 1 year"
+FileETag MTime Size
+
+ExpiresActive On
+ExpiresDefault "access plus 1 year"
diff --git a/j/html5.js b/j/html5.js
index e973e7f..6457708 100644
--- a/j/html5.js
+++ b/j/html5.js
@@ -1 +1,3 @@
-(function(){var e="abbr,article,aside,audio,bb,canvas,datagrid,datalist,details,dialog,figure,footer,header,mark,menu,meter,nav,output,progress,section,time,video".split(','),i=e.length;while(i--){document.createElement(e[i])}})()
\ No newline at end of file
+/*@cc_on@if(@_jscript_version<9)(function(p,e){function q(a,b){if(g[a])g[a].styleSheet.cssText+=b;else{var c=r[l],d=e[j]("style");d.media=a;c.insertBefore(d,c[l]);g[a]=d;q(a,b)}}function s(a,b){for(var c=new RegExp("\\b("+m+")\\b(?!.*[;}])","gi"),d=function(k){return".iepp_"+k},h=-1;++h<a.length;){b=a[h].media||b;s(a[h].imports,b);q(b,a[h].cssText.replace(c,d))}}function t(){for(var a,b=e.getElementsByTagName("*"),c,d,h=new RegExp("^"+m+"$","i"),k=-1;++k<b.length;)if((a=b[k])&&(d=a.nodeName.match(h))){c=new RegExp("^\\s*<"+d+"(.*)\\/"+
+d+">\\s*$","i");i.innerHTML=a.outerHTML.replace(/\r|\n/g," ").replace(c,a.currentStyle.display=="block"?"<div$1/div>":"<span$1/span>");c=i.childNodes[0];c.className+=" iepp_"+d;c=f[f.length]=[a,c];a.parentNode.replaceChild(c[1],c[0])}s(e.styleSheets,"all")}function u(){for(var a=-1,b;++a<f.length;)f[a][1].parentNode.replaceChild(f[a][0],f[a][1]);for(b in g)r[l].removeChild(g[b]);g={};f=[]}for(var m="abbr article aside audio canvas command datalist details figure figcaption footer header hgroup mark meter nav output progress section summary time video".replace(/ /g,
+"|"),n=m.split("|"),r=e.documentElement,i=e.createDocumentFragment(),g={},f=[],o=-1,l="firstChild",j="createElement";++o<n.length;){e[j](n[o]);i[j](n[o])}i=i.appendChild(e[j]("div"));p.attachEvent("onbeforeprint",t);p.attachEvent("onafterprint",u)})(this,document)@end@*/
\ No newline at end of file
diff --git a/layout.css b/layout.css
index 1c88c70..58dc78e 100644
--- a/layout.css
+++ b/layout.css
@@ -1,74 +1,74 @@
-/*
-
-"Dive Into Python 3" layout stylesheet
-
-Copyright (c) 2009, Mark Pilgrim, All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-* Redistributions of source code must retain the above copyright notice,
-  this list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above copyright notice,
-  this list of conditions and the following disclaimer in the documentation
-  and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
-LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-*/
-
-@page {
-  size: US-Letter;
-  margin: 1.75in;
-  padding: 0;
-  @bottom-center {
-    font: 12pt/1.75 serif;
-    content: counter(page);
-  }
-}
-body, .w a {
-  font: 10pt/1.3 serif;
-}
-pre, kbd, samp, code, var, .b {
-  font: 8pt/1.3 monospace;
-}
-span {
-  font-size: 10pt;
-}
-.baa {
-  font-size: 11pt;
-}
-.q span {
-  font-size: 13pt;
-}
-.f:first-letter {
-  color: #888;
-  font: normal 48pt/0.68 serif;
-}
-p, ul, ol {
-  margin: 0;
-  font-size: 11pt;
-}
-p + p {
-   text-indent: 1em;
-}
-
-h1 {
-  page-break-before: always;
-  prince-bookmark-level: 1;
-}
-h2 {
-  prince-bookmark-level: 2;
-}
-h3 {
-  prince-bookmark-level: 3;
-}
+/*
+
+"Dive Into Python 3" layout stylesheet
+
+Copyright (c) 2009, Mark Pilgrim, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+*/
+
+@page {
+  size: US-Letter;
+  margin: 1.75in;
+  padding: 0;
+  @bottom-center {
+    font: 12pt/1.75 serif;
+    content: counter(page);
+  }
+}
+body, .w a {
+  font: 10pt/1.3 serif;
+}
+pre, kbd, samp, code, var, .b {
+  font: 8pt/1.3 monospace;
+}
+span {
+  font-size: 10pt;
+}
+.baa {
+  font-size: 11pt;
+}
+.q span {
+  font-size: 13pt;
+}
+.f:first-letter {
+  color: #888;
+  font: normal 48pt/0.68 serif;
+}
+p, ul, ol {
+  margin: 0;
+  font-size: 11pt;
+}
+p + p {
+   text-indent: 1em;
+}
+
+h1 {
+  page-break-before: always;
+  prince-bookmark-level: 1;
+}
+h2 {
+  prince-bookmark-level: 2;
+}
+h3 {
+  prince-bookmark-level: 3;
+}
diff --git a/mobile.css b/mobile.css
index e28412f..9ace39b 100644
--- a/mobile.css
+++ b/mobile.css
@@ -1,112 +1,112 @@
-/*
-
-"Dive Into Python 3" mobile stylesheet for iPhone, Android, and other
-small-screen devices
-
-Copyright (c) 2009, Mark Pilgrim, All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-* Redistributions of source code must retain the above copyright notice,
-  this list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above copyright notice,
-  this list of conditions and the following disclaimer in the documentation
-  and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
-LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-
-
-Acknowledgements & Inspirations
-
-"Return of the Mobile Style Sheet" ....................... http://www.alistapart.com/articles/returnofthemobilestylesheet
-"Optimizing Web Content Using Conditional CSS" ........... http://developer.apple.com/safari/library/documentation/AppleApplications/Reference/SafariWebContent/OptimizingforSafarioniPhone/chapter_3_section_2.html
-*/
-
-/* typography */
-
-body, .c, span, pre span, .c, .note, p, ul, ol {
-  font:normal 12px/18px sans-serif;
-}
-pre, kbd, samp, code, var {
-  font:normal 12px/18px monospace;
-}
-.baa {
-  font:normal 14px/18px serif;
-}
-abbr {
-  font-variant:normal;
-  text-transform:none;
-  letter-spacing:0;
-}
-.c, .note, p, ul, ol, h2, h3 {
-  margin:1.75em 0;
-}
-
-/* basics */
-
-html {
-  color:#000;
-}
-body {
-  margin:4px 2px 0 2px;
-}
-
-/* links */
-
-a {
-  text-decoration:underline;
-  border-bottom:0;
-}
-pre a {
-  text-decoration:none;
-}
-
-/* headers and pullquotes */
-
-h1, h2, h3, pre {
-  padding:0;
-  border:0;
-  letter-spacing:0;
-}
-h1 {
-  margin:0;
-}
-h1, h1 code {
-  font:normal 18px/18px serif;
-}
-h2, h2 code {
-  font:normal 16px/18px serif;
-}
-h3, h3 code {
-  font:normal 14px/18px serif;
-}
-h1:before {
-  content:"";
-}
-
-/* overrides */
-
-.nm, .w, aside, form, form+p, .note span, .q span, .a {
-  display:none;
-}
-dd {
-  margin:0 0 0 1.75em;
-}
-.nav span {
-  font-size:200%;
-}
-.xxxl {
-  font-size: xx-large;
-  line-height: 0.875;
-}
+/*
+
+"Dive Into Python 3" mobile stylesheet for iPhone, Android, and other
+small-screen devices
+
+Copyright (c) 2009, Mark Pilgrim, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+
+
+Acknowledgements & Inspirations
+
+"Return of the Mobile Style Sheet" ....................... http://www.alistapart.com/articles/returnofthemobilestylesheet
+"Optimizing Web Content Using Conditional CSS" ........... http://developer.apple.com/safari/library/documentation/AppleApplications/Reference/SafariWebContent/OptimizingforSafarioniPhone/chapter_3_section_2.html
+*/
+
+/* typography */
+
+body, .c, span, pre span, .c, .note, p, ul, ol {
+  font:normal 12px/18px sans-serif;
+}
+pre, kbd, samp, code, var {
+  font:normal 12px/18px monospace;
+}
+.baa {
+  font:normal 14px/18px serif;
+}
+abbr {
+  font-variant:normal;
+  text-transform:none;
+  letter-spacing:0;
+}
+.c, .note, p, ul, ol, h2, h3 {
+  margin:1.75em 0;
+}
+
+/* basics */
+
+html {
+  color:#000;
+}
+body {
+  margin:4px 2px 0 2px;
+}
+
+/* links */
+
+a {
+  text-decoration:underline;
+  border-bottom:0;
+}
+pre a {
+  text-decoration:none;
+}
+
+/* headers and pullquotes */
+
+h1, h2, h3, pre {
+  padding:0;
+  border:0;
+  letter-spacing:0;
+}
+h1 {
+  margin:0;
+}
+h1, h1 code {
+  font:normal 18px/18px serif;
+}
+h2, h2 code {
+  font:normal 16px/18px serif;
+}
+h3, h3 code {
+  font:normal 14px/18px serif;
+}
+h1:before {
+  content:"";
+}
+
+/* overrides */
+
+.nm, .w, aside, form, form+p, .note span, .q span, .a {
+  display:none;
+}
+dd {
+  margin:0 0 0 1.75em;
+}
+.nav span {
+  font-size:200%;
+}
+.xxxl {
+  font-size: xx-large;
+  line-height: 0.875;
+}
diff --git a/packaging.html b/packaging.html
index cb085ef..5022941 100644
--- a/packaging.html
+++ b/packaging.html
@@ -1,577 +1,577 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Packaging Python Libraries - Dive Into Python 3</title>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
-<link rel=stylesheet href=dip3.css>
-<style>
-body{counter-reset:h1 16}
-mark{display:inline}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#packaging>Dive Into Python 3</a> <span class=u>&#8227;</span>
-<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
-<h1>Packaging Python Libraries</h1>
-<blockquote class=q>
-<p><span class=u>&#x275D;</span> You&#8217;ll find the shame is like the pain; you only feel it once. <span class=u>&#x275E;</span><br>&mdash; Marquise de Merteuil, <a href=http://www.imdb.com/title/tt0094947/quotes><cite>Dangerous Liaisons</cite></a>
-</blockquote>
-<p id=toc>&nbsp;
-<h2 id=divingin>Diving In</h2>
-<p class=f>Real artists ship. Or so says Steve Jobs. Do you want to release a Python script, library, framework, or application? Excellent. The world needs more Python code. Python 3 comes with a packaging framework called Distutils. Distutils is many things: a build tool (for you), an installation tool (for your users), a package metadata format (for search engines), and more. It integrates with the <a href=http://pypi.python.org/>Python Package Index</a> (&#8220;PyPI&#8221;), a central repository for open source Python libraries.
-
-<p>All of these facets of Distutils center around the <i>setup script</i>, traditionally called <code>setup.py</code>. In fact, you&#8217;ve already seen several Distutils setup scripts in this book. You used Distutils to install <code>httplib2</code> in <a href=http-web-services.html#introducing-httplib2>HTTP Web Services</a> and again to install <code>chardet</code> in <a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>.
-
-<p>In this chapter, you&#8217;ll learn how the setup scripts for <code>chardet</code> and <code>httplib2</code> work, and you&#8217;ll step through the process of releasing your own Python software.
-
-<pre class=pp><code># chardet's setup.py
-from distutils.core import setup
-setup(
-    name = "chardet",
-    packages = ["chardet"],
-    version = "1.0.2",
-    description = "Universal encoding detector",
-    author = "Mark Pilgrim",
-    author_email = "mark@diveintomark.org",
-    url = "http://chardet.feedparser.org/",
-    download_url = "http://chardet.feedparser.org/download/python3-chardet-1.0.1.tgz",
-    keywords = ["encoding", "i18n", "xml"],
-    classifiers = [
-        "Programming Language :: Python",
-        "Programming Language :: Python :: 3",
-        "Development Status :: 4 - Beta",
-        "Environment :: Other Environment",
-        "Intended Audience :: Developers",
-        "License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)",
-        "Operating System :: OS Independent",
-        "Topic :: Software Development :: Libraries :: Python Modules",
-        "Topic :: Text Processing :: Linguistic",
-        ],
-    long_description = """\
-Universal character encoding detector
--------------------------------------
-
-Detects
- - ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
- - Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
- - EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
- - EUC-KR, ISO-2022-KR (Korean)
- - KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
- - ISO-8859-2, windows-1250 (Hungarian)
- - ISO-8859-5, windows-1251 (Bulgarian)
- - windows-1252 (English)
- - ISO-8859-7, windows-1253 (Greek)
- - ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
- - TIS-620 (Thai)
-
-This version requires Python 3 or later; a Python 2 version is available separately.
-"""
-)</code></pre>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span><code>chardet</code> and <code>httplib2</code> are open source, but there&#8217;s no requirement that you release your own Python libraries under any particular license. The process described in this chapter will work for any Python software, regardless of license.
-</blockquote>
-
-<p class=a>&#x2042;
-
-<h2 id=cantdo>Things Distutils Can&#8217;t Do For You</h2>
-
-<p>Releasing your first Python package is a daunting process. (Releasing your second one is a little easier.) Distutils tries to automate as much of it as possible, but there are some things you simply must do yourself.
-
-<ul>
-<li><b>Choose a license</b>. This is a complicated topic, fraught with politics and peril. If you wish to release your software as open source, I humbly offer five pieces of advice:
-
-<ol>
-<li>Don&#8217;t write your own license.
-<li>Don&#8217;t write your own license.
-<li>Don&#8217;t write your own license.
-<li>It doesn&#8217;t need to be <abbr>GPL</abbr>, but <a href=http://www.dwheeler.com/essays/gpl-compatible.html>it needs to be <abbr>GPL</abbr>-compatible</a>.
-<li>Don&#8217;t write your own license.
-</ol>
-<li><b>Classify your software</b> using the PyPI classification system. I&#8217;ll explain what this means later in this chapter.
-<li><b>Write a &#8220;read me&#8221; file</b>. Don&#8217;t skimp on this. At a minimum, it should give your users an overview of what your software does and how to install it.
-</ul>
-
-<p class=a>&#x2042;
-
-<h2 id=structure>Directory Structure</h2>
-
-<p>To start packaging your Python software, you need to get your files and directories in order. The <code>httplib2</code> directory looks like this:
-
-<pre class=screen>
-<a>httplib2/                 <span class=u>&#x2460;</span></a>
-|
-<a>+--README.txt             <span class=u>&#x2461;</span></a>
-|
-<a>+--setup.py               <span class=u>&#x2462;</span></a>
-|
-<a>+--httplib2/              <span class=u>&#x2463;</span></a>
-   |
-   +--__init__.py
-   |
-   +--iri2uri.py</pre>
-<ol>
-<li>Make a root directory to hold everything. Give it the same name as your Python module.
-<li>To accommodate Windows users, your &#8220;read me&#8221; file should include a <code>.txt</code> extension, and it should use Windows-style carriage returns. Just because <em>you</em> use a fancy text editor that runs from the command line and includes its own macro language, that doesn&#8217;t mean you need to make life difficult for your users. (Your users use Notepad. Sad but true.) Even if you&#8217;re on Linux or Mac OS X, your fancy text editor undoubtedly has an option to save files with Windows-style carriage returns.
-<li>Your Distutils setup script should be named <code>setup.py</code> unless you have a good reason not to. You do not have a good reason not to.
-<li>If your Python software is a single <code>.py</code> file, you should put it in the root directory along with your &#8220;read me&#8221; file and your setup script. But <code>httplib2</code> is not a single <code>.py</code> file; it&#8217;s <a href=case-study-porting-chardet-to-python-3.html#multifile-modules>a multi-file module</a>. But that&#8217;s OK! Just put the <code>httplib2</code> directory in the root directory, so you have an <code>__init__.py</code> file within an <code>httplib2/</code> directory within the <code>httplib2/</code> root directory. That&#8217;s not a problem; in fact, it will simplify your packaging process.
-</ol>
-
-<p>The <code>chardet</code> directory looks slightly different. Like <code>httplib2</code>, it&#8217;s <a href=case-study-porting-chardet-to-python-3.html#multifile-modules>a multi-file module</a>, so there&#8217;s a <code>chardet/</code> directory within the <code>chardet/</code> root directory. In addition to the <code>README.txt</code> file, <code>chardet</code> has <abbr>HTML</abbr>-formatted documentation in the <code>docs/</code> directory. The <code>docs/</code> directory contains several <code>.html</code> and <code>.css</code> files and an <code>images/</code> subdirectory, which contains several <code>.png</code> and <code>.gif</code> files. (This will be important later.) Also, in keeping with the convention for <abbr>(L)GPL</abbr>-licensed software, it has a separate file called <code>COPYING.txt</code> which contains the complete text of the <abbr>LGPL</abbr>.
-
-<pre class=nd><code>
-chardet/
-|
-+--COPYING.txt
-|
-+--setup.py
-|
-+--README.txt
-|
-+--docs/
-|  |
-|  +--index.html
-|  |
-|  +--usage.html
-|  |
-|  +--images/ ...
-|
-+--chardet/
-   |
-   +--__init__.py
-   |
-   +--big5freq.py
-   |
-   +--...
-</code></pre>
-
-<p class=a>&#x2042;
-
-<h2 id=setuppy>Writing Your Setup Script</h2>
-
-<p>The Distutils setup script is a Python script. In theory, it can do anything Python can do. In practice, it should do as little as possible, in as standard a way as possible. Setup scripts should be boring. The more exotic your installation process is, the more exotic your bug reports will be.
-
-<p>The first line of every Distutils setup script is always the same:
-
-<pre class='nd pp'><code>from distutils.core import setup</code></pre>
-
-<p>This imports the <code>setup()</code> function, which is the main entry point into Distutils. 95% of all Distutils setup scripts consist of a single call to <code>setup()</code> and nothing else. (I totally just made up that statistic, but if your Distutils setup script is doing more than calling the Distutils <code>setup()</code> function, you should have a good reason. Do you have a good reason? I didn&#8217;t think so.)
-
-<p>The <code>setup()</code> function <a href=http://docs.python.org/3.1/distutils/apiref.html#distutils.core.setup>can take dozens of parameters</a>. For the sanity of everyone involved, you must use <a href=your-first-python-program.html#optional-arguments>named arguments</a> for every parameter. This is not merely a convention; it&#8217;s a hard requirement. Your setup script will crash if you try to call the <code>setup()</code> function with non-named arguments.
-
-<p>The following named arguments are required:
-
-<ul>
-<li><b>name</b>, the name of the package.
-<li><b>version</b>, the version number of the package.
-<li><b>author</b>, your full name.
-<li><b>author_email</b>, your email address.
-<li><b>url</b>, the home page of your project. This can be your <a href=http://pypi.python.org/>PyPI</a> package page if you don&#8217;t have a separate project website.
-</ul>
-
-<p>Although not required, I recommend that you also include the following in your setup script:
-
-<ul>
-<li><b>description</b>, a one-line summary of the project.
-<li><b>long_description</b>, a multi-line string in <a href=http://docutils.sourceforge.net/rst.html>reStructuredText format</a>. <a href=http://pypi.python.org/>PyPI</a> converts this to <abbr>HTML</abbr> and displays it on your package page.
-<li><b>classifiers</b>, a list of specially-formatted strings described in the next section.
-</ul>
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>Setup script metadata is defined in <a href=http://www.python.org/dev/peps/pep-0314/><abbr>PEP</abbr> 314</a>.
-</blockquote>
-
-<p>Now let&#8217;s look at the <code>chardet</code> setup script. It has all of these required and recommended parameters, plus one I haven&#8217;t mentioned yet: <code>packages</code>.
-
-<pre class='nd pp'><code>from distutils.core import setup
-setup(
-    name = 'chardet',
-    <mark>packages = ['chardet']</mark>,
-    version = '1.0.2',
-    description = 'Universal encoding detector',
-    author='Mark Pilgrim',
-    ...
-)</code></pre>
-
-<p>The <code>packages</code> parameter highlights an unfortunate vocabulary overlap in the distribution process. We&#8217;ve been talking about the &#8220;package&#8221; as the thing you&#8217;re building (and potentially listing in The Python &#8220;Package&#8221; Index). But that&#8217;s not what this <code>packages</code> parameter refers to. It refers to the fact that the <code>chardet</code> module is <a href=case-study-porting-chardet-to-python-3.html#multifile-modules>a multi-file module</a>, sometimes known as&hellip; a &#8220;package.&#8221; The <code>packages</code> parameter tells Distutils to include the <code>chardet/</code> directory, its <code>__init__.py</code> file, and all the other <code>.py</code> files that constitute the <code>chardet</code> module. That&#8217;s kind of important; all this happy talk about documentation and metadata is irrelevant if you forget to include the actual code!
-
-<p class=a>&#x2042;
-
-<h2 id=trove>Classifying Your Package</h2>
-
-<p>The Python Package Index (&#8220;PyPI&#8221;) contains thousands of Python libraries. Proper classification metadata will allow people to find yours more easily. PyPI lets you <a href='http://pypi.python.org/pypi?:action=browse'>browse packages by classifier</a>. You can even select multiple classifiers to narrow your search. Classifiers are not invisible metadata that you can just ignore!
-
-<p>To classify your software, pass a <code>classifiers</code> parameter to the Distutils <code>setup()</code> function. The <code>classifiers</code> parameter is a list of strings. These strings are <em>not</em> freeform. All classifier strings should come from <a href='http://pypi.python.org/pypi?:action=list_classifiers'>this list on PyPI</a>.
-
-<p>Classifiers are optional. You can write a Distutils setup script without any classifiers at all. <strong>Don&#8217;t do that.</strong> You should <em>always</em> include at least these classifiers:
-
-<ul>
-<li><b>Programming Language</b>. In particular, you should include both <code>"Programming Language :: Python"</code> and <code>"Programming Language :: Python :: 3"</code>. If you do not include these, your package will not show up in <a href='http://pypi.python.org/pypi?:action=browse&amp;c=533&amp;show=all'>this list of Python 3-compatible libraries</a>, which is linked from the sidebar of every single page of <code>pypi.python.org</code>.
-<li><b>License</b>. This is <em>the absolute first thing I look for</em> when I&#8217;m evaluating third-party libraries. Don&#8217;t make me hunt for this vital information. Don&#8217;t include more than one license classifier unless your software is explicitly available under multiple licenses. (And don&#8217;t release software under multiple licenses unless you&#8217;re forced to do so. And don&#8217;t force other people to do so. Licensing is enough of a headache; don&#8217;t make it worse.)
-<li><b>Operating System</b>. If your software only runs on Windows (or Mac OS X, or Linux), I want to know sooner rather than later. If your software runs anywhere without any platform-specific code, use the classifier <code>"Operating System :: OS Independent"</code>. Multiple <code>Operating System</code> classifiers are only necessary if your software requires specific support for each platform. (This is not common.)
-</ul>
-
-<p>I also recommend that you include the following classifiers:
-
-<ul>
-<li><b>Development Status</b>. Is your software beta quality? Alpha quality? Pre-alpha? Pick one. Be honest.
-<li><b>Intended Audience</b>. Who would download your software? The most common choices are <code>Developers</code>, <code>End Users/Desktop</code>, <code>Science/Research</code>, and <code>System Administrators</code>.
-<li><b>Framework</b>. If your software is a plugin for a larger Python framework like <a href=http://www.djangoproject.com/>Django</a> or <a href=http://www.zope.org/>Zope</a>, include the appropriate <code>Framework</code> classifier. If not, omit it.
-<li><b>Topic</b>. There are <a href='http://pypi.python.org/pypi?:action=list_classifiers'>a large number of topics to choose from</a>; choose all that apply.
-</ul>
-
-<h3 id=trove-examples>Examples of Good Package Classifiers</h3>
-
-<p>By way of example, here are the classifiers for <a href=http://pypi.python.org/pypi/Django/>Django</a>, a production-ready, cross-platform, <abbr>BSD</abbr>-licensed web application framework that runs on your web server. (Django is not yet compatible with Python 3, so the <code>Programming Language :: Python :: 3</code> classifier is not listed.)
-
-<pre><code>Programming Language :: Python
-License :: OSI Approved :: BSD License
-Operating System :: OS Independent
-Development Status :: 5 - Production/Stable
-Environment :: Web Environment
-Framework :: Django
-Intended Audience :: Developers
-Topic :: Internet :: WWW/HTTP
-Topic :: Internet :: WWW/HTTP :: Dynamic Content
-Topic :: Internet :: WWW/HTTP :: WSGI
-Topic :: Software Development :: Libraries :: Python Modules</code></pre>
-
-<p>Here are the classifiers for <a href=http://pypi.python.org/pypi/chardet><code>chardet</code></a>, the character encoding detection library covered in <a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>. <code>chardet</code> is beta quality, cross-platform, Python 3-compatible, <abbr>LGPL</abbr>-licensed, and intended for developers to integrate into their own products.
-
-<pre><code>Programming Language :: Python
-Programming Language :: Python :: 3
-License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
-Operating System :: OS Independent
-Development Status :: 4 - Beta
-Environment :: Other Environment
-Intended Audience :: Developers
-Topic :: Text Processing :: Linguistic
-Topic :: Software Development :: Libraries :: Python Modules</code></pre>
-
-<p>And here are the classifiers for <a href=http://pypi.python.org/pypi/httplib2><code>httplib2</code></a>, the <abbr>HTTP</abbr> module I mentioned at the beginning of this chapter. <code>httplib2</code> is beta quality, cross-platform, <abbr>MIT</abbr>-licensed, and intended for Python developers.
-
-<pre><code>Programming Language :: Python
-Programming Language :: Python :: 3
-License :: OSI Approved :: MIT License
-Operating System :: OS Independent
-Development Status :: 4 - Beta
-Environment :: Web Environment
-Intended Audience :: Developers
-Topic :: Internet :: WWW/HTTP
-Topic :: Software Development :: Libraries :: Python Modules</code></pre>
-
-<h2 id=manifest>Specifying Additional Files With A Manifest</h2>
-
-<p>By default, Distutils will include the following files in your release package:
-
-<ul>
-<li><code>README.txt</code>
-<li><code>setup.py</code>
-<li>The <code>.py</code> files needed by the multi-file modules listed in the <code>packages</code> parameter
-<li>The individual <code>.py</code> files listed in the <code>py_modules</code> parameter
-</ul>
-
-<p>That will cover <a href=#structure>all the files in the <code>httplib2</code> project</a>. But for the <code>chardet</code> project, we also want to include the <code>COPYING.txt</code> license file and the entire <code>docs/</code> directory that contains images and  <abbr>HTML</abbr> files. To tell Distutils to include these additional files and directories when it builds the <code>chardet</code> release package, you need a <i>manifest file</i>.
-
-<p>A manifest file is a text file called <code>MANIFEST.in</code>. Place it in the project&#8217;s root directory, next to <code>README.txt</code> and <code>setup.py</code>. Manifest files are <em>not</em> Python scripts; they are text files that contain a series of &#8220;commands&#8221; in a Distutils-defined format. Manifest commands allow you to include or exclude specific files and directories.
-
-<p>This is the entire manifest file for the <code>chardet</code> project:
-
-<pre class=nd><code><a>include COPYING.txt                                <span class=u>&#x2460;</span></a>
-<a>recursive-include docs *.html *.css *.png *.gif    <span class=u>&#x2461;</span></a></code></pre>
-<ol>
-<li>The first line is self-explanatory: include the <code>COPYING.txt</code> file from the project&#8217;s root directory.
-<li>The second line is a bit more complicated. The <code>recursive-include</code> command takes a directory name and one or more filenames. The filenames aren&#8217;t limited to specific files; they can include wildcards. This line means &#8220;See that <code>docs/</code> directory in the project&#8217;s root directory? Look in there (recursively) for <code>.html</code>, <code>.css</code>, <code>.png</code>, and <code>.gif</code> files. I want all of them in my release package.&#8221;
-</ol>
-
-<p>All manifest commands preserve the directory structure that you set up in your project directory. That <code>recursive-include</code> command is not going to put a bunch of <code>.html</code> and <code>.png</code> files in the root directory of the release package. It&#8217;s going to maintain the existing <code>docs/</code> directory structure, but only include those files inside that directory that match the given wildcards. (I didn&#8217;t mention it earlier, but the <code>chardet</code> documentation is actually written in <abbr>XML</abbr> and converted to <abbr>HTML</abbr> by a separate script. I don&#8217;t want to include the <abbr>XML</abbr> files in the release package, just the <abbr>HTML</abbr> and the images.)
-
-<blockquote class=note>
-<p><span class=u>&#x261E;</span>Manifest files have their own unique format. See <a href=http://docs.python.org/3.1/distutils/sourcedist.html#manifest>Specifying the files to distribute</a> and <a href=http://docs.python.org/3.1/distutils/commandref.html#sdist-cmd>the manifest template commands</a> for details.
-</blockquote>
-
-<p>To reiterate: you only need to create a manifest file if you want to include files that Distutils doesn&#8217;t include by default. If you do need a manifest file, it should only include the files and directories that Distutils wouldn&#8217;t otherwise find on its own.
-
-<h2 id=check>Checking Your Setup Script for Errors</h2>
-
-<p>There&#8217;s a lot to keep track of. Distutils comes with a built-in validation command that checks that all the required metadata is present in your setup script. For example, if you forget to include the <code>version</code> parameter, Distutils will remind you.
-
-<pre class=screen>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py check</kbd>
-<samp>running check
-warning: check: missing required meta-data: version</samp></pre>
-
-<p>Once you include a <code>version</code> parameter (and all the other required bits of metadata), the <code>check</code> command will look like this:
-
-<pre class=screen>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py check</kbd>
-<samp>running check</samp></pre>
-
-<p class=a>&#x2042;
-
-<h2 id=sdist>Creating a Source Distribution</h2>
-
-<p>Distutils supports building multiple types of release packages. At a minimum, you should build a &#8220;source distribution&#8221; that contains your source code, your Distutils setup script, your &#8220;read me&#8221; file, and whatever <a href=#manifest>additional files you want to include</a>. To build a source distribution, pass the <code>sdist</code> command to your Distutils setup script.
-
-<pre class=screen>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>c:\python31\python.exe setup.py sdist</mark></kbd>
-<samp>running sdist
-running check
-reading manifest template 'MANIFEST.in'
-writing manifest file 'MANIFEST'
-creating chardet-1.0.2
-creating chardet-1.0.2\chardet
-creating chardet-1.0.2\docs
-creating chardet-1.0.2\docs\images
-copying files to chardet-1.0.2...
-copying COPYING -> chardet-1.0.2
-copying README.txt -> chardet-1.0.2
-copying setup.py -> chardet-1.0.2
-copying chardet\__init__.py -> chardet-1.0.2\chardet
-copying chardet\big5freq.py -> chardet-1.0.2\chardet
-...
-copying chardet\universaldetector.py -> chardet-1.0.2\chardet
-copying chardet\utf8prober.py -> chardet-1.0.2\chardet
-copying docs\faq.html -> chardet-1.0.2\docs
-copying docs\history.html -> chardet-1.0.2\docs
-copying docs\how-it-works.html -> chardet-1.0.2\docs
-copying docs\index.html -> chardet-1.0.2\docs
-copying docs\license.html -> chardet-1.0.2\docs
-copying docs\supported-encodings.html -> chardet-1.0.2\docs
-copying docs\usage.html -> chardet-1.0.2\docs
-copying docs\images\caution.png -> chardet-1.0.2\docs\images
-copying docs\images\important.png -> chardet-1.0.2\docs\images
-copying docs\images\note.png -> chardet-1.0.2\docs\images
-copying docs\images\permalink.gif -> chardet-1.0.2\docs\images
-copying docs\images\tip.png -> chardet-1.0.2\docs\images
-copying docs\images\warning.png -> chardet-1.0.2\docs\images
-creating dist
-creating 'dist\chardet-1.0.2.zip' and adding 'chardet-1.0.2' to it
-adding 'chardet-1.0.2\COPYING'
-adding 'chardet-1.0.2\PKG-INFO'
-adding 'chardet-1.0.2\README.txt'
-adding 'chardet-1.0.2\setup.py'
-adding 'chardet-1.0.2\chardet\big5freq.py'
-adding 'chardet-1.0.2\chardet\big5prober.py'
-...
-adding 'chardet-1.0.2\chardet\universaldetector.py'
-adding 'chardet-1.0.2\chardet\utf8prober.py'
-adding 'chardet-1.0.2\chardet\__init__.py'
-adding 'chardet-1.0.2\docs\faq.html'
-adding 'chardet-1.0.2\docs\history.html'
-adding 'chardet-1.0.2\docs\how-it-works.html'
-adding 'chardet-1.0.2\docs\index.html'
-adding 'chardet-1.0.2\docs\license.html'
-adding 'chardet-1.0.2\docs\supported-encodings.html'
-adding 'chardet-1.0.2\docs\usage.html'
-adding 'chardet-1.0.2\docs\images\caution.png'
-adding 'chardet-1.0.2\docs\images\important.png'
-adding 'chardet-1.0.2\docs\images\note.png'
-adding 'chardet-1.0.2\docs\images\permalink.gif'
-adding 'chardet-1.0.2\docs\images\tip.png'
-adding 'chardet-1.0.2\docs\images\warning.png'
-removing 'chardet-1.0.2' (and everything under it)</samp></pre>
-
-<p>Several things to note here:
-
-<ul>
-<li>Distutils noticed the manifest file (<code>MANIFEST.in</code>).
-<li>Distutils successfully parsed the manifest file and added the additional files we wanted&nbsp;&mdash;&nbsp;<code>COPYING.txt</code> and the <abbr>HTML</abbr> and image files in the <code>docs/</code> directory.
-<li>If you look in your project directory, you&#8217;ll see that Distutils created a <code>dist/</code> directory. Within the <code>dist/</code> directory is the <code>.zip</code> file that you can distribute.
-</ul>
-
-<pre class=screen>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>dir dist</mark></kbd>
-<samp> Volume in drive C has no label.
- Volume Serial Number is DED5-B4F8
-
- Directory of c:\Users\pilgrim\chardet\dist
-
-07/30/2009  06:29 PM    &lt;DIR>          .
-07/30/2009  06:29 PM    &lt;DIR>          ..
-07/30/2009  06:29 PM           206,440 <mark>chardet-1.0.2.zip</mark>
-               1 File(s)        206,440 bytes
-               2 Dir(s)  61,424,635,904 bytes free</samp></pre>
-
-<p class=a>&#x2042;
-
-<h2 id=bdist>Creating a Graphical Installer</h2>
-
-<p>In my opinion, every Python library deserves a graphical installer for Windows users. It&#8217;s easy to make (even if you don&#8217;t run Windows yourself), and Windows users appreciate it.
-
-<p>Distutils can <a href=http://docs.python.org/3.1/distutils/builtdist.html#creating-windows-installers>create a graphical Windows installer for you</a>, by passing the <code>bdist_wininst</code> command to your Distutils setup script.
-
-<pre class=screen>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>c:\python31\python.exe setup.py bdist_wininst</mark></kbd>
-<samp>running bdist_wininst
-running build
-running build_py
-creating build
-creating build\lib
-creating build\lib\chardet
-copying chardet\big5freq.py -> build\lib\chardet
-copying chardet\big5prober.py -> build\lib\chardet
-...
-copying chardet\universaldetector.py -> build\lib\chardet
-copying chardet\utf8prober.py -> build\lib\chardet
-copying chardet\__init__.py -> build\lib\chardet
-installing to build\bdist.win32\wininst
-running install_lib
-creating build\bdist.win32
-creating build\bdist.win32\wininst
-creating build\bdist.win32\wininst\PURELIB
-creating build\bdist.win32\wininst\PURELIB\chardet
-copying build\lib\chardet\big5freq.py -> build\bdist.win32\wininst\PURELIB\chardet
-copying build\lib\chardet\big5prober.py -> build\bdist.win32\wininst\PURELIB\chardet
-...
-copying build\lib\chardet\universaldetector.py -> build\bdist.win32\wininst\PURELIB\chardet
-copying build\lib\chardet\utf8prober.py -> build\bdist.win32\wininst\PURELIB\chardet
-copying build\lib\chardet\__init__.py -> build\bdist.win32\wininst\PURELIB\chardet
-running install_egg_info
-Writing build\bdist.win32\wininst\PURELIB\chardet-1.0.2-py3.1.egg-info
-creating 'c:\users\pilgrim\appdata\local\temp\tmp2f4h7e.zip' and adding '.' to it
-adding 'PURELIB\chardet-1.0.2-py3.1.egg-info'
-adding 'PURELIB\chardet\big5freq.py'
-adding 'PURELIB\chardet\big5prober.py'
-...
-adding 'PURELIB\chardet\universaldetector.py'
-adding 'PURELIB\chardet\utf8prober.py'
-adding 'PURELIB\chardet\__init__.py'
-removing 'build\bdist.win32\wininst' (and everything under it)</samp>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>dir dist</mark></kbd>
-<samp>c:\Users\pilgrim\chardet>dir dist
- Volume in drive C has no label.
- Volume Serial Number is AADE-E29F
-
- Directory of c:\Users\pilgrim\chardet\dist
-
-07/30/2009  10:14 PM    &lt;DIR>          .
-07/30/2009  10:14 PM    &lt;DIR>          ..
-07/30/2009  10:14 PM           371,236 <mark>chardet-1.0.2.win32.exe</mark>
-07/30/2009  06:29 PM           206,440 chardet-1.0.2.zip
-               2 File(s)        577,676 bytes
-               2 Dir(s)  61,424,070,656 bytes free</samp></pre>
-
-<h3 id=linux>Building Installable Packages for Other Operating Systems</h3>
-
-<p>Distutils can help you <a href=http://docs.python.org/3.1/distutils/builtdist.html#creating-rpm-packages>build installable packages for Linux users</a>. In my opinion, this probably isn&#8217;t worth your time. If you want your software distributed for Linux, your time would be better spent working with community members who specialize in packaging software for major Linux distributions.
-
-<p>For example, my <code>chardet</code> library is <a href=http://packages.debian.org/python-chardet>in the Debian GNU/Linux repositories</a> (and therefore <a href=http://packages.ubuntu.com/python-chardet>in the Ubuntu repositories</a> as well). I had nothing to do with this; the packages just showed up there one day. The Debian community has <a href=http://www.debian.org/doc/packaging-manuals/python-policy/>their own policies for packaging Python libraries</a>, and the Debian <code>python-chardet</code> package is designed to follow these conventions. And since the package lives in Debian&#8217;s repositories, Debian users will receive security updates and/or new versions, depending on the system-wide settings they&#8217;ve chosen to manage their own computers.
-
-<p>The Linux packages that Distutils builds offer none of these advantages. Your time is better spent elsewhere.
-
-<p class=a>&#x2042;
-
-<h2 id=pypi>Adding Your Software to The Python Package Index</h2>
-
-<p>Uploading software to the Python Package Index is a three step process.
-
-<ol>
-<li>Register yourself
-<li>Register your software
-<li>Upload the packages you created with <code>setup.py sdist</code> and <code>setup.py bdist_*</code>
-</ol>
-
-<p>To register yourself, go to <a href="http://pypi.python.org/pypi?:action=register_form">the PyPI user registration page</a>. Enter your desired username and password, provide a valid email address, and click the <code>Register</code> button. (If you have a <abbr>PGP</abbr> or <abbr>GPG</abbr> key, you can also provide that. If you don&#8217;t have one or don&#8217;t know what that means, don&#8217;t worry about it.) Check your email; within a few minutes, you should receive a message from PyPI with a validation link. Click the link to complete the registration process.
-
-<p>Now you need to register your software with PyPI and upload it. You can do this all in one step.
-
-<pre class=screen>
-<a><samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py register sdist bdist_wininst upload</kbd>  <span class=u>&#x2460;</span></a>
-<samp>running register
-We need to know who you are, so please choose either:
- 1. use your existing login,
- 2. register as a new user,
- 3. have the server generate a new password for you (and email it to you), or
- 4. quit</samp>
-<a><samp class=p>Your selection [default 1]:  </samp><kbd>1</kbd>                                                                 <span class=u>&#x2461;</span></a>
-<a><samp class=p>Username: </samp><kbd>MarkPilgrim</kbd>                                                                          <span class=u>&#x2462;</span></a>
-<samp class=p>Password:</samp>
-<a><samp>Registering chardet to http://pypi.python.org/pypi</samp>                                             <span class=u>&#x2463;</span></a>
-<samp>Server response (200): OK</samp>
-<a><samp>running sdist</samp>                                                                                  <span class=u>&#x2464;</span></a>
-<samp>... output trimmed for brevity ...</samp>
-<a><samp>running bdist_wininst</samp>                                                                          <span class=u>&#x2465;</span></a>
-<samp>... output trimmed for brevity ...</samp>
-<a><samp>running upload</samp>                                                                                 <span class=u>&#x2466;</span></a>
-<samp>Submitting dist\chardet-1.0.2.zip to http://pypi.python.org/pypi
-Server response (200): OK
-Submitting dist\chardet-1.0.2.win32.exe to http://pypi.python.org/pypi
-Server response (200): OK
-I can store your PyPI login so future submissions will be faster.
-(the login will be stored in c:\home\.pypirc)</samp>
-<a><samp class=p>Save your login (y/N)?</samp><kbd class=pp>n</kbd>                                                                        <span class=u>&#x2467;</span></a></pre>
-<ol>
-<li>When you release your project for the first time, Distutils will add your software to the Python Package Index and give it its own <abbr>URL</abbr>. Every time after that, it will simply update the project metadata with any changes you may have made in your <code>setup.py</code> parameters. Next, it builds a source distribution (<code>sdist</code>) and a Windows installer (<code>bdist_wininst</code>), then uploads them to PyPI (<code>upload</code>).
-<li>Type <kbd>1</kbd> or just press <kbd>ENTER</kbd> to select &#8220;use your existing login.&#8221;
-<li>Enter the username and password you selected on the <a href="http://pypi.python.org/pypi?:action=register_form">the PyPI user registration page</a>. Distuils will not echo your password; it will not even echo asterisks in place of characters. Just type your password and press <kbd>ENTER</kbd>.
-<li>Distutils registers your package with the Python Package Index&hellip;
-<li>&hellip;builds your source distribution&hellip;
-<li>&hellip;builds your Windows installer&hellip;
-<li>&hellip;and uploads them both to the Python Package Index.
-<li>If you want to automate the process of releasing new versions, you need to save your PyPI credentials in a local file. This is completely insecure and completely optional.
-</ol>
-
-<p>Congratulations, you now have your own page on the Python Package Index! The address is <code>http://pypi.python.org/pypi/<i>NAME</i></code>, where <i>NAME</i> is the string you passed in the <var>name</var> parameter in your <code>setup.py</code> file.
-
-<p>If you want to release a new version, just update your <code>setup.py</code> with the new version number, then run the same upload command again:
-
-<pre class='nd screen'>
-<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py register sdist bdist_wininst upload</kbd>
-</pre>
-
-<p class=a>&#x2042;
-
-<h2 id=future>The Many Possible Futures of Python Packaging</h2>
-
-<p>Distutils is not the be-all and end-all of Python packaging, but as of this writing (August 2009), it&#8217;s the only packaging framework that works in Python 3. There are a number of other frameworks for Python 2; some focus on installation, others on testing and deployment. Some or all of these may end up being ported to Python 3 in the future.
-
-<p>These frameworks focus on installation:
-
-<ul>
-<li><a href=http://pypi.python.org/pypi/setuptools>Setuptools</a>
-<li><a href=http://pypi.python.org/pypi/pip>Pip</a>
-<li><a href=http://bitbucket.org/tarek/distribute/>Distribute</a>
-</ul>
-
-<p>These focus on testing and deployment:
-
-<ul>
-<li><a href=http://pypi.python.org/pypi/virtualenv><code>virtualenv</code></a>
-<li><a href=http://pypi.python.org/pypi/zc.buildout><code>zc.buildout</code></a>
-<li><a href=http://www.blueskyonmars.com/projects/paver/>Paver</a>
-<li><a href=http://fabfile.org/>Fabric</a>
-<li><a href=http://www.py2exe.org/><code>py2exe</code></a>
-</ul>
-
-<p class=a>&#x2042;
-
-<h2 id=furtherreading>Further Reading</h2>
-
-<p>On Distutils:
-
-<ul>
-<li><a href=http://docs.python.org/3.1/distutils/>Distributing Python Modules with Distutils</a>
-<li><a href=http://docs.python.org/3.1/distutils/apiref.html#module-distutils.core>Core Distutils functionality</a> lists all the possible arguments to the <code>setup()</code> function
-<li><a href=http://wiki.python.org/moin/Distutils/Cookbook>Distutils Cookbook</a>
-<li><a href=http://www.python.org/dev/peps/pep-0370/><abbr>PEP</abbr> 370: Per user <code>site-packages</code> directory</a>
-<li><a href=http://jessenoller.com/2009/07/19/pep-370-per-user-site-packages-and-environment-stew/><abbr>PEP</abbr> 370 and &#8220;environment stew&#8221;</a>
-</ul>
-
-<p>On other packaging frameworks:
-
-<ul>
-<li><a href=http://groups.google.com/group/django-developers/msg/5407cdb400157259>The Python packaging ecosystem</a>
-<li><a href=http://www.b-list.org/weblog/2008/dec/14/packaging/>On packaging</a>
-<li><a href=http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/>A few corrections to &#8220;On packaging&#8221;</a>
-<li><a href=http://www.b-list.org/weblog/2008/dec/15/pip/>Why I like Pip</a>
-<li><a href=http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations-cabal-for-a-solution/>Python packaging: a few observations</a>
-<li><a href=http://jacobian.org/writing/nobody-expects-python-packaging/>Nobody expects Python packaging!</a>
-</ul>
-
-<p class=v><a rel=prev href=case-study-porting-chardet-to-python-3.html title='back to &#8220;Case Study: Porting chardet to Python 3&#8221;'><span class=u>&#x261C;</span></a> <a rel=next href=porting-code-to-python-3-with-2to3.html title='onward to &#8220;Porting Code to Python 3 with 2to3&#8221;'><span class=u>&#x261E;</span></a>
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<script src=j/jquery.js></script>
-<script src=j/prettify.js></script>
-<script src=j/dip3.js></script>
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Packaging Python Libraries - Dive Into Python 3</title>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 16}
+mark{display:inline}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#packaging>Dive Into Python 3</a> <span class=u>&#8227;</span>
+<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
+<h1>Packaging Python Libraries</h1>
+<blockquote class=q>
+<p><span class=u>&#x275D;</span> You&#8217;ll find the shame is like the pain; you only feel it once. <span class=u>&#x275E;</span><br>&mdash; Marquise de Merteuil, <a href=http://www.imdb.com/title/tt0094947/quotes><cite>Dangerous Liaisons</cite></a>
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>Real artists ship. Or so says Steve Jobs. Do you want to release a Python script, library, framework, or application? Excellent. The world needs more Python code. Python 3 comes with a packaging framework called Distutils. Distutils is many things: a build tool (for you), an installation tool (for your users), a package metadata format (for search engines), and more. It integrates with the <a href=http://pypi.python.org/>Python Package Index</a> (&#8220;PyPI&#8221;), a central repository for open source Python libraries.
+
+<p>All of these facets of Distutils center around the <i>setup script</i>, traditionally called <code>setup.py</code>. In fact, you&#8217;ve already seen several Distutils setup scripts in this book. You used Distutils to install <code>httplib2</code> in <a href=http-web-services.html#introducing-httplib2>HTTP Web Services</a> and again to install <code>chardet</code> in <a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>.
+
+<p>In this chapter, you&#8217;ll learn how the setup scripts for <code>chardet</code> and <code>httplib2</code> work, and you&#8217;ll step through the process of releasing your own Python software.
+
+<pre class=pp><code># chardet's setup.py
+from distutils.core import setup
+setup(
+    name = "chardet",
+    packages = ["chardet"],
+    version = "1.0.2",
+    description = "Universal encoding detector",
+    author = "Mark Pilgrim",
+    author_email = "mark@diveintomark.org",
+    url = "http://chardet.feedparser.org/",
+    download_url = "http://chardet.feedparser.org/download/python3-chardet-1.0.1.tgz",
+    keywords = ["encoding", "i18n", "xml"],
+    classifiers = [
+        "Programming Language :: Python",
+        "Programming Language :: Python :: 3",
+        "Development Status :: 4 - Beta",
+        "Environment :: Other Environment",
+        "Intended Audience :: Developers",
+        "License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)",
+        "Operating System :: OS Independent",
+        "Topic :: Software Development :: Libraries :: Python Modules",
+        "Topic :: Text Processing :: Linguistic",
+        ],
+    long_description = """\
+Universal character encoding detector
+-------------------------------------
+
+Detects
+ - ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
+ - Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
+ - EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
+ - EUC-KR, ISO-2022-KR (Korean)
+ - KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
+ - ISO-8859-2, windows-1250 (Hungarian)
+ - ISO-8859-5, windows-1251 (Bulgarian)
+ - windows-1252 (English)
+ - ISO-8859-7, windows-1253 (Greek)
+ - ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
+ - TIS-620 (Thai)
+
+This version requires Python 3 or later; a Python 2 version is available separately.
+"""
+)</code></pre>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span><code>chardet</code> and <code>httplib2</code> are open source, but there&#8217;s no requirement that you release your own Python libraries under any particular license. The process described in this chapter will work for any Python software, regardless of license.
+</blockquote>
+
+<p class=a>&#x2042;
+
+<h2 id=cantdo>Things Distutils Can&#8217;t Do For You</h2>
+
+<p>Releasing your first Python package is a daunting process. (Releasing your second one is a little easier.) Distutils tries to automate as much of it as possible, but there are some things you simply must do yourself.
+
+<ul>
+<li><b>Choose a license</b>. This is a complicated topic, fraught with politics and peril. If you wish to release your software as open source, I humbly offer five pieces of advice:
+
+<ol>
+<li>Don&#8217;t write your own license.
+<li>Don&#8217;t write your own license.
+<li>Don&#8217;t write your own license.
+<li>It doesn&#8217;t need to be <abbr>GPL</abbr>, but <a href=http://www.dwheeler.com/essays/gpl-compatible.html>it needs to be <abbr>GPL</abbr>-compatible</a>.
+<li>Don&#8217;t write your own license.
+</ol>
+<li><b>Classify your software</b> using the PyPI classification system. I&#8217;ll explain what this means later in this chapter.
+<li><b>Write a &#8220;read me&#8221; file</b>. Don&#8217;t skimp on this. At a minimum, it should give your users an overview of what your software does and how to install it.
+</ul>
+
+<p class=a>&#x2042;
+
+<h2 id=structure>Directory Structure</h2>
+
+<p>To start packaging your Python software, you need to get your files and directories in order. The <code>httplib2</code> directory looks like this:
+
+<pre class=screen>
+<a>httplib2/                 <span class=u>&#x2460;</span></a>
+|
+<a>+--README.txt             <span class=u>&#x2461;</span></a>
+|
+<a>+--setup.py               <span class=u>&#x2462;</span></a>
+|
+<a>+--httplib2/              <span class=u>&#x2463;</span></a>
+   |
+   +--__init__.py
+   |
+   +--iri2uri.py</pre>
+<ol>
+<li>Make a root directory to hold everything. Give it the same name as your Python module.
+<li>To accommodate Windows users, your &#8220;read me&#8221; file should include a <code>.txt</code> extension, and it should use Windows-style carriage returns. Just because <em>you</em> use a fancy text editor that runs from the command line and includes its own macro language, that doesn&#8217;t mean you need to make life difficult for your users. (Your users use Notepad. Sad but true.) Even if you&#8217;re on Linux or Mac OS X, your fancy text editor undoubtedly has an option to save files with Windows-style carriage returns.
+<li>Your Distutils setup script should be named <code>setup.py</code> unless you have a good reason not to. You do not have a good reason not to.
+<li>If your Python software is a single <code>.py</code> file, you should put it in the root directory along with your &#8220;read me&#8221; file and your setup script. But <code>httplib2</code> is not a single <code>.py</code> file; it&#8217;s <a href=case-study-porting-chardet-to-python-3.html#multifile-modules>a multi-file module</a>. But that&#8217;s OK! Just put the <code>httplib2</code> directory in the root directory, so you have an <code>__init__.py</code> file within an <code>httplib2/</code> directory within the <code>httplib2/</code> root directory. That&#8217;s not a problem; in fact, it will simplify your packaging process.
+</ol>
+
+<p>The <code>chardet</code> directory looks slightly different. Like <code>httplib2</code>, it&#8217;s <a href=case-study-porting-chardet-to-python-3.html#multifile-modules>a multi-file module</a>, so there&#8217;s a <code>chardet/</code> directory within the <code>chardet/</code> root directory. In addition to the <code>README.txt</code> file, <code>chardet</code> has <abbr>HTML</abbr>-formatted documentation in the <code>docs/</code> directory. The <code>docs/</code> directory contains several <code>.html</code> and <code>.css</code> files and an <code>images/</code> subdirectory, which contains several <code>.png</code> and <code>.gif</code> files. (This will be important later.) Also, in keeping with the convention for <abbr>(L)GPL</abbr>-licensed software, it has a separate file called <code>COPYING.txt</code> which contains the complete text of the <abbr>LGPL</abbr>.
+
+<pre class=nd><code>
+chardet/
+|
++--COPYING.txt
+|
++--setup.py
+|
++--README.txt
+|
++--docs/
+|  |
+|  +--index.html
+|  |
+|  +--usage.html
+|  |
+|  +--images/ ...
+|
++--chardet/
+   |
+   +--__init__.py
+   |
+   +--big5freq.py
+   |
+   +--...
+</code></pre>
+
+<p class=a>&#x2042;
+
+<h2 id=setuppy>Writing Your Setup Script</h2>
+
+<p>The Distutils setup script is a Python script. In theory, it can do anything Python can do. In practice, it should do as little as possible, in as standard a way as possible. Setup scripts should be boring. The more exotic your installation process is, the more exotic your bug reports will be.
+
+<p>The first line of every Distutils setup script is always the same:
+
+<pre class='nd pp'><code>from distutils.core import setup</code></pre>
+
+<p>This imports the <code>setup()</code> function, which is the main entry point into Distutils. 95% of all Distutils setup scripts consist of a single call to <code>setup()</code> and nothing else. (I totally just made up that statistic, but if your Distutils setup script is doing more than calling the Distutils <code>setup()</code> function, you should have a good reason. Do you have a good reason? I didn&#8217;t think so.)
+
+<p>The <code>setup()</code> function <a href=http://docs.python.org/3.1/distutils/apiref.html#distutils.core.setup>can take dozens of parameters</a>. For the sanity of everyone involved, you must use <a href=your-first-python-program.html#optional-arguments>named arguments</a> for every parameter. This is not merely a convention; it&#8217;s a hard requirement. Your setup script will crash if you try to call the <code>setup()</code> function with non-named arguments.
+
+<p>The following named arguments are required:
+
+<ul>
+<li><b>name</b>, the name of the package.
+<li><b>version</b>, the version number of the package.
+<li><b>author</b>, your full name.
+<li><b>author_email</b>, your email address.
+<li><b>url</b>, the home page of your project. This can be your <a href=http://pypi.python.org/>PyPI</a> package page if you don&#8217;t have a separate project website.
+</ul>
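+
+<p>As a minimal sketch of what &#8220;required&#8221; means in practice, the snippet below checks a dictionary of would-be <code>setup()</code> arguments against the five names listed above. The helper <code>missing_metadata()</code> and the <code>REQUIRED</code> tuple are hypothetical names invented for this illustration; they are not part of Distutils.
+
```python
# Hypothetical helper, not part of Distutils: report which of the
# required setup() arguments named in this chapter are missing.
REQUIRED = ("name", "version", "author", "author_email", "url")

def missing_metadata(setup_args):
    """Return the required argument names that are absent or empty."""
    return [key for key in REQUIRED if not setup_args.get(key)]

# The chardet metadata from this chapter, with "version" left out on purpose.
setup_args = {
    "name": "chardet",
    "author": "Mark Pilgrim",
    "author_email": "mark@diveintomark.org",
    "url": "http://chardet.feedparser.org/",
}
print(missing_metadata(setup_args))   # ['version']
```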
+
+<p>Although not required, I recommend that you also include the following in your setup script:
+
+<ul>
+<li><b>description</b>, a one-line summary of the project.
+<li><b>long_description</b>, a multi-line string in <a href=http://docutils.sourceforge.net/rst.html>reStructuredText format</a>. <a href=http://pypi.python.org/>PyPI</a> converts this to <abbr>HTML</abbr> and displays it on your package page.
+<li><b>classifiers</b>, a list of specially-formatted strings described in the next section.
+</ul>
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>Setup script metadata is defined in <a href=http://www.python.org/dev/peps/pep-0314/><abbr>PEP</abbr> 314</a>.
+</blockquote>
+
+<p>Now let&#8217;s look at the <code>chardet</code> setup script. It has all of these required and recommended parameters, plus one I haven&#8217;t mentioned yet: <code>packages</code>.
+
+<pre class='nd pp'><code>from distutils.core import setup
+setup(
+    name = 'chardet',
+    <mark>packages = ['chardet']</mark>,
+    version = '1.0.2',
+    description = 'Universal encoding detector',
+    author='Mark Pilgrim',
+    ...
+)</code></pre>
+
+<p>The <code>packages</code> parameter highlights an unfortunate vocabulary overlap in the distribution process. We&#8217;ve been talking about the &#8220;package&#8221; as the thing you&#8217;re building (and potentially listing in The Python &#8220;Package&#8221; Index). But that&#8217;s not what this <code>packages</code> parameter refers to. It refers to the fact that the <code>chardet</code> module is <a href=case-study-porting-chardet-to-python-3.html#multifile-modules>a multi-file module</a>, sometimes known as&hellip; a &#8220;package.&#8221; The <code>packages</code> parameter tells Distutils to include the <code>chardet/</code> directory, its <code>__init__.py</code> file, and all the other <code>.py</code> files that constitute the <code>chardet</code> module. That&#8217;s kind of important; all this happy talk about documentation and metadata is irrelevant if you forget to include the actual code!
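+
+<p>To make the idea concrete, here is a rough sketch of collecting the <code>.py</code> files that belong to one multi-file module. It only illustrates the concept and is not how Distutils is implemented; <code>package_modules()</code> is a hypothetical name, and the temporary directory stands in for a real project root.
+
```python
import glob
import os
import tempfile

def package_modules(root, package):
    """List the .py files directly inside one package directory.

    Hypothetical helper for illustration only; it does not descend
    into subdirectories.
    """
    package_dir = os.path.join(root, package)
    if not os.path.isfile(os.path.join(package_dir, "__init__.py")):
        raise ValueError("%r is not a package (no __init__.py)" % package)
    paths = glob.glob(os.path.join(package_dir, "*.py"))
    return sorted(os.path.basename(path) for path in paths)

# Build a throwaway project tree to try it on.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "chardet"))
for filename in ("__init__.py", "big5freq.py", "notes.txt"):
    open(os.path.join(root, "chardet", filename), "w").close()

print(package_modules(root, "chardet"))   # ['__init__.py', 'big5freq.py']
```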
+
+<p class=a>&#x2042;
+
+<h2 id=trove>Classifying Your Package</h2>
+
+<p>The Python Package Index (&#8220;PyPI&#8221;) contains thousands of Python libraries. Proper classification metadata will allow people to find yours more easily. PyPI lets you <a href='http://pypi.python.org/pypi?:action=browse'>browse packages by classifier</a>. You can even select multiple classifiers to narrow your search. Classifiers are not invisible metadata that you can just ignore!
+
+<p>To classify your software, pass a <code>classifiers</code> parameter to the Distutils <code>setup()</code> function. The <code>classifiers</code> parameter is a list of strings. These strings are <em>not</em> freeform. All classifier strings should come from <a href='http://pypi.python.org/pypi?:action=list_classifiers'>this list on PyPI</a>.
+
+<p>Classifiers are optional. You can write a Distutils setup script without any classifiers at all. <strong>Don&#8217;t do that.</strong> You should <em>always</em> include at least these classifiers:
+
+<ul>
+<li><b>Programming Language</b>. In particular, you should include both <code>"Programming Language :: Python"</code> and <code>"Programming Language :: Python :: 3"</code>. If you do not include these, your package will not show up in <a href='http://pypi.python.org/pypi?:action=browse&amp;c=533&amp;show=all'>this list of Python 3-compatible libraries</a>, which is linked from the sidebar of every single page of <code>pypi.python.org</code>.
+<li><b>License</b>. This is <em>the absolute first thing I look for</em> when I&#8217;m evaluating third-party libraries. Don&#8217;t make me hunt for this vital information. Don&#8217;t include more than one license classifier unless your software is explicitly available under multiple licenses. (And don&#8217;t release software under multiple licenses unless you&#8217;re forced to do so. And don&#8217;t force other people to do so. Licensing is enough of a headache; don&#8217;t make it worse.)
+<li><b>Operating System</b>. If your software only runs on Windows (or Mac OS X, or Linux), I want to know sooner rather than later. If your software runs anywhere without any platform-specific code, use the classifier <code>"Operating System :: OS Independent"</code>. Multiple <code>Operating System</code> classifiers are only necessary if your software requires specific support for each platform. (This is not common.)
+</ul>
+
+<p>I also recommend that you include the following classifiers:
+
+<ul>
+<li><b>Development Status</b>. Is your software beta quality? Alpha quality? Pre-alpha? Pick one. Be honest.
+<li><b>Intended Audience</b>. Who would download your software? The most common choices are <code>Developers</code>, <code>End Users/Desktop</code>, <code>Science/Research</code>, and <code>System Administrators</code>.
+<li><b>Framework</b>. If your software is a plugin for a larger Python framework like <a href=http://www.djangoproject.com/>Django</a> or <a href=http://www.zope.org/>Zope</a>, include the appropriate <code>Framework</code> classifier. If not, omit it.
+<li><b>Topic</b>. There are <a href='http://pypi.python.org/pypi?:action=list_classifiers'>a large number of topics to choose from</a>; choose all that apply.
+</ul>
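+
+<p>One way to keep yourself honest is a small check that your classifier list covers the three categories this chapter says you should always include. The sketch below assumes only that classifiers are strings whose parts are separated by <code>" :: "</code>; the helper <code>missing_categories()</code> is a hypothetical name, and the check does not verify that each string actually appears on the PyPI list.
+
```python
# Hypothetical helper: check that a classifiers list covers the three
# categories this chapter says you should always include.
ALWAYS = ("Programming Language", "License", "Operating System")

def missing_categories(classifiers):
    """Return the categories from ALWAYS that have no classifier at all."""
    present = {classifier.split(" :: ")[0] for classifier in classifiers}
    return [category for category in ALWAYS if category not in present]

# A deliberately incomplete list: the License classifier is missing.
classifiers = [
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Operating System :: OS Independent",
    "Development Status :: 4 - Beta",
]
print(missing_categories(classifiers))   # ['License']
```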
+
+<h3 id=trove-examples>Examples of Good Package Classifiers</h3>
+
+<p>By way of example, here are the classifiers for <a href=http://pypi.python.org/pypi/Django/>Django</a>, a production-ready, cross-platform, <abbr>BSD</abbr>-licensed web application framework that runs on your web server. (Django is not yet compatible with Python 3, so the <code>Programming Language :: Python :: 3</code> classifier is not listed.)
+
+<pre><code>Programming Language :: Python
+License :: OSI Approved :: BSD License
+Operating System :: OS Independent
+Development Status :: 5 - Production/Stable
+Environment :: Web Environment
+Framework :: Django
+Intended Audience :: Developers
+Topic :: Internet :: WWW/HTTP
+Topic :: Internet :: WWW/HTTP :: Dynamic Content
+Topic :: Internet :: WWW/HTTP :: WSGI
+Topic :: Software Development :: Libraries :: Python Modules</code></pre>
+
+<p>Here are the classifiers for <a href=http://pypi.python.org/pypi/chardet><code>chardet</code></a>, the character encoding detection library covered in <a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>. <code>chardet</code> is beta quality, cross-platform, Python 3-compatible, <abbr>LGPL</abbr>-licensed, and intended for developers to integrate into their own products.
+
+<pre><code>Programming Language :: Python
+Programming Language :: Python :: 3
+License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
+Operating System :: OS Independent
+Development Status :: 4 - Beta
+Environment :: Other Environment
+Intended Audience :: Developers
+Topic :: Text Processing :: Linguistic
+Topic :: Software Development :: Libraries :: Python Modules</code></pre>
+
+<p>And here are the classifiers for <a href=http://pypi.python.org/pypi/httplib2><code>httplib2</code></a>, the <abbr>HTTP</abbr> module I mentioned at the beginning of this chapter. <code>httplib2</code> is beta quality, cross-platform, <abbr>MIT</abbr>-licensed, and intended for Python developers.
+
+<pre><code>Programming Language :: Python
+Programming Language :: Python :: 3
+License :: OSI Approved :: MIT License
+Operating System :: OS Independent
+Development Status :: 4 - Beta
+Environment :: Web Environment
+Intended Audience :: Developers
+Topic :: Internet :: WWW/HTTP
+Topic :: Software Development :: Libraries :: Python Modules</code></pre>
+
+<h2 id=manifest>Specifying Additional Files With A Manifest</h2>
+
+<p>By default, Distutils will include the following files in your release package:
+
+<ul>
+<li><code>README.txt</code>
+<li><code>setup.py</code>
+<li>The <code>.py</code> files needed by the multi-file modules listed in the <code>packages</code> parameter
+<li>The individual <code>.py</code> files listed in the <code>py_modules</code> parameter
+</ul>
+
+<p>That will cover <a href=#structure>all the files in the <code>httplib2</code> project</a>. But for the <code>chardet</code> project, we also want to include the <code>COPYING.txt</code> license file and the entire <code>docs/</code> directory that contains images and <abbr>HTML</abbr> files. To tell Distutils to include these additional files and directories when it builds the <code>chardet</code> release package, you need a <i>manifest file</i>.
+
+<p>A manifest file is a text file called <code>MANIFEST.in</code>. Place it in the project&#8217;s root directory, next to <code>README.txt</code> and <code>setup.py</code>. Manifest files are <em>not</em> Python scripts; they are text files that contain a series of &#8220;commands&#8221; in a Distutils-defined format. Manifest commands allow you to include or exclude specific files and directories.
+
+<p>This is the entire manifest file for the <code>chardet</code> project:
+
+<pre class=nd><code><a>include COPYING.txt                                <span class=u>&#x2460;</span></a>
+<a>recursive-include docs *.html *.css *.png *.gif    <span class=u>&#x2461;</span></a></code></pre>
+<ol>
+<li>The first line is self-explanatory: include the <code>COPYING.txt</code> file from the project&#8217;s root directory.
+<li>The second line is a bit more complicated. The <code>recursive-include</code> command takes a directory name and one or more filenames. The filenames aren&#8217;t limited to specific files; they can include wildcards. This line means &#8220;See that <code>docs/</code> directory in the project&#8217;s root directory? Look in there (recursively) for <code>.html</code>, <code>.css</code>, <code>.png</code>, and <code>.gif</code> files. I want all of them in my release package.&#8221;
+</ol>
+
+<p>All manifest commands preserve the directory structure that you set up in your project directory. That <code>recursive-include</code> command is not going to put a bunch of <code>.html</code> and <code>.png</code> files in the root directory of the release package. It&#8217;s going to maintain the existing <code>docs/</code> directory structure, but only include those files inside that directory that match the given wildcards. (I didn&#8217;t mention it earlier, but the <code>chardet</code> documentation is actually written in <abbr>XML</abbr> and converted to <abbr>HTML</abbr> by a separate script. I don&#8217;t want to include the <abbr>XML</abbr> files in the release package, just the <abbr>HTML</abbr> and the images.)
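+
+<p>If the wildcard matching is hard to picture, the following sketch imitates the effect of that <code>recursive-include</code> line using only the standard <code>os</code> and <code>fnmatch</code> modules. It is an approximation written for this explanation, not the Distutils implementation; the function name <code>recursive_include()</code> and the throwaway file names are assumptions made for the demo.
+
```python
import fnmatch
import os
import tempfile

def recursive_include(root, directory, patterns):
    """Return paths, relative to root, of files under directory that
    match any of the wildcard patterns. Hypothetical helper."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(os.path.join(root, directory)):
        for filename in filenames:
            if any(fnmatch.fnmatch(filename, pattern) for pattern in patterns):
                full_path = os.path.join(dirpath, filename)
                relative = os.path.relpath(full_path, root)
                # Use forward slashes so the result looks the same everywhere.
                matches.append(relative.replace(os.sep, "/"))
    return sorted(matches)

# Build a throwaway project tree to try it on.
root = tempfile.mkdtemp()
for relative in ("docs/index.html", "docs/index.xml", "docs/images/note.png"):
    path = os.path.join(root, *relative.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

print(recursive_include(root, "docs", ["*.html", "*.css", "*.png", "*.gif"]))
# ['docs/images/note.png', 'docs/index.html']
```
+
+<p>Note that the <code>.xml</code> file is skipped and the <code>images/</code> subdirectory keeps its place in the relative paths, which mirrors the behavior described above.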
+
+<blockquote class=note>
+<p><span class=u>&#x261E;</span>Manifest files have their own unique format. See <a href=http://docs.python.org/3.1/distutils/sourcedist.html#manifest>Specifying the files to distribute</a> and <a href=http://docs.python.org/3.1/distutils/commandref.html#sdist-cmd>the manifest template commands</a> for details.
+</blockquote>
+
+<p>To reiterate: you only need to create a manifest file if you want to include files that Distutils doesn&#8217;t include by default. If you do need a manifest file, it should only include the files and directories that Distutils wouldn&#8217;t otherwise find on its own.
+
+<h2 id=check>Checking Your Setup Script for Errors</h2>
+
+<p>There&#8217;s a lot to keep track of. Distutils comes with a built-in validation command that checks that all the required metadata is present in your setup script. For example, if you forget to include the <code>version</code> parameter, Distutils will remind you.
+
+<pre class=screen>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py check</kbd>
+<samp>running check
+warning: check: missing required meta-data: version</samp></pre>
+
+<p>Once you include a <code>version</code> parameter (and all the other required bits of metadata), the <code>check</code> command will look like this:
+
+<pre class=screen>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py check</kbd>
+<samp>running check</samp></pre>
+
+<p class=a>&#x2042;
+
+<h2 id=sdist>Creating a Source Distribution</h2>
+
+<p>Distutils supports building multiple types of release packages. At a minimum, you should build a &#8220;source distribution&#8221; that contains your source code, your Distutils setup script, your &#8220;read me&#8221; file, and whatever <a href=#manifest>additional files you want to include</a>. To build a source distribution, pass the <code>sdist</code> command to your Distutils setup script.
+
+<pre class=screen>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>c:\python31\python.exe setup.py sdist</mark></kbd>
+<samp>running sdist
+running check
+reading manifest template 'MANIFEST.in'
+writing manifest file 'MANIFEST'
+creating chardet-1.0.2
+creating chardet-1.0.2\chardet
+creating chardet-1.0.2\docs
+creating chardet-1.0.2\docs\images
+copying files to chardet-1.0.2...
+copying COPYING -> chardet-1.0.2
+copying README.txt -> chardet-1.0.2
+copying setup.py -> chardet-1.0.2
+copying chardet\__init__.py -> chardet-1.0.2\chardet
+copying chardet\big5freq.py -> chardet-1.0.2\chardet
+...
+copying chardet\universaldetector.py -> chardet-1.0.2\chardet
+copying chardet\utf8prober.py -> chardet-1.0.2\chardet
+copying docs\faq.html -> chardet-1.0.2\docs
+copying docs\history.html -> chardet-1.0.2\docs
+copying docs\how-it-works.html -> chardet-1.0.2\docs
+copying docs\index.html -> chardet-1.0.2\docs
+copying docs\license.html -> chardet-1.0.2\docs
+copying docs\supported-encodings.html -> chardet-1.0.2\docs
+copying docs\usage.html -> chardet-1.0.2\docs
+copying docs\images\caution.png -> chardet-1.0.2\docs\images
+copying docs\images\important.png -> chardet-1.0.2\docs\images
+copying docs\images\note.png -> chardet-1.0.2\docs\images
+copying docs\images\permalink.gif -> chardet-1.0.2\docs\images
+copying docs\images\tip.png -> chardet-1.0.2\docs\images
+copying docs\images\warning.png -> chardet-1.0.2\docs\images
+creating dist
+creating 'dist\chardet-1.0.2.zip' and adding 'chardet-1.0.2' to it
+adding 'chardet-1.0.2\COPYING'
+adding 'chardet-1.0.2\PKG-INFO'
+adding 'chardet-1.0.2\README.txt'
+adding 'chardet-1.0.2\setup.py'
+adding 'chardet-1.0.2\chardet\big5freq.py'
+adding 'chardet-1.0.2\chardet\big5prober.py'
+...
+adding 'chardet-1.0.2\chardet\universaldetector.py'
+adding 'chardet-1.0.2\chardet\utf8prober.py'
+adding 'chardet-1.0.2\chardet\__init__.py'
+adding 'chardet-1.0.2\docs\faq.html'
+adding 'chardet-1.0.2\docs\history.html'
+adding 'chardet-1.0.2\docs\how-it-works.html'
+adding 'chardet-1.0.2\docs\index.html'
+adding 'chardet-1.0.2\docs\license.html'
+adding 'chardet-1.0.2\docs\supported-encodings.html'
+adding 'chardet-1.0.2\docs\usage.html'
+adding 'chardet-1.0.2\docs\images\caution.png'
+adding 'chardet-1.0.2\docs\images\important.png'
+adding 'chardet-1.0.2\docs\images\note.png'
+adding 'chardet-1.0.2\docs\images\permalink.gif'
+adding 'chardet-1.0.2\docs\images\tip.png'
+adding 'chardet-1.0.2\docs\images\warning.png'
+removing 'chardet-1.0.2' (and everything under it)</samp></pre>
+
+<p>Several things to note here:
+
+<ul>
+<li>Distutils noticed the manifest file (<code>MANIFEST.in</code>).
+<li>Distutils successfully parsed the manifest file and added the additional files we wanted&nbsp;&mdash;&nbsp;<code>COPYING.txt</code> and the <abbr>HTML</abbr> and image files in the <code>docs/</code> directory.
+<li>If you look in your project directory, you&#8217;ll see that Distutils created a <code>dist/</code> directory. Within the <code>dist/</code> directory is the <code>.zip</code> file that you can distribute.
+</ul>
+
+<pre class=screen>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>dir dist</mark></kbd>
+<samp> Volume in drive C has no label.
+ Volume Serial Number is DED5-B4F8
+
+ Directory of c:\Users\pilgrim\chardet\dist
+
+07/30/2009  06:29 PM    &lt;DIR>          .
+07/30/2009  06:29 PM    &lt;DIR>          ..
+07/30/2009  06:29 PM           206,440 <mark>chardet-1.0.2.zip</mark>
+               1 File(s)        206,440 bytes
+               2 Dir(s)  61,424,635,904 bytes free</samp></pre>
+
+<p class=a>&#x2042;
+
+<h2 id=bdist>Creating a Graphical Installer</h2>
+
+<p>In my opinion, every Python library deserves a graphical installer for Windows users. It&#8217;s easy to make (even if you don&#8217;t run Windows yourself), and Windows users appreciate it.
+
+<p>Distutils can <a href=http://docs.python.org/3.1/distutils/builtdist.html#creating-windows-installers>create a graphical Windows installer for you</a>, by passing the <code>bdist_wininst</code> command to your Distutils setup script.
+
+<pre class=screen>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>c:\python31\python.exe setup.py bdist_wininst</mark></kbd>
+<samp>running bdist_wininst
+running build
+running build_py
+creating build
+creating build\lib
+creating build\lib\chardet
+copying chardet\big5freq.py -> build\lib\chardet
+copying chardet\big5prober.py -> build\lib\chardet
+...
+copying chardet\universaldetector.py -> build\lib\chardet
+copying chardet\utf8prober.py -> build\lib\chardet
+copying chardet\__init__.py -> build\lib\chardet
+installing to build\bdist.win32\wininst
+running install_lib
+creating build\bdist.win32
+creating build\bdist.win32\wininst
+creating build\bdist.win32\wininst\PURELIB
+creating build\bdist.win32\wininst\PURELIB\chardet
+copying build\lib\chardet\big5freq.py -> build\bdist.win32\wininst\PURELIB\chardet
+copying build\lib\chardet\big5prober.py -> build\bdist.win32\wininst\PURELIB\chardet
+...
+copying build\lib\chardet\universaldetector.py -> build\bdist.win32\wininst\PURELIB\chardet
+copying build\lib\chardet\utf8prober.py -> build\bdist.win32\wininst\PURELIB\chardet
+copying build\lib\chardet\__init__.py -> build\bdist.win32\wininst\PURELIB\chardet
+running install_egg_info
+Writing build\bdist.win32\wininst\PURELIB\chardet-1.0.2-py3.1.egg-info
+creating 'c:\users\pilgrim\appdata\local\temp\tmp2f4h7e.zip' and adding '.' to it
+adding 'PURELIB\chardet-1.0.2-py3.1.egg-info'
+adding 'PURELIB\chardet\big5freq.py'
+adding 'PURELIB\chardet\big5prober.py'
+...
+adding 'PURELIB\chardet\universaldetector.py'
+adding 'PURELIB\chardet\utf8prober.py'
+adding 'PURELIB\chardet\__init__.py'
+removing 'build\bdist.win32\wininst' (and everything under it)</samp>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd><mark>dir dist</mark></kbd>
+<samp>c:\Users\pilgrim\chardet>dir dist
+ Volume in drive C has no label.
+ Volume Serial Number is AADE-E29F
+
+ Directory of c:\Users\pilgrim\chardet\dist
+
+07/30/2009  10:14 PM    &lt;DIR>          .
+07/30/2009  10:14 PM    &lt;DIR>          ..
+07/30/2009  10:14 PM           371,236 <mark>chardet-1.0.2.win32.exe</mark>
+07/30/2009  06:29 PM           206,440 chardet-1.0.2.zip
+               2 File(s)        577,676 bytes
+               2 Dir(s)  61,424,070,656 bytes free</samp></pre>
+
+<h3 id=linux>Building Installable Packages for Other Operating Systems</h3>
+
+<p>Distutils can help you <a href=http://docs.python.org/3.1/distutils/builtdist.html#creating-rpm-packages>build installable packages for Linux users</a>. In my opinion, this probably isn&#8217;t worth your time. If you want your software distributed for Linux, your time would be better spent working with community members who specialize in packaging software for major Linux distributions.
+
+<p>For example, my <code>chardet</code> library is <a href=http://packages.debian.org/python-chardet>in the Debian GNU/Linux repositories</a> (and therefore <a href=http://packages.ubuntu.com/python-chardet>in the Ubuntu repositories</a> as well). I had nothing to do with this; the packages just showed up there one day. The Debian community has <a href=http://www.debian.org/doc/packaging-manuals/python-policy/>their own policies for packaging Python libraries</a>, and the Debian <code>python-chardet</code> package is designed to follow these conventions. And since the package lives in Debian&#8217;s repositories, Debian users will receive security updates and/or new versions, depending on the system-wide settings they&#8217;ve chosen to manage their own computers.
+
+<p>The Linux packages that Distutils builds offer none of these advantages. Your time is better spent elsewhere.
+
+<p class=a>&#x2042;
+
+<h2 id=pypi>Adding Your Software to The Python Package Index</h2>
+
+<p>Uploading software to the Python Package Index is a three-step process.
+
+<ol>
+<li>Register yourself
+<li>Register your software
+<li>Upload the packages you created with <code>setup.py sdist</code> and <code>setup.py bdist_*</code>
+</ol>
+
+<p>To register yourself, go to <a href="http://pypi.python.org/pypi?:action=register_form">the PyPI user registration page</a>. Enter your desired username and password, provide a valid email address, and click the <code>Register</code> button. (If you have a <abbr>PGP</abbr> or <abbr>GPG</abbr> key, you can also provide that. If you don&#8217;t have one or don&#8217;t know what that means, don&#8217;t worry about it.) Check your email; within a few minutes, you should receive a message from PyPI with a validation link. Click the link to complete the registration process.
+
+<p>Now you need to register your software with PyPI and upload it. You can do this all in one step.
+
+<pre class=screen>
+<a><samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py register sdist bdist_wininst upload</kbd>  <span class=u>&#x2460;</span></a>
+<samp>running register
+We need to know who you are, so please choose either:
+ 1. use your existing login,
+ 2. register as a new user,
+ 3. have the server generate a new password for you (and email it to you), or
+ 4. quit</samp>
+<a><samp class=p>Your selection [default 1]:  </samp><kbd>1</kbd>                                                                 <span class=u>&#x2461;</span></a>
+<a><samp class=p>Username: </samp><kbd>MarkPilgrim</kbd>                                                                          <span class=u>&#x2462;</span></a>
+<samp class=p>Password:</samp>
+<a><samp>Registering chardet to http://pypi.python.org/pypi</samp>                                             <span class=u>&#x2463;</span></a>
+<samp>Server response (200): OK</samp>
+<a><samp>running sdist</samp>                                                                                  <span class=u>&#x2464;</span></a>
+<samp>... output trimmed for brevity ...</samp>
+<a><samp>running bdist_wininst</samp>                                                                          <span class=u>&#x2465;</span></a>
+<samp>... output trimmed for brevity ...</samp>
+<a><samp>running upload</samp>                                                                                 <span class=u>&#x2466;</span></a>
+<samp>Submitting dist\chardet-1.0.2.zip to http://pypi.python.org/pypi
+Server response (200): OK
+Submitting dist\chardet-1.0.2.win32.exe to http://pypi.python.org/pypi
+Server response (200): OK
+I can store your PyPI login so future submissions will be faster.
+(the login will be stored in c:\home\.pypirc)</samp>
+<a><samp class=p>Save your login (y/N)?</samp><kbd class=pp>n</kbd>                                                                        <span class=u>&#x2467;</span></a></pre>
+<ol>
+<li>When you release your project for the first time, Distutils will add your software to the Python Package Index and give it its own <abbr>URL</abbr>. Every time after that, it will simply update the project metadata with any changes you may have made in your <code>setup.py</code> parameters. Next, it builds a source distribution (<code>sdist</code>) and a Windows installer (<code>bdist_wininst</code>), then uploads them to PyPI (<code>upload</code>).
+<li>Type <kbd>1</kbd> or just press <kbd>ENTER</kbd> to select &#8220;use your existing login.&#8221;
+<li>Enter the username and password you selected on <a href="http://pypi.python.org/pypi?:action=register_form">the PyPI user registration page</a>. Distutils will not echo your password; it will not even echo asterisks in place of characters. Just type your password and press <kbd>ENTER</kbd>.
+<li>Distutils registers your package with the Python Package Index&hellip;
+<li>&hellip;builds your source distribution&hellip;
+<li>&hellip;builds your Windows installer&hellip;
+<li>&hellip;and uploads them both to the Python Package Index.
+<li>If you want to automate the process of releasing new versions, you need to save your PyPI credentials in a local file. This is completely insecure and completely optional.
+</ol>
+
+<p>Congratulations, you now have your own page on the Python Package Index! The address is <code>http://pypi.python.org/pypi/<i>NAME</i></code>, where <i>NAME</i> is the string you passed in the <var>name</var> parameter in your <code>setup.py</code> file.
+
+<p>If you want to release a new version, just update your <code>setup.py</code> with the new version number, then run the same upload command again:
+
+<pre class='nd screen'>
+<samp class=p>c:\Users\pilgrim\chardet> </samp><kbd>c:\python31\python.exe setup.py register sdist bdist_wininst upload</kbd>
+</pre>
+
+<p class=a>&#x2042;
+
+<h2 id=future>The Many Possible Futures of Python Packaging</h2>
+
+<p>Distutils is not the be-all and end-all of Python packaging, but as of this writing (August 2009), it&#8217;s the only packaging framework that works in Python 3. There are a number of other frameworks for Python 2; some focus on installation, others on testing and deployment. Some or all of these may end up being ported to Python 3 in the future.
+
+<p>These frameworks focus on installation:
+
+<ul>
+<li><a href=http://pypi.python.org/pypi/setuptools>Setuptools</a>
+<li><a href=http://pypi.python.org/pypi/pip>Pip</a>
+<li><a href=http://bitbucket.org/tarek/distribute/>Distribute</a>
+</ul>
+
+<p>These focus on testing and deployment:
+
+<ul>
+<li><a href=http://pypi.python.org/pypi/virtualenv><code>virtualenv</code></a>
+<li><a href=http://pypi.python.org/pypi/zc.buildout><code>zc.buildout</code></a>
+<li><a href=http://www.blueskyonmars.com/projects/paver/>Paver</a>
+<li><a href=http://fabfile.org/>Fabric</a>
+<li><a href=http://www.py2exe.org/><code>py2exe</code></a>
+</ul>
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+
+<p>On Distutils:
+
+<ul>
+<li><a href=http://docs.python.org/3.1/distutils/>Distributing Python Modules with Distutils</a>
+<li><a href=http://docs.python.org/3.1/distutils/apiref.html#module-distutils.core>Core Distutils functionality</a> lists all the possible arguments to the <code>setup()</code> function
+<li><a href=http://wiki.python.org/moin/Distutils/Cookbook>Distutils Cookbook</a>
+<li><a href=http://www.python.org/dev/peps/pep-0370/><abbr>PEP</abbr> 370: Per user <code>site-packages</code> directory</a>
+<li><a href=http://jessenoller.com/2009/07/19/pep-370-per-user-site-packages-and-environment-stew/><abbr>PEP</abbr> 370 and &#8220;environment stew&#8221;</a>
+</ul>
+
+<p>On other packaging frameworks:
+
+<ul>
+<li><a href=http://groups.google.com/group/django-developers/msg/5407cdb400157259>The Python packaging ecosystem</a>
+<li><a href=http://www.b-list.org/weblog/2008/dec/14/packaging/>On packaging</a>
+<li><a href=http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/>A few corrections to &#8220;On packaging&#8221;</a>
+<li><a href=http://www.b-list.org/weblog/2008/dec/15/pip/>Why I like Pip</a>
+<li><a href=http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations-cabal-for-a-solution/>Python packaging: a few observations</a>
+<li><a href=http://jacobian.org/writing/nobody-expects-python-packaging/>Nobody expects Python packaging!</a>
+</ul>
+
+<p class=v><a rel=prev href=case-study-porting-chardet-to-python-3.html title='back to &#8220;Case Study: Porting chardet to Python 3&#8221;'><span class=u>&#x261C;</span></a> <a rel=next href=porting-code-to-python-3-with-2to3.html title='onward to &#8220;Porting Code to Python 3 with 2to3&#8221;'><span class=u>&#x261E;</span></a>
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<script src=j/jquery.js></script>
+<script src=j/prettify.js></script>
+<script src=j/dip3.js></script>
diff --git a/prince.css b/prince.css
index 5dbf409..5fa3299 100644
--- a/prince.css
+++ b/prince.css
@@ -1,59 +1,59 @@
-/*
-
-"Dive Into Python 3" Prince stylesheet
-
-Copyright (c) 2009, Mark Pilgrim, All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-* Redistributions of source code must retain the above copyright notice,
-  this list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above copyright notice,
-  this list of conditions and the following disclaimer in the documentation
-  and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
-LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-*/
-
-/* some Prince-specific rules to generate a nicer PDF */
-/* see http://www.princexml.com/ */
-
-@page {
-  size: US-Letter;
-  margin: 30pt;
-  padding: 0;
-  @bottom-center {
-    font: 12pt/1.75 'Gill Sans', 'Gill Sans MT', Helvetica, Corbel, 'Nimbus Sans L', sans-serif;
-    content: counter(page);
-  }
-}
-pre {
-  page-break-inside: avoid;
-}
-h1 {
-  page-break-before: always;
-  prince-bookmark-level: 1;
-}
-h2 {
-  prince-bookmark-level: 2;
-}
-h3 {
-  prince-bookmark-level: 3;
-}
-ul, ol {
-  margin: 1.75em 20pt;
-}
-abbr {
-  text-decoration: none;
-}
+/*
+
+"Dive Into Python 3" Prince stylesheet
+
+Copyright (c) 2009, Mark Pilgrim, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+*/
+
+/* some Prince-specific rules to generate a nicer PDF */
+/* see http://www.princexml.com/ */
+
+@page {
+  size: US-Letter;
+  margin: 30pt;
+  padding: 0;
+  @bottom-center {
+    font: 12pt/1.75 'Gill Sans', 'Gill Sans MT', Helvetica, Corbel, 'Nimbus Sans L', sans-serif;
+    content: counter(page);
+  }
+}
+pre {
+  page-break-inside: avoid;
+}
+h1 {
+  page-break-before: always;
+  prince-bookmark-level: 1;
+}
+h2 {
+  prince-bookmark-level: 2;
+}
+h3 {
+  prince-bookmark-level: 3;
+}
+ul, ol {
+  margin: 1.75em 20pt;
+}
+abbr {
+  text-decoration: none;
+}
diff --git a/publish b/publish
index f471782..8d39edc 100755
--- a/publish
+++ b/publish
@@ -2,7 +2,6 @@
 
 die () {
   echo "$1" >/dev/stderr
-  [ -n "$(which Snarl_CMD 2>/dev/null)" ] && Snarl_CMD snShowMessage 10 "Dive Into Python 3" "$1." "C:\Users\pilgrim\site-lisp\todochiku-icons\alert.png"
   exit 1
 }
 
@@ -119,9 +118,9 @@ java -jar util/yuicompressor-2.4.2.jar build/dip3.css > build/$revision.css && \
 echo "inlining CSS, minimizing URLs, adding evil tracking code"
 ga=`cat j/ga.js`
 for f in build/*.html; do
-  css=`python2.6 util/lesscss.py "$f" "build/$revision.css"` || die "Failed to remove unused CSS"
-  mobilecss=`python2.6 util/lesscss.py "$f" "build/m-$revision.css"` || die "Failed to remove unused CSS"
-  printcss=`python2.6 util/lesscss.py "$f" "build/p-$revision.css"` || die "Failed to remove unused CSS"
+  css=`python2.5 util/lesscss.py "$f" "build/$revision.css"` || die "Failed to remove unused CSS"
+  mobilecss=`python2.5 util/lesscss.py "$f" "build/m-$revision.css"` || die "Failed to remove unused CSS"
+  printcss=`python2.5 util/lesscss.py "$f" "build/p-$revision.css"` || die "Failed to remove unused CSS"
   sed -i -e "s|<link rel=stylesheet href=dip3.css>|<style>${css}</style>|g" -e "s|<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>|<style>@media screen and (max-device-width:480px){${mobilecss}}</style>|g" -e "s|<link rel=stylesheet media=print href=print.css>|<style>@media print{${printcss}}</style>|g" -e "s|</style><style>||g" -e "s|href=index.html|href=/|g" -e "s|</style>|</style>${ga}|g" "$f" || die "Failed to inline CSS"
 done
 
@@ -130,7 +129,7 @@ chmod 755 build/examples build/j build/i build/d && \
     chmod 644 build/*.html build/*.css build/*.txt build/*.zip build/examples/* build/examples/.htaccess build/j/* build/j/.htaccess build/i/* build/i/.htaccess build/d/.htaccess build/.htaccess || die "Failed to reset file permissions"
 
 # ship it!
-#die "Aborting without publishing"
+die "Aborting without publishing"
 echo -n "publishing"
 rsync -essh -a build/d/.htaccess build/*.zip diveintomark.org:~/web/diveintopython3.org/d/ && \
     echo -n "." && \
@@ -140,5 +139,3 @@ rsync -essh -a build/d/.htaccess build/*.zip diveintomark.org:~/web/diveintopyth
     echo -n "." && \
     rsync -essh -a build/examples build/*.txt build/*.html build/.htaccess diveintomark.org:~/web/diveintopython3.org/ && \
     echo "." || die "Failed to publish to remote server"
-
-[ -n "$(which Snarl_CMD 2>/dev/null)" ] && Snarl_CMD snShowMessage 10 "Dive Into Python 3" "Published." "C:\Users\pilgrim\site-lisp\todochiku-icons\clean.png"
diff --git a/table-of-contents.html b/table-of-contents.html
index 93772a7..5d9db97 100755
--- a/table-of-contents.html
+++ b/table-of-contents.html
@@ -1,446 +1,446 @@
-<!DOCTYPE html>
-<meta charset=utf-8>
-<title>Table of contents - Dive Into Python 3</title>
-<link rel=stylesheet href=dip3.css>
-<style>
-h1:before{content:''}
-ol,ul{font-weight:bold}
-li ol{font-weight:normal}
-#porting-code-to-python-3-with-2to3,#special-method-names,#where-to-go-from-here{list-style:none;margin:0 0 0 -2em}
-#porting-code-to-python-3-with-2to3 > ol,#special-method-names > ol,#where-to-go-from-here > ol{margin:0;padding:0 0 0 4.5em}
-#porting-code-to-python-3-with-2to3:before{content:'A. \00a0 \00a0'}
-#special-method-names:before{content:'B. \00a0 \00a0'}
-#where-to-go-from-here:before{content:'C. \00a0 \00a0'}
-</style>
-<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
-<link rel=stylesheet media=print href=print.css>
-<meta name=viewport content='initial-scale=1.0'>
-<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8><input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
-<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> Dive Into Python 3 <span class=u>&#8227;</span>
-<h1>Table of Contents</h1>
-<!-- toc -->
-<ol start=-1>
-<li id=whats-new><a href=whats-new.html>What&#8217;s New in &#8220;Dive Into Python 3&#8221;</a>
-<ol>
-<li><a href=whats-new.html#divingin><i>a.k.a.</i> &#8220;the minus level&#8221;</a>
-</ol>
-<li id=installing-python><a href=installing-python.html>Installing Python</a>
-<ol>
-<li><a href=installing-python.html#divingin>Diving In</a>
-<li><a href=installing-python.html#which>Which Python Is Right For You?</a>
-<li><a href=installing-python.html#windows>Installing on Microsoft Windows</a>
-<li><a href=installing-python.html#macosx>Installing on Mac OS X</a>
-<li><a href=installing-python.html#ubuntu>Installing on Ubuntu Linux</a>
-<li><a href=installing-python.html#other>Installing on Other Platforms</a>
-<li><a href=installing-python.html#idle>Using The Python Shell</a>
-<li><a href=installing-python.html#editors>Python Editors and IDEs</a>
-</ol>
-<li id=your-first-python-program><a href=your-first-python-program.html>Your First Python Program</a>
-<ol>
-<li><a href=your-first-python-program.html#divingin>Diving In</a>
-<li><a href=your-first-python-program.html#declaringfunctions>Declaring Functions</a>
-<ol>
-<li><a href=your-first-python-program.html#optional-arguments>Optional and Named Arguments</a>
-</ol>
-<li><a href=your-first-python-program.html#readability>Writing Readable Code</a>
-<ol>
-<li><a href=your-first-python-program.html#docstrings>Documentation Strings</a>
-</ol>
-<li><a href=your-first-python-program.html#importsearchpath>The <code>import</code> Search Path</a>
-<li><a href=your-first-python-program.html#everythingisanobject>Everything Is An Object</a>
-<ol>
-<li><a href=your-first-python-program.html#whatsanobject>What&#8217;s An Object?</a>
-</ol>
-<li><a href=your-first-python-program.html#indentingcode>Indenting Code</a>
-<li><a href=your-first-python-program.html#exceptions>Exceptions</a>
-<ol>
-<li><a href=your-first-python-program.html#importerror>Catching Import Errors</a>
-</ol>
-<li><a href=your-first-python-program.html#nameerror>Unbound Variables</a>
-<li><a href=your-first-python-program.html#case>Everything is Case-Sensitive</a>
-<li><a href=your-first-python-program.html#runningscripts>Running Scripts</a>
-<li><a href=your-first-python-program.html#furtherreading>Further Reading</a>
-</ol>
-<li id=native-datatypes><a href=native-datatypes.html>Native Datatypes</a>
-<ol>
-<li><a href=native-datatypes.html#divingin>Diving In</a>
-<li><a href=native-datatypes.html#booleans>Booleans</a>
-<li><a href=native-datatypes.html#numbers>Numbers</a>
-<ol>
-<li><a href=native-datatypes.html#number-coercion>Coercing Integers To Floats And Vice-Versa</a>
-<li><a href=native-datatypes.html#common-numerical-operations>Common Numerical Operations</a>
-<li><a href=native-datatypes.html#fractions>Fractions</a>
-<li><a href=native-datatypes.html#trig>Trigonometry</a>
-<li><a href=native-datatypes.html#numbers-in-a-boolean-context>Numbers In A Boolean Context</a>
-</ol>
-<li><a href=native-datatypes.html#lists>Lists</a>
-<ol>
-<li><a href=native-datatypes.html#creatinglists>Creating A List</a>
-<li><a href=native-datatypes.html#slicinglists>Slicing A List</a>
-<li><a href=native-datatypes.html#extendinglists>Adding Items To A List</a>
-<li><a href=native-datatypes.html#searchinglists>Searching For Values In A List</a>
-<li><a href=native-datatypes.html#removingfromlists>Removing Items From A List</a>
-<li><a href=native-datatypes.html#popgoestheweasel>Removing Items From A List: Bonus Round</a>
-<li><a href=native-datatypes.html#lists-in-a-boolean-context>Lists In A Boolean Context</a>
-</ol>
-<li><a href=native-datatypes.html#tuples>Tuples</a>
-<ol>
-<li><a href=native-datatypes.html#tuples-in-a-boolean-context>Tuples In A Boolean Context</a>
-<li><a href=native-datatypes.html#multivar>Assigning Multiple Values At Once</a>
-</ol>
-<li><a href=native-datatypes.html#sets>Sets</a>
-<ol>
-<li><a href=native-datatypes.html#creating-a-set>Creating A Set</a>
-<li><a href=native-datatypes.html#modifying-sets>Modifying A Set</a>
-<li><a href=native-datatypes.html#removing-from-sets>Removing Items From A Set</a>
-<li><a href=native-datatypes.html#common-set-operations>Common Set Operations</a>
-<li><a href=native-datatypes.html#sets-in-a-boolean-context>Sets In A Boolean Context</a>
-</ol>
-<li><a href=native-datatypes.html#dictionaries>Dictionaries</a>
-<ol>
-<li><a href=native-datatypes.html#creating-dictionaries>Creating A Dictionary</a>
-<li><a href=native-datatypes.html#modifying-dictionaries>Modifying A Dictionary</a>
-<li><a href=native-datatypes.html#mixed-value-dictionaries>Mixed-Value Dictionaries</a>
-<li><a href=native-datatypes.html#dictionaries-in-a-boolean-context>Dictionaries In A Boolean Context</a>
-</ol>
-<li><a href=native-datatypes.html#none><code>None</code></a>
-<ol>
-<li><a href=native-datatypes.html#none-in-a-boolean-context><code>None</code> In A Boolean Context</a>
-</ol>
-<li><a href=native-datatypes.html#furtherreading>Further Reading</a>
-</ol>
-<li id=comprehensions><a href=comprehensions.html>Comprehensions</a>
-<ol>
-<li><a href=comprehensions.html#divingin>Diving In</a>
-<li><a href=comprehensions.html#os>Working With Files And Directories</a>
-<ol>
-<li><a href=comprehensions.html#getcwd>The Current Working Directory</a>
-<li><a href=comprehensions.html#ospath>Working With Filenames and Directory Names</a>
-<li><a href=comprehensions.html#glob>Listing Directories</a>
-<li><a href=comprehensions.html#osstat>Getting File Metadata</a>
-<li><a href=comprehensions.html#abspath>Constructing Absolute Pathnames</a>
-</ol>
-<li><a href=comprehensions.html#listcomprehension>List Comprehensions</a>
-<li><a href=comprehensions.html#dictionarycomprehension>Dictionary Comprehensions</a>
-<ol>
-<li><a href=comprehensions.html#stupiddicttricks>Other Fun Stuff To Do With Dictionary Comprehensions</a>
-</ol>
-<li><a href=comprehensions.html#setcomprehension>Set Comprehensions</a>
-<li><a href=comprehensions.html#furtherreading>Further Reading</a>
-</ol>
-<li id=strings><a href=strings.html>Strings</a>
-<ol>
-<li><a href=strings.html#boring-stuff>Some Boring Stuff You Need To Understand Before You Can Dive In</a>
-<li><a href=strings.html#one-ring-to-rule-them-all>Unicode</a>
-<li><a href=strings.html#divingin>Diving In</a>
-<li><a href=strings.html#formatting-strings>Formatting Strings</a>
-<ol>
-<li><a href=strings.html#compound-field-names>Compound Field Names</a>
-<li><a href=strings.html#format-specifiers>Format Specifiers</a>
-</ol>
-<li><a href=strings.html#common-string-methods>Other Common String Methods</a>
-<ol>
-<li><a href=strings.html#slicingstrings>Slicing A String</a>
-</ol>
-<li><a href=strings.html#byte-arrays>Strings vs. Bytes</a>
-<li><a href=strings.html#py-encoding>Postscript: Character Encoding Of Python Source Code</a>
-<li><a href=strings.html#furtherreading>Further Reading</a>
-</ol>
-<li id=regular-expressions><a href=regular-expressions.html>Regular Expressions</a>
-<ol>
-<li><a href=regular-expressions.html#divingin>Diving In</a>
-<li><a href=regular-expressions.html#streetaddresses>Case Study: Street Addresses</a>
-<li><a href=regular-expressions.html#romannumerals>Case Study: Roman Numerals</a>
-<ol>
-<li><a href=regular-expressions.html#thousands>Checking For Thousands</a>
-<li><a href=regular-expressions.html#hundreds>Checking For Hundreds</a>
-</ol>
-<li><a href=regular-expressions.html#nmsyntax>Using The <code>{n,m}</code> Syntax</a>
-<ol>
-<li><a href=regular-expressions.html#tensandones>Checking For Tens And Ones</a>
-</ol>
-<li><a href=regular-expressions.html#verbosere>Verbose Regular Expressions</a>
-<li><a href=regular-expressions.html#phonenumbers>Case study: Parsing Phone Numbers</a>
-<li><a href=regular-expressions.html#summary>Summary</a>
-</ol>
-<li id=generators><a href=generators.html>Closures <i class=baa>&amp;</i> Generators</a>
-<ol>
-<li><a href=generators.html#divingin>Diving In</a>
-<li><a href=generators.html#i-know>I Know, Let&#8217;s Use Regular Expressions!</a>
-<li><a href=generators.html#a-list-of-functions>A List Of Functions</a>
-<li><a href=generators.html#a-list-of-patterns>A List Of Patterns</a>
-<li><a href=generators.html#a-file-of-patterns>A File Of Patterns</a>
-<li><a href=generators.html#generators>Generators</a>
-<ol>
-<li><a href=generators.html#a-fibonacci-generator>A Fibonacci Generator</a>
-<li><a href=generators.html#a-plural-rule-generator>A Plural Rule Generator</a>
-</ol>
-<li><a href=generators.html#furtherreading>Further Reading</a>
-</ol>
-<li id=iterators><a href=iterators.html>Classes <i class=baa>&amp;</i> Iterators</a>
-<ol>
-<li><a href=iterators.html#divingin>Diving In</a>
-<li><a href=iterators.html#defining-classes>Defining Classes</a>
-<ol>
-<li><a href=iterators.html#init-method>The <code>__init__()</code> Method</a>
-</ol>
-<li><a href=iterators.html#instantiating-classes>Instantiating Classes</a>
-<li><a href=iterators.html#instance-variables>Instance Variables</a>
-<li><a href=iterators.html#a-fibonacci-iterator>A Fibonacci Iterator</a>
-<li><a href=iterators.html#a-plural-rule-iterator>A Plural Rule Iterator</a>
-<li><a href=iterators.html#furtherreading>Further Reading</a>
-</ol>
-<li id=advanced-iterators><a href=advanced-iterators.html>Advanced Iterators</a>
-<ol>
-<li><a href=advanced-iterators.html#divingin>Diving In</a>
-<li><a href=advanced-iterators.html#re-findall>Finding all occurrences of a pattern</a>
-<li><a href=advanced-iterators.html#unique-items>Finding the unique items in a sequence</a>
-<li><a href=advanced-iterators.html#assert>Making assertions</a>
-<li><a href=advanced-iterators.html#generator-expressions>Generator expressions</a>
-<li><a href=advanced-iterators.html#permutations>Calculating Permutations&hellip; The Lazy Way!</a>
-<li><a href=advanced-iterators.html#more-itertools>Other Fun Stuff in the <code>itertools</code> Module</a>
-<li><a href=advanced-iterators.html#string-translate>A New Kind Of String Manipulation</a>
-<li><a href=advanced-iterators.html#eval>Evaluating Arbitrary Strings As Python Expressions</a>
-<li><a href=advanced-iterators.html#alphametics-finale>Putting It All Together</a>
-<li><a href=advanced-iterators.html#furtherreading>Further Reading</a>
-</ol>
-<li id=unit-testing><a href=unit-testing.html>Unit Testing</a>
-<ol>
-<li><a href=unit-testing.html#divingin>(Not) Diving In</a>
-<li><a href=unit-testing.html#romantest1>A Single Question</a>
-<li><a href=unit-testing.html#romantest2>&#8220;Halt And Catch Fire&#8221;</a>
-<li><a href=unit-testing.html#romantest3>More Halting, More Fire</a>
-<li><a href=unit-testing.html#romantest4>And One More Thing&hellip;</a>
-<li><a href=unit-testing.html#romantest5>A Pleasing Symmetry</a>
-<li><a href=unit-testing.html#romantest6>More Bad Input</a>
-</ol>
-<li id=refactoring><a href=refactoring.html>Refactoring</a>
-<ol>
-<li><a href=refactoring.html#divingin>Diving In</a>
-<li><a href=refactoring.html#changing-requirements>Handling Changing Requirements</a>
-<li><a href=refactoring.html#refactoring>Refactoring</a>
-<li><a href=refactoring.html#summary>Summary</a>
-</ol>
-<li id=files><a href=files.html>Files</a>
-<ol>
-<li><a href=files.html#divingin>Diving In</a>
-<li><a href=files.html#reading>Reading From Text Files</a>
-<ol>
-<li><a href=files.html#encoding>Character Encoding Rears Its Ugly Head</a>
-<li><a href=files.html#file-objects>Stream Objects</a>
-<li><a href=files.html#read>Reading Data From A Text File</a>
-<li><a href=files.html#close>Closing Files</a>
-<li><a href=files.html#with>Closing Files Automatically</a>
-<li><a href=files.html#for>Reading Data One Line At A Time</a>
-</ol>
-<li><a href=files.html#writing>Writing to Text Files</a>
-<ol>
-<li><a href=files.html#encoding-again>Character Encoding Again</a>
-</ol>
-<li><a href=files.html#binary>Binary Files</a>
-<li><a href=files.html#file-like-objects>Stream Objects From Non-File Sources</a>
-<ol>
-<li><a href=files.html#gzip>Handling Compressed Files</a>
-</ol>
-<li><a href=files.html#stdio>Standard Input, Output, and Error</a>
-<ol>
-<li><a href=files.html#redirect>Redirecting Standard Output</a>
-</ol>
-<li><a href=files.html#furtherreading>Further Reading</a>
-</ol>
-<li id=xml><a href=xml.html>XML</a>
-<ol>
-<li><a href=xml.html#divingin>Diving In</a>
-<li><a href=xml.html#xml-intro>A 5-Minute Crash Course in XML</a>
-<li><a href=xml.html#xml-structure>The Structure Of An Atom Feed</a>
-<li><a href=xml.html#xml-parse>Parsing XML</a>
-<ol>
-<li><a href=xml.html#xml-elements>Elements Are Lists</a>
-<li><a href=xml.html#xml-attributes>Attributes Are Dictonaries</a>
-</ol>
-<li><a href=xml.html#xml-find>Searching For Nodes Within An XML Document</a>
-<li><a href=xml.html#xml-lxml>Going Further With lxml</a>
-<li><a href=xml.html#xml-generate>Generating XML</a>
-<li><a href=xml.html#xml-custom-parser>Parsing Broken XML</a>
-<li><a href=xml.html#furtherreading>Further Reading</a>
-</ol>
-<li id=serializing><a href=serializing.html>Serializing Python Objects</a>
-<ol>
-<li><a href=serializing.html#divingin>Diving In</a>
-<ol>
-<li><a href=serializing.html#administrivia>A Quick Note About The Examples in This Chapter</a>
-</ol>
-<li><a href=serializing.html#dump>Saving Data to a Pickle File</a>
-<li><a href=serializing.html#load>Loading Data from a Pickle File</a>
-<li><a href=serializing.html#dumps>Pickling Without a File</a>
-<li><a href=serializing.html#protocol-versions>Bytes and Strings Rear Their Ugly Heads Again</a>
-<li><a href=serializing.html#debugging>Debugging Pickle Files</a>
-<li><a href=serializing.html#json>Serializing Python Objects to be Read by Other Languages</a>
-<li><a href=serializing.html#json-dump>Saving Data to a <abbr>JSON</abbr> File</a>
-<li><a href=serializing.html#json-types>Mapping of Python Datatypes to <abbr>JSON</abbr></a>
-<li><a href=serializing.html#json-unknown-types>Serializing Datatypes Unsupported by <abbr>JSON</abbr></a>
-<li><a href=serializing.html#json-load>Loading Data from a <abbr>JSON</abbr> File</a>
-<li><a href=serializing.html#furtherreading>Further Reading</a>
-</ol>
-<li id=http-web-services><a href=http-web-services.html>HTTP Web Services</a>
-<ol>
-<li><a href=http-web-services.html#divingin>Diving In</a>
-<li><a href=http-web-services.html#http-features>Features of HTTP</a>
-<ol>
-<li><a href=http-web-services.html#caching>Caching</a>
-<li><a href=http-web-services.html#last-modified>Last-Modified Checking</a>
-<li><a href=http-web-services.html#etags>ETag Checking</a>
-<li><a href=http-web-services.html#compression>Compression</a>
-<li><a href=http-web-services.html#redirects>Redirects</a>
-</ol>
-<li><a href=http-web-services.html#dont-try-this-at-home>How Not To Fetch Data Over HTTP</a>
-<li><a href=http-web-services.html#whats-on-the-wire>What&#8217;s On The Wire?</a>
-<li><a href=http-web-services.html#introducing-httplib2>Introducing <code>httplib2</code></a>
-<ol>
-<li><a href=http-web-services.html#why-bytes>A Short Digression To Explain Why <code>httplib2</code> Returns Bytes Instead of Strings</a>
-<li><a href=http-web-services.html#httplib2-caching>How <code>httplib2</code> Handles Caching</a>
-<li><a href=http-web-services.html#httplib2-etags>How <code>httplib2</code> Handles <code>Last-Modified</code> and <code>ETag</code> Headers</a>
-<li><a href=http-web-services.html#httplib2-compression>How <code>http2lib</code> Handles Compression</a>
-<li><a href=http-web-services.html#httplib2-redirects>How <code>httplib2</code> Handles Redirects</a>
-</ol>
-<li><a href=http-web-services.html#beyond-get>Beyond HTTP GET</a>
-<li><a href=http-web-services.html#beyond-post>Beyond HTTP POST</a>
-<li><a href=http-web-services.html#furtherreading>Further Reading</a>
-</ol>
-<li id=case-study-porting-chardet-to-python-3><a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>
-<ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#divingin>Diving In</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#faq.what>What is Character Encoding Auto-Detection?</a>
-<ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#faq.impossible>Isn&#8217;t That Impossible?</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#faq.who>Does Such An Algorithm Exist?</a>
-</ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#divingin2>Introducing The <code>chardet</code> Module</a>
-<ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#how.bom><abbr>UTF-n</abbr> With A <abbr>BOM</abbr></a>
-<li><a href=case-study-porting-chardet-to-python-3.html#how.esc>Escaped Encodings</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#how.mb>Multi-Byte Encodings</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#how.sb>Single-Byte Encodings</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#how.windows1252><code>windows-1252</code></a>
-</ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#running2to3>Running <code>2to3</code></a>
-<li><a href=case-study-porting-chardet-to-python-3.html#multifile-modules>A Short Digression Into Multi-File Modules</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#manual>Fixing What <code>2to3</code> Can&#8217;t</a>
-<ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#falseisinvalidsyntax><code>False</code> is invalid syntax</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#nomodulenamedconstants>No module named <code>constants</code></a>
-<li><a href=case-study-porting-chardet-to-python-3.html#namefileisnotdefined>Name <var>'file'</var> is not defined</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#cantuseastringpattern>Can&#8217;t use a string pattern on a bytes-like object</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#cantconvertbytesobject>Can't convert <code>'bytes'</code> object to <code>str</code> implicitly</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#unsupportedoperandtypeforplus>Unsupported operand type(s) for +: <code>'int'</code> and <code>'bytes'</code></a>
-<li><a href=case-study-porting-chardet-to-python-3.html#ordexpectedstring><code>ord()</code> expected string of length 1, but <code>int</code> found</a>
-<li><a href=case-study-porting-chardet-to-python-3.html#unorderabletypes>Unorderable types: <code>int()</code> >= <code>str()</code></a>
-<li><a href=case-study-porting-chardet-to-python-3.html#reduceisnotdefined>Global name <code>'reduce'</code> is not defined</a>
-</ol>
-<li><a href=case-study-porting-chardet-to-python-3.html#summary>Summary</a>
-</ol>
-<li id=packaging><a href=packaging.html>Packaging Python Libraries</a>
-<ol>
-<li><a href=packaging.html#divingin>Diving In</a>
-<li><a href=packaging.html#cantdo>Things Distutils Can&#8217;t Do For You</a>
-<li><a href=packaging.html#structure>Directory Structure</a>
-<li><a href=packaging.html#setuppy>Writing Your Setup Script</a>
-<li><a href=packaging.html#trove>Classifying Your Package</a>
-<ol>
-<li><a href=packaging.html#trove-examples>Examples of Good Package Classifiers</a>
-</ol>
-<li><a href=packaging.html#manifest>Specifying Additional Files With A Manifest</a>
-<li><a href=packaging.html#check>Checking Your Setup Script for Errors</a>
-<li><a href=packaging.html#sdist>Creating a Source Distribution</a>
-<li><a href=packaging.html#bdist>Creating a Graphical Installer</a>
-<ol>
-<li><a href=packaging.html#linux>Building Installable Packages for Other Operating Systems</a>
-</ol>
-<li><a href=packaging.html#pypi>Adding Your Software to The Python Package Index</a>
-<li><a href=packaging.html#future>The Many Possible Futures of Python Packaging</a>
-<li><a href=packaging.html#furtherreading>Further Reading</a>
-</ol>
-<li id=porting-code-to-python-3-with-2to3><a href=porting-code-to-python-3-with-2to3.html>Porting Code to Python 3 with <code>2to3</code></a>
-<ol>
-<li><a href=porting-code-to-python-3-with-2to3.html#divingin>Diving In</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#print><code>print</code> statement</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#unicodeliteral>Unicode string literals</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#unicode><code>unicode()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#long><code>long</code> data type</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#ne>&lt;> comparison</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#has_key><code>has_key()</code> dictionary method</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#dict>Dictionary methods that return lists</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#imports>Modules that have been renamed or reorganized</a>
-<ol>
-<li><a href=porting-code-to-python-3-with-2to3.html#http><code>http</code></a>
-<li><a href=porting-code-to-python-3-with-2to3.html#urllib><code>urllib</code></a>
-<li><a href=porting-code-to-python-3-with-2to3.html#dbm><code>dbm</code></a>
-<li><a href=porting-code-to-python-3-with-2to3.html#xmlrpc><code>xmlrpc</code></a>
-<li><a href=porting-code-to-python-3-with-2to3.html#othermodules>Other modules</a>
-</ol>
-<li><a href=porting-code-to-python-3-with-2to3.html#import>Relative imports within a package</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#next><code>next()</code> iterator method</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#filter><code>filter()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#map><code>map()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#reduce><code>reduce()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#apply><code>apply()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#intern><code>intern()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#exec><code>exec</code> statement</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#execfile><code>execfile</code> statement</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#repr><code>repr</code> literals (backticks)</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#except><code>try...except</code> statement</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#raise><code>raise</code> statement</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#throw><code>throw</code> method on generators</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#xrange><code>xrange()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#raw_input><code>raw_input()</code> and <code>input()</code> global functions</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#funcattrs><code>func_*</code> function attributes</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#xreadlines><code>xreadlines()</code> I/O method</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#tuple_params><code>lambda</code> functions that take a tuple instead of multiple parameters</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#methodattrs>Special method attributes</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#nonzero><code>__nonzero__</code> special method</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#numliterals>Octal literals</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#renames><code>sys.maxint</code></a>
-<li><a href=porting-code-to-python-3-with-2to3.html#callable><code>callable()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#zip><code>zip()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#standarderror><code>StandardError</code> exception</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#types><code>types</code> module constants</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#isinstance><code>isinstance()</code> global function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#basestring><code>basestring</code> datatype</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#itertools><code>itertools</code> module</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#sys_exc><code>sys.exc_type</code>, <code>sys.exc_value</code>, <code>sys.exc_traceback</code></a>
-<li><a href=porting-code-to-python-3-with-2to3.html#paren>List comprehensions over tuples</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#getcwdu><code>os.getcwdu()</code> function</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#metaclass>Metaclasses</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#nitpick>Matters of style</a>
-<ol>
-<li><a href=porting-code-to-python-3-with-2to3.html#set_literal><code>set()</code> literals (explicit)</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#buffer><code>buffer()</code> global function (explicit)</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#wscomma>Whitespace around commas (explicit)</a>
-<li><a href=porting-code-to-python-3-with-2to3.html#idioms>Common idioms (explicit)</a>
-</ol>
-</ol>
-<li id=special-method-names><a href=special-method-names.html>Special Method Names</a>
-<ol>
-<li><a href=special-method-names.html#divingin>Diving In</a>
-<li><a href=special-method-names.html#basics>Basics</a>
-<li><a href=special-method-names.html#acts-like-iterator>Classes That Act Like Iterators</a>
-<li><a href=special-method-names.html#computed-attributes>Computed Attributes</a>
-<li><a href=special-method-names.html#acts-like-function>Classes That Act Like Functions</a>
-<li><a href=special-method-names.html#acts-like-list>Classes That Act Like Sequences</a>
-<li><a href=special-method-names.html#acts-like-dict>Classes That Act Like Dictionaries</a>
-<li><a href=special-method-names.html#acts-like-number>Classes That Act Like Numbers</a>
-<li><a href=special-method-names.html#rich-comparisons>Classes That Can Be Compared</a>
-<li><a href=special-method-names.html#pickle>Classes That Can Be Serialized</a>
-<li><a href=special-method-names.html#context-managers>Classes That Can Be Used in a <code>with</code> Block</a>
-<li><a href=special-method-names.html#esoterica>Really Esoteric Stuff</a>
-<li><a href=special-method-names.html#furtherreading>Further Reading</a>
-</ol>
-<li id=where-to-go-from-here><a href=where-to-go-from-here.html>Where to Go From Here</a>
-<ol>
-<li><a href=where-to-go-from-here.html#things-to-read>Things to Read</a>
-<li><a href=where-to-go-from-here.html#code>Where To Look For Python 3-Compatible Code</a>
-</ol>
-</ol>
-<!-- /toc -->
-<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
-<!--[if IE]><script src=j/html5.js></script><![endif]-->
+<!DOCTYPE html>
+<meta charset=utf-8>
+<title>Table of contents - Dive Into Python 3</title>
+<link rel=stylesheet href=dip3.css>
+<style>
+h1:before{content:''}
+ol,ul{font-weight:bold}
+li ol{font-weight:normal}
+#porting-code-to-python-3-with-2to3,#special-method-names,#where-to-go-from-here{list-style:none;margin:0 0 0 -2em}
+#porting-code-to-python-3-with-2to3 > ol,#special-method-names > ol,#where-to-go-from-here > ol{margin:0;padding:0 0 0 4.5em}
+#porting-code-to-python-3-with-2to3:before{content:'A. \00a0 \00a0'}
+#special-method-names:before{content:'B. \00a0 \00a0'}
+#where-to-go-from-here:before{content:'C. \00a0 \00a0'}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8><input type=search name=q size=25 placeholder="powered by Google&trade;">&nbsp;<input type=submit name=sa value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> Dive Into Python 3 <span class=u>&#8227;</span>
+<h1>Table of Contents</h1>
+<!-- toc -->
+<ol start=-1>
+<li id=whats-new><a href=whats-new.html>What&#8217;s New in &#8220;Dive Into Python 3&#8221;</a>
+<ol>
+<li><a href=whats-new.html#divingin><i>a.k.a.</i> &#8220;the minus level&#8221;</a>
+</ol>
+<li id=installing-python><a href=installing-python.html>Installing Python</a>
+<ol>
+<li><a href=installing-python.html#divingin>Diving In</a>
+<li><a href=installing-python.html#which>Which Python Is Right For You?</a>
+<li><a href=installing-python.html#windows>Installing on Microsoft Windows</a>
+<li><a href=installing-python.html#macosx>Installing on Mac OS X</a>
+<li><a href=installing-python.html#ubuntu>Installing on Ubuntu Linux</a>
+<li><a href=installing-python.html#other>Installing on Other Platforms</a>
+<li><a href=installing-python.html#idle>Using The Python Shell</a>
+<li><a href=installing-python.html#editors>Python Editors and IDEs</a>
+</ol>
+<li id=your-first-python-program><a href=your-first-python-program.html>Your First Python Program</a>
+<ol>
+<li><a href=your-first-python-program.html#divingin>Diving In</a>
+<li><a href=your-first-python-program.html#declaringfunctions>Declaring Functions</a>
+<ol>
+<li><a href=your-first-python-program.html#optional-arguments>Optional and Named Arguments</a>
+</ol>
+<li><a href=your-first-python-program.html#readability>Writing Readable Code</a>
+<ol>
+<li><a href=your-first-python-program.html#docstrings>Documentation Strings</a>
+</ol>
+<li><a href=your-first-python-program.html#importsearchpath>The <code>import</code> Search Path</a>
+<li><a href=your-first-python-program.html#everythingisanobject>Everything Is An Object</a>
+<ol>
+<li><a href=your-first-python-program.html#whatsanobject>What&#8217;s An Object?</a>
+</ol>
+<li><a href=your-first-python-program.html#indentingcode>Indenting Code</a>
+<li><a href=your-first-python-program.html#exceptions>Exceptions</a>
+<ol>
+<li><a href=your-first-python-program.html#importerror>Catching Import Errors</a>
+</ol>
+<li><a href=your-first-python-program.html#nameerror>Unbound Variables</a>
+<li><a href=your-first-python-program.html#case>Everything is Case-Sensitive</a>
+<li><a href=your-first-python-program.html#runningscripts>Running Scripts</a>
+<li><a href=your-first-python-program.html#furtherreading>Further Reading</a>
+</ol>
+<li id=native-datatypes><a href=native-datatypes.html>Native Datatypes</a>
+<ol>
+<li><a href=native-datatypes.html#divingin>Diving In</a>
+<li><a href=native-datatypes.html#booleans>Booleans</a>
+<li><a href=native-datatypes.html#numbers>Numbers</a>
+<ol>
+<li><a href=native-datatypes.html#number-coercion>Coercing Integers To Floats And Vice-Versa</a>
+<li><a href=native-datatypes.html#common-numerical-operations>Common Numerical Operations</a>
+<li><a href=native-datatypes.html#fractions>Fractions</a>
+<li><a href=native-datatypes.html#trig>Trigonometry</a>
+<li><a href=native-datatypes.html#numbers-in-a-boolean-context>Numbers In A Boolean Context</a>
+</ol>
+<li><a href=native-datatypes.html#lists>Lists</a>
+<ol>
+<li><a href=native-datatypes.html#creatinglists>Creating A List</a>
+<li><a href=native-datatypes.html#slicinglists>Slicing A List</a>
+<li><a href=native-datatypes.html#extendinglists>Adding Items To A List</a>
+<li><a href=native-datatypes.html#searchinglists>Searching For Values In A List</a>
+<li><a href=native-datatypes.html#removingfromlists>Removing Items From A List</a>
+<li><a href=native-datatypes.html#popgoestheweasel>Removing Items From A List: Bonus Round</a>
+<li><a href=native-datatypes.html#lists-in-a-boolean-context>Lists In A Boolean Context</a>
+</ol>
+<li><a href=native-datatypes.html#tuples>Tuples</a>
+<ol>
+<li><a href=native-datatypes.html#tuples-in-a-boolean-context>Tuples In A Boolean Context</a>
+<li><a href=native-datatypes.html#multivar>Assigning Multiple Values At Once</a>
+</ol>
+<li><a href=native-datatypes.html#sets>Sets</a>
+<ol>
+<li><a href=native-datatypes.html#creating-a-set>Creating A Set</a>
+<li><a href=native-datatypes.html#modifying-sets>Modifying A Set</a>
+<li><a href=native-datatypes.html#removing-from-sets>Removing Items From A Set</a>
+<li><a href=native-datatypes.html#common-set-operations>Common Set Operations</a>
+<li><a href=native-datatypes.html#sets-in-a-boolean-context>Sets In A Boolean Context</a>
+</ol>
+<li><a href=native-datatypes.html#dictionaries>Dictionaries</a>
+<ol>
+<li><a href=native-datatypes.html#creating-dictionaries>Creating A Dictionary</a>
+<li><a href=native-datatypes.html#modifying-dictionaries>Modifying A Dictionary</a>
+<li><a href=native-datatypes.html#mixed-value-dictionaries>Mixed-Value Dictionaries</a>
+<li><a href=native-datatypes.html#dictionaries-in-a-boolean-context>Dictionaries In A Boolean Context</a>
+</ol>
+<li><a href=native-datatypes.html#none><code>None</code></a>
+<ol>
+<li><a href=native-datatypes.html#none-in-a-boolean-context><code>None</code> In A Boolean Context</a>
+</ol>
+<li><a href=native-datatypes.html#furtherreading>Further Reading</a>
+</ol>
+<li id=comprehensions><a href=comprehensions.html>Comprehensions</a>
+<ol>
+<li><a href=comprehensions.html#divingin>Diving In</a>
+<li><a href=comprehensions.html#os>Working With Files And Directories</a>
+<ol>
+<li><a href=comprehensions.html#getcwd>The Current Working Directory</a>
+<li><a href=comprehensions.html#ospath>Working With Filenames and Directory Names</a>
+<li><a href=comprehensions.html#glob>Listing Directories</a>
+<li><a href=comprehensions.html#osstat>Getting File Metadata</a>
+<li><a href=comprehensions.html#abspath>Constructing Absolute Pathnames</a>
+</ol>
+<li><a href=comprehensions.html#listcomprehension>List Comprehensions</a>
+<li><a href=comprehensions.html#dictionarycomprehension>Dictionary Comprehensions</a>
+<ol>
+<li><a href=comprehensions.html#stupiddicttricks>Other Fun Stuff To Do With Dictionary Comprehensions</a>
+</ol>
+<li><a href=comprehensions.html#setcomprehension>Set Comprehensions</a>
+<li><a href=comprehensions.html#furtherreading>Further Reading</a>
+</ol>
+<li id=strings><a href=strings.html>Strings</a>
+<ol>
+<li><a href=strings.html#boring-stuff>Some Boring Stuff You Need To Understand Before You Can Dive In</a>
+<li><a href=strings.html#one-ring-to-rule-them-all>Unicode</a>
+<li><a href=strings.html#divingin>Diving In</a>
+<li><a href=strings.html#formatting-strings>Formatting Strings</a>
+<ol>
+<li><a href=strings.html#compound-field-names>Compound Field Names</a>
+<li><a href=strings.html#format-specifiers>Format Specifiers</a>
+</ol>
+<li><a href=strings.html#common-string-methods>Other Common String Methods</a>
+<ol>
+<li><a href=strings.html#slicingstrings>Slicing A String</a>
+</ol>
+<li><a href=strings.html#byte-arrays>Strings vs. Bytes</a>
+<li><a href=strings.html#py-encoding>Postscript: Character Encoding Of Python Source Code</a>
+<li><a href=strings.html#furtherreading>Further Reading</a>
+</ol>
+<li id=regular-expressions><a href=regular-expressions.html>Regular Expressions</a>
+<ol>
+<li><a href=regular-expressions.html#divingin>Diving In</a>
+<li><a href=regular-expressions.html#streetaddresses>Case Study: Street Addresses</a>
+<li><a href=regular-expressions.html#romannumerals>Case Study: Roman Numerals</a>
+<ol>
+<li><a href=regular-expressions.html#thousands>Checking For Thousands</a>
+<li><a href=regular-expressions.html#hundreds>Checking For Hundreds</a>
+</ol>
+<li><a href=regular-expressions.html#nmsyntax>Using The <code>{n,m}</code> Syntax</a>
+<ol>
+<li><a href=regular-expressions.html#tensandones>Checking For Tens And Ones</a>
+</ol>
+<li><a href=regular-expressions.html#verbosere>Verbose Regular Expressions</a>
+<li><a href=regular-expressions.html#phonenumbers>Case Study: Parsing Phone Numbers</a>
+<li><a href=regular-expressions.html#summary>Summary</a>
+</ol>
+<li id=generators><a href=generators.html>Closures <i class=baa>&amp;</i> Generators</a>
+<ol>
+<li><a href=generators.html#divingin>Diving In</a>
+<li><a href=generators.html#i-know>I Know, Let&#8217;s Use Regular Expressions!</a>
+<li><a href=generators.html#a-list-of-functions>A List Of Functions</a>
+<li><a href=generators.html#a-list-of-patterns>A List Of Patterns</a>
+<li><a href=generators.html#a-file-of-patterns>A File Of Patterns</a>
+<li><a href=generators.html#generators>Generators</a>
+<ol>
+<li><a href=generators.html#a-fibonacci-generator>A Fibonacci Generator</a>
+<li><a href=generators.html#a-plural-rule-generator>A Plural Rule Generator</a>
+</ol>
+<li><a href=generators.html#furtherreading>Further Reading</a>
+</ol>
+<li id=iterators><a href=iterators.html>Classes <i class=baa>&amp;</i> Iterators</a>
+<ol>
+<li><a href=iterators.html#divingin>Diving In</a>
+<li><a href=iterators.html#defining-classes>Defining Classes</a>
+<ol>
+<li><a href=iterators.html#init-method>The <code>__init__()</code> Method</a>
+</ol>
+<li><a href=iterators.html#instantiating-classes>Instantiating Classes</a>
+<li><a href=iterators.html#instance-variables>Instance Variables</a>
+<li><a href=iterators.html#a-fibonacci-iterator>A Fibonacci Iterator</a>
+<li><a href=iterators.html#a-plural-rule-iterator>A Plural Rule Iterator</a>
+<li><a href=iterators.html#furtherreading>Further Reading</a>
+</ol>
+<li id=advanced-iterators><a href=advanced-iterators.html>Advanced Iterators</a>
+<ol>
+<li><a href=advanced-iterators.html#divingin>Diving In</a>
+<li><a href=advanced-iterators.html#re-findall>Finding all occurrences of a pattern</a>
+<li><a href=advanced-iterators.html#unique-items>Finding the unique items in a sequence</a>
+<li><a href=advanced-iterators.html#assert>Making assertions</a>
+<li><a href=advanced-iterators.html#generator-expressions>Generator expressions</a>
+<li><a href=advanced-iterators.html#permutations>Calculating Permutations&hellip; The Lazy Way!</a>
+<li><a href=advanced-iterators.html#more-itertools>Other Fun Stuff in the <code>itertools</code> Module</a>
+<li><a href=advanced-iterators.html#string-translate>A New Kind Of String Manipulation</a>
+<li><a href=advanced-iterators.html#eval>Evaluating Arbitrary Strings As Python Expressions</a>
+<li><a href=advanced-iterators.html#alphametics-finale>Putting It All Together</a>
+<li><a href=advanced-iterators.html#furtherreading>Further Reading</a>
+</ol>
+<li id=unit-testing><a href=unit-testing.html>Unit Testing</a>
+<ol>
+<li><a href=unit-testing.html#divingin>(Not) Diving In</a>
+<li><a href=unit-testing.html#romantest1>A Single Question</a>
+<li><a href=unit-testing.html#romantest2>&#8220;Halt And Catch Fire&#8221;</a>
+<li><a href=unit-testing.html#romantest3>More Halting, More Fire</a>
+<li><a href=unit-testing.html#romantest4>And One More Thing&hellip;</a>
+<li><a href=unit-testing.html#romantest5>A Pleasing Symmetry</a>
+<li><a href=unit-testing.html#romantest6>More Bad Input</a>
+</ol>
+<li id=refactoring><a href=refactoring.html>Refactoring</a>
+<ol>
+<li><a href=refactoring.html#divingin>Diving In</a>
+<li><a href=refactoring.html#changing-requirements>Handling Changing Requirements</a>
+<li><a href=refactoring.html#refactoring>Refactoring</a>
+<li><a href=refactoring.html#summary>Summary</a>
+</ol>
+<li id=files><a href=files.html>Files</a>
+<ol>
+<li><a href=files.html#divingin>Diving In</a>
+<li><a href=files.html#reading>Reading From Text Files</a>
+<ol>
+<li><a href=files.html#encoding>Character Encoding Rears Its Ugly Head</a>
+<li><a href=files.html#file-objects>Stream Objects</a>
+<li><a href=files.html#read>Reading Data From A Text File</a>
+<li><a href=files.html#close>Closing Files</a>
+<li><a href=files.html#with>Closing Files Automatically</a>
+<li><a href=files.html#for>Reading Data One Line At A Time</a>
+</ol>
+<li><a href=files.html#writing>Writing to Text Files</a>
+<ol>
+<li><a href=files.html#encoding-again>Character Encoding Again</a>
+</ol>
+<li><a href=files.html#binary>Binary Files</a>
+<li><a href=files.html#file-like-objects>Stream Objects From Non-File Sources</a>
+<ol>
+<li><a href=files.html#gzip>Handling Compressed Files</a>
+</ol>
+<li><a href=files.html#stdio>Standard Input, Output, and Error</a>
+<ol>
+<li><a href=files.html#redirect>Redirecting Standard Output</a>
+</ol>
+<li><a href=files.html#furtherreading>Further Reading</a>
+</ol>
+<li id=xml><a href=xml.html>XML</a>
+<ol>
+<li><a href=xml.html#divingin>Diving In</a>
+<li><a href=xml.html#xml-intro>A 5-Minute Crash Course in XML</a>
+<li><a href=xml.html#xml-structure>The Structure Of An Atom Feed</a>
+<li><a href=xml.html#xml-parse>Parsing XML</a>
+<ol>
+<li><a href=xml.html#xml-elements>Elements Are Lists</a>
+<li><a href=xml.html#xml-attributes>Attributes Are Dictionaries</a>
+</ol>
+<li><a href=xml.html#xml-find>Searching For Nodes Within An XML Document</a>
+<li><a href=xml.html#xml-lxml>Going Further With lxml</a>
+<li><a href=xml.html#xml-generate>Generating XML</a>
+<li><a href=xml.html#xml-custom-parser>Parsing Broken XML</a>
+<li><a href=xml.html#furtherreading>Further Reading</a>
+</ol>
+<li id=serializing><a href=serializing.html>Serializing Python Objects</a>
+<ol>
+<li><a href=serializing.html#divingin>Diving In</a>
+<ol>
+<li><a href=serializing.html#administrivia>A Quick Note About The Examples in This Chapter</a>
+</ol>
+<li><a href=serializing.html#dump>Saving Data to a Pickle File</a>
+<li><a href=serializing.html#load>Loading Data from a Pickle File</a>
+<li><a href=serializing.html#dumps>Pickling Without a File</a>
+<li><a href=serializing.html#protocol-versions>Bytes and Strings Rear Their Ugly Heads Again</a>
+<li><a href=serializing.html#debugging>Debugging Pickle Files</a>
+<li><a href=serializing.html#json>Serializing Python Objects to be Read by Other Languages</a>
+<li><a href=serializing.html#json-dump>Saving Data to a <abbr>JSON</abbr> File</a>
+<li><a href=serializing.html#json-types>Mapping of Python Datatypes to <abbr>JSON</abbr></a>
+<li><a href=serializing.html#json-unknown-types>Serializing Datatypes Unsupported by <abbr>JSON</abbr></a>
+<li><a href=serializing.html#json-load>Loading Data from a <abbr>JSON</abbr> File</a>
+<li><a href=serializing.html#furtherreading>Further Reading</a>
+</ol>
+<li id=http-web-services><a href=http-web-services.html>HTTP Web Services</a>
+<ol>
+<li><a href=http-web-services.html#divingin>Diving In</a>
+<li><a href=http-web-services.html#http-features>Features of HTTP</a>
+<ol>
+<li><a href=http-web-services.html#caching>Caching</a>
+<li><a href=http-web-services.html#last-modified>Last-Modified Checking</a>
+<li><a href=http-web-services.html#etags>ETag Checking</a>
+<li><a href=http-web-services.html#compression>Compression</a>
+<li><a href=http-web-services.html#redirects>Redirects</a>
+</ol>
+<li><a href=http-web-services.html#dont-try-this-at-home>How Not To Fetch Data Over HTTP</a>
+<li><a href=http-web-services.html#whats-on-the-wire>What&#8217;s On The Wire?</a>
+<li><a href=http-web-services.html#introducing-httplib2>Introducing <code>httplib2</code></a>
+<ol>
+<li><a href=http-web-services.html#why-bytes>A Short Digression To Explain Why <code>httplib2</code> Returns Bytes Instead of Strings</a>
+<li><a href=http-web-services.html#httplib2-caching>How <code>httplib2</code> Handles Caching</a>
+<li><a href=http-web-services.html#httplib2-etags>How <code>httplib2</code> Handles <code>Last-Modified</code> and <code>ETag</code> Headers</a>
+<li><a href=http-web-services.html#httplib2-compression>How <code>httplib2</code> Handles Compression</a>
+<li><a href=http-web-services.html#httplib2-redirects>How <code>httplib2</code> Handles Redirects</a>
+</ol>
+<li><a href=http-web-services.html#beyond-get>Beyond HTTP GET</a>
+<li><a href=http-web-services.html#beyond-post>Beyond HTTP POST</a>
+<li><a href=http-web-services.html#furtherreading>Further Reading</a>
+</ol>
+<li id=case-study-porting-chardet-to-python-3><a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>
+<ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#divingin>Diving In</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#faq.what>What is Character Encoding Auto-Detection?</a>
+<ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#faq.impossible>Isn&#8217;t That Impossible?</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#faq.who>Does Such An Algorithm Exist?</a>
+</ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#divingin2>Introducing The <code>chardet</code> Module</a>
+<ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#how.bom><abbr>UTF-n</abbr> With A <abbr>BOM</abbr></a>
+<li><a href=case-study-porting-chardet-to-python-3.html#how.esc>Escaped Encodings</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#how.mb>Multi-Byte Encodings</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#how.sb>Single-Byte Encodings</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#how.windows1252><code>windows-1252</code></a>
+</ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#running2to3>Running <code>2to3</code></a>
+<li><a href=case-study-porting-chardet-to-python-3.html#multifile-modules>A Short Digression Into Multi-File Modules</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#manual>Fixing What <code>2to3</code> Can&#8217;t</a>
+<ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#falseisinvalidsyntax><code>False</code> is invalid syntax</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#nomodulenamedconstants>No module named <code>constants</code></a>
+<li><a href=case-study-porting-chardet-to-python-3.html#namefileisnotdefined>Name <var>'file'</var> is not defined</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#cantuseastringpattern>Can&#8217;t use a string pattern on a bytes-like object</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#cantconvertbytesobject>Can&#8217;t convert <code>'bytes'</code> object to <code>str</code> implicitly</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#unsupportedoperandtypeforplus>Unsupported operand type(s) for +: <code>'int'</code> and <code>'bytes'</code></a>
+<li><a href=case-study-porting-chardet-to-python-3.html#ordexpectedstring><code>ord()</code> expected string of length 1, but <code>int</code> found</a>
+<li><a href=case-study-porting-chardet-to-python-3.html#unorderabletypes>Unorderable types: <code>int()</code> >= <code>str()</code></a>
+<li><a href=case-study-porting-chardet-to-python-3.html#reduceisnotdefined>Global name <code>'reduce'</code> is not defined</a>
+</ol>
+<li><a href=case-study-porting-chardet-to-python-3.html#summary>Summary</a>
+</ol>
+<li id=packaging><a href=packaging.html>Packaging Python Libraries</a>
+<ol>
+<li><a href=packaging.html#divingin>Diving In</a>
+<li><a href=packaging.html#cantdo>Things Distutils Can&#8217;t Do For You</a>
+<li><a href=packaging.html#structure>Directory Structure</a>
+<li><a href=packaging.html#setuppy>Writing Your Setup Script</a>
+<li><a href=packaging.html#trove>Classifying Your Package</a>
+<ol>
+<li><a href=packaging.html#trove-examples>Examples of Good Package Classifiers</a>
+</ol>
+<li><a href=packaging.html#manifest>Specifying Additional Files With A Manifest</a>
+<li><a href=packaging.html#check>Checking Your Setup Script for Errors</a>
+<li><a href=packaging.html#sdist>Creating a Source Distribution</a>
+<li><a href=packaging.html#bdist>Creating a Graphical Installer</a>
+<ol>
+<li><a href=packaging.html#linux>Building Installable Packages for Other Operating Systems</a>
+</ol>
+<li><a href=packaging.html#pypi>Adding Your Software to The Python Package Index</a>
+<li><a href=packaging.html#future>The Many Possible Futures of Python Packaging</a>
+<li><a href=packaging.html#furtherreading>Further Reading</a>
+</ol>
+<li id=porting-code-to-python-3-with-2to3><a href=porting-code-to-python-3-with-2to3.html>Porting Code to Python 3 with <code>2to3</code></a>
+<ol>
+<li><a href=porting-code-to-python-3-with-2to3.html#divingin>Diving In</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#print><code>print</code> statement</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#unicodeliteral>Unicode string literals</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#unicode><code>unicode()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#long><code>long</code> data type</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#ne>&lt;> comparison</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#has_key><code>has_key()</code> dictionary method</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#dict>Dictionary methods that return lists</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#imports>Modules that have been renamed or reorganized</a>
+<ol>
+<li><a href=porting-code-to-python-3-with-2to3.html#http><code>http</code></a>
+<li><a href=porting-code-to-python-3-with-2to3.html#urllib><code>urllib</code></a>
+<li><a href=porting-code-to-python-3-with-2to3.html#dbm><code>dbm</code></a>
+<li><a href=porting-code-to-python-3-with-2to3.html#xmlrpc><code>xmlrpc</code></a>
+<li><a href=porting-code-to-python-3-with-2to3.html#othermodules>Other modules</a>
+</ol>
+<li><a href=porting-code-to-python-3-with-2to3.html#import>Relative imports within a package</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#next><code>next()</code> iterator method</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#filter><code>filter()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#map><code>map()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#reduce><code>reduce()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#apply><code>apply()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#intern><code>intern()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#exec><code>exec</code> statement</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#execfile><code>execfile</code> statement</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#repr><code>repr</code> literals (backticks)</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#except><code>try...except</code> statement</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#raise><code>raise</code> statement</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#throw><code>throw</code> method on generators</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#xrange><code>xrange()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#raw_input><code>raw_input()</code> and <code>input()</code> global functions</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#funcattrs><code>func_*</code> function attributes</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#xreadlines><code>xreadlines()</code> I/O method</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#tuple_params><code>lambda</code> functions that take a tuple instead of multiple parameters</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#methodattrs>Special method attributes</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#nonzero><code>__nonzero__</code> special method</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#numliterals>Octal literals</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#renames><code>sys.maxint</code></a>
+<li><a href=porting-code-to-python-3-with-2to3.html#callable><code>callable()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#zip><code>zip()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#standarderror><code>StandardError</code> exception</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#types><code>types</code> module constants</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#isinstance><code>isinstance()</code> global function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#basestring><code>basestring</code> datatype</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#itertools><code>itertools</code> module</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#sys_exc><code>sys.exc_type</code>, <code>sys.exc_value</code>, <code>sys.exc_traceback</code></a>
+<li><a href=porting-code-to-python-3-with-2to3.html#paren>List comprehensions over tuples</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#getcwdu><code>os.getcwdu()</code> function</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#metaclass>Metaclasses</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#nitpick>Matters of style</a>
+<ol>
+<li><a href=porting-code-to-python-3-with-2to3.html#set_literal><code>set()</code> literals (explicit)</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#buffer><code>buffer()</code> global function (explicit)</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#wscomma>Whitespace around commas (explicit)</a>
+<li><a href=porting-code-to-python-3-with-2to3.html#idioms>Common idioms (explicit)</a>
+</ol>
+</ol>
+<li id=special-method-names><a href=special-method-names.html>Special Method Names</a>
+<ol>
+<li><a href=special-method-names.html#divingin>Diving In</a>
+<li><a href=special-method-names.html#basics>Basics</a>
+<li><a href=special-method-names.html#acts-like-iterator>Classes That Act Like Iterators</a>
+<li><a href=special-method-names.html#computed-attributes>Computed Attributes</a>
+<li><a href=special-method-names.html#acts-like-function>Classes That Act Like Functions</a>
+<li><a href=special-method-names.html#acts-like-list>Classes That Act Like Sequences</a>
+<li><a href=special-method-names.html#acts-like-dict>Classes That Act Like Dictionaries</a>
+<li><a href=special-method-names.html#acts-like-number>Classes That Act Like Numbers</a>
+<li><a href=special-method-names.html#rich-comparisons>Classes That Can Be Compared</a>
+<li><a href=special-method-names.html#pickle>Classes That Can Be Serialized</a>
+<li><a href=special-method-names.html#context-managers>Classes That Can Be Used in a <code>with</code> Block</a>
+<li><a href=special-method-names.html#esoterica>Really Esoteric Stuff</a>
+<li><a href=special-method-names.html#furtherreading>Further Reading</a>
+</ol>
+<li id=where-to-go-from-here><a href=where-to-go-from-here.html>Where to Go From Here</a>
+<ol>
+<li><a href=where-to-go-from-here.html#things-to-read>Things to Read</a>
+<li><a href=where-to-go-from-here.html#code>Where To Look For Python 3-Compatible Code</a>
+</ol>
+</ol>
+<!-- /toc -->
+<p class=c>&copy; 2001&ndash;10 <a href=about.html>Mark Pilgrim</a>
+<!--[if IE]><script src=j/html5.js></script><![endif]-->
diff --git a/util/lesscss.py b/util/lesscss.py
index 9342d22..c39249c 100755
--- a/util/lesscss.py
+++ b/util/lesscss.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.6
+#!/usr/bin/python2.5
 
 from pyquery import PyQuery as pq
 import glob
@@ -12,10 +12,7 @@ SELECTOR_EXCEPTIONS = ('.w', '.b', '.str', '.kwd', '.com', '.typ', '.lit', '.pun
 filename = sys.argv[1]
 cssfilename = sys.argv[2]
 pqd = pq(filename=filename)
-
-with open(filename, 'rb') as fopen:
-    raw_data = fopen.read()
-
+raw_data = open(filename, 'rb').read()
 if raw_data.count('</a><script src=j/'): # HACK HACK HACK
     def keep(s):
         for selector in SELECTOR_EXCEPTIONS:
@@ -26,9 +23,7 @@ else:
     def keep(s):
         return False
 
-with open(cssfilename, 'rb') as fopen:
-    original_css = fopen.read()
-
+original_css = open(cssfilename, 'rb').read()
 new_css = ''
 for rule in original_css.split('}')[:-1]:
     selectors, properties = rule.split('{', 1)