diff --git a/dip2 b/dip2
index 5009412..c73b8c0 100644
--- a/dip2
+++ b/dip2
@@ -1,210 +1,3 @@
-<!DOCTYPE html>
-<html lang="en">
-<head>
-<title>Dive Into Python</title>
-<link rel="stylesheet" href="diveintopython3.css" type="text/css">
-</head>
-<h1>Dive Into Python</h1>
-<p class=pubdate>20 May 2004
-<p class=copyright>Copyright &copy; 2000, 2001, 2002, 2003, 2004 <a href="mailto:mark@diveintopython3.org">Mark Pilgrim</a>
-<p>This book lives at <a href="http://diveintopython3.org/">http://diveintopython3.org/</a>. If you're reading it somewhere else, you may not have the latest version.
-<div class=toc>
-<p><b>Table of Contents</b>
-<ul>
-<li><a href="#install">1. Installing Python</a><ul>
-<li><a href="#install.choosing">1.1. Which Python is right for you?</a>
-<li><a href="#install.windows">1.2. Python on Windows</a>
-<li><a href="#install.macosx">1.3. Python on Mac OS X</a>
-<li><a href="#install.macos9">1.4. Python on Mac OS 9</a>
-<li><a href="#install.redhat">1.5. Python on RedHat Linux</a>
-<li><a href="#install.debian">1.6. Python on Debian GNU/Linux</a>
-<li><a href="#install.source">1.7. Python Installation from Source</a>
-<li><a href="#install.shell">1.8. The Interactive Shell</a>
-<li><a href="#install.summary">1.9. Summary</a>
-</ul>
-
-<li><a href="#odbchelper.tuple">3.3. Introducing Tuples</a>
-<li><a href="#odbchelper.vardef">3.4. Declaring variables</a><ul>
-<li><a href="#d0e6873">3.4.1. Referencing Variables</a>
-<li><a href="#odbchelper.multiassign">3.4.2. Assigning Multiple Values at Once</a>
-</ul>
-
-<li><a href="#apihelper">4. The Power Of Introspection</a><ul>
-<li><a href="#apihelper.divein">4.1. Diving In</a>
-<li><a href="#apihelper.optional">4.2. Using Optional and Named Arguments</a>
-<li><a href="#apihelper.builtin">4.3. Using type, str, dir, and Other Built-In Functions</a><ul>
-<li><a href="#d0e8510">4.3.1. The type Function</a>
-<li><a href="#d0e8609">4.3.2. The str Function</a>
-<li><a href="#d0e8958">4.3.3. Built-In Functions</a>
-</ul>
-
-<li><a href="#apihelper.getattr">4.4. Getting Object References With getattr</a><ul>
-<li><a href="#d0e9194">4.4.1. getattr with Modules</a>
-<li><a href="#d0e9362">4.4.2. getattr As a Dispatcher</a>
-</ul>
-
-<li><a href="#apihelper.filter">4.5. Filtering Lists</a>
-<li><a href="#apihelper.andor">4.6. The Peculiar Nature of and and or</a><ul>
-<li><a href="#d0e9975">4.6.1. Using the and-or Trick</a>
-</ul>
-
-<li><a href="#apihelper.alltogether">4.8. Putting It All Together</a>
-<li><a href="#apihelper.summary">4.9. Summary</a>
-</ul>
-
-<li><a href="#fileinfo">5. Objects and Object-Orientation</a><ul>
-<li><a href="#fileinfo.divein">5.1. Diving In</a>
-<li><a href="#fileinfo.fromimport">5.2. Importing Modules Using from module import</a>
-<li><a href="#fileinfo.class">5.3. Defining Classes</a><ul>
-<li><a href="#d0e11720">5.3.1. Initializing and Coding Classes</a>
-<li><a href="#d0e11896">5.3.2. Knowing When to Use self and __init__</a>
-</ul>
-
-<li><a href="#fileinfo.create">5.4. Instantiating Classes</a><ul>
-<li><a href="#d0e12165">5.4.1. Garbage Collection</a>
-</ul>
-
-<li><a href="#fileinfo.userdict">5.5. Exploring UserDict: A Wrapper Class</a>
-<li><a href="#fileinfo.specialmethods">5.6. Special Class Methods</a><ul>
-<li><a href="#d0e12822">5.6.1. Getting and Setting Items</a>
-</ul>
-
-<li><a href="#fileinfo.morespecial">5.7. Advanced Special Class Methods</a>
-<li><a href="#fileinfo.classattributes">5.8. Introducing Class Attributes</a>
-<li><a href="#fileinfo.private">5.9. Private Functions</a>
-<li><a href="#fileinfo.summary">5.10. Summary</a>
-</ul>
-
-<li><a href="#filehandling">6. Exceptions and File Handling</a><ul>
-<li><a href="#fileinfo.exception">6.1. Handling Exceptions</a><ul>
-<li><a href="#d0e14344">6.1.1. Using Exceptions For Other Purposes</a>
-</ul>
-
-<li><a href="#fileinfo.files">6.2. Working with File Objects</a><ul>
-<li><a href="#d0e14670">6.2.1. Reading Files</a>
-<li><a href="#d0e14800">6.2.2. Closing Files</a>
-<li><a href="#d0e14928">6.2.3. Handling I/O Errors</a>
-<li><a href="#d0e15055">6.2.4. Writing to Files</a>
-</ul>
-
-<li><a href="#fileinfo.for">6.3. Iterating with for Loops</a>
-<li><a href="#fileinfo.modules">6.4. Using sys.modules</a>
-<li><a href="#fileinfo.os">6.5. Working with Directories</a>
-<li><a href="#fileinfo.alltogether">6.6. Putting It All Together</a>
-<li><a href="#fileinfo.summary2">6.7. Summary</a>
-</ul>
-
-<li><a href="#dialect">8. HTML Processing</a><ul>
-<li><a href="#dialect.divein">8.1. Diving in</a>
-<li><a href="#dialect.sgmllib">8.2. Introducing sgmllib.py</a>
-<li><a href="#dialect.extract">8.3. Extracting data from HTML documents</a>
-<li><a href="#dialect.basehtml">8.4. Introducing BaseHTMLProcessor.py</a>
-<li><a href="#dialect.locals">8.5. locals and globals</a>
-<li><a href="#dialect.dictsub">8.6. Dictionary-based string formatting</a>
-<li><a href="#dialect.quoting">8.7. Quoting attribute values</a>
-<li><a href="#dialect.dialectizer">8.8. Introducing dialect.py</a>
-<li><a href="#dialect.alltogether">8.9. Putting it all together</a>
-<li><a href="#dialect.summary">8.10. Summary</a>
-</ul>
-
-<li><a href="#kgp">9. XML Processing</a><ul>
-<li><a href="#kgp.divein">9.1. Diving in</a>
-<li><a href="#kgp.packages">9.2. Packages</a>
-<li><a href="#kgp.parse">9.3. Parsing XML</a>
-<li><a href="#kgp.search">9.5. Searching for elements</a>
-<li><a href="#kgp.attributes">9.6. Accessing element attributes</a>
-<li><a href="#kgp.segue">9.7. Segue</a>
-</ul>
-
-<li><a href="#streams">10. Scripts and Streams</a><ul>
-<li><a href="#kgp.openanything">10.1. Abstracting input sources</a>
-<li><a href="#kgp.stdio">10.2. Standard input, output, and error</a>
-<li><a href="#kgp.cache">10.3. Caching node lookups</a>
-<li><a href="#kgp.child">10.4. Finding direct children of a node</a>
-<li><a href="#kgp.handler">10.5. Creating separate handlers by node type</a>
-<li><a href="#kgp.commandline">10.6. Handling command-line arguments</a>
-<li><a href="#kgp.alltogether">10.7. Putting it all together</a>
-<li><a href="#kgp.summary">10.8. Summary</a>
-</ul>
-
-<li><a href="#oa">11. HTTP Web Services</a><ul>
-<li><a href="#oa.divein">11.1. Diving in</a>
-<li><a href="#oa.review">11.2. How not to fetch data over HTTP</a>
-<li><a href="#oa.features">11.3. Features of HTTP</a><ul>
-<li><a href="#d0e27596">11.3.1. User-Agent</a>
-<li><a href="#d0e27616">11.3.2. Redirects</a>
-<li><a href="#d0e27689">11.3.3. Last-Modified/If-Modified-Since</a>
-<li><a href="#d0e27724">11.3.4. ETag/If-None-Match</a>
-<li><a href="#d0e27752">11.3.5. Compression</a>
-</ul>
-
-<li><a href="#oa.debug">11.4. Debugging HTTP web services</a>
-<li><a href="#oa.useragent">11.5. Setting the User-Agent</a>
-<li><a href="#oa.etags">11.6. Handling Last-Modified and ETag</a>
-<li><a href="#oa.redirect">11.7. Handling redirects</a>
-<li><a href="#oa.gzip">11.8. Handling compressed data</a>
-<li><a href="#oa.alltogether">11.9. Putting it all together</a>
-<li><a href="#oa.summary">11.10. Summary</a>
-</ul>
-
-<li><a href="#roman">13. Unit Testing</a><ul>
-<li><a href="#roman.intro">13.1. Introduction to Roman numerals</a>
-<li><a href="#roman.divein">13.2. Diving in</a>
-<li><a href="#roman.romantest">13.3. Introducing romantest.py</a>
-<li><a href="#roman.success">13.4. Testing for success</a>
-<li><a href="#roman.failure">13.5. Testing for failure</a>
-<li><a href="#roman.sanity">13.6. Testing for sanity</a>
-</ul>
-
-<li><a href="#roman1.5">14. Test-First Programming</a><ul>
-<li><a href="#roman.stage1">14.1. roman.py, stage 1</a>
-<li><a href="#roman.stage2">14.2. roman.py, stage 2</a>
-<li><a href="#roman.stage3">14.3. roman.py, stage 3</a>
-<li><a href="#roman.stage4">14.4. roman.py, stage 4</a>
-<li><a href="#roman.stage5">14.5. roman.py, stage 5</a>
-</ul>
-
-<li><a href="#roman2">15. Refactoring</a><ul>
-<li><a href="#roman.bugs">15.1. Handling bugs</a>
-<li><a href="#roman.change">15.2. Handling changing requirements</a>
-<li><a href="#roman.refactoring">15.3. Refactoring</a>
-<li><a href="#roman.postscript">15.4. Postscript</a>
-<li><a href="#roman.summary">15.5. Summary</a>
-</ul>
-
-<li><a href="#regression">16. Functional Programming</a><ul>
-<li><a href="#regression.divein">16.1. Diving in</a>
-<li><a href="#regression.path">16.2. Finding the path</a>
-<li><a href="#regression.filter">16.3. Filtering lists revisited</a>
-<li><a href="#regression.map">16.4. Mapping lists revisited</a>
-<li><a href="#regression.datacentric">16.5. Data-centric programming</a>
-<li><a href="#regression.import">16.6. Dynamically importing modules</a>
-<li><a href="#regression.alltogether">16.7. Putting it all together</a>
-<li><a href="#regression.summary">16.8. Summary</a>
-</ul>
-
-<li><a href="#plural">17. Dynamic functions</a><ul>
-<li><a href="#plural.divein">17.1. Diving in</a>
-<li><a href="#plural.stage1">17.2. plural.py, stage 1</a>
-<li><a href="#plural.stage2">17.3. plural.py, stage 2</a>
-<li><a href="#plural.stage3">17.4. plural.py, stage 3</a>
-<li><a href="#plural.stage4">17.5. plural.py, stage 4</a>
-<li><a href="#plural.stage5">17.6. plural.py, stage 5</a>
-<li><a href="#plural.stage6">17.7. plural.py, stage 6</a>
-<li><a href="#plural.summary">17.8. Summary</a>
-</ul>
-
-<li><a href="#soundex">18. Performance Tuning</a><ul>
-<li><a href="#soundex.divein">18.1. Diving in</a>
-<li><a href="#soundex.timeit">18.2. Using the timeit Module</a>
-<li><a href="#soundex.stage1">18.3. Optimizing Regular Expressions</a>
-<li><a href="#soundex.stage2">18.4. Optimizing Dictionary Lookups</a>
-<li><a href="#soundex.stage3">18.5. Optimizing List Operations</a>
-<li><a href="#soundex.stage4">18.6. Optimizing String Manipulation</a>
-<li><a href="#soundex.summary">18.7. Summary</a>
-</ul></ul>
-
-</ul>
 <div class=chapter>
 <h2 id="install">Chapter 1. Installing Python</h2>
 <p>Welcome to Python. Let's dive in. In this chapter, you'll install the version of Python that's right for you.
@@ -538,24 +331,6 @@ hello world
 
 
 
-<h2 id="odbchelper.docstring">2.3. Documenting Functions</h2>
-<p>You can document a Python function by giving it a <code>docstring</code>.
-<div class=example><h3 id="odbchelper.triplequotes">Example 2.2. Defining the <code>buildConnectionString</code> Function's <code>docstring</code></h3><pre><code>
-def buildConnectionString(params):
-    """Build a connection string from a dictionary of parameters.
-
-    Returns string."""</pre><p>Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including
-   carriage returns and other quote characters. You can use them anywhere, but you'll see them most often used when defining
-   a <code>docstring</code>.
-<table id="compare.quoting.perl" class=note border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">Triple quotes are also an easy way to define a string with both single and double quotes, like <code>qq/.../</code> in Perl.
-<p>Everything between the triple quotes is the function's <code>docstring</code>, which documents what the function does. A <code>docstring</code>, if it exists, must be the first thing defined in a function (that is, the first thing after the colon). You don't technically
-need to give your function a <code>docstring</code>, but you always should. I know you've heard this in every programming class you've ever taken, but Python gives you an added incentive: the <code>docstring</code> is available at runtime as an attribute of the function.
-<table id="tip.docstring" class=note border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">Many Python <abbr>IDE</abbr>s use the <code>docstring</code> to provide context-sensitive documentation, so that when you type a function name, its <code>docstring</code> appears as a tooltip. This can be incredibly helpful, but it's only as good as the <code>docstring</code>s you write.
-
 
 
 
@@ -1930,238 +1705,20 @@ exceptions, errors occur immediately, and you can handle them in a standard way
 <li><a href="http://www.python.org/doc/current/ref/"><i class=citetitle>Python Reference Manual</i></a> discusses the inner workings of the <a href="http://www.python.org/doc/current/ref/try.html"><code>try...except</code> block</a>.
 
 </ul>
-<h2 id="fileinfo.files">6.2. Working with File Objects</h2>
-<p>Python has a built-in function, <code>open</code>, for opening a file on disk. <code>open</code> returns a file object, which has methods and attributes for getting information about and manipulating the opened file.
-<div class=example><h3>Example 6.3. Opening a File</h3><pre class=screen><samp class=p>>>> </samp><kbd>f = open("/music/_singles/kairo.mp3", "rb")</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>f</kbd>       <span>&#x2461;</span>
-&lt;open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
-<samp class=p>>>> </samp><kbd>f.mode</kbd>  <span>&#x2462;</span>
-'rb'
-<samp class=p>>>> </samp><kbd>f.name</kbd>  <span>&#x2463;</span>
-'/music/_singles/kairo.mp3'</pre>
-<ol>
-<li>The <code>open</code> function can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename,
-            is required; the other two are <a href="#apihelper.optional" title="4.2. Using Optional and Named Arguments">optional</a>. If not specified, the file is opened for reading in text mode. Here you are opening the file for reading in binary mode.
-             (<code>print open.__doc__</code> displays a great explanation of all the possible modes.)
-<li>The <code>open</code> function returns an object (by now, <a href="#odbchelper.objects" title="2.4. Everything Is an Object">this should not surprise you</a>). A file object has several useful attributes.
-<li>The <var>mode</var> attribute of a file object tells you in which mode the file was opened.
-<li>The <var>name</var> attribute of a file object tells you the name of the file that the file object has open.
-<h3>6.2.1. Reading Files</h3>
-<p>After you open a file, the first thing you'll want to do is read from it, as shown in the next example.
-<div class=example><h3>Example 6.4. Reading a File</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>f</kbd>
-&lt;open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
-<samp class=p>>>> </samp><kbd>f.tell()</kbd>              <span>&#x2460;</span>
-0
-<samp class=p>>>> </samp><kbd>f.seek(-128, 2)</kbd>       <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>f.tell()</kbd>              <span>&#x2462;</span>
-7542909
-<samp class=p>>>> </samp><kbd>tagData = f.read(128)</kbd> <span>&#x2463;</span>
-<samp class=p>>>> </samp><kbd>tagData</kbd>
-<samp>'TAGKAIRO****THE BEST GOA         ***DJ MARY-JANE***            
-Rave Mix    2000http://mp3.com/DJMARYJANE     \037'</samp>
-<samp class=p>>>> </samp><kbd>f.tell()</kbd>              <span>&#x2464;</span>
-7543037</pre>
-<ol>
-<li>A file object maintains state about the file it has open. The <code>tell</code> method of a file object tells you your current position in the open file. Since you haven't done anything with this file
-               yet, the current position is <code>0</code>, which is the beginning of the file.
-<li>The <code>seek</code> method of a file object moves to another position in the open file. The second parameter specifies what the first one means;
-<code>0</code> means move to an absolute position (counting from the start of the file), <code>1</code> means move to a relative position (counting from the current position), and <code>2</code> means move to a position relative to the end of the file. Since the <abbr>MP3</abbr> tags you're looking for are stored at the end of the file, you use <code>2</code> and tell the file object to move to a position <code>128</code> bytes from the end of the file.
-<li>The <code>tell</code> method confirms that the current file position has moved.
-<li>The <code>read</code> method reads a specified number of bytes from the open file and returns a string with the data that was read. The optional
-               parameter specifies the maximum number of bytes to read. If no parameter is specified, <code>read</code> will read until the end of the file. (You could have simply said <code>read()</code> here, since you know exactly where you are in the file and you are, in fact, reading the last 128 bytes.)  The read data
-               is assigned to the <var>tagData</var> variable, and the current position is updated based on how many bytes were read.
-<li>The <code>tell</code> method confirms that the current position has moved. If you do the math, you'll see that after reading 128 bytes, the position
-               has been incremented by 128.
-<h3>6.2.2. Closing Files</h3>
-<p>Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It's
-   important to close files as soon as you're finished with them.
-<div class=example><h3>Example 6.5. Closing a File</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>f</kbd>
-&lt;open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
-<samp class=p>>>> </samp><kbd>f.closed</kbd>       <span>&#x2460;</span>
-False
-<samp class=p>>>> </samp><kbd>f.close()</kbd>      <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>f</kbd>
-&lt;closed file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
-<samp class=p>>>> </samp><kbd>f.closed</kbd>       <span>&#x2462;</span>
-True
-<samp class=p>>>> </samp><kbd>f.seek(0)</kbd>      <span>&#x2463;</span>
-<samp class=traceback>Traceback (innermost last):
-  File "&lt;interactive input>", line 1, in ?
-ValueError: I/O operation on closed file</samp>
-<samp class=p>>>> </samp><kbd>f.tell()</kbd>
-<samp class=traceback>Traceback (innermost last):
-  File "&lt;interactive input>", line 1, in ?
-ValueError: I/O operation on closed file</samp>
-<samp class=p>>>> </samp><kbd>f.read()</kbd>
-<samp class=traceback>Traceback (innermost last):
-  File "&lt;interactive input>", line 1, in ?
-ValueError: I/O operation on closed file</samp>
-<samp class=p>>>> </samp><kbd>f.close()</kbd>      <span>&#x2464;</span></pre>
-<ol>
-<li>The <var>closed</var> attribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (<var>closed</var> is <code>False</code>).
-<li>To close a file, call the <code>close</code> method of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any)
-               that the system hadn't gotten around to actually writing yet, and releases the system resources.
-<li>The <var>closed</var> attribute confirms that the file is closed.
-<li>Just because a file is closed doesn't mean that the file object ceases to exist. The variable <var>f</var> will continue to exist until it <a href="#fileinfo.scope" title="Example 5.8. Trying to Implement a Memory Leak">goes out of scope</a> or gets manually deleted. However, none of the methods that manipulate an open file will work once the file has been closed;
-               they all raise an exception.
-<li>Calling <code>close</code> on a file object whose file is already closed does <em>not</em> raise an exception; it fails silently.
-<h3>6.2.3. Handling <abbr>I/O</abbr> Errors</h3>
-<p>Now you've seen enough to understand the file handling code in the <code>fileinfo.py</code> sample code from the previous chapter. This example shows how to safely open and read from a file and gracefully handle
-   errors.
-<div class=example><h3 id="fileinfo.files.incode">Example 6.6. File Objects in <code>MP3FileInfo</code></h3><pre><code>
-        try:              <span>&#x2460;</span>
-            fsock = open(filename, "rb", 0) <span>&#x2461;</span>
-            try:         
-                fsock.seek(-128, 2)         <span>&#x2462;</span>
-                tagdata = fsock.read(128)   <span>&#x2463;</span>
-            finally:      <span>&#x2464;</span>
-                fsock.close()              
-            .
-            .
-            .
-        except IOError:   <span>&#x2465;</span>
-            pass         </pre>
-<ol>
-<li>Because opening and reading files is risky and may raise an exception, all of this code is wrapped in a <code>try...except</code> block. (Hey, isn't <a href="#odbchelper.indenting" title="2.5. Indenting Code">standardized indentation</a> great?  This is where you start to appreciate it.)
-<li>The <code>open</code> function may raise an <code>IOError</code>. (Maybe the file doesn't exist.)
-<li>The <code>seek</code> method may raise an <code>IOError</code>. (Maybe the file is smaller than 128 bytes.)
-<li>The <code>read</code> method may raise an <code>IOError</code>. (Maybe the disk has a bad sector, or it's on a network drive and the network just went down.)
-<li>This is new: a <code>try...finally</code> block. Once the file has been opened successfully by the <code>open</code> function, you want to make absolutely sure that you close it, even if an exception is raised by the <code>seek</code> or <code>read</code> methods. That's what a <code>try...finally</code> block is for: code in the <code>finally</code> block will <em>always</em> be executed, even if something in the <code>try</code> block raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.
-<li>At last, you handle your <code>IOError</code> exception. This could be the <code>IOError</code> exception raised by the call to <code>open</code>, <code>seek</code>, or <code>read</code>. Here, you really don't care, because all you're going to do is ignore it silently and continue. (Remember, <code>pass</code> is a Python statement that <a href="#fileinfo.class.simplest" title="Example 5.3. The Simplest Python Class">does nothing</a>.)  That's perfectly legal; &#8220;handling&#8221; an exception can mean explicitly doing nothing. It still counts as handled, and processing will continue normally on the
-               next line of code after the <code>try...except</code> block.
-<h3>6.2.4. Writing to Files</h3>
-<p>As you would expect, you can also write to files in much the same way that you read from them. There are two basic file modes:
-<div class=itemizedlist>
-<ul>
-<li>"Append" mode will add data to the end of the file.
-<li>"Write" mode will overwrite the file.
-</ul>
-<p>Either mode will create the file automatically if it doesn't already exist, so there's never a need for any sort of fiddly
-   "if the log file doesn't exist yet, create a new empty file just so you can open it for the first time" logic. Just open
-   it and start writing.
-<div class=example><h3 id="fileinfo.files.writeandappend">Example 6.7. Writing to Files</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>logfile = open('test.log', 'w')</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>logfile.write('test succeeded')</kbd> <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>logfile.close()</kbd>
-<samp class=p>>>> </samp><kbd>print file('test.log').read()</kbd>   <span>&#x2462;</span>
-test succeeded
-<samp class=p>>>> </samp><kbd>logfile = open('test.log', 'a')</kbd> <span>&#x2463;</span>
-<samp class=p>>>> </samp><kbd>logfile.write('line 2')</kbd>
-<samp class=p>>>> </samp><kbd>logfile.close()</kbd>
-<samp class=p>>>> </samp><kbd>print file('test.log').read()</kbd>   <span>&#x2464;</span>
-test succeededline 2
-</pre>
-<ol>
-<li>You start boldly by either creating the new file <code>test.log</code> or overwriting the existing file, and opening it for writing. (The second parameter <code>"w"</code> means open the file for writing.)  Yes, that's all as dangerous as it sounds. I hope you didn't care about the previous
-               contents of that file, because it's gone now.
-<li>You can add data to the newly opened file with the <code>write</code> method of the file object returned by <code>open</code>.
-<li><code>file</code> is a synonym for <code>open</code>. This one-liner opens the file, reads its contents, and prints them.
-<li>You happen to know that <code>test.log</code> exists (since you just finished writing to it), so you can open it and append to it. (The <code>"a"</code> parameter means open the file for appending.)  Actually you could do this even if the file didn't exist, because opening
-               the file for appending will create the file if necessary. But appending will <em>never</em> harm the existing contents of the file.
-<li>As you can see, both the original line you wrote and the second line you appended are now in <code>test.log</code>. Also note that line breaks are not included. Since you didn't write them explicitly to the file either time, the
-               file doesn't include them. You can write a line break with the <code>"\n"</code> character. Since you didn't do this, everything you wrote to the file ended up smooshed together on the same line.
-<div class=itemizedlist>
-<h3>Further Reading on File Handling</h3>
-<ul>
-<li><a href="http://www.python.org/doc/current/tut/tut.html"><i class=citetitle>Python Tutorial</i></a> discusses reading and writing files, including how to <a href="http://www.python.org/doc/current/tut/node9.html#SECTION009210000000000000000">read a file one line at a time into a list</a>.
 
-<li><a href="http://www.effbot.org/guides/">eff-bot</a> discusses efficiency and performance of <a href="http://www.effbot.org/guides/readline-performance.htm">various ways of reading a file</a>.
 
-<li><a href="http://www.faqts.com/knowledge-base/index.phtml/fid/199/">Python Knowledge Base</a> answers <a href="http://www.faqts.com/knowledge-base/index.phtml/fid/552">common questions about files</a>.
 
-<li><a href="http://www.python.org/doc/current/lib/"><i class=citetitle>Python Library Reference</i></a> summarizes <a href="http://www.python.org/doc/current/lib/bltin-file-objects.html">all the file object methods</a>.
 
-</ul>
-<h2 id="fileinfo.for">6.3. Iterating with <code>for</code> Loops</h2>
-<p>Like most other languages, Python has <code>for</code> loops. The only reason you haven't seen them until now is that Python is good at so many other things that you don't need them as often.
-<p>Most other languages don't have a powerful list datatype like Python, so you end up doing a lot of manual work, specifying a start, end, and step to define a range of integers or characters
-or other iterable entities. But in Python, a <code>for</code> loop simply iterates over a list, the same way <a href="#odbchelper.map" title="3.6. Mapping Lists">list comprehensions</a> work.
-<div class=example><h3>Example 6.8. Introducing the <code>for</code> Loop</h3><pre class=screen><samp class=p>>>> </samp><kbd>li = ['a', 'b', 'e']</kbd>
-<samp class=p>>>> </samp><kbd>for s in li:</kbd>         <span>&#x2460;</span>
-<samp class=p>...    </samp>print s          <span>&#x2461;</span>
-<samp>a
-b
-e</samp>
-<samp class=p>>>> </samp><kbd>print "\n".join(li)</kbd>  <span>&#x2462;</span>
-<samp>a
-b
-e</samp></pre>
-<ol>
-<li>The syntax for a <code>for</code> loop is similar to <a href="#odbchelper.map" title="3.6. Mapping Lists">list comprehensions</a>. <var>li</var> is a list, and <var>s</var> will take the value of each element in turn, starting from the first element.
-<li>Like an <code>if</code> statement or any other <a href="#odbchelper.indenting" title="2.5. Indenting Code">indented block</a>, a <code>for</code> loop can have any number of lines of code in it.
-<li>This is the reason you haven't seen the <code>for</code> loop yet: you haven't needed it yet. It's amazing how often you use <code>for</code> loops in other languages when all you really want is a <code>join</code> or a list comprehension.
-<p>Doing a &#8220;normal&#8221; (by Visual Basic standards) counter <code>for</code> loop is also simple.
-<div class=example><h3 id="fileinfo.for.counter">Example 6.9. Simple Counters</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>for i in range(5):</kbd>             <span>&#x2460;</span>
-<samp class=p>...    </samp>print i
-<samp>0
-1
-2
-3
-4</samp>
-<samp class=p>>>> </samp><kbd>li = ['a', 'b', 'c', 'd', 'e']</kbd>
-<samp class=p>>>> </samp><kbd>for i in range(len(li)):</kbd>       <span>&#x2461;</span>
-<samp class=p>...    </samp>print li[i]
-<samp>a
-b
-c
-d
-e</samp>
-</pre>
-<ol>
-<li>As you saw in <a href="#odbchelper.multiassign.range" title="Example 3.20. Assigning Consecutive Values">Example 3.20, &#8220;Assigning Consecutive Values&#8221;</a>, <code>range</code> produces a list of integers, which you then loop through. I know it looks a bit odd, but it is occasionally (and I stress
-<em>occasionally</em>) useful to have a counter loop.
-<li>Don't ever do this. This is Visual Basic-style thinking. Break out of it. Just iterate through the list, as shown in the previous example.
-<p><code>for</code> loops are not just for simple counters. They can iterate through all kinds of things. Here is an example of using a <code>for</code> loop to iterate through a dictionary.
-<div class=example><h3 id="dictionaryiter.example">Example 6.10. Iterating Through a Dictionary</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>import os</kbd>
-<samp class=p>>>> </samp><kbd>for k, v in os.environ.items():</kbd>      <span>&#x2460;</span> <span>&#x2461;</span>
-<samp class=p>...    </samp>print "%s=%s" % (k, v)
-<samp>USERPROFILE=C:\Documents and Settings\mpilgrim
-OS=Windows_NT
-COMPUTERNAME=MPILGRIM
-USERNAME=mpilgrim
 
-[...snip...]</samp>
-<samp class=p>>>> </samp><kbd>print "\n".join(["%s=%s" % (k, v)</kbd>
-<samp class=p>...    </samp>for k, v in os.environ.items()]) <span>&#x2462;</span>
-<samp>USERPROFILE=C:\Documents and Settings\mpilgrim
-OS=Windows_NT
-COMPUTERNAME=MPILGRIM
-USERNAME=mpilgrim
 
-[...snip...]</samp></pre>
-<ol>
-<li><var>os.environ</var> is a dictionary of the environment variables defined on your system. In Windows, these are your user and system variables
-            accessible from <abbr>MS-DOS</abbr>. In <abbr>UNIX</abbr>, they are the variables exported in your shell's startup scripts. In Mac OS, there is no concept of environment variables, so this dictionary is empty.
-<li><code>os.environ.items()</code> returns a list of tuples: <code>[(<var>key1</var>, <var>value1</var>), (<var>key2</var>, <var>value2</var>), ...]</code>. The <code>for</code> loop iterates through this list. The first round, it assigns <code><var>key1</var></code> to <var>k</var> and <code><var>value1</var></code> to <var>v</var>, so <var>k</var> = <code>USERPROFILE</code> and <var>v</var> = <code>C:\Documents and Settings\mpilgrim</code>. In the second round, <var>k</var> gets the second key, <code>OS</code>, and <var>v</var> gets the corresponding value, <code>Windows_NT</code>.
-<li>With <a href="#odbchelper.multiassign" title="3.4.2. Assigning Multiple Values at Once">multi-variable assignment</a> and <a href="#odbchelper.map" title="3.6. Mapping Lists">list comprehensions</a>, you can replace the entire <code>for</code> loop with a single statement. Whether you actually do this in real code is a matter of personal coding style. I like it
-            because it makes it clear that what I'm doing is mapping a dictionary into a list, then joining the list into a single string.
-             Other programmers prefer to write this out as a <code>for</code> loop. The output is the same in either case, although this version is slightly faster, because there is only one <code>print</code> statement instead of many.
-<p>Now we can look at the <code>for</code> loop in <code>MP3FileInfo</code>, from the sample <code>fileinfo.py</code> program introduced in <a href="#fileinfo">Chapter 5</a>.
-<div class=example><h3 id="fileinfo.multiassign.for.example">Example 6.11. <code>for</code> Loop in <code>MP3FileInfo</code></h3><pre><code>
-    tagDataMap = {"title"   : (  3,  33, stripnulls),
-                  "artist"  : ( 33,  63, stripnulls),
-                  "album"   : ( 63,  93, stripnulls),
-                  "year"    : ( 93,  97, stripnulls),
-                  "comment" : ( 97, 126, stripnulls),
-                  "genre"   : (127, 128, ord)}             <span>&#x2460;</span>
-    .
-    .
-    .
-            if tagdata[:3] == "TAG":
-                for tag, (start, end, parseFunc) in self.tagDataMap.items(): <span>&#x2461;</span>
-                    self[tag] = parseFunc(tagdata[start:end])                <span>&#x2462;</span></pre>
-<ol>
-<li><var>tagDataMap</var> is a <a href="#fileinfo.classattributes" title="5.8. Introducing Class Attributes">class attribute</a> that defines the tags you're looking for in an <abbr>MP3</abbr> file. Tags are stored in fixed-length fields. Once you read the last 128 bytes of the file, bytes 3 through 32 of those
-            are always the song title, 33 through 62 are always the artist name, 63 through 92 are the album name, and so forth. Note
-            that <var>tagDataMap</var> is a dictionary of tuples, and each tuple contains two integers and a function reference.
-<li>This looks complicated, but it's not. The structure of the <code>for</code> variables matches the structure of the elements of the list returned by <code>items</code>. Remember that <code>items</code> returns a list of tuples of the form <code>(<var>key</var>, <var>value</var>)</code>. The first element of that list is <code>("title", (3, 33, &lt;function stripnulls>))</code>, so the first time around the loop, <var>tag</var> gets <code>"title"</code>, <var>start</var> gets <code>3</code>, <var>end</var> gets <code>33</code>, and <var>parseFunc</var> gets the function <code>stripnulls</code>.
-<li>Now that you've extracted all the parameters for a single <abbr>MP3</abbr> tag, saving the tag data is easy. You <a href="#odbchelper.list.slice" title="Example 3.8. Slicing a List">slice</a> <var>tagdata</var> from <var>start</var> to <var>end</var> to get the actual data for this tag, call <var>parseFunc</var> to post-process the data, and assign this as the value for the key <var>tag</var> in the pseudo-dictionary <var>self</var>. After iterating through all the elements in <var>tagDataMap</var>, <var>self</var> has the values for all the tags, and <a href="#fileinfo.specialmethods.setname" title="Example 5.15. Setting an MP3FileInfo's name">you know what that looks like</a>.
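The nested unpacking walked through above can be tried in isolation. The following is a rough sketch under stated assumptions: it targets Python 3, so the tag block is a `bytes` object and `stripnulls` is adapted to bytes; the 128-byte block is built by hand with made-up values rather than read from a real file; and a plain dictionary named `info` stands in for `self`.

```python
def stripnulls(data):
    "strip whitespace and nulls (bytes adaptation of the book's helper)"
    return data.replace(b"\x00", b"").strip()

# Same offsets as tagDataMap in the example above.
tag_data_map = {"title":   (3, 33, stripnulls),
                "artist":  (33, 63, stripnulls),
                "album":   (63, 93, stripnulls),
                "year":    (93, 97, stripnulls),
                "comment": (97, 126, stripnulls),
                "genre":   (127, 128, ord)}

# Hypothetical 128-byte tag block, padded with nulls to the fixed field widths.
tagdata = (b"TAG"
           + b"Example Title".ljust(30, b"\x00")
           + b"Example Artist".ljust(30, b"\x00")
           + b"Example Album".ljust(30, b"\x00")
           + b"2004"
           + b"".ljust(30, b"\x00")
           + bytes([12]))

info = {}  # stands in for the pseudo-dictionary self
if tagdata[:3] == b"TAG":
    # Each item is (key, (start, end, function)); the loop variables mirror that shape.
    for tag, (start, end, parse_func) in tag_data_map.items():
        info[tag] = parse_func(tagdata[start:end])

print(info)
```

The parentheses around `(start, end, parse_func)` are what tell Python to unpack the inner tuple as well as the outer `(key, value)` pair.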
-<h2 id="fileinfo.modules">6.4. Using <code><code>sys</code>.modules</code></h2>
-<p>Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary <code><code>sys</code>.modules</code>.
+
+
+[for loop stuff was here]
+
+
+
+
+
 <div class=example><h3>Example 6.12. Introducing <code><code>sys</code>.modules</code></h3><pre class=screen><samp class=p>>>> </samp><kbd>import sys</kbd>        <span>&#x2460;</span>
 <samp class=p>>>> </samp><kbd>print '\n'.join(sys.modules.keys())</kbd> <span>&#x2461;</span>
 <samp>win32api
@@ -2353,608 +1910,17 @@ may already be familiar with from working on the command line.
 <li><a href="http://www.python.org/doc/current/lib/"><i class=citetitle>Python Library Reference</i></a> documents the <a href="http://www.python.org/doc/current/lib/module-os.html"><code>os</code></a> module and the <a href="http://www.python.org/doc/current/lib/module-os.path.html"><code>os.path</code></a> module.
 
 </ul>
-<h2 id="fileinfo.alltogether">6.6. Putting It All Together</h2>
-<p>Once again, all the dominoes are in place. You've seen how each line of code works. Now let's step back and see how it all
-   fits together.
-<div class=example><h3 id="fileinfo.nested">Example 6.21. <code>listDirectory</code></h3><pre><code>
-def listDirectory(directory, fileExtList):     <span>&#x2460;</span>
-    "get list of file info objects for files of particular extensions"
-    fileList = [os.path.normcase(f)
-                for f in os.listdir(directory)]           
-    fileList = [os.path.join(directory, f) 
-               for f in fileList
-                if os.path.splitext(f)[1] in fileExtList]        <span>&#x2461;</span>
-    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):       <span>&#x2462;</span>
-        "get file info class from filename extension"           
-        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]        <span>&#x2463;</span>
-        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo <span>&#x2464;</span>
-    return [getFileInfoClass(f)(f) for f in fileList]            <span>&#x2465;</span></pre>
-<ol>
-<li><code>listDirectory</code> is the main attraction of this entire module. It takes a directory (like <code>c:\music\_singles\</code> in my case) and a list of interesting file extensions (like <code>['.mp3']</code>), and it returns a list of class instances that act like dictionaries that contain metadata about each interesting file in
-            that directory. And it does it in just a few straightforward lines of code.
-<li>As you saw in the <a href="#fileinfo.os" title="6.5. Working with Directories">previous section</a>, this line of code gets a list of the full pathnames of all the files in <var>directory</var> that have an interesting file extension (as specified by <var>fileExtList</var>).
-<li>Old-school Pascal programmers may be familiar with them, but most people give me a blank stare when I tell them that Python supports <em>nested functions</em> -- literally, a function within a function. The nested function <code>getFileInfoClass</code> can be called only from the function in which it is defined, <code>listDirectory</code>. As with any other function, you don't need an interface declaration or anything fancy; just define the function and code
-            it.
-<li>Now that you've seen the <a href="#fileinfo.os" title="6.5. Working with Directories"><code>os</code></a> module, this line should make more sense. It gets the extension of the file (<code>os.path.splitext(filename)[1]</code>), forces it to uppercase (<code>.upper()</code>), slices off the dot (<code>[1:]</code>), and constructs a class name out of it with string formatting. So <code>c:\music\ap\mahadeva.mp3</code> becomes <code>.mp3</code> becomes <code>.MP3</code> becomes <code>MP3</code> becomes <code>MP3FileInfo</code>.
-<li>Having constructed the name of the handler class that would handle this file, you check to see if that handler class actually
-            exists in this module. If it does, you return the class; otherwise, you return the base class <code>FileInfo</code>. This is a very important point: <em>this function returns a class</em>. Not an instance of a class, but the class itself.
-<li>For each file in the &#8220;interesting files&#8221; list (<var>fileList</var>), you call <code>getFileInfoClass</code> with the filename (<var>f</var>). Calling <code>getFileInfoClass(f)</code> returns a class; you don't know exactly which class, but you don't care. You then create an instance of this class (whatever
-            it is) and pass the filename (<var>f</var> again) to the <code>__init__</code> method. As you saw <a href="#fileinfo.specialmethods.setname" title="Example 5.15. Setting an MP3FileInfo's name">in the previous chapter</a>, the <code>__init__</code> method of <code>FileInfo</code> sets <code>self["name"]</code>, which triggers <code>__setitem__</code>, which is overridden in the descendant (<code>MP3FileInfo</code>) to parse the file appropriately to pull out the file's metadata. You do all that for each interesting file and return a
-            list of the resulting instances.
-<p>Note that <code>listDirectory</code> is completely generic. It doesn't know ahead of time which types of files it will be getting, or which classes are defined
-that could potentially handle those files. It inspects the directory for the files to process, and then introspects its own
-module to see what special handler classes (like <code>MP3FileInfo</code>) are defined. You can extend this program to handle other types of files simply by defining an appropriately-named class:
-<code>HTMLFileInfo</code> for <abbr>HTML</abbr> files, <code>DOCFileInfo</code> for Word <code>.doc</code> files, and so forth. <code>listDirectory</code> will handle them all, without modification, by handing off the real work to the appropriate classes and collating the results.
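The class lookup at the heart of `listDirectory` can be sketched on its own. This is a simplified illustration under several assumptions, not the book's code: it targets Python 3, the handler classes are empty stand-ins that do no parsing, a hand-built module object named `handlers` takes the place of `sys.modules[FileInfo.__module__]`, and the three-argument form of `getattr` replaces the `hasattr ... and ... or` expression.

```python
import os
import types

class FileInfo(dict):
    "base handler: stores only the file name"
    def __init__(self, filename=None):
        super().__init__()
        self["name"] = filename

class MP3FileInfo(FileInfo):
    "stand-in handler for .mp3 files (no real parsing here)"

# Stand-in for sys.modules[FileInfo.__module__]: a module object holding the handlers.
handlers = types.ModuleType("handlers")
handlers.FileInfo = FileInfo
handlers.MP3FileInfo = MP3FileInfo

def get_file_info_class(filename, module=handlers):
    "return the handler class for this extension, or FileInfo if none exists"
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
    # getattr with a default returns the class itself, not an instance.
    return getattr(module, subclass, FileInfo)

# Look up the class, then instantiate it with the same filename.
mp3_info = get_file_info_class("mahadeva.mp3")("mahadeva.mp3")
txt_info = get_file_info_class("notes.txt")("notes.txt")
print(type(mp3_info).__name__, type(txt_info).__name__)
```

Adding another handler in this sketch means defining a class with a matching name and attaching it to `handlers`; `get_file_info_class` itself does not change.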
-<h2 id="fileinfo.summary2">6.7. Summary</h2>
-<p>The <code>fileinfo.py</code> program introduced in <a href="#fileinfo">Chapter 5</a> should now make perfect sense.
-<pre><code>
-"""Framework for getting filetype-specific metadata.
 
-Instantiate appropriate class with filename. Returned object acts like a
-dictionary, with key-value pairs for each piece of metadata.
-    import fileinfo
-    info = fileinfo.MP3FileInfo("/music/ap/mahadeva.mp3")
-    print "\\n".join(["%s=%s" % (k, v) for k, v in info.items()])
 
-Or use listDirectory function to get info on all files in a directory.
-    for info in fileinfo.listDirectory("/music/ap/", [".mp3"]):
-        ...
 
-Framework can be extended by adding classes for particular file types, e.g.
-HTMLFileInfo, MPGFileInfo, DOCFileInfo. Each class is completely responsible for
-parsing its files appropriately; see MP3FileInfo for example.
-"""
-import os
-import sys
-from UserDict import UserDict
 
-def stripnulls(data):
-    "strip whitespace and nulls"
-    return data.replace("\00", "").strip()
 
-class FileInfo(UserDict):
-    "store file metadata"
-    def __init__(self, filename=None):
-        UserDict.__init__(self)
-        self["name"] = filename
+[HTML stuff was here]
 
-class MP3FileInfo(FileInfo):
-    "store ID3v1.0 MP3 tags"
-    tagDataMap = {"title"   : (  3,  33, stripnulls),
-                  "artist"  : ( 33,  63, stripnulls),
-                  "album"   : ( 63,  93, stripnulls),
-                  "year"    : ( 93,  97, stripnulls),
-                  "comment" : ( 97, 126, stripnulls),
-                  "genre"   : (127, 128, ord)}
 
-    def __parse(self, filename):
-        "parse ID3v1.0 tags from MP3 file"
-        self.clear()
-        try:             
-            fsock = open(filename, "rb", 0)
-            try:         
-                fsock.seek(-128, 2)        
-                tagdata = fsock.read(128)  
-            finally:     
-                fsock.close()              
-            if tagdata[:3] == "TAG":
-                for tag, (start, end, parseFunc) in self.tagDataMap.items():
-                    self[tag] = parseFunc(tagdata[start:end])
-        except IOError:  
-            pass         
 
-    def __setitem__(self, key, item):
-        if key == "name" and item:
-            self.__parse(item)
-        FileInfo.__setitem__(self, key, item)
 
-def listDirectory(directory, fileExtList):    
-    "get list of file info objects for files of particular extensions"
-    fileList = [os.path.normcase(f)
-                for f in os.listdir(directory)]           
-    fileList = [os.path.join(directory, f) 
-               for f in fileList
-                if os.path.splitext(f)[1] in fileExtList] 
-    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):      
-        "get file info class from filename extension"           
-        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]       
-        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
-    return [getFileInfoClass(f)(f) for f in fileList]           
 
-if __name__ == "__main__":
-    for info in listDirectory("/music/_singles/", [".mp3"]):
-        print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
-        print</pre><div class=highlights>
-<p>Before diving into the next chapter, make sure you're comfortable doing the following things:
-<div class=itemizedlist>
-<ul>
-<li>Catching exceptions with <a href="#fileinfo.exception" title="6.1. Handling Exceptions"><code>try...except</code></a>
-<li>Protecting external resources with <a href="#fileinfo.files.incode" title="Example 6.6. File Objects in MP3FileInfo"><code>try...finally</code></a>
-<li>Reading from <a href="#fileinfo.files" title="6.2. Working with File Objects">files</a>
-<li>Assigning multiple values at once in a <a href="#fileinfo.multiassign.for.example" title="Example 6.11. for Loop in MP3FileInfo"><code>for</code> loop</a>
-<li>Using the <a href="#fileinfo.os" title="6.5. Working with Directories"><code>os</code></a> module for all your cross-platform file manipulation needs
-
-<li>Dynamically <a href="#fileinfo.alltogether" title="6.6. Putting It All Together">instantiating classes of unknown type</a> by treating classes as objects and passing them around
-
-</ul>
-<div class=chapter>
-<h2 id="dialect">Chapter 8. <abbr>HTML</abbr> Processing</h2>
-<h2 id="dialect.divein">8.1. Diving in</h2>
-<p>I often see questions on <a href="http://groups.google.com/groups?group=comp.lang.python">comp.lang.python</a> like &#8220;How can I list all the [headers|images|links] in my <abbr>HTML</abbr> document?&#8221;  &#8220;How do I parse/translate/munge the text of my <abbr>HTML</abbr> document but leave the tags alone?&#8221;  &#8220;How can I add/remove/quote attributes of all my <abbr>HTML</abbr> tags at once?&#8221;  This chapter will answer all of these questions.
-<p>Here is a complete, working Python program in two parts. The first part, <code>BaseHTMLProcessor.py</code>, is a generic tool to help you process <abbr>HTML</abbr> files by walking through the tags and text blocks. The second part, <code>dialect.py</code>, is an example of how to use <code>BaseHTMLProcessor.py</code> to translate the text of an <abbr>HTML</abbr> document but leave the tags alone. Read the <code>docstring</code>s and comments to get an overview of what's going on. Most of it will seem like black magic, because it's not obvious how
-any of these class methods ever get called. Don't worry, all will be revealed in due time.
-<div class=example><h3 id="dialect.basehtml.listing">Example 8.1. <code>BaseHTMLProcessor.py</code></h3>
-<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
-<pre><code>
-from sgmllib import SGMLParser
-import htmlentitydefs
-
-class BaseHTMLProcessor(SGMLParser):
-    def reset(self):     
-        # extend (called by SGMLParser.__init__)
-        self.pieces = []
-        SGMLParser.reset(self)
-
-    def unknown_starttag(self, tag, attrs):
-        # called for each start tag
-        # attrs is a list of (attr, value) tuples
-        # e.g. for &lt;pre class=screen>, tag="pre", attrs=[("class", "screen")]
-        # Ideally we would like to reconstruct original tag and attributes, but
-        # we may end up quoting attribute values that weren't quoted in the source
-        # document, or we may change the type of quotes around the attribute value
-        # (single to double quotes).
-        # Note that improperly embedded non-HTML code (like client-side Javascript)
-        # may be parsed incorrectly by the ancestor, causing runtime script errors.
-        # All non-HTML code must be enclosed in HTML comment tags (&lt;!-- code -->)
-        # to ensure that it will pass through this parser unaltered (in handle_comment).
-        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
-        self.pieces.append("&lt;%(tag)s%(strattrs)s>" % locals())
-
-    def unknown_endtag(self, tag):         
-        # called for each end tag, e.g. for &lt;/pre>, tag will be "pre"
-        # Reconstruct the original end tag.
-        self.pieces.append("&lt;/%(tag)s>" % locals())
-
-    def handle_charref(self, ref):         
-        # called for each character reference, e.g. for "&amp;#160;", ref will be "160"
-        # Reconstruct the original character reference.
-        self.pieces.append("&amp;#%(ref)s;" % locals())
-
-    def handle_entityref(self, ref):       
-        # called for each entity reference, e.g. for "&amp;copy;", ref will be "copy"
-        # Reconstruct the original entity reference.
-        self.pieces.append("&amp;%(ref)s" % locals())
-        # standard HTML entities are closed with a semicolon; other entities are not
-        if htmlentitydefs.entitydefs.has_key(ref):
-            self.pieces.append(";")
-
-    def handle_data(self, text):           
-        # called for each block of plain text, i.e. outside of any tag and
-        # not containing any character or entity references
-        # Store the original text verbatim.
-        self.pieces.append(text)
-
-    def handle_comment(self, text):        
-        # called for each HTML comment, e.g. &lt;!-- insert Javascript code here -->
-        # Reconstruct the original comment.
-        # It is especially important that the source document enclose client-side
-        # code (like Javascript) within comments so it can pass through this
-        # processor undisturbed; see comments in unknown_starttag for details.
-        self.pieces.append("&lt;!--%(text)s-->" % locals())
-
-    def handle_pi(self, text):             
-        # called for each processing instruction, e.g. &lt;?instruction>
-        # Reconstruct original processing instruction.
-        self.pieces.append("&lt;?%(text)s>" % locals())
-
-    def handle_decl(self, text):
-        # called for the DOCTYPE, if present, e.g.
-        # &lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
-        #     "http://www.w3.org/TR/html4/loose.dtd">
-        # Reconstruct original DOCTYPE
-        self.pieces.append("&lt;!%(text)s>" % locals())
-
-    def output(self):              
-        """Return processed HTML as a single string"""
-        return "".join(self.pieces)</pre><div class=example><h3>Example 8.2. <code>dialect.py</code></h3><pre><code>
-import re
-from BaseHTMLProcessor import BaseHTMLProcessor
-
-class Dialectizer(BaseHTMLProcessor):
-    subs = ()
-
-    def reset(self):
-        # extend (called from __init__ in ancestor)
-        # Reset all data attributes
-        self.verbatim = 0
-        BaseHTMLProcessor.reset(self)
-
-    def start_pre(self, attrs):            
-        # called for every &lt;pre> tag in HTML source
-        # Increment verbatim mode count, then handle tag like normal
-        self.verbatim += 1                 
-        self.unknown_starttag("pre", attrs)
-
-    def end_pre(self):   
-        # called for every &lt;/pre> tag in HTML source
-        # Decrement verbatim mode count
-        self.unknown_endtag("pre")         
-        self.verbatim -= 1                 
-
-    def handle_data(self, text):    
-        # override
-        # called for every block of text in HTML source
-        # If in verbatim mode, save text unaltered;
-        # otherwise process the text with a series of substitutions
-        self.pieces.append(self.verbatim and text or self.process(text))
-
-    def process(self, text):
-        # called from handle_data
-        # Process text block by performing series of regular expression
-        # substitutions (actual substitutions are defined in descendant)
-        for fromPattern, toPattern in self.subs:
-            text = re.sub(fromPattern, toPattern, text)
-        return text
-
-class ChefDialectizer(Dialectizer):
-    """convert HTML to Swedish Chef-speak
-    
-    based on the classic chef.x, copyright (c) 1992, 1993 John Hagerman
-    """
-    subs = ((r'a([nu])', r'u\1'),
-            (r'A([nu])', r'U\1'),
-            (r'a\B', r'e'),
-            (r'A\B', r'E'),
-            (r'en\b', r'ee'),
-            (r'\Bew', r'oo'),
-            (r'\Be\b', r'e-a'),
-            (r'\be', r'i'),
-            (r'\bE', r'I'),
-            (r'\Bf', r'ff'),
-            (r'\Bir', r'ur'),
-            (r'(\w*?)i(\w*?)$', r'\1ee\2'),
-            (r'\bow', r'oo'),
-            (r'\bo', r'oo'),
-            (r'\bO', r'Oo'),
-            (r'the', r'zee'),
-            (r'The', r'Zee'),
-            (r'th\b', r't'),
-            (r'\Btion', r'shun'),
-            (r'\Bu', r'oo'),
-            (r'\BU', r'Oo'),
-            (r'v', r'f'),
-            (r'V', r'F'),
-            (r'w', r'w'),
-            (r'W', r'W'),
-            (r'([a-z])[.]', r'\1. Bork Bork Bork!'))
-
-class FuddDialectizer(Dialectizer):
-    """convert HTML to Elmer Fudd-speak"""
-    subs = ((r'[rl]', r'w'),
-            (r'qu', r'qw'),
-            (r'th\b', r'f'),
-            (r'th', r'd'),
-            (r'n[.]', r'n, uh-hah-hah-hah.'))
-
-class OldeDialectizer(Dialectizer):
-    """convert HTML to mock Middle English"""
-    subs = ((r'i([bcdfghjklmnpqrstvwxyz])e\b', r'y\1'),
-            (r'i([bcdfghjklmnpqrstvwxyz])e', r'y\1\1e'),
-            (r'ick\b', r'yk'),
-            (r'ia([bcdfghjklmnpqrstvwxyz])', r'e\1e'),
-            (r'e[ea]([bcdfghjklmnpqrstvwxyz])', r'e\1e'),
-            (r'([bcdfghjklmnpqrstvwxyz])y', r'\1ee'),
-            (r'([bcdfghjklmnpqrstvwxyz])er', r'\1re'),
-            (r'([aeiou])re\b', r'\1r'),
-            (r'ia([bcdfghjklmnpqrstvwxyz])', r'i\1e'),
-            (r'tion\b', r'cioun'),
-            (r'ion\b', r'ioun'),
-            (r'aid', r'ayde'),
-            (r'ai', r'ey'),
-            (r'ay\b', r'y'),
-            (r'ay', r'ey'),
-            (r'ant', r'aunt'),
-            (r'ea', r'ee'),
-            (r'oa', r'oo'),
-            (r'ue', r'e'),
-            (r'oe', r'o'),
-            (r'ou', r'ow'),
-            (r'ow', r'ou'),
-            (r'\bhe', r'hi'),
-            (r've\b', r'veth'),
-            (r'se\b', r'e'),
-            (r"'s\b", r'es'),
-            (r'ic\b', r'ick'),
-            (r'ics\b', r'icc'),
-            (r'ical\b', r'ick'),
-            (r'tle\b', r'til'),
-            (r'll\b', r'l'),
-            (r'ould\b', r'olde'),
-            (r'own\b', r'oune'),
-            (r'un\b', r'onne'),
-            (r'rry\b', r'rye'),
-            (r'est\b', r'este'),
-            (r'pt\b', r'pte'),
-            (r'th\b', r'the'),
-            (r'ch\b', r'che'),
-            (r'ss\b', r'sse'),
-            (r'([wybdp])\b', r'\1e'),
-            (r'([rnt])\b', r'\1\1e'),
-            (r'from', r'fro'),
-            (r'when', r'whan'))
-
-def translate(url, dialectName="chef"):
-    """fetch URL and translate using dialect
-    
-    dialect in ("chef", "fudd", "olde")"""
-    import urllib    
-    sock = urllib.urlopen(url)         
-    htmlSource = sock.read()           
-    sock.close()     
-    parserName = "%sDialectizer" % dialectName.capitalize()
-    parserClass = globals()[parserName]  
-    parser = parserClass()               
-    parser.feed(htmlSource)
-    parser.close()         
-    return parser.output() 
-
-def test(url):
-    """test all dialects against URL"""
-    for dialect in ("chef", "fudd", "olde"):
-        outfile = "%s.html" % dialect
-        fsock = open(outfile, "wb")
-        fsock.write(translate(url, dialect))
-        fsock.close()
-        import webbrowser
-        webbrowser.open_new(outfile)
-
-if __name__ == "__main__":
-    test("http://diveintopython3.org/odbchelper_list.html")</pre><div class=example><h3>Example 8.3. Output of <code>dialect.py</code></h3>
-<p>Running this script will translate <a href="#odbchelper.list" title="3.2. Introducing Lists">Section 3.2, &#8220;Introducing Lists&#8221;</a> into <a href="../native_data_types/chef.html">mock Swedish Chef-speak</a> (from The Muppets), <a href="../native_data_types/fudd.html">mock Elmer Fudd-speak</a> (from Bugs Bunny cartoons), and <a href="../native_data_types/olde.html">mock Middle English</a> (loosely based on Chaucer's <i class=citetitle>The Canterbury Tales</i>). If you look at the <abbr>HTML</abbr> source of the output pages, you'll see that all the <abbr>HTML</abbr> tags and attributes are untouched, but the text between the tags has been &#8220;translated&#8221; into the mock language. If you look closer, you'll see that, in fact, only the titles and paragraphs were translated; the
-   code listings and screen examples were left untouched.
-<pre><code>
-&lt;div class=abstract>
-&lt;p>Lists awe &lt;span class=application>Pydon&lt;/span>'s wowkhowse datatype.
-If youw onwy expewience wif wists is awways in
-&lt;span class=application>Visuaw Basic&lt;/span> ow (God fowbid) de datastowe
-in &lt;span class=application>Powewbuiwdew&lt;/span>, bwace youwsewf fow
-&lt;span class=application>Pydon&lt;/span> wists.&lt;/p>
-&lt;/div>
-</pre><h2 id="dialect.sgmllib">8.2. Introducing <code>sgmllib.py</code></h2>
-<p><abbr>HTML</abbr> processing is broken into three steps: breaking down the <abbr>HTML</abbr> into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into <abbr>HTML</abbr> again. The first step is done by <code>sgmllib.py</code>, a part of the standard Python library.
-<p>The key to understanding this chapter is to realize that <abbr>HTML</abbr> is not just text, it is structured text. The structure is derived from the more-or-less-hierarchical sequence of start tags
-and end tags. Usually you don't work with <abbr>HTML</abbr> this way; you work with it <em>textually</em> in a text editor, or <em>visually</em> in a web browser or web authoring tool. <code>sgmllib.py</code> presents <abbr>HTML</abbr> <em>structurally</em>.
-<p><code>sgmllib.py</code> contains one important class: <code>SGMLParser</code>. <code>SGMLParser</code> parses <abbr>HTML</abbr> into useful pieces, like start tags and end tags. As soon as it succeeds in breaking down some data into a useful piece,
-it calls a method on itself based on what it found. In order to use the parser, you subclass the <code>SGMLParser</code> class and override these methods. This is what I meant when I said that it presents <abbr>HTML</abbr> <em>structurally</em>: the structure of the <abbr>HTML</abbr> determines the sequence of method calls and the arguments passed to each method.
-<p><code>SGMLParser</code> parses <abbr>HTML</abbr> into 8 kinds of data, and calls a separate method for each of them:
-<div class=variablelist>
-<dl>
-<dt>Start tag</dt>
-<dd>An <abbr>HTML</abbr> tag that starts a block, like <code>&lt;html></code>, <code>&lt;head></code>, <code>&lt;body></code>, or <code>&lt;pre></code>, or a standalone tag like <code>&lt;br></code> or <code>&lt;img></code>. When it finds a start tag <var><code>tagname</code></var>, <code>SGMLParser</code> will look for a method called <code>start_<var><code>tagname</code></var></code> or <code>do_<var><code>tagname</code></var></code>. For instance, when it finds a <code>&lt;pre></code> tag, it will look for a <code>start_pre</code> or <code>do_pre</code> method. If found, <code>SGMLParser</code> calls this method with a list of the tag's attributes; otherwise, it calls <code>unknown_starttag</code> with the tag name and list of attributes.
-</dd>
-<dt>End tag</dt>
-<dd>An <abbr>HTML</abbr> tag that ends a block, like <code>&lt;/html></code>, <code>&lt;/head></code>, <code>&lt;/body></code>, or <code>&lt;/pre></code>. When it finds an end tag, <code>SGMLParser</code> will look for a method called <code>end_<var><code>tagname</code></var></code>. If found, <code>SGMLParser</code> calls this method; otherwise, it calls <code>unknown_endtag</code> with the tag name.
-</dd>
-<dt>Character reference</dt>
-<dd>An escaped character referenced by its decimal or hexadecimal equivalent, like <code>&amp;#160;</code>. When found, <code>SGMLParser</code> calls <code>handle_charref</code> with the text of the decimal or hexadecimal character equivalent.
-</dd>
-<dt>Entity reference</dt>
-<dd>An <abbr>HTML</abbr> entity, like <code>&amp;copy;</code>. When found, <code>SGMLParser</code> calls <code>handle_entityref</code> with the name of the <abbr>HTML</abbr> entity.
-</dd>
-<dt>Comment</dt>
-<dd>An <abbr>HTML</abbr> comment, enclosed in <code>&lt;!-- ... --></code>. When found, <code>SGMLParser</code> calls <code>handle_comment</code> with the body of the comment.
-</dd>
-<dt>Processing instruction</dt>
-<dd>An <abbr>HTML</abbr> processing instruction, enclosed in <code>&lt;? ... ></code>. When found, <code>SGMLParser</code> calls <code>handle_pi</code> with the body of the processing instruction.
-</dd>
-<dt>Declaration</dt>
-<dd>An <abbr>HTML</abbr> declaration, such as a <code>DOCTYPE</code>, enclosed in <code>&lt;! ... ></code>. When found, <code>SGMLParser</code> calls <code>handle_decl</code> with the body of the declaration.
-</dd>
-<dt>Text data</dt>
-<dd>A block of text. Anything that doesn't fit into the other 7 categories. When found, <code>SGMLParser</code> calls <code>handle_data</code> with the text.
-</dd>
-</dl>
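The callback style described in this list can be tried out with a short sketch. Note the substitution: `sgmllib` is not available in Python 3, so this example uses `html.parser.HTMLParser` from the standard library instead. It is a different class (for instance, it has `handle_starttag` and `handle_endtag` rather than the `start_tagname` and `unknown_starttag` dispatch described above), but it follows the same idea of calling a method for each piece it finds. The class name `EventLogger` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    "record each parser callback as a (kind, value) tuple"
    def __init__(self):
        # convert_charrefs=False so character and entity references are
        # reported through their own handlers instead of handle_data.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_decl(self, decl):
        self.events.append(("decl", decl))

    def handle_data(self, data):
        self.events.append(("data", data))

parser = EventLogger()
parser.feed('<!DOCTYPE html><p class="x">A&amp;B&#160;<!-- note --></p>')
parser.close()
for event in parser.events:
    print(event)
```

The structure of the markup determines which methods are called and in what order, which is the sense in which a parser of this kind presents <abbr>HTML</abbr> structurally.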
-<table class=important border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/important.png" alt="Important" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">Python 2.0 had a bug where <code>SGMLParser</code> would not recognize declarations at all (<code>handle_decl</code> would never be called), which meant that <code>DOCTYPE</code>s were silently ignored. This is fixed in Python 2.1.
-<p><code>sgmllib.py</code> comes with a test suite to illustrate this. You can run <code>sgmllib.py</code>, passing the name of an <abbr>HTML</abbr> file on the command line, and it will print out the tags and other elements as it parses them. It does this by subclassing
-the <code>SGMLParser</code> class and defining <code>unknown_starttag</code>, <code>unknown_endtag</code>, <code>handle_data</code> and other methods which simply print their arguments.
-<table id="tip.commandline.windows" class=tip border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/tip.png" alt="Tip" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">In the ActivePython <abbr>IDE</abbr> on Windows, you can specify command line arguments in the &#8220;Run script&#8221; dialog. Separate multiple arguments with spaces.
-<div class=example><h3>Example 8.4. Sample test of <code>sgmllib.py</code></h3>
-<p>Here is a snippet from the table of contents of the <abbr>HTML</abbr> version of this book. Of course your paths may vary. (If you haven't downloaded the <abbr>HTML</abbr> version of the book, you can do so at <a href="http://diveintopython3.org/">http://diveintopython3.org/</a>.)
-<pre class=screen>
-<samp class=p>c:\python23\lib></samp> type "c:\downloads\diveintopython3\html\toc\index.html"
-<code>
-&lt;!DOCTYPE html
-  PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
-&lt;html>
-   &lt;head>
-      &lt;meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
-   
-      &lt;title>Dive Into Python&lt;/title>
-      &lt;link rel="stylesheet" href="diveintopython3.css" type="text/css">
-
-... rest of file omitted for brevity ...
-</code></pre><p>Running this through the test suite of <code>sgmllib.py</code> yields this output:<pre class=screen>
-<samp class=p>c:\python23\lib></samp> python sgmllib.py "c:\downloads\diveintopython3\html\toc\index.html"
-<samp>data: '\n\n'
-start tag: &lt;html >
-data: '\n   '
-start tag: &lt;head>
-data: '\n      '
-start tag: &lt;meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" >
-data: '\n   \n      '
-start tag: &lt;title>
-data: 'Dive Into Python'
-end tag: &lt;/title>
-data: '\n      '
-start tag: &lt;link rel="stylesheet" href="diveintopython3.css" type="text/css" >
-data: '\n      '
-
-... rest of output omitted for brevity ...
-</samp></pre><p>Here's the roadmap for the rest of the chapter:
-<div class=itemizedlist>
-<ul>
-<li>Subclass <code>SGMLParser</code> to create classes that extract interesting data out of <abbr>HTML</abbr> documents.
-
-<li>Subclass <code>SGMLParser</code> to create <code>BaseHTMLProcessor</code>, which overrides all 8 handler methods and uses them to reconstruct the original <abbr>HTML</abbr> from the pieces.
-
-<li>Subclass <code>BaseHTMLProcessor</code> to create <code>Dialectizer</code>, which adds some methods to process specific <abbr>HTML</abbr> tags specially, and overrides the <code>handle_data</code> method to provide a framework for processing the text blocks between the <abbr>HTML</abbr> tags.
-
-<li>Subclass <code>Dialectizer</code> to create classes that define text processing rules used by <code>Dialectizer.handle_data</code>.
-
-<li>Write a test suite that grabs a real web page from <code>http://diveintopython3.org/</code> and processes it.
-
-</ul>
-<p>Along the way, you'll also learn about <code>locals</code>, <code>globals</code>, and dictionary-based string formatting.
-<h2 id="dialect.extract">8.3. Extracting data from <abbr>HTML</abbr> documents</h2>
-<p>To extract data from <abbr>HTML</abbr> documents, subclass the <code>SGMLParser</code> class and define methods for each tag or entity you want to capture.
-<p>The first step to extracting data from an <abbr>HTML</abbr> document is getting some <abbr>HTML</abbr>. If you have some <abbr>HTML</abbr> lying around on your hard drive, you can use <a href="#fileinfo.files" title="6.2. Working with File Objects">file functions</a> to read it, but the real fun begins when you get <abbr>HTML</abbr> from live web pages.
-<div class=example><h3 id="dialect.extract.urllib">Example 8.5. Introducing <code>urllib</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>import urllib</kbd>   <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>sock = urllib.urlopen("http://diveintopython3.org/")</kbd> <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>htmlSource = sock.read()</kbd>          <span>&#x2462;</span>
-<samp class=p>>>> </samp><kbd>sock.close()</kbd>    <span>&#x2463;</span>
-<samp class=p>>>> </samp><kbd>print htmlSource</kbd><span>&#x2464;</span>
-<samp>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">&lt;html>&lt;head>
-      &lt;meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1'>
-   &lt;title>Dive Into Python&lt;/title>
-&lt;link rel='stylesheet' href='diveintopython3.css' type='text/css'>
-&lt;link rev='made' href='mailto:mark@diveintopython3.org'>
-&lt;meta name='keywords' content='Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free'>
-&lt;meta name='description' content='a free Python tutorial for experienced programmers'>
-&lt;/head>
-&lt;body bgcolor='white' text='black' link='#0000FF' vlink='#840084' alink='#0000FF'>
-&lt;table cellpadding='0' cellspacing='0' border='0' width='100%'>
-&lt;tr>&lt;td class='header' width='1%' valign='top'>diveintopython3.org&lt;/td>
-&lt;td width='99%' align='right'>&lt;hr size='1' noshade>&lt;/td>&lt;/tr>
-&lt;tr>&lt;td class='tagline' colspan='2'>Python&amp;nbsp;for&amp;nbsp;experienced&amp;nbsp;programmers&lt;/td>&lt;/tr></samp>
-
-[...snip...]</pre>
-<ol>
-<li>The <code>urllib</code> module is part of the standard Python library. It contains functions for getting information about and actually retrieving data from Internet-based <abbr>URL</abbr>s (mainly web pages).
-<li>The simplest use of <code>urllib</code> is to retrieve the entire text of a web page using the <code>urlopen</code> function. Opening a <abbr>URL</abbr> is similar to <a href="#fileinfo.files" title="6.2. Working with File Objects">opening a file</a>. The return value of <code>urlopen</code> is a file-like object, which has some of the same methods as a file object.
-<li>The simplest thing to do with the file-like object returned by <code>urlopen</code> is <code>read</code>, which reads the entire <abbr>HTML</abbr> of the web page into a single string. The object also supports <code>readlines</code>, which reads the text line by line into a list.
-<li>When you're done with the object, make sure to <code>close</code> it, just like a normal file object.
-<li>You now have the complete <abbr>HTML</abbr> of the home page of <code>http://diveintopython3.org/</code> in a string, and you're ready to parse it.
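<p>For readers on Python 3, where <code>urlopen</code> lives in <code>urllib.request</code> and <code>read</code> returns bytes rather than a string, the same open/read/close steps might look like the sketch below. It is not from the book: the <code>fetch_text</code> helper is hypothetical, and a <code>data:</code> URL stands in for the live site so the snippet runs offline.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Open a URL, read the whole body, and return it as a string."""
    # The with-statement closes the file-like object, like sock.close() above.
    with urlopen(url) as sock:
        return sock.read().decode(encoding)


# Hypothetical stand-in for a real page; swap in an http:// URL when online.
page = "<html><head><title>Dive Into Python</title></head></html>"
html_source = fetch_text("data:text/html," + quote(page))
print(html_source)
```

<p>The encoding is an assumption here; a real page may declare a different one in its headers or <code>&lt;meta></code> tag.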
-<div class=example><h3 id="dialect.extract.links">Example 8.6. Introducing <code>urllister.py</code></h3>
-<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
-<pre><code>
-from sgmllib import SGMLParser
-
-class URLLister(SGMLParser):
-    def reset(self):            <span>&#x2460;</span>
-        SGMLParser.reset(self)
-        self.urls = []
-
-    def start_a(self, attrs):   <span>&#x2461;</span>
-        href = [v for k, v in attrs if k=='href'] <span>&#x2462;</span> <span>&#x2463;</span>
-        if href:
-            self.urls.extend(href)</code></pre>
-<ol>
-<li><code>reset</code> is called by the <code>__init__</code> method of <code>SGMLParser</code>, and it can also be called manually once an instance of the parser has been created. So if you need to do any initialization,
-            do it in <code>reset</code>, not in <code>__init__</code>, so that it will be re-initialized properly when someone re-uses a parser instance.
-<li><code>start_a</code> is called by <code>SGMLParser</code> whenever it finds an <code>&lt;a></code> tag. The <var>attrs</var> parameter is a list of tuples, <code>[(<var>attribute</var>, <var>value</var>), (<var>attribute</var>, <var>value</var>), ...]</code>. The tag may contain an <code>href</code> attribute, other attributes like <code>name</code> or <code>title</code>, or both. It may also be a bare <code>&lt;a></code>, a valid (if useless) <abbr>HTML</abbr> tag, in which case <var>attrs</var> would be an empty list.
-<li>You can find out whether this <code>&lt;a></code> tag has an <code>href</code> attribute with a simple <a href="#odbchelper.multiassign" title="3.4.2. Assigning Multiple Values at Once">multi-variable</a> <a href="#odbchelper.map" title="3.6. Mapping Lists">list comprehension</a>.
-<li>String comparisons like <code>k=='href'</code> are always case-sensitive, but that's safe in this case, because <code>SGMLParser</code> converts attribute names to lowercase while building <var>attrs</var>.
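<p>The <code>sgmllib</code> module was removed in Python 3, so the listing above only runs on Python 2. As a rough sketch of the same idea on Python 3, the standard <code>html.parser.HTMLParser</code> can stand in for <code>SGMLParser</code>. It has no per-tag <code>start_a</code> hook, so the tag name is checked inside <code>handle_starttag</code> instead. The class name <code>LinkLister</code> and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Hypothetical Python 3 counterpart of URLLister."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so per-parse state goes here.
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive already lowercased.
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v is not None)


parser = LinkLister()
parser.feed('<A HREF="toc/index.html">TOC</A> <a name="x">anchor</a> <a href=#download>get it</a>')
parser.close()  # flush anything the parser buffered
urls = parser.urls
print(urls)
```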
-<div class=example><h3 id="dialect.feed.example">Example 8.7. Using <code>urllister.py</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>import urllib, urllister</kbd>
-<samp class=p>>>> </samp><kbd>usock = urllib.urlopen("http://diveintopython3.org/")</kbd>
-<samp class=p>>>> </samp><kbd>parser = urllister.URLLister()</kbd>
-<samp class=p>>>> </samp><kbd>parser.feed(usock.read())</kbd>         <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>usock.close()</kbd>   <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>parser.close()</kbd>  <span>&#x2462;</span>
-<samp class=p>>>> </samp><kbd>for url in parser.urls: print url</kbd> <span>&#x2463;</span>
-<samp>toc/index.html
-#download
-#languages
-toc/index.html
-appendix/history.html
-download/diveintopython3-html-5.0.zip
-download/diveintopython3-pdf-5.0.zip
-download/diveintopython3-word-5.0.zip
-download/diveintopython3-text-5.0.zip
-download/diveintopython3-html-flat-5.0.zip
-download/diveintopython3-xml-5.0.zip
-download/diveintopython3-common-5.0.zip
-</samp>
-
-... rest of output omitted for brevity ...</pre>
-<ol>
-<li>Call the <code>feed</code> method, defined in <code>SGMLParser</code>, to get <abbr>HTML</abbr> into the parser.
-<sup>[<a name="d0e20503" href="#ftn.d0e20503">1</a>]</sup>  It takes a string, which is what <code>usock.read()</code> returns.
-<li>Like files, you should <code>close</code> your <abbr>URL</abbr> objects as soon as you're done with them.
-<li>You should <code>close</code> your parser object, too, but for a different reason. You've read all the data and fed it to the parser, but the <code>feed</code> method isn't guaranteed to have actually processed all the <abbr>HTML</abbr> you give it; it may buffer it, waiting for more. Be sure to call <code>close</code> to flush the buffer and force everything to be fully parsed.
-<li>Once the parser is <code>close</code>d, the parsing is complete, and <var>parser.urls</var> contains a list of all the linked <abbr>URL</abbr>s in the <abbr>HTML</abbr> document. (Your output may look different, if the download links have been updated by the time you read this.)
-<h2 id="dialect.basehtml">8.4. Introducing <code>BaseHTMLProcessor.py</code></h2>
-<p><code>SGMLParser</code> doesn't produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it
-   finds, but the methods don't do anything. <code>SGMLParser</code> is an <abbr>HTML</abbr> <em>consumer</em>: it takes <abbr>HTML</abbr> and breaks it down into small, structured pieces. As you saw in the <a href="#dialect.extract" title="8.3. Extracting data from HTML documents">previous section</a>, you can subclass <code>SGMLParser</code> to define classes that catch specific tags and produce useful things, like a list of all the links on a web page. Now you'll
-   take this one step further by defining a class that catches everything <code>SGMLParser</code> throws at it and reconstructs the complete <abbr>HTML</abbr> document. In technical terms, this class will be an <abbr>HTML</abbr> <em>producer</em>.
-<p><code>BaseHTMLProcessor</code> subclasses <code>SGMLParser</code> and provides all 8 essential handler methods: <code>unknown_starttag</code>, <code>unknown_endtag</code>, <code>handle_charref</code>, <code>handle_entityref</code>, <code>handle_comment</code>, <code>handle_pi</code>, <code>handle_decl</code>, and <code>handle_data</code>.
-<div class=example><h3 id="dialect.basehtml.intro">Example 8.8. Introducing <code>BaseHTMLProcessor</code></h3><pre><code>
-class BaseHTMLProcessor(SGMLParser):
-    def reset(self):      <span>&#x2460;</span>
-        self.pieces = []
-        SGMLParser.reset(self)
-
-    def unknown_starttag(self, tag, attrs): <span>&#x2461;</span>
-        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
-        self.pieces.append("&lt;%(tag)s%(strattrs)s>" % locals())
-
-    def unknown_endtag(self, tag):          <span>&#x2462;</span>
-        self.pieces.append("&lt;/%(tag)s>" % locals())
-
-    def handle_charref(self, ref):          <span>&#x2463;</span>
-        self.pieces.append("&amp;#%(ref)s;" % locals())
-
-    def handle_entityref(self, ref):        <span>&#x2464;</span>
-        self.pieces.append("&amp;%(ref)s" % locals())
-        if htmlentitydefs.entitydefs.has_key(ref):
-            self.pieces.append(";")
-
-    def handle_data(self, text):            <span>&#x2465;</span>
-        self.pieces.append(text)
-
-    def handle_comment(self, text):         <span>&#x2466;</span>
-        self.pieces.append("&lt;!--%(text)s-->" % locals())
-
-    def handle_pi(self, text):              <span>&#x2467;</span>
-        self.pieces.append("&lt;?%(text)s>" % locals())
-
-    def handle_decl(self, text):
-        self.pieces.append("&lt;!%(text)s>" % locals())</code></pre>
-<ol>
-<li><code>reset</code>, called by <code>SGMLParser.__init__</code>, initializes <var>self.pieces</var> as an empty list before <a href="#fileinfo.init.code.example" title="Example 5.6. Coding the FileInfo Class">calling the ancestor method</a>. <var>self.pieces</var> is a <a href="#fileinfo.userdict.init.example" title="Example 5.9. Defining the UserDict Class">data attribute</a> which will hold the pieces of the <abbr>HTML</abbr> document you're constructing. Each handler method will reconstruct the <abbr>HTML</abbr> that <code>SGMLParser</code> parsed, and each method will append that string to <var>self.pieces</var>. Note that <var>self.pieces</var> is a list. You might be tempted to define it as a string and just keep appending each piece to it. That would work, but
-strings are immutable, so each append can copy everything accumulated so far; appending to a list and joining once at the end is much more efficient.
-<sup>[<a name="d0e20702" href="#ftn.d0e20702">2</a>]</sup><li>Since <code>BaseHTMLProcessor</code> does not define any methods for specific tags (like the <code>start_a</code> method in <a href="#dialect.extract.links" title="Example 8.6. Introducing urllister.py"><code>URLLister</code></a>), <code>SGMLParser</code> will call <code>unknown_starttag</code> for every start tag. This method takes the tag (<var>tag</var>) and the list of attribute name/value pairs (<var>attrs</var>), reconstructs the original <abbr>HTML</abbr>, and appends it to <var>self.pieces</var>. The <a href="#odbchelper.stringformatting" title="3.5. Formatting Strings">string formatting</a> here is a little strange; you'll untangle that (and also the odd-looking <code>locals</code> function) later in this chapter.
-<li>Reconstructing end tags is much simpler; just take the tag name and wrap it in the <code>&lt;/...></code> brackets.
-<li>When <code>SGMLParser</code> finds a character reference, it calls <code>handle_charref</code> with the bare reference. If the <abbr>HTML</abbr> document contains the reference <code>&amp;#160;</code>, <var>ref</var> will be <code>160</code>. Reconstructing the original complete character reference just involves wrapping <var>ref</var> in <code>&amp;#...;</code> characters.
-<li>Entity references are similar to character references, but without the hash mark. Reconstructing the original entity reference
-            requires wrapping <var>ref</var> in <code>&amp;...;</code> characters. (Actually, as an erudite reader pointed out to me, it's slightly more complicated than this. Only certain standard
-<abbr>HTML</abbr> entities end in a semicolon; other similar-looking entities do not. Luckily for us, the set of standard <abbr>HTML</abbr> entities is defined in a dictionary in a Python module called <code>htmlentitydefs</code>. Hence the extra <code>if</code> statement.)
-<li>Blocks of text are simply appended to <var>self.pieces</var> unaltered.
-<li><abbr>HTML</abbr> comments are wrapped in <code>&lt;!--...--></code> characters.
-<li>Processing instructions are wrapped in <code>&lt;?...></code> characters.
-<table class=important border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/important.png" alt="Important" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">The <abbr>HTML</abbr> specification requires that all non-<abbr>HTML</abbr> (like client-side JavaScript) be enclosed in <abbr>HTML</abbr> comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). <code>BaseHTMLProcessor</code> is not forgiving; if script is improperly embedded, it will be parsed as if it were <abbr>HTML</abbr>. For instance, if the script contains less-than and equals signs, <code>SGMLParser</code> may incorrectly think that it has found tags and attributes. <code>SGMLParser</code> always converts tags and attribute names to lowercase, which may break the script, and <code>BaseHTMLProcessor</code> always encloses attribute values in double quotes (even if the original <abbr>HTML</abbr> document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script
-      within <abbr>HTML</abbr> comments.
-<div class=example><h3 id="dialect.output.example">Example 8.9. <code>BaseHTMLProcessor</code> output</h3><pre><code>
-    def output(self):               <span>&#x2460;</span>
-        """Return processed HTML as a single string"""
-        return "".join(self.pieces) <span>&#x2461;</span></code></pre>
-<ol>
-<li>This is the one method in <code>BaseHTMLProcessor</code> that is never called by the ancestor <code>SGMLParser</code>. Since the other handler methods store their reconstructed <abbr>HTML</abbr> in <var>self.pieces</var>, this function is needed to join all those pieces into one string. As noted before, Python is great at lists and mediocre at strings, so you only create the complete string when somebody explicitly asks for it.
-<li>If you prefer, you could use the <code>join</code> method of the <code>string</code> module instead: <code>string.join(self.pieces, "")</code><div class=itemizedlist>
-<h3>Further reading</h3>
-<ul>
-<li><a href="http://www.w3.org/">W3C</a> discusses <a href="http://www.w3.org/TR/REC-html40/charset.html#entities">character and entity references</a>.
-
-<li><a href="http://www.python.org/doc/current/lib/"><i class=citetitle>Python Library Reference</i></a> confirms your suspicions that <a href="http://www.python.org/doc/current/lib/module-htmlentitydefs.html">the <code>htmlentitydefs</code> module</a> is exactly what it sounds like.
-
-</ul>
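<p>To make the consumer/producer idea concrete on Python 3, here is a sketch that mirrors <code>BaseHTMLProcessor</code> on top of <code>html.parser.HTMLParser</code>. It is an approximation, not the book's class: the name <code>RoundTripper</code> is hypothetical, the handler names follow <code>HTMLParser</code> rather than <code>SGMLParser</code>, and entity references are always closed with a semicolon instead of being checked against a table.

```python
from html.parser import HTMLParser


class RoundTripper(HTMLParser):
    """Hypothetical Python 3 analogue of BaseHTMLProcessor."""

    def __init__(self):
        # Report &amp; and &#160; as separate events instead of decoding them.
        super().__init__(convert_charrefs=False)

    def reset(self):
        self.pieces = []
        super().reset()

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (like noshade) arrive with value None.
        strattrs = "".join(
            " %s" % key if value is None else ' %s="%s"' % (key, value)
            for key, value in attrs
        )
        self.pieces.append("<%s%s>" % (tag, strattrs))

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.pieces.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.pieces.append("<?%s>" % data)

    def output(self):
        """Return the reconstructed HTML as a single string."""
        return "".join(self.pieces)


parser = RoundTripper()
parser.feed("<UL><LI><A HREF=index.html>Home</A> &amp; more&#160;text</LI></UL><!-- footer -->")
parser.close()
result = parser.output()
print(result)
```

<p>As with the original, tag and attribute names come out lowercased and attribute values come out double-quoted, so the output is equivalent to the input but not identical.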
 <h2 id="dialect.locals">8.5. <code>locals</code> and <code>globals</code></h2>
 <p>Let's digress from <abbr>HTML</abbr> processing for a minute and talk about how Python handles variables. Python has two built-in functions, <code>locals</code> and <code>globals</code>, which provide dictionary-based access to local and global variables.
 <p>Remember <code>locals</code>?  You first saw it here:
@@ -3050,605 +2016,17 @@ print "z=",z          <span>&#x2464;</span>
 <li>This prints <code>x= 1</code>, not <code>x= 2</code>.
 <li>After being burned by <code>locals</code>, you might think that this <em>wouldn't</em> change the value of <var>z</var>, but it does. Due to internal differences in how Python is implemented (which I'd rather not go into, since I don't fully understand them myself), <code>globals</code> returns the actual global namespace, not a copy: the exact opposite behavior of <code>locals</code>. So any changes to the dictionary returned by <code>globals</code> directly affect your global variables.
 <li>This prints <code>z= 8</code>, not <code>z= 7</code>.
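<p>The same contrast can be sketched in a few lines that also run on Python 3. This is not the book's example; the function names are hypothetical, and the point is only that writing to the dictionary from <code>locals</code> leaves the local variable alone, while writing to the dictionary from <code>globals</code> changes the module-level name.

```python
z = 7


def try_locals():
    x = 1
    locals()["x"] = 2   # changes a copy of the namespace, not x itself
    return x


def try_globals():
    globals()["z"] = 8  # changes the real module namespace
    return z            # z is looked up as a global here


print("x=", try_locals())
print("z=", try_globals())
```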
-<h2 id="dialect.dictsub">8.6. Dictionary-based string formatting</h2>
-<p>Why did you learn about <code>locals</code> and <code>globals</code>?  So you can learn about dictionary-based string formatting. As you recall, <a href="#odbchelper.stringformatting" title="3.5. Formatting Strings">regular string formatting</a> provides an easy way to insert values into strings. Values are listed in a tuple and inserted in order into the string in
-place of each formatting marker. While this is efficient, it is not always the easiest code to read, especially when multiple
-values are being inserted. You can't simply scan through the string in one pass and understand what the result will be; you're
-constantly switching between reading the string and reading the tuple of values.
-<p>There is an alternative form of string formatting that uses dictionaries instead of tuples of values.
-<div class=example><h3>Example 8.13. Introducing dictionary-based string formatting</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}</kbd>
-<samp class=p>>>> </samp><kbd>"%(pwd)s" % params</kbd><span>&#x2460;</span>
-'secret'
-<samp class=p>>>> </samp><kbd>"%(pwd)s is not a good password for %(uid)s" % params</kbd> <span>&#x2461;</span>
-'secret is not a good password for sa'
-<samp class=p>>>> </samp><kbd>"%(database)s of mind, %(database)s of body" % params</kbd> <span>&#x2462;</span>
-'master of mind, master of body'</pre>
-<ol>
-<li>Instead of a tuple of explicit values, this form of string formatting uses a dictionary, <var>params</var>. And instead of a simple <code>%s</code> marker in the string, the marker contains a name in parentheses. This name is used as a key in the <var>params</var> dictionary, and the corresponding value, <code>secret</code>, is substituted in place of the <code>%(pwd)s</code> marker.
-<li>Dictionary-based string formatting works with any number of named keys. Each key must exist in the given dictionary, or the
-            formatting will fail with a <code>KeyError</code>.
-<li>You can even specify the same key twice; each occurrence will be replaced with the same value.
-<p>So why would you use dictionary-based string formatting?  Well, it does seem like overkill to set up a dictionary of keys
-and values simply to do string formatting in the next line; it's really most useful when you happen to have a dictionary of
-meaningful keys and values already. Like <a href="#dialect.locals" title="8.5. locals and globals"><code>locals</code></a>.
-<div class=example><h3 id="dialect.unknownstarttag">Example 8.14. Dictionary-based string formatting in <code>BaseHTMLProcessor.py</code></h3><pre><code>
-    def handle_comment(self, text):        
-        self.pieces.append("&lt;!--%(text)s-->" % locals()) <span>&#x2460;</span>
-</code></pre>
-<ol>
-<li>Using the built-in <code>locals</code> function is the most common use of dictionary-based string formatting. It means that you can use the names of local variables
-            within your string (in this case, <var>text</var>, which was passed to the class method as an argument) and each named variable will be replaced by its value. If <var>text</var> is <code>'Begin page footer'</code>, the string formatting <code>"&lt;!--%(text)s-->" % locals()</code> will resolve to the string <code>'&lt;!--Begin page footer-->'</code>.
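<p>Dictionary-based formatting with the <code>%</code> operator still works on Python 3, so the points above can be checked directly. In this sketch the <code>wrap_comment</code> function and the trimmed <var>params</var> dictionary are made up for illustration; note that <code>database</code> is deliberately missing to trigger the <code>KeyError</code>.

```python
params = {"server": "mpilgrim", "uid": "sa", "pwd": "secret"}

# Several named markers, each looked up by key.
both = "%(pwd)s is not a good password for %(uid)s" % params

# A marker whose key is absent raises KeyError.
try:
    "%(database)s" % params
    missing = None
except KeyError as err:
    missing = err.args[0]


def wrap_comment(text):
    # locals() supplies {'text': ...}, so %(text)s finds the argument.
    return "<!--%(text)s-->" % locals()


comment = wrap_comment("Begin page footer")
print(both, missing, comment)
```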
-<div class=example><h3>Example 8.15. More dictionary-based string formatting</h3><pre><code>
-    def unknown_starttag(self, tag, attrs):
-        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs]) <span>&#x2460;</span>
-        self.pieces.append("&lt;%(tag)s%(strattrs)s>" % locals())    <span>&#x2461;</span>
-</code></pre>
-<ol>
-<li>When this method is called, <var>attrs</var> is a list of key/value tuples, just like the <a href="#odbchelper.items" title="Example 3.25. The keys, values, and items Functions"><code>items</code> of a dictionary</a>, which means you can use <a href="#odbchelper.multiassign" title="3.4.2. Assigning Multiple Values at Once">multi-variable assignment</a> to iterate through it. This should be a familiar pattern by now, but there's a lot going on here, so let's break it down:
-<div class=orderedlist>
-<ol type="a">
-<li>Suppose <var>attrs</var> is <code>[('href', 'index.html'), ('title', 'Go to home page')]</code>.
 
-<li>In the first round of the list comprehension, <var>key</var> will get <code>'href'</code>, and <var>value</var> will get <code>'index.html'</code>.
 
-<li>The string formatting <code>' %s="%s"' % (key, value)</code> will resolve to <code>' href="index.html"'</code>. This string becomes the first element of the list comprehension's return value.
 
-<li>In the second round, <var>key</var> will get <code>'title'</code>, and <var>value</var> will get <code>'Go to home page'</code>.
 
-<li>The string formatting will resolve to <code>' title="Go to home page"'</code>.
 
-<li>The list comprehension returns a list of these two resolved strings, and <var>strattrs</var> will join both elements of this list together to form <code>' href="index.html" title="Go to home page"'</code>.
+[XML stuff was here]
 
-</ol>
-<li>Now, using dictionary-based string formatting, you insert the value of <var>tag</var> and <var>strattrs</var> into a string. So if <var>tag</var> is <code>'a'</code>, the final result would be <code>'&lt;a href="index.html" title="Go to home page">'</code>, and that is what gets appended to <var>self.pieces</var>.
-<table class=important border="0" summary="">
 
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/important.png" alt="Important" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">Using dictionary-based string formatting with <code>locals</code> is a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a
-      slight performance hit in making the call to <code>locals</code>, since <a href="#dialect.locals.readonly.example" title="Example 8.12. locals is read-only, globals is not"><code>locals</code> builds a copy</a> of the local namespace.
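<p>The walkthrough of <code>unknown_starttag</code> can be replayed outside the class. The two helper functions below are hypothetical stand-ins that split the method into its two steps, using the same sample <var>attrs</var> list as the text.

```python
def render_attrs(attrs):
    # One ' key="value"' string per (key, value) tuple, in order.
    return [' %s="%s"' % (key, value) for key, value in attrs]


def build_start_tag(tag, attrs):
    strattrs = "".join(render_attrs(attrs))
    # locals() holds tag and strattrs, which the named markers pick up.
    return "<%(tag)s%(strattrs)s>" % locals()


attrs = [("href", "index.html"), ("title", "Go to home page")]
rendered = render_attrs(attrs)
start_tag = build_start_tag("a", attrs)
print(rendered)
print(start_tag)
```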
-<h2 id="dialect.quoting">8.7. Quoting attribute values</h2>
-<p>A common question on <a href="http://groups.google.com/groups?group=comp.lang.python">comp.lang.python</a> is &#8220;I have a bunch of <abbr>HTML</abbr> documents with unquoted attribute values, and I want to properly quote them all. How can I do this?&#8221;<sup>[<a name="d0e21764" href="#ftn.d0e21764">4</a>]</sup>  (This is generally precipitated by a project manager who has found the <abbr>HTML</abbr>-is-a-standard religion joining a large project and proclaiming that all pages must validate against an <abbr>HTML</abbr> validator. Unquoted attribute values are a common violation of the <abbr>HTML</abbr> standard.)  Whatever the reason, unquoted attribute values are easy to fix by feeding <abbr>HTML</abbr> through <code>BaseHTMLProcessor</code>.
-<p><code>BaseHTMLProcessor</code> consumes <abbr>HTML</abbr> (since it's descended from <code>SGMLParser</code>) and produces equivalent <abbr>HTML</abbr>, but the <abbr>HTML</abbr> output is not identical to the input. Tags and attribute names will end up in lowercase, even if they started in uppercase
-or mixed case, and attribute values will be enclosed in double quotes, even if they started in single quotes or with no quotes
-at all. It is this last side effect that you can take advantage of.
-<div class=example><h3 id="dialect.quoting.example">Example 8.16. Quoting attribute values</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>htmlSource = """</kbd>        <span>&#x2460;</span>
-<samp class=p>...    </samp>&lt;html>
-<samp class=p>...    </samp>&lt;head>
-<samp class=p>...    </samp>&lt;title>Test page&lt;/title>
-<samp class=p>...    </samp>&lt;/head>
-<samp class=p>...    </samp>&lt;body>
-<samp class=p>...    </samp>&lt;ul>
-<samp class=p>...    </samp>&lt;li>&lt;a href=index.html>Home&lt;/a>&lt;/li>
-<samp class=p>...    </samp>&lt;li>&lt;a href=toc.html>Table of contents&lt;/a>&lt;/li>
-<samp class=p>...    </samp>&lt;li>&lt;a href=history.html>Revision history&lt;/a>&lt;/li>
-<samp class=p>...    </samp>&lt;/body>
-<samp class=p>...    </samp>&lt;/html>
-<samp class=p>...    </samp>"""
-<samp class=p>>>> </samp><kbd>from BaseHTMLProcessor import BaseHTMLProcessor</kbd>
-<samp class=p>>>> </samp><kbd>parser = BaseHTMLProcessor()</kbd>
-<samp class=p>>>> </samp><kbd>parser.feed(htmlSource)</kbd> <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>print parser.output()</kbd>   <span>&#x2462;</span>
-<samp>&lt;html>
-&lt;head>
-&lt;title>Test page&lt;/title>
-&lt;/head>
-&lt;body>
-&lt;ul>
-&lt;li>&lt;a href="index.html">Home&lt;/a>&lt;/li>
-&lt;li>&lt;a href="toc.html">Table of contents&lt;/a>&lt;/li>
-&lt;li>&lt;a href="history.html">Revision history&lt;/a>&lt;/li>
-&lt;/body>
-&lt;/html></samp></pre>
-<ol>
-<li>Note that the attribute values of the <code>href</code> attributes in the <code>&lt;a></code> tags are not properly quoted. (Also note that you're using <a href="#odbchelper.triplequotes" title="Example 2.2. Defining the buildConnectionString Function's docstring">triple quotes</a> for something other than a <code>docstring</code>. And directly in the <abbr>IDE</abbr>, no less. They're very useful.)
-<li>Feed the parser.
-<li>Using the <code>output</code> function defined in <code>BaseHTMLProcessor</code>, you get the output as a single string, complete with quoted attribute values. While this may seem anti-climactic, think
-            about how much has actually happened here: <code>SGMLParser</code> parsed the entire <abbr>HTML</abbr> document, breaking it down into tags, refs, data, and so forth; <code>BaseHTMLProcessor</code> used those elements to reconstruct pieces of <abbr>HTML</abbr> (which are still stored in <var>parser.pieces</var>, if you want to see them); finally, you called <code>parser.output</code>, which joined all the pieces of <abbr>HTML</abbr> into one string.
-<h2 id="dialect.dialectizer">8.8. Introducing <code>dialect.py</code></h2>
-<p><code>Dialectizer</code> is a simple (and silly) descendant of <code>BaseHTMLProcessor</code>. It runs blocks of text through a series of substitutions, but it makes sure that anything within a <code>&lt;pre>...&lt;/pre></code> block passes through unaltered.
-<p>To handle the <code>&lt;pre></code> blocks, you define two methods in <code>Dialectizer</code>: <code>start_pre</code> and <code>end_pre</code>.
-<div class=example><h3 id="dialect.specifictags.example">Example 8.17. Handling specific tags</h3><pre><code>
-    def start_pre(self, attrs):             <span>&#x2460;</span>
-        self.verbatim += 1<span>&#x2461;</span>
-        self.unknown_starttag("pre", attrs) <span>&#x2462;</span>
 
-    def end_pre(self):    <span>&#x2463;</span>
-        self.unknown_endtag("pre")          <span>&#x2464;</span>
-        self.verbatim -= 1<span>&#x2465;</span></code></pre>
-<ol>
-<li><code>start_pre</code> is called every time <code>SGMLParser</code> finds a <code>&lt;pre></code> tag in the <abbr>HTML</abbr> source. (In a minute, you'll see exactly how this happens.)  The method takes a single parameter, <var>attrs</var>, which contains the attributes of the tag (if any). <var>attrs</var> is a list of key/value tuples, just like <a href="#dialect.unknownstarttag" title="Example 8.14. Dictionary-based string formatting in BaseHTMLProcessor.py"><code>unknown_starttag</code></a> takes.
-<li>In the <code>reset</code> method, you initialize a data attribute that serves as a counter for <code>&lt;pre></code> tags. Every time you hit a <code>&lt;pre></code> tag, you increment the counter; every time you hit a <code>&lt;/pre></code> tag, you'll decrement the counter. (You could just use this as a flag and set it to <code>1</code> and reset it to <code>0</code>, but it's just as easy to do it this way, and this handles the odd (but possible) case of nested <code>&lt;pre></code> tags.)  In a minute, you'll see how this counter is put to good use.
-<li>That's it, that's the only special processing you do for <code>&lt;pre></code> tags. Now you pass the list of attributes along to <code>unknown_starttag</code> so it can do the default processing.
-<li><code>end_pre</code> is called every time <code>SGMLParser</code> finds a <code>&lt;/pre></code> tag. Since end tags cannot contain attributes, the method takes no parameters.
-<li>First, you want to do the default processing, just like any other end tag.
-<li>Second, you decrement your counter to signal that this <code>&lt;pre></code> block has been closed.
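<p>One way the counter could be consulted is sketched below for Python 3. This is an assumption about the general shape, not <code>Dialectizer</code> itself: the <code>Shouter</code> class is hypothetical, it uppercases text as a stand-in for the real substitutions, and it drops attributes to stay short.

```python
from html.parser import HTMLParser


class Shouter(HTMLParser):
    """Hypothetical processor: uppercase all text except inside <pre>."""

    def reset(self):
        super().reset()
        self.verbatim = 0   # depth of open <pre> blocks
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.verbatim += 1
        self.pieces.append("<%s>" % tag)   # attributes omitted in this sketch

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)
        if tag == "pre":
            self.verbatim -= 1

    def handle_data(self, text):
        # Pass text through untouched while any <pre> block is open.
        self.pieces.append(text if self.verbatim else text.upper())


parser = Shouter()
parser.feed("<p>hello</p><pre>keep me</pre><p>bye</p>")
parser.close()
shouted = "".join(parser.pieces)
print(shouted)
```

<p>Because <var>verbatim</var> is a counter rather than a flag, text stays protected until every nested <code>&lt;pre></code> has been closed.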
-<p>At this point, it's worth digging a little further into <code>SGMLParser</code>. I've claimed repeatedly (and you've taken it on faith so far) that <code>SGMLParser</code> looks for and calls specific methods for each tag, if they exist. For instance, you just saw the definition of <code>start_pre</code> and <code>end_pre</code> to handle <code>&lt;pre></code> and <code>&lt;/pre></code>. But how does this happen?  Well, it's not magic, it's just good Python coding.
-<div class=example><h3 id="dialect.dialectizer.example">Example 8.18. <code>SGMLParser</code></h3><pre><code>
-    def finish_starttag(self, tag, attrs):               <span>&#x2460;</span>
-        try:        
-            method = getattr(self, 'start_' + tag)       <span>&#x2461;</span>
-        except AttributeError:         <span>&#x2462;</span>
-            try:    
-                method = getattr(self, 'do_' + tag)      <span>&#x2463;</span>
-            except AttributeError:    
-                self.unknown_starttag(tag, attrs)        <span>&#x2464;</span>
-                return -1             
-            else:   
-                self.handle_starttag(tag, method, attrs) <span>&#x2465;</span>
-                return 0              
-        else:       
-            self.stack.append(tag)    
-            self.handle_starttag(tag, method, attrs)    
-            return 1 <span>&#x2466;</span>
 
-    def handle_starttag(self, tag, method, attrs):      
-        method(attrs)<span>&#x2467;</span></code></pre>
-<ol>
-<li>At this point, <code>SGMLParser</code> has already found a start tag and parsed the attribute list. The only thing left to do is figure out whether there is a
-            specific handler method for this tag, or whether you should fall back on the default method (<code>unknown_starttag</code>).
-<li>The &#8220;magic&#8221; of <code>SGMLParser</code> is nothing more than your old friend, <a href="#apihelper.getattr" title="4.4. Getting Object References With getattr"><code>getattr</code></a>. What you may not have realized before is that <code>getattr</code> looks the name up on the instance, so it finds methods defined in a descendant class as well as those defined in the ancestor class itself. Here the object is <var>self</var>, the current instance. So if <var>tag</var> is <code>'pre'</code>, this call to <code>getattr</code> will look for a <code>start_pre</code> method on the current instance, which is an instance of the <code>Dialectizer</code> class.
-<li><code>getattr</code> raises an <code>AttributeError</code> if the method it's looking for doesn't exist on the object (that is, neither its class nor any ancestor class defines it), but that's okay, because you wrapped
-            the call to <code>getattr</code> inside a <a href="#fileinfo.exception" title="6.1. Handling Exceptions"><code>try...except</code></a> block and explicitly caught the <code>AttributeError</code>.
-<li>Since you didn't find a <code>start_xxx</code> method, you'll also look for a <code>do_xxx</code> method before giving up. This alternate naming scheme is generally used for standalone tags, like <code>&lt;br></code>, which have no corresponding end tag. But you can use either naming scheme; as you can see, <code>SGMLParser</code> tries both for every tag. (You shouldn't define both a <code>start_xxx</code> and <code>do_xxx</code> handler method for the same tag, though; only the <code>start_xxx</code> method will get called.)
-<li>Another <code>AttributeError</code>, which means that the call to <code>getattr</code> failed with <code>do_xxx</code>. Since you found neither a <code>start_xxx</code> nor a <code>do_xxx</code> method for this tag, you catch the exception and fall back on the default method, <code>unknown_starttag</code>.
-<li>Remember, <code>try...except</code> blocks can have an <code>else</code> clause, which runs only if <a href="#crossplatform.example" title="Example 6.2. Supporting Platform-Specific Functionality">no exception is raised</a> in the <code>try</code> block. Logically, that means that you <em>did</em> find a <code>do_xxx</code> method for this tag, so you're going to call it.
-<li>By the way, don't worry about these different return values; in theory they mean something, but they're never actually used.
-             Don't worry about the <code>self.stack.append(tag)</code> either; <code>SGMLParser</code> keeps track internally of whether your start tags are balanced by appropriate end tags, but it doesn't do anything with this
-            information either. In theory, you could use this module to validate that your tags were fully balanced, but it's probably
-            not worth it, and it's beyond the scope of this chapter. You have better things to worry about right now.
-<li><code>start_xxx</code> and <code>do_xxx</code> methods are not called directly; the tag, method, and attributes are passed to this function, <code>handle_starttag</code>, so that descendants can override it and change the way <em>all</em> start tags are dispatched. You don't need that level of control, so you just let this method do its thing, which is to call
-            the method (<code>start_xxx</code> or <code>do_xxx</code>) with the list of attributes. Remember, <var>method</var> is a function, returned from <code>getattr</code>, and functions are objects. (I know you're getting tired of hearing it, and I promise I'll stop saying it as soon as I run
-            out of ways to use it to my advantage.)  Here, the function object is passed into this dispatch method as an argument, and
-            this method turns around and calls the function. At this point, you don't need to know what the function is, what it's named,
-            or where it's defined; the only thing you need to know about the function is that it is called with one argument, <var>attrs</var>.
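-<p>The dispatch walked through above can be sketched in a few lines. This is a toy stand-in written for Python 3, not the real <code>sgmllib</code> source: the class names <code>TagDispatcher</code> and <code>PreAwareDispatcher</code> and the <code>calls</code> list are made up for illustration, and the sketch uses the three-argument form of <code>getattr</code> (a default value) instead of nested <code>try...except</code> blocks.

```python
class TagDispatcher:
    """Toy stand-in for SGMLParser's start-tag dispatch (hypothetical class)."""

    def __init__(self):
        self.calls = []  # records which handler ran, for demonstration only

    def finish_starttag(self, tag, attrs):
        # Prefer start_<tag>, then do_<tag>, then the catch-all handler.
        method = getattr(self, 'start_' + tag, None)
        if method is None:
            method = getattr(self, 'do_' + tag, None)
        if method is None:
            self.unknown_starttag(tag, attrs)
            return
        # method is a bound method object; call it like any function.
        method(attrs)

    def unknown_starttag(self, tag, attrs):
        self.calls.append(('unknown', tag))


class PreAwareDispatcher(TagDispatcher):
    # Handlers live on the descendant class; getattr on the instance finds them.
    def start_pre(self, attrs):
        self.calls.append(('start_pre', attrs))

    def do_br(self, attrs):
        self.calls.append(('do_br', attrs))


d = PreAwareDispatcher()
d.finish_starttag('pre', [])
d.finish_starttag('br', [])
d.finish_starttag('table', [('border', '1')])
print(d.calls)
```

-<p>The ancestor class never mentions <code>start_pre</code> or <code>do_br</code> by name; it only builds the method name from the tag and asks the instance for it.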
-<p>Now back to our regularly scheduled program: <code>Dialectizer</code>. When you left, you were in the process of defining specific handler methods for <code>&lt;pre></code> and <code>&lt;/pre></code> tags. There's only one thing left to do, and that is to process text blocks with the pre-defined substitutions. For that,
-you need to override the <code>handle_data</code> method.
-<div class=example><h3>Example 8.19. Overriding the <code>handle_data</code> method</h3><pre><code>
-    def handle_data(self, text):     <span>&#x2460;</span>
-        self.pieces.append(self.verbatim and text or self.process(text)) <span>&#x2461;</span></pre>
-<ol>
-<li><code>handle_data</code> is called with only one argument, the text to process.
-<li>In the ancestor <a href="#dialect.basehtml.intro" title="Example 8.8. Introducing BaseHTMLProcessor"><code>BaseHTMLProcessor</code></a>, the <code>handle_data</code> method simply appended the text to the output buffer, <var>self.pieces</var>. Here the logic is only slightly more complicated. If you're in the middle of a <code><code>&lt;pre></code>...<code>&lt;/pre></code></code> block, <var>self.verbatim</var> will be some value greater than <code>0</code>, and you want to put the text in the output buffer unaltered. Otherwise, you will call a separate method to process the
-            substitutions, then put the result of that into the output buffer. In Python, this is a one-liner, using <a href="#apihelper.andortrick.intro" title="Example 4.17. Introducing the and-or Trick">the <code>and-or</code> trick</a>.
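-<p>The one-liner above only behaves like an if/else when the middle value is true. Here is a small sketch of the trick next to a conditional expression (which arrived in Python 2.5, after this book was written). The helper names <code>and_or</code>, <code>conditional</code>, and <code>wrap</code> are hypothetical, and <code>wrap</code> is only a stand-in for <code>self.process</code>.

```python
def and_or(verbatim, text, process):
    # The book's idiom: yields text only when both verbatim and text are truthy.
    return verbatim and text or process(text)


def conditional(verbatim, text, process):
    # Conditional expression: picks text whenever verbatim is truthy.
    return text if verbatim else process(text)


def wrap(text):
    # Hypothetical stand-in for self.process; it marks text it has touched.
    return "[" + text + "]"


print(and_or(1, "abc", wrap))    # abc
print(and_or(0, "abc", wrap))    # [abc]
print(and_or(1, "", wrap))       # [] (empty text falls through to wrap)
print(conditional(1, "", wrap))  # prints an empty line
```

-<p>Whether the empty-string case matters for <code>Dialectizer</code> depends on what <code>process</code> does with an empty string, which this chapter does not show.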
-<p>You're close to completely understanding <code>Dialectizer</code>. The only missing link is the nature of the text substitutions themselves. If you know any Perl, you know that when complex text substitutions are required, the only real solution is regular expressions. The classes
-later in <code>dialect.py</code> define a series of regular expressions that operate on the text between the <abbr>HTML</abbr> tags. But you just had <a href="#re" title="Chapter 7. Regular Expressions">a whole chapter on regular expressions</a>. You don't really want to slog through regular expressions again, do you?  God knows I don't. I think you've learned enough
-for one chapter.
-<h2 id="dialect.alltogether">8.9. Putting it all together</h2>
-<p>It's time to put everything you've learned so far to good use. I hope you were paying attention.
-<div class=example><h3>Example 8.20. The <code>translate</code> function, part 1</h3><pre><code>
-def translate(url, dialectName="chef"): <span>&#x2460;</span>
-    import urllib     <span>&#x2461;</span>
-    sock = urllib.urlopen(url)          <span>&#x2462;</span>
-    htmlSource = sock.read()           
-    sock.close()     
-</pre>
-<ol>
-<li>The <code>translate</code> function has an <a href="#apihelper.optional" title="4.2. Using Optional and Named Arguments">optional argument</a> <var>dialectName</var>, which is a string that specifies the dialect you'll be using. You'll see how this is used in a minute.
-<li>Hey, wait a minute, there's an <a href="#odbchelper.import" title="Example 2.3. Accessing the buildConnectionString Function's docstring"><code>import</code></a> statement in this function!  That's perfectly legal in Python. You're used to seeing <code>import</code> statements at the top of a program, which means that the imported module is available anywhere in the program. But you can
-            also import modules within a function, which means that the imported module is only available within the function. If you
-            have a module that is only ever used in one function, this is an easy way to make your code more modular. (When you find
-            that your weekend hack has turned into an 800-line work of art and decide to split it up into a dozen reusable modules, you'll
-            appreciate this.)
-<li>Now you <a href="#dialect.extract.urllib" title="Example 8.5. Introducing urllib">get the source of the given URL</a>.
-<div class=example><h3>Example 8.21. The <code>translate</code> function, part 2: curiouser and curiouser</h3><pre><code>
-    parserName = "%sDialectizer" % dialectName.capitalize() <span>&#x2460;</span>
-    parserClass = globals()[parserName]   <span>&#x2461;</span>
-    parser = parserClass()                <span>&#x2462;</span>
-</pre>
-<ol>
-<li><code>capitalize</code> is a string method you haven't seen before; it simply capitalizes the first letter of a string and forces everything else
-            to lowercase. Combined with some <a href="#odbchelper.stringformatting" title="3.5. Formatting Strings">string formatting</a>, you've taken the name of a dialect and transformed it into the name of the corresponding Dialectizer class. If <var>dialectName</var> is the string <code>'chef'</code>, <var>parserName</var> will be the string <code>'ChefDialectizer'</code>.
-<li>You have the name of a class as a string (<var>parserName</var>), and you have the global namespace as a dictionary (<code>globals</code>()). Combined, you can get a reference to the class which the string names. (Remember, <a href="#fileinfo.classattributes" title="5.8. Introducing Class Attributes">classes are objects</a>, and they can be assigned to variables just like any other object.)  If <var>parserName</var> is the string <code>'ChefDialectizer'</code>, <var>parserClass</var> will be the class <code>ChefDialectizer</code>.
-<li>Finally, you have a class object (<var>parserClass</var>), and you want an instance of the class. Well, you already know how to do that: <a href="#fileinfo.create" title="5.4. Instantiating Classes">call the class like a function</a>. The fact that the class is being stored in a local variable makes absolutely no difference; you just call the local variable
-            like a function, and out pops an instance of the class. If <var>parserClass</var> is the class <code>ChefDialectizer</code>, <var>parser</var> will be an instance of the class <code>ChefDialectizer</code>.
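-<p>The three steps above (build the name, look it up in <code>globals</code>(), call the result) can be tried in isolation. This is a sketch written for Python 3 with empty stand-in classes; <code>make_parser</code> is a hypothetical helper, not a function from <code>dialect.py</code>.

```python
class ChefDialectizer:
    """Empty stand-in for the real class defined in dialect.py."""


class FooDialectizer:
    """Hypothetical dialect, added without touching make_parser."""


def make_parser(dialect_name="chef"):
    # 'chef' -> 'ChefDialectizer'
    parser_name = "%sDialectizer" % dialect_name.capitalize()
    # The module's global namespace is a dictionary keyed by name.
    parser_class = globals()[parser_name]
    # Classes are objects; calling one creates an instance.
    return parser_class()


print(type(make_parser("chef")).__name__)  # ChefDialectizer
print(type(make_parser("FOO")).__name__)   # FooDialectizer
```

-<p>An unknown dialect name raises a <code>KeyError</code> from the dictionary lookup, so a caller that takes the name from user input would want to catch that.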
-<p>Why bother?  After all, there are only 3 <code>Dialectizer</code> classes; why not just use a <code>case</code> statement?  (Well, there's no <code>case</code> statement in Python, but why not just use a series of <code>if</code> statements?)  One reason: extensibility. The <code>translate</code> function has absolutely no idea how many Dialectizer classes you've defined. Imagine if you defined a new <code>FooDialectizer</code> tomorrow; <code>translate</code> would work by passing <code>'foo'</code> as the <var>dialectName</var>.
-<p>Even better, imagine putting <code>FooDialectizer</code> in a separate module, and importing it with <code>from <var>module</var> import</code>. You've already seen that this <a href="#dialect.globals.example" title="Example 8.11. Introducing globals">includes it in <code>globals</code>()</a>, so <code>translate</code> would still work without modification, even though <code>FooDialectizer</code> was in a separate file.
-<p>Now imagine that the name of the dialect is coming from somewhere outside the program, maybe from a database or from a user-inputted
-value on a form. You can use any number of server-side Python scripting architectures to dynamically generate web pages; this function could take a <abbr>URL</abbr> and a dialect name (both strings) in the query string of a web page request, and output the &#8220;translated&#8221; web page.
-<p>Finally, imagine a <code>Dialectizer</code> framework with a plug-in architecture. You could put each <code>Dialectizer</code> class in a separate file, leaving only the <code>translate</code> function in <code>dialect.py</code>. Assuming a consistent naming scheme, the <code>translate</code> function could dynamically import the appropriate class from the appropriate file, given nothing but the dialect name. (You haven't
-seen dynamic importing yet, but I promise to cover it in a later chapter.)  To add a new dialect, you would simply add an
-appropriately-named file in the plug-ins directory (like <code>foodialect.py</code> which contains the <code>FooDialectizer</code> class). Calling the <code>translate</code> function with the dialect name <code>'foo'</code> would find the module <code>foodialect.py</code>, import the class <code>FooDialectizer</code>, and away you go.
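-<p>As a rough preview of that plug-in idea, here is a sketch using Python 3's <code>importlib.import_module</code>. Everything about the layout is an assumption: the helper <code>load_dialectizer</code> is hypothetical, and so that the sketch runs on its own, the <code>foodialect</code> module is faked in memory rather than read from a real <code>foodialect.py</code> file.

```python
import importlib
import sys
import types


class FooDialectizer:
    """Hypothetical plug-in class."""


# Stand-in for a real foodialect.py on the import path: register a module
# object by hand so import_module can find it without touching the disk.
_plugin = types.ModuleType("foodialect")
_plugin.FooDialectizer = FooDialectizer
sys.modules["foodialect"] = _plugin


def load_dialectizer(dialect_name):
    # 'foo' -> module 'foodialect', class 'FooDialectizer' (assumed naming scheme)
    module = importlib.import_module("%sdialect" % dialect_name.lower())
    return getattr(module, "%sDialectizer" % dialect_name.capitalize())


parser_class = load_dialectizer("foo")
print(parser_class.__name__)  # FooDialectizer
```

-<p>With real plug-in files in place, only the naming scheme ties a dialect name to its module and class; <code>translate</code> itself would not need to change when a new file is added.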
-<div class=example><h3>Example 8.22. The <code>translate</code> function, part 3</h3><pre><code>
-    parser.feed(htmlSource) <span>&#x2460;</span>
-    parser.close()          <span>&#x2461;</span>
-    return parser.output()  <span>&#x2462;</span>
-</pre>
-<ol>
-<li>After all that imagining, this is going to seem pretty boring, but the <code>feed</code> function is what <a href="#dialect.feed.example" title="Example 8.7. Using urllister.py">does the entire transformation</a>. You had the entire <abbr>HTML</abbr> source in a single string, so you only had to call <code>feed</code> once. However, you can call <code>feed</code> as often as you want, and the parser will just keep parsing. So if you were worried about memory usage (or you knew you
-            were going to be dealing with very large <abbr>HTML</abbr> pages), you could set this up in a loop, where you read a few bytes of <abbr>HTML</abbr> and fed it to the parser. The result would be the same.
-<li>Because <code>feed</code> maintains an internal buffer, you should always call the parser's <code>close</code> method when you're done (even if you fed it all at once, like you did). Otherwise you may find that your output is missing
-            the last few bytes.
-<li>Remember, <code>output</code> is the function you defined on <code>BaseHTMLProcessor</code> that <a href="#dialect.output.example" title="Example 8.9. BaseHTMLProcessor output">joins all the pieces of output you've buffered</a> and returns them in a single string.
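-<p>The claim that feeding in pieces gives the same result as feeding all at once can be checked with a short sketch. <code>sgmllib</code> is not available in Python 3, so this uses <code>html.parser.HTMLParser</code> as a stand-in with the same <code>feed</code>/<code>close</code> pattern; <code>TextCollector</code>, <code>parse_in_chunks</code>, and <code>parse_at_once</code> are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text between tags, in the spirit of BaseHTMLProcessor."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

    def output(self):
        return "".join(self.pieces)


def parse_in_chunks(html_source, chunk_size=4):
    parser = TextCollector()
    # Feed a few characters at a time; tags may be split across chunks.
    for start in range(0, len(html_source), chunk_size):
        parser.feed(html_source[start:start + chunk_size])
    parser.close()  # flush whatever is still sitting in the internal buffer
    return parser.output()


def parse_at_once(html_source):
    parser = TextCollector()
    parser.feed(html_source)
    parser.close()
    return parser.output()


source = "<p>Hello, <b>world</b>!</p>"
print(parse_in_chunks(source) == parse_at_once(source))
```

-<p>The parser holds on to an incomplete tag until the rest of it arrives, which is why the chunk boundaries do not show up in the output.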
-<p>And just like that, you've &#8220;translated&#8221; a web page, given nothing but a <abbr>URL</abbr> and the name of a dialect.
-<div class=itemizedlist>
-<h3>Further reading</h3>
-<ul>
-<li>You thought I was kidding about the server-side scripting idea. So did I, until I found <a href="http://rinkworks.com/dialect/">this web-based dialectizer</a>. Unfortunately, source code does not appear to be available.
 
-</ul>
-<h2 id="dialect.summary">8.10. Summary</h2>
-<p>Python provides you with a powerful tool, <code>sgmllib.py</code>, to manipulate <abbr>HTML</abbr> by turning its structure into an object model. You can use this tool in many different ways.
-<div class=itemizedlist>
-<ul>
-<li>parsing the <abbr>HTML</abbr> looking for something specific
-
-<li>aggregating the results, like the <a href="#dialect.extract.links" title="Example 8.6. Introducing urllister.py"><abbr>URL</abbr> lister</a>
-<li>altering the structure along the way, like the <a href="#dialect.quoting.example" title="Example 8.16. Quoting attribute values">attribute quoter</a>
-<li>transforming the <abbr>HTML</abbr> into something else by manipulating the text while leaving the tags alone, like the <a href="#dialect.dialectizer" title="8.8. Introducing dialect.py"><code>Dialectizer</code></a>
-</ul>
-<p>Along with these examples, you should be comfortable doing all of the following things:
-<div class=itemizedlist>
-<ul>
-<li>Using <a href="#dialect.locals" title="8.5. locals and globals"><code>locals</code>() and <code>globals</code>()</a> to access namespaces
-
-<li><a href="#dialect.dictsub" title="8.6. Dictionary-based string formatting">Formatting strings</a> using dictionary-based substitutions
-
-</ul>
-<div class=footnotes><br><hr width="100" align="left">
-<div class=footnote>
-<p><sup>[<a name="ftn.d0e20503" href="#d0e20503">1</a>] </sup>The technical term for a parser like <code>SGMLParser</code> is a <em>consumer</em>: it consumes <abbr>HTML</abbr> and breaks it down. Presumably, the name <code>feed</code> was chosen to fit into the whole &#8220;consumer&#8221; motif. Personally, it makes me think of an exhibit in the zoo where there's just a dark cage with no trees or plants or
-   evidence of life of any kind, but if you stand perfectly still and look really closely you can make out two beady eyes staring
-   back at you from the far left corner, but you convince yourself that that's just your mind playing tricks on you, and the
-   only way you can tell that the whole thing isn't just an empty cage is a small innocuous sign on the railing that reads, &#8220;Do not feed the parser.&#8221;  But maybe that's just me. In any event, it's an interesting mental image.
-<div class=footnote>
-<p><sup>[<a name="ftn.d0e20702" href="#d0e20702">2</a>] </sup>The reason Python is better at lists than strings is that lists are mutable but strings are immutable. This means that appending to a list
-   just adds the element and updates the index. Since strings can not be changed after they are created, code like <code>s = s + newpiece</code> will create an entirely new string out of the concatenation of the original and the new piece, then throw away the original
-   string. This involves a lot of expensive memory management, and the amount of effort involved increases as the string gets
-   longer, so doing <code>s = s + newpiece</code> in a loop is deadly. In technical terms, appending <var>n</var> items to a list is <code>O(n)</code>, while appending <var>n</var> items to a string is <code>O(n<sup>2</sup>)</code>.
-<div class=footnote>
-<p><sup>[<a name="ftn.d0e21226" href="#d0e21226">3</a>] </sup>I don't get out much.
-<div class=footnote>
-<p><sup>[<a name="ftn.d0e21764" href="#d0e21764">4</a>] </sup>All right, it's not that common a question. It's not up there with &#8220;What editor should I use to write Python code?&#8221; (answer: Emacs) or &#8220;Is Python better or worse than Perl?&#8221; (answer: &#8220;Perl is worse than Python because people wanted it worse.&#8221; -Larry Wall, 10/14/1998)  But questions about <abbr>HTML</abbr> processing pop up in one form or another about once a month, and among those questions, this is a popular one.
-<div class=chapter>
-<h2 id="kgp">Chapter 9. <abbr>XML</abbr> Processing</h2>
-<h2 id="kgp.divein">9.1. Diving in</h2>
-<p>These next two chapters are about <abbr>XML</abbr> processing in Python. It would be helpful if you already knew what an <abbr>XML</abbr> document looks like, that it's made up of structured tags to form a hierarchy of elements, and so on. If this doesn't make
-sense to you, there are <a href="http://directory.google.com/Top/Computers/Data_Formats/Markup_Languages/XML/Resources/FAQs,_Help,_and_Tutorials/">many <abbr>XML</abbr> tutorials</a> that can explain the basics.
-<p>If you're not particularly interested in XML, you should still read these chapters, which cover important topics like Python packages, Unicode, command line arguments, and how to use <code>getattr</code> for method dispatching.
-<p>Being a philosophy major is not required, although if you have ever had the misfortune of being subjected to the writings
-of Immanuel Kant, you will appreciate the example program a lot more than if you majored in something useful, like computer
-science.
-<p>There are two basic ways to work with <abbr>XML</abbr>. One is called <abbr>SAX</abbr> (&#8220;Simple <abbr>API</abbr> for <abbr>XML</abbr>&#8221;), and it works by reading the <abbr>XML</abbr> a little bit at a time and calling a method for each element it finds. (If you read <a href="#dialect" title="Chapter 8. HTML Processing">Chapter 8, <i>HTML Processing</i></a>, this should sound familiar, because that's how the <code>sgmllib</code> module works.)  The other is called <abbr>DOM</abbr> (&#8220;Document Object Model&#8221;), and it works by reading in the entire <abbr>XML</abbr> document at once and creating an internal representation of it using native Python classes linked in a tree structure. Python has standard modules for both kinds of parsing, but this chapter will only deal with using the <abbr>DOM</abbr>.
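-<p>To get a feel for the <abbr>DOM</abbr> approach before the full program, here is a minimal sketch with <code>xml.dom.minidom</code>, written for Python 3. The tiny grammar string is made up for illustration (it only imitates the <code>&lt;ref></code> and <code>&lt;p></code> elements used later), and the variable names are arbitrary.

```python
from xml.dom import minidom

# A made-up, two-choice grammar held in a string instead of a file.
xml_source = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

# parseString reads the whole document and returns a tree of node objects.
doc = minidom.parseString(xml_source)

# Index every <ref> element by its id attribute.
refs = {}
for ref in doc.getElementsByTagName("ref"):
    refs[ref.attributes["id"].value] = ref

# Each <p> element has a single Text child holding its character data.
bits = [p.firstChild.data for p in refs["bit"].getElementsByTagName("p")]
print(bits)  # ['0', '1']
```

-<p>The whole document is in memory as a tree once <code>parseString</code> returns, so elements can be visited in any order and as many times as needed.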
-<p>The following is a complete Python program which generates pseudo-random output based on a context-free grammar defined in an <abbr>XML</abbr> format. Don't worry yet if you don't understand what that means; you'll examine both the program's input and its output
-in more depth throughout these next two chapters.
-<div class=example><h3>Example 9.1. <code>kgp.py</code></h3>
-<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
-<pre><code>
-"""Kant Generator for Python
-
-Generates mock philosophy based on a context-free grammar
-
-Usage: python kgp.py [options] [source]
-
-Options:
-  -g ..., --grammar=...   use specified grammar file or URL
-  -h, --help              show this help
-  -d                      show debugging information while parsing
-
-Examples:
-  kgp.py                  generates several paragraphs of Kantian philosophy
-  kgp.py -g husserl.xml   generates several paragraphs of Husserl
-  kgp.py "&lt;xref id='paragraph'/>"  generates a paragraph of Kant
-  kgp.py template.xml     reads from template.xml to decide what to generate
-"""
-from xml.dom import minidom
-import random
-import toolbox
-import sys
-import getopt
-
-_debug = 0
-
-class NoSourceError(Exception): pass
-
-class KantGenerator:
-    """generates mock philosophy based on a context-free grammar"""
-
-    def __init__(self, grammar, source=None):
-        self.loadGrammar(grammar)
-        self.loadSource(source and source or self.getDefaultSource())
-        self.refresh()
-
-    def _load(self, source):
-        """load XML input source, return parsed XML document
-
-        - a URL of a remote XML file ("http://diveintopython3.org/kant.xml")
-        - a filename of a local XML file ("~/diveintopython3/common/py/kant.xml")
-        - standard input ("-")
-        - the actual XML document, as a string
-        """
-        sock = toolbox.openAnything(source)
-        xmldoc = minidom.parse(sock).documentElement
-        sock.close()
-        return xmldoc
-
-    def loadGrammar(self, grammar):       
-        """load context-free grammar"""   
-        self.grammar = self._load(grammar)
-        self.refs = {}  
-        for ref in self.grammar.getElementsByTagName("ref"):
-            self.refs[ref.attributes["id"].value] = ref     
-
-    def loadSource(self, source):
-        """load source"""
-        self.source = self._load(source)
-
-    def getDefaultSource(self):
-        """guess default source of the current grammar
-        
-        The default source will be one of the &lt;ref>s that is not
-        cross-referenced. This sounds complicated but it's not.
-        Example: The default source for kant.xml is
-        "&lt;xref id='section'/>", because 'section' is the one &lt;ref>
-        that is not &lt;xref>'d anywhere in the grammar.
-        In most grammars, the default source will produce the
-        longest (and most interesting) output.
-        """
-        xrefs = {}
-        for xref in self.grammar.getElementsByTagName("xref"):
-            xrefs[xref.attributes["id"].value] = 1
-        xrefs = xrefs.keys()
-        standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
-        if not standaloneXrefs:
-            raise NoSourceError, "can't guess source, and no source specified"
-        return '&lt;xref id="%s"/>' % random.choice(standaloneXrefs)
-        
-    def reset(self):
-        """reset parser"""
-        self.pieces = []
-        self.capitalizeNextWord = 0
-
-    def refresh(self):
-        """reset output buffer, re-parse entire source file, and return output
-        
-        Since parsing involves a good deal of randomness, this is an
-        easy way to get new output without having to reload a grammar file
-        each time.
-        """
-        self.reset()
-        self.parse(self.source)
-        return self.output()
-
-    def output(self):
-        """output generated text"""
-        return "".join(self.pieces)
-
-    def randomChildElement(self, node):
-        """choose a random child element of a node
-        
-        This is a utility method used by do_xref and do_choice.
-        """
-        choices = [e for e in node.childNodes
-                   if e.nodeType == e.ELEMENT_NODE]
-        chosen = random.choice(choices)            
-        if _debug:               
-            sys.stderr.write('%s available choices: %s\n' % \
-                (len(choices), [e.toxml() for e in choices]))
-            sys.stderr.write('Chosen: %s\n' % chosen.toxml())
-        return chosen            
-
-    def parse(self, node):         
-        """parse a single XML node
-        
-        A parsed XML document (from minidom.parse) is a tree of nodes
-        of various types. Each node is represented by an instance of the
-        corresponding Python class (Element for a tag, Text for
-        text data, Document for the top-level document). The following
-        statement constructs the name of a class method based on the type
-        of node we're parsing ("parse_Element" for an Element node,
-        "parse_Text" for a Text node, etc.) and then calls the method.
-        """
-        parseMethod = getattr(self, "parse_%s" % node.__class__.__name__)
-        parseMethod(node)
-
-    def parse_Document(self, node):
-        """parse the document node
-        
-        The document node by itself isn't interesting (to us), but
-        its only child, node.documentElement, is: it's the root node
-        of the grammar.
-        """
-        self.parse(node.documentElement)
-
-    def parse_Text(self, node):    
-        """parse a text node
-        
-        The text of a text node is usually added to the output buffer
-        verbatim. The one exception is that &lt;p class='sentence'> sets
-        a flag to capitalize the first letter of the next word. If
-        that flag is set, we capitalize the text and reset the flag.
-        """
-        text = node.data
-        if self.capitalizeNextWord:
-            self.pieces.append(text[0].upper())
-            self.pieces.append(text[1:])
-            self.capitalizeNextWord = 0
-        else:
-            self.pieces.append(text)
-
-    def parse_Element(self, node): 
-        """parse an element
-        
-        An XML element corresponds to an actual tag in the source:
-        &lt;xref id='...'>, &lt;p chance='...'>, &lt;choice>, etc.
-        Each element type is handled in its own method. Like we did in
-        parse(), we construct a method name based on the name of the
-        element ("do_xref" for an &lt;xref> tag, etc.) and
-        call the method.
-        """
-        handlerMethod = getattr(self, "do_%s" % node.tagName)
-        handlerMethod(node)
-
-    def parse_Comment(self, node):
-        """parse a comment
-        
-        The grammar can contain XML comments, but we ignore them
-        """
-        pass
-    
-    def do_xref(self, node):
-        """handle &lt;xref id='...'> tag
-        
-        An &lt;xref id='...'> tag is a cross-reference to a &lt;ref id='...'>
-        tag. &lt;xref id='sentence'/> evaluates to a randomly chosen child of
-        &lt;ref id='sentence'>.
-        """
-        id = node.attributes["id"].value
-        self.parse(self.randomChildElement(self.refs[id]))
-
-    def do_p(self, node):
-        """handle &lt;p> tag
-        
-        The &lt;p> tag is the core of the grammar. It can contain almost
-        anything: freeform text, &lt;choice> tags, &lt;xref> tags, even other
-        &lt;p> tags. If a "class='sentence'" attribute is found, a flag
-        is set and the next word will be capitalized. If a "chance='X'"
-        attribute is found, there is an X% chance that the tag will be
-        evaluated (and therefore a (100-X)% chance that it will be
-        completely ignored)
-        """
-        keys = node.attributes.keys()
-        if "class" in keys:
-            if node.attributes["class"].value == "sentence":
-                self.capitalizeNextWord = 1
-        if "chance" in keys:
-            chance = int(node.attributes["chance"].value)
-            doit = (chance > random.randrange(100))
-        else:
-            doit = 1
-        if doit:
-            for child in node.childNodes: self.parse(child)
-
-    def do_choice(self, node):
-        """handle &lt;choice> tag
-        
-        A &lt;choice> tag contains one or more &lt;p> tags. One &lt;p> tag
-        is chosen at random and evaluated; the rest are ignored.
-        """
-        self.parse(self.randomChildElement(node))
-
-def usage():
-    print __doc__
-
-def main(argv):       
-    grammar = "kant.xml"                
-    try:              
-        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
-    except getopt.GetoptError:          
-        usage()       
-        sys.exit(2)   
-    for opt, arg in opts:               
-        if opt in ("-h", "--help"):     
-            usage()   
-            sys.exit()
-        elif opt == '-d':               
-            global _debug               
-            _debug = 1
-        elif opt in ("-g", "--grammar"):
-            grammar = arg               
-    
-    source = "".join(args)              
-
-    k = KantGenerator(grammar, source)
-    print k.output()
-
-if __name__ == "__main__":
-    main(sys.argv[1:])
-</pre><div class=example><h3>Example 9.2. <code>toolbox.py</code></h3><pre><code>
-"""Miscellaneous utility functions"""
-
-def openAnything(source):            
-    """URI, filename, or string --> stream
-
-    This function lets you define parsers that take any input source
-    (URL, pathname to local or network file, or actual data as a string)
-    and deal with it in a uniform manner. Returned object is guaranteed
-    to have all the basic stdio read methods (read, readline, readlines).
-    Just .close() the object when you're done with it.
-    
-    Examples:
-    >>> from xml.dom import minidom
-    >>> sock = openAnything("http://localhost/kant.xml")
-    >>> doc = minidom.parse(sock)
-    >>> sock.close()
-    >>> sock = openAnything("c:\\inetpub\\wwwroot\\kant.xml")
-    >>> doc = minidom.parse(sock)
-    >>> sock.close()
-    >>> sock = openAnything("&lt;ref id='conjunction'>&lt;text>and&lt;/text>&lt;text>or&lt;/text>&lt;/ref>")
-    >>> doc = minidom.parse(sock)
-    >>> sock.close()
-    """
-    if hasattr(source, "read"):
-        return source
-
-    if source == '-':
-        import sys
-        return sys.stdin
-
-    # try to open with urllib (if source is http, ftp, or file URL)
-    import urllib       
-    try:                
-        return urllib.urlopen(source)     
-    except (IOError, OSError):            
-        pass            
-    
-    # try to open with native open function (if source is pathname)
-    try:                
-        return open(source)               
-    except (IOError, OSError):            
-        pass            
-    
-    # treat source as string
-    import StringIO     
-    return StringIO.StringIO(str(source)) 
-</pre><p>Run the program <code>kgp.py</code> by itself, and it will parse the default <abbr>XML</abbr>-based grammar, in <code>kant.xml</code>, and print several paragraphs worth of philosophy in the style of Immanuel Kant.
-<div class=example><h3>Example 9.3. Sample output of <code>kgp.py</code></h3><pre class=screen><samp class=p>[you@localhost kgp]$ python kgp.py</samp>
-<samp>     As is shown in the writings of Hume, our a priori concepts, in
-reference to ends, abstract from all content of knowledge; in the study
-of space, the discipline of human reason, in accordance with the
-principles of philosophy, is the clue to the discovery of the
-Transcendental Deduction. The transcendental aesthetic, in all
-theoretical sciences, occupies part of the sphere of human reason
-concerning the existence of our ideas in general; still, the
-never-ending regress in the series of empirical conditions constitutes
-the whole content for the transcendental unity of apperception. What
-we have alone been able to show is that, even as this relates to the
-architectonic of human reason, the Ideal may not contradict itself, but
-it is still possible that it may be in contradictions with the
-employment of the pure employment of our hypothetical judgements, but
-natural causes (and I assert that this is the case) prove the validity
-of the discipline of pure reason. As we have already seen, time (and
-it is obvious that this is true) proves the validity of time, and the
-architectonic of human reason, in the full sense of these terms,
-abstracts from all content of knowledge. I assert, in the case of the
-discipline of practical reason, that the Antinomies are just as
-necessary as natural causes, since knowledge of the phenomena is a
-posteriori.
-    The discipline of human reason, as I have elsewhere shown, is by
-its very nature contradictory, but our ideas exclude the possibility of
-the Antinomies. We can deduce that, on the contrary, the pure
-employment of philosophy, on the contrary, is by its very nature
-contradictory, but our sense perceptions are a representation of, in
-the case of space, metaphysics. The thing in itself is a
-representation of philosophy. Applied logic is the clue to the
-discovery of natural causes. However, what we have alone been able to
-show is that our ideas, in other words, should only be used as a canon
-for the Ideal, because of our necessary ignorance of the conditions.
-
-[...snip...]</samp></pre><p>This is, of course, complete gibberish. Well, not complete gibberish. It is syntactically and grammatically correct (although
-very verbose -- Kant wasn't what you would call a get-to-the-point kind of guy). Some of it may actually be true (or at least
-the sort of thing that Kant would have agreed with), some of it is blatantly false, and most of it is simply incoherent. 
-But all of it is in the style of Immanuel Kant.
-<p>Let me repeat that this is much, much funnier if you are now or have ever been a philosophy major.
-<p>The interesting thing about this program is that there is nothing Kant-specific about it. All the content in the previous
-example was derived from the grammar file, <code>kant.xml</code>. If you tell the program to use a different grammar file (which you can specify on the command line), the output will be
-completely different.
-<div class=example><h3>Example 9.4. Simpler output from <code>kgp.py</code></h3><pre class=screen><samp class=p>[you@localhost kgp]$ python kgp.py -g binary.xml</samp>
-00101001
-<samp class=p>[you@localhost kgp]$ python kgp.py -g binary.xml</samp>
-10110100</pre><p>You will take a closer look at the structure of the grammar file later in this chapter. For now, all you need to know is
-that the grammar file defines the structure of the output, and the <code>kgp.py</code> program reads through the grammar and makes random decisions about which words to plug in where.
 <h2 id="kgp.packages">9.2. Packages</h2>
 <p>Actually parsing an <abbr>XML</abbr> document is very simple: one line of code. However, before you get to that line of code, you need to take a short detour
    to talk about packages.
@@ -3707,111 +2085,6 @@ areas simultaneously).
 package architecture. It's one of the many things Python is good at, so take advantage of it.
 <h2 id="kgp.parse">9.3. Parsing <abbr>XML</abbr></h2>
 <p>As I was saying, actually parsing an <abbr>XML</abbr> document is very simple: one line of code. Where you go from there is up to you.
-<div class=example><h3>Example 9.8. Loading an <abbr>XML</abbr> document (for real this time)</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>from xml.dom import minidom</kbd>      <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse('~/diveintopython3/common/py/kgp/binary.xml')</kbd>  <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>xmldoc</kbd>         <span>&#x2462;</span>
-&lt;xml.dom.minidom.Document instance at 010BE87C>
-<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>             <span>&#x2463;</span>
-<samp>&lt;?xml version="1.0" ?>
-&lt;grammar>
-&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref>
-&lt;ref id="byte">
-  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
-&lt;/ref>
-&lt;/grammar></samp></pre>
-<ol>
-<li>As you saw in the <a href="#kgp.packages" title="9.2. Packages">previous section</a>, this imports the <code>minidom</code> module from the <code>xml.dom</code> package.
-<li>Here is the one line of code that does all the work: <code>minidom.parse</code> takes one argument and returns a parsed representation of the <abbr>XML</abbr> document. The argument can be many things; in this case, it's simply a filename of an <abbr>XML</abbr> document on my local disk. (To follow along, you'll need to change the path to point to your downloaded examples directory.)
-             But you can also pass a <a href="#fileinfo.files" title="6.2. Working with File Objects">file object</a>, or even a <a href="#dialect.extract.urllib" title="Example 8.5. Introducing urllib">file-like object</a>. You'll take advantage of this flexibility later in this chapter.
-<li>The object returned from <code>minidom.parse</code> is a <code>Document</code> object, a descendant of the <code>Node</code> class. This <code>Document</code> object is the root level of a complex tree-like structure of interlocking Python objects that completely represent the <abbr>XML</abbr> document you passed to <code>minidom.parse</code>.
-<li><code>toxml</code> is a method of the <code>Node</code> class (and is therefore available on the <code>Document</code> object you got from <code>minidom.parse</code>). <code>toxml</code> returns, as a string, the <abbr>XML</abbr> that this <code>Node</code> represents; the example prints that string. For the <code>Document</code> node, the string is the entire <abbr>XML</abbr> document.
-<p>Now that you have an <abbr>XML</abbr> document in memory, you can start traversing through it.
-<div class=example><h3 id="kgp.parse.gettingchildnodes.example">Example 9.9. Getting child nodes</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>xmldoc.childNodes</kbd>    <span>&#x2460;</span>
-[&lt;DOM Element: grammar at 17538908>]
-<samp class=p>>>> </samp><kbd>xmldoc.childNodes[0]</kbd> <span>&#x2461;</span>
-&lt;DOM Element: grammar at 17538908>
-<samp class=p>>>> </samp><kbd>xmldoc.firstChild</kbd>    <span>&#x2462;</span>
-&lt;DOM Element: grammar at 17538908></pre>
-<ol>
-<li>Every <code>Node</code> has a <code>childNodes</code> attribute, which is a list of that node's child <code>Node</code> objects. A <code>Document</code> has exactly one root element, and in this case that element (the <code>grammar</code> element) is its only child node.
-<li>To get the first (and in this case, the only) child node, just use regular list syntax. Remember, there is nothing special
-            going on here; this is just a regular Python list of regular Python objects.
-<li>Since getting the first child node of a node is a useful and common activity, the <code>Node</code> class has a <code>firstChild</code> attribute, which is synonymous with <code>childNodes[0]</code>. (There is also a <code>lastChild</code> attribute, which is synonymous with <code>childNodes[-1]</code>.)
-<div class=example><h3>Example 9.10. <code>toxml</code> works on any node</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>grammarNode = xmldoc.firstChild</kbd>
-<samp class=p>>>> </samp><kbd>print grammarNode.toxml()</kbd> <span>&#x2460;</span>
-<samp>&lt;grammar>
-&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref>
-&lt;ref id="byte">
-  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
-&lt;/ref>
-&lt;/grammar></samp></pre>
-<ol>
-<li>Since the <code>toxml</code> method is defined in the <code>Node</code> class, it is available on any <abbr>XML</abbr> node, not just the <code>Document</code> element.
-<div class=example><h3 id="kgp.parse.childnodescanbetext.example">Example 9.11. Child nodes can be text</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>grammarNode.childNodes</kbd><span>&#x2460;</span>
-<samp>[&lt;DOM Text node "\n">, &lt;DOM Element: ref at 17533332>, \
-&lt;DOM Text node "\n">, &lt;DOM Element: ref at 17549660>, &lt;DOM Text node "\n">]</samp>
-<samp class=p>>>> </samp><kbd>print grammarNode.firstChild.toxml()</kbd>    <span>&#x2461;</span>
-<samp>
-
-</samp>
-<samp class=p>>>> </samp><kbd>print grammarNode.childNodes[1].toxml()</kbd> <span>&#x2462;</span>
-<samp>&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref></samp>
-<samp class=p>>>> </samp><kbd>print grammarNode.childNodes[3].toxml()</kbd> <span>&#x2463;</span>
-<samp>&lt;ref id="byte">
-  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
-&lt;/ref></samp>
-<samp class=p>>>> </samp><kbd>print grammarNode.lastChild.toxml()</kbd>     <span>&#x2464;</span>
-<samp>
-
-</samp></pre>
-<ol>
-<li>Looking at the <abbr>XML</abbr> in <code>binary.xml</code>, you might think that the <code>grammar</code> has only two child nodes, the two <code>ref</code> elements. But you're missing something: the carriage returns!  After the <code>'&lt;grammar>'</code> and before the first <code>'&lt;ref>'</code> is a carriage return, and this text counts as a child node of the <code>grammar</code> element. Similarly, there is a carriage return after each <code>'&lt;/ref>'</code>; these also count as child nodes. So <code>grammar.childNodes</code> is actually a list of 5 objects: 3 <code>Text</code> objects and 2 <code>Element</code> objects.
-<li>The first child is a <code>Text</code> object representing the carriage return after the <code>'&lt;grammar>'</code> tag and before the first <code>'&lt;ref>'</code> tag.
-<li>The second child is an <code>Element</code> object representing the first <code>ref</code> element.
-<li>The fourth child is an <code>Element</code> object representing the second <code>ref</code> element.
-<li>The last child is a <code>Text</code> object representing the carriage return after the <code>'&lt;/ref>'</code> end tag and before the <code>'&lt;/grammar>'</code> end tag.
-<div class=example><h3>Example 9.12. Drilling down all the way to text</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>grammarNode</kbd>
-&lt;DOM Element: grammar at 19167148>
-<samp class=p>>>> </samp><kbd>refNode = grammarNode.childNodes[1]</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>refNode</kbd>
-&lt;DOM Element: ref at 17987740>
-<samp class=p>>>> </samp><kbd>refNode.childNodes</kbd><span>&#x2461;</span>
-<samp>[&lt;DOM Text node "\n">, &lt;DOM Text node "  ">, &lt;DOM Element: p at 19315844>, \
-&lt;DOM Text node "\n">, &lt;DOM Text node "  ">, \
-&lt;DOM Element: p at 19462036>, &lt;DOM Text node "\n">]</samp>
-<samp class=p>>>> </samp><kbd>pNode = refNode.childNodes[2]</kbd>
-<samp class=p>>>> </samp><kbd>pNode</kbd>
-&lt;DOM Element: p at 19315844>
-<samp class=p>>>> </samp><kbd>print pNode.toxml()</kbd>                 <span>&#x2462;</span>
-&lt;p>0&lt;/p>
-<samp class=p>>>> </samp><kbd>pNode.firstChild</kbd>  <span>&#x2463;</span>
-&lt;DOM Text node "0">
-<samp class=p>>>> </samp><kbd>pNode.firstChild.data</kbd>               <span>&#x2464;</span>
-u'0'</pre>
-<ol>
-<li>As you saw in the previous example, the first <code>ref</code> element is <code>grammarNode.childNodes[1]</code>, since <code>childNodes[0]</code> is a <code>Text</code> node for the carriage return.
-<li>The <code>ref</code> element has its own set of child nodes, one for the carriage return, a separate one for the spaces, one for the <code>p</code> element, and so forth.
-<li>You can even use the <code>toxml</code> method here, deeply nested within the document.
-<li>The <code>p</code> element has only one child node (you can't tell that from this example, but look at <code>pNode.childNodes</code> if you don't believe me), and it is a <code>Text</code> node for the single character <code>'0'</code>.
-<li>The <code>.data</code> attribute of a <code>Text</code> node gives you the actual string that the text node represents. But what is that <code>'u'</code> in front of the string?  The answer to that deserves its own section.
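<p>The traversal above can be condensed into a short script. This is a minimal sketch, not the book's own code: it is written in Python 3 syntax (where <code>print</code> is a function), it parses an inline, trimmed copy of the grammar with <code>minidom.parseString</code> instead of reading <code>binary.xml</code> from disk, and the helper name <code>element_children</code> is made up for illustration.

```python
from xml.dom import minidom

# Assumption: a trimmed, inline copy of the binary.xml grammar from this section.
BINARY_XML = """<grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
</grammar>"""

xmldoc = minidom.parseString(BINARY_XML)
grammar_node = xmldoc.firstChild  # the root <grammar> element


def element_children(node):
    """Hypothetical helper: keep only Element children, skipping Text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


# Skip the whitespace Text nodes and go straight to the elements.
ref_node = element_children(grammar_node)[0]
p_nodes = element_children(ref_node)

# Each <p> has a single Text child; .data holds its string.
bits = [p.firstChild.data for p in p_nodes]
print(bits)
```

<p>Filtering on <code>nodeType</code> is one way to avoid hard-coding indexes such as <code>childNodes[1]</code>, which shift whenever the whitespace in the file changes.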
-
 
 
 
@@ -3823,411 +2096,8 @@ u'0'</pre>
 
 
 
-
-
-
-<p>Remember I said Python usually converted unicode to <abbr>ASCII</abbr> whenever it needed to make a regular string out of a unicode string?  Well, this default encoding scheme is an option which
-you can customize.
-<div class=example><h3>Example 9.15. <code>sitecustomize.py</code></h3><pre><code>
-# sitecustomize.py <span>&#x2460;</span>
-# this file can be anywhere in your Python path,
-# but it usually goes in ${pythondir}/lib/site-packages/
-import sys
-sys.setdefaultencoding('iso-8859-1') <span>&#x2461;</span>
-</code></pre>
-<ol>
-<li><code>sitecustomize.py</code> is a special script; Python will try to import it on startup, so any code in it will be run automatically. As the comment mentions, it can go anywhere
-            (as long as <code>import</code> can find it), but it usually goes in the <code>site-packages</code> directory within your Python <code>lib</code> directory.
-<li>The <code>setdefaultencoding</code> function sets, well, the default encoding. This is the encoding scheme that Python will try to use whenever it needs to auto-coerce a unicode string into a regular string.
-<div class=example><h3>Example 9.16. Effects of setting the default encoding</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>import sys</kbd>
-<samp class=p>>>> </samp><kbd>sys.getdefaultencoding()</kbd> <span>&#x2460;</span>
-'iso-8859-1'
-<samp class=p>>>> </samp><kbd>s = u'La Pe\xf1a'</kbd>
-<samp class=p>>>> </samp><kbd>print s</kbd><span>&#x2461;</span>
-La Pe&ntilde;a</pre>
-<ol>
-<li>This example assumes that you have made the changes listed in the previous example to your <code>sitecustomize.py</code> file, and restarted Python. If your default encoding still says <code>'ascii'</code>, you didn't set up your <code>sitecustomize.py</code> properly, or you didn't restart Python. The default encoding can only be changed during Python startup; you can't change it later. (Due to some wacky programming tricks that I won't get into right now, you can't even
-            call <code>sys.setdefaultencoding</code> after Python has started up. Dig into <code>site.py</code> and search for &#8220;<code>setdefaultencoding</code>&#8221; to find out how.)
-<li>Now that the default encoding scheme includes all the characters you use in your string, Python has no problem auto-coercing the string and printing it.
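<p>A different technique, which does not depend on the default encoding at all, is to name the encoding explicitly each time you convert. The following is a sketch in Python 3 syntax, where every <code>str</code> is already unicode and encoding produces a <code>bytes</code> object; it is offered as an alternative to the <code>sitecustomize.py</code> approach above, not as a description of it.

```python
# The same string as in the example above, as a Python 3 str.
s = "La Pe\xf1a"

# Encode explicitly instead of relying on a process-wide default.
latin1_bytes = s.encode("iso-8859-1")   # one byte for the n-with-tilde
utf8_bytes = s.encode("utf-8")          # two bytes for the n-with-tilde

# ASCII cannot represent the character, so encoding fails loudly.
try:
    s.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding with the matching encoding recovers the original string.
round_trip = latin1_bytes.decode("iso-8859-1")
print(latin1_bytes, utf8_bytes, ascii_ok, round_trip == s)
```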
-
-
-
-
-
-(More Unicode stuff was here)
-
-
-
-
-
-
-
-<h2 id="kgp.search">9.5. Searching for elements</h2>
-<p>Traversing <abbr>XML</abbr> documents by stepping through each node can be tedious. If you're looking for something in particular, buried deep within
-   your <abbr>XML</abbr> document, there is a shortcut you can use to find it quickly: <code>getElementsByTagName</code>.
-<p>For this section, you'll be using the <code>binary.xml</code> grammar file, which looks like this:
-<div class=example><h3>Example 9.20. <code>binary.xml</code></h3><pre class=screen><samp>&lt;?xml version="1.0"?>
-&lt;!DOCTYPE grammar PUBLIC "-//diveintopython3.org//DTD Kant Generator Pro v1.0//EN" "kgp.dtd">
-&lt;grammar>
-&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref>
-&lt;ref id="byte">
-  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
-&lt;/ref>
-&lt;/grammar></samp></pre><p>It has two <code>ref</code>s, <code>'bit'</code> and <code>'byte'</code>. A <code>bit</code> is either a <code>'0'</code> or <code>'1'</code>, and a <code>byte</code> is 8 <code>bit</code>s.
-<div class=example><h3>Example 9.21. Introducing <code>getElementsByTagName</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>from xml.dom import minidom</kbd>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse('binary.xml')</kbd>
-<samp class=p>>>> </samp><kbd>reflist = xmldoc.getElementsByTagName('ref')</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>reflist</kbd>
-[&lt;DOM Element: ref at 136138108>, &lt;DOM Element: ref at 136144292>]
-<samp class=p>>>> </samp><kbd>print reflist[0].toxml()</kbd>
-<samp>&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref></samp>
-<samp class=p>>>> </samp><kbd>print reflist[1].toxml()</kbd>
-<samp>&lt;ref id="byte">
-  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
-&lt;/ref>
-</samp></pre>
-<ol>
-<li><code>getElementsByTagName</code> takes one argument, the name of the element you wish to find. It returns a list of <code>Element</code> objects, corresponding to the <abbr>XML</abbr> elements that have that name. In this case, you find two <code>ref</code> elements.
-<div class=example><h3>Example 9.22. Every element is searchable</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>firstref = reflist[0]</kbd>    <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>print firstref.toxml()</kbd>
-<samp>&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref></samp>
-<samp class=p>>>> </samp><kbd>plist = firstref.getElementsByTagName("p")</kbd> <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>plist</kbd>
-[&lt;DOM Element: p at 136140116>, &lt;DOM Element: p at 136142172>]
-<samp class=p>>>> </samp><kbd>print plist[0].toxml()</kbd>   <span>&#x2462;</span>
-&lt;p>0&lt;/p>
-<samp class=p>>>> </samp><kbd>print plist[1].toxml()</kbd>
-&lt;p>1&lt;/p></pre>
-<ol>
-<li>Continuing from the previous example, the first object in your <var>reflist</var> is the <code>'bit'</code> <code>ref</code> element.
-<li>You can use the same <code>getElementsByTagName</code> method on this <code>Element</code> to find all the <code>&lt;p></code> elements within the <code>'bit'</code> <code>ref</code> element.
-<li>Just as before, the <code>getElementsByTagName</code> method returns a list of all the elements it found. In this case, you have two, one for each bit.
-<div class=example><h3>Example 9.23. Searching is actually recursive</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>plist = xmldoc.getElementsByTagName("p")</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>plist</kbd>
-[&lt;DOM Element: p at 136140116>, &lt;DOM Element: p at 136142172>, &lt;DOM Element: p at 136146124>]
-<samp class=p>>>> </samp><kbd>plist[0].toxml()</kbd>       <span>&#x2461;</span>
-'&lt;p>0&lt;/p>'
-<samp class=p>>>> </samp><kbd>plist[1].toxml()</kbd>
-'&lt;p>1&lt;/p>'
-<samp class=p>>>> </samp><kbd>plist[2].toxml()</kbd>       <span>&#x2462;</span>
-<samp>'&lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>'</samp></pre>
-<ol>
-<li>Note carefully the difference between this and the previous example. Previously, you were searching for <code>p</code> elements within <var>firstref</var>, but here you are searching for <code>p</code> elements within <var>xmldoc</var>, the root-level object that represents the entire <abbr>XML</abbr> document. This <em>does</em> find the <code>p</code> elements nested within the <code>ref</code> elements within the root <code>grammar</code> element.
-<li>The first two <code>p</code> elements are within the first <code>ref</code> (the <code>'bit'</code> <code>ref</code>).
-<li>The last <code>p</code> element is the one within the second <code>ref</code> (the <code>'byte'</code> <code>ref</code>).
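<p>The difference between searching one element and searching the whole document can be checked in a few lines. This is a minimal sketch in Python 3 syntax; it assumes an inline grammar shortened to two <code>xref</code> elements per byte, rather than the real <code>binary.xml</code> file.

```python
from xml.dom import minidom

# Assumption: a shortened, inline variant of binary.xml.
BINARY_XML = """<grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
<ref id="byte">
  <p><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>"""

xmldoc = minidom.parseString(BINARY_XML)

# Search from the document: finds every <ref>, at any depth.
reflist = xmldoc.getElementsByTagName("ref")

# Search from one element: finds only the <p> elements inside it.
p_in_first_ref = reflist[0].getElementsByTagName("p")

# Search from the document again: finds the <p> elements in both refs.
p_in_document = xmldoc.getElementsByTagName("p")

print(len(reflist), len(p_in_first_ref), len(p_in_document))
```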
-<h2 id="kgp.attributes">9.6. Accessing element attributes</h2>
-<p><abbr>XML</abbr> elements can have one or more attributes, and it is incredibly simple to access them once you have parsed an <abbr>XML</abbr> document.
-<p>For this section, you'll be using the <code>binary.xml</code> grammar file that you saw in the <a href="#kgp.search" title="9.5. Searching for elements">previous section</a>.
-<table class=note border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">This section may be a little confusing, because of some overlapping terminology. Elements in an <abbr>XML</abbr> document have attributes, and Python objects also have attributes. When you parse an <abbr>XML</abbr> document, you get a bunch of Python objects that represent all the pieces of the <abbr>XML</abbr> document, and some of these Python objects represent attributes of the <abbr>XML</abbr> elements. But the (Python) objects that represent the (<abbr>XML</abbr>) attributes also have (Python) attributes, which are used to access various parts of the (<abbr>XML</abbr>) attribute that the object represents. I told you it was confusing. I am open to suggestions on how to distinguish these
-      more clearly.
-<div class=example><h3>Example 9.24. Accessing element attributes</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse('binary.xml')</kbd>
-<samp class=p>>>> </samp><kbd>reflist = xmldoc.getElementsByTagName('ref')</kbd>
-<samp class=p>>>> </samp><kbd>bitref = reflist[0]</kbd>
-<samp class=p>>>> </samp><kbd>print bitref.toxml()</kbd>
-<samp>&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref></samp>
-<samp class=p>>>> </samp><kbd>bitref.attributes</kbd>          <span>&#x2460;</span>
-&lt;xml.dom.minidom.NamedNodeMap instance at 0x81e0c9c>
-<samp class=p>>>> </samp><kbd>bitref.attributes.keys()</kbd>   <span>&#x2461;</span> <span>&#x2462;</span>
-[u'id']
-<samp class=p>>>> </samp><kbd>bitref.attributes.values()</kbd> <span>&#x2463;</span>
-[&lt;xml.dom.minidom.Attr instance at 0x81d5044>]
-<samp class=p>>>> </samp><kbd>bitref.attributes["id"]</kbd>    <span>&#x2464;</span>
-&lt;xml.dom.minidom.Attr instance at 0x81d5044></pre>
-<ol>
-<li>Each <code>Element</code> object has an attribute called <code>attributes</code>, which is a <code>NamedNodeMap</code> object. This sounds scary, but it's not, because a <code>NamedNodeMap</code> is an object that <a href="#fileinfo.userdict" title="5.5. Exploring UserDict: A Wrapper Class">acts like a dictionary</a>, so you already know how to use it.
-<li>Treating the <code>NamedNodeMap</code> as a dictionary, you can get a list of the names of the attributes of this element by using <code>attributes.keys()</code>. This element has only one attribute, <code>'id'</code>.
-<li>Attribute names, like all other text in an <abbr>XML</abbr> document, are stored in <a href="#kgp.unicode" title="9.4. Unicode">unicode</a>.
-<li>Again treating the <code>NamedNodeMap</code> as a dictionary, you can get a list of the values of the attributes by using <code>attributes.values()</code>. The values are themselves objects, of type <code>Attr</code>. You'll see how to get useful information out of this object in the next example.
-<li>Still treating the <code>NamedNodeMap</code> as a dictionary, you can access an individual attribute by name, using normal dictionary syntax. (Readers who have been
-            paying extra-close attention will already know how the <code>NamedNodeMap</code> class accomplishes this neat trick: by defining a <a href="#fileinfo.specialmethods" title="5.6. Special Class Methods"><code>__getitem__</code> special method</a>. Other readers can take comfort in the fact that they don't need to understand how it works in order to use it effectively.)
-<div class=example><h3>Example 9.25. Accessing individual attributes</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>a = bitref.attributes["id"]</kbd>
-<samp class=p>>>> </samp><kbd>a</kbd>
-&lt;xml.dom.minidom.Attr instance at 0x81d5044>
-<samp class=p>>>> </samp><kbd>a.name</kbd>  <span>&#x2460;</span>
-u'id'
-<samp class=p>>>> </samp><kbd>a.value</kbd> <span>&#x2461;</span>
-u'bit'</pre>
-<ol>
-<li>The <code>Attr</code> object completely represents a single <abbr>XML</abbr> attribute of a single <abbr>XML</abbr> element. The name of the attribute (the same name as you used to find this object in the <code>bitref.attributes</code> <code>NamedNodeMap</code> pseudo-dictionary) is stored in <code>a.name</code>.
-<li>The actual text value of this <abbr>XML</abbr> attribute is stored in <code>a.value</code>.
-<table class=note border="0" summary="">
-
-<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">Like a dictionary, attributes of an <abbr>XML</abbr> element have no ordering. Attributes may <em>happen to be</em> listed in a certain order in the original <abbr>XML</abbr> document, and the <code>Attr</code> objects may <em>happen to be</em> listed in a certain order when the <abbr>XML</abbr> document is parsed into Python objects, but these orders are arbitrary and should carry no special meaning. You should always access individual attributes
-      by name, like the keys of a dictionary.
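<p>Because the order carries no meaning, one convenient pattern is to copy the attributes into a plain dictionary keyed by name. The sketch below uses Python 3 syntax and a made-up element with a second, hypothetical <code>lang</code> attribute that is not part of the book's grammar; <code>hasAttribute</code> and <code>getAttribute</code> are standard <code>minidom</code> <code>Element</code> methods.

```python
from xml.dom import minidom

# Assumption: an inline element; the "lang" attribute is invented for this sketch.
xmldoc = minidom.parseString('<ref id="bit" lang="en"><p>0</p></ref>')
ref = xmldoc.documentElement

# Build a name -> value dictionary from the NamedNodeMap.
attrs = {name: ref.attributes[name].value for name in ref.attributes.keys()}

# Check for an attribute before using it.
has_id = ref.hasAttribute("id")

# getAttribute returns an empty string when the attribute is absent.
missing = ref.getAttribute("nosuch")

print(attrs["id"], attrs["lang"], has_id, repr(missing))
```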
-<h2 id="kgp.segue">9.7. Segue</h2>
-<p>OK, that's it for the hard-core XML stuff. The next chapter will continue to use these same example programs, but focus on
-   other aspects that make the program more flexible: using streams for input processing, using <code>getattr</code> for method dispatching, and using command-line flags to allow users to reconfigure the program without changing the code.
-<p>Before moving on to the next chapter, you should be comfortable doing all of these things:
-<div class=itemizedlist>
-<ul>
-<li><a href="#kgp.parse" title="9.3. Parsing XML">Parsing <abbr>XML</abbr> documents</a> using <code>minidom</code>, <a href="#kgp.search" title="9.5. Searching for elements">searching through the parsed document</a>, and accessing arbitrary <a href="#kgp.attributes" title="9.6. Accessing element attributes">element attributes</a> and <a href="#kgp.child" title="10.4. Finding direct children of a node">element children</a>
-<li>Organizing complex libraries into <a href="#kgp.packages" title="9.2. Packages">packages</a>
-<li><a href="#kgp.unicode" title="9.4. Unicode">Converting unicode strings</a> to different character encodings
-
-</ul>
-<div class=footnotes><br><hr width="100" align="left">
-<div class=footnote>
-<p><sup>[<a name="ftn.d0e23786" href="#d0e23786">5</a>] </sup>This, sadly, is <em>still</em> an oversimplification. Unicode has now been extended to handle ancient Chinese, Korean, and Japanese texts, which had so
-   many different characters that the 2-byte unicode system could not represent them all. But Python doesn't currently support that out of the box, and I don't know if there is a project afoot to add it. You've reached the
-   limits of my expertise, sorry.
 <div class=chapter>
 <h2 id="streams">Chapter 10. Scripts and Streams</h2>
-<h2 id="kgp.openanything">10.1. Abstracting input sources</h2>
-<p>One of Python's greatest strengths is its dynamic binding, and one powerful use of dynamic binding is the <em>file-like object</em>.
-<p>Many functions which require an input source could simply take a filename, go open the file for reading, read it, and close
-it when they're done. But they don't. Instead, they take a <em>file-like object</em>.
-<p>In the simplest case, a <em>file-like object</em> is any object with a <code>read</code> method with an optional <var>size</var> parameter, which returns a string. When called with no <var>size</var> parameter, it reads everything there is to read from the input source and returns all the data as a single string. When
-called with a <var>size</var> parameter, it reads that much from the input source and returns that much data; when called again, it picks up where it left
-off and returns the next chunk of data.
-<p>This is how <a href="#fileinfo.files" title="6.2. Working with File Objects">reading from real files</a> works; the difference is that you're not limiting yourself to real files. The input source could be anything: a file on
-disk, a web page, even a hard-coded string. As long as you pass a file-like object to the function, and the function simply
-calls the object's <code>read</code> method, the function can handle any kind of input source without specific code to handle each kind.
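<p>To make the <code>read</code> protocol concrete, here is a minimal sketch in Python 3 syntax of a file-like object backed by a hard-coded string. The class <code>StringSource</code> and the function <code>read_in_chunks</code> are hypothetical names invented for this illustration; they are not part of the standard library or of <code>kgp.py</code>.

```python
class StringSource:
    """Hypothetical file-like object: a string with a read() method."""

    def __init__(self, text):
        self._text = text
        self._pos = 0  # where the next read() starts

    def read(self, size=-1):
        # No size (or a negative one) means "everything that is left".
        if size is None or size < 0:
            chunk = self._text[self._pos:]
        else:
            chunk = self._text[self._pos:self._pos + size]
        self._pos += len(chunk)  # pick up here on the next call
        return chunk


def read_in_chunks(source, size):
    """Works with anything that has read(size): files, sockets, StringSource."""
    chunks = []
    while True:
        chunk = source.read(size)
        if not chunk:  # an empty string signals the end of the input
            break
        chunks.append(chunk)
    return chunks


chunks = read_in_chunks(StringSource("<grammar/>"), 4)
everything = StringSource("<grammar/>").read()
print(chunks, everything)
```

<p>Note that <code>read_in_chunks</code> never checks what kind of object it was given; it only calls <code>read</code>, which is the whole point of a file-like object.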
-<p>In case you were wondering how this relates to <abbr>XML</abbr> processing, <code>minidom.parse</code> is one such function which can take a file-like object.
-<div class=example><h3>Example 10.1. Parsing <abbr>XML</abbr> from a file</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>from xml.dom import minidom</kbd>
-<samp class=p>>>> </samp><kbd>fsock = open('binary.xml')</kbd>    <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse(fsock)</kbd> <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>fsock.close()</kbd>                 <span>&#x2462;</span>
-<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>          <span>&#x2463;</span>
-<samp>&lt;?xml version="1.0" ?>
-&lt;grammar>
-&lt;ref id="bit">
-  &lt;p>0&lt;/p>
-  &lt;p>1&lt;/p>
-&lt;/ref>
-&lt;ref id="byte">
-  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
-&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
-&lt;/ref>
-&lt;/grammar></samp></pre>
-<ol>
-<li>First, you open the file on disk. This gives you a <a href="#fileinfo.files" title="6.2. Working with File Objects">file object</a>.
-<li>You pass the file object to <code>minidom.parse</code>, which calls the <code>read</code> method of <var>fsock</var> and reads the <abbr>XML</abbr> document from the file on disk.
-<li>Be sure to call the <code>close</code> method of the file object after you're done with it. <code>minidom.parse</code> will not do this for you.
-<li>Calling the <code>toxml()</code> method on the returned <abbr>XML</abbr> document returns the entire thing as a string, which <code>print</code> then displays.
-<p>Well, that all seems like a colossal waste of time. After all, you've already seen that <code>minidom.parse</code> can simply take the filename and do all the opening and closing nonsense automatically. And it's true that if you know you're
-just going to be parsing a local file, you can pass the filename and <code>minidom.parse</code> is smart enough to Do The Right Thing&#8482;. But notice how similar -- and easy -- it is to parse an <abbr>XML</abbr> document straight from the Internet.
-<div class=example><h3 id="kgp.openanything.urllib">Example 10.2. Parsing <abbr>XML</abbr> from a <abbr>URL</abbr></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>import urllib</kbd>
-<samp class=p>>>> </samp><kbd>usock = urllib.urlopen('http://slashdot.org/slashdot.rdf')</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse(usock)</kbd>            <span>&#x2461;</span>
-<samp class=p>>>> </samp><kbd>usock.close()</kbd>          <span>&#x2462;</span>
-<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>   <span>&#x2463;</span>
-<samp>&lt;?xml version="1.0" ?>
-&lt;rdf:RDF xmlns="http://my.netscape.com/rdf/simple/0.9/"
- xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
-
-&lt;channel>
-&lt;title>Slashdot&lt;/title>
-&lt;link>http://slashdot.org/&lt;/link>
-&lt;description>News for nerds, stuff that matters&lt;/description>
-&lt;/channel>
-
-&lt;image>
-&lt;title>Slashdot&lt;/title>
-&lt;url>http://images.slashdot.org/topics/topicslashdot.gif&lt;/url>
-&lt;link>http://slashdot.org/&lt;/link>
-&lt;/image>
-
-&lt;item>
-&lt;title>To HDTV or Not to HDTV?&lt;/title>
-&lt;link>http://slashdot.org/article.pl?sid=01/12/28/0421241&lt;/link>
-&lt;/item>
-
-[...snip...]</samp></pre>
-<ol>
-<li>As you saw <a href="#dialect.extract.urllib" title="Example 8.5. Introducing urllib">in a previous chapter</a>, <code>urlopen</code> takes a web page <abbr>URL</abbr> and returns a file-like object. Most importantly, this object has a <code>read</code> method which returns the <abbr>HTML</abbr> source of the web page.
-<li>Now you pass the file-like object to <code>minidom.parse</code>, which obediently calls the <code>read</code> method of the object and parses the <abbr>XML</abbr> data that the <code>read</code> method returns. The fact that this <abbr>XML</abbr> data is now coming straight from a web page is completely irrelevant. <code>minidom.parse</code> doesn't know about web pages, and it doesn't care about web pages; it just knows about file-like objects.
-<li>As soon as you're done with it, be sure to close the file-like object that <code>urlopen</code> gives you.
-<li>By the way, this <abbr>URL</abbr> is real, and it really is <abbr>XML</abbr>. It's an <abbr>XML</abbr> representation of the current headlines on <a href="http://slashdot.org/">Slashdot</a>, a technical news and gossip site.
-<div class=example><h3>Example 10.3. Parsing <abbr>XML</abbr> from a string (the easy but inflexible way)</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parseString(contents)</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>
-<samp>&lt;?xml version="1.0" ?>
-&lt;grammar>&lt;ref id="bit">&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar></samp></pre>
-<ol>
-<li><code>minidom</code> has a method, <code>parseString</code>, which takes an entire <abbr>XML</abbr> document as a string and parses it. You can use this instead of <code>minidom.parse</code> if you know you already have your entire <abbr>XML</abbr> document in a string.
-<p>OK, so you can use the <code>minidom.parse</code> function for parsing both local files and remote <abbr>URL</abbr>s, but for parsing strings, you use... a different function. That means that if you want to be able to take input from a
-file, a <abbr>URL</abbr>, or a string, you'll need special logic to check whether it's a string, and call the <code>parseString</code> function instead. How unsatisfying.
-<p>If there were a way to turn a string into a file-like object, then you could simply pass this object to <code>minidom.parse</code>. And in fact, there is a module specifically designed for doing just that: <code>StringIO</code>.
-<div class=example><h3 id="kgp.openanything.stringio.example">Example 10.4. Introducing <code>StringIO</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
-<samp class=p>>>> </samp><kbd>import StringIO</kbd>
-<samp class=p>>>> </samp><kbd>ssock = StringIO.StringIO(contents)</kbd>   <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>ssock.read()</kbd>        <span>&#x2461;</span>
-"&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"
-<samp class=p>>>> </samp><kbd>ssock.read()</kbd>        <span>&#x2462;</span>
-''
-<samp class=p>>>> </samp><kbd>ssock.seek(0)</kbd>       <span>&#x2463;</span>
-<samp class=p>>>> </samp><kbd>ssock.read(15)</kbd>      <span>&#x2464;</span>
-'&lt;grammar>&lt;ref i'
-<samp class=p>>>> </samp><kbd>ssock.read(15)</kbd>
-"d='bit'>&lt;p>0&lt;/p"
-<samp class=p>>>> </samp><kbd>ssock.read()</kbd>
-'>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>'
-<samp class=p>>>> </samp><kbd>ssock.close()</kbd>       <span>&#x2465;</span></pre>
-<ol>
-<li>The <code>StringIO</code> module contains a single class, also called <code>StringIO</code>, which allows you to turn a string into a file-like object. The <code>StringIO</code> class takes the string as a parameter when creating an instance.
-<li>Now you have a file-like object, and you can do all sorts of file-like things with it. Like <code>read</code>, which returns the original string.
-<li>Calling <code>read</code> again returns an empty string. This is how real file objects work too; once you read the entire file, you can't read any
-            more without explicitly seeking to the beginning of the file. The <code>StringIO</code> object works the same way.
-<li>You can explicitly seek to the beginning of the string, just like seeking through a file, by using the <code>seek</code> method of the <code>StringIO</code> object.
-<li>You can also read the string in chunks, by passing a <var>size</var> parameter to the <code>read</code> method.
-<li>At any time, <code>read</code> will return the rest of the string that you haven't read yet. All of this is exactly how file objects work; hence the term
-<em>file-like object</em>.
-<div class=example><h3>Example 10.5. Parsing <abbr>XML</abbr> from a string (the file-like object way)</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
-<samp class=p>>>> </samp><kbd>ssock = StringIO.StringIO(contents)</kbd>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse(ssock)</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>ssock.close()</kbd>
-<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>
-<samp>&lt;?xml version="1.0" ?>
-&lt;grammar>&lt;ref id="bit">&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar></samp></pre>
-<ol>
-<li>Now you can pass the file-like object (really a <code>StringIO</code>) to <code>minidom.parse</code>, which will call the object's <code>read</code> method and happily parse away, never knowing that its input came from a hard-coded string.
-<p>So now you know how to use a single function, <code>minidom.parse</code>, to parse an <abbr>XML</abbr> document stored on a web page, in a local file, or in a hard-coded string. For a web page, you use <code>urlopen</code> to get a file-like object; for a local file, you use <code>open</code>; and for a string, you use <code>StringIO</code>. Now let's take it one step further and generalize <em>these</em> differences as well.
-<div class=example><h3 id="kgp.openanything.example">Example 10.6. <code>openAnything</code></h3><pre><code>
-def openAnything(source):<span>&#x2460;</span>
-    # try to open with urllib (if source is http, ftp, or file URL)
-    import urllib       
-    try:                
-        return urllib.urlopen(source)      <span>&#x2461;</span>
-    except (IOError, OSError):            
-        pass            
-
-    # try to open with native open function (if source is pathname)
-    try:                
-        return open(source)                <span>&#x2462;</span>
-    except (IOError, OSError):            
-        pass            
-
-    # treat source as string
-    import StringIO     
-    return StringIO.StringIO(str(source))  <span>&#x2463;</span></code></pre>
-<ol>
-<li>The <code>openAnything</code> function takes a single parameter, <var>source</var>, and returns a file-like object. <var>source</var> is a string of some sort; it can either be a <abbr>URL</abbr> (like <code>'http://slashdot.org/slashdot.rdf'</code>), a full or partial pathname to a local file (like <code>'binary.xml'</code>), or a string that contains actual <abbr>XML</abbr> data to be parsed.
-<li>First, you see if <var>source</var> is a <abbr>URL</abbr>. You do this through brute force: you try to open it as a <abbr>URL</abbr> and silently ignore errors caused by trying to open something which is not a <abbr>URL</abbr>. This is actually elegant in the sense that, if <code>urllib</code> ever supports new types of <abbr>URL</abbr>s in the future, you will also support them without recoding. If <code>urllib</code> is able to open <var>source</var>, then the <code>return</code> kicks you out of the function immediately and the following <code>try</code> statements never execute.
-<li>On the other hand, if <code>urllib</code> yelled at you and told you that <var>source</var> wasn't a valid <abbr>URL</abbr>, you assume it's a path to a file on disk and try to open it. Again, you don't do anything fancy to check whether <var>source</var> is a valid filename or not (the rules for valid filenames vary wildly between platforms, so you'd probably
-            get them wrong anyway). Instead, you just blindly open the file, and silently trap any errors.
-<li>By this point, you need to assume that <var>source</var> is a string that has hard-coded data in it (since nothing else worked), so you use <code>StringIO</code> to create a file-like object out of it and return that. (In fact, since you're using the <code>str</code> function, <var>source</var> doesn't even need to be a string; it could be any object, and you'll use its string representation, as defined by its <code>__str__</code> <a href="#fileinfo.morespecial" title="5.7. Advanced Special Class Methods">special method</a>.)
-<p>Now you can use this <code>openAnything</code> function in conjunction with <code>minidom.parse</code> to make a function that takes a <var>source</var> that refers to an <abbr>XML</abbr> document somehow (either as a <abbr>URL</abbr>, or a local filename, or a hard-coded <abbr>XML</abbr> document in a string) and parses it.
-<div class=example><h3>Example 10.7. Using <code>openAnything</code> in <code>kgp.py</code></h3><pre><code>
-class KantGenerator:
-    def _load(self, source):
-        sock = toolbox.openAnything(source)
-        xmldoc = minidom.parse(sock).documentElement
-        sock.close()
-        return xmldoc</code></pre><h2 id="kgp.stdio">10.2. Standard input, output, and error</h2>
-<p><abbr>UNIX</abbr> users are already familiar with the concept of standard input, standard output, and standard error. This section is for
-   the rest of you.
-<p>Standard output and standard error (commonly abbreviated <code>stdout</code> and <code>stderr</code>) are pipes that are built into every <abbr>UNIX</abbr> system. When you <code>print</code> something, it goes to the <code>stdout</code> pipe; when your program crashes and prints out debugging information (like a traceback in Python), it goes to the <code>stderr</code> pipe. Both of these pipes are ordinarily just connected to the terminal window where you are working, so when a program
-prints, you see the output, and when a program crashes, you see the debugging information. (If you're working on a system
-with a window-based Python <abbr>IDE</abbr>, <code>stdout</code> and <code>stderr</code> default to your &#8220;Interactive Window&#8221;.)
-<div class=example><h3>Example 10.8. Introducing <code>stdout</code> and <code>stderr</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
-<samp class=p>...    </samp>print 'Dive in'             <span>&#x2460;</span>
-<samp>Dive in
-Dive in
-Dive in</samp>
-<samp class=p>>>> </samp><kbd>import sys</kbd>
-<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
-<samp class=p>...    </samp>sys.stdout.write('Dive in') <span>&#x2461;</span>
-Dive inDive inDive in
-<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
-<samp class=p>...    </samp>sys.stderr.write('Dive in') <span>&#x2462;</span>
-Dive inDive inDive in</pre>
-<ol>
-<li>As you saw in <a href="#fileinfo.for.counter" title="Example 6.9. Simple Counters">Example 6.9, &#8220;Simple Counters&#8221;</a>, you can use Python's built-in <code>range</code> function to build simple counter loops that repeat something a set number of times.
-<li><code>stdout</code> is a file-like object; calling its <code>write</code> method will print out whatever string you give it. In fact, this is what the <code>print</code> statement really does; it adds a newline to the end of the string you're printing, and calls <code>sys.stdout.write</code>.
-<li>In the simplest case, <code>stdout</code> and <code>stderr</code> send their output to the same place: the Python <abbr>IDE</abbr> (if you're in one), or the terminal (if you're running Python from the command line). Like <code>stdout</code>, <code>stderr</code> does not add newlines for you; if you want them, add them yourself.
-<p><code>stdout</code> and <code>stderr</code> are both file-like objects, like the ones discussed in <a href="#kgp.openanything" title="10.1. Abstracting input sources">Section 10.1, &#8220;Abstracting input sources&#8221;</a>, but they are both write-only. They have no <code>read</code> method, only <code>write</code>. Still, they are file-like objects, and you can assign any other file or file-like object to them to redirect their output.
-<div class=example><h3>Example 10.9. Redirecting output</h3><pre class=screen>
-<samp class=p>[you@localhost kgp]$ </samp>python stdout.py
-Dive in
-<samp class=p>[you@localhost kgp]$ </samp>cat out.log
-This message will be logged instead of displayed</pre><p>(On Windows, you can use <code>type</code> instead of <code>cat</code> to display the contents of a file.)
-<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
-<pre><code>
-#stdout.py
-import sys
-
-print 'Dive in'      <span>&#x2460;</span>
-saveout = sys.stdout <span>&#x2461;</span>
-fsock = open('out.log', 'w')           <span>&#x2462;</span>
-sys.stdout = fsock   <span>&#x2463;</span>
-print 'This message will be logged instead of displayed' <span>&#x2464;</span>
-sys.stdout = saveout <span>&#x2465;</span>
-fsock.close()        <span>&#x2466;</span>
-</code></pre>
-<ol>
-<li>This will print to the <abbr>IDE</abbr> &#8220;Interactive Window&#8221; (or the terminal, if running the script from the command line).
-<li>Always save <code>stdout</code> before redirecting it, so you can set it back to normal later.
-<li>Open a file for writing. If the file doesn't exist, it will be created. If the file does exist, it will be overwritten.
-<li>Redirect all further output to the new file you just opened.
-<li>This will be &#8220;printed&#8221; to the log file only; it will not be visible in the <abbr>IDE</abbr> window or on the screen.
-<li>Set <code>stdout</code> back to the way it was before you mucked with it.
-<li>Close the log file.
-<p>Redirecting <code>stderr</code> works exactly the same way, using <code>sys.stderr</code> instead of <code>sys.stdout</code>.
-<div class=example><h3>Example 10.10. Redirecting error information</h3><pre class=screen>
-<samp class=p>[you@localhost kgp]$ </samp>python stderr.py
-<samp class=p>[you@localhost kgp]$ </samp>cat error.log
-<samp>Traceback (most recent call last):
-  File "stderr.py", line 5, in ?
-    raise Exception, 'this error will be logged'
-Exception: this error will be logged</samp></pre><p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
-<pre><code>
-#stderr.py
-import sys
-
-fsock = open('error.log', 'w')               <span>&#x2460;</span>
-sys.stderr = fsock         <span>&#x2461;</span>
-raise Exception, 'this error will be logged' <span>&#x2462;</span> <span>&#x2463;</span>
-</code></pre>
-<ol>
-<li>Open the log file where you want to store debugging information.
-<li>Redirect standard error by assigning the file object of the newly-opened log file to <code>stderr</code>.
-<li>Raise an exception. Note from the screen output that this does <em>not</em> print anything on screen. All the normal traceback information has been written to <code>error.log</code>.
-<li>Also note that you're not explicitly closing your log file, nor are you setting <code>stderr</code> back to its original value. This is fine, since once the program crashes (because of the exception), Python will clean up and close the file for you. It also makes no difference that <code>stderr</code> is never restored, since, as I mentioned, the program crashes and Python ends. Restoring the original is more important for <code>stdout</code>, if you expect to go do other stuff within the same script afterwards.
-<p>Since it is so common to write error messages to standard error, there is a shorthand syntax that can be used instead of going
-through the hassle of redirecting it outright.
-<div class=example><h3 id="kgp.stdio.print.example">Example 10.11. Printing to <code>stderr</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>print 'entering function'</kbd>
-entering function
-<samp class=p>>>> </samp><kbd>import sys</kbd>
-<samp class=p>>>> </samp><kbd>print >> sys.stderr, 'entering function'</kbd> <span>&#x2460;</span>
-entering function
-</pre>
-<ol>
-<li>This shorthand syntax of the <code>print</code> statement can be used to write to any open file, or file-like object. In this case, you can redirect a single <code>print</code> statement to <code>stderr</code> without affecting subsequent <code>print</code> statements.
-<p>Standard input, on the other hand, is a read-only file object, and it represents the data flowing into the program from some
-previous program. This will likely not make much sense to classic Mac OS users, or even to Windows users who were never fluent on the <abbr>MS-DOS</abbr> command line. The way it works is that you can construct a chain of commands in a single line, so that one program's output
-becomes the input for the next program in the chain. The first program simply outputs to standard output (without doing any
-special redirecting itself, just doing normal <code>print</code> statements or whatever), and the next program reads from standard input, and the operating system takes care of connecting
-one program's output to the next program's input.
 <div class=example><h3>Example 10.12. Chaining commands</h3><pre class=screen>
 <samp class=p>[you@localhost kgp]$ </samp>python kgp.py -g binary.xml         <span>&#x2460;</span>
 01100111
@@ -4271,101 +2141,17 @@ def openAnything(source):
 [... snip ...]</pre>
 <ol>
 <li>This is the <code>openAnything</code> function from <code>toolbox.py</code>, which you previously examined in <a href="#kgp.openanything" title="10.1. Abstracting input sources">Section 10.1, &#8220;Abstracting input sources&#8221;</a>. All you've done is add three lines of code at the beginning of the function to check if the source is &#8220;<code>-</code>&#8221;; if so, you return <code>sys.stdin</code>. Really, that's it!  Remember, <code>stdin</code> is a file-like object with a <code>read</code> method, so the rest of the code (in <code>kgp.py</code>, where you call <code>openAnything</code>) doesn't change a bit.
-<h2 id="kgp.cache">10.3. Caching node lookups</h2>
-<p><code>kgp.py</code> employs several tricks which may or may not be useful to you in your <abbr>XML</abbr> processing. The first one takes advantage of the consistent structure of the input documents to build a cache of nodes.
-<p>A grammar file defines a series of <code>ref</code> elements. Each <code>ref</code> contains one or more <code>p</code> elements, which can contain a lot of different things, including <code>xref</code>s. Whenever you encounter an <code>xref</code>, you look for a corresponding <code>ref</code> element with the same <code>id</code> attribute, and choose one of the <code>ref</code> element's children and parse it. (You'll see how this random choice is made in the next section.)
-<p>This is how you build up the grammar: define <code>ref</code> elements for the smallest pieces, then define <code>ref</code> elements which "include" the first <code>ref</code> elements by using <code>xref</code>, and so forth. Then you parse the "largest" reference and follow each <code>xref</code>, and eventually output real text. The text you output depends on the (random) decisions you make each time you fill in an
-<code>xref</code>, so the output is different each time.
-<p>This is all very flexible, but there is one downside: performance. When you find an <code>xref</code> and need to find the corresponding <code>ref</code> element, you have a problem. The <code>xref</code> has an <code>id</code> attribute, and you want to find the <code>ref</code> element that has that same <code>id</code> attribute, but there is no easy way to do that. The slow way to do it would be to get the entire list of <code>ref</code> elements each time, then manually loop through and look at each <code>id</code> attribute. The fast way is to do that once and build a cache, in the form of a dictionary.
-<div class=example><h3>Example 10.14. <code>loadGrammar</code></h3><pre><code>
-    def loadGrammar(self, grammar):       
-        self.grammar = self._load(grammar)
-        self.refs = {}   <span>&#x2460;</span>
-        for ref in self.grammar.getElementsByTagName("ref"): <span>&#x2461;</span>
-            self.refs[ref.attributes["id"].value] = ref      <span>&#x2462;</span> <span>&#x2463;</span></code></pre>
-<ol>
-<li>Start by creating an empty dictionary, <var>self.refs</var>.
-<li>As you saw in <a href="#kgp.search" title="9.5. Searching for elements">Section 9.5, &#8220;Searching for elements&#8221;</a>, <code>getElementsByTagName</code> returns a list of all the elements of a particular name. You can easily get a list of all the <code>ref</code> elements, then simply loop through that list.
-<li>As you saw in <a href="#kgp.attributes" title="9.6. Accessing element attributes">Section 9.6, &#8220;Accessing element attributes&#8221;</a>, you can access individual attributes of an element by name, using standard dictionary syntax. So the keys of the <var>self.refs</var> dictionary will be the values of the <code>id</code> attribute of each <code>ref</code> element.
-<li>The values of the <var>self.refs</var> dictionary will be the <code>ref</code> elements themselves. As you saw in <a href="#kgp.parse" title="9.3. Parsing XML">Section 9.3, &#8220;Parsing XML&#8221;</a>, each element, each node, each comment, each piece of text in a parsed <abbr>XML</abbr> document is an object.
-<p>Once you build this cache, whenever you come across an <code>xref</code> and need to find the <code>ref</code> element with the same <code>id</code> attribute, you can simply look it up in <var>self.refs</var>.
-<div class=example><h3>Example 10.15. Using the <code>ref</code> element cache</h3><pre><code>
-    def do_xref(self, node):
-        id = node.attributes["id"].value
-        self.parse(self.randomChildElement(self.refs[id]))</code></pre><p>You'll explore the <code>randomChildElement</code> function in the next section.
-<h2 id="kgp.child">10.4. Finding direct children of a node</h2>
-<p>Another useful technique when parsing <abbr>XML</abbr> documents is finding all the direct child elements of a particular element. For instance, in the grammar files, a <code>ref</code> element can have several <code>p</code> elements, each of which can contain many things, including other <code>p</code> elements. You want to find just the <code>p</code> elements that are children of the <code>ref</code>, not <code>p</code> elements that are children of other <code>p</code> elements.
-<p>You might think you could simply use <code>getElementsByTagName</code> for this, but you can't. <code>getElementsByTagName</code> searches recursively and returns a single list for all the elements it finds. Since <code>p</code> elements can contain other <code>p</code> elements, you can't use <code>getElementsByTagName</code>, because it would return nested <code>p</code> elements that you don't want. To find only direct child elements, you'll need to do it yourself.
-<div class=example><h3>Example 10.16. Finding direct child elements</h3><pre><code>
-    def randomChildElement(self, node):
-        choices = [e for e in node.childNodes
-                   if e.nodeType == e.ELEMENT_NODE] <span>&#x2460;</span> <span>&#x2461;</span> <span>&#x2462;</span>
-        chosen = random.choice(choices)             <span>&#x2463;</span>
-        return chosen</code></pre>
-<ol>
-<li>As you saw in <a href="#kgp.parse.gettingchildnodes.example" title="Example 9.9. Getting child nodes">Example 9.9, &#8220;Getting child nodes&#8221;</a>, the <code>childNodes</code> attribute returns a list of all the child nodes of an element.
-<li>However, as you saw in <a href="#kgp.parse.childnodescanbetext.example" title="Example 9.11. Child nodes can be text">Example 9.11, &#8220;Child nodes can be text&#8221;</a>, the list returned by <code>childNodes</code> contains all different types of nodes, including text nodes. That's not what you're looking for here. You only want the
-            children that are elements.
-<li>Each node has a <var>nodeType</var> attribute, which can be <code>ELEMENT_NODE</code>, <code>TEXT_NODE</code>, <code>COMMENT_NODE</code>, or any number of other values. The complete list of possible values is in the <code>__init__.py</code> file in the <code>xml.dom</code> package. (See <a href="#kgp.packages" title="9.2. Packages">Section 9.2, &#8220;Packages&#8221;</a> for more on packages.)  But you're just interested in nodes that are elements, so you can filter the list to only include
-            those nodes whose <var>nodeType</var> is <code>ELEMENT_NODE</code>.
-<li>Once you have a list of actual elements, choosing a random one is easy. Python comes with a module called <code>random</code> which includes several useful functions. The <code>random.choice</code> function takes a list of any number of items and returns a random item. For example, if the <code>ref</code> element contains several <code>p</code> elements, then <var>choices</var> would be a list of <code>p</code> elements, and <var>chosen</var> would end up being assigned exactly one of them, selected at random.
-<h2 id="kgp.handler">10.5. Creating separate handlers by node type</h2>
-<p>The third useful <abbr>XML</abbr> processing tip involves separating your code into logical functions, based on node types and element names. Parsed <abbr>XML</abbr> documents are made up of various types of nodes, each represented by a Python object. The root level of the document itself is represented by a <code>Document</code> object. The <code>Document</code> then contains a root <code>Element</code> object (elements represent actual <abbr>XML</abbr> tags), which may in turn contain other <code>Element</code> objects, <code>Text</code> objects (for bits of text), or <code>Comment</code> objects (for embedded comments). Python makes it easy to write a dispatcher to separate the logic for each node type.
-<div class=example><h3>Example 10.17. Class names of parsed <abbr>XML</abbr> objects</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>from xml.dom import minidom</kbd>
-<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse('kant.xml')</kbd> <span>&#x2460;</span>
-<samp class=p>>>> </samp><kbd>xmldoc</kbd>
-&lt;xml.dom.minidom.Document instance at 0x01359DE8>
-<samp class=p>>>> </samp><kbd>xmldoc.__class__</kbd> <span>&#x2461;</span>
-&lt;class xml.dom.minidom.Document at 0x01105D40>
-<samp class=p>>>> </samp><kbd>xmldoc.__class__.__name__</kbd>          <span>&#x2462;</span>
-'Document'</pre>
-<ol>
-<li>Assume for a moment that <code>kant.xml</code> is in the current directory.
-<li>As you saw in <a href="#kgp.packages" title="9.2. Packages">Section 9.2, &#8220;Packages&#8221;</a>, the object returned by parsing an <abbr>XML</abbr> document is a <code>Document</code> object, as defined in <code>minidom.py</code> in the <code>xml.dom</code> package. As you saw in <a href="#fileinfo.create" title="5.4. Instantiating Classes">Section 5.4, &#8220;Instantiating Classes&#8221;</a>, <code>__class__</code> is a built-in attribute of every Python object.
-<li>Furthermore, <code>__name__</code> is a built-in attribute of every Python class, and it is a string. This string is not mysterious; it's the same as the class name you type when you define a class
-            yourself. (See <a href="#fileinfo.class" title="5.3. Defining Classes">Section 5.3, &#8220;Defining Classes&#8221;</a>.)
-<p>Fine, so now you can get the class name of any particular <abbr>XML</abbr> node (since each <abbr>XML</abbr> node is represented as a Python object). How can you use this to your advantage to separate the logic of parsing each node type?  The answer is <code>getattr</code>, which you first saw in <a href="#apihelper.getattr" title="4.4. Getting Object References With getattr">Section 4.4, &#8220;Getting Object References With getattr&#8221;</a>.
-<div class=example><h3>Example 10.18. <code>parse</code>, a generic <abbr>XML</abbr> node dispatcher</h3><pre><code>
-    def parse(self, node):          
-        parseMethod = getattr(self, "parse_%s" % node.__class__.__name__) <span>&#x2460;</span> <span>&#x2461;</span>
-        parseMethod(node) <span>&#x2462;</span></code></pre>
-<ol>
-<li>First off, notice that you're constructing a larger string based on the class name of the node you were passed (in the <var>node</var> argument). So if you're passed a <code>Document</code> node, you're constructing the string <code>'parse_Document'</code>, and so forth.
-<li>Now you can treat that string as a function name, and get a reference to the function itself using <code>getattr</code>.
-<li>Finally, you can call that function and pass the node itself as an argument. The next example shows the definitions of each of these functions.
-<div class=example><h3>Example 10.19. Functions called by the <code>parse</code> dispatcher</h3><pre><code>
-    def parse_Document(self, node): <span>&#x2460;</span>
-        self.parse(node.documentElement)
 
-    def parse_Text(self, node):    <span>&#x2461;</span>
-        text = node.data
-        if self.capitalizeNextWord:
-            self.pieces.append(text[0].upper())
-            self.pieces.append(text[1:])
-            self.capitalizeNextWord = 0
-        else:
-            self.pieces.append(text)
 
-    def parse_Comment(self, node): <span>&#x2462;</span>
-        pass
 
-    def parse_Element(self, node): <span>&#x2463;</span>
-        handlerMethod = getattr(self, "do_%s" % node.tagName)
-            handlerMethod(node)</code></pre>
-<ol>
-<li><code>parse_Document</code> is only ever called once, since there is only one <code>Document</code> node in an <abbr>XML</abbr> document, and only one <code>Document</code> object in the parsed <abbr>XML</abbr> representation. It simply turns around and parses the root element of the grammar file.
-<li><code>parse_Text</code> is called on nodes that represent bits of text. The function itself does some special processing to handle automatic capitalization
-            of the first word of a sentence, but otherwise simply appends the represented text to a list.
-<li><code>parse_Comment</code> is just a <code>pass</code>, since you don't care about embedded comments in the grammar files. Note, however, that you still need to define the function
-            and explicitly make it do nothing. If the function did not exist, the generic <code>parse</code> function would fail as soon as it stumbled on a comment, because it would try to find the non-existent <code>parse_Comment</code> function. Defining a separate function for every node type, even ones you don't use, allows the generic <code>parse</code> function to stay simple and dumb.
-<li>The <code>parse_Element</code> method is actually itself a dispatcher, based on the name of the element's tag. The basic idea is the same: take what distinguishes
-            elements from each other (their tag names) and dispatch to a separate function for each of them. You construct a string like
-<code>'do_xref'</code> (for an <code>&lt;xref></code> tag), find a function of that name, and call it. And so forth for each of the other tag names that might be found in the
-            course of parsing a grammar file (<code>&lt;p></code> tags, <code>&lt;choice></code> tags).
-<p>In this example, the dispatch functions <code>parse</code> and <code>parse_Element</code> simply find other methods in the same class. If your processing is very complex (or you have many different tag names),
-you could break up your code into separate modules, and use dynamic importing to import each module and call whatever functions
-you needed. Dynamic importing will be discussed in <a href="#regression" title="Chapter 16. Functional Programming">Chapter 16, <i>Functional Programming</i></a>.
+
+
+[more XML stuff was here]
+
+
+
+
+
 <h2 id="kgp.commandline">10.6. Handling command-line arguments</h2>
 <p>Python fully supports creating programs that can be run on the command line, complete with command-line arguments and either short-
    or long-style flags to specify various options. None of this is <abbr>XML</abbr>-specific, but this script makes good use of command-line processing, so it seemed like a good time to mention it.
@@ -4578,184 +2364,11 @@ def main(argv):
 
 
 
-<div class=chapter>
-<h2 id="roman">Chapter 13. Unit Testing</h2>
-<h2 id="roman.intro">13.1. Introduction to Roman numerals</h2>
-<p>In previous chapters, you &#8220;dived in&#8221; by immediately looking at code and trying to understand it as quickly as possible. Now that you have some Python under your belt, you're going to step back and look at the steps that happen <em>before</em> the code gets written.
-<p>In the next few chapters, you're going to write, debug, and optimize a set of utility functions to convert to and from Roman
-numerals. You saw the mechanics of constructing and validating Roman numerals in <a href="#re.roman" title="7.3. Case Study: Roman Numerals">Section 7.3, &#8220;Case Study: Roman Numerals&#8221;</a>, but now let's step back and consider what it would take to expand that into a two-way utility.
-<p><a href="#re.roman" title="7.3. Case Study: Roman Numerals">The rules for Roman numerals</a> lead to a number of interesting observations:
-<div class=orderedlist>
-<ol>
-<li>There is only one correct way to represent a particular number as Roman numerals.
-<li>The converse is also true: if a string of characters is a valid Roman numeral, it represents only one number (<i class=foreignphrase><abbr>i.e.</abbr></i> it can only be read one way).
+[unit testing stuff was here]
 
-<li>There is a limited range of numbers that can be expressed as Roman numerals, specifically <code>1</code> through <code>3999</code>. (The Romans did have several ways of expressing larger numbers, for instance by having a bar over a numeral to represent
-      that its normal value should be multiplied by <code>1000</code>, but you're not going to deal with that. For the purposes of this chapter, let's stipulate that Roman numerals go from <code>1</code> to <code>3999</code>.)
 
-<li>There is no way to represent <code>0</code> in Roman numerals. (Amazingly, the ancient Romans had no concept of <code>0</code> as a number. Numbers were for counting things you had; how can you count what you don't have?)
 
-<li>There is no way to represent negative numbers in Roman numerals.
-<li>There is no way to represent fractions or non-integer numbers in Roman numerals.
-</ol>
-<p>Given all of this, what would you expect out of a set of functions to convert to and from Roman numerals?
-<div class=orderedlist><h3 id="roman.requirements"><code>roman.py</code> requirements</h3>
-<ol>
-<li><code>to_roman()</code> should return the Roman numeral representation for all integers <code>1</code> to <code>3999</code>.
 
-<li><code>to_roman()</code> should fail when given an integer outside the range <code>1</code> to <code>3999</code>.
-
-<li><code>to_roman()</code> should fail when given a non-integer number.
-
-<li><code>from_roman()</code> should take a valid Roman numeral and return the number that it represents.
-
-<li><code>from_roman()</code> should fail when given an invalid Roman numeral.
-
-<li>If you take a number, convert it to Roman numerals, then convert that back to a number, you should end up with the number
-      you started with. So <code>from_roman(to_roman(n)) == n</code> for all <var>n</var> in <code>1..3999</code>.
-
-<li><code>to_roman()</code> should always return a Roman numeral using uppercase letters.
-
-<li><code>from_roman()</code> should only accept uppercase Roman numerals (<i class=foreignphrase><abbr>i.e.</abbr></i> it should fail when given lowercase input).
-
-</ol>
-<div class=itemizedlist>
-<h3>Further reading</h3>
-<ul>
-<li><a href="http://www.wilkiecollins.demon.co.uk/roman/front.htm">This site</a> has more on Roman numerals, including a fascinating <a href="http://www.wilkiecollins.demon.co.uk/roman/intro.htm">history</a> of how Romans and other civilizations really used them (short answer: haphazardly and inconsistently).
-
-</ul>
-<h2 id="roman.failure">13.5. Testing for failure</h2>
-<p>It is not enough to test that functions succeed when given good input; you must also test that they fail when given bad input. And not just any sort of failure; they must fail in the way you expect.
-<p>Remember the <a href="#roman.requirements">other requirements</a> for <code>to_roman()</code>:
-<div class=orderedlist>
-<ol start="2">
-<li><code>to_roman()</code> should fail when given an integer outside the range <code>1</code> to <code>3999</code>.
-
-<li><code>to_roman()</code> should fail when given a non-integer number.
-
-</ol>
-<p>In Python, functions indicate failure by raising <a href="#fileinfo.exception" title="6.1. Handling Exceptions">exceptions</a>, and the <code>unittest</code> module provides methods for testing whether a function raises a particular exception when given bad input.
-<div class=example><h3 id="roman.tobadinput.example">Example 13.3. Testing bad input to <code>to_roman()</code></h3><pre><code>
-class ToRomanBadInput(unittest.TestCase):          
-    def testTooLarge(self):      
-        """to_roman should fail with large input""" 
-        self.assertRaises(roman.OutOfRangeError, roman.to_roman, 4000) <span>&#x2460;</span>
-
-    def testZero(self):          
-        """to_roman should fail with 0 input"""     
-        self.assertRaises(roman.OutOfRangeError, roman.to_roman, 0)    <span>&#x2461;</span>
-
-    def testNegative(self):      
-        """to_roman should fail with negative input"""                
-        self.assertRaises(roman.OutOfRangeError, roman.to_roman, -1)  
-
-    def testNonInteger(self):    
-        """to_roman should fail with non-integer input"""             
-        self.assertRaises(roman.NotIntegerError, roman.to_roman, 0.5)  <span>&#x2462;</span></pre>
-<ol>
-<li>The <code>TestCase</code> class of the <code>unittest</code> module provides the <code>assertRaises</code> method, which takes the following arguments: the exception you're expecting, the function you're testing, and the arguments
-            you're passing that function. (If the function you're testing takes more than one argument, pass them all to <code>assertRaises</code>, in order, and it will pass them right along to the function you're testing.)  Pay close attention to what you're doing here:
-            instead of calling <code>to_roman()</code> directly and manually checking that it raises a particular exception (by wrapping it in a <a href="#fileinfo.exception" title="6.1. Handling Exceptions"><code>try...except</code> block</a>), <code>assertRaises</code> has encapsulated all of that for us. All you do is give it the exception (<code>roman.OutOfRangeError</code>), the function (<code>to_roman()</code>), and <code>to_roman()</code>'s arguments (<code>4000</code>), and <code>assertRaises</code> takes care of calling <code>to_roman()</code> and checking to make sure that it raises <code>roman.OutOfRangeError</code>. (Also note that you're passing the <code>to_roman()</code> function itself as an argument; you're not calling it, and you're not passing the name of it as a string. Have I mentioned
-            recently how handy it is that <a href="#odbchelper.objects" title="2.4. Everything Is an Object">everything in Python is an object</a>, including functions and exceptions?)
-<li>Along with testing numbers that are too large, you need to test numbers that are too small. Remember, Roman numerals cannot
-            express <code>0</code> or negative numbers, so you have a test case for each of those (<code>testZero</code> and <code>testNegative</code>). In <code>testZero</code>, you are testing that <code>to_roman()</code> raises a <code>roman.OutOfRangeError</code> exception when called with <code>0</code>; if it does <em>not</em> raise a <code>roman.OutOfRangeError</code> (either because it returns an actual value, or because it raises some other exception), this test is considered failed.
-<li><a href="#roman.requirements">Requirement #3</a> specifies that <code>to_roman()</code> cannot accept a non-integer number, so here you test to make sure that <code>to_roman()</code> raises a <code>roman.NotIntegerError</code> exception when called with <code>0.5</code>. If <code>to_roman()</code> does not raise a <code>roman.NotIntegerError</code>, this test is considered failed.
-<p>The next two <a href="#roman.requirements">requirements</a> are similar to the first three, except they apply to <code>from_roman()</code> instead of <code>to_roman()</code>:
-<div class=orderedlist>
-<ol start="4">
-<li><code>from_roman()</code> should take a valid Roman numeral and return the number that it represents.
-
-<li><code>from_roman()</code> should fail when given an invalid Roman numeral.
-
-</ol>
-<p>Requirement #4 is handled in the same way as <a href="#roman.testtoromanknownvalues.example" title="Example 13.2. testToRomanKnownValues">requirement #1</a>, iterating through a sampling of known values and testing each in turn. Requirement #5 is handled in the same way as requirements
-#2 and #3, by testing a series of bad inputs and making sure <code>from_roman()</code> raises the appropriate exception.
-<div class=example><h3 id="roman.frombadinput.example">Example 13.4. Testing bad input to <code>from_roman()</code></h3><pre><code>
-class FromRomanBadInput(unittest.TestCase):  
-    def testTooManyRepeatedNumerals(self):   
-        """from_roman should fail with too many repeated numerals"""              
-        for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):             
-            self.assertRaises(roman.InvalidRomanNumeralError, roman.from_roman, s) <span>&#x2460;</span>
-
-    def testRepeatedPairs(self):             
-        """from_roman should fail with repeated pairs of numerals"""              
-        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):               
-            self.assertRaises(roman.InvalidRomanNumeralError, roman.from_roman, s)
-
-    def testMalformedAntecedent(self):       
-        """from_roman should fail with malformed antecedents""" 
-        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
-'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):     
-            self.assertRaises(roman.InvalidRomanNumeralError, roman.from_roman, s)</pre>
-<ol>
-<li>Not much new to say about these; the pattern is exactly the same as the one you used to test bad input to <code>to_roman()</code>. I will briefly note that you have another exception: <code>roman.InvalidRomanNumeralError</code>. That makes a total of three custom exceptions that will need to be defined in <code>roman.py</code> (along with <code>roman.OutOfRangeError</code> and <code>roman.NotIntegerError</code>). You'll see how to define these custom exceptions when you actually start writing <code>roman.py</code>, later in this chapter.
-<h2 id="roman.sanity">13.6. Testing for sanity</h2>
-<p>Often, you will find that a unit of code contains a set of reciprocal functions, usually in the form of conversion functions
-   where one converts A to B and the other converts B to A. In these cases, it is useful to create a &#8220;sanity check&#8221; to make sure that you can convert A to B and back to A without losing precision, incurring rounding errors, or triggering
-   any other sort of bug.
-<p>Consider this <a href="#roman.requirements">requirement</a>:
-<div class=orderedlist>
-<ol start="6">
-<li>If you take a number, convert it to Roman numerals, then convert that back to a number, you should end up with the number
-      you started with. So <code>from_roman(to_roman(n)) == n</code> for all <var>n</var> in <code>1..3999</code>.
-
-</ol>
-<div class=example><h3 id="roman.sanity.example">Example 13.5. Testing <code>to_roman()</code> against <code>from_roman()</code></h3><pre><code>
-class SanityCheck(unittest.TestCase):        
-    def testSanity(self):  
-        """from_roman(to_roman(n))==n for all n"""
-        for integer in range(1, 4000):        <span>&#x2460;</span> <span>&#x2461;</span>
-            numeral = roman.to_roman(integer) 
-            result = roman.from_roman(numeral)
-            self.assertEqual(integer, result) <span>&#x2462;</span></pre>
-<ol>
-<li>You've seen <a href="#odbchelper.multiassign.range" title="Example 3.20. Assigning Consecutive Values">the <code>range</code> function</a> before, but here it is called with two arguments, which returns a list of integers starting at the first argument (<code>1</code>) and counting consecutively up to <em>but not including</em> the second argument (<code>4000</code>). Thus, <code>1..3999</code>, which is the valid range for converting to Roman numerals.
-<li>I just wanted to mention in passing that <var>integer</var> is not a keyword in Python; here it's just a variable name like any other.
-<li>The actual testing logic here is straightforward: take a number (<var>integer</var>), convert it to a Roman numeral (<var>numeral</var>), then convert it back to a number (<var>result</var>) and make sure you end up with the same number you started with. If not, <code>assertEqual</code> will raise an exception and the test will immediately be considered failed. If all the numbers match, <code>assertEqual</code> will always return silently, the entire <code>testSanity</code> method will eventually return silently, and the test will be considered passed.
-<p>The <a href="#roman.requirements">last two requirements</a> are different from the others because they seem both arbitrary and trivial:
-<div class=orderedlist>
-<ol start="7">
-<li><code>to_roman()</code> should always return a Roman numeral using uppercase letters.
-
-<li><code>from_roman()</code> should only accept uppercase Roman numerals (<i class=foreignphrase><abbr>i.e.</abbr></i> it should fail when given lowercase input).
-
-</ol>
-<p>In fact, they are somewhat arbitrary. You could, for instance, have stipulated that <code>from_roman()</code> accept lowercase and mixed case input. But they are not completely arbitrary; if <code>to_roman()</code> is always returning uppercase output, then <code>from_roman()</code> must at least accept uppercase input, or the &#8220;sanity check&#8221; (requirement #6) would fail. The fact that it <em>only</em> accepts uppercase input is arbitrary, but as any systems integrator will tell you, case always matters, so it's worth specifying
-the behavior up front. And if it's worth specifying, it's worth testing.
-<div class=example><h3>Example 13.6. Testing for case</h3><pre><code>
-class CaseCheck(unittest.TestCase): 
-    def testToRomanCase(self):      
-        """to_roman should always return uppercase"""  
-        for integer in range(1, 4000):                
-            numeral = roman.to_roman(integer)          
-            self.assertEqual(numeral, numeral.upper())         <span>&#x2460;</span>
-
-    def testFromRomanCase(self):    
-        """from_roman should only accept uppercase input"""
-        for integer in range(1, 4000):                
-            numeral = roman.to_roman(integer)          
-            roman.from_roman(numeral.upper()) <span>&#x2461;</span> <span>&#x2462;</span>
-            self.assertRaises(roman.InvalidRomanNumeralError,
-            roman.from_roman, numeral.lower())   <span>&#x2463;</span></pre>
-<ol>
-<li>The most interesting thing about this test case is all the things it doesn't test. It doesn't test that the value returned
-            from <code>to_roman()</code> is <a href="#roman.testtoromanknownvalues.example" title="Example 13.2. testToRomanKnownValues">right</a> or even <a href="#roman.sanity.example" title="Example 13.5. Testing to_roman against from_roman">consistent</a>; those questions are answered by separate test cases. You have a whole test case just to test for uppercase-ness. You might
-            be tempted to combine this with the <a href="#roman.sanity.example" title="Example 13.5. Testing to_roman against from_roman">sanity check</a>, since both run through the entire range of values and call <code>to_roman()</code>.
-<sup>[<a name="d0e32781" href="#ftn.d0e32781">6</a>]</sup>  But that would violate one of the <a href="#roman.success" title="13.4. Testing for success">fundamental rules</a>: each test case should answer only a single question. Imagine that you combined this case check with the sanity check, and
-            then that test case failed. You would need to do further analysis to figure out which part of the test case failed to determine
-            what the problem was. If you need to analyze the results of your unit testing just to figure out what they mean, it's a sure
-            sign that you've mis-designed your test cases.
-<li>There's a similar lesson to be learned here: even though &#8220;you know&#8221; that <code>to_roman()</code> always returns uppercase, you are explicitly converting its return value to uppercase here to test that <code>from_roman()</code> accepts uppercase input. Why?  Because the fact that <code>to_roman()</code> always returns uppercase is an independent requirement. If you changed that requirement so that, for instance, it always
-            returned lowercase, the <code>testToRomanCase</code> test case would need to change, but this test case would still work. This was another of the <a href="#roman.success" title="13.4. Testing for success">fundamental rules</a>: each test case must be able to work in isolation from any of the others. Every test case is an island.
-<li>Note that you're not assigning the return value of <code>from_roman()</code> to anything. This is legal syntax in Python; if a function returns a value but nobody's listening, Python just throws away the return value. In this case, that's what you want. This test case doesn't test anything about the return
-            value; it just tests that <code>from_roman()</code> accepts the uppercase input without raising an exception.
-<li>This is a complicated line, but it's very similar to what you did in the <code>ToRomanBadInput</code> and <code>FromRomanBadInput</code> tests. You are testing to make sure that calling a particular function (<code>roman.from_roman</code>) with a particular value (<code>numeral.lower()</code>, the lowercase version of the current Roman numeral in the loop) raises a particular exception (<code>roman.InvalidRomanNumeralError</code>). If it does (each time through the loop), the test passes; if even one time it does something else (like raising a different
-            exception, or returning a value without raising an exception at all), the test fails.
-<p>In the next chapter, you'll see how to write code that passes these tests.
-<div class=footnotes><br><hr width="100" align="left">
-<div class=footnote>
-<p><sup>[<a name="ftn.d0e32781" href="#d0e32781">6</a>] </sup>&#8220;I can resist everything except temptation.&#8221; --Oscar Wilde
 <div class=chapter>
 <h2 id="roman1.5">Chapter 14. Test-First Programming</h2>
 <h2 id="roman.stage1">14.1. <code>roman.py</code>, stage 1</h2>
@@ -5478,11 +3091,17 @@ OK     </span><span>&#x2463;</span></pre>
 <table class=note border="0" summary="">
 
 <td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">When all of your tests pass, stop coding.
-<div class=chapter>
-<h2 id="regression">Chapter 16. Functional Programming</h2>
-<h2 id="regression.divein">16.1. Diving in</h2>
-<p>In <a href="#roman" title="Chapter 13. Unit Testing">Chapter 13, <i>Unit Testing</i></a>, you learned about the philosophy of unit testing. In <a href="#roman1.5" title="Chapter 14. Test-First Programming">Chapter 14, <i>Test-First Programming</i></a>, you stepped through the implementation of basic unit tests in Python. In <a href="#roman2" title="Chapter 15. Refactoring">Chapter 15, <i>Refactoring</i></a>, you saw how unit testing makes large-scale refactoring easier. This chapter will build on those sample programs, but here
-   we will focus more on advanced Python-specific techniques, rather than on unit testing itself.
+
+
+
+
+
+[functional programming stuff was here]
+
+
+
+
+
 <p>The following is a complete Python program that acts as a cheap and simple regression testing framework. It takes unit tests that you've written for individual
 modules, collects them all into one big test suite, and runs them all at once. I actually use this script as part of the
 build process for this book; I have unit tests for several of the example programs (not just the <code>roman.py</code> module featured in <a href="#roman" title="Chapter 13. Unit Testing">Chapter 13, <i>Unit Testing</i></a>), and the first thing my automated build script does is run this program to make sure all my examples still work. If this
@@ -5557,6 +3176,11 @@ OK</span></pre>
 <li>The first 5 tests are from <code>apihelpertest.py</code>, which tests the example script from <a href="#apihelper" title="Chapter 4. The Power Of Introspection">Chapter 4, <i>The Power Of Introspection</i></a>.
 <li>The next 5 tests are from <code>odbchelpertest.py</code>, which tests the example script from <a href="#odbchelper" title="Chapter 2. Your First Python Program">Chapter 2, <i>Your First Python Program</i></a>.
 <li>The rest are from <code>romantest.py</code>, which you studied in depth in <a href="#roman" title="Chapter 13. Unit Testing">Chapter 13, <i>Unit Testing</i></a>.
+
+
+
+
+
 <h2 id="regression.path">16.2. Finding the path</h2>
 <p>When running Python scripts from the command line, it is sometimes useful to know where the currently running script is located on disk.
 <p>This is one of those obscure little tricks that is virtually impossible to figure out on your own, but simple to remember
@@ -5642,116 +3266,17 @@ def regressionTest():
 <li>The rest of the function is the same.
 <p>This technique will allow you to re-use this <code>regression.py</code> script on multiple projects. Just put the script in a common directory, then change to the project's directory before running
    it. All of that project's unit tests will be found and tested, instead of the unit tests in the common directory where <code>regression.py</code> is located.
-<h2 id="regression.filter">16.3. Filtering lists revisited</h2>
-<p>You're already familiar with <a href="#apihelper.filter" title="4.5. Filtering Lists">using list comprehensions to filter lists</a>. There is another way to accomplish this same thing, which some people feel is more expressive.
-<p>Python has a built-in <code>filter</code> function which takes two arguments, a function and a list, and returns a list.
-<sup>[<a name="d0e35697" href="#ftn.d0e35697">7</a>]</sup>  The function passed as the first argument to <code>filter</code> must itself take one argument, and the list that <code>filter</code> returns will contain all the elements from the list passed to <code>filter</code> for which the function passed to <code>filter</code> returns true.
-<p>Got all that?  It's not as difficult as it sounds.
-<div class=example><h3>Example 16.7. Introducing <code>filter</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>def odd(n):</kbd>                 <span>&#x2460;</span>
-<samp class=p>...    </samp>return n % 2
-<samp class=p>...    </samp>
-<samp class=p>>>> </samp><kbd>li = [1, 2, 3, 5, 9, 10, 256, -3]</kbd>
-<samp class=p>>>> </samp><kbd>filter(odd, li)</kbd>             <span>&#x2461;</span>
-[1, 3, 5, 9, -3]
-<samp class=p>>>> </samp><kbd>[e for e in li if odd(e)]</kbd>   <span>&#x2462;</span>
-[1, 3, 5, 9, -3]
-<samp class=p>>>> </samp><kbd>filteredList = []</kbd>
-<samp class=p>>>> </samp><kbd>for n in li:</kbd>                <span>&#x2463;</span>
-<samp class=p>...    </samp>if odd(n):
-<samp class=p>...    </samp>    filteredList.append(n)
-<samp class=p>...    </samp>
-<samp class=p>>>> </samp><kbd>filteredList</kbd>
-[1, 3, 5, 9, -3]</pre>
-<ol>
-<li><code>odd</code> uses the built-in modulo operator &#8220;<code>%</code>&#8221; to return <code>1</code> (a true value) if <var>n</var> is odd and <code>0</code> (a false value) if <var>n</var> is even.
-<li><code>filter</code> takes two arguments, a function (<code>odd</code>) and a list (<var>li</var>). It loops through the list and calls <code>odd</code> with each element. If <code>odd</code> returns a true value (remember, any non-zero value is true in Python), then the element is included in the returned list, otherwise it is filtered out. The result is a list of only the odd
-            numbers from the original list, in the same order as they appeared in the original.
-<li>You could accomplish the same thing using list comprehensions, as you saw in <a href="#apihelper.filter" title="4.5. Filtering Lists">Section 4.5, &#8220;Filtering Lists&#8221;</a>.
-<li>You could also accomplish the same thing with a <code>for</code> loop. Depending on your programming background, this may seem more &#8220;straightforward&#8221;, but functions like <code>filter</code> are much more expressive. Not only is it easier to write, it's easier to read, too. Reading the <code>for</code> loop is like standing too close to a painting; you see all the details, but it may take a few seconds to be able to step
-            back and see the bigger picture: &#8220;Oh, you're just filtering the list!&#8221;
-<div class=example><h3>Example 16.8. <code>filter</code> in <code>regression.py</code></h3><pre><code>
-    files = os.listdir(path)              <span>&#x2460;</span>
-    test = re.compile(r"test\.py$", re.IGNORECASE)          <span>&#x2461;</span>
-    files = filter(test.search, files)    <span>&#x2462;</span></pre>
-<ol>
-<li>As you saw in <a href="#regression.path" title="16.2. Finding the path">Section 16.2, &#8220;Finding the path&#8221;</a>, <var>path</var> may contain the full or partial pathname of the directory of the currently running script, or it may contain an empty string
-            if the script is being run from the current directory. Either way, <var>files</var> will end up with the names of the files in the same directory as this script you're running.
-<li>This is a compiled regular expression. As you saw in <a href="#roman.refactoring" title="15.3. Refactoring">Section 15.3, &#8220;Refactoring&#8221;</a>, if you're going to use the same regular expression over and over, you should compile it for faster performance. The compiled
-            object has a <code>search</code> method which takes a single argument, the string to search. If the regular expression matches the string, the <code>search</code> method returns a <code>Match</code> object containing information about the regular expression match; otherwise it returns <code>None</code>, the Python null value.
-<li>For each element in the <var>files</var> list, you're going to call the <code>search</code> method of the compiled regular expression object, <var>test</var>. If the regular expression matches, the method will return a <code>Match</code> object, which Python considers to be true, so the element will be included in the list returned by <code>filter</code>. If the regular expression does not match, the <code>search</code> method will return <code>None</code>, which Python considers to be false, so the element will not be included.
-<p><b>Historical note. </b>Versions of Python prior to 2.0 did not have <a href="#odbchelper.map" title="3.6. Mapping Lists">list comprehensions</a>, so you couldn't <a href="#apihelper.filter" title="4.5. Filtering Lists">filter using list comprehensions</a>; the <code>filter</code> function was the only game in town. Even with the introduction of list comprehensions in 2.0, some people still prefer the
-old-style <code>filter</code> (and its companion function, <code>map</code>, which you'll see later in this chapter). Both techniques work at the moment, so which one you use is a matter of style.
-There is discussion that <code>map</code> and <code>filter</code> might be deprecated in a future version of Python, but no decision has been made.
-<div class=example><h3>Example 16.9. Filtering using list comprehensions instead</h3><pre><code>
-    files = os.listdir(path)             
-    test = re.compile(r"test\.py$", re.IGNORECASE)         
-    files = [f for f in files if test.search(f)] <span>&#x2460;</span></pre>
-<ol>
-<li>This will accomplish exactly the same result as using the <code>filter</code> function. Which way is more expressive?  That's up to you.
-<h2 id="regression.map">16.4. Mapping lists revisited</h2>
-<p>You're already familiar with using <a href="#odbchelper.map" title="3.6. Mapping Lists">list comprehensions</a> to map one list into another. There is another way to accomplish the same thing, using the built-in <code>map</code> function. It works much the same way as the <a href="#regression.filter" title="16.3. Filtering lists revisited"><code>filter</code></a> function.
-<div class=example><h3>Example 16.10. Introducing <code>map</code></h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>def double(n):</kbd>
-<samp class=p>...    </samp>return n*2
-<samp class=p>...    </samp>
-<samp class=p>>>> </samp><kbd>li = [1, 2, 3, 5, 9, 10, 256, -3]</kbd>
-<samp class=p>>>> </samp><kbd>map(double, li)</kbd>     <span>&#x2460;</span>
-[2, 4, 6, 10, 18, 20, 512, -6]
-<samp class=p>>>> </samp><kbd>[double(n) for n in li]</kbd>               <span>&#x2461;</span>
-[2, 4, 6, 10, 18, 20, 512, -6]
-<samp class=p>>>> </samp><kbd>newlist = []</kbd>
-<samp class=p>>>> </samp><kbd>for n in li:</kbd>        <span>&#x2462;</span>
-<samp class=p>...    </samp>newlist.append(double(n))
-<samp class=p>...    </samp>
-<samp class=p>>>> </samp><kbd>newlist</kbd>
-[2, 4, 6, 10, 18, 20, 512, -6]</pre>
-<ol>
-<li><code>map</code> takes a function and a list<sup>[<a name="d0e36079" href="#ftn.d0e36079">8</a>]</sup> and returns a new list by calling the function with each element of the list in order. In this case, the function simply
-            multiplies each element by 2.
-<li>You could accomplish the same thing with a list comprehension. List comprehensions were first introduced in Python 2.0; <code>map</code> has been around forever.
-<li>You could, if you insist on thinking like a Visual Basic programmer, use a <code>for</code> loop to accomplish the same thing.
-<div class=example><h3>Example 16.11. <code>map</code> with lists of mixed datatypes</h3><pre class=screen>
-<samp class=p>>>> </samp><kbd>li = [5, 'a', (2, 'b')]</kbd>
-<samp class=p>>>> </samp><kbd>map(double, li)</kbd>     <span>&#x2460;</span>
-[10, 'aa', (2, 'b', 2, 'b')]</pre>
-<ol>
-<li>As a side note, I'd like to point out that <code>map</code> works just as well with lists of mixed datatypes, as long as the function you're using correctly handles each type. In this
-            case, the <code>double</code> function simply multiplies the given argument by 2, and Python Does The Right Thing depending on the datatype of the argument. For integers, this means actually multiplying it by 2; for
-            strings, it means concatenating the string with itself; for tuples, it means making a new tuple that has all of the elements
-            of the original, then all of the elements of the original again.
-<p>All right, enough play time. Let's look at some real code.
-<div class=example><h3>Example 16.12. <code>map</code> in <code>regression.py</code></h3><pre><code>
-    filenameToModuleName = lambda f: os.path.splitext(f)[0] <span>&#x2460;</span>
-    moduleNames = map(filenameToModuleName, files)          <span>&#x2461;</span></pre>
-<ol>
-<li>As you saw in <a href="#apihelper.lambda" title="4.7. Using lambda Functions">Section 4.7, &#8220;Using lambda Functions&#8221;</a>, <code>lambda</code> defines an inline function. And as you saw in <a href="#splittingpathnames.example" title="Example 6.17. Splitting Pathnames">Example 6.17, &#8220;Splitting Pathnames&#8221;</a>, <code>os.path.splitext</code> takes a filename and returns a tuple <code>(<var>name</var>, <var>extension</var>)</code>. So <code>filenameToModuleName</code> is a function which will take a filename and strip off the file extension, and return just the name.
-<li>Calling <code>map</code> takes each filename listed in <var>files</var>, passes it to the function <code>filenameToModuleName</code>, and returns a list of the return values of each of those function calls. In other words, you strip the file extension off
-            of each filename, and store the list of all those stripped filenames in <var>moduleNames</var>.
-<p>As you'll see in the rest of the chapter, you can extend this type of data-centric thinking all the way to the final goal,
-which is to define and execute a single test suite that contains the tests from all of those individual test suites.
-<h2 id="regression.datacentric">16.5. Data-centric programming</h2>
-<p>By now you're probably scratching your head wondering why this is better than using <code>for</code> loops and straight function calls. And that's a perfectly valid question. Mostly, it's a matter of perspective. Using
-<code>map</code> and <code>filter</code> forces you to center your thinking around your data.
-<p>In this case, you started with no data at all; the first thing you did was <a href="#regression.path" title="16.2. Finding the path">get the directory path</a> of the current script, and got a list of files in that directory. That was the bootstrap, and it gave you real data to work
-with: a list of filenames.
-<p>However, you knew you didn't care about all of those files, only the ones that were actually test suites. You had <em>too much data</em>, so you needed to <code>filter</code> it. How did you know which data to keep?  You needed a test to decide, so you defined one and passed it to the <code>filter</code> function. In this case you used a regular expression to decide, but the concept would be the same regardless of how you
-constructed the test.
-<p>Now you had the filenames of each of the test suites (and only the test suites, since everything else had been filtered out),
-but you really wanted module names instead. You had the right amount of data, but it was <em>in the wrong format</em>. So you defined a function that would transform a single filename into a module name, and you mapped that function onto
-the entire list. From one filename, you can get a module name; from a list of filenames, you can get a list of module names.
-<p>Instead of <code>filter</code>, you could have used a <code>for</code> loop with an <code>if</code> statement. Instead of <code>map</code>, you could have used a <code>for</code> loop with a function call. But using <code>for</code> loops like that is busywork. At best, it simply wastes time; at worst, it introduces obscure bugs. For instance, you need
-to figure out how to test for the condition &#8220;is this file a test suite?&#8221; anyway; that's the application-specific logic, and no language can write that for us. But once you've figured that out,
-do you really want go to all the trouble of defining a new empty list and writing a <code>for</code> loop and an <code>if</code> statement and manually calling <code>append</code> to add each element to the new list if it passes the condition and then keeping track of which variable holds the new filtered
-data and which one holds the old unfiltered data?  Why not just define the test condition, then let Python do the rest of that work for us?
-<p>Oh sure, you could try to be fancy and delete elements in place without creating a new list. But you've been burned by that
-before. Trying to modify a data structure that you're looping through can be tricky. You delete an element, then loop to
-the next element, and suddenly you've skipped one. Is Python one of the languages that works that way?  How long would it take you to figure it out?  Would you remember for certain whether
-it was safe the next time you tried?  Programmers spend so much time and make so many mistakes dealing with purely technical
-issues like this, and it's all pointless. It doesn't advance your program at all; it's just busywork.
-<p>I resisted list comprehensions when I first learned Python, and I resisted <code>filter</code> and <code>map</code> even longer. I insisted on making my life more difficult, sticking to the familiar way of <code>for</code> loops and <code>if</code> statements and step-by-step code-centric programming. And my Python programs looked a lot like Visual Basic programs, detailing every step of every operation in every function. And they had all the same types of little problems
-and obscure bugs. And it was all pointless.
-<p>Let it all go. Busywork code is not important. Data is important. And data is not difficult. It's only data. If you have
-too much, filter it. If it's not what you want, map it. Focus on the data; leave the busywork behind.
+
+
+
+
+
+[more functional programming stuff was here]
+
+
+
+
+
 <h2 id="regression.import">16.6. Dynamically importing modules</h2>
 <p>OK, enough philosophizing. Let's talk about dynamically importing modules.
 <p>First, let's look at how you normally import modules. The <code>import <var>module</var></code> syntax looks in the search path for the named module and imports it by name. You can even import multiple modules at once
@@ -5924,6 +3449,16 @@ if __name__ == "__main__":
 <p><sup>[<a name="ftn.d0e35697" href="#d0e35697">7</a>] </sup>Technically, the second argument to <code>filter</code> can be any sequence, including lists, tuples, and custom classes that act like lists by defining the <code>__getitem__</code> special method. If possible, <code>filter</code> will return the same datatype as you give it, so filtering a list returns a list, but filtering a tuple returns a tuple.
 <div class=footnote>
 <p><sup>[<a name="ftn.d0e36079" href="#d0e36079">8</a>] </sup>Again, I should point out that <code>map</code> can take a list, a tuple, or any object that acts like a sequence. See previous footnote about <code>filter</code>.
+
+
+
+
+
+
+
+
+
+
 <div class=chapter>
 <h2 id="soundex">Chapter 18. Performance Tuning</h2>
 <p>Performance tuning is a many-splendored thing. Just because Python is an interpreted language doesn't mean you shouldn't worry about code optimization. But don't worry about it <em>too</em> much.
diff --git a/examples/beauregard-100x100.jpg b/examples/beauregard-100x100.jpg
new file mode 100644
index 0000000..5f004a5
Binary files /dev/null and b/examples/beauregard-100x100.jpg differ
diff --git a/files.html b/files.html
index 6696971..ed89858 100644
--- a/files.html
+++ b/files.html
@@ -26,6 +26,399 @@ body{counter-reset:h1 12}
 OK, so a string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a &#8220;text file&#8221; from disk, how does Python convert that sequence of bytes into a sequence of characters? The answer is that it decodes the bytes according to a specific character encoding algorithm, and returns a sequence of Unicode characters, otherwise known as a string.
 -->
 
+<h2 id=file-objects>File Objects</h2>
+
+<p>Python has a built-in function, <code>open()</code>, for opening a file on disk. The <code>open()</code> function returns a <i>file object</i>, which has methods and attributes for getting information about and manipulating the file.
+
+<pre>
+>>> image = open('examples/beauregard-100x100.jpg', 'rb')
+>>> image
+&lt;io.BufferedReader object at 0x00C7A390>
+>>> image.mode
+'rb'
+>>> image.name
+'examples/beauregard-100x100.jpg'
+</pre>
+<pre class=screen><samp class=p>>>> </samp><kbd>f = open("/music/_singles/kairo.mp3", "rb")</kbd> <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>f</kbd>       <span>&#x2461;</span>
+&lt;open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
+<samp class=p>>>> </samp><kbd>f.mode</kbd>  <span>&#x2462;</span>
+'rb'
+<samp class=p>>>> </samp><kbd>f.name</kbd>  <span>&#x2463;</span>
+'/music/_singles/kairo.mp3'</pre>
+<ol>
+<li>The <code>open</code> function can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename, is required; the other two are <a href="#apihelper.optional" title="4.2. Using Optional and Named Arguments">optional</a>. If not specified, the file is opened for reading in text mode. Here you are opening the file for reading in binary mode. (<code>print open.__doc__</code> displays a great explanation of all the possible modes.)
+<li>The <code>open</code> function returns an object (by now, <a href="#odbchelper.objects" title="2.4. Everything Is an Object">this should not surprise you</a>). A file object has several useful attributes.
+<li>The <var>mode</var> attribute of a file object tells you in which mode the file was opened.
+<li>The <var>name</var> attribute of a file object tells you the name of the file that the file object has open.
+<h3>6.2.1. Reading Files</h3>
+<p>After you open a file, the first thing you'll want to do is read from it, as shown in the next example.
+<div class=example><h3>Example 6.4. Reading a File</h3><pre class=screen>
+
+<pre>
+>>> image
+&lt;io.BufferedReader object at 0x00C7A390>
+>>> image.tell()
+0
+>>> data = image.read(3)
+>>> data
+b'\xff\xd8\xff'
+>>> image.tell()
+3
+>>> image.seek(0)
+0
+>>> data = image.read()
+>>> len(data)
+3150
+
+<samp class=p>>>> </samp><kbd>f</kbd>
+&lt;open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
+<samp class=p>>>> </samp><kbd>f.tell()</kbd>              <span>&#x2460;</span>
+0
+<samp class=p>>>> </samp><kbd>f.seek(-128, 2)</kbd>       <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>f.tell()</kbd>              <span>&#x2462;</span>
+7542909
+<samp class=p>>>> </samp><kbd>tagData = f.read(128)</kbd> <span>&#x2463;</span>
+<samp class=p>>>> </samp><kbd>tagData</kbd>
+<samp>'TAGKAIRO****THE BEST GOA         ***DJ MARY-JANE***            
+Rave Mix    2000http://mp3.com/DJMARYJANE     \037'</samp>
+<samp class=p>>>> </samp><kbd>f.tell()</kbd>              <span>&#x2464;</span>
+7543037</pre>
+<ol>
+<li>A file object maintains state about the file it has open. The <code>tell</code> method of a file object tells you your current position in the open file. Since you haven't done anything with this file    yet, the current position is <code>0</code>, which is the beginning of the file.
+<li>The <code>seek</code> method of a file object moves to another position in the open file. The second parameter specifies what the first one means;
+<code>0</code> means move to an absolute position (counting from the start of the file), <code>1</code> means move to a relative position (counting from the current position), and <code>2</code> means move to a position relative to the end of the file. Since the <abbr>MP3</abbr> tags you're looking for are stored at the end of the file, you use <code>2</code> and tell the file object to move to a position <code>128</code> bytes from the end of the file.
+<li>The <code>tell</code> method confirms that the current file position has moved.
+<li>The <code>read</code> method reads a specified number of bytes from the open file and returns a string with the data that was read. The optional    parameter specifies the maximum number of bytes to read. If no parameter is specified, <code>read</code> will read until the end of the file. (You could have simply said <code>read()</code> here, since you know exactly where you are in the file and you are, in fact, reading the last 128 bytes.)  The read data    is assigned to the <var>tagData</var> variable, and the current position is updated based on how many bytes were read.
+<li>The <code>tell</code> method confirms that the current position has moved. If you do the math, you'll see that after reading 128 bytes, the position    has been incremented by 128.
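+<p>The same <code>tell</code>, <code>seek</code>, and <code>read</code> sequence can be sketched in Python 3 against a throwaway file. This is only an illustration: the temporary file and its 328-byte payload are made up for the example and stand in for the <abbr>MP3</abbr> file, and <code>os.SEEK_END</code> is the named constant for the value <code>2</code>.

```python
import os
import tempfile

# Hypothetical stand-in for the MP3 file: 200 filler bytes, then a 128-byte "tag".
payload = b"x" * 200 + b"TAG" + b"-" * 125

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

f = open(path, "rb")
start = f.tell()               # 0: nothing has been read yet
f.seek(-128, os.SEEK_END)      # same as f.seek(-128, 2)
tag_position = f.tell()        # 128 bytes before the end of the file
tag_data = f.read(128)         # a bytes object, since the file is in binary mode
end_position = f.tell()        # advanced by the number of bytes read
f.close()
os.remove(path)
```

+<p>Here <code>end_position - tag_position</code> is 128, which is the arithmetic the last callout describes.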
+<h3>6.2.2. Closing Files</h3>
+<p>Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It's
+   important to close files as soon as you're finished with them.
+<div class=example><h3>Example 6.5. Closing a File</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>f</kbd>
+&lt;open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
+<samp class=p>>>> </samp><kbd>f.closed</kbd>       <span>&#x2460;</span>
+False
+<samp class=p>>>> </samp><kbd>f.close()</kbd>      <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>f</kbd>
+&lt;closed file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
+<samp class=p>>>> </samp><kbd>f.closed</kbd>       <span>&#x2462;</span>
+True
+<samp class=p>>>> </samp><kbd>f.seek(0)</kbd>      <span>&#x2463;</span>
+<samp class=traceback>Traceback (innermost last):
+  File "&lt;interactive input>", line 1, in ?
+ValueError: I/O operation on closed file</samp>
+<samp class=p>>>> </samp><kbd>f.tell()</kbd>
+<samp class=traceback>Traceback (innermost last):
+  File "&lt;interactive input>", line 1, in ?
+ValueError: I/O operation on closed file</samp>
+<samp class=p>>>> </samp><kbd>f.read()</kbd>
+<samp class=traceback>Traceback (innermost last):
+  File "&lt;interactive input>", line 1, in ?
+ValueError: I/O operation on closed file</samp>
+<samp class=p>>>> </samp><kbd>f.close()</kbd>      <span>&#x2464;</span></pre>
+<ol>
+<li>The <var>closed</var> attribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (<var>closed</var> is <code>False</code>).
+<li>To close a file, call the <code>close</code> method of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any)    that the system hadn't gotten around to actually writing yet, and releases the system resources.
+<li>The <var>closed</var> attribute confirms that the file is closed.
+<li>Just because a file is closed doesn't mean that the file object ceases to exist. The variable <var>f</var> will continue to exist until it <a href="#fileinfo.scope" title="Example 5.8. Trying to Implement a Memory Leak">goes out of scope</a> or gets manually deleted. However, none of the methods that manipulate an open file will work once the file has been closed;    they all raise an exception.
+<li>Calling <code>close</code> on a file object whose file is already closed does <em>not</em> raise an exception; it fails silently.
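+<p>In Python 3 you would more often let a <code>with</code> block close the file for you. The sketch below assumes a made-up temporary file; it shows that the file object outlives the block, that reading from it afterwards raises <code>ValueError</code>, and that a second <code>close</code> is harmless.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some bytes")
    path = tmp.name

with open(path, "rb") as f:
    first = f.read(4)
    open_inside = not f.closed     # still open inside the block

closed_after = f.closed            # the with block closed it on the way out

try:
    f.read()                       # the object still exists, but it is closed
    raised = False
except ValueError:
    raised = True

f.close()                          # closing an already-closed file does nothing
os.remove(path)
```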
+<h3>6.2.3. Handling <abbr>I/O</abbr> Errors</h3>
+<p>Now you've seen enough to understand the file handling code in the <code>fileinfo.py</code> sample code from the previous chapter. This example shows how to safely open and read from a file and gracefully handle
+   errors.
+<div class=example><h3 id="fileinfo.files.incode">Example 6.6. File Objects in <code>MP3FileInfo</code></h3><pre><code>
+        try:                                <span>&#x2460;</span>
+            fsock = open(filename, "rb", 0) <span>&#x2461;</span>
+            try:
+                fsock.seek(-128, 2)         <span>&#x2462;</span>
+                tagdata = fsock.read(128)   <span>&#x2463;</span>
+            finally:                        <span>&#x2464;</span>
+                fsock.close()
+            .
+            .
+            .
+        except IOError:                     <span>&#x2465;</span>
+            pass</pre>
+<ol>
+<li>Because opening and reading files is risky and may raise an exception, all of this code is wrapped in a <code>try...except</code> block. (Hey, isn't <a href="#odbchelper.indenting" title="2.5. Indenting Code">standardized indentation</a> great?  This is where you start to appreciate it.)
+<li>The <code>open</code> function may raise an <code>IOError</code>. (Maybe the file doesn't exist.)
+<li>The <code>seek</code> method may raise an <code>IOError</code>. (Maybe the file is smaller than 128 bytes.)
+<li>The <code>read</code> method may raise an <code>IOError</code>. (Maybe the disk has a bad sector, or it's on a network drive and the network just went down.)
+<li>This is new: a <code>try...finally</code> block. Once the file has been opened successfully by the <code>open</code> function, you want to make absolutely sure that you close it, even if an exception is raised by the <code>seek</code> or <code>read</code> methods. That's what a <code>try...finally</code> block is for: code in the <code>finally</code> block will <em>always</em> be executed, even if something in the <code>try</code> block raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.
+<li>At last, you handle your <code>IOError</code> exception. This could be the <code>IOError</code> exception raised by the call to <code>open</code>, <code>seek</code>, or <code>read</code>. Here, you really don't care, because all you're going to do is ignore it silently and continue. (Remember, <code>pass</code> is a Python statement that <a href="#fileinfo.class.simplest" title="Example 5.3. The Simplest Python Class">does nothing</a>.)  That's perfectly legal; &#8220;handling&#8221; an exception can mean explicitly doing nothing. It still counts as handled, and processing will continue normally on the    next line of code after the <code>try...except</code> block.
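+<p>Here is the same shape as a self-contained Python 3 sketch. The function name <code>read_tag</code> and the scratch file are inventions for this example, not part of <code>fileinfo.py</code>. In Python 3, <code>IOError</code> is an alias of <code>OSError</code>, so the sketch catches <code>OSError</code>.

```python
import os
import tempfile


def read_tag(filename):
    """Return the last 128 bytes of a file, or None if it cannot be read.

    Hypothetical helper that mirrors the try...finally inside try...except
    structure of the example above.
    """
    try:
        fsock = open(filename, "rb", 0)
        try:
            fsock.seek(-128, 2)
            return fsock.read(128)
        finally:
            fsock.close()          # runs even though the try block returns
    except OSError:
        return None                # "handled" by deliberately doing nothing


# A scratch file with a recognizable 128-byte tail.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 300 + b"TAG" + b"-" * 125)
    path = tmp.name

tag = read_tag(path)
os.remove(path)
missing = read_tag(path)           # the file is gone now, so open() fails
```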
+<h3>6.2.4. Writing to Files</h3>
+<p>As you would expect, you can also write to files in much the same way that you read from them. There are two basic file modes:
+<div class=itemizedlist>
+<ul>
+<li>"Append" mode will add data to the end of the file.
+<li>"Write" mode will overwrite the file.
+</ul>
+<p>Either mode will create the file automatically if it doesn't already exist, so there's never a need for any sort of fiddly
+   "if the log file doesn't exist yet, create a new empty file just so you can open it for the first time" logic. Just open
+   it and start writing.
+<div class=example><h3 id="fileinfo.files.writeandappend">Example 6.7. Writing to Files</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>logfile = open('test.log', 'w')</kbd> <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>logfile.write('test succeeded')</kbd> <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>logfile.close()</kbd>
+<samp class=p>>>> </samp><kbd>print file('test.log').read()</kbd>   <span>&#x2462;</span>
+test succeeded
+<samp class=p>>>> </samp><kbd>logfile = open('test.log', 'a')</kbd> <span>&#x2463;</span>
+<samp class=p>>>> </samp><kbd>logfile.write('line 2')</kbd>
+<samp class=p>>>> </samp><kbd>logfile.close()</kbd>
+<samp class=p>>>> </samp><kbd>print file('test.log').read()</kbd>   <span>&#x2464;</span>
+test succeededline 2
+</pre>
+<ol>
+<li>You start boldly by creating the new file <code>test.log</code>, or overwriting the existing file if there is one, and opening it for writing. (The second parameter <code>"w"</code> means open the file for writing.) Yes, that's as dangerous as it sounds. I hope you didn't care about the previous contents of that file, because they're gone now.
+<li>You can add data to the newly opened file with the <code>write</code> method of the file object returned by <code>open</code>.
+<li><code>file</code> is a synonym for <code>open</code>. This one-liner opens the file, reads its contents, and prints them.
+<li>You happen to know that <code>test.log</code> exists (since you just finished writing to it), so you can open it and append to it. (The <code>"a"</code> parameter means open the file for appending.)  Actually you could do this even if the file didn't exist, because opening    the file for appending will create the file if necessary. But appending will <em>never</em> harm the existing contents of the file.
+<li>As you can see, both the original line you wrote and the second line you appended are now in <code>test.log</code>. Also note that line breaks are not included. Since you didn't write them explicitly to the file either time, the file doesn't include them. You can write a line break with the <code>"\n"</code> (newline) character. Since you didn't do this, everything you wrote to the file ended up smooshed together on the same line.
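+<p>A minimal Python 3 sketch of the same write-then-append sequence, this time ending each write with <code>"\n"</code> so the two entries land on separate lines. The scratch directory and the explicit <code>encoding</code> argument are choices made for this example.

```python
import os
import shutil
import tempfile

# Hypothetical scratch directory so the example never touches a real log.
scratch_dir = tempfile.mkdtemp()
log_path = os.path.join(scratch_dir, "test.log")

with open(log_path, "w", encoding="utf-8") as logfile:   # creates or overwrites
    logfile.write("test succeeded\n")

with open(log_path, "a", encoding="utf-8") as logfile:   # appends to the end
    logfile.write("line 2\n")

with open(log_path, encoding="utf-8") as logfile:
    contents = logfile.read()

lines = contents.splitlines()
shutil.rmtree(scratch_dir)
```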
+<div class=itemizedlist>
+<h3>Further Reading on File Handling</h3>
+<ul>
+<li><a href="http://www.python.org/doc/current/tut/tut.html"><i class=citetitle>Python Tutorial</i></a> discusses reading and writing files, including how to <a href="http://www.python.org/doc/current/tut/node9.html#SECTION009210000000000000000">read a file one line at a time into a list</a>.
+
+<li><a href="http://www.effbot.org/guides/">eff-bot</a> discusses efficiency and performance of <a href="http://www.effbot.org/guides/readline-performance.htm">various ways of reading a file</a>.
+
+<li><a href="http://www.faqts.com/knowledge-base/index.phtml/fid/199/">Python Knowledge Base</a> answers <a href="http://www.faqts.com/knowledge-base/index.phtml/fid/552">common questions about files</a>.
+
+<li><a href="http://www.python.org/doc/current/lib/"><i class=citetitle>Python Library Reference</i></a> summarizes <a href="http://www.python.org/doc/current/lib/bltin-file-objects.html">all the file object methods</a>.
+
+</ul>
+
+
+
+
+
+<h2 id="kgp.openanything">10.1. Abstracting input sources</h2>
+<p>One of Python's greatest strengths is its dynamic binding, and one powerful use of dynamic binding is the <em>file-like object</em>.
+<p>Many functions which require an input source could simply take a filename, go open the file for reading, read it, and close
+it when they're done. But they don't. Instead, they take a <em>file-like object</em>.
+<p>In the simplest case, a <em>file-like object</em> is any object with a <code>read</code> method with an optional <var>size</var> parameter, which returns a string. When called with no <var>size</var> parameter, it reads everything there is to read from the input source and returns all the data as a single string. When
+called with a <var>size</var> parameter, it reads that much from the input source and returns that much data; when called again, it picks up where it left
+off and returns the next chunk of data.
+<p>This is how <a href="#fileinfo.files" title="6.2. Working with File Objects">reading from real files</a> works; the difference is that you're not limiting yourself to real files. The input source could be anything: a file on
+disk, a web page, even a hard-coded string. As long as you pass a file-like object to the function, and the function simply
+calls the object's <code>read</code> method, the function can handle any kind of input source without specific code to handle each kind.
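+<p>To make the idea concrete, here is a sketch of about the smallest file-like object possible. Both <code>StringSource</code> and <code>count_characters</code> are hypothetical names invented for this illustration; the only contract between them is a <code>read</code> method with an optional <var>size</var> parameter.

```python
import io


class StringSource:
    """Hypothetical file-like object: just a read method over a string."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self, size=-1):
        # No size (or a negative one) means "everything that is left".
        if size is None or size < 0:
            chunk = self._text[self._pos:]
        else:
            chunk = self._text[self._pos:self._pos + size]
        self._pos += len(chunk)     # remember where to pick up next time
        return chunk


def count_characters(source):
    """Consume any file-like object in 16-character chunks."""
    total = 0
    while True:
        chunk = source.read(16)
        if not chunk:               # an empty result means nothing is left
            break
        total += len(chunk)
    return total


src = StringSource("abcdefghij")
first = src.read(4)
second = src.read(4)
rest = src.read()
empty = src.read()

from_custom = count_characters(StringSource("x" * 40))
from_stdlib = count_characters(io.StringIO("x" * 40))
```

+<p><code>count_characters</code> never checks what kind of object it was handed; it works with the hand-rolled class and with the standard library's <code>io.StringIO</code> alike.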
+<p>In case you were wondering how this relates to <abbr>XML</abbr> processing, <code>minidom.parse</code> is one such function which can take a file-like object.
+<div class=example><h3>Example 10.1. Parsing <abbr>XML</abbr> from a file</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>from xml.dom import minidom</kbd>
+<samp class=p>>>> </samp><kbd>fsock = open('binary.xml')</kbd>    <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse(fsock)</kbd> <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>fsock.close()</kbd>                 <span>&#x2462;</span>
+<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>          <span>&#x2463;</span>
+<samp>&lt;?xml version="1.0" ?>
+&lt;grammar>
+&lt;ref id="bit">
+  &lt;p>0&lt;/p>
+  &lt;p>1&lt;/p>
+&lt;/ref>
+&lt;ref id="byte">
+  &lt;p>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>\
+&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;xref id="bit"/>&lt;/p>
+&lt;/ref>
+&lt;/grammar></samp></pre>
+<ol>
+<li>First, you open the file on disk. This gives you a <a href="#fileinfo.files" title="6.2. Working with File Objects">file object</a>.
+<li>You pass the file object to <code>minidom.parse</code>, which calls the <code>read</code> method of <var>fsock</var> and reads the <abbr>XML</abbr> document from the file on disk.
+<li>Be sure to call the <code>close</code> method of the file object after you're done with it. <code>minidom.parse</code> will not do this for you.
+<li>Calling the <code>toxml()</code> method on the returned <abbr>XML</abbr> document prints out the entire thing.
+<p>Well, that all seems like a colossal waste of time. After all, you've already seen that <code>minidom.parse</code> can simply take the filename and do all the opening and closing nonsense automatically. And it's true that if you know you're
+just going to be parsing a local file, you can pass the filename and <code>minidom.parse</code> is smart enough to Do The Right Thing&#8482;. But notice how similar -- and easy -- it is to parse an <abbr>XML</abbr> document straight from the Internet.
+<div class=example><h3 id="kgp.openanything.urllib">Example 10.2. Parsing <abbr>XML</abbr> from a <abbr>URL</abbr></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import urllib</kbd>
+<samp class=p>>>> </samp><kbd>usock = urllib.urlopen('http://slashdot.org/slashdot.rdf')</kbd> <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse(usock)</kbd>            <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>usock.close()</kbd>          <span>&#x2462;</span>
+<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>   <span>&#x2463;</span>
+<samp>&lt;?xml version="1.0" ?>
+&lt;rdf:RDF xmlns="http://my.netscape.com/rdf/simple/0.9/"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
+
+&lt;channel>
+&lt;title>Slashdot&lt;/title>
+&lt;link>http://slashdot.org/&lt;/link>
+&lt;description>News for nerds, stuff that matters&lt;/description>
+&lt;/channel>
+
+&lt;image>
+&lt;title>Slashdot&lt;/title>
+&lt;url>http://images.slashdot.org/topics/topicslashdot.gif&lt;/url>
+&lt;link>http://slashdot.org/&lt;/link>
+&lt;/image>
+
+&lt;item>
+&lt;title>To HDTV or Not to HDTV?&lt;/title>
+&lt;link>http://slashdot.org/article.pl?sid=01/12/28/0421241&lt;/link>
+&lt;/item>
+
+[...snip...]</samp></pre>
+<ol>
+<li>As you saw <a href="#dialect.extract.urllib" title="Example 8.5. Introducing urllib">in a previous chapter</a>, <code>urlopen</code> takes a web page <abbr>URL</abbr> and returns a file-like object. Most importantly, this object has a <code>read</code> method which returns the <abbr>HTML</abbr> source of the web page.
+<li>Now you pass the file-like object to <code>minidom.parse</code>, which obediently calls the <code>read</code> method of the object and parses the <abbr>XML</abbr> data that the <code>read</code> method returns. The fact that this <abbr>XML</abbr> data is now coming straight from a web page is completely irrelevant. <code>minidom.parse</code> doesn't know about web pages, and it doesn't care about web pages; it just knows about file-like objects.
+<li>As soon as you're done with it, be sure to close the file-like object that <code>urlopen</code> gives you.
+<li>By the way, this <abbr>URL</abbr> is real, and it really is <abbr>XML</abbr>. It's an <abbr>XML</abbr> representation of the current headlines on <a href="http://slashdot.org/">Slashdot</a>, a technical news and gossip site.
+<div class=example><h3>Example 10.3. Parsing <abbr>XML</abbr> from a string (the easy but inflexible way)</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
+<samp class=p>>>> </samp><kbd>xmldoc = minidom.parseString(contents)</kbd> <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>
+<samp>&lt;?xml version="1.0" ?>
+&lt;grammar>&lt;ref id="bit">&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar></samp></pre>
+<ol>
+<li><code>minidom</code> has a function, <code>parseString</code>, which takes an entire <abbr>XML</abbr> document as a string and parses it. You can use this instead of <code>minidom.parse</code> if you know you already have your entire <abbr>XML</abbr> document in a string.
+<p>OK, so you can use the <code>minidom.parse</code> function for parsing both local files and remote <abbr>URL</abbr>s, but for parsing strings, you use... a different function. That means that if you want to be able to take input from a
+file, a <abbr>URL</abbr>, or a string, you'll need special logic to check whether it's a string, and call the <code>parseString</code> function instead. How unsatisfying.
+<p>If there were a way to turn a string into a file-like object, then you could simply pass this object to <code>minidom.parse</code>. And in fact, there is a module specifically designed for doing just that: <code>StringIO</code>.
+<div class=example><h3 id="kgp.openanything.stringio.example">Example 10.4. Introducing <code>StringIO</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
+<samp class=p>>>> </samp><kbd>import StringIO</kbd>
+<samp class=p>>>> </samp><kbd>ssock = StringIO.StringIO(contents)</kbd>   <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>ssock.read()</kbd>        <span>&#x2461;</span>
+"&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"
+<samp class=p>>>> </samp><kbd>ssock.read()</kbd>        <span>&#x2462;</span>
+''
+<samp class=p>>>> </samp><kbd>ssock.seek(0)</kbd>       <span>&#x2463;</span>
+<samp class=p>>>> </samp><kbd>ssock.read(15)</kbd>      <span>&#x2464;</span>
+'&lt;grammar>&lt;ref i'
+<samp class=p>>>> </samp><kbd>ssock.read(15)</kbd>
+"d='bit'>&lt;p>0&lt;/p"
+<samp class=p>>>> </samp><kbd>ssock.read()</kbd>
+'>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>'
+<samp class=p>>>> </samp><kbd>ssock.close()</kbd>       <span>&#x2465;</span></pre>
+<ol>
+<li>The <code>StringIO</code> module contains a single class, also called <code>StringIO</code>, which allows you to turn a string into a file-like object. The <code>StringIO</code> class takes the string as a parameter when creating an instance.
+<li>Now you have a file-like object, and you can do all sorts of file-like things with it. Like <code>read</code>, which returns the original string.
+<li>Calling <code>read</code> again returns an empty string. This is how real file objects work too; once you read the entire file, you can't read any more without explicitly seeking to the beginning of the file. The <code>StringIO</code> object works the same way.
+<li>You can explicitly seek to the beginning of the string, just like seeking through a file, by using the <code>seek</code> method of the <code>StringIO</code> object.
+<li>You can also read the string in chunks, by passing a <var>size</var> parameter to the <code>read</code> method.
+<li>At any time, <code>read</code> will return the rest of the string that you haven't read yet. All of this is exactly how file objects work; hence the term
+<em>file-like object</em>.
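+<p>In Python 3 the <code>StringIO</code> class lives in the <code>io</code> module instead of a module of its own. The sketch below repeats the session above with that one change; the variable names are chosen for the example.

```python
import io

contents = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

ssock = io.StringIO(contents)
everything = ssock.read()       # the whole string
nothing_left = ssock.read()     # '' because the position is at the end

ssock.seek(0)                   # rewind to the beginning
first_chunk = ssock.read(15)
second_chunk = ssock.read(15)
remainder = ssock.read()        # whatever has not been read yet
ssock.close()
```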
+<div class=example><h3>Example 10.5. Parsing <abbr>XML</abbr> from a string (the file-like object way)</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
+<samp class=p>>>> </samp><kbd>ssock = StringIO.StringIO(contents)</kbd>
+<samp class=p>>>> </samp><kbd>xmldoc = minidom.parse(ssock)</kbd> <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>ssock.close()</kbd>
+<samp class=p>>>> </samp><kbd>print xmldoc.toxml()</kbd>
+<samp>&lt;?xml version="1.0" ?>
+&lt;grammar>&lt;ref id="bit">&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar></samp></pre>
+<ol>
+<li>Now you can pass the file-like object (really a <code>StringIO</code>) to <code>minidom.parse</code>, which will call the object's <code>read</code> method and happily parse away, never knowing that its input came from a hard-coded string.
+<p>So now you know how to use a single function, <code>minidom.parse</code>, to parse an <abbr>XML</abbr> document stored on a web page, in a local file, or in a hard-coded string. For a web page, you use <code>urlopen</code> to get a file-like object; for a local file, you use <code>open</code>; and for a string, you use <code>StringIO</code>. Now let's take it one step further and generalize <em>these</em> differences as well.
+<div class=example><h3 id="kgp.openanything.example">Example 10.6. <code>openAnything</code></h3><pre><code>
+def openAnything(source):<span>&#x2460;</span>
+    # try to open with urllib (if source is http, ftp, or file URL)
+    import urllib       
+    try:                
+        return urllib.urlopen(source)      <span>&#x2461;</span>
+    except (IOError, OSError):            
+        pass            
+
+    # try to open with native open function (if source is pathname)
+    try:                
+        return open(source)                <span>&#x2462;</span>
+    except (IOError, OSError):            
+        pass            
+
+    # treat source as string
+    import StringIO     
+    return StringIO.StringIO(str(source))  <span>&#x2463;</span></code></pre>
+<ol>
+<li>The <code>openAnything</code> function takes a single parameter, <var>source</var>, and returns a file-like object. <var>source</var> is a string of some sort; it can either be a <abbr>URL</abbr> (like <code>'http://slashdot.org/slashdot.rdf'</code>), a full or partial pathname to a local file (like <code>'binary.xml'</code>), or a string that contains actual <abbr>XML</abbr> data to be parsed.
+<li>First, you see if <var>source</var> is a <abbr>URL</abbr>. You do this through brute force: you try to open it as a <abbr>URL</abbr> and silently ignore errors caused by trying to open something which is not a <abbr>URL</abbr>. This is actually elegant in the sense that, if <code>urllib</code> ever supports new types of <abbr>URL</abbr>s in the future, you will also support them without recoding. If <code>urllib</code> is able to open <var>source</var>, then the <code>return</code> kicks you out of the function immediately and the following <code>try</code> statements never execute.
+<li>On the other hand, if <code>urllib</code> yelled at you and told you that <var>source</var> wasn't a valid <abbr>URL</abbr>, you assume it's a path to a file on disk and try to open it. Again, you don't do anything fancy to check whether <var>source</var> is a valid filename or not (the rules for valid filenames vary wildly between platforms, so you'd probably get them wrong anyway). Instead, you just blindly open the file, and silently trap any errors.
+<li>By this point, you need to assume that <var>source</var> is a string that has hard-coded data in it (since nothing else worked), so you use <code>StringIO</code> to create a file-like object out of it and return that. (In fact, since you're using the <code>str</code> function, <var>source</var> doesn't even need to be a string; it could be any object, and you'll use its string representation, as defined by its <code>__str__</code> <a href="#fileinfo.morespecial" title="5.7. Advanced Special Class Methods">special method</a>.)
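+<p>For comparison, here is a minimal sketch of the same three-step fallback in Python 3. It is an adaptation, not the book's code, and it rests on a few assumptions: the name <code>open_anything</code> is hypothetical, <code>urlopen</code> is taken from <code>urllib.request</code>, and <code>ValueError</code> is trapped as well because Python 3's <code>urlopen</code> raises it for a string that has no <abbr>URL</abbr> scheme at all.

```python
import io
import urllib.request


def open_anything(source):
    """Return a file-like object for a URL, a pathname, or a literal string."""
    # Try urllib first. ValueError covers strings that are not URLs at all;
    # OSError covers URLs that look valid but cannot be opened.
    try:
        return urllib.request.urlopen(source)
    except (ValueError, OSError):
        pass

    # Then try the built-in open function, in case source is a pathname.
    try:
        return open(source)
    except (ValueError, OSError):
        pass

    # Nothing else worked, so treat source as the data itself.
    return io.StringIO(str(source))
```

+<p>One difference to keep in mind with this sketch: the object returned by <code>urlopen</code> yields bytes, while <code>open</code> and <code>io.StringIO</code> yield text. <code>minidom.parse</code> accepts either, but other callers may need to decode.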
+<p>Now you can use this <code>openAnything</code> function in conjunction with <code>minidom.parse</code> to make a function that takes a <var>source</var> that refers to an <abbr>XML</abbr> document somehow (either as a <abbr>URL</abbr>, or a local filename, or a hard-coded <abbr>XML</abbr> document in a string) and parses it.
+<div class=example><h3>Example 10.7. Using <code>openAnything</code> in <code>kgp.py</code></h3><pre><code>
+class KantGenerator:
+    def _load(self, source):
+        sock = toolbox.openAnything(source)
+        xmldoc = minidom.parse(sock).documentElement
+        sock.close()
+        return xmldoc</code></pre>
+<h2 id="kgp.stdio">10.2. Standard input, output, and error</h2>
+<p><abbr>UNIX</abbr> users are already familiar with the concept of standard input, standard output, and standard error. This section is for
+   the rest of you.
+<p>Standard output and standard error (commonly abbreviated <code>stdout</code> and <code>stderr</code>) are pipes that are built into every <abbr>UNIX</abbr> system. When you <code>print</code> something, it goes to the <code>stdout</code> pipe; when your program crashes and prints out debugging information (like a traceback in Python), it goes to the <code>stderr</code> pipe. Both of these pipes are ordinarily just connected to the terminal window where you are working, so when a program
+prints, you see the output, and when a program crashes, you see the debugging information. (If you're working on a system
+with a window-based Python <abbr>IDE</abbr>, <code>stdout</code> and <code>stderr</code> default to your &#8220;Interactive Window&#8221;.)
+<div class=example><h3>Example 10.8. Introducing <code>stdout</code> and <code>stderr</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
+<samp class=p>...    </samp><kbd>print 'Dive in'</kbd>             <span>&#x2460;</span>
+<samp>Dive in
+Dive in
+Dive in</samp>
+<samp class=p>>>> </samp><kbd>import sys</kbd>
+<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
+<samp class=p>...    </samp><kbd>sys.stdout.write('Dive in')</kbd> <span>&#x2461;</span>
+<samp>Dive inDive inDive in</samp>
+<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
+<samp class=p>...    </samp><kbd>sys.stderr.write('Dive in')</kbd> <span>&#x2462;</span>
+<samp>Dive inDive inDive in</samp></pre>
+<ol>
+<li>As you saw in <a href="#fileinfo.for.counter" title="Example 6.9. Simple Counters">Example 6.9, &#8220;Simple Counters&#8221;</a>, you can use Python's built-in <code>range</code> function to build simple counter loops that repeat something a set number of times.
+<li><code>stdout</code> is a file-like object; calling its <code>write</code> method will print out whatever string you give it. In fact, this is what the <code>print</code> statement really does; it adds a newline to the end of the string you're printing, and calls <code>sys.stdout.write</code>.
+<li>In the simplest case, <code>stdout</code> and <code>stderr</code> send their output to the same place: the Python <abbr>IDE</abbr> (if you're in one), or the terminal (if you're running Python from the command line). Like <code>stdout</code>, <code>stderr</code> does not add newlines for you; if you want them, add them yourself.
+<p><code>stdout</code> and <code>stderr</code> are both file-like objects, like the ones you saw in <a href="#kgp.openanything" title="10.1. Abstracting input sources">Section 10.1, &#8220;Abstracting input sources&#8221;</a>, but they are both write-only. They have no <code>read</code> method, only <code>write</code>. Still, they are file-like objects, and you can assign any other file or file-like object to them to redirect their output.
+<div class=example><h3>Example 10.9. Redirecting output</h3><pre class=screen>
+<samp class=p>[you@localhost kgp]$ </samp>python stdout.py
+Dive in
+<samp class=p>[you@localhost kgp]$ </samp>cat out.log
+This message will be logged instead of displayed</pre><p>(On Windows, you can use <code>type</code> instead of <code>cat</code> to display the contents of a file.)
+<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
+<pre><code>
+#stdout.py
+import sys
+
+print 'Dive in'      <span>&#x2460;</span>
+saveout = sys.stdout <span>&#x2461;</span>
+fsock = open('out.log', 'w')           <span>&#x2462;</span>
+sys.stdout = fsock   <span>&#x2463;</span>
+print 'This message will be logged instead of displayed' <span>&#x2464;</span>
+sys.stdout = saveout <span>&#x2465;</span>
+fsock.close()        <span>&#x2466;</span>
+</code></pre>
+<ol>
+<li>This will print to the <abbr>IDE</abbr> &#8220;Interactive Window&#8221; (or the terminal, if running the script from the command line).
+<li>Always save <code>stdout</code> before redirecting it, so you can set it back to normal later.
+<li>Open a file for writing. If the file doesn't exist, it will be created. If the file does exist, it will be overwritten.
+<li>Redirect all further output to the new file you just opened.
+<li>This will be &#8220;printed&#8221; to the log file only; it will not be visible in the <abbr>IDE</abbr> window or on the screen.
+<li>Set <code>stdout</code> back to the way it was before you mucked with it.
+<li>Close the log file.
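+<p>The save, redirect, restore sequence above can be wrapped so that <code>stdout</code> is put back even when something goes wrong in between. The following is a sketch only, written for Python 3 (where <code>print</code> is a function and <code>StringIO</code> lives in <code>io</code>); the helper name <code>capture_stdout</code> is hypothetical, and an in-memory <code>StringIO</code> stands in for the log file.

```python
import io
import sys


def capture_stdout(func):
    """Call func() with sys.stdout redirected; return whatever it printed."""
    saveout = sys.stdout          # always save stdout before redirecting it
    buffer = io.StringIO()        # any object with a write method will do
    sys.stdout = buffer           # redirect all further output
    try:
        func()
    finally:
        sys.stdout = saveout      # set stdout back, even if func raised
    return buffer.getvalue()


logged = capture_stdout(
    lambda: print("This message will be logged instead of displayed")
)
print(repr(logged))
```

+<p>Replacing the <code>StringIO</code> with <code>open('out.log', 'w')</code> (and closing it afterwards) would give the same behavior as <code>stdout.py</code>.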
+<p>Redirecting <code>stderr</code> works exactly the same way, using <code>sys.stderr</code> instead of <code>sys.stdout</code>.
+<div class=example><h3>Example 10.10. Redirecting error information</h3><pre class=screen>
+<samp class=p>[you@localhost kgp]$ </samp>python stderr.py
+<samp class=p>[you@localhost kgp]$ </samp>cat error.log
+<samp>Traceback (most recent call last):
+  File "stderr.py", line 5, in ?
+    raise Exception, 'this error will be logged'
+Exception: this error will be logged</samp></pre><p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
+<pre><code>
+#stderr.py
+import sys
+
+fsock = open('error.log', 'w')               <span>&#x2460;</span>
+sys.stderr = fsock         <span>&#x2461;</span>
+raise Exception, 'this error will be logged' <span>&#x2462;</span> <span>&#x2463;</span>
+</code></pre>
+<ol>
+<li>Open the log file where you want to store debugging information.
+<li>Redirect standard error by assigning the file object of the newly-opened log file to <code>stderr</code>.
+<li>Raise an exception. Note from the screen output that this does <em>not</em> print anything on screen. All the normal traceback information has been written to <code>error.log</code>.
+<li>Also note that you're not explicitly closing your log file, nor are you setting <code>stderr</code> back to its original value. This is fine: once the program crashes (because of the exception), Python cleans up and closes the file for you, and since the program ends at that point, it makes no difference that <code>stderr</code> is never restored. Restoring the original is more important for <code>stdout</code>, if you expect to do other things within the same script afterwards.
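+<p>If the script is meant to keep running after the error, <code>stderr</code> does need to be restored. Here is a rough sketch of that variation, assuming Python 3 and using an in-memory <code>StringIO</code> as a stand-in for <code>error.log</code>; the names <var>saveerr</var>, <var>errlog</var>, and <var>logged</var> are illustrative.

```python
import io
import sys
import traceback

saveerr = sys.stderr              # save stderr so it can be restored later
errlog = io.StringIO()            # stand-in for open('error.log', 'w')
sys.stderr = errlog               # redirect standard error
try:
    try:
        raise Exception("this error will be logged")
    except Exception:
        # print_exc writes to whatever sys.stderr refers to at this moment
        traceback.print_exc()
finally:
    sys.stderr = saveerr          # put stderr back for the rest of the script

logged = errlog.getvalue()
```

+<p>Here the exception is caught rather than allowed to end the program, so the traceback is written by <code>traceback.print_exc</code> instead of by the interpreter on exit.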
+<p>Since it is so common to write error messages to standard error, there is a shorthand syntax that can be used instead of going
+through the hassle of redirecting it outright.
+<div class=example><h3 id="kgp.stdio.print.example">Example 10.11. Printing to <code>stderr</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>print 'entering function'</kbd>
+entering function
+<samp class=p>>>> </samp><kbd>import sys</kbd>
+<samp class=p>>>> </samp><kbd>print >> sys.stderr, 'entering function'</kbd> <span>&#x2460;</span>
+entering function
+</pre>
+<ol>
+<li>This shorthand syntax of the <code>print</code> statement can be used to write to any open file, or file-like object. In this case, you can redirect a single <code>print</code> statement to <code>stderr</code> without affecting subsequent <code>print</code> statements.
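+<p>In Python 3, where <code>print</code> is a function, the closest equivalent of this shorthand is the <code>file</code> argument. The sketch below assumes Python 3; the helper name <code>log</code> and its <var>stream</var> parameter are made up for the illustration.

```python
import io
import sys


def log(message, stream=None):
    """Print message to stream, defaulting to standard error."""
    if stream is None:
        stream = sys.stderr       # looked up at call time, not definition time
    print(message, file=stream)


log("entering function")          # goes to stderr; stdout is untouched

buffer = io.StringIO()            # any file-like object works as well
log("entering function", stream=buffer)
```

+<p>As with the <code>print >></code> form, only the single call is redirected; later calls to <code>print</code> without a <code>file</code> argument still go to <code>stdout</code>.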
+<p>Standard input, on the other hand, is a read-only file object, and it represents the data flowing into the program from some
+previous program. This will likely not make much sense to classic Mac OS users, or even Windows users unless you were ever fluent on the <abbr>MS-DOS</abbr> command line. The way it works is that you can construct a chain of commands in a single line, so that one program's output
+becomes the input for the next program in the chain. The first program simply outputs to standard output (without doing any
+special redirecting itself, just doing normal <code>print</code> statements or whatever), and the next program reads from standard input, and the operating system takes care of connecting
+one program's output to the next program's input.
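+<p>A program in the middle of such a chain only needs to read from one file-like object and write to another. The following sketch assumes Python 3; the function name <code>shout</code> is hypothetical, and <code>StringIO</code> objects stand in for the real standard input and output so that the example runs without a pipeline.

```python
import io
import sys


def shout(instream, outstream):
    """Copy instream to outstream, upper-casing each line along the way."""
    for line in instream:
        outstream.write(line.upper())


# In a real pipeline you would call shout(sys.stdin, sys.stdout) and let the
# operating system connect the previous program's output to your input.
fake_stdin = io.StringIO("dive in\ndive deeper\n")
fake_stdout = io.StringIO()
shout(fake_stdin, fake_stdout)
```

+<p>Because <code>shout</code> takes its streams as parameters, the same function works unchanged with files, sockets, or the standard streams.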
+
+
+
+
+
+
+
 <p class=v><a href=advanced-classes.html rel=prev title='back to &#8220;Advanced Classes&#8221;'><span class=u>&#x261C;</span></a> <a href=xml.html rel=next title='onward to &#8220;XML&#8221;'><span class=u>&#x261E;</span></a>
 
 <p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
diff --git a/http-web-services.html b/http-web-services.html
index ad17fd5..289693d 100644
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -836,15 +836,31 @@ user-agent: Python-httplib2/$Rev: 259 $
 
 <h2 id=furtherreading>Further Reading</h2>
 
+<p><code>httplib2</code>:
+
 <ul>
-<li><a href=http://code.google.com/p/httplib2/><code>httplib2</code></a>
+<li><a href=http://code.google.com/p/httplib2/><code>httplib2</code> project page</a>
+<li><a href=http://code.google.com/p/httplib2/wiki/ExamplesPython3>More <code>httplib2</code> code examples</a>
 <li><a href=http://www.xml.com/pub/a/2006/02/01/doing-http-caching-right-introducing-httplib2.html>Doing <abbr>HTTP</abbr> Caching Right: Introducing <code>httplib2</code></a>
 <li><a href=http://www.xml.com/pub/a/2006/03/29/httplib2-http-persistence-and-authentication.html><code>httplib2</code>: <abbr>HTTP</abbr> Persistence and Authentication</a>
-<li><a href=http://apiwiki.twitter.com/>Twitter <abbr>API</abbr> reference</a>
+</ul>
+
+<p><abbr>HTTP</abbr> caching:
+
+<ul>
 <li><a href=http://www.mnot.net/cache_docs/><abbr>HTTP</abbr> Caching Tutorial</a> by Mark Nottingham
 <li><a href=http://code.google.com/p/doctype/wiki/ArticleHttpCaching>How to control caching with <abbr>HTTP</abbr> headers</a> on Google Doctype
 </ul>
 
+<p><abbr>RFC</abbr>s:
+
+<ul>
+<li><a href=http://www.ietf.org/rfc/rfc2616.txt>RFC 2616: <abbr>HTTP</abbr></a>
+<li><a href=http://www.ietf.org/rfc/rfc2617.txt>RFC 2617: <abbr>HTTP</abbr> Basic Authentication</a>
+<li><a href=http://www.ietf.org/rfc/rfc1951.txt>RFC 1951: deflate compression</a>
+<li><a href=http://www.ietf.org/rfc/rfc1952.txt>RFC 1952: gzip compression</a>
+</ul>
+
 <p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
 <p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
 <script src=j/jquery.js></script>