diff --git a/http-web-services.html b/http-web-services.html
new file mode 100644
index 0000000..17a4d48
--- /dev/null
+++ b/http-web-services.html
@@ -0,0 +1,842 @@
+<!DOCTYPE html>
+<head>
+<meta charset=utf-8>
+<title>HTTP Web Services - Dive into Python 3</title>
+<!--[if IE]><script src=html5.js></script><![endif]-->
+<link rel=stylesheet href=dip3.css>
+<style>
+body{counter-reset:h1 15}
+mark{display:inline}
+</style>
+<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
+<link rel=stylesheet media=print href=print.css>
+<meta name=viewport content='initial-scale=1.0'>
+</head>
+<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
+<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#http-web-services>Dive Into Python 3</a> <span>&#8227;</span>
+<p id=level>Difficulty level: <span title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
+<h1>HTTP Web Services</h1>
+<blockquote class=q>
+<p><span>&#x275D;</span> FIXME <span>&#x275E;</span><br>&mdash; FIXME
+</blockquote>
+<p id=toc>&nbsp;
+<h2 id=divingin>Diving In</h2>
+<p class=f>HTTP web services are programmatic ways of sending data to and receiving data from remote servers using the operations of <abbr>HTTP</abbr> directly. If you want to get data from the server, use a straight <abbr>HTTP</abbr> GET; if you want to send new data to the server, use <abbr>HTTP</abbr> POST. (Some more advanced <abbr>HTTP</abbr> web service APIs also define ways of modifying existing data and deleting data, using <abbr>HTTP</abbr> PUT and <abbr>HTTP</abbr> DELETE.)  In other words, the &#8220;verbs&#8221; built into the <abbr>HTTP</abbr> protocol (GET, POST, PUT, and DELETE) map directly to application-level operations for receiving, sending, modifying, and deleting data.
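+
+<p>As a rough illustration of how the verbs map onto requests, here is a minimal sketch using the standard library&#8217;s <code>urllib.request.Request</code>. The <code>example.com</code> <abbr>URL</abbr> and the XML payload are placeholders, not a real service, and nothing is sent over the network; the code only builds request objects and asks each one which verb it would use.

```python
import urllib.request

# Hypothetical resource URL; example.com is a placeholder, not a real service.
url = 'http://example.com/entries/42'
payload = b'<entry><title>hello</title></entry>'

# No body and no explicit method: the request defaults to GET.
get_request = urllib.request.Request(url)

# Supplying a body makes the default verb POST.
post_request = urllib.request.Request(url, data=payload)

# PUT and DELETE have to be named explicitly (the method argument needs Python 3.3+).
put_request = urllib.request.Request(url, data=payload, method='PUT')
delete_request = urllib.request.Request(url, method='DELETE')

verbs = [r.get_method() for r in (get_request, post_request, put_request, delete_request)]
print(verbs)
```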
+
+<p>The main advantage of this approach is simplicity, and its simplicity has proven popular with a lot of different sites. Data -- usually XML data -- can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an <abbr>HTTP</abbr> library for downloading it. Debugging is also easier; because each &#8220;call&#8221; to the web service has a unique <abbr>URL</abbr>, you can load it in your web browser and immediately see the raw data.
+
+<p>Examples of <abbr>HTTP</abbr> web services:
+<ul>
+<li><a href=http://code.google.com/apis/gdata/>Google Data <abbr>API</abbr>s</a> allow you to interact with a wide variety of Google services, including <a href=http://www.blogger.com/>Blogger</a> and <a href=http://www.youtube.com/>YouTube</a>.
+<li><a href=http://www.flickr.com/services/api/>Flickr Services</a> allow you to upload and download photos from <a href=http://www.flickr.com/>Flickr</a>.
+<li><a href=http://apiwiki.twitter.com/>Twitter <abbr>API</abbr></a> allows you to publish status updates on <a href=http://twitter.com/>Twitter</a>.
+<li><a href=http://www.programmableweb.com/apis/directory/1?sort=mashups>&hellip;and many more</a>
+</ul>
+
+<p>Python 3 comes with two different libraries for interacting with <abbr>HTTP</abbr> web services:
+
+<ul>
+<li><a href=http://docs.python.org/3.0/library/http.client.html><code>http.client</code></a> is a low-level library that implements <a href=http://www.w3.org/Protocols/rfc2616/rfc2616.html>RFC 2616</a>, the <abbr>HTTP</abbr> protocol.
+<li><a href=http://docs.python.org/3.0/library/urllib.request.html><code>urllib.request</code></a> is an abstraction layer built on top of <code>http.client</code>. It provides a standard <abbr>API</abbr> for accessing both <abbr>HTTP</abbr> and <abbr>FTP</abbr> servers, automatically follows <abbr>HTTP</abbr> redirects, and handles some common forms of <abbr>HTTP</abbr> authentication.
+</ul>
+
+<p>Which one should you use? Neither of them. Instead, you should use <a href=http://code.google.com/p/httplib2/><code>httplib2</code></a>, an open source third-party library that implements <abbr>HTTP</abbr> more fully than <code>http.client</code> but provides a better abstraction than <code>urllib.request</code>.
+
+<p>To understand why <code>httplib2</code> is the right choice, you first need to understand <abbr>HTTP</abbr>.
+
+<p class=a>&#x2042;
+
+<h2 id=dont-try-this-at-home>How Not To Fetch Data Over HTTP</h2>
+<p>Let&#8217;s say you want to download a resource over HTTP, such as <a href=xml.html>an Atom feed</a>. But you don&#8217;t just want to download it once; you want to download it over and over again, every hour, to get the latest news from the site that&#8217;s offering the news feed. Let&#8217;s do it the quick-and-dirty way first, and then see how you can do better.
+<pre class=screen>
+<samp class=p>>>> </samp><kbd>import urllib.request</kbd>
+<samp class=p>>>> </samp><kbd>data = urllib.request.urlopen('http://diveintopython3.org/examples/feed.xml').read()</kbd>
+<samp class=p>>>> </samp><kbd>print(data)</kbd>
+<samp>&lt;?xml version="1.0" encoding="utf-8"?>
+&lt;feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
+  &lt;title>dive into mark&lt;/title>
+  &lt;subtitle>currently between addictions&lt;/subtitle>
+  &lt;id>tag:diveintomark.org,2001-07-29:/&lt;/id>
+  &lt;updated>2009-03-27T21:56:07Z&lt;/updated>
+  &lt;link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
+  &lt;!-- rest of feed omitted for brevity --></samp></pre>
+<ol>
+<li>Downloading anything over HTTP is incredibly easy in Python; in fact, it&#8217;s a one-liner. The <code>urllib.request</code> module has a handy <code>urlopen()</code> function that takes the address of the page you want, and returns a file-like object that you can just <code>read()</code> from to get the full contents of the page. It just can&#8217;t get any easier.
+</ol>
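+<p>One detail worth seeing in code: in Python 3, <code>read()</code> on the object returned by <code>urlopen()</code> gives you bytes, not a string, so you usually decode it before treating it as text. The sketch below is only an illustration; it uses a <code>data:</code> <abbr>URL</abbr> as a stand-in for the feed address so that it runs without a network connection, and the tiny feed body is made up.

```python
import urllib.request

# A data: URL stands in for the real feed address so this runs offline
# (urlopen() handles data: URLs in Python 3.4 and later).
url = 'data:application/atom+xml;charset=utf-8,%3Cfeed%3Edemo%3C%2Ffeed%3E'

with urllib.request.urlopen(url) as response:
    raw = response.read()  # bytes, not str
    # Use the charset declared in the Content-Type header, falling back to UTF-8.
    charset = response.headers.get_content_charset() or 'utf-8'

text = raw.decode(charset)
print(type(raw).__name__, text)
```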
+
+<p>So what&#8217;s wrong with this?  Well, for a quick one-off during testing or development, there&#8217;s nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis -- and remember, you said you were planning on retrieving this syndicated feed once an hour -- then you&#8217;re being inefficient, and you&#8217;re being rude.
+
+<p>Let&#8217;s talk about some of the basic features of HTTP.
+
+<p class=a>&#x2042;
+
+<h2 id=http-features>Features of HTTP</h2>
+
+<p>FIXME
+<!--
+<p>There are five important features of HTTP which you should support.
+<h3>11.3.1. <code>User-Agent</code></h3>
+<p>The <code>User-Agent</code> is simply a way for a client to tell a server who it is when it requests a web page, a syndicated feed, or any sort of web
+   service over HTTP. When the client requests a resource, it should always announce who it is, as specifically as possible.
+   This allows the server-side administrator to get in touch with the client-side developer if anything is going fantastically
+   wrong.
+<p>By default, Python sends a generic <code>User-Agent</code>: <code>Python-urllib/1.15</code>. In the next section, you&#8217;ll see how to change this to something more specific.
+<h3>11.3.2. Redirects</h3>
+<p>Sometimes resources move around. Web sites get reorganized, pages move to new addresses. Even web services can reorganize.
+   A syndicated feed at <code>http://example.com/index.xml</code> might be moved to <code>http://example.com/xml/atom.xml</code>. Or an entire domain might move, as an organization expands and reorganizes; for instance, <code>http://www.example.com/index.xml</code> might be redirected to <code>http://server-farm-1.example.com/index.xml</code>.
+<p>Every time you request any kind of resource from an HTTP server, the server includes a status code in its response. Status
+   code <code>200</code> means &#8220;everything&#8217;s normal, here&#8217;s the page you asked for&#8221;. Status code <code>404</code> means &#8220;page not found&#8221;. (You&#8217;ve probably seen 404 errors while browsing the web.)
+<p>HTTP has two different ways of signifying that a resource has moved. Status code <code>302</code> is a <em>temporary redirect</em>; it means &#8220;oops, that got moved over here temporarily&#8221; (and then gives the temporary address in a <code>Location:</code> header). Status code <code>301</code> is a <em>permanent redirect</em>; it means &#8220;oops, that got moved permanently&#8221; (and then gives the new address in a <code>Location:</code> header). If you get a <code>302</code> status code and a new address, the HTTP specification says you should use the new address to get what you asked for, but
+   the next time you want to access the same resource, you should retry the old address. But if you get a <code>301</code> status code and a new address, you&#8217;re supposed to use the new address from then on.
+<p><code>urllib.urlopen</code> will automatically &#8220;follow&#8221; redirects when it receives the appropriate status code from the HTTP server, but unfortunately, it doesn&#8217;t tell you when
+   it does so. You&#8217;ll end up getting data you asked for, but you&#8217;ll never know that the underlying library &#8220;helpfully&#8221; followed a redirect for you. So you&#8217;ll continue pounding away at the old address, and each time you&#8217;ll get redirected to
+   the new address. That&#8217;s two round trips instead of one: not very efficient!  Later in this chapter, you&#8217;ll see how to work
+   around this so you can deal with permanent redirects properly and efficiently.
+<h3>11.3.3. <code>Last-Modified</code>/<code>If-Modified-Since</code></h3>
+<p>Some data changes all the time. The home page of CNN.com is constantly updating every few minutes. On the other hand, the
+   home page of Google.com only changes once every few weeks (when they put up a special holiday logo, or advertise a new service).
+   Web services are no different; usually the server knows when the data you requested last changed, and HTTP provides a way
+   for the server to include this last-modified date along with the data you requested.
+<p>If you ask for the same data a second time (or third, or fourth), you can tell the server the last-modified date that you
+   got last time: you send an <code>If-Modified-Since</code> header with your request, with the date you got back from the server last time. If the data hasn&#8217;t changed since then, the
+   server sends back a special HTTP status code <code>304</code>, which means &#8220;this data hasn&#8217;t changed since the last time you asked for it&#8221;. Why is this an improvement?  Because when the server sends a <code>304</code>, <em>it doesn&#8217;t re-send the data</em>. All you get is the status code. So you don&#8217;t need to download the same data over and over again if it hasn&#8217;t changed;
+   the server assumes you have the data cached locally.
+<p>All modern web browsers support last-modified date checking. If you&#8217;ve ever visited a page, re-visited the same page a day
+   later and found that it hadn&#8217;t changed, and wondered why it loaded so quickly the second time -- this could be why. Your
+   web browser cached the contents of the page locally the first time, and when you visited the second time, your browser automatically
+   sent the last-modified date it got from the server the first time. The server simply says <code>304: Not Modified</code>, so your browser knows to load the page from its cache. Web services can be this smart too.
+<p>Python&#8217;s URL library has no built-in support for last-modified date checking, but since you can add arbitrary headers to each request
+   and read arbitrary headers in each response, you can add support for it yourself.
+<h3>11.3.4. <code>ETag</code>/<code>If-None-Match</code></h3>
+<p>ETags are an alternate way to accomplish the same thing as the last-modified date checking: don&#8217;t re-download data that hasn&#8217;t
+   changed. The way it works is, the server sends some sort of hash of the data (in an <code>ETag</code> header) along with the data you requested. Exactly how this hash is determined is entirely up to the server. The second
+   time you request the same data, you include the ETag hash in an <code>If-None-Match:</code> header, and if the data hasn&#8217;t changed, the server will send you back a <code>304</code> status code. As with the last-modified date checking, the server <em>just</em> sends the <code>304</code>; it doesn&#8217;t send you the same data a second time. By including the ETag hash in your second request, you&#8217;re telling the
+   server that there&#8217;s no need to re-send the same data if it still matches this hash, since you still have the data from the
+   last time.
+<p>Python&#8217;s URL library has no built-in support for ETags, but you&#8217;ll see how to add it later in this chapter.
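+<p>As a Python 3 sketch of both validators together: the helper below (a hypothetical name, not part of any library) copies the <code>Last-Modified</code> and <code>ETag</code> values saved from an earlier response into the matching request headers. The header values are the ones shown in this chapter&#8217;s debugging output; nothing is sent over the network.

```python
import urllib.request

def conditional_request(url, cached_headers):
    """Build a request that replays the validators from an earlier response.

    cached_headers is a hypothetical dict saved from the previous fetch.
    """
    request = urllib.request.Request(url)
    if cached_headers.get('Last-Modified'):
        request.add_header('If-Modified-Since', cached_headers['Last-Modified'])
    if cached_headers.get('ETag'):
        request.add_header('If-None-Match', cached_headers['ETag'])
    return request

# Header values copied from the debugging output shown in this chapter.
cached = {
    'Last-Modified': 'Wed, 14 Apr 2004 22:14:38 GMT',
    'ETag': '"e8284-68e0-4de30f80"',
}
request = conditional_request('http://diveintomark.org/xml/atom.xml', cached)
print(request.header_items())
```

+<p>If the server answers such a request with <code>304</code>, <code>urllib.request.urlopen()</code> raises <code>urllib.error.HTTPError</code> by default, so a caller would check the exception&#8217;s <code>code</code> and fall back to its cached copy.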
+<h3>11.3.5. Compression</h3>
+<p>The last important HTTP feature is gzip compression. When you talk about HTTP web services, you&#8217;re almost always talking
+   about moving XML back and forth over the wire. XML is text, and quite verbose text at that, and text generally compresses
+   well. When you request a resource over HTTP, you can ask the server, if it has any new data to send you, to please send
+   it in compressed format. You include the <code>Accept-encoding: gzip</code> header in your request, and if the server supports compression, it will send you back gzip-compressed data and mark it with
+   a <code>Content-encoding: gzip</code> header.
+<p>Python&#8217;s URL library has no built-in support for gzip compression per se, but you can add arbitrary headers to the request. And
+Python comes with a separate <code>gzip</code> module, which has functions you can use to decompress the data yourself.
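+<p>A minimal Python 3 sketch of the two halves, under the assumption that you are handling compression by hand: the request advertises <code>Accept-Encoding: gzip</code>, and a small helper (the name <code>decode_body</code> is made up for this example) decompresses the body only when the response was marked <code>Content-Encoding: gzip</code>. The compressed body is simulated locally rather than fetched.

```python
import gzip
import urllib.request

# Ask for a compressed response; the server is free to ignore this.
request = urllib.request.Request('http://diveintomark.org/xml/atom.xml')
request.add_header('Accept-Encoding', 'gzip')

def decode_body(body, content_encoding):
    """Return the decompressed body if the server marked it as gzip."""
    if content_encoding == 'gzip':
        return gzip.decompress(body)
    return body

# Simulate a compressed response body locally instead of hitting the network.
original = b'<feed>' + b'<entry>verbose XML</entry>' * 100 + b'</feed>'
compressed = gzip.compress(original)
restored = decode_body(compressed, 'gzip')
print(len(original), len(compressed), restored == original)
```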
+<p>Note that <a href="#oa.review" title="11.2. How not to fetch data over HTTP">our little one-line script</a> to download a syndicated feed did not support any of these HTTP features. Let&#8217;s see how you can improve it.
+
+<p class=a>&#x2042;
+
+<h2 id="oa.debug">11.4. Debugging HTTP web services</h2>
+<p>First, let&#8217;s turn on the debugging features of Python&#8217;s HTTP library and see what&#8217;s being sent over the wire. This will be useful throughout the chapter, as you add more and
+   more features.
+<div class=example><h3>Example 11.3. Debugging HTTP</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import httplib</kbd>
+<samp class=p>>>> </samp><kbd>httplib.HTTPConnection.debuglevel = 1</kbd>             <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>import urllib</kbd>
+<samp class=p>>>> </samp><kbd>feeddata = urllib.urlopen('http://diveintomark.org/xml/atom.xml').read()</kbd>
+connect: (diveintomark.org, 80)     <span>&#x2461;</span>
+send: '
+GET /xml/atom.xml HTTP/1.0          <span>&#x2462;</span>
+Host: diveintomark.org              <span>&#x2463;</span>
+User-agent: Python-urllib/1.15      <span>&#x2464;</span>
+'
+reply: 'HTTP/1.1 200 OK\r\n'        <span>&#x2465;</span>
+header: Date: Wed, 14 Apr 2004 22:27:30 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Content-Type: application/atom+xml
+header: Last-Modified: Wed, 14 Apr 2004 22:14:38 GMT  <span>&#x2466;</span>
+header: ETag: "e8284-68e0-4de30f80" <span>&#x2467;</span>
+header: Accept-Ranges: bytes
+header: Content-Length: 26848
+header: Connection: close
+</pre>
+<ol>
+<li><code>urllib</code> relies on another standard Python library, <code>httplib</code>. Normally you don&#8217;t need to <code>import httplib</code> directly (<code>urllib</code> does that automatically), but you will here so you can set the debugging flag on the <code>HTTPConnection</code> class that <code>urllib</code> uses internally to connect to the HTTP server. This is an incredibly useful technique. Some other Python libraries have similar debug flags, but there&#8217;s no particular standard for naming them or turning them on; you need to read
+         the documentation of each library to see if such a feature is available.
+<li>Now that the debugging flag is set, information on the HTTP request and response is printed out in real time. The first
+         thing it tells you is that you&#8217;re connecting to the server <code>diveintomark.org</code> on port 80, which is the standard port for HTTP.
+<li>When you request the Atom feed, <code>urllib</code> sends three lines to the server. The first line specifies the HTTP verb you&#8217;re using, and the path of the resource (minus
+         the domain name). All the requests in this chapter will use <code>GET</code>, but in the next chapter on <abbr>SOAP</abbr>, you&#8217;ll see that it uses <code>POST</code> for everything. The basic syntax is the same, regardless of the verb.
+<li>The second line is the <code>Host</code> header, which specifies the domain name of the service you&#8217;re accessing. This is important, because a single HTTP server
+         can host multiple separate domains. My server currently hosts 12 domains; other servers can host hundreds or even thousands.
+<li>The third line is the <code>User-Agent</code> header. What you see here is the generic <code>User-Agent</code> that the <code>urllib</code> library adds by default. In the next section, you&#8217;ll see how to customize this to be more specific.
+<li>The server replies with a status code and a bunch of headers (and possibly some data, which got stored in the <var>feeddata</var> variable). The status code here is <code>200</code>, meaning &#8220;everything&#8217;s normal, here&#8217;s the data you requested&#8221;. The server also tells you the date it responded to your request, some information about the server itself, and the content
+         type of the data it&#8217;s giving you. Depending on your application, this might be useful, or not. It&#8217;s certainly reassuring
+         that you thought you were asking for an Atom feed, and lo and behold, you&#8217;re getting an Atom feed (<code>application/atom+xml</code>, which is the registered content type for Atom feeds).
+<li>The server tells you when this Atom feed was last modified (in this case, about 13 minutes ago). You can send this date back
+         to the server the next time you request the same feed, and the server can do last-modified checking.
+<li>The server also tells you that this Atom feed has an ETag hash of <code>"e8284-68e0-4de30f80"</code>. The hash doesn&#8217;t mean anything by itself; there&#8217;s nothing you can do with it, except send it back to the server the next
+         time you request this same feed. Then the server can use it to tell you if the data has changed or not.
+
+<p class=a>&#x2042;
+
+<h2 id="oa.useragent">11.5. Setting the <code>User-Agent</code></h2>
+<p>The first step to improving your HTTP web services client is to identify yourself properly with a <code>User-Agent</code>. To do that, you need to move beyond the basic <code>urllib</code> and dive into <code>urllib2</code>.
+<div class=example><h3>Example 11.4. Introducing <code>urllib2</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import httplib</kbd>
+<samp class=p>>>> </samp><kbd>httplib.HTTPConnection.debuglevel = 1</kbd>           <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>import urllib2</kbd>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request('http://diveintomark.org/xml/atom.xml')</kbd> <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener()</kbd>                 <span>&#x2462;</span>
+<samp class=p>>>> </samp><kbd>feeddata = opener.open(request).read()</kbd>          <span>&#x2463;</span>
+connect: (diveintomark.org, 80)
+send: '
+GET /xml/atom.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 200 OK\r\n'
+header: Date: Wed, 14 Apr 2004 23:23:12 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Content-Type: application/atom+xml
+header: Last-Modified: Wed, 14 Apr 2004 22:14:38 GMT
+header: ETag: "e8284-68e0-4de30f80"
+header: Accept-Ranges: bytes
+header: Content-Length: 26848
+header: Connection: close
+</pre>
+<ol>
+<li>If you still have your Python <abbr>IDE</abbr> open from the previous section&#8217;s example, you can skip this, but this turns on <a href="#oa.debug" title="11.4. Debugging HTTP web services">HTTP debugging</a> so you can see what you&#8217;re actually sending over the wire, and what gets sent back.
+<li>Fetching an HTTP resource with <code>urllib2</code> is a three-step process, for good reasons that will become clear shortly. The first step is to create a <code>Request</code> object, which takes the URL of the resource you&#8217;ll eventually get around to retrieving. Note that this step doesn&#8217;t actually
+            retrieve anything yet.
+<li>The second step is to build a URL opener. This can take any number of handlers, which control how responses are handled.
+             But you can also build an opener without any custom handlers, which is what you&#8217;re doing here. You&#8217;ll see how to define
+            and use custom handlers later in this chapter when you explore redirects.
+<li>The final step is to tell the opener to open the URL, using the <code>Request</code> object you created. As you can see from all the debugging information that gets printed, this step actually retrieves the
+            resource and stores the returned data in <var>feeddata</var>.
+<div class=example><h3>Example 11.5. Adding headers with the <code>Request</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>request</kbd>            <span>&#x2460;</span>
+&lt;urllib2.Request instance at 0x00250AA8>
+<samp class=p>>>> </samp><kbd>request.get_full_url()</kbd>
+http://diveintomark.org/xml/atom.xml
+<samp class=p>>>> </samp><kbd>request.add_header('User-Agent',</kbd>
+<samp class=p>...    </samp><kbd>'OpenAnything/1.0 +http://diveintopython3.org/')</kbd>    <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>feeddata = opener.open(request).read()</kbd>                 <span>&#x2462;</span>
+connect: (diveintomark.org, 80)
+send: '
+GET /xml/atom.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: OpenAnything/1.0 +http://diveintopython3.org/   <span>&#x2463;</span>
+'
+reply: 'HTTP/1.1 200 OK\r\n'
+header: Date: Wed, 14 Apr 2004 23:45:17 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Content-Type: application/atom+xml
+header: Last-Modified: Wed, 14 Apr 2004 22:14:38 GMT
+header: ETag: "e8284-68e0-4de30f80"
+header: Accept-Ranges: bytes
+header: Content-Length: 26848
+header: Connection: close
+</pre>
+<ol>
+<li>You&#8217;re continuing from the previous example; you&#8217;ve already created a <code>Request</code> object with the URL you want to access.
+<li>Using the <code>add_header</code> method on the <code>Request</code> object, you can add arbitrary HTTP headers to the request. The first argument is the header, the second is the value you&#8217;re
+            providing for that header. Convention dictates that a <code>User-Agent</code> should be in this specific format: an application name, followed by a slash, followed by a version number. The rest is free-form,
+            and you&#8217;ll see a lot of variations in the wild, but somewhere it should include a URL of your application. The <code>User-Agent</code> is usually logged by the server along with other details of your request, and including a URL of your application allows
+            server administrators looking through their access logs to contact you if something is wrong.
+<li>The <var>opener</var> object you created before can be reused too, and it will retrieve the same feed again, but with your custom <code>User-Agent</code> header.
+<li>And here&#8217;s you sending your custom <code>User-Agent</code>, in place of the generic one that Python sends by default. If you look closely, you&#8217;ll notice that you defined a <code>User-Agent</code> header, but you actually sent a <code>User-agent</code> header. See the difference?  <code>urllib2</code> changed the case so that only the first letter was capitalized. It doesn&#8217;t really matter; HTTP specifies that header field
+            names are completely case-insensitive.
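+<p>For comparison, a minimal sketch of the same header handling with Python 3&#8217;s <code>urllib.request</code>, reusing the <code>User-Agent</code> string from the example above. It only inspects the <code>Request</code> object and does not contact the server.

```python
import urllib.request

request = urllib.request.Request('http://diveintomark.org/xml/atom.xml')
request.add_header('User-Agent', 'OpenAnything/1.0 +http://diveintopython3.org/')

# Request stores header names with only the first letter capitalized,
# so the lookup key is 'User-agent'.
stored_names = list(request.headers)
user_agent = request.get_header('User-agent')
print(stored_names, user_agent)
```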
+
+<p class=a>&#x2042;
+
+<h2 id="oa.etags">11.6. Handling <code>Last-Modified</code> and <code>ETag</code></h2>
+<p>Now that you know how to add custom HTTP headers to your web service requests, let&#8217;s look at adding support for <code>Last-Modified</code> and <code>ETag</code> headers.
+<p>These examples show the output with debugging turned off. If you still have it turned on from the previous section, you can
+turn it off by setting <code>httplib.HTTPConnection.debuglevel = 0</code>. Or you can just leave debugging on, if that helps you.
+<div class=example><h3 id="oa.etags.example.1">Example 11.6. Testing <code>Last-Modified</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import urllib2</kbd>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request('http://diveintomark.org/xml/atom.xml')</kbd>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener()</kbd>
+<samp class=p>>>> </samp><kbd>firstdatastream = opener.open(request)</kbd>
+<samp class=p>>>> </samp><kbd>firstdatastream.headers.dict</kbd>     <span>&#x2460;</span>
+<samp>{'date': 'Thu, 15 Apr 2004 20:42:41 GMT', 
+ 'server': 'Apache/2.0.49 (Debian GNU/Linux)', 
+ 'content-type': 'application/atom+xml',
+ 'last-modified': 'Thu, 15 Apr 2004 19:45:21 GMT', 
+ 'etag': '"e842a-3e53-55d97640"',
+ 'content-length': '15955', 
+ 'accept-ranges': 'bytes', 
+ 'connection': 'close'}</samp>
+<samp class=p>>>> </samp><kbd>request.add_header('If-Modified-Since',</kbd>
+<samp class=p>...    </samp><kbd>firstdatastream.headers.get('Last-Modified'))</kbd>  <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>seconddatastream = opener.open(request)</kbd>            <span>&#x2462;</span>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in ?
+  File "c:\python23\lib\urllib2.py", line 326, in open
+    '_open', req)
+  File "c:\python23\lib\urllib2.py", line 306, in _call_chain
+    result = func(*args)
+  File "c:\python23\lib\urllib2.py", line 901, in http_open
+    return self.do_open(httplib.HTTP, req)
+  File "c:\python23\lib\urllib2.py", line 895, in do_open
+    return self.parent.error('http', req, fp, code, msg, hdrs)
+  File "c:\python23\lib\urllib2.py", line 352, in error
+    return self._call_chain(*args)
+  File "c:\python23\lib\urllib2.py", line 306, in _call_chain
+    result = func(*args)
+  File "c:\python23\lib\urllib2.py", line 412, in http_error_default
+    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
+urllib2.HTTPError: HTTP Error 304: Not Modified</samp>
+</pre>
+<ol>
+<li>Remember all those HTTP headers you saw printed out when you turned on debugging?  This is how you can get access to them
+            programmatically: <var>firstdatastream.headers</var> is <a href="#fileinfo.userdict" title="5.5. Exploring UserDict: A Wrapper Class">an object that acts like a dictionary</a> and allows you to get any of the individual headers returned from the HTTP server.
+<li>On the second request, you add the <code>If-Modified-Since</code> header with the last-modified date from the first request. If the data hasn&#8217;t changed, the server should return a <code>304</code> status code.
+<li>Sure enough, the data hasn&#8217;t changed. You can see from the traceback that <code>urllib2</code> throws a special exception, <code>HTTPError</code>, in response to the <code>304</code> status code. This is a little unusual, and not entirely helpful. After all, it&#8217;s not an error; you specifically asked the
+            server not to send you any data if it hadn&#8217;t changed, and the data didn&#8217;t change, so the server told you it wasn&#8217;t sending
+            you any data. That&#8217;s not an error; that&#8217;s exactly what you were hoping for.
+<p><code>urllib2</code> also raises an <code>HTTPError</code> exception for conditions that you would think of as errors, such as <code>404</code> (page not found). In fact, it will raise <code>HTTPError</code> for <em>any</em> status code other than <code>200</code> (OK), <code>301</code> (permanent redirect), or <code>302</code> (temporary redirect). It would be more helpful for your purposes to capture the status code and simply return it, without
+throwing an exception. To do that, you&#8217;ll need to define a custom URL handler.
+<div class=example><h3>Example 11.7. Defining URL handlers</h3>
+<p>This custom URL handler is part of <code>openanything.py</code>.
+<pre><code>
+class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):    <span>&#x2460;</span>
+    def http_error_default(self, req, fp, code, msg, headers): <span>&#x2461;</span>
+        result = urllib2.HTTPError(         
+            req.get_full_url(), code, msg, headers, fp)       
+        result.status = code                 <span>&#x2462;</span>
+        return result     
+</code></pre>
+<ol>
+<li><code>urllib2</code> is designed around URL handlers. Each handler is just a class that can define any number of methods. When something happens
+            -- like an HTTP error, or even a <code>304</code> code -- <code>urllib2</code> introspects into the list of defined handlers for a method that can handle it. You used a similar introspection in <a href="#kgp" title="Chapter 9. XML Processing">Chapter 9, <i>XML Processing</i></a> to define handlers for different node types, but <code>urllib2</code> is more flexible, and introspects over as many handlers as are defined for the current request.
+<li><code>urllib2</code> searches through the defined handlers and calls the <code>http_error_default</code> method when it encounters a <code>304</code> status code from the server. By defining a custom error handler, you can prevent <code>urllib2</code> from raising an exception. Instead, you create the <code>HTTPError</code> object, but return it instead of raising it.
+<li>This is the key part: before returning, you save the status code returned by the HTTP server. This will allow you easy access
+            to it from the calling program.
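+<p>A hedged Python 3 sketch of the same idea, using <code>urllib.request</code> and <code>urllib.error</code> in place of <code>urllib2</code>. It returns the <code>HTTPError</code> instead of raising it, and the caller reads its <code>code</code> attribute; it does not assign a <code>status</code> attribute, since newer Python 3 releases already provide a read-only <code>status</code> on response objects. The handler is called directly with a fake <code>304</code> response, so no network access is needed.

```python
import email.message
import io
import urllib.error
import urllib.request

class DefaultErrorHandler(urllib.request.HTTPDefaultErrorHandler):
    def http_error_default(self, req, fp, code, msg, headers):
        # Return the HTTPError instead of raising it; the caller reads .code.
        return urllib.error.HTTPError(req.full_url, code, msg, headers, fp)

# Passing the handler to build_opener() replaces the default error handler.
opener = urllib.request.build_opener(DefaultErrorHandler())

# Call the handler directly with a fake 304 response, so no network is needed.
request = urllib.request.Request('http://diveintomark.org/xml/atom.xml')
result = DefaultErrorHandler().http_error_default(
    request, io.BytesIO(b''), 304, 'Not Modified', email.message.Message())
print(result.code, result.read())
```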
+<div class=example><h3>Example 11.8. Using custom URL handlers</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>request.headers</kbd>         <span>&#x2460;</span>
+{'If-modified-since': 'Thu, 15 Apr 2004 19:45:21 GMT'}
+<samp class=p>>>> </samp><kbd>import openanything</kbd>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener(</kbd>
+<samp class=p>...    </samp><kbd>openanything.DefaultErrorHandler())</kbd>   <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>seconddatastream = opener.open(request)</kbd>
+<samp class=p>>>> </samp><kbd>seconddatastream.status</kbd> <span>&#x2462;</span>
+304
+<samp class=p>>>> </samp><kbd>seconddatastream.read()</kbd> <span>&#x2463;</span>
+''
+</pre>
+<ol>
+<li>You&#8217;re continuing the previous example, so the <code>Request</code> object is already set up, and you&#8217;ve already added the <code>If-Modified-Since</code> header.
+<li>This is the key: now that you&#8217;ve defined your custom URL handler, you need to tell <code>urllib2</code> to use it. Remember how I said that <code>urllib2</code> broke up the process of accessing an HTTP resource into three steps, and for good reason?  This is why building the URL opener
+            is its own step, because you can build it with your own custom URL handlers that override <code>urllib2</code>&#8217;s default behavior.
+<li>Now you can quietly open the resource, and what you get back is an object that, along with the usual headers (use <var>seconddatastream.headers.dict</var> to access them), also contains the HTTP status code. In this case, as you expected, the status is <code>304</code>, meaning this data hasn&#8217;t changed since the last time you asked for it.
+<li>Note that when the server sends back a <code>304</code> status code, it doesn&#8217;t re-send the data. That&#8217;s the whole point: to save bandwidth by not re-downloading data that hasn&#8217;t
+            changed. So if you actually want that data, you&#8217;ll need to cache it locally the first time you get it.
+<p>Handling <code>ETag</code> works much the same way, but instead of checking for <code>Last-Modified</code> and sending <code>If-Modified-Since</code>, you check for <code>ETag</code> and send <code>If-None-Match</code>. Let&#8217;s start with a fresh <abbr>IDE</abbr> session.
+<div class=example><h3 id="oa.etags.example">Example 11.9. Supporting <code>ETag</code>/<code>If-None-Match</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import urllib2, openanything</kbd>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request('http://diveintomark.org/xml/atom.xml')</kbd>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener(</kbd>
+<samp class=p>...    </samp>openanything.DefaultErrorHandler())
+<samp class=p>>>> </samp><kbd>firstdatastream = opener.open(request)</kbd>
+<samp class=p>>>> </samp><kbd>firstdatastream.headers.get('ETag')</kbd>        <span>&#x2460;</span>
+'"e842a-3e53-55d97640"'
+<samp class=p>>>> </samp><kbd>firstdata = firstdatastream.read()</kbd>
+<samp class=p>>>> </samp><kbd>print firstdata</kbd>          <span>&#x2461;</span>
+<samp>&lt;?xml version="1.0" encoding="iso-8859-1"?>
+&lt;feed version="0.3"
+  xmlns="http://purl.org/atom/ns#"
+  xmlns:dc="http://purl.org/dc/elements/1.1/"
+  xml:lang="en">
+  &lt;title mode="escaped">dive into mark&lt;/title>
+  &lt;link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
+  &lt;-- rest of feed omitted for brevity --&gt;</samp>
+<samp class=p>>>> </samp><kbd>request.add_header('If-None-Match',</kbd>
+<samp class=p>...    </samp>firstdatastream.headers.get('ETag'))   <span>&#x2462;</span>
+<samp class=p>>>> </samp><kbd>seconddatastream = opener.open(request)</kbd>
+<samp class=p>>>> </samp><kbd>seconddatastream.status</kbd>  <span>&#x2463;</span>
+304
+<samp class=p>>>> </samp><kbd>seconddatastream.read()</kbd>  <span>&#x2464;</span>
+''
+</pre>
+<ol>
+<li>Using the <var>firstdatastream.headers</var> pseudo-dictionary, you can get the <code>ETag</code> returned from the server. (What happens if the server didn&#8217;t send back an <code>ETag</code>?  Then this line would return <code>None</code>.)
+<li>OK, you got the data.
+<li>Now set up the second call by setting the <code>If-None-Match</code> header to the <code>ETag</code> you got from the first call.
+<li>The second call succeeds quietly (without throwing an exception), and once again you see that the server has sent back a <code>304</code> status code. Based on the <code>ETag</code> you sent the second time, it knows that the data hasn&#8217;t changed.
+<li>Regardless of whether the <code>304</code> is triggered by <code>Last-Modified</code> date checking or <code>ETag</code> hash matching, you&#8217;ll never get the data along with the <code>304</code>. That&#8217;s the whole point.
+<table id="tip.etag.vs.lastmodified" class=note border="0" summary="">
+
+<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">In these examples, the HTTP server has supported both <code>Last-Modified</code> and <code>ETag</code> headers, but not all servers do. As a web services client, you should be prepared to support both, but you must code defensively
+      in case a server only supports one or the other, or neither.
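+<p>One way to code defensively is to build the conditional headers from whichever validators you happen to have cached. The helper below is a minimal sketch; the name <code>conditional_headers</code> is hypothetical and not part of <code>openanything.py</code>.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build conditional-request headers from whatever validators were cached."""
    headers = {}
    if etag:
        # The server sent an ETag last time, so echo it back.
        headers['If-None-Match'] = etag
    if last_modified:
        # The server sent a Last-Modified date last time, so echo it back.
        headers['If-Modified-Since'] = last_modified
    # If the server sent neither, the dictionary stays empty and the
    # request is an ordinary unconditional GET.
    return headers
```

+<p>Each header is added only when its value is present, so the same code path works for servers that support both headers, one of them, or neither.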
+
+<p class=a>&#x2042;
+
+<h2 id="oa.redirect">11.7. Handling redirects</h2>
+<p>You can support permanent and temporary redirects using a different kind of custom URL handler.
+<p>First, let&#8217;s see why a redirect handler is necessary in the first place.
+<div class=example><h3>Example 11.10. Accessing web services without a redirect handler</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import urllib2, httplib</kbd>
+<samp class=p>>>> </samp><kbd>httplib.HTTPConnection.debuglevel = 1</kbd>           <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request(</kbd>
+<samp class=p>...    </samp>'http://diveintomark.org/redir/example301.xml') <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener()</kbd>
+<samp class=p>>>> </samp><kbd>f = opener.open(request)</kbd>
+<samp>connect: (diveintomark.org, 80)
+send: '
+GET /redir/example301.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 301 Moved Permanently\r\n'</samp>             <span>&#x2462;</span>
+<samp>header: Date: Thu, 15 Apr 2004 22:06:25 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Location: http://diveintomark.org/xml/atom.xml</samp>  <span>&#x2463;</span>
+<samp>header: Content-Length: 338
+header: Connection: close
+header: Content-Type: text/html; charset=iso-8859-1
+connect: (diveintomark.org, 80)
+send: '
+GET /xml/atom.xml HTTP/1.0</samp>            <span>&#x2464;</span>
+<samp>Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 200 OK\r\n'
+header: Date: Thu, 15 Apr 2004 22:06:25 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT
+header: ETag: "e842a-3e53-55d97640"
+header: Accept-Ranges: bytes
+header: Content-Length: 15955
+header: Connection: close
+header: Content-Type: application/atom+xml</samp>
+<samp class=p>>>> </samp><kbd>f.url</kbd>           <span>&#x2465;</span>
+'http://diveintomark.org/xml/atom.xml'
+<samp class=p>>>> </samp><kbd>f.headers.dict</kbd>
+<samp>{'content-length': '15955', 
+'accept-ranges': 'bytes', 
+'server': 'Apache/2.0.49 (Debian GNU/Linux)', 
+'last-modified': 'Thu, 15 Apr 2004 19:45:21 GMT', 
+'connection': 'close', 
+'etag': '"e842a-3e53-55d97640"', 
+'date': 'Thu, 15 Apr 2004 22:06:25 GMT', 
+'content-type': 'application/atom+xml'}</samp>
+<samp class=p>>>> </samp><kbd>f.status</kbd>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in ?
+AttributeError: addinfourl instance has no attribute 'status'</samp>
+</pre>
+<ol>
+<li>You&#8217;ll be better able to see what&#8217;s happening if you turn on debugging.
+<li>This is a URL which I have set up to permanently redirect to my Atom feed at <code>http://diveintomark.org/xml/atom.xml</code>.
+<li>Sure enough, when you try to download the data at that address, the server sends back a <code>301</code> status code, telling you that the resource has moved permanently.
+<li>The server also sends back a <code>Location:</code> header that gives the new address of this data.
+<li><code>urllib2</code> notices the redirect status code and automatically tries to retrieve the data at the new location specified in the <code>Location:</code> header.
+<li>The object you get back from the <var>opener</var> contains the new permanent address and all the headers returned from the second request (retrieved from the new permanent
+            address). But the status code is missing, so you have no way of knowing programmatically whether this redirect was temporary
+            or permanent. And that matters very much: if it was a temporary redirect, then you should continue to ask for the data at
+            the old location. But if it was a permanent redirect (as this was), you should ask for the data at the new location from
+            now on.
+<p>This is suboptimal, but easy to fix. <code>urllib2</code> doesn&#8217;t behave exactly as you want it to when it encounters a <code>301</code> or <code>302</code>, so let&#8217;s override its behavior. How?  With a custom URL handler, <a href="#oa.etags" title="11.6. Handling Last-Modified and ETag">just like you did to handle <code>304</code> codes</a>.
+<div class=example><h3>Example 11.11. Defining the redirect handler</h3>
+<p>This class is defined in <code>openanything.py</code>.
+<pre><code>
+class SmartRedirectHandler(urllib2.HTTPRedirectHandler):     <span>&#x2460;</span>
+    def http_error_301(self, req, fp, code, msg, headers):  
+        result = urllib2.HTTPRedirectHandler.http_error_301( <span>&#x2461;</span>
+            self, req, fp, code, msg, headers)              
+        result.status = code               <span>&#x2462;</span>
+        return result   
+
+    def http_error_302(self, req, fp, code, msg, headers):   <span>&#x2463;</span>
+        result = urllib2.HTTPRedirectHandler.http_error_302(
+            self, req, fp, code, msg, headers)              
+        result.status = code              
+        return result   
+</pre>
+<ol>
+<li>Redirect behavior is defined in <code>urllib2</code> in a class called <code>HTTPRedirectHandler</code>. You don&#8217;t want to completely override the behavior; you just want to extend it a little. By subclassing <code>HTTPRedirectHandler</code>, you can call the ancestor class to do all the hard work.
+<li>When it encounters a <code>301</code> status code from the server, <code>urllib2</code> will search through its handlers and call the <code>http_error_301</code> method.  The first thing ours does is just call the <code>http_error_301</code> method in the ancestor, which handles the grunt work of looking for the <code>Location:</code> header and following the redirect to the new address.
+<li>Here&#8217;s the key: before you return, you store the status code (<code>301</code>), so that the calling program can access it later.
+<li>Temporary redirects (status code <code>302</code>) work the same way: override the <code>http_error_302</code> method, call the ancestor, and save the status code before returning.
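+<p>A Python 3 counterpart might look something like the sketch below. It is an assumption-laden translation, not the code from <code>openanything.py</code>: it uses <code>urllib.request</code>, and it records the redirect code in a hypothetical <code>redirect_status</code> attribute, because in Python 3 the response object already uses <code>status</code> for the status of the final response.

```python
import urllib.request


class SmartRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember which kind of redirect it was."""

    def http_error_301(self, req, fp, code, msg, headers):
        # Let the ancestor find the Location header and follow the redirect.
        result = super().http_error_301(req, fp, code, msg, headers)
        if result is not None:
            # redirect_status is a hypothetical attribute name.
            result.redirect_status = code
        return result

    def http_error_302(self, req, fp, code, msg, headers):
        # Temporary redirects work the same way.
        result = super().http_error_302(req, fp, code, msg, headers)
        if result is not None:
            result.redirect_status = code
        return result
```

+<p>Passing an instance to <code>urllib.request.build_opener</code> replaces the stock redirect handler with this subclass.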
+<p>So what has this bought us?  You can now build a URL opener with the custom redirect handler, and it will still automatically
+follow redirects, but now it will also expose the redirect status code.
+<div class=example><h3>Example 11.12. Using the redirect handler to detect permanent redirects</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request('http://diveintomark.org/redir/example301.xml')</kbd>
+<samp class=p>>>> </samp><kbd>import openanything, httplib</kbd>
+<samp class=p>>>> </samp><kbd>httplib.HTTPConnection.debuglevel = 1</kbd>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener(</kbd>
+<samp class=p>...    </samp>openanything.SmartRedirectHandler())           <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>f = opener.open(request)</kbd>
+<samp>connect: (diveintomark.org, 80)
+send: '
+GET /redir/example301.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 301 Moved Permanently\r\n'</samp>            <span>&#x2461;</span>
+<samp>header: Date: Thu, 15 Apr 2004 22:13:21 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Location: http://diveintomark.org/xml/atom.xml
+header: Content-Length: 338
+header: Connection: close
+header: Content-Type: text/html; charset=iso-8859-1
+connect: (diveintomark.org, 80)
+send: '
+GET /xml/atom.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 200 OK\r\n'
+header: Date: Thu, 15 Apr 2004 22:13:21 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT
+header: ETag: "e842a-3e53-55d97640"
+header: Accept-Ranges: bytes
+header: Content-Length: 15955
+header: Connection: close
+header: Content-Type: application/atom+xml
+</samp>
+<samp class=p>>>> </samp><kbd>f.status</kbd>       <span>&#x2462;</span>
+301
+<samp class=p>>>> </samp><kbd>f.url</kbd>
+'http://diveintomark.org/xml/atom.xml'
+</pre>
+<ol>
+<li>First, build a URL opener with the redirect handler you just defined.
+<li>You sent off a request, and you got a <code>301</code> status code in response. At this point, the <code>http_error_301</code> method gets called. You call the ancestor method, which follows the redirect and sends a request to the new location (<code>http://diveintomark.org/xml/atom.xml</code>).
+<li>This is the payoff: now, not only do you have access to the new URL, but you have access to the redirect status code, so you
+            can tell that this was a permanent redirect. The next time you request this data, you should request it from the new location
+            (<code>http://diveintomark.org/xml/atom.xml</code>, as specified in <var>f.url</var>). If you had stored the location in a configuration file or a database, you need to update that so you don&#8217;t keep pounding
+            the server with requests at the old address. It&#8217;s time to update your address book.
+<p>The same redirect handler can also tell you that you <em>shouldn&#8217;t</em> update your address book.
+<div class=example><h3>Example 11.13. Using the redirect handler to detect temporary redirects</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request(</kbd>
+<samp class=p>...    </samp>'http://diveintomark.org/redir/example302.xml')   <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>f = opener.open(request)</kbd>
+<samp>connect: (diveintomark.org, 80)
+send: '
+GET /redir/example302.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 302 Found\r\n'</samp>         <span>&#x2461;</span>
+<samp>header: Date: Thu, 15 Apr 2004 22:18:21 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Location: http://diveintomark.org/xml/atom.xml
+header: Content-Length: 314
+header: Connection: close
+header: Content-Type: text/html; charset=iso-8859-1
+connect: (diveintomark.org, 80)
+send: '
+GET /xml/atom.xml HTTP/1.0</samp>              <span>&#x2462;</span>
+<samp>Host: diveintomark.org
+User-agent: Python-urllib/2.1
+'
+reply: 'HTTP/1.1 200 OK\r\n'
+header: Date: Thu, 15 Apr 2004 22:18:21 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT
+header: ETag: "e842a-3e53-55d97640"
+header: Accept-Ranges: bytes
+header: Content-Length: 15955
+header: Connection: close
+header: Content-Type: application/atom+xml</samp>
+<samp class=p>>>> </samp><kbd>f.status</kbd>          <span>&#x2463;</span>
+302
+<samp class=p>>>> </samp><kbd>f.url</kbd>
+'http://diveintomark.org/xml/atom.xml'
+</pre>
+<ol>
+<li>This is a sample URL I&#8217;ve set up that is configured to tell clients to <em>temporarily</em> redirect to <code>http://diveintomark.org/xml/atom.xml</code>.
+<li>The server sends back a <code>302</code> status code, indicating a temporary redirect. The temporary new location of the data is given in the <code>Location:</code> header.
+<li><code>urllib2</code> calls your <code>http_error_302</code> method, which calls the ancestor method of the same name in <code>urllib2.HTTPRedirectHandler</code>, which follows the redirect to the new location. Then your <code>http_error_302</code> method stores the status code (<code>302</code>) so the calling application can get it later.
+<li>And here you are, having successfully followed the redirect to <code>http://diveintomark.org/xml/atom.xml</code>. <var>f.status</var> tells you that this was a temporary redirect, which means that you should continue to request data from the original address
+            (<code>http://diveintomark.org/redir/example302.xml</code>). Maybe it will redirect next time too, but maybe not. Maybe it will redirect to a different address. It&#8217;s not for you
+            to say. The server said this redirect was only temporary, so you should respect that. And now you&#8217;re exposing enough information
+            that the calling application can respect that.
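+<p>The rule described in the last two examples can be reduced to a few lines. The function below is a minimal sketch with a hypothetical name; it assumes the caller knows the address it originally requested, the address it ended up at, and the redirect status code (if any).

```python
def url_to_remember(requested_url, final_url, status):
    """Pick the address to use the next time this resource is requested."""
    if status == 301:
        # Permanent redirect: update your address book.
        return final_url
    # Temporary redirect (302) or no redirect at all: keep asking
    # at the original address.
    return requested_url
```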
+
+<p class=a>&#x2042;
+
+<h2 id="oa.gzip">11.8. Handling compressed data</h2>
+<p>The last important HTTP feature you want to support is compression. Many web services have the ability to send data compressed,
+   which can cut down the amount of data sent over the wire by 60% or more. This is especially true of XML web services, since
+   XML data compresses very well.
+<p>Servers won&#8217;t give you compressed data unless you tell them you can handle it.
+<div class=example><h3>Example 11.14. Telling the server you would like compressed data</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import urllib2, httplib</kbd>
+<samp class=p>>>> </samp><kbd>httplib.HTTPConnection.debuglevel = 1</kbd>
+<samp class=p>>>> </samp><kbd>request = urllib2.Request('http://diveintomark.org/xml/atom.xml')</kbd>
+<samp class=p>>>> </samp><kbd>request.add_header('Accept-encoding', 'gzip')</kbd>        <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>opener = urllib2.build_opener()</kbd>
+<samp class=p>>>> </samp><kbd>f = opener.open(request)</kbd>
+<samp>connect: (diveintomark.org, 80)
+send: '
+GET /xml/atom.xml HTTP/1.0
+Host: diveintomark.org
+User-agent: Python-urllib/2.1
+Accept-encoding: gzip</samp><span>&#x2461;</span>
+<samp>'
+reply: 'HTTP/1.1 200 OK\r\n'
+header: Date: Thu, 15 Apr 2004 22:24:39 GMT
+header: Server: Apache/2.0.49 (Debian GNU/Linux)
+header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT
+header: ETag: "e842a-3e53-55d97640"
+header: Accept-Ranges: bytes
+header: Vary: Accept-Encoding
+header: Content-Encoding: gzip</samp>         <span>&#x2462;</span>
+<samp>header: Content-Length: 6289</samp>           <span>&#x2463;</span>
+<samp>header: Connection: close
+header: Content-Type: application/atom+xml</samp>
+</pre>
+<ol>
+<li>This is the key: once you&#8217;ve created your <code>Request</code> object, add an <code>Accept-encoding</code> header to tell the server you can accept gzip-encoded data. <code>gzip</code> is the name of the compression algorithm you&#8217;re using. HTTP defines other content codings (such as <code>deflate</code>), but <code>gzip</code> is by far the most widely supported.
+<li>There&#8217;s your header going across the wire.
+<li>And here&#8217;s what the server sends back: the <code>Content-Encoding: gzip</code> header means that the data you&#8217;re about to receive has been gzip-compressed.
+<li>The <code>Content-Length</code> header is the length of the compressed data, not the uncompressed data. As you&#8217;ll see in a minute, the actual length of
+            the uncompressed data was 15955, so gzip compression cut your bandwidth by over 60%!
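+<p>The arithmetic behind that figure, using the two lengths from these examples, works out as follows.

```python
compressed_length = 6289     # Content-Length of the gzip-compressed response
uncompressed_length = 15955  # length of the data after decompression

# Fraction of the bytes you did not have to download.
savings = 1 - compressed_length / uncompressed_length
print('{:.1%}'.format(savings))  # prints 60.6%
```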
+<div class=example><h3>Example 11.15. Decompressing the data</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>compresseddata = f.read()</kbd>            <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>len(compresseddata)</kbd>
+6289
+<samp class=p>>>> </samp><kbd>import StringIO</kbd>
+<samp class=p>>>> </samp><kbd>compressedstream = StringIO.StringIO(compresseddata)</kbd>   <span>&#x2461;</span>
+<samp class=p>>>> </samp><kbd>import gzip</kbd>
+<samp class=p>>>> </samp><kbd>gzipper = gzip.GzipFile(fileobj=compressedstream)</kbd>      <span>&#x2462;</span>
+<samp class=p>>>> </samp><kbd>data = gzipper.read()</kbd>                <span>&#x2463;</span>
+<samp class=p>>>> </samp><kbd>print data</kbd>         <span>&#x2464;</span>
+<samp>&lt;?xml version="1.0" encoding="iso-8859-1"?>
+&lt;feed version="0.3"
+  xmlns="http://purl.org/atom/ns#"
+  xmlns:dc="http://purl.org/dc/elements/1.1/"
+  xml:lang="en">
+  &lt;title mode="escaped">dive into mark&lt;/title>
+  &lt;link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
+  &lt;-- rest of feed omitted for brevity --&gt;</samp>
+<samp class=p>>>> </samp><kbd>len(data)</kbd>
+15955
+</pre>
+<ol>
+<li>Continuing from the previous example, <var>f</var> is the file-like object returned from the URL opener. Using its <code>read()</code> method would ordinarily get you the uncompressed data, but since this data has been gzip-compressed, this is just the first
+            step towards getting the data you really want.
+<li>OK, this step is a bit of a messy workaround. Python has a <code>gzip</code> module, which reads (and actually writes) gzip-compressed files on disk. But you don&#8217;t have a file on disk, you have a gzip-compressed
+            buffer in memory, and you don&#8217;t want to write out a temporary file just so you can uncompress it. So what you&#8217;re going to
+            do is create a file-like object out of the in-memory data (<var>compresseddata</var>), using the <code>StringIO</code> module. You first saw the <code>StringIO</code> module in <a href="#kgp.openanything.stringio.example" title="Example 10.4. Introducing StringIO">the previous chapter</a>, but now you&#8217;ve found another use for it.
+<li>Now you can create an instance of <code>GzipFile</code>, and tell it that its &#8220;file&#8221; is the file-like object <var>compressedstream</var>.
+<li>This is the line that does all the actual work: &#8220;reading&#8221; from <code>GzipFile</code> will decompress the data. Strange?  Yes, but it makes sense in a twisted kind of way. <var>gzipper</var> is a file-like object which represents a gzip-compressed file. That &#8220;file&#8221; is not a real file on disk, though; <var>gzipper</var> is really just &#8220;reading&#8221; from the file-like object you created with <code>StringIO</code> to wrap the compressed data, which is only in memory in the variable <var>compresseddata</var>. And where did that compressed data come from?  You originally downloaded it from a remote HTTP server by &#8220;reading&#8221; from the file-like object you built with <code>urllib2.build_opener</code>. And amazingly, this all just works. Every step in the chain has no idea that the previous step is faking it.
+<li>Look ma, real data. (15955 bytes of it, in fact.)<p>&#8220;But wait!&#8221; I hear you cry. &#8220;This could be even easier!&#8221;  I know what you&#8217;re thinking. You&#8217;re thinking that <var>opener.open</var> returns a file-like object, so why not cut out the <code>StringIO</code> middleman and just pass <var>f</var> directly to <code>GzipFile</code>?  OK, maybe you weren&#8217;t thinking that, but don&#8217;t worry about it, because it doesn&#8217;t work.
+<div class=example><h3>Example 11.16. Decompressing the data directly from the server</h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>f = opener.open(request)</kbd><span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>f.headers.get('Content-Encoding')</kbd>         <span>&#x2461;</span>
+'gzip'
+<samp class=p>>>> </samp><kbd>data = gzip.GzipFile(fileobj=f).read()</kbd>    <span>&#x2462;</span>
+<samp class=traceback>Traceback (most recent call last):
+  File "&lt;stdin>", line 1, in ?
+  File "c:\python23\lib\gzip.py", line 217, in read
+    self._read(readsize)
+  File "c:\python23\lib\gzip.py", line 252, in _read
+    pos = self.fileobj.tell()   # Save current position
+AttributeError: addinfourl instance has no attribute 'tell'</samp>
+</pre>
+<ol>
+<li>Continuing from the previous example, you already have a <code>Request</code> object set up with an <code>Accept-encoding: gzip</code> header.
+<li>Simply opening the request gets you the headers, although it doesn&#8217;t download any data yet. As you can see from the returned
+<code>Content-Encoding</code> header, this data has been sent gzip-compressed.
+<li>Since <code>opener.open</code> returns a file-like object, and you know from the headers that when you read it, you&#8217;re going to get gzip-compressed data,
+            why not simply pass that file-like object directly to <code>GzipFile</code>?  As you &#8220;read&#8221; from the <code>GzipFile</code> instance, it will &#8220;read&#8221; compressed data from the remote HTTP server and decompress it on the fly. It&#8217;s a good idea, but unfortunately it doesn&#8217;t
+            work. Because of the way gzip compression works, <code>GzipFile</code> needs to save its position and move forwards and backwards through the compressed file. This doesn&#8217;t work when the &#8220;file&#8221; is a stream of bytes coming from a remote server; all you can do with it is retrieve bytes one at a time, not move back and
+            forth through the data stream. So the inelegant hack of using <code>StringIO</code> is the best solution: download the compressed data, create a file-like object out of it with <code>StringIO</code>, and then decompress the data from that.
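+<p>In Python 3 the response body is bytes, so the in-memory wrapper would be <code>io.BytesIO</code> and not <code>StringIO</code>. The sketch below shows the same download, wrap, and decompress sequence under that assumption; the function name <code>decode_body</code> is hypothetical.

```python
import gzip
import io


def decode_body(body, content_encoding=''):
    """Return the decompressed body if the server said it was gzip-compressed."""
    if content_encoding == 'gzip':
        # Wrap the in-memory bytes in a file-like object,
        # then let GzipFile "read" (and decompress) from it.
        compressed_stream = io.BytesIO(body)
        with gzip.GzipFile(fileobj=compressed_stream) as gzipper:
            return gzipper.read()
    # Anything else is passed through untouched.
    return body
```

+<p>When the whole body is already in memory, <code>gzip.decompress(body)</code> from the standard library does the same job in a single call.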
+
+<p class=a>&#x2042;
+
+<h2 id="oa.alltogether">11.9. Putting it all together</h2>
+<p>You&#8217;ve seen all the pieces for building an intelligent HTTP web services client. Now let&#8217;s see how they all fit together.
+<div class=example><h3>Example 11.17. The <code>openAnything</code> function</h3>
+<p>This function is defined in <code>openanything.py</code>.
+<pre><code>
+def openAnything(source, etag=None, lastmodified=None, agent=USER_AGENT):
+    # non-HTTP code omitted for brevity
+    if urlparse.urlparse(source)[0] == 'http':   <span>&#x2460;</span>
+        # open URL with urllib2                 
+        request = urllib2.Request(source)       
+        request.add_header('User-Agent', agent)  <span>&#x2461;</span>
+        if etag:              
+            request.add_header('If-None-Match', etag)              <span>&#x2462;</span>
+        if lastmodified:      
+            request.add_header('If-Modified-Since', lastmodified)  <span>&#x2463;</span>
+        request.add_header('Accept-encoding', 'gzip')              <span>&#x2464;</span>
+        opener = urllib2.build_opener(SmartRedirectHandler(), DefaultErrorHandler()) <span>&#x2465;</span>
+        return opener.open(request)              <span>&#x2466;</span>
+</pre>
+<ol>
+<li><code>urlparse</code> is a handy utility module for, you guessed it, parsing URLs. Its primary function, also called <code>urlparse</code>, takes a URL and splits it into a tuple of (scheme, domain, path, params, query string parameters, and fragment identifier).
+             Of these, the only thing you care about is the scheme, to make sure that you&#8217;re dealing with an HTTP URL (which <code>urllib2</code> can handle).
+<li>You identify yourself to the HTTP server with the <code>User-Agent</code> passed in by the calling function. If no <code>User-Agent</code> was specified, you use a default one defined earlier in the <code>openanything.py</code> module. You never use the default one defined by <code>urllib2</code>.
+<li>If an <code>ETag</code> hash was given, send it in the <code>If-None-Match</code> header.
+<li>If a last-modified date was given, send it in the <code>If-Modified-Since</code> header.
+<li>Tell the server you would like compressed data if possible.
+<li>Build a URL opener that uses <em>both</em> of the custom URL handlers: <code>SmartRedirectHandler</code> for handling <code>301</code> and <code>302</code> redirects, and <code>DefaultErrorHandler</code> for handling <code>304</code>, <code>404</code>, and other error conditions gracefully.
+<li>That&#8217;s it!  Open the URL and return a file-like object to the caller.
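+<p>The request-building half of this function can be tried without touching the network. The following is a rough Python 3 sketch, not the code from <code>openanything.py</code>: <code>build_request</code> and the default <code>USER_AGENT</code> value are hypothetical, and accepting <code>https</code> as well as <code>http</code> is an assumption.

```python
import urllib.parse
import urllib.request

USER_AGENT = 'ExampleClient/1.0'  # hypothetical default, for illustration only


def build_request(source, etag=None, last_modified=None, agent=USER_AGENT):
    """Build (but do not send) a polite HTTP request for source."""
    scheme = urllib.parse.urlparse(source).scheme
    if scheme not in ('http', 'https'):
        raise ValueError('not an HTTP URL: {!r}'.format(source))
    request = urllib.request.Request(source)
    request.add_header('User-Agent', agent)
    if etag:
        request.add_header('If-None-Match', etag)
    if last_modified:
        request.add_header('If-Modified-Since', last_modified)
    # Ask for compressed data if the server can provide it.
    request.add_header('Accept-encoding', 'gzip')
    return request
```

+<p>Note that <code>Request.add_header</code> normalizes header names with <code>str.capitalize()</code>, so the stored key for <code>If-None-Match</code> is <code>If-none-match</code>.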
+<div class=example><h3>Example 11.18. The <code>fetch</code> function</h3>
+<p>This function is defined in <code>openanything.py</code>.
+<pre><code>
+def fetch(source, etag=None, last_modified=None, agent=USER_AGENT):  
+    '''Fetch data and metadata from a URL, file, stream, or string'''
+    result = {}
+    f = openAnything(source, etag, last_modified, agent)              <span>&#x2460;</span>
+    result['data'] = f.read()     <span>&#x2461;</span>
+    if hasattr(f, 'headers'):    
+        # save ETag, if the server sent one        
+        result['etag'] = f.headers.get('ETag')      <span>&#x2462;</span>
+        # save Last-Modified header, if the server sent one          
+        result['lastmodified'] = f.headers.get('Last-Modified')       <span>&#x2463;</span>
+        if f.headers.get('content-encoding', '') == 'gzip':           <span>&#x2464;</span>
+            # data came back gzip-compressed, decompress it          
+            result['data'] = gzip.GzipFile(fileobj=StringIO(result['data'])).read()
+    if hasattr(f, 'url'):         <span>&#x2465;</span>
+        result['url'] = f.url    
+        result['status'] = 200   
+    if hasattr(f, 'status'):      <span>&#x2466;</span>
+        result['status'] = f.status                
+    f.close()  
+    return result                
+</pre>
+<ol>
+<li>First, you call the <code>openAnything</code> function with a URL, <code>ETag</code> hash, <code>Last-Modified</code> date, and <code>User-Agent</code>.
+<li>Read the actual data returned from the server. This may be compressed; if so, you&#8217;ll decompress it later.
+<li>Save the <code>ETag</code> hash returned from the server, so the calling application can pass it back to you next time, and you can pass it on to <code>openAnything</code>, which can stick it in the <code>If-None-Match</code> header and send it to the remote server.
+<li>Save the <code>Last-Modified</code> date too.
+<li>If the server says that it sent compressed data, decompress it.
+<li>If you got a URL back from the server, save it, and assume that the status code is <code>200</code> until you find out otherwise.
+<li>If one of the custom URL handlers captured a status code, then save that too.
+<div class=example><h3>Example 11.19. Using <code>openanything.py</code></h3><pre class=screen>
+<samp class=p>>>> </samp><kbd>import openanything</kbd>
+<samp class=p>>>> </samp><kbd>useragent = 'MyHTTPWebServicesApp/1.0'</kbd>
+<samp class=p>>>> </samp><kbd>url = 'http://diveintopython3.org/redir/example301.xml'</kbd>
+<samp class=p>>>> </samp><kbd>params = openanything.fetch(url, agent=useragent)</kbd>              <span>&#x2460;</span>
+<samp class=p>>>> </samp><kbd>params</kbd>   <span>&#x2461;</span>
+<samp>{'url': 'http://diveintomark.org/xml/atom.xml', 
+'lastmodified': 'Thu, 15 Apr 2004 19:45:21 GMT', 
+'etag': '"e842a-3e53-55d97640"', 
+'status': 301,
+'data': '&lt;?xml version="1.0" encoding="iso-8859-1"?>
+&lt;feed version="0.3"
+&lt;-- rest of data omitted for brevity --&gt;'}</samp>
+<samp class=p>>>> </samp><kbd>if params['status'] == 301:</kbd><span>&#x2462;</span>
+<samp class=p>...    </samp>url = params['url']
+<samp class=p>>>> </samp><kbd>newparams = openanything.fetch(</kbd>
+<samp class=p>...    </samp>url, params['etag'], params['lastmodified'], useragent)    <span>&#x2463;</span>
+<samp class=p>>>> </samp><kbd>newparams</kbd>
+<samp>{'url': 'http://diveintomark.org/xml/atom.xml', 
+'lastmodified': None, 
+'etag': '"e842a-3e53-55d97640"', 
+'status': 304,
+'data': ''}</samp>  <span>&#x2464;</span>
+</pre>
+<ol>
+<li>The very first time you fetch a resource, you don&#8217;t have an <code>ETag</code> hash or <code>Last-Modified</code> date, so you&#8217;ll leave those out. (They&#8217;re <a href="#apihelper.optional" title="4.2. Using Optional and Named Arguments">optional parameters</a>.)
+<li>What you get back is a dictionary of several useful headers, the HTTP status code, and the actual data returned from the server.
+             <code>openanything</code> handles the gzip compression internally; you don&#8217;t care about that at this level.
+<li>If you ever get a <code>301</code> status code, that&#8217;s a permanent redirect, and you need to update your URL to the new address.
+<li>The second time you fetch the same resource, you have all sorts of information to pass back: a (possibly updated) URL, the
+<code>ETag</code> from the last time, the <code>Last-Modified</code> date from the last time, and of course your <code>User-Agent</code>.
+<li>What you get back is again a dictionary, but the data hasn&#8217;t changed, so all you get is a <code>304</code> status code and no data.
+</ol>
+<p class=a>&#x2042;
+
+<h2 id="oa.summary">11.10. Summary</h2>
+<p>The <code>openanything.py</code> module and its functions should now make perfect sense.
+<p>There are five important features of HTTP web services that every client should support:
+<div class=itemizedlist>
+<ul>
+<li>Identifying your application <a href="#oa.useragent" title="11.5. Setting the User-Agent">by setting a proper <code>User-Agent</code></a>.
+
+<li>Handling <a href="#oa.redirect" title="11.7. Handling redirects">permanent redirects properly</a>.
+
+<li>Supporting <a href="#oa.etags" title="11.6. Handling Last-Modified and ETag"><code>Last-Modified</code> date checking</a> to avoid re-downloading data that hasn&#8217;t changed.
+
+<li>Supporting <a href="#oa.etags.example" title="Example 11.9. Supporting ETag/If-None-Match"><code>ETag</code> hashes</a> to avoid re-downloading data that hasn&#8217;t changed.
+
+<li>Supporting <a href="#oa.gzip" title="11.8. Handling compressed data">gzip compression</a> to reduce bandwidth even when data <em>has</em> changed.
+
+</ul></div>
+-->
+
+<p class=a>&#x2042;
+
+<h2 id=beyond-get>Going Beyond GET</h2>
+
+<p>FIXME
+
+<pre>
+>>> import httplib2
+>>> from urllib.parse import urlencode
+>>> h = httplib2.Http('.cache')
+>>> data = {"status": "Test update from Python 3"}
+>>> h.add_credentials("diveintomark", "<var>MY_SECRET_PASSWORD</var>")
+>>> resp, content = h.request("http://twitter.com/statuses/update.xml", "POST", urlencode(data), headers={"Content-Type": "application/x-www-form-urlencoded"})
+>>> resp.status
+200
+>>> from xml.etree import ElementTree as etree
+>>> tree = etree.fromstring(content)
+>>> print(etree.tostring(tree, encoding="unicode"))
+&lt;status>
+  &lt;created_at>Sat May 30 19:11:38 +0000 2009&lt;/created_at>
+  &lt;id>1973974228&lt;/id>
+  &lt;text>Test update from Python 3&lt;/text>
+  &lt;source>web&lt;/source>
+  &lt;truncated>false&lt;/truncated>
+  &lt;in_reply_to_status_id />
+  &lt;in_reply_to_user_id />
+  &lt;favorited>false&lt;/favorited>
+  &lt;in_reply_to_screen_name />
+  &lt;user>
+    &lt;id>8294212&lt;/id>
+    &lt;name>Mark Pilgrim&lt;/name>
+    &lt;screen_name>diveintomark&lt;/screen_name>
+    &lt;location>Apex, NC&lt;/location>
+    &lt;description>Like a fine spice&lt;/description>
+    &lt;profile_image_url>http://s3.amazonaws.com/twitter_production/profile_images/72859681/beau_normal.jpg&lt;/profile_image_url>
+
+    &lt;url>http://diveintomark.org/&lt;/url>
+    &lt;protected>false&lt;/protected>
+    &lt;followers_count>2565&lt;/followers_count>
+    &lt;profile_background_color>FFFFFF&lt;/profile_background_color>
+    &lt;profile_text_color>333333&lt;/profile_text_color>
+    &lt;profile_link_color>333333&lt;/profile_link_color>
+    &lt;profile_sidebar_fill_color>ffffff&lt;/profile_sidebar_fill_color>
+    &lt;profile_sidebar_border_color>333333&lt;/profile_sidebar_border_color>
+    &lt;friends_count>44&lt;/friends_count>
+    &lt;created_at>Sun Aug 19 23:58:36 +0000 2007&lt;/created_at>
+    &lt;favourites_count>71&lt;/favourites_count>
+    &lt;utc_offset>-18000&lt;/utc_offset>
+    &lt;time_zone>Eastern Time (US &amp; Canada)&lt;/time_zone>
+    &lt;profile_background_image_url>http://static.twitter.com/images/themes/theme1/bg.gif&lt;/profile_background_image_url>
+    &lt;profile_background_tile>false&lt;/profile_background_tile>
+    &lt;statuses_count>527&lt;/statuses_count>
+    &lt;notifications>false&lt;/notifications>
+    &lt;following>false&lt;/following>
+  &lt;/user>
+&lt;/status>
+</pre>
+
+<p>FIXME
+
+<p class=a>&#x2042;
+
+<h2 id=beyond-post>Going Beyond POST</h2>
+
+<p>FIXME
+
+<pre>
+>>> tree.findtext("id")
+'1973974228'
+>>> resp, delete_content = h.request("http://twitter.com/statuses/destroy/{0}.xml".format(tree.findtext("id")), "DELETE")
+>>> resp.status
+200
+</pre>
+
+<p class=a>&#x2042;
+
+<h2 id=furtherreading>Further Reading</h2>
+
+<ul>
+<li><a href=http://code.google.com/p/httplib2/><code>httplib2</code></a>
+<li><a href=http://www.xml.com/pub/a/2006/02/01/doing-http-caching-right-introducing-httplib2.html>Doing <abbr>HTTP</abbr> Caching Right: Introducing <code>httplib2</code></a>
+<li><a href=http://www.xml.com/pub/a/2006/03/29/httplib2-http-persistence-and-authentication.html><code>httplib2</code>: <abbr>HTTP</abbr> Persistence and Authentication</a>
+<li><a href=http://apiwiki.twitter.com/>Twitter <abbr>API</abbr> reference</a>
+</ul>
+
+<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
+<script src=jquery.js></script>
+<script src=dip3.js></script>
diff --git a/index.html b/index.html
index a41f9e9..e8ef9e5 100644
--- a/index.html
+++ b/index.html
@@ -41,7 +41,7 @@ h1:before{content:""}
 <li class=todo>Files
 <li><a href=xml.html>XML</a>
 <li class=todo>HTML
-<li class=todo>HTTP
+<li><a href=http-web-services.html>HTTP Web Services</a>
 <li class=todo>Performance tuning
 <li class=todo>Packaging Python libraries
 <li class=todo>Creating graphics with the Python Imaging Library
diff --git a/table-of-contents.html b/table-of-contents.html
index 03a8f52..6124c75 100644
--- a/table-of-contents.html
+++ b/table-of-contents.html
@@ -222,7 +222,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
   <li>Putting it all together
   <li>Summary
   </ol>
-<li>HTTP
+<li id=http-web-services>HTTP Web Services
   <ol>
   <li>Diving in
   <li>How not to fetch data over HTTP
@@ -290,7 +290,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
   </ol>
 <li id=case-study-porting-chardet-to-python-3><a href=case-study-porting-chardet-to-python-3.html>Case study: porting <code>chardet</code> to Python 3</a>
   <ol>
-  <li><a href=case-study-porting-chardet-to-python-3.html#divingin>Introducing <code>chardet</code>: a mini-<abbr>FAQ</abbr></a>
+  <li><a href=case-study-porting-chardet-to-python-3.html#divingin>Introducing <code>chardet</code>: a mini-FAQ</a>
     <ol>
     <li><a href=case-study-porting-chardet-to-python-3.html#faq.what>What is character encoding auto-detection?</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#faq.impossible>Isn&#8217;t that impossible?</a>
@@ -300,7 +300,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
     </ol>
   <li><a href=case-study-porting-chardet-to-python-3.html#divingin2>Diving in</a>
     <ol>
-    <li><a href=case-study-porting-chardet-to-python-3.html#how.bom><code>UTF-n</code> with a <abbr>BOM</abbr></a>
+    <li><a href=case-study-porting-chardet-to-python-3.html#how.bom><code>UTF-n</code> with a BOM</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#how.esc>Escaped encodings</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#how.mb>Multi-byte encodings</a>
     <li><a href=case-study-porting-chardet-to-python-3.html#how.sb>Single-byte encodings</a>