diff --git a/http-web-services.html b/http-web-services.html
index 3cfe9e2..1aa18d7 100644
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -336,44 +336,61 @@ Content-Type: application/xml

How httplib2 Handles Caching

-

FIXME +

Remember in the previous section when I said you should always create an httplib2.Http object with a directory name? Caching is the reason.

-# continued from previous example
->>> response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml')
->>> response2.status
+# continued from the previous example
+>>> response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml')  
+>>> response2.status                                                                 
 200
->>> content2[:52]
+>>> content2[:52]                                                                    
 b"<?xml version='1.0' encoding='utf-8'?>\r\n<feed xmlns="
 >>> len(content2)
 3070
    -
  1. FIXME +
  2. This shouldn’t be terribly surprising. It’s the same thing you did last time, except you’re putting the result into two new variables. +
  3. The HTTP status is once again 200, just like last time. +
  4. The downloaded content is the same as last time, too.
+

So… who cares? Quit your Python interactive shell and relaunch it with a new session, and I’ll show you. +

 # NOT continued from previous example!
 # Please exit out of the interactive shell
 # and launch a new one.
 >>> import httplib2
->>> httplib2.debuglevel = 1
->>> h = httplib2.Http('.cache')
->>> response, content = h.request('http://diveintopython3.org/examples/feed.xml')
->>> len(content)
+>>> httplib2.debuglevel = 1                                                        
+>>> h = httplib2.Http('.cache')                                                    
+>>> response, content = h.request('http://diveintopython3.org/examples/feed.xml')  
+>>> len(content)                                                                   
 3070
->>> response.status
+>>> response.status                                                                
 200
->>> response.fromcache
+>>> response.fromcache                                                             
 True
    -
  1. FIXME +
  2. Let’s turn on debugging and see what’s on the wire. This is the httplib2 equivalent of turning on debugging in http.client. httplib2 will print all the data being sent to the server and some key information being sent back. +
  3. Create an httplib2.Http object with the same directory name as before. +
  4. Request the same URL as before. Nothing appears to happen. More precisely, nothing gets sent to the server, and nothing gets returned from the server. There is absolutely no network activity whatsoever. +
  5. Yet we did “receive” some data — in fact, we received all of it. +
  6. We also “received” an HTTP status code indicating that the “request” was successful. +
  7. Here’s the rub: this “response” was generated from httplib2’s local cache. That directory name you passed in when you created the httplib2.Http object — that directory holds httplib2’s cache of all the operations it’s ever performed.
+

You previously requested the data at this URL. That request was successful (status: 200). That response included not only the feed data, but also a set of caching headers that told anyone who was listening that they could cache this resource for up to 24 hours (Cache-Control: max-age=86400, which is 24 hours measured in seconds). httplib2 understands and respects those caching headers, and it stored the previous response in the .cache directory (which you passed in when you created the Http object). That cache hasn’t expired yet, so the second time you request the data at this URL, httplib2 simply returns the cached result without ever hitting the network. + +
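To make the max-age arithmetic concrete, here is a minimal sketch of the freshness check described above. It is not httplib2's actual code, and the helper names max_age_seconds and is_fresh are made up for illustration. It also ignores other factors that real HTTP caches weigh (the Age and Expires headers, for instance), so treat it as a simplification.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    for directive in cache_control.split(','):
        name, _, value = directive.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            return int(value)
    return None


def is_fresh(date_header, cache_control, now):
    """True if a response dated date_header is still within its max-age."""
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return False  # no max-age directive: this sketch treats it as stale
    stored_at = parsedate_to_datetime(date_header)
    return now - stored_at < timedelta(seconds=max_age)


# Hypothetical timestamps, chosen only to illustrate the 24-hour window.
stored = 'Tue, 02 Jun 2009 00:40:26 GMT'
one_hour_later = datetime(2009, 6, 2, 1, 40, 26, tzinfo=timezone.utc)
two_days_later = datetime(2009, 6, 4, 0, 40, 26, tzinfo=timezone.utc)

print(is_fresh(stored, 'max-age=86400', one_hour_later))
print(is_fresh(stored, 'max-age=86400', two_days_later))
```

With max-age=86400, a response stored one hour ago is still inside its 24-hour window, so a cache may return it without touching the network. Two days later, the same response is stale.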

I say “simply,” but obviously there is a lot of complexity hidden behind that simplicity. httplib2 handles HTTP caching automatically and by default. If for some reason you need to know whether a response came from the cache, you can check response.fromcache. Otherwise, it Just Works. + +

Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing F5 refreshes the current page, but pressing Ctrl+F5 bypasses the cache and re-requests the current page from the remote server. You might think “oh, I’ll just delete the data from my local cache, then request it again.” You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They’re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid. + +

Instead of manipulating your local cache and hoping for the best, you should use the features of HTTP to ensure that your request actually reaches the remote server. +

-# continued from previous example
+# continued from the previous example
 >>> response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml',
-...     headers={'cache-control':'no-cache'})
-connect: (diveintopython3.org, 80)
+...     headers={'cache-control':'no-cache'})  
+connect: (diveintopython3.org, 80)             
 send: b'GET /examples/feed.xml HTTP/1.1
 Host: diveintopython3.org
 user-agent: Python-httplib2/$Rev: 259 $
@@ -383,9 +400,9 @@ reply: 'HTTP/1.1 200 OK'
 …further debugging information omitted…
 >>> response2.status
 200
->>> response2.fromcache
+>>> response2.fromcache                        
 False
->>> print(dict(response2.items()))
+>>> print(dict(response2.items()))             
 {'status': '200',
  'content-length': '3070',
  'content-location': 'http://diveintopython3.org/examples/feed.xml',
@@ -401,7 +418,10 @@ reply: 'HTTP/1.1 200 OK'
  'date': 'Tue, 02 Jun 2009 00:40:26 GMT',
  'content-type': 'application/xml'}
    -
  1. FIXME +
  2. httplib2 allows you to add arbitrary HTTP headers to any outgoing request. In order to bypass all caches (not just your local disk cache, but also any caching proxies between you and the remote server), add a no-cache header in the headers dictionary. +
  3. Now you see httplib2 initiating a network request. httplib2 understands and respects caching headers in both directions — as part of the incoming response and as part of the outgoing request. It noticed that you added the no-cache header, so it bypassed its local cache altogether and then had no choice but to hit the network to request the data. +
  4. This response was not generated from your local cache. You knew that, of course, because you saw the debugging information on the outgoing request. But it’s nice to have that programmatically verified. +
  5. The request succeeded; you downloaded the entire feed again from the remote server. Of course, the server also sent back a full complement of HTTP headers along with the feed data. That includes caching headers, which httplib2 uses to update its local cache, in the hopes of avoiding network access the next time you request this feed. Everything about HTTP caching is designed to maximize cache hits and minimize network access. Even though you bypassed the cache this time, the remote server would really appreciate it if you would cache the result for next time.
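The behavior in the notes above (skip the local cache when the request says no-cache, but still store the fresh response for next time) can be sketched with a toy in-memory cache. This is an illustration only, not how httplib2 is implemented. ToyCache and fake_fetch are hypothetical names, and fake_fetch stands in for the network so the example runs offline.

```python
class ToyCache:
    """A tiny in-memory stand-in for an HTTP cache (not httplib2's code)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> body, stands in for the network
        self._store = {}     # url -> cached body

    def request(self, url, headers=None):
        """Return (fromcache, body), honoring a no-cache request header."""
        headers = {k.lower(): v for k, v in (headers or {}).items()}
        bypass = 'no-cache' in headers.get('cache-control', '')
        if not bypass and url in self._store:
            return True, self._store[url]
        body = self._fetch(url)
        self._store[url] = body  # refresh the cache for next time
        return False, body


calls = []  # records every simulated trip to the "server"


def fake_fetch(url):
    calls.append(url)
    return b'<feed/>'


cache = ToyCache(fake_fetch)
url = 'http://diveintopython3.org/examples/feed.xml'

first = cache.request(url)                                         # goes to the "network"
second = cache.request(url)                                        # served from the cache
third = cache.request(url, headers={'cache-control': 'no-cache'})  # bypasses the cache
fourth = cache.request(url)                                        # cached again
```

Only the first and third requests reach fake_fetch. The fourth is a cache hit because the no-cache request still refreshed the stored copy.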

How httplib2 Handles Last-Modified and ETag Headers