diff --git a/http-web-services.html b/http-web-services.html
index 1aa18d7..2b36c1b 100644
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -382,7 +382,7 @@ Content-Type: application/xml

I say “simply,” but obviously there is a lot of complexity hidden behind that simplicity. httplib2 handles HTTP caching automatically and by default. If for some reason you need to know whether a response came from the cache, you can check response.fromcache. Otherwise, it Just Works. -

Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing F5 refreshes the current page, but pressing Ctrl+F5 bypasses the cache and re-requests the current page from the remote server. You might think “oh, I’ll just delete the data from my local cache, then request it again.” You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They’re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid. +

Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing F5 refreshes the current page, but pressing Ctrl+F5 bypasses the cache and re-requests the current page from the remote server. You might think “oh, I’ll just delete the data from my local cache, then request it again.” You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They’re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid.

Instead of manipulating your local cache and hoping for the best, you should use the features of HTTP to ensure that your request actually reaches the remote server.
@@ -426,20 +426,22 @@ reply: 'HTTP/1.1 200 OK'

How httplib2 Handles Last-Modified and ETag Headers

-

FIXME +

The Cache-Control and Expires caching headers are called freshness indicators. They tell caches in no uncertain terms that you can completely avoid all network access until the cache expires. And that’s exactly the behavior you saw in the previous section: given a strong freshness indicator, httplib2 does not generate a single byte of network activity to serve up cached data (unless you explicitly bypass the cache, of course). + +

But what about the case where the data might have changed, but hasn’t? HTTP defines Last-Modified and ETag headers for this purpose. These headers are called validators. If the local cache is no longer fresh, a client can send the validators with the next request to see if the data has actually changed. If the data hasn’t changed, the server sends back a 304 status code and no data. So there’s still a round-trip over the network, but you end up downloading fewer bytes.

 >>> import httplib2
 >>> httplib2.debuglevel = 1
 >>> h = httplib2.Http('.cache')
->>> response, content = h.request('http://diveintopython3.org/')
+>>> response, content = h.request('http://diveintopython3.org/')  
 connect: (diveintopython3.org, 80)
 send: b'GET / HTTP/1.1
 Host: diveintopython3.org
 accept-encoding: deflate, gzip
 user-agent: Python-httplib2/$Rev: 259 $'
 reply: 'HTTP/1.1 200 OK'
->>> print(dict(response.items()))
+>>> print(dict(response.items()))                                 
 {'-content-encoding': 'gzip',
  'accept-ranges': 'bytes',
  'connection': 'close',
@@ -447,26 +449,47 @@ reply: 'HTTP/1.1 200 OK'
  'content-location': 'http://diveintopython3.org/',
  'content-type': 'text/html',
  'date': 'Tue, 02 Jun 2009 03:26:54 GMT',
- 'etag': '"7f806d-1a01-9fb97900"',
- 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
+ 'etag': '"7f806d-1a01-9fb97900"',
+ 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
  'server': 'Apache',
  'status': '304',
  'vary': 'Accept-Encoding,User-Agent'}
->>> len(content)
-6657
->>> response, content = h.request('http://diveintopython3.org/')
+>>> len(content)                                                  
+6657
+
    +
  1. Instead of the feed, this time we’re going to download the site’s home page, which is HTML. Since this is the first time you’ve ever requested this page, httplib2 has little to work with, and it sends out a minimum of headers with the request. +
  2. The response contains a multitude of HTTP headers… but no caching information. However, it does include both an ETag and Last-Modified header. +
  3. At the time I constructed this example, this page was 6657 bytes. It’s probably changed since then, but don’t worry about it. +
+ +
+# continued from the previous example
+>>> response, content = h.request('http://diveintopython3.org/')  
 connect: (diveintopython3.org, 80)
 send: b'GET / HTTP/1.1
 Host: diveintopython3.org
-if-none-match: "7f806d-1a01-9fb97900"
-if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT
+if-none-match: "7f806d-1a01-9fb97900"                             
+if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT                  
 accept-encoding: deflate, gzip
 user-agent: Python-httplib2/$Rev: 259 $'
-reply: 'HTTP/1.1 304 Not Modified'
->>> len(content)
+reply: 'HTTP/1.1 304 Not Modified'                                
+>>> response.fromcache                                            
+True
+>>> response.status                                               
+200
+>>> response.dict['status']                                       
+'304'
+>>> len(content)                                                  
 6657
    -
  1. FIXME +
  2. You request the same page again, with the same Http object (and the same local cache). +
  3. httplib2 sends the ETag validator back to the server in the If-None-Match header. +
  4. httplib2 also sends the Last-Modified validator back to the server in the If-Modified-Since header. +
  5. The server looks at these validators, looks at the page you requested, and determines that the page has not changed since you last requested it, so it sends back a 304 status code and no data. +
  6. Back on the client, httplib2 notices the 304 status code and loads the content of the page from its cache. +
  7. This might be a bit confusing. There are really two status codes — 304 (returned from the server this time, which caused httplib2 to look in its cache), and 200 (returned from the server last time, and stored in httplib2’s cache along with the page data). response.status returns the status from the cache. +
  8. If you want the raw status code returned from the server, you can get that by looking in response.dict, which is a dictionary of the actual headers returned from the server. +
  9. However, you still get the data in the content variable. Generally, you don’t need to know why a response was served from the cache. (You may not even care that it was served from the cache at all, and that’s fine too. httplib2 is smart enough to let you act dumb.) By the time the request() method returns to the caller, httplib2 has already updated its cache and returned the data to you.
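
If you’re curious what that revalidation dance looks like stripped of all networking, here is a minimal sketch. It is not httplib2’s actual implementation; the function names build_conditional_headers and resolve_body are hypothetical, invented for this illustration, and the cached headers are copied from the example above.

```python
def build_conditional_headers(cached_headers):
    """Turn the validators stored with a cached response into request headers.

    Hypothetical helper for illustration only; httplib2 does this internally.
    """
    headers = {}
    if 'etag' in cached_headers:
        # The ETag validator goes back to the server as If-None-Match.
        headers['if-none-match'] = cached_headers['etag']
    if 'last-modified' in cached_headers:
        # The Last-Modified validator goes back as If-Modified-Since.
        headers['if-modified-since'] = cached_headers['last-modified']
    return headers


def resolve_body(server_status, server_body, cached_body):
    """Pick the body to hand to the caller, and report whether it came from cache."""
    if server_status == 304:
        # 304 Not Modified carries no data, so the cached copy is still good.
        return cached_body, True
    # Any other status: use whatever the server actually sent.
    return server_body, False


# Validators saved from the first response in the example above.
cached_headers = {
    'etag': '"7f806d-1a01-9fb97900"',
    'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
}
cached_body = b'x' * 6657  # stand-in for the 6657 bytes of cached HTML

conditional = build_conditional_headers(cached_headers)
body, from_cache = resolve_body(304, b'', cached_body)
print(conditional)
print(len(body), from_cache)
```

The sketch keeps the two decisions separate on purpose: one function decides what to ask the server, the other decides what to do with the answer. A real caching client also has to update its stored headers after a 304, which this sketch leaves out.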

How httplib2 Handles Compression