diff --git a/http-web-services.html b/http-web-services.html
index 1aa18d7..2b36c1b 100644
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -382,7 +382,7 @@ Content-Type: application/xml
I say “simply,” but obviously there is a lot of complexity hidden behind that simplicity. httplib2 handles HTTP caching automatically and by default. If for some reason you need to know whether a response came from the cache, you can check response.fromcache. Otherwise, it Just Works.
-Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing F5 refreshes the current page, but pressing Ctrl+F5 bypasses the cache and re-requests the current page from the remote server. You might think “oh, I’ll just delete the data from my local cache, then request it again.” You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They’re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid.
+Now, suppose you have data cached, but you want to bypass the cache and re-request it from the remote server. Browsers sometimes do this if the user specifically requests it. For example, pressing F5 refreshes the current page, but pressing Ctrl+F5 bypasses the cache and re-requests the current page from the remote server. You might think “oh, I’ll just delete the data from my local cache, then request it again.” You could do that, but remember that there may be more parties involved than just you and the remote server. What about those intermediate proxy servers? They’re completely beyond your control, and they may still have that data cached, and will happily return it to you because (as far as they are concerned) their cache is still valid.
Instead of manipulating your local cache and hoping for the best, you should use the features of HTTP to ensure that your request actually reaches the remote server.
@@ -426,20 +426,22 @@ reply: 'HTTP/1.1 200 OK'
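As a rough illustration of "using the features of HTTP" here, the sketch below builds (but never sends) a request carrying a `Cache-Control: no-cache` header, which asks every cache on the path, proxies included, to check with the origin server before answering. It uses the standard library's `urllib.request` rather than httplib2 so that it runs without a network connection; the helper name `build_no_cache_request` is hypothetical.

```python
import urllib.request


def build_no_cache_request(url):
    """Build (but do not send) a GET request that asks caches to revalidate.

    Hypothetical helper for illustration; not part of httplib2.
    """
    # 'no-cache' on a request tells caches along the way not to reuse a
    # stored response without first validating it with the origin server.
    return urllib.request.Request(url, headers={'Cache-Control': 'no-cache'})


req = build_no_cache_request('http://diveintopython3.org/')
print(req.get_method())
print(req.header_items())
```

With httplib2, the same header can be passed through the `headers` argument of `request()`, for example `h.request(url, headers={'cache-control': 'no-cache'})`.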
httplib2 Handles Last-Modified and ETag Headers
-FIXME
+
The Cache-Control and Expires caching headers are called freshness indicators. They tell caches in no uncertain terms that you can completely avoid all network access until the cache expires. And that’s exactly the behavior you saw in the previous section: given a freshness indicator, httplib2 does not generate a single byte of network activity to serve up cached data (unless you explicitly bypass the cache, of course).
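To make the idea of a freshness indicator concrete, here is a minimal sketch of the decision a cache makes before touching the network. It only looks at the `max-age` directive of `Cache-Control` and ignores `Expires` and every other directive, so treat it as a simplified model rather than httplib2's actual logic. The function names are hypothetical.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    match = re.search(r'max-age\s*=\s*(\d+)', cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control, age_seconds):
    """True if a cached response of the given age can be served as-is."""
    max_age = max_age_seconds(cache_control)
    # No max-age means this simplified model cannot vouch for the entry.
    return max_age is not None and age_seconds < max_age


print(is_fresh('public, max-age=3600', 120))    # cached two minutes ago
print(is_fresh('public, max-age=3600', 4000))   # cached over an hour ago
```

While `is_fresh` returns true, the cache can answer without any network access at all; once it returns false, the validators described next come into play.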
+
+
But what about the case where the data might have changed, but hasn’t? HTTP defines Last-Modified and ETag headers for this purpose. These headers are called validators. If the local cache is no longer fresh, a client can send the validators with the next request to see if the data has actually changed. If the data hasn’t changed, the server sends back a 304 status code and no data. So there’s still a round-trip over the network, but you end up downloading fewer bytes.
>>> import httplib2
>>> httplib2.debuglevel = 1
>>> h = httplib2.Http('.cache')
->>> response, content = h.request('http://diveintopython3.org/')
+>>> response, content = h.request('http://diveintopython3.org/') ①
connect: (diveintopython3.org, 80)
send: b'GET / HTTP/1.1
Host: diveintopython3.org
accept-encoding: deflate, gzip
user-agent: Python-httplib2/$Rev: 259 $'
reply: 'HTTP/1.1 200 OK'
->>> print(dict(response.items()))
+>>> print(dict(response.items())) ②
{'-content-encoding': 'gzip',
'accept-ranges': 'bytes',
'connection': 'close',
@@ -447,26 +449,47 @@ reply: 'HTTP/1.1 200 OK'
'content-location': 'http://diveintopython3.org/',
'content-type': 'text/html',
'date': 'Tue, 02 Jun 2009 03:26:54 GMT',
- 'etag': '"7f806d-1a01-9fb97900"',
- 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
+ 'etag': '"7f806d-1a01-9fb97900"',
+ 'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
'server': 'Apache',
'status': '200',
'vary': 'Accept-Encoding,User-Agent'}
->>> len(content)
-6657
->>> response, content = h.request('http://diveintopython3.org/')
+>>> len(content) ③
+6657
+Because nothing for this page is cached yet, httplib2 has little to work with, and it sends out a minimum of headers with the request.
+The response includes both an ETag and a Last-Modified header.
+
+# continued from the previous example
+>>> response, content = h.request('http://diveintopython3.org/') ①
connect: (diveintopython3.org, 80)
send: b'GET / HTTP/1.1
Host: diveintopython3.org
-if-none-match: "7f806d-1a01-9fb97900"
-if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT
+if-none-match: "7f806d-1a01-9fb97900" ②
+if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT ③
accept-encoding: deflate, gzip
user-agent: Python-httplib2/$Rev: 259 $'
-reply: 'HTTP/1.1 304 Not Modified'
->>> len(content)
+reply: 'HTTP/1.1 304 Not Modified' ④
+>>> response.fromcache ⑤
+True
+>>> response.status ⑥
+200
+>>> response.dict['status'] ⑦
+'304'
+>>> len(content) ⑧
6657
You request the same page again, with the same Http object (and the same local cache).
+httplib2 sends the ETag validator back to the server in the If-None-Match header.
+httplib2 also sends the Last-Modified validator back to the server in the If-Modified-Since header.
+The server sends back a 304 status code and no data.
+httplib2 notices the 304 status code and loads the content of the page from its cache.
+There are two status codes in play here: 304 (returned from the server this time, which caused httplib2 to look in its cache), and 200 (returned from the server last time, and stored in httplib2’s cache along with the page data). response.status returns the status from the cache.
+The raw status code from the server is available in response.dict, which is a dictionary of the actual headers returned from the server.
+You still get the data in the content variable (httplib2 is smart enough to let you act dumb). By the time the request() method returns to the caller, httplib2 has already updated its cache and returned the data to you.
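The exchange walked through above can be modeled without any network at all. The following is a toy sketch, not httplib2's implementation: `serve` stands in for the origin server, `ToyCachingClient` stands in for the Http object and its cache, and both names are hypothetical. For brevity the toy server compares only the ETag; a real server may also honor If-Modified-Since.

```python
def serve(resource, request_headers):
    """Toy origin server: return (status, headers, body) for a GET."""
    validators = {'etag': resource['etag'],
                  'last-modified': resource['last-modified']}
    # If the client's validator still matches, answer 304 with no body.
    if request_headers.get('if-none-match') == resource['etag']:
        return 304, validators, b''
    return 200, validators, resource['body']


class ToyCachingClient:
    """Toy stand-in for a caching HTTP client (not httplib2 itself)."""

    def __init__(self):
        self.cache = {}  # url -> (headers, body)

    def request(self, url, resource):
        request_headers = {}
        if url in self.cache:
            # Send the stored validators back to the server.
            cached_headers, _ = self.cache[url]
            request_headers['if-none-match'] = cached_headers['etag']
            request_headers['if-modified-since'] = cached_headers['last-modified']
        status, headers, body = serve(resource, request_headers)
        if status == 304:
            # Nothing changed: load the body from the local cache.
            _, body = self.cache[url]
            return status, True, body
        self.cache[url] = (headers, body)
        return status, False, body


page = {'etag': '"7f806d-1a01-9fb97900"',
        'last-modified': 'Tue, 02 Jun 2009 02:51:48 GMT',
        'body': b'<html>placeholder page body</html>'}
client = ToyCachingClient()
first = client.request('http://diveintopython3.org/', page)
second = client.request('http://diveintopython3.org/', page)
print(first[0], first[1], len(first[2]))
print(second[0], second[1], len(second[2]))
```

The second call receives a 304 from the toy server yet still hands back the full body, which mirrors why `len(content)` is unchanged in the session above.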
httplib2 Handles Compression