From 278241ada5e87ff28201975afca8e638589af295 Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Wed, 15 Jul 2009 14:54:39 -0400
Subject: [PATCH] mention that urlopen().read() returns bytes

---
 http-web-services.html | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/http-web-services.html b/http-web-services.html
index 06e1b5a..cdeb6a7 100755
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -178,7 +178,10 @@ Cache-Control: max-age=31536000, public

Let’s say you want to download a resource over HTTP, such as an Atom feed. Being a feed, you’re not just going to download it once; you’re going to download it over and over again. (Most feed readers will check for changes once an hour.) Let’s do it the quick-and-dirty way first, and then see how you can do better.

 >>> import urllib.request
->>> data = urllib.request.urlopen('http://diveintopython3.org/examples/feed.xml').read()  
+>>> a_url = 'http://diveintopython3.org/examples/feed.xml'
+>>> data = urllib.request.urlopen(a_url).read()  
+>>> type(data)                                   
+<class 'bytes'>
 >>> print(data)
 <?xml version='1.0' encoding='utf-8'?>
 <feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
@@ -191,6 +194,7 @@ Cache-Control: max-age=31536000, public
  1. Downloading anything over HTTP is incredibly easy in Python; in fact, it’s a one-liner. The urllib.request module has a handy urlopen() function that takes the address of the page you want, and returns a file-like object that you can just read() from to get the full contents of the page. It just can’t get any easier.
+ 2. The urlopen().read() method always returns a bytes object, not a string. Remember, bytes are bytes; characters are an abstraction. HTTP servers don’t deal in abstractions. If you request a resource, you get bytes. If you want a string, you’ll have to convert it yourself.
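
Converting those bytes yourself might look something like the sketch below. It is only an illustration, not part of the original example: the helper names decode_body() and fetch_text() are hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption that happens to suit this feed but will not suit every resource.

```python
import urllib.request

def decode_body(raw, charset=None):
    """Convert raw response bytes to a string.

    Falling back to UTF-8 when no charset is given is an assumption,
    not something HTTP guarantees.
    """
    return raw.decode(charset or 'utf-8')

def fetch_text(a_url):
    """Download a_url and return its body as a string (hypothetical helper)."""
    with urllib.request.urlopen(a_url) as response:
        raw = response.read()                             # always a bytes object
        charset = response.headers.get_content_charset()  # None if the server declared no charset
    return decode_body(raw, charset)

# Offline demonstration, using a stand-in for downloaded bytes:
sample = "<?xml version='1.0' encoding='utf-8'?>".encode('utf-8')
text = decode_body(sample)
print(type(text))
```

Calling fetch_text(a_url) would perform the same download as above and hand back a string instead of bytes. Note that the charset in the Content-Type header and the encoding declared inside an XML document can disagree, so a real feed parser may need to look at both.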

So what’s wrong with this? For a quick one-off during testing or development, there’s nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis (e.g. requesting this feed once an hour), then you’re being inefficient, and you’re being rude.