mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
mention that urlopen().read() returns bytes
@@ -178,7 +178,10 @@ Cache-Control: max-age=31536000, public</samp></pre>
<p>Let’s say you want to download a resource over <abbr>HTTP</abbr>, such as <a href=xml.html>an Atom feed</a>. Being a feed, you’re not just going to download it once; you’re going to download it over and over again. (Most feed readers will check for changes once an hour.) Let’s do it the quick-and-dirty way first, and then see how you can do better.
<pre class='nd screen'>
<samp class=p>>>> </samp><kbd class=pp>import urllib.request</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>data = urllib.request.urlopen('http://diveintopython3.org/examples/feed.xml').read()</kbd> <span class=u>①</span></a>
<samp class=p>>>> </samp><kbd class=pp>a_url = 'http://diveintopython3.org/examples/feed.xml'</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>data = urllib.request.urlopen(a_url).read()</kbd> <span class=u>①</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>type(data)</kbd> <span class=u>②</span></a>
<samp class=pp><class 'bytes'></samp>
<samp class=p>>>> </samp><kbd class=pp>print(data)</kbd>
<samp class=pp><?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
@@ -191,6 +194,7 @@ Cache-Control: max-age=31536000, public</samp></pre>
</samp></pre>
<ol>
<li>Downloading anything over <abbr>HTTP</abbr> is incredibly easy in Python; in fact, it’s a one-liner. The <code>urllib.request</code> module has a handy <code>urlopen()</code> function that takes the address of the page you want, and returns a file-like object that you can just <code>read()</code> from to get the full contents of the page. It just can’t get any easier.
<li>The <code>read()</code> method of the object returned by <code>urlopen()</code> always returns <a href=strings.html#byte-arrays>a <code>bytes</code> object, not a string</a>. Remember, bytes are bytes; characters are an abstraction. <abbr>HTTP</abbr> servers don’t deal in abstractions. If you request a resource, you get bytes. If you want a string, you’ll have to convert it yourself by decoding those bytes with the resource’s character encoding.
</ol>
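<p>The conversion mentioned in the second note can be sketched as follows. This is a minimal illustration, not the book’s own code: it assumes the resource is encoded in UTF-8 (as the feed’s <abbr>XML</abbr> declaration says), uses a bytes literal as a stand-in for the downloaded data so that it runs without network access, and the helper name <code>to_text</code> is hypothetical.

```python
# Stand-in for the bytes that urllib.request.urlopen(a_url).read() would
# return; a literal is used here so the sketch runs without a network.
data = (b"<?xml version='1.0' encoding='utf-8'?>\n"
        b"<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'/>")


def to_text(raw, encoding='utf-8'):
    """Decode raw bytes into a str (hypothetical helper, UTF-8 assumed)."""
    return raw.decode(encoding)


text = to_text(data)
print(type(data))   # the downloaded payload is a bytes object
print(type(text))   # after decoding, it is a str
```

<p>With a live response, the server may declare the encoding in its <code>Content-Type</code> header; calling <code>get_content_charset()</code> on the response’s <code>headers</code> is one way to look it up before decoding, falling back to a default when the header is missing.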
<p>So what’s wrong with this? For a quick one-off during testing or development, there’s nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis (<i>e.g.</i> requesting this feed once an hour), then you’re being inefficient, and you’re being rude.