diff --git a/http-web-services.html b/http-web-services.html new file mode 100644 index 0000000..17a4d48 --- /dev/null +++ b/http-web-services.html @@ -0,0 +1,842 @@ + + + +HTTP Web Services - Dive into Python 3 + + + + + + + +
  
+


Difficulty level: ♦♦♦♦♢ +

HTTP Web Services

+
+

FIXME
— FIXME +

+

  +

Diving In

+

HTTP web services are programmatic ways of sending data to and receiving data from remote servers using the operations of HTTP directly. If you want to get data from the server, use a straight HTTP GET; if you want to send new data to the server, use HTTP POST. (Some more advanced HTTP web service APIs also define ways of modifying existing data and deleting data, using HTTP PUT and HTTP DELETE.) In other words, the “verbs” built into the HTTP protocol (GET, POST, PUT, and DELETE) map directly to application-level operations for receiving, sending, modifying, and deleting data. + +
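
As a rough illustration of that mapping, the sketch below builds one request object per verb with the standard library's urllib.request.Request and never sends any of them. The URL, the OPERATIONS table, and the build_request helper are all made up for this example; they do not belong to any real web service.

```python
from urllib.request import Request

# Hypothetical resource URL, used only for illustration; nothing is sent.
URL = "http://example.org/api/items/42"

# Each HTTP verb and the application-level operation it stands for.
OPERATIONS = {
    "GET": "retrieve data",
    "POST": "send new data",
    "PUT": "modify existing data",
    "DELETE": "delete data",
}


def build_request(verb, body=None):
    """Build (but do not send) a request that uses the given HTTP verb."""
    return Request(URL, data=body, method=verb)


requests = [build_request(verb) for verb in OPERATIONS]
for req in requests:
    # get_method() reports the verb that would go on the wire.
    print(req.get_method(), req.full_url, "->", OPERATIONS[req.get_method()])
```

The same URL appears in every request; only the verb changes what the server is being asked to do with the resource.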

The main advantage of this approach is simplicity, and its simplicity has proven popular with a lot of different sites. Data -- usually XML data -- can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an HTTP library for downloading it. Debugging is also easier; because each “call” to the web service has a unique URL, you can load it in your web browser and immediately see the raw data. + +

Examples of HTTP web services: +

+ +

Python 3 comes with two different libraries for interacting with HTTP web services: + +

+ +

Which one should you use? Neither of them. Instead, you should use httplib2, an open source third-party library that implements HTTP more fully than http.client but provides a better abstraction than urllib.request. + +

To understand why httplib2 is the right choice, you first need to understand HTTP. + +

⁂ + +

How Not To Fetch Data Over HTTP

+

Let’s say you want to download a resource over HTTP, such as an Atom feed. But you don’t just want to download it once; you want to download it over and over again, every hour, to get the latest news from the site that’s offering the news feed. Let’s do it the quick-and-dirty way first, and then see how you can do better. +

+>>> import urllib.request
+>>> data = urllib.request.urlopen('http://diveintopython3.org/examples/feed.xml').read()
+>>> print(data.decode('utf-8'))
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
+  <title>dive into mark</title>
+  <subtitle>currently between addictions</subtitle>
+  <id>tag:diveintomark.org,2001-07-29:/</id>
+  <updated>2009-03-27T21:56:07Z</updated>
+  <link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
+  <!-- rest of feed omitted for brevity -->
+
    +
  1. Downloading anything over HTTP is incredibly easy in Python; in fact, it’s a one-liner. The urllib.request module has a handy urlopen() function that takes the address of the page you want, and returns a file-like object that you can just read() from to get the full contents of the page. It just can’t get any easier. +
+ +
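
One detail worth seeing in code: in Python 3, read() hands back bytes, not a string. Here is a minimal sketch of a helper that wraps the urlopen() call described above, closes the response when done, and decodes the bytes. The name fetch_text and its fallback_encoding parameter are invented for this example and are not part of urllib.request.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Download a resource and return its body as a str.

    fetch_text is a hypothetical helper, not part of urllib.request.
    """
    # urlopen() returns a file-like response object; the with block closes it.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # read() returns bytes, not str
        # Use the charset from the Content-Type header when one is declared,
        # otherwise fall back to the caller's guess.
        charset = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset)
```

Calling fetch_text('http://diveintopython3.org/examples/feed.xml') should give the same feed as the session above, already decoded. In recent Python versions urlopen() also accepts data: URLs, so the helper can be tried without a network connection.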

So what’s wrong with this? Well, for a quick one-off during testing or development, there’s nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis -- and remember, you said you were planning on retrieving this syndicated feed once an hour -- then you’re being inefficient, and you’re being rude. + +

Let’s talk about some of the basic features of HTTP. + +

⁂ + +

Features of HTTP

+ +

FIXME + + +

⁂ + +

Going Beyond GET

+ +

FIXME + +

+>>> import httplib2
+>>> from urllib.parse import urlencode
+>>> h = httplib2.Http('.cache')
+>>> data = {"status": "Test update from Python 3"}
+>>> h.add_credentials("diveintomark", "MY_SECRET_PASSWORD")
+>>> resp, content = h.request("http://twitter.com/statuses/update.xml", "POST", urlencode(data))
+>>> resp.status
+200
+>>> from xml.etree import ElementTree as etree
+>>> tree = etree.fromstring(content)
+>>> print(etree.tostring(tree, encoding='unicode'))
+<status>
+  <created_at>Sat May 30 19:11:38 +0000 2009</created_at>
+  <id>1973974228</id>
+  <text>Test update from Python 3</text>
+  <source>web</source>
+  <truncated>false</truncated>
+  <in_reply_to_status_id />
+  <in_reply_to_user_id />
+  <favorited>false</favorited>
+  <in_reply_to_screen_name />
+  <user>
+    <id>8294212</id>
+    <name>Mark Pilgrim</name>
+    <screen_name>diveintomark</screen_name>
+    <location>Apex, NC</location>
+    <description>Like a fine spice</description>
+    <profile_image_url>http://s3.amazonaws.com/twitter_production/profile_images/72859681/beau_normal.jpg</profile_image_url>
+
+    <url>http://diveintomark.org/</url>
+    <protected>false</protected>
+    <followers_count>2565</followers_count>
+    <profile_background_color>FFFFFF</profile_background_color>
+    <profile_text_color>333333</profile_text_color>
+    <profile_link_color>333333</profile_link_color>
+    <profile_sidebar_fill_color>ffffff</profile_sidebar_fill_color>
+    <profile_sidebar_border_color>333333</profile_sidebar_border_color>
+    <friends_count>44</friends_count>
+    <created_at>Sun Aug 19 23:58:36 +0000 2007</created_at>
+    <favourites_count>71</favourites_count>
+    <utc_offset>-18000</utc_offset>
+    <time_zone>Eastern Time (US &amp; Canada)</time_zone>
+    <profile_background_image_url>http://static.twitter.com/images/themes/theme1/bg.gif</profile_background_image_url>
+    <profile_background_tile>false</profile_background_tile>
+    <statuses_count>527</statuses_count>
+    <notifications>false</notifications>
+    <following>false</following>
+  </user>
+</status>
+
+ +
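
The third argument to h.request() in the session above is the POST body, which urlencode() builds from a dictionary. The sketch below shows what that string looks like and decodes it again to check that nothing was lost. The headers dictionary is an assumption on my part: the session above does not send one, but many servers expect form-encoded bodies to be labeled this way.

```python
from urllib.parse import parse_qs, urlencode

data = {"status": "Test update from Python 3"}

# urlencode() turns the dictionary into a form-encoded string;
# spaces become "+" and unsafe characters are percent-escaped.
body = urlencode(data)
print(body)

# parse_qs() reverses the encoding; every value comes back as a list.
print(parse_qs(body))

# Assumption: a header a server may want alongside a form-encoded body.
headers = {"Content-Type": "application/x-www-form-urlencoded"}
```

With httplib2, a dictionary like this can be passed through the headers argument of h.request().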

FIXME + +

⁂ + +

Going Beyond POST

+ +

FIXME + +

+>>> tree.findtext("id")
+'1973974228'
+>>> resp, delete_content = h.request("http://twitter.com/statuses/destroy/{0}.xml".format(tree.findtext("id")), "DELETE")
+>>> resp.status
+200
+
+ +
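
The DELETE call above depends on pulling the right id out of the POST response, and that response contains two id elements: one for the status and one nested under user. The following sketch runs offline against a trimmed copy of the response shown earlier and builds the same destroy URL. The variable names are mine.

```python
from xml.etree import ElementTree as etree

# A trimmed copy of the <status> response shown earlier in this chapter.
content = b"""<status>
  <created_at>Sat May 30 19:11:38 +0000 2009</created_at>
  <id>1973974228</id>
  <text>Test update from Python 3</text>
  <user>
    <id>8294212</id>
    <screen_name>diveintomark</screen_name>
  </user>
</status>"""

tree = etree.fromstring(content)

# findtext("id") only looks at direct children of <status>, so it returns
# the status id rather than the nested <user><id>.
status_id = tree.findtext("id")
user_id = tree.findtext("user/id")

# Build the URL that the DELETE request is sent to.
destroy_url = "http://twitter.com/statuses/destroy/{0}.xml".format(status_id)
print(destroy_url)
```

Because findtext() takes a path, the user's id is still reachable as "user/id" if it is ever needed.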

⁂ + +

Further Reading

+ + + +

© 2001–9 Mark Pilgrim + + diff --git a/index.html b/index.html index a41f9e9..e8ef9e5 100644 --- a/index.html +++ b/index.html @@ -41,7 +41,7 @@ h1:before{content:""}

  • Files
  • XML
  • HTML -
  • HTTP +
  • HTTP Web Services
  • Performance tuning
  • Packaging Python libraries
  • Creating graphics with the Python Imaging Library diff --git a/table-of-contents.html b/table-of-contents.html index 03a8f52..6124c75 100644 --- a/table-of-contents.html +++ b/table-of-contents.html @@ -222,7 +222,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
  • Putting it all together
  • Summary -
  • HTTP +
  • HTTP Web Services
    1. Diving in
    2. How not to fetch data over HTTP @@ -290,7 +290,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
  • Case study: porting chardet to Python 3
      -
    1. Introducing chardet: a mini-FAQ +
    2. Introducing chardet: a mini-FAQ
      1. What is character encoding auto-detection?
      2. Isn’t that impossible? @@ -300,7 +300,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
    3. Diving in
        -
      1. UTF-n with a BOM +
      2. UTF-n with a BOM
      3. Escaped encodings
      4. Multi-byte encodings
      5. Single-byte encodings