From 99345307655c24cafb3098a669b2e2362ceba4c1 Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Sun, 17 May 2009 00:41:29 -0400
Subject: [PATCH] got a little further in xml chapter

---
 dip3.css        |  4 +++
 dip3.js         |  2 +-
 generators.html |  2 +-
 xml.html        | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/dip3.css b/dip3.css
index 572599a..8ff81ae 100644
--- a/dip3.css
+++ b/dip3.css
@@ -35,6 +35,10 @@ Classname Legend
 .q = "quote" = quote at beginning of each chapter
 .f = "fancy" = first paragraph of each chapter (gets a fancy drop-cap)
 .c = "centered" = centered footer text (also clears floats)
+.s = "simple" =
+
+.nm = "no mobile" = hide this section on mobile devices
+.nd = "no decoration" = hide the widgets on this code block
 .note = "note/caution/important" = indented block for tips/gotchas/language comparisons
 .baa = "best available ampersand" = wrapper block for ampersands

diff --git a/dip3.js b/dip3.js
index ae923d1..80f121f 100644
--- a/dip3.js
+++ b/dip3.js
@@ -11,7 +11,7 @@ $(document).ready(function() {
     pre.addClass("code");
   }
 });
-$("pre.code, pre.screen").each(function(i) {
+$("pre.code:not(.nd), pre.screen:not(.nd)").each(function(i) {
   this.id = "autopre" + i;
   $(this).wrapInner('
'); $(this).prepend('
[' + HS['visible'] + '] [open in new window]
');

diff --git a/generators.html b/generators.html
index 9cd9826..e8ad402 100644
--- a/generators.html
+++ b/generators.html
@@ -276,7 +276,7 @@ finally:
  1. The build_match_and_apply_functions() function has not changed. You’re still using closures to build two functions dynamically that use variables defined in the outer function.
  2. Open the file that contains the pattern strings. -
  3. Read through the file one line at a time, using the for line in <fileobject> idiom. +
  4. Read through the file one line at a time, using the for line in <fileobject> idiom.
  5. Each line in the file really has three values, but they’re separated by whitespace (tabs or spaces, it makes no difference). To split it out, use the split() string method. The first argument to the split() method is None, which means “split on any whitespace (tabs or spaces, it makes no difference).” The second argument is 3, which means “split on whitespace 3 times, then leave the rest of the line alone.” A line like [sxz]$ $ es will be broken up into the list ['[sxz]$', '$', 'es'], which means that pattern will get '[sxz]$', search will get '$', and replace will get 'es'. That’s a lot of power in one little line of code.
  6. Use a try..finally block to ensure the file object is closed.
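The split() call described in the list above can be tried on its own. This is a minimal sketch: the sample line is made up for illustration (a tab, two spaces, and a trailing newline stand in for whatever whitespace the real pattern file uses).

```python
# Hypothetical sample line in the "pattern  search  replace" layout.
line = "[sxz]$\t$  es\n"

# sep=None splits on any run of whitespace (tabs, spaces, the trailing
# newline); maxsplit=3 caps the number of splits at three.
pattern, search, replace = line.split(None, 3)

print(pattern, search, replace)
```

Because the separator is None, the mix of tabs and spaces makes no difference, and the newline at the end of the line never ends up in replace.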
diff --git a/xml.html b/xml.html
index 08e4cca..94e0ea9 100644
--- a/xml.html
+++ b/xml.html
@@ -89,9 +89,83 @@ mark{display:inline}
 </entry>
 </feed>
+

A 5-Minute Crash Course in XML

+ +

If you already know about XML, you can skip this section. + +

XML is a generalized way of describing hierarchical structured data. An XML document contains one or more elements, which are delimited by start and end tags. This is a complete (albeit boring) XML document: + +

<foo>   
+</foo>  
+
    +
  1. This is the start tag of the foo element. +
  2. This is the matching end tag of the foo element. Like balancing parentheses in writing or mathematics or code, every start tag must be closed (matched) by a corresponding end tag. +
+ +

Elements can be nested. An element bar inside an element foo is said to be a subelement or child of foo. + +

<foo>
+  <bar></bar>
+</foo>
+
+ +
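To see the parent/child relationship from code, here is a minimal sketch that feeds the document above to xml.etree.ElementTree from the standard library. The choice of ElementTree is an assumption made for illustration; any XML parser would report the same structure.

```python
import xml.etree.ElementTree as ET

# The nested document from the example above.
root = ET.fromstring("<foo>\n  <bar></bar>\n</foo>")

# Iterating over an element yields its direct children.
children = [child.tag for child in root]

print(root.tag, children)
```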

Elements can have attributes, which are name-value pairs. Attributes are listed within the start tag of an element. An attribute name cannot be repeated within the same element (although the same name can appear on different elements). Attribute values must be quoted. + +

<foo lang="en">     
+  <bar lang="fr"></bar>  
+</foo>
+
+
    +
  1. The foo element has one attribute, named lang. The value of its lang attribute is en. +
  2. The bar element has one attribute, named lang. The value of its lang attribute is fr. This doesn’t conflict with the foo element in any way. Each element has its own set of attributes. +
+ +
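A short sketch of how those attributes look once parsed, again assuming the standard library's xml.etree.ElementTree. The second half feeds the parser a deliberately broken document (made up for this illustration) to show that a repeated attribute name is rejected.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<foo lang="en">\n  <bar lang="fr"></bar>\n</foo>')
bar = root.find("bar")

# Each element carries its own dictionary of attributes.
print(root.attrib, bar.attrib)

# Repeating an attribute name on one element is not well-formed XML,
# so the parser refuses the document.
try:
    ET.fromstring('<foo lang="en" lang="fr"></foo>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(duplicate_rejected)
```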

Elements can have text content. + +

<foo lang="en">
+  <bar lang="fr">PapayaWhip</bar>
+</foo>
+
+ +
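Text content is just as easy to reach from code. A minimal sketch, assuming xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<foo lang="en">\n  <bar lang="fr">PapayaWhip</bar>\n</foo>'
)

# The text between an element's start and end tags is its .text.
bar_text = root.find("bar").text

print(bar_text)
```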

Elements that contain no text and no children are empty. + +

<foo></foo>
+ +

There is a shorthand for writing empty elements. By putting a / character at the end of the start tag, just before the closing >, you can skip the end tag altogether. The XML document in the previous example could be written like this instead: + +

<foo/>
+ +
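A parser sees no difference between the two spellings. The following sketch, assuming xml.etree.ElementTree, parses both and compares what comes out:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<foo></foo>")
short_form = ET.fromstring("<foo/>")

# Both spellings produce an element with no text and no children.
for element in (long_form, short_form):
    print(element.tag, element.text, len(element))

# Serializing the two parsed elements gives identical bytes.
same_output = ET.tostring(long_form) == ET.tostring(short_form)
print(same_output)
```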

Just as Python functions can be declared in different modules, XML elements can be declared in different namespaces. Namespaces usually look like URLs. You use an xmlns declaration to define a default namespace. A namespace declaration looks similar to an attribute, but it has a different purpose. + +

<feed xmlns="http://www.w3.org/2005/Atom">  
+  <title>dive into mark</title>             
+</feed>
+
+
    +
  1. The feed element is in the http://www.w3.org/2005/Atom namespace. +
  2. The title element is also in the http://www.w3.org/2005/Atom namespace. The namespace declaration affects the element where it’s declared, plus all child elements. +
+ +

You can also use an xmlns:prefix declaration to define a namespace and associate it with a prefix. Then each element in that namespace must be explicitly declared with the prefix. + +

<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">  
+  <atom:title>dive into mark</atom:title>             
+</atom:feed>
+
+
    +
  1. The feed element is in the http://www.w3.org/2005/Atom namespace. +
  2. The title element is also in the http://www.w3.org/2005/Atom namespace. +
+ +

As far as a namespace-aware XML parser is concerned, the previous two XML documents are identical. Namespace + element name = XML identity. Prefixes are irrelevant. +
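You can check this claim with a namespace-aware parser. The sketch below assumes xml.etree.ElementTree, which reports each element's name as the namespace in curly braces followed by the local name; the prefix never appears.

```python
import xml.etree.ElementTree as ET

default_ns = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>dive into mark</title>"
    "</feed>"
)
prefixed = ET.fromstring(
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    "<atom:title>dive into mark</atom:title>"
    "</atom:feed>"
)

# iter() walks the element itself and then all of its descendants.
default_tags = [element.tag for element in default_ns.iter()]
prefixed_tags = [element.tag for element in prefixed.iter()]

print(default_tags)
print(default_tags == prefixed_tags)
```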

The Structure Of An Atom Feed

-

FIXME +

Think of a weblog, or in fact any website with frequently updated content, like CNN.com. The site itself has a title (“CNN.com”), a subtitle (“Breaking News, U.S., World, Weather, Entertainment & Video News”), a last-updated date (“updated 12:43 p.m. EDT, Sat May 16, 2009”), and a list of articles posted at different times. Each article also has a title, a first-published date (and maybe also a last-updated date, if they published a correction or fixed a typo), and a unique URL. + +

The Atom syndication format is designed to capture all of this information in a standard format. My weblog and CNN.com are wildly different in design, scope, and audience, but they both have the same basic structure. CNN.com has a title; my blog has a title. CNN.com publishes articles; I publish articles. + +

At the top level is the “root” element, which every Atom feed shares: the <feed> element in the Atom namespace (http://www.w3.org/2005/Atom). ... FIXME
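As a rough sketch of how that structure maps onto markup, here is a tiny feed read with xml.etree.ElementTree. Everything in the sample is an assumption made for illustration: the titles, dates, and example.org URL are invented, and the element names (title, subtitle, updated, entry, published, link) are the ones the Atom format uses for the pieces of information described above.

```python
import xml.etree.ElementTree as ET

# Every Atom element lives in this namespace.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample feed; all values are hypothetical.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example site</title>
  <subtitle>An example subtitle</subtitle>
  <updated>2009-05-16T12:43:00-04:00</updated>
  <entry>
    <title>First article</title>
    <published>2009-05-15T09:00:00-04:00</published>
    <updated>2009-05-16T10:30:00-04:00</updated>
    <link href="http://example.org/first-article"/>
  </entry>
</feed>"""

feed = ET.fromstring(sample)

# Feed-level information: title, subtitle, last-updated date.
site_title = feed.findtext(ATOM + "title")
site_updated = feed.findtext(ATOM + "updated")

# Article-level information: one entry element per article.
articles = [
    (
        entry.findtext(ATOM + "title"),
        entry.findtext(ATOM + "published"),
        entry.find(ATOM + "link").get("href"),
    )
    for entry in feed.findall(ATOM + "entry")
]

print(feed.tag, site_title, site_updated)
print(articles)
```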

Parsing XML