mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
got a little further in xml chapter
@@ -35,6 +35,10 @@ Classname Legend
|
||||
.q = "quote" = quote at beginning of each chapter
|
||||
.f = "fancy" = first paragraph of each chapter (gets a fancy drop-cap)
|
||||
.c = "centered" = centered footer text (also clears floats)
|
||||
.s = "simple" =
|
||||
|
||||
.nm = "no mobile" = hide this section on mobile devices
|
||||
.nd = "no decoration" = hide the widgets on this code block
|
||||
|
||||
.note = "note/caution/important" = indented block for tips/gotchas/language comparisons
|
||||
.baa = "best available ampersand" = wrapper block for ampersands
|
||||
|
||||
@@ -11,7 +11,7 @@ $(document).ready(function() {
|
||||
pre.addClass("code");
|
||||
}
|
||||
});
|
||||
$("pre.code, pre.screen").each(function(i) {
|
||||
$("pre.code:not(.nd), pre.screen:not(.nd)").each(function(i) {
|
||||
this.id = "autopre" + i;
|
||||
$(this).wrapInner('<div class=b></div>');
|
||||
$(this).prepend('<div class=w>[<a class=toggle href="javascript:toggleCodeBlock(\'' + this.id + '\')">' + HS['visible'] + '</a>] [<a href="javascript:plainTextOnClick(\'' + this.id + '\')">open in new window</a>]</div>');
|
||||
|
||||
+1
-1
@@ -276,7 +276,7 @@ finally:
|
||||
<ol>
|
||||
<li>The <code>build_match_and_apply_functions()</code> function has not changed. You’re still using closures to build two functions dynamically that use variables defined in the outer function.
|
||||
<li>Open the file that contains the pattern strings.
|
||||
<li>Read through the file one line at a time, using the <code>for line in <fileobject></code> idiom.
|
||||
<li>Read through the file one line at a time, using the <code>for line in <fileobject></code> idiom.
|
||||
<li>Each line in the file really has three values, but they’re separated by whitespace (tabs or spaces, it makes no difference). To split it out, use the <code>split()</code> string method. The first argument to the <code>split()</code> method is <code>None</code>, which means “split on any whitespace.” The second argument is <code>3</code>, which means “split on whitespace at most 3 times, then leave the rest of the line alone.” A line like <code>[sxz]$ $ es</code> will be broken up into the list <code>['[sxz]$', '$', 'es']</code>, which means that <var>pattern</var> will get <code>'[sxz]$'</code>, <var>search</var> will get <code>'$'</code>, and <var>replace</var> will get <code>'es'</code>. That’s a lot of power in one little line of code.
|
||||
<li>Use a <code>try..finally</code> block to ensure the file object is closed.
|
||||
</ol>
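<p>As a rough sketch of what that <code>split()</code> call does on its own, outside the file-reading loop. The first sample line is the one from the list above; the second string is made up purely to show where the rest of a longer line ends up.

```python
# Split a rule line on any run of whitespace, at most 3 times.
line = "[sxz]$ $ es"
pattern, search, replace = line.split(None, 3)
print(pattern, search, replace)

# Hypothetical longer line: after 3 splits, whatever is left stays
# together in the final list item.
extra = "a b c d e".split(None, 3)
print(extra)
```

<p>Note that the tuple unpacking on the first call only works because the line has exactly three values; a line with more fields would raise a <code>ValueError</code> there.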
|
||||
|
||||
@@ -89,9 +89,83 @@ mark{display:inline}
|
||||
</entry>
|
||||
</feed></code></pre>
|
||||
|
||||
<h2 id=xml-intro>A 5-Minute Crash Course in XML</h2>
|
||||
|
||||
<p>If you already know about XML, you can skip this section.
|
||||
|
||||
<p>XML is a generalized way of describing hierarchical structured data. An XML <i>document</i> contains one or more <i>elements</i>, which are delimited by <i>start and end tags</i>. This is a complete (albeit boring) XML document:
|
||||
|
||||
<pre class=nd><code><a><foo> <span>①</span></a>
|
||||
<a></foo> <span>②</span></a></code></pre>
|
||||
<ol>
|
||||
<li>This is the <i>start tag</i> of the <code>foo</code> element.
|
||||
<li>This is the matching <i>end tag</i> of the <code>foo</code> element. Like balancing parentheses in writing or mathematics or code, every start tag must be <i>closed</i> (matched) by a corresponding end tag.
|
||||
</ol>
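<p>One way to see the matching-tags rule in action is to hand a few strings to an XML parser. This is only a sketch using the standard library’s <code>xml.etree.ElementTree</code> module; the <code>is_well_formed()</code> helper is a made-up name for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the string."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

balanced = is_well_formed("<foo></foo>")      # start tag matched by end tag
unclosed = is_well_formed("<foo>")            # start tag never closed
mismatched = is_well_formed("<foo></bar>")    # end tag names a different element
print(balanced, unclosed, mismatched)
```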
|
||||
|
||||
<p>Elements can be <i>nested</i>. An element <code>bar</code> inside an element <code>foo</code> is said to be a <i>subelement</i> or <i>child</i> of <code>foo</code>.
|
||||
|
||||
<pre class=nd><code><foo>
|
||||
<mark><bar></bar></mark>
|
||||
</foo>
|
||||
</code></pre>
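<p>A small sketch of how nesting shows up once the document is parsed, assuming the standard library’s <code>xml.etree.ElementTree</code> module: iterating over an element yields its children.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<foo><bar></bar></foo>")

# Iterating over an element walks its direct children, in document order.
children = [child.tag for child in root]
print(root.tag, children)
```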
|
||||
|
||||
<p>Elements can have <i>attributes</i>, which are name-value pairs. Attributes are listed within the start tag of an element. <i>Attribute names</i> cannot be repeated on the same element (although they can appear on different elements). <i>Attribute values</i> must be quoted.
|
||||
|
||||
<pre class=nd><code><a><foo <mark>lang="en"</mark>> <span>①</span></a>
|
||||
<a> <bar <mark>lang="fr"</mark>></bar> <span>②</span></a>
|
||||
</foo>
|
||||
</code></pre>
|
||||
<ol>
|
||||
<li>The <code>foo</code> element has one attribute, named <code>lang</code>. The value of its <code>lang</code> attribute is <code>en</code>.
|
||||
<li>The <code>bar</code> element has one attribute, named <code>lang</code>. The value of its <code>lang</code> attribute is <code>fr</code>. This doesn’t conflict with the <code>foo</code> element in any way. Each element has its own set of attributes.
|
||||
</ol>
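<p>The attribute rules above can be checked with a parser. The following is a sketch, again assuming <code>xml.etree.ElementTree</code>; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

doc = '<foo lang="en"><bar lang="fr"></bar></foo>'
root = ET.fromstring(doc)

# Each element carries its own dictionary of attributes.
foo_lang = root.attrib["lang"]
bar_lang = root[0].attrib["lang"]
print(foo_lang, bar_lang)

# Repeating an attribute name on one element is a parse error.
try:
    ET.fromstring('<foo lang="en" lang="fr"></foo>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)
```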
|
||||
|
||||
<p>Elements can have <i>text content</i>.
|
||||
|
||||
<pre class=nd><code><foo lang="en">
|
||||
<bar lang="fr"><mark>PapayaWhip</mark></bar>
|
||||
</foo>
|
||||
</code></pre>
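<p>For illustration, here is one way text content might be read back out, assuming <code>xml.etree.ElementTree</code>. Note that the whitespace used to indent <code>bar</code> also counts as text belonging to <code>foo</code>.

```python
import xml.etree.ElementTree as ET

doc = '<foo lang="en">\n  <bar lang="fr">PapayaWhip</bar>\n</foo>'
root = ET.fromstring(doc)

bar = root.find("bar")
bar_text = bar.text          # the text between <bar ...> and </bar>
foo_text = root.text         # only the newline and indentation before <bar>
print(repr(bar_text), repr(foo_text))
```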
|
||||
|
||||
<p>Elements that contain no text and no children are <i>empty</i>.
|
||||
|
||||
<pre class=nd><code><foo></foo></code></pre>
|
||||
|
||||
<p>There is a shorthand for writing empty elements. By putting a <code>/</code> character in the start tag, you can skip the end tag altogether. The XML document in the previous example could be written like this instead:
|
||||
|
||||
<pre class=nd><code><foo<mark>/</mark>></code></pre>
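<p>A quick sketch, assuming <code>xml.etree.ElementTree</code>, suggesting that a parser treats the long and short forms of an empty element the same way:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<foo></foo>")
short_form = ET.fromstring("<foo/>")

# Both parse to an element with no text and no children.
for element in (long_form, short_form):
    print(element.tag, element.text, len(element))

# Serializing them again produces the same bytes for both.
same_output = ET.tostring(long_form) == ET.tostring(short_form)
print(same_output)
```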
|
||||
|
||||
<p>Just as Python functions can be declared in different <i>modules</i>, XML elements can be declared in different <i>namespaces</i>. Namespaces usually look like URLs. You use an <code>xmlns</code> declaration to define a <i>default namespace</i>. A namespace declaration looks similar to an attribute, but it has a different purpose.
|
||||
|
||||
<pre class=nd><code><a><feed <mark>xmlns="http://www.w3.org/2005/Atom"</mark>> <span>①</span></a>
|
||||
<a> <title>dive into mark</title> <span>②</span></a>
|
||||
</feed>
|
||||
</code></pre>
|
||||
<ol>
|
||||
<li>The <code>feed</code> element is in the <code>http://www.w3.org/2005/Atom</code> namespace.
|
||||
<li>The <code>title</code> element is also in the <code>http://www.w3.org/2005/Atom</code> namespace. The namespace declaration affects the element where it’s declared, plus all child elements.
|
||||
</ol>
|
||||
|
||||
<p>You can also use an <code>xmlns:<var>prefix</var></code> declaration to define a namespace and associate it with a <i>prefix</i>. Then each element in that namespace must be explicitly declared with the prefix.
|
||||
|
||||
<pre class=nd><code><a><atom:feed <mark>xmlns:atom="http://www.w3.org/2005/Atom"</mark>> <span>①</span></a>
|
||||
<a> <atom:title>dive into mark</atom:title> <span>②</span></a>
|
||||
</atom:feed>
|
||||
</code></pre>
|
||||
<ol>
|
||||
<li>The <code>feed</code> element is in the <code>http://www.w3.org/2005/Atom</code> namespace.
|
||||
<li>The <code>title</code> element is also in the <code>http://www.w3.org/2005/Atom</code> namespace.
|
||||
</ol>
|
||||
|
||||
<p>As far as a namespace-aware XML parser is concerned, the previous two XML documents are <em>identical</em>. Namespace + element name = XML identity. Prefixes are irrelevant.
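<p>To make that concrete, here is a sketch that parses both documents with <code>xml.etree.ElementTree</code>, which reports each tag as <code>{namespace}localname</code>. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

default_ns = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>dive into mark</title>'
    '</feed>'
)
prefixed = (
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    '<atom:title>dive into mark</atom:title>'
    '</atom:feed>'
)

a = ET.fromstring(default_ns)
b = ET.fromstring(prefixed)

# The prefix is gone after parsing; only namespace + element name remain.
print(a.tag)
print(b.tag)
print(a[0].tag == b[0].tag)
```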
|
||||
|
||||
<h2 id=xml-structure>The Structure Of An Atom Feed</h2>
|
||||
|
||||
<p>FIXME
|
||||
<p>Think of a weblog, or in fact any website with frequently updated content, like <a href=http://www.cnn.com/>CNN.com</a>. The site itself has a title (“CNN.com”), a subtitle (“Breaking News, U.S., World, Weather, Entertainment <i class=baa>&</i> Video News”), a last-updated date (“updated 12:43 p.m. EDT, Sat May 16, 2009”), and a list of articles posted at different times. Each article also has a title, a first-published date (and maybe also a last-updated date, if they published a correction or fixed a typo), and a unique URL.
|
||||
|
||||
<p>The Atom syndication format is designed to capture all of this information in a standard format. My weblog and CNN.com are wildly different in design, scope, and audience, but they both have the same basic structure. CNN.com has a title; my blog has a title. CNN.com publishes articles; I publish articles.
|
||||
|
||||
<p>At the top level is the “root” element, which every Atom feed shares: the <code><feed></code> element in the Atom namespace (<code>http://www.w3.org/2005/Atom</code>). ... FIXME
|
||||
|
||||
<h2 id=xml-parse>Parsing XML</h2>
|
||||
|
||||
|
||||