diff --git a/advanced-classes.html b/advanced-classes.html
index 215c345..9de1b28 100644
--- a/advanced-classes.html
+++ b/advanced-classes.html
@@ -22,6 +22,8 @@ body{counter-reset:h1 11}

Ordered Dictionary: Not An Oxymoron

+[FIXME here's why ordered dicts are useful: http://www.gossamer-threads.com/lists/python/dev/656556 ]
+

[download ordereddict.py]

import collections
 import itertools
diff --git a/dip3.js b/dip3.js
index 5ef7c13..a1f49b0 100644
--- a/dip3.js
+++ b/dip3.js
@@ -1,3 +1,31 @@
+/*
+
+"Dive Into Python 3" scripts
+
+Copyright (c) 2009, Mark Pilgrim, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+*/
+
 var HS = {'visible': 'hide', 'hidden': 'show'};
 //google.load("jquery", "1.3");
 //google.setOnLoadCallback(function() {
@@ -12,10 +40,14 @@ $(document).ready(function() {
 		}
 	    });
 	$("pre.code:not(.nd), pre.screen:not(.nd)").each(function(i) {
+		/* give each code block a unique ID */
 		this.id = "autopre" + i;
+
+		/* wrap code block in a div and insert widget block */
 		$(this).wrapInner('
');
 		$(this).prepend('');
+		/* move download link into widget block */
 		$(this).prev("p.d").each(function(i) {
 			$(this).next("pre").find("div.w").append(" " + $(this).html());
 			this.parentNode.removeChild(this);
@@ -37,7 +69,7 @@ $(document).ready(function() {
 		$(this).css({'position':'static','width':'auto','height':'auto'});
 	    });
-	// synchronized highlighting on callouts and their associated lines within code & screen blocks
+	/* synchronized highlighting on callouts and their associated lines within code & screen blocks */
 	var hip = {'background-color':'#eee','cursor':'default'};
 	var unhip = {'background-color':'inherit','cursor':'inherit'};
 	$("pre.code, pre.screen").each(function() {
@@ -49,7 +81,7 @@ $(document).ready(function() {
 		});
 	    });
-	// synchronized highlighting on callouts and their associated table rows
+	/* synchronized highlighting on callouts and their associated table rows */
 	$("table").each(function() {
 		$(this).find("tr:gt(0)").each(function(i) {
 		    var tr = $(this);
diff --git a/htmlminimizer.py b/htmlminimizer.py
index 5b18919..3e12ea2 100644
--- a/htmlminimizer.py
+++ b/htmlminimizer.py
@@ -9,7 +9,7 @@ out = open(output_file, 'w', encoding="utf-8") # encoding argument! important!
 for line in open(input_file, encoding="utf-8").readlines():
     # replace entities with Unicode characters
     for e in re.findall('&(.+?);', line):
-        if e in ('lt', 'gt', 'amp', 'quot', 'apos', 'nbsp'):
+        if e in ('lt', 'amp', 'quot', 'apos', 'nbsp'):
             continue
         n = html.entities.name2codepoint.get(e)
         if not n:
diff --git a/xml.html b/xml.html
index cdd05d1..3c222ca 100644
--- a/xml.html
+++ b/xml.html
@@ -242,7 +242,7 @@ mark{display:inline}

Parsing XML

-Python can parse XML documents in several ways. It has traditional DOM and SAX parsers, but I will focus on a different library called Etree.
+Python can parse XML documents in several ways. It has traditional DOM and SAX parsers, but I will focus on a different library called ElementTree.

[download feed.xml]

@@ -252,14 +252,14 @@ mark{display:inline}
 >>> root                                     
 <Element {http://www.w3.org/2005/Atom}feed at cd1eb0>
-  1. The Etree library is part of the Python standard library, in xml.etree.ElementTree.
-  2. The primary entry point for the Etree library is the parse() function, which can take a filename or a file-like object [FIXME xref]. This function parses the entire document at once. If memory is tight, there are ways to parse an XML document incrementally instead.
+  1. The ElementTree library is part of the Python standard library, in xml.etree.ElementTree.
+  2. The primary entry point for the ElementTree library is the parse() function, which can take a filename or a file-like object [FIXME xref]. This function parses the entire document at once. If memory is tight, there are ways to parse an XML document incrementally instead.
   3. The parse() function returns an object which represents the entire document. This is not the root element. To get a reference to the root element, call the getroot() method.
   4. As expected, the root element is the feed element in the http://www.w3.org/2005/Atom namespace. The string representation of this object reinforces an important point: an XML element is a combination of its namespace and its tag name (also called the local name). Every element in this document is in the Atom namespace, so the root element is represented as {http://www.w3.org/2005/Atom}feed.
-Etree represents XML elements as {namespace}localname. You’ll see and use this format in multiple places in the Etree library.
+ElementTree represents XML elements as {namespace}localname. You’ll see and use this format in multiple places in the ElementTree API.
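
As a minimal sketch of the callouts above, the following parses a small inline document and inspects the root element. The SAMPLE bytes are a hypothetical stand-in for feed.xml (whose contents are not reproduced in this diff), and split_tag is a helper name introduced here only for illustration.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical stand-in for feed.xml; the real file is not shown here.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>dive into mark</title>
</feed>"""


def split_tag(tag):
    """Split '{namespace}localname' into (namespace, localname)."""
    if tag.startswith("{"):
        namespace, _, localname = tag[1:].partition("}")
        return namespace, localname
    return None, tag  # the element is not in any namespace


tree = etree.parse(io.BytesIO(SAMPLE))  # parse() accepts a file-like object
root = tree.getroot()                   # the tree object is not the root element

print(root.tag)
print(split_tag(root.tag))
print([split_tag(child.tag)[1] for child in root])
```

Splitting on the closing brace is only one way to recover the local name; it assumes the tag is already in the {namespace}localname form that ElementTree produces.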

Elements Are Lists

@@ -411,7 +411,7 @@ mark{display:inline}

Going Further With lxml

-FIXME
+lxml FIXME

 >>> from lxml import etree
@@ -467,40 +467,72 @@ StopIteration

Generating XML

-FIXME
+Python’s support for XML is not limited to parsing existing documents. You can also create XML documents from scratch.

 >>> import xml.etree.ElementTree as etree
->>> new_feed = etree.Element("{http://www.w3.org/2005/Atom}feed",
-...     attrib={"{http://www.w3.org/XML/1998/namespace}lang": "en"})
->>> print(etree.tostring(new_feed))
+>>> new_feed = etree.Element("{http://www.w3.org/2005/Atom}feed",     
+...     attrib={"{http://www.w3.org/XML/1998/namespace}lang": "en"})  
+>>> print(etree.tostring(new_feed))                                   
 <ns0:feed xmlns:ns0="http://www.w3.org/2005/Atom" xml:lang="en"/>
+
+  1. To create a new element, instantiate the Element class. You pass the element name (namespace + local name) as the first argument. This statement creates a feed element in the Atom namespace. This will be our new document’s root element.
+  2. To add attributes to the newly created element, pass a dictionary of attribute names and values in the attrib argument. Note that the attribute name should be in the standard ElementTree format, {namespace}localname.
+  3. At any time, you can serialize any element (and its children) with the ElementTree tostring() function.
+
-FIXME
+Was that serialization surprising to you? The way ElementTree serializes namespaced XML elements is technically accurate but not optimal. The sample XML document at the beginning of this chapter defined a default namespace (xmlns="http://www.w3.org/2005/Atom"). Defining a default namespace is useful for documents, like Atom feeds, where every element is in the same namespace, because you can declare the namespace once and declare each element with just its local name (<feed>, <link>, <entry>). There is no need to use any prefixes unless you want to declare elements from another namespace.
+
+An XML parser won’t “see” any difference between an XML document with a default namespace and an XML document with a prefixed namespace. The resulting DOM of this serialization:
+
+<ns0:feed xmlns:ns0="http://www.w3.org/2005/Atom" xml:lang="en"/>
+
+is identical to the DOM of this serialization:
+
+<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"/>
+
+The only practical difference is that the second serialization is several characters shorter. If we were to recast our entire sample feed with a ns0: prefix in every start and end tag, it would add 4 characters per start tag × 79 tags + 4 characters for the namespace declaration itself, for a total of 320 characters. Assuming UTF-8 encoding, that’s 320 extra bytes. (After gzipping, the difference drops to 21 bytes, but still, 21 bytes is 21 bytes.) Maybe that doesn’t matter to you, but for something like an Atom feed, which may be downloaded several thousand times whenever it changes, saving a few bytes per request can quickly add up.
+
+The built-in ElementTree library does not offer this fine-grained control over serializing namespaced elements, but lxml does.
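
The claim that a parser sees no difference between the two serializations can be checked with the standard library alone. This is only a sketch: the variable names are illustrative, and the character count applies to this one-element example, not to the full sample feed.

```python
import xml.etree.ElementTree as etree

prefixed = '<ns0:feed xmlns:ns0="http://www.w3.org/2005/Atom" xml:lang="en"/>'
default = '<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"/>'

a = etree.fromstring(prefixed)
b = etree.fromstring(default)

# Both strings parse to the same expanded element name and attributes.
print(a.tag)
print(a.tag == b.tag)
print(a.attrib == b.attrib)

# Extra characters the prefixed form costs for this single element:
# "ns0:" on the tag plus ":ns0" in the namespace declaration.
extra = len(prefixed) - len(default)
print(extra)
```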

 >>> import lxml.etree
->>> NSMAP = {"atom": "http://www.w3.org/2005/Atom"}
->>> new_feed = lxml.etree.Element("feed", nsmap=NSMAP)
->>> print(lxml.etree.tounicode(new_feed))
+>>> NSMAP = {None: "http://www.w3.org/2005/Atom"}                     
+>>> new_feed = lxml.etree.Element("feed", nsmap=NSMAP)                
+>>> print(lxml.etree.tounicode(new_feed))                             
 <feed xmlns="http://www.w3.org/2005/Atom"/>
->>> new_feed.set("{http://www.w3.org/XML/1998/namespace}lang", "en")
+>>> new_feed.set("{http://www.w3.org/XML/1998/namespace}lang", "en")  
 >>> print(lxml.etree.tounicode(new_feed))
 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"/>
+
+  1. To start, define a namespace mapping as a dictionary. Dictionary values are namespaces; dictionary keys are the desired prefix. Using None as a prefix effectively declares a default namespace.
+  2. Now you can pass the lxml-specific nsmap argument when you create an element, and lxml will respect the namespace prefixes you’ve defined.
+  3. As expected, this serialization defines the Atom namespace as the default namespace and declares the feed element without a namespace prefix.
+  4. Oops, we forgot to add the xml:lang attribute. You can always add attributes to any element with the set() method. It takes two arguments: the attribute name in standard ElementTree format, then the attribute value. (This method is not lxml-specific. The only lxml-specific part of this example was the nsmap argument to control the namespace prefixes in the serialized output.)
+
-FIXME
+Are XML documents limited to one element per document? No, of course not. You can easily create child elements, too.

->>> title = lxml.etree.SubElement(new_feed, "title", attrib={"type":"html"})
+>>> title = lxml.etree.SubElement(new_feed, "title",          
+...     attrib={"type":"html"})                               
 >>> print(lxml.etree.tounicode(new_feed))
 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html"/></feed>
->>> title.text = "dive into mark"
->>> print(lxml.etree.tounicode(new_feed))
-<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html">dive into mark</title></feed>
->>> print(lxml.etree.tounicode(new_feed, pretty_print=True))
+>>> title.text = "dive into &hellip;"                         
+>>> print(lxml.etree.tounicode(new_feed))                     
+<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html">dive into &amp;hellip;</title></feed>
+>>> print(lxml.etree.tounicode(new_feed, pretty_print=True))  
 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
-<title type="html">dive into mark</title>
+<title type="html">dive into &amp;hellip;</title>
 </feed>
+
+  1. To create a child element of an existing element, call the SubElement() function. The only required arguments are the parent element (new_feed in this case) and the new element’s name. Since this child element will inherit the namespace mapping of its parent, there is no need to redeclare the namespace or prefix here.
+  2. You can also pass in an attribute dictionary. Keys are attribute names; values are attribute values.
+  3. As expected, the new title element was created in the Atom namespace, and it was inserted as a child of the feed element. Since the title element has no text content and no children of its own, lxml serializes it as an empty element (with the /> shortcut).
+  4. To set the text content of an element, simply set its .text property.
+  5. Now the title element is serialized with its text content. Any text content that contains less-than signs or ampersands needs to be escaped when serialized. lxml handles this escaping automatically.
+  6. You can also apply “pretty printing” to the serialization, which inserts line breaks after end tags, and after start tags of elements that contain child elements but no text content. In technical terms, lxml adds “insignificant whitespace” to make the output more readable.
+
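
For readers without lxml installed, a rough equivalent of the child-element steps can be sketched with the standard library's xml.etree.ElementTree. Two assumptions differ from the lxml session above: without the nsmap argument, the child's name is given in full {namespace}localname form, and the serialized output uses a generated prefix rather than a default namespace. The ATOM constant is a name introduced here for illustration.

```python
import xml.etree.ElementTree as etree

ATOM = "http://www.w3.org/2005/Atom"  # illustrative constant, not from the original

new_feed = etree.Element("{%s}feed" % ATOM)

# SubElement() creates the child and appends it to the parent in one step.
title = etree.SubElement(new_feed, "{%s}title" % ATOM, attrib={"type": "html"})

# The text is stored unescaped in memory; escaping happens on serialization.
title.text = "dive into &hellip;"

serialized = etree.tostring(new_feed, encoding="unicode")
print(serialized)

# Parsing the serialized form recovers the original, unescaped text.
round_trip = etree.fromstring(serialized)
print(round_trip[0].text)
```

Passing encoding="unicode" asks tostring() for a str rather than bytes, which keeps the printed output comparable to the lxml tounicode() calls above.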

Further Reading

@@ -510,6 +542,7 @@ StopIteration
  • Elements and Element Trees
  • XPath Support in ElementTree
   • The ElementTree iterparse Function
+  • lxml
  • Parsing XML and HTML with lxml
  • XPath and XSLT with lxml