diff --git a/xml.html b/xml.html
index ef6443c..ba8bad9 100644
--- a/xml.html
+++ b/xml.html
@@ -253,7 +253,7 @@ mark{display:inline}
<Element {http://www.w3.org/2005/Atom}feed at cd1eb0>
xml.etree.ElementTree.
-parse() function, which can take a filename or a file-like object [FIXME xref]. This function parses the entire document at once. If memory is tight, there are ways to parse an XML document incrementally instead. [FIXME href]
+parse() function, which can take a filename or a file-like object [FIXME xref]. This function parses the entire document at once. If memory is tight, there are ways to parse an XML document incrementally instead.
parse() function returns an object which represents the entire document. This is not the root element. To get a reference to the root element, call the getroot() method.
feed element in the http://www.w3.org/2005/Atom namespace. The string representation of this object reinforces an important point: an XML element is a combination of its namespace and its tag name (also called the local name). Every element in this document is in the Atom namespace, so the root element is represented as {http://www.w3.org/2005/Atom}feed.
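For reference, the behaviour described in the context lines above can be reproduced with a minimal sketch using only the standard library. The tiny inline feed and the names `ATOM`, `ATOM_NS`, `root`, and `title` are illustrative assumptions, not content from the chapter being patched.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical stand-in for the chapter's Atom feed file.
ATOM = (
    b"<feed xmlns='http://www.w3.org/2005/Atom'>"
    b"<title>dive into mark</title>"
    b"</feed>"
)

# parse() accepts a filename or a file-like object and reads the whole document.
tree = etree.parse(io.BytesIO(ATOM))

# The returned object represents the document; getroot() gives the root element.
root = tree.getroot()
print(root.tag)  # namespace and local name combined: {namespace}localname

# Child lookups must spell out the namespace in the same {namespace}localname form.
ATOM_NS = "{http://www.w3.org/2005/Atom}"
title = root.find(ATOM_NS + "title")
print(title.text)
```

Searching for plain `"title"` would find nothing here, because the element's full name includes the Atom namespace.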
-lxml is an open source third-party library that builds on the popular libxml2 parser [FIXME href]. It provides a 100% compatible ElementTree API, then extends it with full XPath support and a few other niceties. There are installers available for Windows and Mac OS X (FIXME really?); Linux users can probably use distribution-specific tools like yum or apt-get to install precompiled binaries from their repositories.
+lxml is an open source third-party library that builds on the popular libxml2 parser. It provides a 100% compatible ElementTree API, then extends it with full XPath support and a few other niceties. There are installers available for Windows; Linux users should always try to use distribution-specific tools like yum or apt-get to install precompiled binaries from their repositories. Otherwise you’ll need to install lxml manually.
>>> from lxml import etree ①
@@ -457,7 +457,7 @@ StopIteration
except ImportError:
import xml.etree.ElementTree as etree
-But lxml is more than just a faster ElementTree. It also integrates support for arbitrary XPath expressions. I’m not going to go into depth about XPath syntax (it can get quite complicated). [FIXME href] is a good beginner’s guide to XPath.
+But lxml is more than just a faster ElementTree. It also integrates support for arbitrary XPath expressions. I’m not going to go into depth about XPath syntax. (That could be a whole book unto itself!) But I will show you how it integrates into lxml.
>>> import lxml.etree ①
@@ -549,7 +549,7 @@ except ImportError:
Customizing Your XML Parser
-The XML specification mandates that all conforming XML parsers employ “draconian error handling.” That is, they must halt and catch fire as soon as they detect any sort of wellformedness error in the XML document. Wellformedness errors include mismatched start and end tags, undefined entities, illegal Unicode characters, and a number of other esoteric rules. This is in stark contrast to other common formats like HTML — your browser doesn’t stop rendering a web page if you forget to close an HTML tag or escape an ampersand in an attribute value. (It is a common misconception that HTML has no defined error handling. HTML error handling is actually quite well-defined [FIXME href], but it’s significantly more complicated than “halt and catch fire on first error.”)
+The XML specification mandates that all conforming XML parsers employ “draconian error handling.” That is, they must halt and catch fire as soon as they detect any sort of wellformedness error in the XML document. Wellformedness errors include mismatched start and end tags, undefined entities, illegal Unicode characters, and a number of other esoteric rules. This is in stark contrast to other common formats like HTML — your browser doesn’t stop rendering a web page if you forget to close an HTML tag or escape an ampersand in an attribute value. (It is a common misconception that HTML has no defined error handling. HTML error handling is actually quite well-defined, but it’s significantly more complicated than “halt and catch fire on first error.”)
Some people (myself included) believe that it was a mistake for the inventors of XML to mandate draconian error handling. Don’t get me wrong; I can certainly see the allure of simplifying the error handling rules. But in practice, the concept of “wellformedness” is trickier than it sounds, especially for XML documents (like Atom feeds) that are published on the web and served over HTTP. Despite the maturity of XML, which standardized on draconian error handling in 1997, surveys continually show a significant fraction of Atom feeds on the web are plagued with wellformedness errors.
@@ -599,7 +599,7 @@ lxml.etree.XMLSyntaxError: Entity 'hellip' not defined, line 3, column 28
. [rest of serialization snipped for brevity] .
lxml.etree.XMLParser class. It can take a number of different named arguments [FIXME href]. The one we’re interested in here is the recover argument. When set to True, the XML parser will try its best to “recover” from wellformedness errors.
+lxml.etree.XMLParser class. It can take a number of different named arguments. The one we’re interested in here is the recover argument. When set to True, the XML parser will try its best to “recover” from wellformedness errors.
XML document with your custom parser, pass the parser object as the second argument to the parse() function. Note that lxml does not raise an exception about the undefined … entity.
… entity, the parser just silently dropped it. The text content of the title element becomes "dive into ".
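
As a rough illustration of the recover behaviour these lines describe, here is a minimal sketch. It assumes lxml is installed (it is a third-party package) and uses a made-up two-element document; `BROKEN` and `parse_leniently` are hypothetical names introduced only for this example, not part of the chapter.

```python
import io

try:
    from lxml import etree
except ImportError:
    etree = None  # lxml is third-party and may not be installed

# Hypothetical broken document: &hellip; is an HTML entity, undefined in plain XML.
BROKEN = b"<feed><title>dive into &hellip;</title></feed>"


def parse_leniently(data):
    """Parse bytes with a recovering parser; return the tree and the parser."""
    parser = etree.XMLParser(recover=True)  # try to continue past wellformedness errors
    tree = etree.parse(io.BytesIO(data), parser)  # parser object is the second argument
    return tree, parser


if etree is not None:
    # A default (strict) parser refuses the same input.
    try:
        etree.parse(io.BytesIO(BROKEN))
    except etree.XMLSyntaxError as exc:
        print("strict parser failed:", exc)

    tree, parser = parse_leniently(BROKEN)
    print(repr(tree.findtext("title")))

    # The recovering parser still records what it had to skip over.
    for entry in parser.error_log:
        print(entry.line, entry.column, entry.message)
```

Recovery is a best-effort measure: the undefined entity is not replaced with anything meaningful, so checking the parser's error log is a sensible way to find out what was lost.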