diff --git a/about.html b/about.html
index 79ec50b..3cc6b67 100644
--- a/about.html
+++ b/about.html
@@ -12,7 +12,7 @@ h1:before{content:""}
The text of Dive Into Python 3 is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
-The chardet library referenced in Case study: porting chardet to Python 3 is licensed under the LGPL 2.1 or later. The alphametics solver referenced in Advanced Iterators is based on Raymond Hettinger's solver for Python 2, which he has graciously relicensed under the MIT license so I could port it to Python 3. Advanced Classes and Special Method Names contain snippets of code from the Python standard library which are released under the Python Software Foundation License version 2. All other example code is my original work and is licensed under the MIT license. Full licensing terms are included in each source code file.
+The chardet library referenced in Case study: porting chardet to Python 3 is licensed under the LGPL 2.1 or later. The alphametics solver referenced in Advanced Iterators is based on Raymond Hettinger's solver for Python 2, which he has graciously relicensed under the MIT license so I could port it to Python 3. Advanced Classes and Special Method Names contain snippets of code from the Python standard library which are released under the Python Software Foundation License version 2. All other example code is my original work and is licensed under the MIT license. Full licensing terms are included in each source code file.
The dynamic highlighting effects in the online edition are built on top of jQuery, which is dual-licensed under the MIT and GPL licenses.
The online edition loads as quickly as it does because
Send corrections and feedback to mark@diveintomark.org.
+Send corrections and feedback to mark@diveintomark.org.
© 2001–9 Mark Pilgrim
diff --git a/advanced-iterators.html b/advanced-iterators.html
index 62c2886..dca9dab 100644
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -184,7 +184,7 @@ gen = ord_map(unique_characters)
First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you’re doing. Here I’m talking about combinatorics, but if that doesn’t mean anything to you, don’t worry about it. As always, Wikipedia is your friend.)
+First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you’re doing. Here I’m talking about combinatorics, but if that doesn’t mean anything to you, don’t worry about it. As always, Wikipedia is your friend.)
The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller lists have the same size, which can be as small as 1 and as large as the total number of items. Oh, and nothing can be repeated. Mathematicians say things like “let’s find the permutations of 3 different items taken 2 at a time,” which means you have a sequence of 3 items and you want to find all the possible ordered pairs.
@@ -559,11 +559,11 @@ NameError: name '__import__' is not defined
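As a rough illustration of the permutations paragraph above (not part of the patch itself), the standard library's itertools.permutations() can list the ordered pairs drawn from 3 items. The list name items is just a placeholder chosen for this sketch.

```python
import itertools

# Three distinct items; the name `items` is a placeholder for this sketch.
items = [1, 2, 3]

# Permutations of 3 different items taken 2 at a time:
# every ordered pair, with no item repeated inside a pair.
pairs = list(itertools.permutations(items, 2))

for pair in pairs:
    print(pair)
```

Because order matters, (1, 2) and (2, 1) both appear, which gives 3 × 2 = 6 pairs in total.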
You can use numbers in a boolean context, such as an if statement. Zero values are false, and non-zero values are true.
+
You can use numbers in a boolean context, such as an if statement. Zero values are false, and non-zero values are true.
>>> def is_it_true(anything): ①
...     if anything:
@@ -452,10 +452,10 @@ KeyError: 'db.diveintopython3.org'
yes, it's true
fractions module
-math module
-fractions module
+math module
+© 2001–9 Mark Pilgrim
diff --git a/regular-expressions.html b/regular-expressions.html
index 3e207b8..f6ab98e 100644
--- a/regular-expressions.html
+++ b/regular-expressions.html
@@ -295,7 +295,7 @@ body{counter-reset:h1 4}
>>> phonePattern.search('800-555-1212-1234') ③
>>>
(\d{3}). What’s \d{3}? Well, the {3} means “match exactly three numeric digits”; it’s a variation on the {n,m} syntax you saw earlier. \d means “any numeric digit” (0 through 9). Putting it in parentheses means “match exactly three numeric digits, and then remember them as a group that I can ask for later”. Then match a literal hyphen. Then match another group of exactly three digits. Then another literal hyphen. Then another group of exactly four digits. Then match the end of the string.
+(\d{3}). What’s \d{3}? Well, the {3} means “match exactly three numeric digits”; it’s a variation on the {n,m} syntax you saw earlier. \d means “any numeric digit” (0 through 9). Putting it in parentheses means “match exactly three numeric digits, and then remember them as a group that I can ask for later”. Then match a literal hyphen. Then match another group of exactly three digits. Then another literal hyphen. Then another group of exactly four digits. Then match the end of the string.
groups() method on the object that the search() method returns. It will return a tuple of however many groups were defined in the regular expression. In this case, you defined three groups, one with three digits, one with three digits, and one with four digits.
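A minimal sketch of the behaviour described above, assuming the pattern is the three-group, hyphen-separated one the text walks through. The variable names are chosen for this illustration and differ from the phonePattern name in the chapter.

```python
import re

# Three remembered groups: 3 digits, 3 digits, 4 digits, separated by
# literal hyphens and anchored at both ends of the string.
phone_pattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})$')

match = phone_pattern.search('800-555-1212')
groups = match.groups()          # one tuple item per group
print(groups)

# Trailing extension digits defeat the end-of-string anchor, so search()
# returns None and there is no match object to call groups() on.
no_match = phone_pattern.search('800-555-1212-1234')
print(no_match)
```

Checking for None before calling groups() avoids an AttributeError when the input does not match.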
my_instance.__call__()
-The zipfile module uses this to define a class that can decrypt an encrypted zip file with a given password. The zip decryption algorithm requires you to store state during decryption. Defining the decryptor as a class allows you to maintain this state within a single instance of the decryptor class. The state is initialized in the __init__() method and updated as the file is decrypted. But since the class is also “callable” like a function, you can pass the instance as the first argument of the map() function, like so:
+
The zipfile module uses this to define a class that can decrypt an encrypted zip file with a given password. The zip decryption algorithm requires you to store state during decryption. Defining the decryptor as a class allows you to maintain this state within a single instance of the decryptor class. The state is initialized in the __init__() method and updated as the file is decrypted. But since the class is also “callable” like a function, you can pass the instance as the first argument of the map() function, like so:
# excerpt from zipfile.py
diff --git a/strings.html b/strings.html
index 30adc9c..2ea3aa3 100644
--- a/strings.html
+++ b/strings.html
@@ -19,7 +19,7 @@ My alphabet starts where your alphabet ends! ❞
— Dr
Some Boring Stuff You Need To Understand Before You Can Dive In
-Did you know that the people of Bougainville have the smallest alphabet in the world? Their Rotokas alphabet is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters — 52 if you count uppercase and lowercase separately — plus a handful of !@#$%& punctuation marks.
+
Did you know that the people of Bougainville have the smallest alphabet in the world? Their Rotokas alphabet is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters — 52 if you count uppercase and lowercase separately — plus a handful of !@#$%& punctuation marks.
When people talk about “text,” they’re thinking of “characters and symbols on the computer screen.” But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular character encoding. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages.
@@ -209,7 +209,7 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
>>> "{0:.1f} {1}".format(698.25, 'GB')
'698.3 GB'
-For all the gory details on format specifiers, consult the Format Specification Mini-Language in the official Python documentation.
+For all the gory details on format specifiers, consult the Format Specification Mini-Language in the official Python documentation.
Python 3 assumes that your source code — i.e. each .py file — is encoded in UTF-8.
-☞ In Python 2, the default encoding for .py files was ASCII. In Python 3, the default encoding is UTF-8.
+☞ In Python 2, the default encoding for .py files was ASCII. In Python 3, the default encoding is UTF-8.
If you would like to use a different encoding within your Python code, you can put an encoding declaration on the first line of each file. This declaration defines a .py file to be windows-1252:
@@ -385,40 +385,40 @@ FIXME: move this to the intro of the upcoming files chapter?
#!/usr/bin/python3
# -*- coding: windows-1252 -*-
-For more information, consult PEP 263: Defining Python Source Code Encodings.
+For more information, consult PEP 263: Defining Python Source Code Encodings.
On Unicode in Python:
On Unicode in general:
On character encoding in other formats:
On strings and string formatting:
string — Common string operations
-string — Common string operations
+
In this chapter, you’re going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: roman numerals”. Now step back and consider what it would take to expand that into a two-way utility.
-The rules for Roman numerals lead to a number of interesting observations:
+In this chapter, you’re going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: roman numerals”. Now step back and consider what it would take to expand that into a two-way utility.
+The rules for Roman numerals lead to a number of interesting observations:
unittest.TestCase class provides the assertRaises method, which takes the following arguments: the exception you’re expecting, the function you’re testing, and the arguments you’re passing to that function. (If the function you’re testing takes more than one argument, pass them all to assertRaises, in order, and it will pass them right along to the function you’re testing.)
Pay close attention to this last line of code. Instead of calling to_roman() directly and manually checking that it raises a particular exception (by wrapping it in a try...except block [FIXME xref]), the assertRaises method has encapsulated all of that for us. All you do is tell it what exception you’re expecting (roman2.OutOfRangeError), the function (to_roman()), and the function’s arguments (4000). The assertRaises method takes care of calling to_roman() and checking that it raises roman2.OutOfRangeError.
-
Also note that you’re passing the to_roman() function itself as an argument; you’re not calling it, and you’re not passing the name of it as a string. Have I mentioned recently how handy it is that everything in Python is an object?
+
Also note that you’re passing the to_roman() function itself as an argument; you’re not calling it, and you’re not passing the name of it as a string. Have I mentioned recently how handy it is that everything in Python is an object?
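The assertRaises call described above might look like the sketch below. The to_roman() body and the OutOfRangeError class here are simplified stand-ins written for this illustration; they are not the chapter's roman2 module, and the test class name is hypothetical.

```python
import unittest


class OutOfRangeError(ValueError):
    """Stand-in for roman2.OutOfRangeError (assumed shape)."""


def to_roman(n):
    """Stub converter: only the range check matters for this sketch."""
    if not (0 < n < 4000):
        raise OutOfRangeError('number out of range (must be 1..3999)')
    return 'stub'


class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        # Pass the exception class, the function object itself (not a call,
        # not its name as a string), and then the function's arguments.
        self.assertRaises(OutOfRangeError, to_roman, 4000)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanBadInput)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

assertRaises calls to_roman(4000) itself and fails the test if OutOfRangeError is not raised.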
So what happens when you run the test suite with this new test?
you@localhost:~$ python3 romantest2.py -v
diff --git a/whats-new.html b/whats-new.html
index 897b7fc..ad683c4 100644
--- a/whats-new.html
+++ b/whats-new.html
@@ -34,11 +34,12 @@ h3:before{content:""}
Iterators are everywhere in Python 3, and I understand them a lot better than I did five years ago when I wrote “Dive Into Python”. You need to understand them too, because lots of functions that used to return lists in Python 2 will now return iterators in Python 3. At a minimum, you should read the second half of the Iterators chapter and the second half of the Advanced Iterators chapter.
-
By popular request, I’ve added an appendix on Special Method Names, which is kind of like the Python docs “Data Model” chapter but with more snark.
+
By popular request, I’ve added an appendix on Special Method Names, which is kind of like the Python docs “Data Model” chapter but with more snark.
-
That’s it for now; the book’s not finished yet! The file I/O subsystem is totally different now; I hope to write about that soon. There are much better choices for XML processing now; I hope to write about that, too.
+
When I was writing “Dive Into Python”, all of the available XML libraries sucked. Then Fredrik Lundh wrote ElementTree, which doesn’t suck at all. Then the Python gods wisely incorporated ElementTree into the standard library, and now it forms the basis for my new XML chapter. The old ways of parsing XML are still around, but you should avoid them, because they suck!
+
+
That’s it for now; the book’s not finished yet! The file I/O subsystem is totally different now; I hope to write about that soon.
-
© 2001–9 Mark Pilgrim
diff --git a/xml.html b/xml.html
index a8e94ea..03b5547 100644
--- a/xml.html
+++ b/xml.html
@@ -14,7 +14,7 @@ mark{display:inline}
Difficulty level: ♦♦♦♢♢
XML
-❝ FIXME ❞
— FIXME
+
❝ In the archonship of Aristaechmus, Draco enacted his ordinances. ❞
— Aristotle
Diving In
@@ -270,39 +270,67 @@ mark{display:inline}
# continued from the previous example
->>> root.tag
+>>> root.tag ①
'{http://www.w3.org/2005/Atom}feed'
->>> len(root)
-9
->>> for child in root:
-... print(child)
+>>> len(root) ②
+8
+>>> for child in root: ③
+... print(child) ④
...
<Element {http://www.w3.org/2005/Atom}title at e2b5d0>
<Element {http://www.w3.org/2005/Atom}subtitle at e2b4e0>
<Element {http://www.w3.org/2005/Atom}id at e2b6c0>
<Element {http://www.w3.org/2005/Atom}updated at e2b6f0>
-<Element {http://www.w3.org/2005/Atom}link at e181b0>
<Element {http://www.w3.org/2005/Atom}link at e2b4b0>
<Element {http://www.w3.org/2005/Atom}entry at e2b720>
<Element {http://www.w3.org/2005/Atom}entry at e2b510>
<Element {http://www.w3.org/2005/Atom}entry at e2b750>
+
+- Continuing from the previous example, the root element is
{http://www.w3.org/2005/Atom}feed.
+ - The “length” of the root element is the number of child elements.
+
- You can use the element itself as an iterator to loop through all of its child elements.
+
- As you can see from the output, there are indeed 8 child elements: all of the feed-level metadata (
title, subtitle, id, updated, and link) followed by the three entry elements.
+
+
+You may have guessed this already, but I want to point it out explicitly: the list of child elements only includes direct children. Each of the entry elements contains its own children, but those are not included in the list. They would be included in the list of each entry’s children, but they are not included in the list of the feed’s children. There are ways to find elements no matter how deeply nested they are; we’ll look at two such ways later in this chapter.
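To make the direct-children point concrete, here is a small sketch using xml.etree.ElementTree on a cut-down Atom document. The inline document is invented for this illustration and is much shorter than the chapter's feed.xml, so the counts and indexes differ from the ones shown above.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'

# A made-up, shortened feed: one feed-level title and two entries.
doc = '''<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>dive into mark</title>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>'''

root = etree.fromstring(doc)

# len() and iteration only see the root's direct children.
direct = [child.tag for child in root]
print(len(root), direct)

# Each entry's own title shows up only among that entry's children.
entry_children = [child.tag for child in root[1]]
print(entry_children)
```

The two nested title elements are not counted in len(root); they appear only when you look at each entry element in turn.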
Attributes Are Dictionaries
-FIXME
+
XML isn’t just a collection of elements; each element can also have its own set of attributes. Once you have a reference to a specific element, you can easily get its attributes as a Python dictionary.
+
To refresh your memory, here are the first few lines of feed.xml, the XML document we’re working with.
+
+
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
+ <title>dive into mark</title>
+ <subtitle>currently between addictions</subtitle>
+ <id>tag:diveintomark.org,2001-07-29:/</id>
+ <updated>2009-03-27T21:56:07Z</updated>
+ <link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
+ <link rel="self" type="application/atom+xml" href="http://diveintomark.org/feed/"/>
+...
->>> root.attrib
-{'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
->>> root[4]
-<Element {http://www.w3.org/2005/Atom}link at e181b0>
->>> root[4].attrib
-{'href': 'http://diveintomark.org/', 'type': 'text/html', 'rel': 'alternate'}
->>> root[3]
-<Element {http://www.w3.org/2005/Atom}updated at e2b4e0>
->>> root[3].attrib
-{}
-
+<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
+FIXME
+# continuing from the previous example
+>>> root.attrib ①
+{'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
+>>> root[4] ②
+<Element {http://www.w3.org/2005/Atom}link at e181b0>
+>>> root[4].attrib ③
+{'href': 'http://diveintomark.org/',
+ 'type': 'text/html',
+ 'rel': 'alternate'}
+>>> root[3] ④
+<Element {http://www.w3.org/2005/Atom}updated at e2b4e0>
+>>> root[3].attrib ⑤
+{}
+The attrib property is a dictionary of the element’s attributes. The original markup here was <feed xmlns
+-
+
-
+
-
+
-
+
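A sketch of the attrib behaviour shown in the session above, run against a shortened, invented stand-in for feed.xml. The element indexes therefore differ from the root[3] and root[4] used in the chapter.

```python
import xml.etree.ElementTree as etree

# A made-up, shortened feed with one attribute-less element and one link.
doc = '''<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <updated>2009-03-27T21:56:07Z</updated>
  <link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
</feed>'''

root = etree.fromstring(doc)

# xml:lang is reported under the XML namespace; the xmlns declaration
# itself does not show up as an attribute.
feed_attrs = root.attrib
print(feed_attrs)

# An element without attributes yields an empty dictionary.
updated_attrs = root[0].attrib
print(updated_attrs)

# The link element's three attributes come back as ordinary dict items.
link_attrs = root[1].attrib
print(link_attrs['rel'], link_attrs['type'], link_attrs['href'])
```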