syntax highlighting for everyone!

Mark Pilgrim
2009-06-08 12:44:13 -04:00
parent 672132a1d3
commit ae146df0d9
27 changed files with 2621 additions and 1151 deletions
+1 -1
@@ -11,7 +11,7 @@ h1:before{content:''}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8><input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html>Dive Into Python 3</a> <span class=u>&#8227;</span>
<h1>About The Book</h1>
<p>The text of <cite>Dive Into Python 3</cite> is licensed under the <a href=http://creativecommons.org/licenses/by-sa/3.0/ rel=license>Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.
<p>The <code>chardet</code> library referenced in <a href=case-study-porting-chardet-to-python-3.html>Case study: porting <code>chardet</code> to Python 3</a> is licensed under the LGPL 2.1 or later. The alphametics solver referenced in <a href=advanced-iterators.html>Advanced Iterators</a> is based on <a href=http://code.activestate.com/recipes/576615/>Raymond Hettinger's solver for Python 2</a>, which he has graciously relicensed under the MIT license so I could port it to Python 3. <a href=advanced-classes.html>Advanced Classes</a> and <a href=special-method-names.html>Special Method Names</a> contain snippets of code from the Python standard library which are released under the Python Software Foundation License version 2. All other example code is my original work and is licensed under the MIT license. Full licensing terms are included in each source code file.
+6 -5
@@ -12,11 +12,11 @@ body{counter-reset:h1 11}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#advanced-classes>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#advanced-classes>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
<h1>Advanced Classes</h1>
<blockquote class=q>
<p><span>&#x275D;</span> FIXME <span>&#x275E;</span><br>&mdash; FIXME
<p><span class=u>&#x275D;</span> FIXME <span class=u>&#x275E;</span><br>&mdash; FIXME
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -27,7 +27,7 @@ body{counter-reset:h1 11}
<p>[FIXME here's why ordered dicts are useful: http://www.gossamer-threads.com/lists/python/dev/656556 ]
<p class=d>[<a href=examples/ordereddict.py>download <code>ordereddict.py</code></a>]
<pre><code>import collections
<pre><code class=pp>import collections
import itertools
class OrderedDict(dict, collections.MutableMapping):
@@ -107,7 +107,7 @@ class OrderedDict(dict, collections.MutableMapping):
<pre class=screen>
<samp class=p>>>> </samp><kbd>import ordereddict</kbd>
<samp class=p>>>> </samp><kbd>od = ordereddict.OrderedDict()</kbd>
<a><samp class=p>>>> </samp><kbd>klass = od.__class__</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>klass = od.__class__</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>type(klass)</kbd>
<samp>&lt;class 'abc.ABCMeta'></samp>
<samp class=p>>>> </samp><kbd>klass.__name__</kbd>
@@ -163,7 +163,8 @@ class OrderedDict(dict, collections.MutableMapping):
<h2 id=implementing-fractions>Implementing Fractions</h2>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+66 -65
@@ -12,11 +12,11 @@ body{counter-reset:h1 7}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#advanced-iterators>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#advanced-iterators>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
<h1>Advanced Iterators</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Great fleas have little fleas upon their backs to bite &#8217;em,<br>And little fleas have lesser fleas, and so ad infinitum. <span>&#x275E;</span><br>&mdash; Augustus De Morgan
<p><span class=u>&#x275D;</span> Great fleas have little fleas upon their backs to bite &#8217;em,<br>And little fleas have lesser fleas, and so ad infinitum. <span class=u>&#x275E;</span><br>&mdash; Augustus De Morgan
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -44,7 +44,7 @@ E = 4</code></pre>
<p>In this chapter, we&#8217;ll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles <em>in just 14 lines of code</em>.
<p class=d>[<a href=examples/alphametics.py>download <code>alphametics.py</code></a>]
<pre><code>import re
<pre><code class=pp>import re
import itertools
def solve(puzzle):
@@ -91,9 +91,9 @@ if __name__ == '__main__':
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>re.findall('[0-9]+', '16 2-by-4s in rows of 8')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.findall('[0-9]+', '16 2-by-4s in rows of 8')</kbd> <span class=u>&#x2460;</span></a>
<samp>['16', '2', '4', '8']</samp>
<a><samp class=p>>>> </samp><kbd>re.findall('[A-Z]+', 'SEND + MORE == MONEY')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.findall('[A-Z]+', 'SEND + MORE == MONEY')</kbd> <span class=u>&#x2461;</span></a>
<samp>['SEND', 'MORE', 'MONEY']</samp></pre>
<ol>
<li>The <code>re</code> module is Python&#8217;s implementation of <a href=regular-expressions.html>regular expressions</a>. It has a nifty function called <code>findall()</code> which takes a regular expression pattern and a string, and finds all occurrences of the pattern within the string. In this case, the pattern matches sequences of numbers. The <code>findall()</code> function returns a list of all the substrings that matched the pattern.
@@ -108,15 +108,15 @@ if __name__ == '__main__':
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_list = ['a', 'c', 'b', 'a', 'd', 'b']</kbd>
<a><samp class=p>>>> </samp><kbd>{c for c in a_list}</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>{c for c in a_list}</kbd> <span class=u>&#x2460;</span></a>
<samp>{'a', 'c', 'b', 'd'}</samp>
<samp class=p>>>> </samp><kbd>a_string = 'EAST IS EAST'</kbd>
<a><samp class=p>>>> </samp><kbd>{c for c in a_string}</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>{c for c in a_string}</kbd> <span class=u>&#x2461;</span></a>
<samp>{'A', ' ', 'E', 'I', 'S', 'T'}</samp>
<samp class=p>>>> </samp><kbd>words = ['SEND', 'MORE', 'MONEY']</kbd>
<a><samp class=p>>>> </samp><kbd>''.join(words)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>''.join(words)</kbd> <span class=u>&#x2462;</span></a>
<samp>'SENDMOREMONEY'</samp>
<a><samp class=p>>>> </samp><kbd>{c for c in ''.join(words)}</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>{c for c in ''.join(words)}</kbd> <span class=u>&#x2463;</span></a>
<samp>{'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}</samp></pre>
<ol>
<li>Given a list of several strings, a set comprehension with the identity function will return a set of unique strings from the list. This makes sense if you think of it like a <code>for</code> loop. Take the first item from the list, put it in the set. Second. Third. Fourth &mdash; wait, that&#8217;s in the set already, so it only gets listed once. Fifth. Sixth &mdash; again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn&#8217;t even need to be sorted first.
@@ -127,7 +127,7 @@ if __name__ == '__main__':
<p>The alphametics solver uses this technique to build a set of all the unique characters in the puzzle.
<pre><code>unique_characters = {c for c in ''.join(words)}</code></pre>
<pre><code class=pp>unique_characters = {c for c in ''.join(words)}</code></pre>
<p>This set is later used to assign digits to characters as the solver iterates through the possible solutions.
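The step from puzzle string to unique characters can be sketched end to end. This is a minimal illustration, not the solver's actual code: the helper name `unique_letters` is hypothetical, and it simply combines the `re.findall()` call and the set comprehension shown in the hunks above.

```python
import re

def unique_letters(puzzle):
    """Return the set of distinct uppercase letters in a puzzle string.

    Hypothetical helper for illustration; not part of alphametics.py.
    """
    # Pull out the words, ignoring '+', '==' and whitespace.
    words = re.findall('[A-Z]+', puzzle.upper())
    # A set comprehension keeps each character only once.
    return {c for c in ''.join(words)}

letters = unique_letters('SEND + MORE == MONEY')
# Sets are unordered, so sort before printing for a stable display.
print(sorted(letters))
print(len(letters))
```

Because the result is a set, its iteration order is not guaranteed; anything that depends on order should sort it first.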
@@ -138,8 +138,8 @@ if __name__ == '__main__':
<p>Like many programming languages, Python has an <code>assert</code> statement. Here&#8217;s how it works.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>assert 1 + 1 == 2</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>assert 1 + 1 == 3</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>assert 1 + 1 == 2</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>assert 1 + 1 == 3</kbd> <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
AssertionError</samp></pre>
@@ -150,11 +150,11 @@ AssertionError</samp></pre>
<p>Therefore, this line of code:
<pre><code>assert len(unique_characters) <= 10</code></pre>
<pre><code class=pp>assert len(unique_characters) <= 10</code></pre>
<p>&hellip;is equivalent to&hellip;
<pre><code>if len(unique_characters) > 10:
<pre><code class=pp>if len(unique_characters) > 10:
raise AssertionError</code></pre>
<p>But a bit easier to read and write.
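The equivalence described above can be checked side by side. The two function names and the message text `'Too many letters'` are assumptions added for illustration; the source lines show the bare `assert` only.

```python
def check_with_assert(unique_characters):
    # The optional second expression becomes the exception's message.
    assert len(unique_characters) <= 10, 'Too many letters'

def check_with_if(unique_characters):
    # The long-hand form of the same check.
    if len(unique_characters) > 10:
        raise AssertionError('Too many letters')

for check in (check_with_assert, check_with_if):
    try:
        check(set('ABCDEFGHIJK'))  # 11 distinct letters
    except AssertionError as exc:
        print(check.__name__, 'raised AssertionError:', exc)
```

One difference worth knowing: when Python runs with the `-O` option, `assert` statements are skipped entirely, while the explicit `if` version still runs.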
@@ -169,14 +169,14 @@ AssertionError</samp></pre>
<pre class=screen>
<samp>>>> </samp><kbd>unique_characters = {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}</kbd>
<a><samp>>>> </samp><kbd>gen = (ord(c) for c in unique_characters)</kbd> <span>&#x2460;</span></a>
<a><samp>>>> </samp><kbd>gen</kbd> <span>&#x2461;</span></a>
<a><samp>>>> </samp><kbd>gen = (ord(c) for c in unique_characters)</kbd> <span class=u>&#x2460;</span></a>
<a><samp>>>> </samp><kbd>gen</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;generator object &lt;genexpr> at 0x00BADC10></samp>
<a><samp>>>> </samp><kbd>next(gen)</kbd> <span>&#x2462;</span></a>
<a><samp>>>> </samp><kbd>next(gen)</kbd> <span class=u>&#x2462;</span></a>
<samp>69</samp>
<samp>>>> </samp><kbd>next(gen)</kbd>
<samp>68</samp>
<a><samp>>>> </samp><kbd>tuple(ord(c) for c in unique_characters)</kbd> <span>&#x2463;</span></a>
<a><samp>>>> </samp><kbd>tuple(ord(c) for c in unique_characters)</kbd> <span class=u>&#x2463;</span></a>
<samp>(69, 68, 77, 79, 78, 83, 82, 89)</samp></pre>
<ol>
<li>A generator expression is like an anonymous function that yields values. The expression itself looks like a list comprehension [FIXME have we introduced this yet?], but it&#8217;s wrapped in parentheses instead of square brackets.
@@ -187,7 +187,7 @@ AssertionError</samp></pre>
<p>Here&#8217;s another way to accomplish the same thing, using a <a href=generators.html>generator function</a>:
<pre><code>def ord_map(a_string):
<pre><code class=pp>def ord_map(a_string):
for c in a_string:
yield ord(c)
@@ -202,21 +202,21 @@ gen = ord_map(unique_characters)</code></pre>
<p>The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller lists have the same size, which can be as small as 1 and as large as the total number of items. Oh, and nothing can be repeated. Mathematicians say things like &#8220;let&#8217;s find the permutations of 3 different items taken 2 at a time,&#8221; which means you have a sequence of 3 items and you want to find all the possible ordered pairs.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>import itertools</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>perms = itertools.permutations([1, 2, 3], 2)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>next(perms)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>import itertools</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>perms = itertools.permutations([1, 2, 3], 2)</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>next(perms)</kbd> <span class=u>&#x2462;</span></a>
<samp>(1, 2)</samp>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<samp>(1, 3)</samp>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<a><samp>(2, 1)</samp> <span>&#x2463;</span></a>
<a><samp>(2, 1)</samp> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<samp>(2, 3)</samp>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<samp>(3, 1)</samp>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<samp>(3, 2)</samp>
<a><samp class=p>>>> </samp><kbd>next(perms)</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>next(perms)</kbd> <span class=u>&#x2464;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
StopIteration</samp></pre>
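As a rough cross-check of the session above, the number of permutations can be computed directly and compared with what `itertools.permutations()` produces. The helper `count_permutations` is a hypothetical name introduced here; the formula n! / (n - r)! is the standard count of ordered selections without repetition.

```python
import itertools
from math import factorial

def count_permutations(n, r):
    """Number of ordered selections of r items from n distinct items."""
    return factorial(n) // factorial(n - r)

# A for loop (or list()) consumes the iterator and handles
# StopIteration for you, so no explicit next() calls are needed.
pairs = list(itertools.permutations([1, 2, 3], 2))
print(pairs)
print(len(pairs) == count_permutations(3, 2))

# Size of the search space when 8 letters are assigned distinct digits.
print(count_permutations(10, 8))
```

The last figure suggests why iterating lazily matters: a puzzle with eight distinct letters has 1,814,400 candidate digit assignments, and the iterator yields them one at a time instead of building them all up front.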
@@ -232,9 +232,9 @@ StopIteration</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>import itertools</kbd>
<a><samp class=p>>>> </samp><kbd>perms = itertools.permutations('ABC', 3)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>perms = itertools.permutations('ABC', 3)</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<a><samp>('A', 'B', 'C')</samp> <span>&#x2461;</span></a>
<a><samp>('A', 'B', 'C')</samp> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
<samp>('A', 'C', 'B')</samp>
<samp class=p>>>> </samp><kbd>next(perms)</kbd>
@@ -249,7 +249,7 @@ StopIteration</samp></pre>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
StopIteration</samp>
<a><samp class=p>>>> </samp><kbd>list(itertools.permutations('ABC', 3))</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>list(itertools.permutations('ABC', 3))</kbd> <span class=u>&#x2462;</span></a>
<samp>[('A', 'B', 'C'), ('A', 'C', 'B'),
('B', 'A', 'C'), ('B', 'C', 'A'),
('C', 'A', 'B'), ('C', 'B', 'A')]</samp></pre>
@@ -264,11 +264,11 @@ StopIteration</samp>
<h2 id=more-itertools>Other Fun Stuff in the <code>itertools</code> Module</h2>
<pre class=screen>
<samp class=p>>>> </samp><kbd>import itertools</kbd>
<a><samp class=p>>>> </samp><kbd>list(itertools.product('ABC', '123'))</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>list(itertools.product('ABC', '123'))</kbd> <span class=u>&#x2460;</span></a>
<samp>[('A', '1'), ('A', '2'), ('A', '3'),
('B', '1'), ('B', '2'), ('B', '3'),
('C', '1'), ('C', '2'), ('C', '3')]</samp>
<a><samp class=p>>>> </samp><kbd>list(itertools.combinations('ABC', 2))</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>list(itertools.combinations('ABC', 2))</kbd> <span class=u>&#x2461;</span></a>
<samp>[('A', 'B'), ('A', 'C'), ('B', 'C')]</samp></pre>
<ol>
<li>The <code>itertools.product()</code> function returns an iterator containing the Cartesian product of two sequences.
@@ -277,19 +277,19 @@ StopIteration</samp>
<p class=d>[<a href=examples/favorite-people.txt>download <code>favorite-people.txt</code></a>]
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>names = list(open('examples/favorite-people.txt'))</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>names = list(open('examples/favorite-people.txt'))</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>names</kbd>
<samp>['Dora\n', 'Ethan\n', 'Wesley\n', 'John\n', 'Anne\n',
'Mike\n', 'Chris\n', 'Sarah\n', 'Alex\n', 'Lizzie\n']</samp>
<a><samp class=p>>>> </samp><kbd>names = [name.rstrip() for name in names]</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>names = [name.rstrip() for name in names]</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>names</kbd>
<samp>['Dora', 'Ethan', 'Wesley', 'John', 'Anne',
'Mike', 'Chris', 'Sarah', 'Alex', 'Lizzie']</samp>
<a><samp class=p>>>> </samp><kbd>names = sorted(names)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>names = sorted(names)</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>names</kbd>
<samp>['Alex', 'Anne', 'Chris', 'Dora', 'Ethan',
'John', 'Lizzie', 'Mike', 'Sarah', 'Wesley']</samp>
<a><samp class=p>>>> </samp><kbd>names = sorted(names, key=len)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>names = sorted(names, key=len)</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>names</kbd>
<samp>['Alex', 'Anne', 'Dora', 'John', 'Mike',
'Chris', 'Ethan', 'Sarah', 'Lizzie', 'Wesley']</samp></pre>
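The final ordering above relies on `sorted()` being stable: names of equal length keep the alphabetical order from the earlier sort. A small sketch, using the same names as the session, shows the two-pass sort alongside a single-pass alternative with a tuple key (the tuple key is an addition for comparison, not something the source uses).

```python
raw = ['Dora\n', 'Ethan\n', 'Wesley\n', 'John\n', 'Anne\n',
       'Mike\n', 'Chris\n', 'Sarah\n', 'Alex\n', 'Lizzie\n']
names = [name.rstrip() for name in raw]

# Two passes: alphabetical first, then by length. Because the sort is
# stable, ties on length preserve the alphabetical order.
two_pass = sorted(sorted(names), key=len)

# One pass: sort by (length, name) tuples.
one_pass = sorted(names, key=lambda name: (len(name), name))

print(two_pass)
print(two_pass == one_pass)
```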
@@ -305,7 +305,7 @@ StopIteration</samp>
<pre class=screen>
<p>&hellip;continuing from the previous interactive shell&hellip;
<samp class=p>>>> </samp><kbd>import itertools</kbd>
<a><samp class=p>>>> </samp><kbd>groups = itertools.groupby(names, len)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>groups = itertools.groupby(names, len)</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>groups</kbd>
<samp>&lt;itertools.groupby object at 0x00BB20C0></samp>
<samp class=p>>>> </samp><kbd>list(groups)</kbd>
@@ -313,7 +313,7 @@ StopIteration</samp>
(5, &lt;itertools._grouper object at 0x00BB4050>),
(6, &lt;itertools._grouper object at 0x00BB4030>)]</samp>
<samp class=p>>>> </samp><kbd>groups = itertools.groupby(names, len)</kbd>
<a><samp class=p>>>> </samp><kbd>for name_length, name_iter in groups:</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>for name_length, name_iter in groups:</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>... </samp><kbd> print('Names with {0:d} letters:'.format(name_length))</kbd>
<samp class=p>... </samp><kbd> for name in name_iter:</kbd>
<samp class=p>... </samp><kbd> print(name)</kbd>
@@ -342,13 +342,13 @@ Wesley</samp></pre>
<samp>[0, 1, 2]</samp>
<samp class=p>>>> </samp><kbd>list(range(10, 13))</kbd>
<samp>[10, 11, 12]</samp>
<a><samp class=p>>>> </samp><kbd>list(itertools.chain(range(0, 3), range(10, 13)))</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>list(itertools.chain(range(0, 3), range(10, 13)))</kbd> <span class=u>&#x2460;</span></a>
<samp>[0, 1, 2, 10, 11, 12]</samp>
<a><samp class=p>>>> </samp><kbd>list(zip(range(0, 3), range(10, 13)))</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>list(zip(range(0, 3), range(10, 13)))</kbd> <span class=u>&#x2461;</span></a>
<samp>[(0, 10), (1, 11), (2, 12)]</samp>
<a><samp class=p>>>> </samp><kbd>list(zip(range(0, 3), range(10, 14)))</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>list(zip(range(0, 3), range(10, 14)))</kbd> <span class=u>&#x2462;</span></a>
<samp>[(0, 10), (1, 11), (2, 12)]</samp>
<a><samp class=p>>>> </samp><kbd>list(itertools.zip_longest(range(0, 3), range(10, 14)))</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>list(itertools.zip_longest(range(0, 3), range(10, 14)))</kbd> <span class=u>&#x2463;</span></a>
<samp>[(0, 10), (1, 11), (2, 12), (None, 13)]</samp></pre>
<ol>
<li>The <code>itertools.chain()</code> function takes two iterators and returns an iterator that contains all the items from the first iterator, followed by all the items from the second iterator. (Actually, it can take any number of iterators, and it chains them all in the order they were passed to the function.)
@@ -362,10 +362,10 @@ Wesley</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')</kbd>
<samp class=p>>>> </samp><kbd>guess = ('1', '2', '0', '3', '4', '5', '6', '7')</kbd>
<a><samp class=p>>>> </samp><kbd>tuple(zip(characters, guess))</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tuple(zip(characters, guess))</kbd> <span class=u>&#x2460;</span></a>
<samp>(('S', '1'), ('M', '2'), ('E', '0'), ('D', '3'),
('O', '4'), ('N', '5'), ('R', '6'), ('Y', '7'))</samp>
<a><samp class=p>>>> </samp><kbd>dict(zip(characters, guess))</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>dict(zip(characters, guess))</kbd> <span class=u>&#x2461;</span></a>
<samp>{'E': '0', 'D': '3', 'M': '2', 'O': '4',
'N': '5', 'S': '1', 'R': '6', 'Y': '7'}</samp></pre>
<ol>
@@ -375,7 +375,7 @@ Wesley</samp></pre>
<p id=guess>The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution.
<pre><code>characters = tuple(ord(c) for c in sorted_characters)
<pre><code class=pp>characters = tuple(ord(c) for c in sorted_characters)
digits = tuple(ord(c) for c in '0123456789')
...
for guess in itertools.permutations(digits, len(characters)):
@@ -391,10 +391,10 @@ for guess in itertools.permutations(digits, len(characters)):
<p>Python strings have many methods. You learned about some of those methods in <a href=strings.html>the Strings chapter</a>: <code>lower()</code>, <code>count()</code>, and <code>format()</code>. Now I want to introduce you to a powerful but little-known string manipulation technique: the <code>translate()</code> method.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>translation_table = {ord('A'): ord('O')}</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>translation_table</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>translation_table = {ord('A'): ord('O')}</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>translation_table</kbd> <span class=u>&#x2461;</span></a>
<samp>{65: 79}</samp>
<a><samp class=p>>>> </samp><kbd>'MARK'.translate(translation_table)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>'MARK'.translate(translation_table)</kbd> <span class=u>&#x2462;</span></a>
<samp>'MORK'</samp></pre>
<ol>
<li>String translation starts with a translation table, which is just a dictionary that maps one character to another. Strictly speaking, the table maps integers to integers: each key and value here is a Unicode code point, as returned by <code>ord()</code>, not a one-character string.
@@ -405,16 +405,16 @@ for guess in itertools.permutations(digits, len(characters)):
<p>What does this have to do with solving alphametic puzzles? As it turns out, everything.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>characters = tuple(ord(c) for c in 'SMEDONRY')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>characters = tuple(ord(c) for c in 'SMEDONRY')</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>characters</kbd>
<samp>(83, 77, 69, 68, 79, 78, 82, 89)</samp>
<a><samp class=p>>>> </samp><kbd>guess = tuple(ord(c) for c in '91570682')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>guess = tuple(ord(c) for c in '91570682')</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>guess</kbd>
<samp>(57, 49, 53, 55, 48, 54, 56, 50)</samp>
<a><samp class=p>>>> </samp><kbd>translation_table = dict(zip(characters, guess))</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>translation_table = dict(zip(characters, guess))</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>translation_table</kbd>
<samp>{68: 55, 69: 53, 77: 49, 78: 54, 79: 48, 82: 56, 83: 57, 89: 50}</samp>
<a><samp class=p>>>> </samp><kbd>'SEND + MORE == MONEY'.translate(translation_table)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>'SEND + MORE == MONEY'.translate(translation_table)</kbd> <span class=u>&#x2463;</span></a>
<samp>'9567 + 1085 == 10652'</samp></pre>
<ol>
<li>Using a <a href=#generator-expressions>generator expression</a>, we quickly compute the code point values for each character in a string. <var>characters</var> is an example of the value of <var>sorted_characters</var> in the <code>alphametics.solve()</code> function.
@@ -455,12 +455,12 @@ for guess in itertools.permutations(digits, len(characters)):
<pre class=screen>
<samp class=p>>>> </samp><kbd>x = 5</kbd>
<a><samp class=p>>>> </samp><kbd>eval("x * 5")</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("x * 5")</kbd> <span class=u>&#x2460;</span></a>
<samp>25</samp>
<a><samp class=p>>>> </samp><kbd>eval("pow(x, 2)")</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("pow(x, 2)")</kbd> <span class=u>&#x2461;</span></a>
<samp>25</samp>
<samp class=p>>>> </samp><kbd>import math</kbd>
<a><samp class=p>>>> </samp><kbd>eval("math.sqrt(x)")</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("math.sqrt(x)")</kbd> <span class=u>&#x2462;</span></a>
<samp>2.2360679774997898</samp></pre>
<ol>
<li>The expression that <code>eval()</code> takes can reference global variables defined outside the <code>eval()</code>. If called within a function, it can reference local variables too.
@@ -472,11 +472,11 @@ for guess in itertools.permutations(digits, len(characters)):
<pre class=screen>
<samp class=p>>>> </samp><kbd>import subprocess</kbd>
<a><samp class=p>>>> </samp><kbd>eval("subprocess.getoutput('ls ~')")</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("subprocess.getoutput('ls ~')")</kbd> <span class=u>&#x2460;</span></a>
<samp>'Desktop Library Pictures \
Documents Movies Public \
Music Sites'</samp>
<a><samp class=p>>>> </samp><kbd>eval("subprocess.getoutput('rm -rf /')")</kbd> <span>&#x2461;</span></a></pre>
<a><samp class=p>>>> </samp><kbd>eval("subprocess.getoutput('rm -rf /')")</kbd> <span class=u>&#x2461;</span></a></pre>
<ol>
<li>The <code>subprocess</code> module allows you to run arbitrary shell commands and get the result as a Python string.
<li>Don&#8217;t do this.
@@ -485,7 +485,7 @@ for guess in itertools.permutations(digits, len(characters)):
<p>It&#8217;s even worse than that, because there&#8217;s a global <code>__import__()</code> function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of <code>eval()</code>, you can construct a single expression that will wipe out all your files:
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>eval("__import__('subprocess').getoutput('rm -rf /')")</kbd> <span>&#x2460;</span></a></pre>
<a><samp class=p>>>> </samp><kbd>eval("__import__('subprocess').getoutput('rm -rf /')")</kbd> <span class=u>&#x2460;</span></a></pre>
<ol>
<li>Don&#8217;t do this either.
</ol>
@@ -498,14 +498,14 @@ for guess in itertools.permutations(digits, len(characters)):
<pre class=screen>
<samp class=p>>>> </samp><kbd>x = 5</kbd>
<a><samp class=p>>>> </samp><kbd>eval("x * 5", {}, {})</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("x * 5", {}, {})</kbd> <span class=u>&#x2460;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
File "&lt;string>", line 1, in &lt;module>
NameError: name 'x' is not defined</samp>
<a><samp class=p>>>> </samp><kbd>eval("x * 5", {"x": x}, {})</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("x * 5", {"x": x}, {})</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>import math</kbd>
<a><samp class=p>>>> </samp><kbd>eval("math.sqrt(x)", {"x": x}, {})</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("math.sqrt(x)", {"x": x}, {})</kbd> <span class=u>&#x2462;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
File "&lt;string>", line 1, in &lt;module>
@@ -519,9 +519,9 @@ NameError: name 'math' is not defined</samp></pre>
<p>Gee, that was easy. Lemme make an alphametics web service now!
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>eval("pow(5, 2)", {}, {})</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("pow(5, 2)", {}, {})</kbd> <span class=u>&#x2460;</span></a>
<samp>25</samp>
<a><samp class=p>>>> </samp><kbd>eval("__import__('math').sqrt(5)", {}, {})</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>eval("__import__('math').sqrt(5)", {}, {})</kbd> <span class=u>&#x2461;</span></a>
<samp>2.2360679774997898</samp></pre>
<ol>
<li>Even though you&#8217;ve passed empty dictionaries for the global and local namespaces, all of Python&#8217;s built-in functions are still available during evaluation. So <code>pow(5, 2)</code> works, because <code>5</code> and <code>2</code> are literals, and <code>pow()</code> is a built-in function.
@@ -531,7 +531,7 @@ NameError: name 'math' is not defined</samp></pre>
<p>Yeah, that means you can still do nasty things, even if you explicitly set the global and local namespaces to empty dictionaries when calling <code>eval()</code>:
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>eval("__import__('subprocess').getoutput('rm -rf /')", {}, {})</kbd> <span>&#x2460;</span></a></pre>
<a><samp class=p>>>> </samp><kbd>eval("__import__('subprocess').getoutput('rm -rf /')", {}, {})</kbd> <span class=u>&#x2460;</span></a></pre>
<ol>
<li>Please don&#8217;t do this.
</ol>
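The point that empty namespaces still expose the built-in functions can be checked with a harmless expression. This is a sketch under stated assumptions: `try_eval` is a hypothetical helper, and the second call uses an empty dictionary for `"__builtins__"`, which is my variation for illustration rather than something taken from the text.

```python
def try_eval(expression, global_namespace):
    """Evaluate an expression and report a NameError instead of raising it.

    Hypothetical helper for illustration only.
    """
    try:
        return eval(expression, global_namespace, {})
    except NameError as exc:
        return 'NameError: {0}'.format(exc)

harmless = "__import__('math').sqrt(16)"

# Empty globals: Python quietly supplies the built-ins, so the import works.
print(try_eval(harmless, {}))

# Globals with an empty "__builtins__" mapping: __import__ cannot be found.
print(try_eval(harmless, {'__builtins__': {}}))
```

Blocking the built-ins this way closes the most obvious door, but it should not be read as a complete sandbox for untrusted input.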
@@ -540,13 +540,13 @@ NameError: name 'math' is not defined</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>eval("__import__('math').sqrt(5)",</kbd>
<a><samp class=p>... </samp><kbd> {"__builtins__":None}, {})</kbd> <span>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> {"__builtins__":None}, {})</kbd> <span class=u>&#x2460;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
File "&lt;string>", line 1, in &lt;module>
NameError: name '__import__' is not defined</samp>
<samp class=p>>>> </samp><kbd>eval("__import__('subprocess').getoutput('rm -rf /')",</kbd>
<a><samp class=p>... </samp><kbd> {"__builtins__":None}, {})</kbd> <span>&#x2461;</span></a>
<a><samp class=p>... </samp><kbd> {"__builtins__":None}, {})</kbd> <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
File "&lt;string>", line 1, in &lt;module>
@@ -591,9 +591,10 @@ NameError: name '__import__' is not defined</samp></pre>
<p>Many, many thanks to Raymond Hettinger for agreeing to relicense his code so I could port it to Python 3 and use it as the basis for this chapter.
<p class=v><a href=iterators.html rel=prev title='back to &#8220;Iterators&#8221;'><span>&#x261C;</span></a> <a href=unit-testing.html rel=next title='onward to &#8220;Unit Testing&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=iterators.html rel=prev title='back to &#8220;Iterators&#8221;'><span class=u>&#x261C;</span></a> <a href=unit-testing.html rel=next title='onward to &#8220;Unit Testing&#8221;'><span class=u>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+42 -42
@@ -15,11 +15,11 @@ del{background:#f87}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#case-study-porting-chardet-to-python-3>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#case-study-porting-chardet-to-python-3>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=pro>&#x2666;&#x2666;&#x2666;&#x2666;&#x2666;</span>
<h1>Case Study: Porting <code>chardet</code> to Python 3</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Words, words. They&#8217;re all we have to go on. <span>&#x275E;</span><br>&mdash; <a href=http://www.imdb.com/title/tt0100519/quotes>Rosencrantz and Guildenstern are Dead</a>
<p><span class=u>&#x275D;</span> Words, words. They&#8217;re all we have to go on. <span class=u>&#x275E;</span><br>&mdash; <a href=http://www.imdb.com/title/tt0100519/quotes>Rosencrantz and Guildenstern are Dead</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -593,7 +593,7 @@ RefactoringTool: test.py</samp></pre>
^
SyntaxError: invalid syntax</samp></pre>
<p>Hmm, a small snag. In Python 3, <code>False</code> is a reserved word, so you can&#8217;t use it as a variable name. Let&#8217;s look at <code>constants.py</code> to see where it&#8217;s defined. Here&#8217;s the original version from <code>constants.py</code>, before the <code>2to3</code> script changed it:
<pre><code>import __builtin__
<pre><code class=pp>import __builtin__
if not hasattr(__builtin__, 'False'):
False = 0
True = 1
@@ -603,9 +603,9 @@ else:
<p>This piece of code is designed to allow this library to run under older versions of Python 2. Prior to Python 2.3 [FIXME-LINK], Python had no built-in <code>bool</code> type. This code detects the absence of the built-in constants <code>True</code> and <code>False</code>, and defines them if necessary.
<p>However, Python 3 will always have a <code>bool</code> type, so this entire code snippet is unnecessary. The simplest solution is to replace all instances of <code>constants.True</code> and <code>constants.False</code> with <code>True</code> and <code>False</code>, respectively, then delete this dead code from <code>constants.py</code>.
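<p>As a quick sanity check (a sketch that is not part of <code>chardet</code>; the variable names are made up for illustration), you can confirm both claims from Python 3 itself: the built-in constants are always present, and assigning to <code>False</code> is rejected at compile time.

```python
import builtins

# In Python 3 the bool constants always exist in the builtins module,
# so the hasattr() guard from constants.py can never take its first branch.
has_false = hasattr(builtins, 'False')
has_true = hasattr(builtins, 'True')

# 'False' is a reserved word, so the old fallback assignment
# does not even compile. compile() lets us observe that safely.
try:
    compile("False = 0", "<sketch>", "exec")
    assignment_compiles = True
except SyntaxError:
    assignment_compiles = False
```

<p>This is why deleting the compatibility shim outright is safe under Python 3: it was already dead code.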
<p>So this line in <code>universaldetector.py</code>:
<pre><code>self.done = constants.False</code></pre>
<pre><code class=pp>self.done = constants.False</code></pre>
<p>Becomes
<pre><code>self.done = False</code></pre>
<pre><code class=pp>self.done = False</code></pre>
<p>Ah, wasn&#8217;t that satisfying? The code is shorter and more readable already.
<h3 id=nomodulenamedconstants>No module named <code>constants</code></h3>
<p>Time to run <code>test.py</code> again and see how far it gets.
@@ -617,12 +617,12 @@ else:
import constants, sys
ImportError: No module named constants</samp></pre>
<p>What&#8217;s that you say? No module named <code>constants</code>? Of course there&#8217;s a module named <code>constants</code>. &hellip;Oh wait, no there isn&#8217;t. Remember when the <code>2to3</code> script fixed up all those import statements? This library has a lot of relative imports &mdash; that is, modules that import other modules within the library. In Python 3, all import statements are absolute by default [FIXME-LINK PEP 0328]. To do relative imports, you need to do something like this instead:
<pre><code>from . import constants</code></pre>
<pre><code class=pp>from . import constants</code></pre>
<p>But wait. Wasn&#8217;t the <code>2to3</code> script supposed to take care of these for you? Well, it did, but this particular import statement combines two different types of imports into one line: a relative import of the <code>constants</code> module within the library, and an absolute import of the <code>sys</code> module that is pre-installed in the Python standard library. In Python 2, you could combine these into one import statement. In Python 3, you can&#8217;t, and the <code>2to3</code> script is not smart enough to split the import statement into two.
<p>The solution is to split the import statement manually. So this two-in-one import:
<pre><code>import constants, sys</code></pre>
<pre><code class=pp>import constants, sys</code></pre>
<p>Needs to become two separate imports:
<pre><code>from . import constants
<pre><code class=pp>from . import constants
import sys</code></pre>
<p>There are variations of this problem scattered throughout the <code>chardet</code> library. In some places it&#8217;s &#8220;<code>import constants, sys</code>&#8221;; in other places, it&#8217;s &#8220;<code>import constants, re</code>&#8221;. The fix is the same: manually split the import statement into two lines, one for the relative import, the other for the absolute import.
<p>FIXME-xref to as-yet-unwritten PEP 8 style section (which says you should put all imports on their own line)
@@ -638,7 +638,7 @@ import sys</code></pre>
NameError: name 'file' is not defined</samp></pre>
<p>This one surprised me, because I&#8217;ve been using this idiom as long as I can remember. In Python 2, the global <code>file()</code> function was an alias for the <code>open()</code> function, which was the standard way of opening files for reading. In Python 3, the entire system for reading and writing files has been refactored into the <code>io</code> module. [FIXME-LINK PEP 3116] I&#8217;ll cover the new I/O module in more detail in Chapter FIXME, but for now, the important bit is that the global <code>file()</code> function no longer exists. However, the <code>open()</code> function does still exist. (Technically, it&#8217;s an alias for <var>io.open()</var>, but never mind that right now.)
<p>Thus, the simplest solution to the problem of the missing <code>file()</code> is to call the <code>open()</code> function instead:
<pre><code>for line in open(f, 'rb'):</code></pre>
<pre><code class=pp>for line in open(f, 'rb'):</code></pre>
<p>And that&#8217;s all I have to say about that.
<h3 id=cantuseastringpattern>Can&#8217;t use a string pattern on a bytes-like object</h3>
<p>Now things are starting to get interesting. And by &#8220;interesting,&#8221; I mean &#8220;confusing as all hell.&#8221;
@@ -651,20 +651,20 @@ NameError: name 'file' is not defined</samp></pre>
if self._highBitDetector.search(aBuf):
TypeError: can't use a string pattern on a bytes-like object</samp></pre>
<p>To debug this, let&#8217;s see what <var>self._highBitDetector</var> is. It&#8217;s defined in the <var>__init__</var> method of the <var>UniversalDetector</var> class:
<pre><code>class UniversalDetector:
<pre><code class=pp>class UniversalDetector:
def __init__(self):
self._highBitDetector = re.compile(r'[\x80-\xFF]')</code></pre>
<p>This pre-compiles a regular expression designed to find non-<abbr>ASCII</abbr> characters in the range 128&ndash;255 (0x80&ndash;0xFF). Wait, that&#8217;s not quite right; I need to be more precise with my terminology. This pattern is designed to find non-<abbr>ASCII</abbr> <em>bytes</em> in the range 128&ndash;255.
<p>And therein lies the problem.
<p>In Python 2, a string was an array of bytes whose character encoding was tracked separately. If you wanted Python 2 to keep track of the character encoding, you had to use a Unicode string (<code>u''</code>) instead. But in Python 3, a string is always what Python 2 called a Unicode string &mdash; that is, an array of Unicode characters (of possibly varying byte lengths). Since this regular expression is defined by a string pattern, it can only be used to search a string &mdash; again, an array of characters. But what we&#8217;re searching is not a string, it&#8217;s a byte array. Looking at the traceback, this error occurred in <code>universaldetector.py</code>:
<pre><code>def feed(self, aBuf):
<pre><code class=pp>def feed(self, aBuf):
.
.
.
if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):</code></pre>
<p>And what is <var>aBuf</var>? Let&#8217;s backtrack further to a place that calls <code>UniversalDetector.feed()</code>. One place that calls it is the test harness, <code>test.py</code>.
<pre><code>u = UniversalDetector()
<pre><code class=pp>u = UniversalDetector()
.
.
.
@@ -674,7 +674,7 @@ for line in open(f, 'rb'):
<p>And here we find our answer: in the <code>UniversalDetector.feed()</code> method, <var>aBuf</var> is a line read from a file on disk. Look carefully at the parameters used to open the file: <code>'rb'</code>. <code>'r'</code> is for &#8220;read&#8221;; OK, big deal, we&#8217;re reading the file. Ah, but <code>'b'</code> is for &#8220;binary.&#8221; Without the <code>'b'</code> flag, this <code>for</code> loop would read the file, line by line, and convert each line into a string &mdash; an array of Unicode characters &mdash; according to the system default character encoding. (You could override the system encoding with another parameter to the <code>open()</code> function, but never mind that for now.) But with the <code>'b'</code> flag, this <code>for</code> loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to <code>UniversalDetector.feed()</code>, and eventually gets passed to the pre-compiled regular expression, <var>self._highBitDetector</var>, to search for high-bit&hellip; characters. But we don&#8217;t have characters; we have bytes. Oops.
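<p>To see the two modes side by side, here is a minimal sketch. It assumes a throwaway temporary file rather than one of the real <code>chardet</code> test files, and the variable names are illustrative only.

```python
import os
import tempfile

# Create a small scratch file; b'\xc3\xa9' is the UTF-8 encoding of "é".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'caf\xc3\xa9\nplain\n')
    path = tmp.name

try:
    # 'rb': each line is stored exactly as it appears on disk, as bytes.
    with open(path, 'rb') as f:
        binary_lines = list(f)

    # 'r' with an explicit encoding: each line is decoded into a string.
    with open(path, 'r', encoding='utf-8') as f:
        text_lines = list(f)
finally:
    os.remove(path)
```

<p>The binary lines still contain the raw two-byte sequence, while the text lines contain a single decoded character in its place.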
<p>What we need this regular expression to search is not an array of characters, but an array of bytes.
<p>Once you realize that, the solution is not difficult. Regular expressions defined with strings can search strings. Regular expressions defined with byte arrays can search byte arrays. To define a byte array pattern, we simply change the type of the argument we use to define the regular expression to a byte array. (There is one other case of this same problem, on the very next line.)
<pre><code> class UniversalDetector:
<pre><code class=pp> class UniversalDetector:
def __init__(self):
<del>- self._highBitDetector = re.compile(r'[\x80-\xFF]')</del>
<del>- self._escDetector = re.compile(r'(\033|~{)')</del>
@@ -684,7 +684,7 @@ for line in open(f, 'rb'):
self._mCharSetProbers = []
self.reset()</code></pre>
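<p>The rule can be checked in isolation with the standard <code>re</code> module. This is only a sketch: the sample buffer is invented, and the pattern is the same high-bit character class shown above.

```python
import re

# Same character class, once as a string pattern and once as a bytes pattern.
str_pattern = re.compile(r'[\x80-\xFF]')
bytes_pattern = re.compile(b'[\x80-\xFF]')

sample = b'abc\xe9def'  # made-up buffer with one high-bit byte

# A bytes pattern can search a byte array.
match = bytes_pattern.search(sample)

# A string pattern cannot; Python 3 raises TypeError.
try:
    str_pattern.search(sample)
    str_on_bytes_failed = False
except TypeError:
    str_on_bytes_failed = True
```

<p>The match object reports the offending byte and its offset, which is all the detector needs to know.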
<p>Searching the entire codebase for other uses of the <code>re</code> module turns up two more instances, in <code>charsetprober.py</code>. Again, the code is defining regular expressions as strings but executing them on <var>aBuf</var>, which is a byte array. The solution is the same: define the regular expression patterns as byte arrays.
<pre><code> class CharSetProber:
<pre><code class=pp> class CharSetProber:
.
.
.
@@ -709,7 +709,7 @@ for line in open(f, 'rb'):
elif (self._mInputState == ePureAscii) and self._escDetector.search(self._mLastChar + aBuf):
TypeError: Can't convert 'bytes' object to str implicitly</samp></pre>
<p>There&#8217;s an unfortunate clash of coding style and Python interpreter here. The <code>TypeError</code> could be anywhere on that line, but the traceback doesn&#8217;t tell you exactly where it is. It could be in the first conditional or the second, and the traceback would look the same. To narrow it down, you should split the line in half, like this:
<pre><code>elif (self._mInputState == ePureAscii) and \
<pre><code class=pp>elif (self._mInputState == ePureAscii) and \
self._escDetector.search(self._mLastChar + aBuf):</code></pre>
<p>And re-run the test:
<pre class=screen><samp class=p>C:\home\chardet> </samp><kbd>python test.py tests\*\*</kbd>
@@ -722,7 +722,7 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp></pre>
TypeError: Can't convert 'bytes' object to str implicitly</samp></pre>
<p>Aha! The problem was not in the first conditional (<code>self._mInputState == ePureAscii</code>) but in the second one. So what could cause a <code>TypeError</code> there? Perhaps you&#8217;re thinking that the <code>search()</code> method is expecting a value of a different type, but that wouldn&#8217;t generate this traceback. Python functions can take any value; if you pass the right number of arguments, the function will execute. It may <em>crash</em> if you pass it a value of a different type than it&#8217;s expecting, but if that happened, the traceback would point to somewhere inside the function. But this traceback says it never got as far as calling the <code>search()</code> method. So the problem must be in that <code>+</code> operation, as it&#8217;s trying to construct the value that it will eventually pass to the <code>search()</code> method.
<p>We know from <a href=#cantuseastringpattern>previous debugging</a> that <var>aBuf</var> is a byte array. So what is <code>self._mLastChar</code>? It&#8217;s an instance variable, defined in the <code>reset()</code> method, which is actually called from the <code>__init__()</code> method.
<pre><code>class UniversalDetector:
<pre><code class=pp>class UniversalDetector:
def __init__(self):
self._highBitDetector = re.compile(b'[\x80-\xFF]')
self._escDetector = re.compile(b'(\033|~{)')
@@ -739,7 +739,7 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp></pre>
<mark> self._mLastChar = ''</mark></code></pre>
<p>And now we have our answer. Do you see it? <var>self._mLastChar</var> is a string, but <var>aBuf</var> is a byte array. And you can&#8217;t concatenate a string to a byte array &mdash; not even a zero-length string.
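<p>The mismatch is easy to reproduce outside the library. In this sketch <code>a_buf</code> is a made-up buffer; the exact wording of the error message varies between Python 3 releases, so the code only checks the exception type.

```python
a_buf = b'\x1b$B'  # invented sample buffer

# A zero-length string still has the wrong type.
try:
    '' + a_buf
    str_plus_bytes_ok = True
except TypeError:
    str_plus_bytes_ok = False

# A zero-length byte array concatenates without complaint.
combined = b'' + a_buf
```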
<p>So what is <var>self._mLastChar</var> anyway? The answer is in the <code>feed()</code> method, just a few lines down from where the traceback occurred.
<pre><code>if self._mInputState == ePureAscii:
<pre><code class=pp>if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):
self._mInputState = eHighbyte
elif (self._mInputState == ePureAscii) and \
@@ -748,15 +748,14 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp></pre>
<mark>self._mLastChar = aBuf[-1]</mark></code></pre>
<p>The calling function calls this <code>feed()</code> method over and over again with a few bytes at a time. The method processes the bytes it was given (passed in as <var>aBuf</var>), then stores the last byte in <var>self._mLastChar</var> in case it&#8217;s needed during the next call. (In a multi-byte encoding, the <code>feed()</code> method might get called with half of a character, then called again with the other half.) But because <var>aBuf</var> is now a byte array instead of a string, <var>self._mLastChar</var> needs to be a byte array as well. Thus:
<pre><code> def reset(self):
<pre><code class=pp> def reset(self):
.
.
.
<del>- self._mLastChar = ''</del>
<ins>+ self._mLastChar = b''</ins></code></pre>
<p>Searching the entire codebase for &#8220;<code>mLastChar</code>&#8221; turns up a similar problem in <code>mbcharsetprober.py</code>, but instead of tracking the last character, it tracks the last <em>two</em> characters. The <code>MultiByteCharSetProber</code> class uses a list of 1-character strings to track the last two characters; in Python 3, it needs to use a list of integers.
<pre><code>
class MultiByteCharSetProber(CharSetProber):
<pre><code class=pp> class MultiByteCharSetProber(CharSetProber):
def __init__(self):
CharSetProber.__init__(self)
self._mDistributionAnalyzer = None
@@ -785,7 +784,7 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'</samp></pre>
<p>&hellip;The bad news is it doesn&#8217;t always feel like progress.
<p>But this is progress! Really! Even though the traceback calls out the same line of code, it&#8217;s a different error than it used to be. Progress! So what&#8217;s the problem now? The last time I checked, this line of code didn&#8217;t try to concatenate an <code>int</code> with a byte array (<code>bytes</code>). In fact, you just spent a lot of time <a href=#cantconvertbytesobject>ensuring that <var>self._mLastChar</var> was a byte array</a>. How did it turn into an <code>int</code>?
<p>The answer lies not in the previous lines of code, but in the following lines.
<pre><code>if self._mInputState == ePureAscii:
<pre><code class=pp>if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):
self._mInputState = eHighbyte
elif (self._mInputState == ePureAscii) and \
@@ -796,22 +795,22 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'</samp></pre>
<aside>Each item in a string is a string. Each item in a byte array is an integer.</aside>
<p>This error doesn&#8217;t occur the first time the <code>feed()</code> method gets called; it occurs the <em>second time</em>, after <var>self._mLastChar</var> has been set to the last byte of <var>aBuf</var>. Well, what&#8217;s the problem with that? Getting a single element from a byte array yields an integer, not a byte array. To see the difference, follow me to the interactive shell:
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>aBuf = b'\xEF\xBB\xBF'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>aBuf = b'\xEF\xBB\xBF'</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>len(aBuf)</kbd>
<samp>3</samp>
<samp class=p>>>> </samp><kbd>mLastChar = aBuf[-1]</kbd>
<a><samp class=p>>>> </samp><kbd>mLastChar</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>mLastChar</kbd> <span class=u>&#x2461;</span></a>
<samp>191</samp>
<a><samp class=p>>>> </samp><kbd>type(mLastChar)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>type(mLastChar)</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;class 'int'></samp>
<a><samp class=p>>>> </samp><kbd>mLastChar + aBuf</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>mLastChar + aBuf</kbd> <span class=u>&#x2463;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
TypeError: unsupported operand type(s) for +: 'int' and 'bytes'</samp>
<a><samp class=p>>>> </samp><kbd>mLastChar = aBuf[-1:]</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>mLastChar = aBuf[-1:]</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp><kbd>mLastChar</kbd>
<samp>b'\xbf'</samp>
<a><samp class=p>>>> </samp><kbd>mLastChar + aBuf</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>mLastChar + aBuf</kbd> <span class=u>&#x2465;</span></a>
<samp>b'\xbf\xef\xbb\xbf'</samp></pre>
<ol>
<li>Define a byte array of length 3.
@@ -822,7 +821,7 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'</samp>
<li>Concatenating a byte array of length 1 with a byte array of length 3 returns a new byte array of length 4.
</ol>
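<p>Put together, the pattern looks roughly like the sketch below. <code>LastByteTracker</code> is a hypothetical stand-in invented for illustration, not a class in <code>chardet</code>; it keeps only the carry-over logic.

```python
class LastByteTracker:
    """Hypothetical stand-in for the carry-over state kept between feed() calls."""

    def __init__(self):
        self.last_char = b''  # zero-length byte array, not ''

    def feed(self, buf):
        # bytes + bytes works on every call, including the first.
        window = self.last_char + buf
        # Slicing (not indexing) keeps the remembered value a byte array.
        self.last_char = buf[-1:]
        return window


tracker = LastByteTracker()
first = tracker.feed(b'\xEF\xBB')
second = tracker.feed(b'\xBF')
```

<p>Because <code>buf[-1:]</code> is a slice, it also behaves sensibly for an empty buffer: it yields <code>b''</code> instead of raising an <code>IndexError</code>.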
<p>So, to ensure that the <code>feed()</code> method in <code>universaldetector.py</code> continues to work no matter how often it&#8217;s called, you need to <a href=#cantconvertbytesobject>initialize <var>self._mLastChar</var> as a 0-length byte array</a>, then <em>make sure it stays a byte array</em>.
<pre><code> self._escDetector.search(self._mLastChar + aBuf):
<pre><code class=pp> self._escDetector.search(self._mLastChar + aBuf):
self._mInputState = eEscAscii
<del>- self._mLastChar = aBuf[-1]</del>
@@ -845,25 +844,25 @@ tests\Big5\0804.blogspot.com.xml</samp>
byteCls = self._mModel['classTable'][ord(c)]
TypeError: ord() expected string of length 1, but int found</samp></pre>
<p>OK, so <var>c</var> is an <code>int</code>, but the <code>ord()</code> function was expecting a 1-character string. Fair enough. Where is <var>c</var> defined?
<pre><code># codingstatemachine.py
<pre><code class=pp># codingstatemachine.py
def next_state(self, c):
# for each byte we get its class
# if it is first byte, we also get byte length
byteCls = self._mModel['classTable'][ord(c)]</code></pre>
<p>That&#8217;s no help; it&#8217;s just passed into the function. Let&#8217;s pop the stack.
<pre><code># utf8prober.py
<pre><code class=pp># utf8prober.py
def feed(self, aBuf):
for c in aBuf:
codingState = self._mCodingSM.next_state(c)</code></pre>
<p>And now we have the answer. Do you see it? In Python 2, <var>aBuf</var> was a string, so <var>c</var> was a 1-character string. (That&#8217;s what you get when you iterate over a string &mdash; all the characters, one by one.) But now, <var>aBuf</var> is a byte array, so <var>c</var> is an <code>int</code>, not a 1-character string. In other words, there&#8217;s no need to call the <code>ord()</code> function because <var>c</var> is already an <code>int</code>!
<p>Thus:
<pre><code> def next_state(self, c):
<pre><code class=pp> def next_state(self, c):
# for each byte we get its class
# if it is first byte, we also get byte length
<del>- byteCls = self._mModel['classTable'][ord(c)]</del>
<ins>+ byteCls = self._mModel['classTable'][c]</ins></code></pre>
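<p>A small sketch makes the difference concrete. The lookup table here is invented for illustration and is far simpler than the real <code>classTable</code>.

```python
data = b'A\xe9'  # made-up two-byte buffer

# Iterating over a byte array yields integers directly.
items = [c for c in data]

# Hypothetical class table: one entry per possible byte value.
class_table = ['ascii' if i < 0x80 else 'high' for i in range(256)]
classes = [class_table[c] for c in data]  # no ord() needed

# Calling ord() on an integer is an error in Python 3.
try:
    ord(data[0])
    ord_accepts_int = True
except TypeError:
    ord_accepts_int = False
```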
<p>Searching the entire codebase for instances of &#8220;<code>ord(c)</code>&#8221; uncovers similar problems in <code>sbcharsetprober.py</code>&hellip;
<pre><code># sbcharsetprober.py
<pre><code class=pp># sbcharsetprober.py
def feed(self, aBuf):
if not self._mModel['keepEnglishLetter']:
aBuf = self.filter_without_english_letters(aBuf)
@@ -873,13 +872,13 @@ def feed(self, aBuf):
for c in aBuf:
<mark> order = self._mModel['charToOrderMap'][ord(c)]</mark></code></pre>
<p>&hellip;and <code>latin1prober.py</code>&hellip;
<pre><code># latin1prober.py
<pre><code class=pp># latin1prober.py
def feed(self, aBuf):
aBuf = self.filter_with_english_letters(aBuf)
for c in aBuf:
<mark> charClass = Latin1_CharToClass[ord(c)]</mark></code></pre>
<p><var>c</var> is iterating over <var>aBuf</var>, which means it is an integer, not a 1-character string. The solution is the same: change <code>ord(c)</code> to just plain <code>c</code>.
<pre><code> # sbcharsetprober.py
<pre><code class=pp> # sbcharsetprober.py
def feed(self, aBuf):
if not self._mModel['keepEnglishLetter']:
aBuf = self.filter_without_english_letters(aBuf)
@@ -918,7 +917,7 @@ tests\Big5\0804.blogspot.com.xml</samp>
TypeError: unorderable types: int() >= str()</samp></pre>
<p>Did you notice? This time around, the code passed the first test case (<code>tests\ascii\howto.diveintomark.org.xml</code>). You&#8217;re making real progress here.
<p>So what&#8217;s this all about? &#8220;Unorderable types&#8221;? Once again, the difference between byte arrays and strings is rearing its ugly head. Take a look at the code:
<pre><code>class SJISContextAnalysis(JapaneseContextAnalysis):
<pre><code class=pp>class SJISContextAnalysis(JapaneseContextAnalysis):
def get_order(self, aStr):
if not aStr: return -1, 1
# find out current char's byte length
@@ -928,7 +927,7 @@ TypeError: unorderable types: int() >= str()</samp></pre>
else:
charLen = 1</code></pre>
<p>And where does <var>aStr</var> come from? Let&#8217;s pop the stack:
<pre><code>def feed(self, aBuf, aLen):
<pre><code class=pp>def feed(self, aBuf, aLen):
.
.
.
@@ -938,7 +937,7 @@ TypeError: unorderable types: int() >= str()</samp></pre>
<p>Oh look, it&#8217;s our old friend, <var>aBuf</var>. As you might have guessed from every other issue we&#8217;ve encountered in this chapter, <var>aBuf</var> is a byte array. Here, the <code>feed()</code> method isn&#8217;t just passing it on wholesale; it&#8217;s slicing it. But as you saw <a href=#unsupportedoperandtypeforplus>earlier in this chapter</a>, slicing a byte array returns a byte array, so the <var>aStr</var> parameter that gets passed to the <code>get_order()</code> method is still a byte array.
<p>And what is this code trying to do with <var>aStr</var>? It&#8217;s taking the first element of the byte array and comparing it to a string of length 1. In Python 2, that worked, because <var>aStr</var> and <var>aBuf</var> were strings, and <var>aStr[0]</var> would be a string, and you can compare strings with ordering operators like <code>>=</code>. But in Python 3, <var>aStr</var> and <var>aBuf</var> are byte arrays, <var>aStr[0]</var> is an integer, and you can&#8217;t order an integer against a string without explicitly coercing one of them.
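<p>Here is the failure in miniature, along with the two obvious ways around it. This is a sketch with an invented two-byte buffer; the range limits are the ones visible in the traceback.

```python
a_str = b'\x82\xa0'   # made-up sample buffer
first = a_str[0]      # an int (0x82), because a_str is a byte array

# Ordering an int against a 1-character string raises TypeError.
try:
    first >= '\x81'
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

# Option 1: coerce the string constants to integers at each comparison.
in_range_coerced = ord('\x81') <= first <= ord('\x9F')

# Option 2: write the constants as integers in the first place.
in_range_ints = 0x81 <= first <= 0x9F
```

<p>Both options give the same answer; the second simply has less machinery in it.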
<p>In this case, there&#8217;s no need to make the code more complicated by adding an explicit coercion. <var>aStr[0]</var> yields an integer; the things you&#8217;re comparing to are all constants. Let&#8217;s change them from 1-character strings to integers.
<pre><code> class SJISContextAnalysis(JapaneseContextAnalysis):
<pre><code class=pp> class SJISContextAnalysis(JapaneseContextAnalysis):
def get_order(self, aStr):
if not aStr: return -1, 1
# find out current char's byte length
@@ -1009,7 +1008,7 @@ tests\Big5\0804.blogspot.com.xml</samp>
if (aStr[0] >= '\x81') and (aStr[0] &lt;= '\x9F'):
TypeError: unorderable types: int() >= str()</samp></pre>
<p>The fix is the same:
<pre><code> class EUCTWDistributionAnalysis(CharDistributionAnalysis):
<pre><code class=pp> class EUCTWDistributionAnalysis(CharDistributionAnalysis):
def __init__(self):
CharDistributionAnalysis.__init__(self)
self._mCharToFreqOrder = EUCTWCharToFreqOrder
@@ -1127,21 +1126,21 @@ tests\Big5\0804.blogspot.com.xml</samp>
total = reduce(operator.add, self._mFreqCounter)
NameError: global name 'reduce' is not defined</samp></pre>
<p>According to the official <a href=http://docs.python.org/3.0/whatsnew/3.0.html#builtins>What&#8217;s New In Python 3.0</a> guide, the <code>reduce()</code> function has been moved out of the global namespace and into the <code>functools</code> module. Quoting the guide: &#8220;Use <code>functools.reduce()</code> if you really need it; however, 99 percent of the time an explicit <code>for</code> loop is more readable.&#8221; You can read more about the decision from Guido van Rossum&#8217;s weblog: <a href='http://www.artima.com/weblogs/viewpost.jsp?thread=98196'>The fate of reduce() in Python 3000</a>.
<pre><code>def get_confidence(self):
<pre><code class=pp>def get_confidence(self):
if self.get_state() == constants.eNotMe:
return 0.01
<mark> total = reduce(operator.add, self._mFreqCounter)</mark></code></pre>
<p>The <code>reduce()</code> function takes two arguments &mdash; a function and a list (strictly speaking, any iterable object will do) &mdash; and applies the function cumulatively to each item of the list. In other words, this is a fancy and roundabout way of adding up all the items in a list and returning the result.
<p>This monstrosity was so common that Python added a global <code>sum()</code> function.
<pre><code> def get_confidence(self):
<pre><code class=pp> def get_confidence(self):
if self.get_state() == constants.eNotMe:
return 0.01
<del>- total = reduce(operator.add, self._mFreqCounter)</del>
<ins>+ total = sum(self._mFreqCounter)</ins></code></pre>
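<p>For a list of numbers the two spellings agree, as this sketch with a made-up frequency counter shows. One difference is worth knowing: without an initial value, <code>functools.reduce()</code> raises a <code>TypeError</code> on an empty list, while <code>sum()</code> returns 0.

```python
import functools
import operator

freq_counter = [3, 0, 7, 2]  # invented sample data

# The old spelling, using the function's new home in functools.
total_reduce = functools.reduce(operator.add, freq_counter)

# The built-in replacement.
total_sum = sum(freq_counter)

# Edge case: an empty list.
empty_sum = sum([])
try:
    functools.reduce(operator.add, [])
    empty_reduce_ok = True
except TypeError:
    empty_reduce_ok = False
```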
<p>Since you&#8217;re no longer using the <code>operator</code> module, you can remove that <code>import</code> from the top of the file as well.
<pre><code> from .charsetprober import CharSetProber
<pre><code class=pp> from .charsetprober import CharSetProber
from . import constants
<del>- import operator</del></code></pre>
<p>I CAN HAZ TESTZ?
@@ -1192,7 +1191,8 @@ tests\EUC-JP\arclamp.jp.xml EUC-JP with confide
<li>Test cases are essential. Don&#8217;t port anything without them. Don&#8217;t even try. The <em>only</em> reason I have any confidence at all that <code>chardet</code> works in Python 3 is that I had a test suite that exercised every line of code in the entire library. I <em>never</em> would have found half of these problems with manual spot-checking.
</ol>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next href=where-to-go-from-here.html title='onward to &#8220;Where To Go From Here&#8221;'><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next href=where-to-go-from-here.html title='onward to &#8220;Where To Go From Here&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+23 -6
@@ -37,6 +37,7 @@ Classname Legend
.c = "centered" = centered footer text (also clears floats)
.a = "asterism" = section break
.v = "navigation" = prev/next navigation links (not breadcrumbs)
.u = "Unicode" = text contains Unicode characters (requires special font declaration)
.nm = "no mobile" = hide this section on mobile devices
.nd = "no decoration" = hide the widgets on this code block
@@ -53,6 +54,7 @@ Acknowledgements & Inspirations
"Use the Best Available Ampersand" ....................... http://simplebits.com/notebook/2008/08/14/ampersands.html
"Unicode Support in HTML, Fonts, and Web Browsers" ....... http://alanwood.net/unicode/
"Punctuation" ............................................ http://en.wikipedia.org/wiki/Punctuation
"Google Code Prettify" ................................... http://code.google.com/p/google-code-prettify/
*/
/* typography */
@@ -61,15 +63,15 @@ body, .w a {
font: medium/1.75 'Gill Sans', 'Gill Sans MT', Corbel, Helvetica, 'Nimbus Sans L', sans-serif;
word-spacing: 0.1em;
}
pre, kbd, samp, code, var, .b {
pre, kbd, samp, code, var, .b, pre span {
font: small/2.154 Consolas, 'Andale Mono', Monaco, 'Liberation Mono', 'Bitstream Vera Sans Mono', 'DejaVu Sans Mono', monospace;
word-spacing: 0;
}
span {
font: medium 'Arial Unicode MS', FreeSerif, OpenSymbol, 'DejaVu Sans', sans-serif;
span.u {
font: medium/1.75 'Arial Unicode MS', FreeSerif, OpenSymbol, 'DejaVu Sans', sans-serif;
}
pre span, .a {
font-family: 'Arial Unicode MS', 'DejaVu Sans', FreeSerif, OpenSymbol, sans-serif;
pre span.u, pre span.u span, .a {
font: medium/1.75 'Arial Unicode MS', 'DejaVu Sans', FreeSerif, OpenSymbol, sans-serif;
}
.baa {
font: oblique large Constantia, Baskerville, Palatino, 'Palatino Linotype', 'URW Palladio L', serif;
@@ -201,7 +203,7 @@ li ol, .q {
code, var, samp {
line-height:inherit !important;
}
pre a, .w a, pre a:hover {
pre a, td code a, .w a, pre a:hover {
border: 0;
}
@@ -271,6 +273,7 @@ aside {
#level span {
color: #82b445;
}
/* previous/next navigation links */
.v a {
@@ -290,3 +293,17 @@ aside {
margin: 0;
text-shadow: gainsboro 3px 3px 3px;
}
/* syntax highlighting */
.str { color: #080; }
.kwd { color: #008; }
.com { color: #800; }
.typ { color: #606; }
.lit { color: #066; }
.pun { color: #660; }
.pln { color: #000; }
.tag { color: #008; }
.atn { color: #606; }
.atv { color: #080; }
.dec { color: #606; }
+4 -3
@@ -12,11 +12,11 @@ body{counter-reset:h1 12}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#strings>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#strings>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
<h1>Files</h1>
<blockquote class=q>
<p><span>&#x275D;</span> FIXME <span>&#x275E;</span><br>&mdash; FIXME
<p><span class=u>&#x275D;</span> FIXME <span class=u>&#x275E;</span><br>&mdash; FIXME
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
@@ -26,8 +26,9 @@ body{counter-reset:h1 12}
OK, so a string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a &#8220;text file&#8221; from disk, how does Python convert that sequence of bytes into a sequence of characters? The answer is that it decodes the bytes according to a specific character encoding algorithm, and returns a sequence of Unicode characters, otherwise known as a string.
-->
<p class=v><a href=advanced-classes.html rel=prev title='back to &#8220;Advanced Classes&#8221;'><span>&#x261C;</span></a> <a href=xml.html rel=next title='onward to &#8220;XML&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=advanced-classes.html rel=prev title='back to &#8220;Advanced Classes&#8221;'><span class=u>&#x261C;</span></a> <a href=xml.html rel=next title='onward to &#8220;XML&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+59 -59
@@ -12,11 +12,11 @@ body{counter-reset:h1 5}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#generators>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#generators>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
<h1>Generators</h1>
<blockquote class=q>
<p><span>&#x275D;</span> My spelling is Wobbly. It&#8217;s good spelling but it Wobbles, and the letters get in the wrong places. <span>&#x275E;</span><br>&mdash; Winnie-the-Pooh
<p><span class=u>&#x275D;</span> My spelling is Wobbly. It&#8217;s good spelling but it Wobbles, and the letters get in the wrong places. <span class=u>&#x275E;</span><br>&mdash; Winnie-the-Pooh
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -38,11 +38,11 @@ body{counter-reset:h1 5}
<h2 id=i-know>I Know, Let&#8217;s Use Regular Expressions!</h2>
<p>So you&#8217;re looking at words, which, at least in English, means you&#8217;re looking at strings of characters. You have rules that say you need to find different combinations of characters, then do different things to them. This sounds like a job for regular expressions!
<p class=d>[<a href=examples/plural1.py>download <code>plural1.py</code></a>]
<pre><code>import re
<pre><code class=pp>import re
def plural(noun):
<a> if re.search('[sxz]$', noun): <span>&#x2460;</span></a>
<a> return re.sub('$', 'es', noun) <span>&#x2461;</span></a>
<a> if re.search('[sxz]$', noun): <span class=u>&#x2460;</span></a>
<a> return re.sub('$', 'es', noun) <span class=u>&#x2461;</span></a>
elif re.search('[^aeioudgkprt]h$', noun):
return re.sub('$', 'es', noun)
elif re.search('[^aeiou]y$', noun):
@@ -57,13 +57,13 @@ def plural(noun):
<p>Let&#8217;s look at regular expression substitutions in more detail.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>re.search('[abc]', 'Mark')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search('[abc]', 'Mark')</kbd> <span class=u>&#x2460;</span></a>
&lt;_sre.SRE_Match object at 0x001C1FA8>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'Mark')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'Mark')</kbd> <span class=u>&#x2461;</span></a>
<samp>'Mork'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'rock')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'rock')</kbd> <span class=u>&#x2462;</span></a>
<samp>'rook'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'caps')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'caps')</kbd> <span class=u>&#x2463;</span></a>
<samp>'oops'</samp></pre>
<ol>
<li>Does the string <code>Mark</code> contain <code>a</code>, <code>b</code>, or <code>c</code>? Yes, it contains <code>a</code>.
@@ -74,11 +74,11 @@ def plural(noun):
<p>And now, back to the <code>plural()</code> function&hellip;
<pre><code>def plural(noun):
<pre><code class=pp>def plural(noun):
if re.search('[sxz]$', noun):
<a> return re.sub('$', 'es', noun) <span>&#x2460;</span></a>
<a> elif re.search('[^aeioudgkprt]h$', noun): <span>&#x2461;</span></a>
<a> return re.sub('$', 'es', noun) <span>&#x2462;</span></a>
<a> return re.sub('$', 'es', noun) <span class=u>&#x2460;</span></a>
<a> elif re.search('[^aeioudgkprt]h$', noun): <span class=u>&#x2461;</span></a>
<a> return re.sub('$', 'es', noun) <span class=u>&#x2462;</span></a>
elif re.search('[^aeiou]y$', noun):
return re.sub('y$', 'ies', noun)
else:
@@ -93,13 +93,13 @@ def plural(noun):
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'vacancy')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'vacancy')</kbd> <span class=u>&#x2460;</span></a>
&lt;_sre.SRE_Match object at 0x001C1FA8>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'boy')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'boy')</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp>
<samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'day')</kbd>
<samp class=p>>>> </samp>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'pita')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'pita')</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li><code>vacancy</code> matches this regular expression, because it ends in <code>cy</code>, and <code>c</code> is not <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>.
@@ -107,11 +107,11 @@ def plural(noun):
<li><code>pita</code> does not match, because it does not end in <code>y</code>.
</ol>
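<p>The negated character class described in the notes above can be checked in one small script. This is only a sketch: the helper name <code>ends_in_consonant_y()</code> is made up for illustration and is not part of the book's <code>plural</code> examples.

```python
import re

# Pattern from the examples above: any character that is not a vowel,
# followed by a "y" at the very end of the string.
PATTERN = '[^aeiou]y$'

def ends_in_consonant_y(noun):
    """Hypothetical helper: True if noun ends in 'y' preceded by a non-vowel."""
    return re.search(PATTERN, noun) is not None

for noun in ['vacancy', 'boy', 'day', 'pita']:
    # Only 'vacancy' should report True.
    print(noun, ends_in_consonant_y(noun))
```

<p>Wrapping the <code>re.search()</code> call this way turns the match object (or <code>None</code>) into a plain boolean, which is easier to read in a loop than the interactive shell output.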
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'vacancy')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'vacancy')</kbd> <span class=u>&#x2460;</span></a>
<samp>'vacancies'</samp>
<samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'agency')</kbd>
<samp>'agencies'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('([^aeiou])y$', r'\1ies', 'vacancy')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('([^aeiou])y$', r'\1ies', 'vacancy')</kbd> <span class=u>&#x2461;</span></a>
<samp>'vacancies'</samp></pre>
<ol>
<li>This regular expression turns <code>vacancy</code> into <code>vacancies</code> and <code>agency</code> into <code>agencies</code>, which is what you wanted. Note that it would also turn <code>boy</code> into <code>boies</code>, but that will never happen in the function because you did that <code>re.search</code> first to find out whether you should do this <code>re.sub</code>.
@@ -126,7 +126,7 @@ def plural(noun):
<p>Now you&#8217;re going to add a level of abstraction. You started by defining a list of rules: if this, do that, otherwise go to the next rule. Let&#8217;s temporarily complicate part of the program so you can simplify another part.
<p class=d>[<a href=examples/plural2.py>download <code>plural2.py</code></a>]
<pre><code>import re
<pre><code class=pp>import re
def match_sxz(noun):
return re.search('[sxz]$', noun)
@@ -140,10 +140,10 @@ def match_h(noun):
def apply_h(noun):
return re.sub('$', 'es', noun)
<a>def match_y(noun): <span>&#x2460;</span></a>
<a>def match_y(noun): <span class=u>&#x2460;</span></a>
return re.search('[^aeiou]y$', noun)
<a>def apply_y(noun): <span>&#x2461;</span></a>
<a>def apply_y(noun): <span class=u>&#x2461;</span></a>
return re.sub('y$', 'ies', noun)
def match_default(noun):
@@ -152,14 +152,14 @@ def match_default(noun):
def apply_default(noun):
return noun + 's'
<a>rules = [[match_sxz, apply_sxz], <span>&#x2462;</span></a>
<a>rules = [[match_sxz, apply_sxz], <span class=u>&#x2462;</span></a>
[match_h, apply_h],
[match_y, apply_y],
[match_default, apply_default]
]
def plural(noun):
<a> for matches_rule, apply_rule in rules: <span>&#x2463;</span></a>
<a> for matches_rule, apply_rule in rules: <span class=u>&#x2463;</span></a>
if matches_rule(noun):
return apply_rule(noun)</code></pre>
<ol>
@@ -174,7 +174,7 @@ def plural(noun):
<p>If this additional level of abstraction is confusing, try unrolling the function to see the equivalence. The entire <code>for</code> loop is equivalent to the following:
<pre><code>
<pre><code class=pp>
def plural(noun):
if match_sxz(noun):
return apply_sxz(noun)
@@ -206,14 +206,14 @@ def plural(noun):
<p>Defining separate named functions for each match and apply rule isn&#8217;t really necessary. You never call them directly; you add them to the <var>rules</var> list and call them through there. Furthermore, each function follows one of two patterns. All the match functions call <code>re.search()</code>, and all the apply functions call <code>re.sub()</code>. Let&#8217;s factor out the patterns so that defining new rules can be easier.
<p class=d>[<a href=examples/plural3.py>download <code>plural3.py</code></a>]
<pre><code>import re
<pre><code class=pp>import re
def build_match_and_apply_functions(pattern, search, replace):
<a> def matches_rule(word): <span>&#x2460;</span></a>
<a> def matches_rule(word): <span class=u>&#x2460;</span></a>
return re.search(pattern, word)
<a> def apply_rule(word): <span>&#x2461;</span></a>
<a> def apply_rule(word): <span class=u>&#x2461;</span></a>
return re.sub(search, replace, word)
<a> return [matches_rule, apply_rule] <span>&#x2462;</span></a></code></pre>
<a> return [matches_rule, apply_rule] <span class=u>&#x2462;</span></a></code></pre>
<ol>
<li><code>build_match_and_apply_functions()</code> is a function that builds other functions dynamically. It takes <var>pattern</var>, <var>search</var> and <var>replace</var>, then defines a <code>matches_rule()</code> function which calls <code>re.search()</code> with the <var>pattern</var> that was passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>matches_rule()</code> function you&#8217;re building. Whoa.
<li>Building the apply function works the same way. The apply function is a function that takes one parameter, and calls <code>re.sub()</code> with the <var>search</var> and <var>replace</var> parameters that were passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>apply_rule()</code> function you&#8217;re building. This technique of using the values of outside parameters within a dynamic function is called <em>closures</em>. You&#8217;re essentially defining constants within the apply function you&#8217;re building: it takes one parameter (<var>word</var>), but it then acts on that plus two other values (<var>search</var> and <var>replace</var>) which were set when you defined the apply function.
@@ -222,15 +222,14 @@ def build_match_and_apply_functions(pattern, search, replace):
<p>If this is incredibly confusing (and it should be, this is weird stuff), it may become clearer when you see how to use it.
<pre><code>
<a>patterns = \ <span>&#x2460;</span></a>
<pre><code class=pp><a>patterns = \ <span class=u>&#x2460;</span></a>
[
['[sxz]$', '$', 'es'],
['[^aeioudgkprt]h$', '$', 'es'],
['(qu|[^aeiou])y$', 'y$', 'ies'],
['$', '$', 's']
]
<a>rules = [build_match_and_apply_functions(pattern, search, replace) <span>&#x2461;</span></a>
<a>rules = [build_match_and_apply_functions(pattern, search, replace) <span class=u>&#x2461;</span></a>
for (pattern, search, replace) in patterns]</code></pre>
<ol>
<li>Our pluralization rules are now defined as a list of lists of strings (not functions). The first string in each group is the regular expression pattern that you would use in <code>re.search()</code> to see if this rule matches. The second and third strings in each group are the search and replace expressions you would use in <code>re.sub()</code> to actually apply the rule to turn a noun into its plural.
@@ -239,8 +238,8 @@ def build_match_and_apply_functions(pattern, search, replace):
<p>Rounding out this version of the script is the main entry point, the <code>plural()</code> function.
<pre><code>def plural(noun):
<a> for matches_rule, apply_rule in rules: <span>&#x2460;</span></a>
<pre><code class=pp>def plural(noun):
<a> for matches_rule, apply_rule in rules: <span class=u>&#x2460;</span></a>
if matches_rule(noun):
return apply_rule(noun)</code></pre>
<ol>
@@ -256,7 +255,7 @@ def build_match_and_apply_functions(pattern, search, replace):
<p>First, let&#8217;s create a text file that contains the rules you want. No fancy data structures, just whitespace-delimited strings in three columns. Let&#8217;s call it <code>plural4-rules.txt</code>.
<p class=d>[<a href=examples/plural4-rules.txt>download <code>plural4-rules.txt</code></a>]
<pre><code>[sxz]$ $ es
<pre><code class=pp>[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s</code></pre>
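<p>To see how whitespace-delimited columns like these turn into usable rules, here is a minimal sketch. It keeps the rules in a string instead of reading <code>plural4-rules.txt</code> from disk (an assumption, so the snippet runs without any file), and the names <code>RULES_TEXT</code> and <code>parse_rules()</code> are hypothetical.

```python
import re

# Same three-column layout as the rules file, held in a string
# (an assumption for this sketch, so that no file is needed).
RULES_TEXT = """\
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s
"""

def parse_rules(text):
    """Split each non-blank line into a (pattern, search, replace) tuple."""
    parsed = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # split(None, 3) splits on any run of whitespace.
        pattern, search, replace = line.split(None, 3)
        parsed.append((pattern, search, replace))
    return parsed

def plural(noun, rules):
    """Apply the first rule whose pattern matches the noun."""
    for pattern, search, replace in rules:
        if re.search(pattern, noun):
            return re.sub(search, replace, noun)

rules = parse_rules(RULES_TEXT)
for noun in ['fax', 'coach', 'vacancy', 'dog']:
    print(noun, '->', plural(noun, rules))
```

<p>The last rule's pattern (<code>$</code>) matches every string, so it acts as the default case and <code>plural()</code> always returns something for these rules.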
@@ -266,9 +265,9 @@ $ $ s</code></pre>
<p>[FIXME: now that this chapter comes before the I/O chapter, need to at least mention what open() does]
<p>[FIXME: try/finally -> with]
<p class=d>[<a href=examples/plural4.py>download <code>plural4.py</code></a>]
<pre><code>import re
<pre><code class=pp>import re
<a>def build_match_and_apply_functions(pattern, search, replace): <span>&#x2460;</span></a>
<a>def build_match_and_apply_functions(pattern, search, replace): <span class=u>&#x2460;</span></a>
def matches_rule(word):
return re.search(pattern, word)
def apply_rule(word):
@@ -276,14 +275,14 @@ $ $ s</code></pre>
return [matches_rule, apply_rule]
rules = []
<a>pattern_file = open('plural4-rules.txt') <span>&#x2461;</span></a>
<a>pattern_file = open('plural4-rules.txt') <span class=u>&#x2461;</span></a>
try:
<a> for line in pattern_file: <span>&#x2462;</span></a>
<a> pattern, search, replace = line.split(None, 3) <span>&#x2463;</span></a>
<a> rules.append(build_match_and_apply_functions( <span>&#x2464;</span></a>
<a> for line in pattern_file: <span class=u>&#x2462;</span></a>
<a> pattern, search, replace = line.split(None, 3) <span class=u>&#x2463;</span></a>
<a> rules.append(build_match_and_apply_functions( <span class=u>&#x2464;</span></a>
pattern, search, replace))
finally:
<a> pattern_file.close() <span>&#x2465;</span></a></code></pre>
<a> pattern_file.close() <span class=u>&#x2465;</span></a></code></pre>
<ol>
<li>The <code>build_match_and_apply_functions()</code> function has not changed. You&#8217;re still using closures to build two functions dynamically that use variables defined in the outer function.
<li>Open the file that contains the pattern strings.
@@ -301,7 +300,7 @@ finally:
<p>Wouldn&#8217;t it be grand to have a generic <code>plural()</code> function that parses the rules file? Get rules, check for a match, apply appropriate transformation, go to next rule. That&#8217;s all the <code>plural()</code> function has to do, and that&#8217;s all the <code>plural()</code> function should do.
<p class=d>[<a href=examples/plural5.py>download <code>plural5.py</code></a>]
<pre><code>def rules():
<pre><code class=pp>def rules():
for line in open('plural5-rules.txt'):
pattern, search, replace = line.split(None, 3)
yield build_match_and_apply_functions(pattern, search, replace)
@@ -317,20 +316,20 @@ def plural(noun):
<samp class=p>>>> </samp><kbd>def make_counter(x):</kbd>
<samp class=p>... </samp><kbd> print('entering make_counter')</kbd>
<samp class=p>... </samp><kbd> while True:</kbd>
<a><samp class=p>... </samp><kbd> yield x</kbd> <span>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> yield x</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>... </samp><kbd> print('incrementing x')</kbd>
<samp class=p>... </samp><kbd> x = x + 1</kbd>
<samp class=p>... </samp>
<a><samp class=p>>>> </samp><kbd>counter = make_counter(2)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>counter</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>counter = make_counter(2)</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>counter</kbd> <span class=u>&#x2462;</span></a>
&lt;generator object at 0x001C9C10>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span class=u>&#x2463;</span></a>
<samp>entering make_counter
2</samp>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span class=u>&#x2464;</span></a>
<samp>incrementing x
3</samp>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span class=u>&#x2465;</span></a>
<samp>incrementing x
4</samp></pre>
<ol>
@@ -347,11 +346,11 @@ def plural(noun):
<h3 id=a-fibonacci-generator>A Fibonacci Generator</h3>
<p class=d>[<a href=examples/fibonacci.py>download <code>fibonacci.py</code></a>]
<pre><code>def fib(max):
<a> a, b = 0, 1 <span>&#x2460;</span></a>
<pre><code class=pp>def fib(max):
<a> a, b = 0, 1 <span class=u>&#x2460;</span></a>
while a &lt; max:
<a> yield a <span>&#x2461;</span></a>
<a> a, b = b, a + b <span>&#x2462;</span></a></code></pre>
<a> yield a <span class=u>&#x2461;</span></a>
<a> a, b = b, a + b <span class=u>&#x2462;</span></a></code></pre>
<ol>
<li>The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with <code>0</code> and <code>1</code>, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: <var>a</var> starts at <code>0</code>, and <var>b</var> starts at <code>1</code>.
<li><var>a</var> is the current number in the sequence, so yield it.
@@ -364,8 +363,8 @@ def plural(noun):
<pre class=screen>
<samp class=p>>>> </samp><kbd>from fibonacci import fib</kbd>
<a><samp class=p>>>> </samp><kbd>for n in fib(1000):</kbd> <span>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> print(n, end=' ')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>for n in fib(1000):</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> print(n, end=' ')</kbd> <span class=u>&#x2461;</span></a>
<samp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp></pre>
<ol>
<li>You can use a generator like <code>fib()</code> in a <code>for</code> loop directly. The <code>for</code> loop will automatically call the <code>next()</code> function to get values from the <code>fib()</code> generator and assign them to the <code>for</code> loop index variable (<var>n</var>).
@@ -376,13 +375,13 @@ def plural(noun):
<p>Let&#8217;s go back to <code>plural5.py</code> and see how this version of the <code>plural()</code> function works.
<pre><code>def rules():
<pre><code class=pp>def rules():
for line in open('plural5-rules.txt'):
<a> pattern, search, replace = line.split(None, 3) <span>&#x2461;</span></a>
<a> yield build_match_and_apply_functions(pattern, search, replace) <span>&#x2462;</span></a>
<a> pattern, search, replace = line.split(None, 3) <span class=u>&#x2461;</span></a>
<a> yield build_match_and_apply_functions(pattern, search, replace) <span class=u>&#x2462;</span></a>
def plural(noun):
<a> for matches_rule, apply_rule in rules(): <span>&#x2463;</span></a>
<a> for matches_rule, apply_rule in rules(): <span class=u>&#x2463;</span></a>
if matches_rule(noun):
return apply_rule(noun)</code></pre>
<ol>
@@ -406,8 +405,9 @@ def plural(noun):
<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
</ul>
<p class=v><a href=regular-expressions.html rel=prev title='back to &#8220;Regular Expressions&#8221;'><span>&#x261C;</span></a> <a href=iterators.html rel=next title='onward to &#8220;Iterators&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=regular-expressions.html rel=prev title='back to &#8220;Regular Expressions&#8221;'><span class=u>&#x261C;</span></a> <a href=iterators.html rel=next title='onward to &#8220;Iterators&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+53 -52
@@ -13,11 +13,11 @@ mark{display:inline}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#http-web-services>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#http-web-services>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
<h1>HTTP Web Services</h1>
<blockquote class=q>
<p><span>&#x275D;</span> A ruffled mind makes a restless pillow. <span>&#x275E;</span><br>&mdash; Charlotte Bront&euml;
<p><span class=u>&#x275D;</span> A ruffled mind makes a restless pillow. <span class=u>&#x275E;</span><br>&mdash; Charlotte Bront&euml;
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -137,7 +137,7 @@ The second time you request the same data, you include the ETag hash in an <code
<p>Again with the <kbd>curl</kbd>:
<pre class=screen>
<a><samp class=p>you@localhost:~$ </samp><kbd>curl -I <mark>-H "If-None-Match: \"3075-ddc8d800\""</mark> http://wearehugh.com/m.jpg</kbd> <span>&#x2460;</span></a>
<a><samp class=p>you@localhost:~$ </samp><kbd>curl -I <mark>-H "If-None-Match: \"3075-ddc8d800\""</mark> http://wearehugh.com/m.jpg</kbd> <span class=u>&#x2460;</span></a>
<samp>HTTP/1.1 304 Not Modified
Date: Sun, 31 May 2009 18:04:39 GMT
Server: Apache
@@ -188,7 +188,7 @@ Cache-Control: max-age=31536000, public</samp></pre>
<p>Let&#8217;s say you want to download a resource over <abbr>HTTP</abbr>, such as <a href=xml.html>an Atom feed</a>. Being a feed, you&#8217;re not just going to download it once; you&#8217;re going to download it over and over again. (Most feed readers will check for changes once an hour.) Let&#8217;s do it the quick-and-dirty way first, and then see how you can do better.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import urllib.request</kbd>
<a><samp class=p>>>> </samp><kbd>data = urllib.request.urlopen('http://diveintopython3.org/examples/feed.xml').read()</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>data = urllib.request.urlopen('http://diveintopython3.org/examples/feed.xml').read()</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>print(data)</kbd>
<samp>&lt;?xml version='1.0' encoding='utf-8'?>
&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
@@ -213,13 +213,13 @@ Cache-Control: max-age=31536000, public</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>from http.client import HTTPConnection</kbd>
<a><samp class=p>>>> </samp><kbd>HTTPConnection.debuglevel = 1</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>HTTPConnection.debuglevel = 1</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>from urllib.request import urlopen</kbd>
<a><samp class=p>>>> </samp><kbd>response = urlopen('http://diveintopython3.org/examples/feed.xml')</kbd> <span>&#x2461;</span></a>
<samp><a>send: b'GET /examples/feed.xml HTTP/1.1 <span>&#x2462;</span></a>
<a>Host: diveintopython3.org <span>&#x2463;</span></a>
<a>Accept-Encoding: identity <span>&#x2464;</span></a>
<a>User-Agent: Python-urllib/3.0' <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>response = urlopen('http://diveintopython3.org/examples/feed.xml')</kbd> <span class=u>&#x2461;</span></a>
<samp><a>send: b'GET /examples/feed.xml HTTP/1.1 <span class=u>&#x2462;</span></a>
<a>Host: diveintopython3.org <span class=u>&#x2463;</span></a>
<a>Accept-Encoding: identity <span class=u>&#x2464;</span></a>
<a>User-Agent: Python-urllib/3.0' <span class=u>&#x2465;</span></a>
Connection: close
reply: 'HTTP/1.1 200 OK'
&hellip;further debugging information omitted&hellip;</samp></pre>
@@ -236,19 +236,19 @@ reply: 'HTTP/1.1 200 OK'
<pre class=screen>
# continued from previous example
<a><samp class=p>>>> </samp><kbd>print(response.headers.as_string())</kbd> <span>&#x2460;</span></a>
<samp><a>Date: Sun, 31 May 2009 19:23:06 GMT <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>print(response.headers.as_string())</kbd> <span class=u>&#x2460;</span></a>
<samp><a>Date: Sun, 31 May 2009 19:23:06 GMT <span class=u>&#x2461;</span></a>
Server: Apache
<a>Last-Modified: Sun, 31 May 2009 06:39:55 GMT <span>&#x2462;</span></a>
<a>ETag: "bfe-93d9c4c0" <span>&#x2463;</span></a>
<a>Last-Modified: Sun, 31 May 2009 06:39:55 GMT <span class=u>&#x2462;</span></a>
<a>ETag: "bfe-93d9c4c0" <span class=u>&#x2463;</span></a>
Accept-Ranges: bytes
<a>Content-Length: 3070 <span>&#x2464;</span></a>
<a>Cache-Control: max-age=86400 <span>&#x2465;</span></a>
<a>Content-Length: 3070 <span class=u>&#x2464;</span></a>
<a>Cache-Control: max-age=86400 <span class=u>&#x2465;</span></a>
Expires: Mon, 01 Jun 2009 19:23:06 GMT
Vary: Accept-Encoding
Connection: close
Content-Type: application/xml</samp>
<a><samp class=p>>>> </samp><kbd>data = response.read()</kbd> <span>&#x2466;</span></a>
<a><samp class=p>>>> </samp><kbd>data = response.read()</kbd> <span class=u>&#x2466;</span></a>
<samp class=p>>>> </samp><kbd>len(data)</kbd>
<samp>3070</samp></pre>
<ol>
@@ -282,7 +282,7 @@ reply: 'HTTP/1.1 200 OK'
<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd>print(response2.headers.as_string())</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>print(response2.headers.as_string())</kbd> <span class=u>&#x2460;</span></a>
<samp>Date: Mon, 01 Jun 2009 03:58:00 GMT
Server: Apache
Last-Modified: Sun, 31 May 2009 22:51:11 GMT
@@ -295,9 +295,9 @@ Vary: Accept-Encoding
Connection: close
Content-Type: application/xml</samp>
<samp class=p>>>> </samp><kbd>data2 = response2.read()</kbd>
<a><samp class=p>>>> </samp><kbd>len(data2)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>len(data2)</kbd> <span class=u>&#x2461;</span></a>
<samp>3070</samp>
<a><samp class=p>>>> </samp><kbd>data2 == data</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>data2 == data</kbd> <span class=u>&#x2462;</span></a>
<samp>True</samp></pre>
<ol>
<li>The server is still sending the same array of &#8220;smart&#8221; headers: <code>Cache-Control</code> and <code>Expires</code> to allow caching, <code>Last-Modified</code> and <code>ETag</code> to enable &#8220;not-modified&#8221; tracking. Even the <code>Vary: Accept-Encoding</code> header hints that the server would support compression, if only you would ask for it. But you didn&#8217;t.
@@ -315,11 +315,11 @@ Content-Type: application/xml</samp>
<pre class=screen>
<samp class=p>>>> </samp><kbd>import httplib2</kbd>
<a><samp class=p>>>> </samp><kbd>h = httplib2.Http('.cache')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>response.status</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>h = httplib2.Http('.cache')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>response.status</kbd> <span class=u>&#x2462;</span></a>
<samp>200</samp>
<a><samp class=p>>>> </samp><kbd>content[:52]</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>content[:52]</kbd> <span class=u>&#x2463;</span></a>
<samp>b"&lt;?xml version='1.0' encoding='utf-8'?>\r\n&lt;feed xmlns="</samp>
<samp class=p>>>> </samp><kbd>len(content)</kbd>
<samp>3070</samp></pre>
@@ -331,7 +331,7 @@ Content-Type: application/xml</samp>
</ol>
<blockquote class=note>
<p><span>&#x261E;</span>You probably only need one <code>httplib2.Http</code> object. There are valid reasons for creating more than one, but you should only do so if you know why you need them. &#8220;I need to request data from two different <abbr>URL</abbr>s&#8221; is not a valid reason. Re-use the <code>Http</code> object and just call the <code>request()</code> method twice.
<p><span class=u>&#x261E;</span>You probably only need one <code>httplib2.Http</code> object. There are valid reasons for creating more than one, but you should only do so if you know why you need them. &#8220;I need to request data from two different <abbr>URL</abbr>s&#8221; is not a valid reason. Re-use the <code>Http</code> object and just call the <code>request()</code> method twice.
</blockquote>
<h3 id=httplib2-caching>How <code>httplib2</code> Handles Caching</h3>
@@ -340,10 +340,10 @@ Content-Type: application/xml</samp>
<pre class=screen>
# continued from the <a href=#introducing-httplib2>previous example</a>
<a><samp class=p>>>> </samp><kbd>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>response2.status</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>response2.status</kbd> <span class=u>&#x2461;</span></a>
<samp>200</samp>
<a><samp class=p>>>> </samp><kbd>content2[:52]</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>content2[:52]</kbd> <span class=u>&#x2462;</span></a>
<samp>b"&lt;?xml version='1.0' encoding='utf-8'?>\r\n&lt;feed xmlns="</samp>
<samp class=p>>>> </samp><kbd>len(content2)</kbd>
<samp>3070</samp></pre>
@@ -360,14 +360,14 @@ Content-Type: application/xml</samp>
# Please exit out of the interactive shell
# and launch a new one.
<samp class=p>>>> </samp><kbd>import httplib2</kbd>
<a><samp class=p>>>> </samp><kbd>httplib2.debuglevel = 1</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>h = httplib2.Http('.cache')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>len(content)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>httplib2.debuglevel = 1</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>h = httplib2.Http('.cache')</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/examples/feed.xml')</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>len(content)</kbd> <span class=u>&#x2463;</span></a>
<samp>3070</samp>
<a><samp class=p>>>> </samp><kbd>response.status</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>response.status</kbd> <span class=u>&#x2464;</span></a>
<samp>200</samp>
<a><samp class=p>>>> </samp><kbd>response.fromcache</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>response.fromcache</kbd> <span class=u>&#x2465;</span></a>
<samp>True</samp></pre>
<ol>
<li>Let&#8217;s turn on debugging and see <a href=#whats-on-the-wire>what&#8217;s on the wire</a>. This is the <code>httplib2</code> equivalent of turning on debugging in <code>http.client</code>. <code>httplib2</code> will print all the data being sent to the server and some key information being sent back.
@@ -389,8 +389,8 @@ Content-Type: application/xml</samp>
<pre class=screen>
# continued from the previous example
<samp class=p>>>> </samp><kbd>response2, content2 = h.request('http://diveintopython3.org/examples/feed.xml',</kbd>
<a><samp class=p>... </samp><kbd> headers={'cache-control':'no-cache'})</kbd> <span>&#x2460;</span></a>
<a><samp>connect: (diveintopython3.org, 80) <span>&#x2461;</span></a>
<a><samp class=p>... </samp><kbd> headers={'cache-control':'no-cache'})</kbd> <span class=u>&#x2460;</span></a>
<a><samp>connect: (diveintopython3.org, 80) <span class=u>&#x2461;</span></a>
send: b'GET /examples/feed.xml HTTP/1.1
Host: diveintopython3.org
user-agent: Python-httplib2/$Rev: 259 $
@@ -400,9 +400,9 @@ reply: 'HTTP/1.1 200 OK'
&hellip;further debugging information omitted&hellip;</samp>
<samp class=p>>>> </samp><kbd>response2.status</kbd>
<samp>200</samp>
<a><samp class=p>>>> </samp><kbd>response2.fromcache</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>response2.fromcache</kbd> <span class=u>&#x2462;</span></a>
<samp>False</samp>
<a><samp class=p>>>> </samp><kbd>print(dict(response2.items()))</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>print(dict(response2.items()))</kbd> <span class=u>&#x2463;</span></a>
<samp>{'status': '200',
'content-length': '3070',
'content-location': 'http://diveintopython3.org/examples/feed.xml',
@@ -434,14 +434,14 @@ reply: 'HTTP/1.1 200 OK'
<samp class=p>>>> </samp><kbd>import httplib2</kbd>
<samp class=p>>>> </samp><kbd>httplib2.debuglevel = 1</kbd>
<samp class=p>>>> </samp><kbd>h = httplib2.Http('.cache')</kbd>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/')</kbd> <span class=u>&#x2460;</span></a>
<samp>connect: (diveintopython3.org, 80)
send: b'GET / HTTP/1.1
Host: diveintopython3.org
accept-encoding: deflate, gzip
user-agent: Python-httplib2/$Rev: 259 $'
reply: 'HTTP/1.1 200 OK'</samp>
<a><samp class=p>>>> </samp><kbd>print(dict(response.items()))</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>print(dict(response.items()))</kbd> <span class=u>&#x2461;</span></a>
<samp>{'-content-encoding': 'gzip',
'accept-ranges': 'bytes',
'connection': 'close',
@@ -454,7 +454,7 @@ reply: 'HTTP/1.1 200 OK'</samp>
'server': 'Apache',
'status': '304',
'vary': 'Accept-Encoding,User-Agent'}</samp>
<a><samp class=p>>>> </samp><kbd>len(content)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>len(content)</kbd> <span class=u>&#x2462;</span></a>
<samp>6657</samp></pre>
<ol>
<li>Instead of the feed, this time we&#8217;re going to download the site&#8217;s home page, which is <abbr>HTML</abbr>. Since this is the first time you&#8217;ve ever requested this page, <code>httplib2</code> has little to work with, and it sends out a minimum of headers with the request.
@@ -464,22 +464,22 @@ reply: 'HTTP/1.1 200 OK'</samp>
<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>response, content = h.request('http://diveintopython3.org/')</kbd> <span class=u>&#x2460;</span></a>
<samp>connect: (diveintopython3.org, 80)
send: b'GET / HTTP/1.1
Host: diveintopython3.org
<a>if-none-match: "7f806d-1a01-9fb97900" <span>&#x2461;</span></a>
<a>if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT <span>&#x2462;</span></a>
<a>if-none-match: "7f806d-1a01-9fb97900" <span class=u>&#x2461;</span></a>
<a>if-modified-since: Tue, 02 Jun 2009 02:51:48 GMT <span class=u>&#x2462;</span></a>
accept-encoding: deflate, gzip
user-agent: Python-httplib2/$Rev: 259 $'
<a>reply: 'HTTP/1.1 304 Not Modified' <span>&#x2463;</span></a></samp>
<a><samp class=p>>>> </samp><kbd>response.fromcache</kbd> <span>&#x2464;</span></a>
<a>reply: 'HTTP/1.1 304 Not Modified' <span class=u>&#x2463;</span></a></samp>
<a><samp class=p>>>> </samp><kbd>response.fromcache</kbd> <span class=u>&#x2464;</span></a>
<samp>True</samp>
<a><samp class=p>>>> </samp><kbd>response.status</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>response.status</kbd> <span class=u>&#x2465;</span></a>
<samp>200</samp>
<a><samp class=p>>>> </samp><kbd>response.dict['status']</kbd> <span>&#x2466;</span></a>
<a><samp class=p>>>> </samp><kbd>response.dict['status']</kbd> <span class=u>&#x2466;</span></a>
<samp>'304'</samp>
<a><samp class=p>>>> </samp><kbd>len(content)</kbd> <span>&#x2467;</span></a>
<a><samp class=p>>>> </samp><kbd>len(content)</kbd> <span class=u>&#x2467;</span></a>
<samp>6657</samp></pre>
<ol>
<li>You request the same page again, with the same <code>Http</code> object (and the same local cache).
@@ -501,11 +501,11 @@ user-agent: Python-httplib2/$Rev: 259 $'
<samp>connect: (diveintopython3.org, 80)
send: b'GET / HTTP/1.1
Host: diveintopython3.org
<a>accept-encoding: deflate, gzip <span>&#x2460;</span></a>
<a>accept-encoding: deflate, gzip <span class=u>&#x2460;</span></a>
user-agent: Python-httplib2/$Rev: 259 $'
reply: 'HTTP/1.1 200 OK'</samp>
<samp class=p>>>> </samp><kbd>print(dict(response.items()))</kbd>
<samp><a>{'-content-encoding': 'gzip', <span>&#x2461;</span></a>
<samp><a>{'-content-encoding': 'gzip', <span class=u>&#x2461;</span></a>
'accept-ranges': 'bytes',
'connection': 'close',
'content-length': '6657',
@@ -681,7 +681,8 @@ reply: 'HTTP/1.1 301 Moved Permanently'</samp>
<li><a href=http://code.google.com/p/doctype/wiki/ArticleHttpCaching>How to control caching with <abbr>HTTP</abbr> headers</a> on Google Doctype
</ul>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+2 -2
@@ -13,10 +13,10 @@ h1:before{counter-increment:h1;content:''}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html>Dive Into Python 3</a> <span class=u>&#8227;</span>
<h1>Secret Leftover Page</h1>
<blockquote class=q>
<p><span>&#x275D;</span> You step in the stream / but the water has moved on. / This page is not here. <span>&#x275E;</span><br>&mdash; 404 Not Found haiku
<p><span class=u>&#x275D;</span> You step in the stream / but the water has moved on. / This page is not here. <span class=u>&#x275E;</span><br>&mdash; 404 Not Found haiku
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Huh?</h2>
+51 -52
@@ -12,11 +12,11 @@ body{counter-reset:h1 6}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#iterators>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#iterators>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
<h1>Iterators</h1>
<blockquote class=q>
<p><span>&#x275D;</span> East is East, and West is West, and never the twain shall meet. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Rudyard_Kipling>Rudyard Kipling</a>
<p><span class=u>&#x275D;</span> East is East, and West is West, and never the twain shall meet. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Rudyard_Kipling>Rudyard Kipling</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -25,7 +25,7 @@ body{counter-reset:h1 6}
<p>Remember <a href=generators.html#a-fibonacci-generator>the Fibonacci generator</a>? Here it is as a built-from-scratch iterator:
<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
<pre><code>class Fib:
<pre><code class=pp>class Fib:
'''iterator that yields numbers in the Fibonacci sequence'''
def __init__(self, max):
@@ -45,7 +45,7 @@ body{counter-reset:h1 6}
<p>Let&#8217;s take that one line at a time.
<pre><code>class Fib:</code></pre>
<pre><code class=pp>class Fib:</code></pre>
<p><code>class</code>? What&#8217;s a class?
@@ -57,9 +57,8 @@ body{counter-reset:h1 6}
<p>Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word <code>class</code>, followed by the class name. Technically, that&#8217;s all that&#8217;s required, since a class doesn&#8217;t need to inherit from any other class.
<pre><code>
class PapayaWhip: <span>&#x2460;</span>
pass <span>&#x2461;</span></code></pre>
<pre><code class=pp><a>class PapayaWhip: <span class=u>&#x2460;</span></a>
<a> pass <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>The name of this class is <code>PapayaWhip</code>, and it doesn&#8217;t inherit from any other class. Class names are usually capitalized, <code>EachWordLikeThis</code>, but this is only a convention, not a requirement.
<li>You probably guessed this, but everything in a class is indented, just like the code within a function, <code>if</code> statement, <code>for</code> loop, or any other block of code. The first line not indented is outside the class.
@@ -68,7 +67,7 @@ class PapayaWhip: <span>&#x2460;</span>
<p>This <code>PapayaWhip</code> class doesn&#8217;t define any methods or attributes, but syntactically, there needs to be something in the definition, thus the <code>pass</code> statement. This is a Python reserved word that just means &#8220;move along, nothing to see here&#8221;. It&#8217;s a statement that does nothing, and it&#8217;s a good placeholder when you&#8217;re stubbing out functions or classes.
<blockquote class='note compare java'>
<p><span>&#x261E;</span>The <code>pass</code> statement in Python is like an empty set of curly braces (<code>{}</code>) in Java or C.
<p><span class=u>&#x261E;</span>The <code>pass</code> statement in Python is like an empty set of curly braces (<code>{}</code>) in Java or C.
</blockquote>
<p>Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don&#8217;t have explicit constructors and destructors. Although it&#8217;s not required, Python classes <em>can</em> have something similar to a constructor: the <code>__init__()</code> method.
@@ -77,11 +76,10 @@ class PapayaWhip: <span>&#x2460;</span>
<p>This example shows the initialization of the <code>Fib</code> class using the <code>__init__</code> method.
<pre><code>
class Fib:
<a> '''iterator that yields numbers in the Fibonacci sequence''' <span>&#x2460;</span></a>
<pre><code class=pp>class Fib:
<a> '''iterator that yields numbers in the Fibonacci sequence''' <span class=u>&#x2460;</span></a>
<a> def __init__(self, max): <span>&#x2461;</span></a></code></pre>
<a> def __init__(self, max): <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>Classes can (and should) have <code>docstring</code>s too, just like modules and functions.
<li>The <code>__init__()</code> method is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor of the class. It&#8217;s tempting, because it looks like a constructor (by convention, the <code>__init__()</code> method is the first method defined for the class), acts like one (it&#8217;s the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the <code>__init__()</code> method is called, and you already have a valid reference to the new instance of the class.
@@ -98,12 +96,12 @@ class Fib:
<p>Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the <code>__init__()</code> method requires. The return value will be the newly created object.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import fibonacci2</kbd>
<a><samp class=p>>>> </samp><kbd>fib = fibonacci2.Fib(100)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>fib</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>fib = fibonacci2.Fib(100)</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>fib</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;fibonacci2.Fib object at 0x00DB8810></samp>
<a><samp class=p>>>> </samp><kbd>fib.__class__</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>fib.__class__</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;class 'fibonacci2.Fib'></samp>
<a><samp class=p>>>> </samp><kbd>fib.__doc__</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>fib.__doc__</kbd> <span class=u>&#x2463;</span></a>
<samp>'iterator that yields numbers in the Fibonacci sequence'</samp></pre>
<ol>
<li>You are creating an instance of the <code>Fib</code> class (defined in the <code>fibonacci2</code> module) and assigning the newly created instance to the variable <var>fib</var>. You are passing one parameter, <code>100</code>, which will end up as the <var>max</var> argument in <code>Fib</code>&#8217;s <code>__init__()</code> method.
@@ -113,7 +111,7 @@ class Fib:
</ol>
<blockquote class='note compare java'>
<p><span>&#x261E;</span>In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit <code>new</code> operator like <abbr>C++</abbr> or Java.
<p><span class=u>&#x261E;</span>In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit <code>new</code> operator like <abbr>C++</abbr> or Java.
</blockquote>
<p class=a>&#x2042;
@@ -122,22 +120,22 @@ class Fib:
<p>On to the next line:
<pre><code>class Fib:
<pre><code class=pp>class Fib:
def __init__(self, max):
<a> self.max = max <span>&#x2460;</span></a></code></pre>
<a> self.max = max <span class=u>&#x2460;</span></a></code></pre>
<ol>
<li>What is <var>self.max</var>? It&#8217;s an instance variable. It is completely separate from <var>max</var>, which was passed into the <code>__init__()</code> method as an argument. <var>self.max</var> is &#8220;global&#8221; to the instance. That means that you can access it from other methods.
</ol>
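<p>A minimal sketch of the same idea, using a hypothetical <code>Limit</code> class (not part of <code>fibonacci2.py</code>): the value passed to <code>__init__()</code> is stored on the instance, so a different method can read it later, and each instance keeps its own copy.

```python
class Limit:
    '''hypothetical class used only to illustrate instance variables'''

    def __init__(self, max):
        # "max" is a local argument; "self.max" is stored on the instance
        self.max = max

    def is_over(self, value):
        # a different method can still see self.max
        return value > self.max


small = Limit(10)
large = Limit(1000)
print(small.is_over(50))   # compares 50 against small's own max
print(large.is_over(50))   # compares 50 against large's own max
```

<p>The two instances do not share <var>self.max</var>; changing one leaves the other alone.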
<pre><code>class Fib:
<pre><code class=pp>class Fib:
def __init__(self, max):
<a> self.max = max <span>&#x2460;</span></a>
<a> self.max = max <span class=u>&#x2460;</span></a>
.
.
.
def __next__(self):
fib = self.a
<a> if fib > self.max: <span>&#x2461;</span></a></code></pre>
<a> if fib > self.max: <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li><var>self.max</var> is defined in the <code>__init__()</code> method&hellip;
<li>&hellip;and referenced in the <code>__next__()</code> method.
@@ -161,20 +159,20 @@ class Fib:
<p><em>Now</em> you&#8217;re ready to learn how to build an iterator. An iterator is just a class that defines an <code>__iter__()</code> method.
<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
<pre><code><a>class Fib: <span>&#x2460;</span></a>
<a> def __init__(self, max): <span>&#x2461;</span></a>
<pre><code class=pp><a>class Fib: <span class=u>&#x2460;</span></a>
<a> def __init__(self, max): <span class=u>&#x2461;</span></a>
self.max = max
<a> def __iter__(self): <span>&#x2462;</span></a>
<a> def __iter__(self): <span class=u>&#x2462;</span></a>
self.a, self.b = 0, 1
return self
<a> def __next__(self): <span>&#x2463;</span></a>
<a> def __next__(self): <span class=u>&#x2463;</span></a>
fib = self.a
if fib > self.max:
<a> raise StopIteration <span>&#x2464;</span></a>
<a> raise StopIteration <span class=u>&#x2464;</span></a>
self.a, self.b = self.b, self.a + self.b
<a> return fib <span>&#x2465;</span></a></code></pre>
<a> return fib <span class=u>&#x2465;</span></a></code></pre>
<ol>
<li>To build an iterator from scratch, <code>Fib</code> needs to be a class, not a function.
<li>&#8220;Calling&#8221; <code>Fib(max)</code> is really creating an instance of this class and calling its <code>__init__()</code> method with <var>max</var>. The <code>__init__()</code> method saves the maximum value as an instance variable so other methods can refer to it later.
@@ -211,7 +209,7 @@ class Fib:
<p>Now it&#8217;s time for the finale. Let&#8217;s rewrite the <a href=generators.html>plural rules generator</a> as an iterator.
<p class=d>[<a href=examples/plural6.py>download <code>plural6.py</code></a>]
<pre><code>class LazyRules:
<pre><code class=pp>class LazyRules:
rules_filename = 'plural6-rules.txt'
def __init__(self):
@@ -247,12 +245,12 @@ rules = LazyRules()</code></pre>
<p>Let&#8217;s take the class one bite at a time.
<pre><code>class LazyRules:
<pre><code class=pp>class LazyRules:
rules_filename = 'plural6-rules.txt'
<a> def __init__(self): <span>&#x2460;</span></a>
<a> self.pattern_file = open(self.rules_filename) <span>&#x2462;</span></a>
<a> self.cache = [] <span>&#x2461;</span></a></code></pre>
<a> def __init__(self): <span class=u>&#x2460;</span></a>
<a> self.pattern_file = open(self.rules_filename) <span class=u>&#x2462;</span></a>
<a> self.cache = [] <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>The <code>__init__()</code> method is only going to be called once, when you instantiate the class and assign it to <var>rules</var>.
<li>Since this is only going to get called once, it&#8217;s the perfect place to open the pattern file. You&#8217;ll read it later; no point doing more than you absolutely have to until absolutely necessary!
@@ -265,16 +263,16 @@ rules = LazyRules()</code></pre>
<samp class=p>>>> </samp><kbd>import plural6</kbd>
<samp class=p>>>> </samp><kbd>r1 = plural6.LazyRules()</kbd>
<samp class=p>>>> </samp><kbd>r2 = plural6.LazyRules()</kbd>
<samp class=p>>>> </samp><kbd>r1.rules_filename</kbd> <span>&#x2460;</span>
<samp class=p>>>> </samp><kbd>r1.rules_filename</kbd> <span class=u>&#x2460;</span>
<samp>'plural6-rules.txt'</samp>
<samp class=p>>>> </samp><kbd>r2.rules_filename</kbd>
<samp>'plural6-rules.txt'</samp>
<samp class=p>>>> </samp><kbd>r1.__class__.rules_filename</kbd> <span>&#x2461;</span>
<samp class=p>>>> </samp><kbd>r1.__class__.rules_filename</kbd> <span class=u>&#x2461;</span>
<samp>'plural6-rules.txt'</samp>
<samp class=p>>>> </samp><kbd>r1.__class__.rules_filename = 'papayawhip.txt'</kbd> <span>&#x2462;</span>
<samp class=p>>>> </samp><kbd>r1.__class__.rules_filename = 'papayawhip.txt'</kbd> <span class=u>&#x2462;</span>
<samp class=p>>>> </samp><kbd>r1.rules_filename</kbd>
<samp>'papayawhip.txt'</samp>
<samp class=p>>>> </samp><kbd>r2.rules_filename</kbd> <span>&#x2463;</span>
<samp class=p>>>> </samp><kbd>r2.rules_filename</kbd> <span class=u>&#x2463;</span>
<samp>'papayawhip.txt'</samp></pre>
<ol>
<li>FIXME
@@ -285,9 +283,9 @@ rules = LazyRules()</code></pre>
<p>And now back to our show.
<pre><code><a> def __iter__(self): <span>&#x2460;</span></a>
<a> self.cache_index = 0 <span>&#x2461;</span></a>
<a> return self <span>&#x2462;</span></a>
<pre><code class=pp><a> def __iter__(self): <span class=u>&#x2460;</span></a>
<a> self.cache_index = 0 <span class=u>&#x2461;</span></a>
<a> return self <span class=u>&#x2462;</span></a>
</code></pre>
<ol>
<li>The <code>__iter__()</code> method will be called every time someone &mdash; say, a <code>for</code> loop &mdash; calls <code>iter(rules)</code>.
@@ -295,14 +293,14 @@ rules = LazyRules()</code></pre>
<li>Finally, the <code>__iter__()</code> method returns <var>self</var>, which signals that this class will take care of returning its own values throughout an iteration.
</ol>
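<p>The <code>iter()</code> and <code>next()</code> calls that a <code>for</code> loop makes can be spelled out by hand. This is only a sketch: it reuses the <code>Fib</code> class shown earlier in this chapter, and <code>manual_for()</code> is a hypothetical helper written for illustration, not how Python implements loops internally.

```python
class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a, self.b = 0, 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


def manual_for(iterable):
    '''hypothetical helper: roughly what "for value in iterable" does'''
    results = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            value = next(iterator)   # calls iterator.__next__()
        except StopIteration:
            break                    # the loop ends quietly
        results.append(value)
    return results


values = manual_for(Fib(20))
print(values)
```

<p>Because <code>__iter__()</code> resets the state and returns <var>self</var>, the same object can be looped over again from the start.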
<pre><code><a> def __next__(self): <span>&#x2460;</span></a>
<pre><code class=pp><a> def __next__(self): <span class=u>&#x2460;</span></a>
.
.
.
pattern, search, replace = line.split(None, 3)
<a> funcs = build_match_and_apply_functions( <span>&#x2461;</span></a>
<a> funcs = build_match_and_apply_functions( <span class=u>&#x2461;</span></a>
pattern, search, replace)
<a> self.cache.append(funcs) <span>&#x2462;</span></a>
<a> self.cache.append(funcs) <span class=u>&#x2462;</span></a>
return funcs</code></pre>
<ol>
<li>The <code>__next__()</code> method gets called whenever someone &mdash; say, a <code>for</code> loop &mdash; calls <code>next(rules)</code>. This method will only make sense if we start at the end and work backwards. So let&#8217;s do that.
@@ -312,32 +310,32 @@ rules = LazyRules()</code></pre>
<p>Moving backwards&hellip;
<pre><code> def __next__(self):
<pre><code class=pp> def __next__(self):
.
.
.
<a> line = self.pattern_file.readline() <span>&#x2460;</span></a>
<a> if not line: <span>&#x2461;</span></a>
<a> line = self.pattern_file.readline() <span class=u>&#x2460;</span></a>
<a> if not line: <span class=u>&#x2461;</span></a>
self.pattern_file.close()
<a> raise StopIteration <span>&#x2462;</span></a>
<a> raise StopIteration <span class=u>&#x2462;</span></a>
.
.
.</code></pre>
<ol>
<li>A bit of advanced file trickery here. The <code>readline()</code> method (note: singular, not the plural <code>readlines()</code>) reads exactly one line from an open file. Specifically, the next line. (<em>File objects are iterators too! It&#8217;s iterators all the way down&hellip;</em>)
<li>If there was a line for <code>readline()</code> to read, <var>line</var> will not be an empty string. Even if the file contained a blank line, <var>line</var> would end up as the one-character string <code>'\n'</code> (a newline). If <var>line</var> is really an empty string, that means there are no more lines to read from the file.
<li>When we reach the end of the file, we should close the file and raise the magic <code>StopIteration</code> exception. Remember, we got to this point because we needed a match and apply function for the next rule. The next rule comes from the next line of the file&hellip; but there is no next line! Therefore, we have no value to return. The iteration is over. (<span>&#x266B;</span> The party&#8217;s over&hellip; <span>&#x266B;</span>)
<li>When we reach the end of the file, we should close the file and raise the magic <code>StopIteration</code> exception. Remember, we got to this point because we needed a match and apply function for the next rule. The next rule comes from the next line of the file&hellip; but there is no next line! Therefore, we have no value to return. The iteration is over. (<span class=u>&#x266B;</span> The party&#8217;s over&hellip; <span class=u>&#x266B;</span>)
</ol>
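<p>The difference between a blank line and the end of the file is easy to check in isolation. The sketch below uses an in-memory <code>io.StringIO</code> object as a stand-in for the rules file; that substitution is an assumption made so the example runs without any file on disk.

```python
import io

# stand-in for an open text file: two real lines with a blank line between
f = io.StringIO('first\n\nlast\n')

lines = []
while True:
    line = f.readline()
    if not line:          # empty string: nothing left to read
        break
    lines.append(line)    # a blank line arrives as '\n', which is truthy

print(lines)
at_eof = f.readline()     # still an empty string once the end is reached
f.close()
```

<p>The blank line in the middle does not stop the loop; only the empty string returned at the end of the file does.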
<p>Moving backwards all the way to the start of the <code>__next__()</code> method&hellip;
<pre><code> def __next__(self):
<pre><code class=pp> def __next__(self):
self.cache_index += 1
if len(self.cache) >= self.cache_index:
<a> return self.cache[self.cache_index - 1] <span>&#x2460;</span></a>
<a> return self.cache[self.cache_index - 1] <span class=u>&#x2460;</span></a>
if self.pattern_file.closed:
<a> raise StopIteration <span>&#x2461;</span></a>
<a> raise StopIteration <span class=u>&#x2461;</span></a>
.
.
.</code></pre>
@@ -374,8 +372,9 @@ rules = LazyRules()</code></pre>
<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
</ul>
<p class=v><a href=generators.html rel=prev title='back to &#8220;Generators&#8221;'><span>&#x261C;</span></a> <a href=advanced-iterators.html rel=next title='onward to &#8220;Advanced Iterators&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=generators.html rel=prev title='back to &#8220;Generators&#8221;'><span class=u>&#x261C;</span></a> <a href=advanced-iterators.html rel=next title='onward to &#8220;Advanced Iterators&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+4 -2
@@ -29,7 +29,8 @@ POSSIBILITY OF SUCH DAMAGE.
var HS = {'visible': 'hide', 'hidden': 'show'};
$(document).ready(function() {
hideTOC();
prettyPrint();
/* "hide", "open in new window", and (optionally) "download" widgets on code & screen blocks */
$("pre > code").each(function(i) {
var pre = $(this.parentNode);
@@ -90,6 +91,7 @@ $(document).ready(function() {
}
});
});
}); /* document.ready */
function toggleCodeBlock(id) {
@@ -100,7 +102,7 @@ function toggleCodeBlock(id) {
function plainTextOnClick(id) {
var clone = $("#" + id).clone();
clone.find("div.w, span").remove();
clone.find("div.w, span.u").remove();
var win = window.open("about:blank", "plaintext", "toolbar=0,scrollbars=1,location=0,statusbar=0,menubar=0,resizable=1,width=600,height=400,left=35,top=75");
win.document.open();
win.document.write('<pre>' + clone.html());
+1427
File diff suppressed because it is too large
+79 -78
View File
@@ -12,11 +12,11 @@ body{counter-reset:h1 2}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#native-datatypes>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#native-datatypes>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=beginner>&#x2666;&#x2666;&#x2662;&#x2662;&#x2662;</span>
<h1>Native Datatypes</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Wonder is the foundation of all philosophy, inquiry its progress, ignorance its end. <span>&#x275E;</span><br>&mdash; Michel de Montaigne
<p><span class=u>&#x275D;</span> Wonder is the foundation of all philosophy, inquiry its progress, ignorance its end. <span class=u>&#x275E;</span><br>&mdash; Michel de Montaigne
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -39,7 +39,7 @@ body{counter-reset:h1 2}
<aside>You can use virtually any expression in a boolean context.</aside>
<p>Booleans are either true or false. Python has two constants, <code>True</code> and <code>False</code>, which can be used to assign boolean values directly. Expressions can also evaluate to a boolean value. In certain places (like <code>if</code> statements), Python expects an expression to evaluate to a boolean value. These places are called <i>boolean contexts</i>. You can use virtually any expression in a boolean context, and Python will try to determine its truth value. Different datatypes have different rules about which values are true or false in a boolean context. (This will make more sense once you see some concrete examples later in this chapter.)
<p>For example, take this snippet from <a href=your-first-python-program.html#divingin><code>humansize.py</code></a>:
<pre><code>if size &lt; 0:
<pre><code class=pp>if size &lt; 0:
raise ValueError('number must be non-negative')</code></pre>
<p><var>size</var> is an integer, <code>0</code> is an integer, and <code>&lt;</code> is a numerical operator. The result of the expression <code>size &lt; 0</code> is always a boolean. You can test this yourself in the Python interactive shell:
<pre class=screen>
@@ -57,11 +57,11 @@ body{counter-reset:h1 2}
<h2 id=numbers>Numbers</h2>
<p>Numbers are awesome. There are so many to choose from. Python supports both integers and floating point numbers. There&#8217;s no type declaration to distinguish them; Python tells them apart by the presence or absence of a decimal point.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>type(1)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>type(1)</kbd> <span class=u>&#x2460;</span></a>
<samp>&lt;class 'int'></samp>
<a><samp class=p>>>> </samp><kbd>1 + 1</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>1 + 1</kbd> <span class=u>&#x2461;</span></a>
<samp>2</samp>
<a><samp class=p>>>> </samp><kbd>1 + 1.0</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>1 + 1.0</kbd> <span class=u>&#x2462;</span></a>
<samp>2.0</samp>
<samp class=p>>>> </samp><kbd>type(2.0)</kbd>
<samp>&lt;class 'float'></samp></pre>
@@ -73,17 +73,17 @@ body{counter-reset:h1 2}
<h3 id=number-coercion>Coercing Integers To Floats And Vice-Versa</h3>
<p>As you just saw, some operators (like addition) will coerce integers to floating point numbers as needed. You can also coerce them by yourself.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>float(2)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>float(2)</kbd> <span class=u>&#x2460;</span></a>
<samp>2.0</samp>
<a><samp class=p>>>> </samp><kbd>int(2.0)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>int(2.0)</kbd> <span class=u>&#x2461;</span></a>
<samp>2</samp>
<a><samp class=p>>>> </samp><kbd>int(2.5)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>int(2.5)</kbd> <span class=u>&#x2462;</span></a>
<samp>2</samp>
<a><samp class=p>>>> </samp><kbd>int(-2.5)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>int(-2.5)</kbd> <span class=u>&#x2463;</span></a>
<samp>-2</samp>
<a><samp class=p>>>> </samp><kbd>1.12345678901234567890</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>1.12345678901234567890</kbd> <span class=u>&#x2464;</span></a>
<samp>1.1234567890123457</samp>
<a><samp class=p>>>> </samp><kbd>type(1000000000000000)</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>type(1000000000000000)</kbd> <span class=u>&#x2465;</span></a>
<samp>&lt;class 'int'></samp></pre>
<ol>
<li>You can explicitly coerce an <code>int</code> to a <code>float</code> by calling the <code>float()</code> function.
@@ -94,22 +94,22 @@ body{counter-reset:h1 2}
<li>Integers can be arbitrarily large.
</ol>
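<p>A short sketch of the coercion rules above. The comparison with <code>math.floor()</code> is an addition for contrast and is not part of the original example: <code>int()</code> truncates toward zero, while <code>math.floor()</code> always rounds down.

```python
import math

as_float = float(2)              # int -> float
truncated_pos = int(2.5)         # fractional part dropped
truncated_neg = int(-2.5)        # truncates toward zero, not downward
floored_neg = math.floor(-2.5)   # rounds down, for contrast
big = 2 ** 100                   # integers can be arbitrarily large

print(as_float, truncated_pos, truncated_neg, floored_neg)
print(type(big), big)
```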
<blockquote class='note compare python2'>
<p><span>&#x261E;</span>Python 2 had separate types for <code>int</code> and <code>long</code>. The <code>int</code> datatype was limited by <code>sys.maxint</code>, which varied by platform but was usually <code>2<sup>31</sup>-1</code>. Python 3 has just one integer type, which behaves mostly like the old <code>long</code> type from Python 2. See <a href=http://www.python.org/dev/peps/pep-0237><abbr>PEP</abbr> 237</a> for details.
<p><span class=u>&#x261E;</span>Python 2 had separate types for <code>int</code> and <code>long</code>. The <code>int</code> datatype was limited by <code>sys.maxint</code>, which varied by platform but was usually <code>2<sup>31</sup>-1</code>. Python 3 has just one integer type, which behaves mostly like the old <code>long</code> type from Python 2. See <a href=http://www.python.org/dev/peps/pep-0237><abbr>PEP</abbr> 237</a> for details.
</blockquote>
<h3 id=common-numerical-operations>Common Numerical Operations</h3>
<p>You can do all kinds of things with numbers.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>11 / 2</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>11 / 2</kbd> <span class=u>&#x2460;</span></a>
<samp>5.5</samp>
<a><samp class=p>>>> </samp><kbd>11 // 2</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>11 // 2</kbd> <span class=u>&#x2461;</span></a>
<samp>5</samp>
<a><samp class=p>>>> </samp><kbd>&minus;11 // 2</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>&minus;11 // 2</kbd> <span class=u>&#x2462;</span></a>
<samp>&minus;6</samp>
<a><samp class=p>>>> </samp><kbd>11.0 // 2</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>11.0 // 2</kbd> <span class=u>&#x2463;</span></a>
<samp>5.0</samp>
<a><samp class=p>>>> </samp><kbd>11 ** 2</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>11 ** 2</kbd> <span class=u>&#x2464;</span></a>
<samp>121</samp>
<a><samp class=p>>>> </samp><kbd>11 % 2</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>11 % 2</kbd> <span class=u>&#x2465;</span></a>
<samp>1</samp>
</pre>
<ol>
@@ -121,18 +121,18 @@ body{counter-reset:h1 2}
<li>The <code>%</code> operator gives the remainder after performing integer division. <code>11</code> divided by <code>2</code> is <code>5</code> with a remainder of <code>1</code>, so the result here is <code>1</code>.
</ol>
<blockquote class='note compare python2'>
<p><span>&#x261E;</span>In Python 2, the <code>/</code> operator usually meant integer division, but you could make it behave like floating point division by including a special directive in your code. In Python 3, the <code>/</code> operator always means floating point division. See <a href=http://www.python.org/dev/peps/pep-0238/><abbr>PEP</abbr> 238</a> for details.
<p><span class=u>&#x261E;</span>In Python 2, the <code>/</code> operator usually meant integer division, but you could make it behave like floating point division by including a special directive in your code. In Python 3, the <code>/</code> operator always means floating point division. See <a href=http://www.python.org/dev/peps/pep-0238/><abbr>PEP</abbr> 238</a> for details.
</blockquote>
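<p>The division operators described above can be sketched side by side. This is a small illustration rather than part of the original session; the variable names are assumptions chosen for readability.

```python
# True division always produces a float in Python 3, even for whole results.
true_quotient = 11 / 2          # 5.5
whole_but_float = 4 / 2         # 2.0, not 2

# Floor division rounds toward negative infinity, not toward zero.
floor_quotient = 11 // 2        # 5
negative_floor = -11 // 2       # -6

# The remainder pairs with floor division: (a // b) * b + (a % b) == a
remainder = 11 % 2              # 1
negative_remainder = -11 % 2    # 1

# divmod() returns the floor quotient and the remainder in one call.
print(divmod(11, 2))            # (5, 1)
print(divmod(-11, 2))           # (-6, 1)
```

<p>The identity in the comment is why <code>&minus;11 % 2</code> is <code>1</code> rather than <code>&minus;1</code>: the remainder takes up whatever floor division left behind.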
<h3 id=fractions>Fractions</h3>
<p>Python isn&#8217;t limited to integers and floating point numbers. It can also do all the fancy math you learned in high school and promptly forgot about.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>import fractions</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>x = fractions.Fraction(1, 3)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>import fractions</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>x = fractions.Fraction(1, 3)</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>x</kbd>
<samp>Fraction(1, 3)</samp>
<a><samp class=p>>>> </samp><kbd>x * 2</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>x * 2</kbd> <span class=u>&#x2462;</span></a>
<samp>Fraction(2, 3)</samp>
<a><samp class=p>>>> </samp><kbd>fractions.Fraction(6, 4)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>fractions.Fraction(6, 4)</kbd> <span class=u>&#x2463;</span></a>
<samp>Fraction(3, 2)</samp></pre>
<ol>
<li>To start using fractions, import the <code>fractions</code> module.
@@ -144,11 +144,11 @@ body{counter-reset:h1 2}
<p>You can also do basic trigonometry in Python.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import math</kbd>
<a><samp class=p>>>> </samp><kbd>math.pi</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>math.pi</kbd> <span class=u>&#x2460;</span></a>
<samp>3.1415926535897931</samp>
<a><samp class=p>>>> </samp><kbd>math.sin(math.pi / 2)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>math.sin(math.pi / 2)</kbd> <span class=u>&#x2461;</span></a>
<samp>1.0</samp>
<a><samp class=p>>>> </samp><kbd>math.tan(math.pi / 4)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>math.tan(math.pi / 4)</kbd> <span class=u>&#x2462;</span></a>
<samp>0.99999999999999989</samp></pre>
<ol>
<li>The <code>math</code> module has a constant for &pi;, the ratio of a circle&#8217;s circumference to its diameter.
@@ -159,24 +159,24 @@ body{counter-reset:h1 2}
<aside>Zero values are false, and non-zero values are true.</aside>
<p>You can use numbers <a href=#booleans>in a boolean context</a>, such as an <code>if</code> statement. Zero values are false, and non-zero values are true.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>def is_it_true(anything):</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>def is_it_true(anything):</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>... </samp><kbd>    if anything:</kbd>
<samp class=p>... </samp><kbd>        print("yes, it's true")</kbd>
<samp class=p>... </samp><kbd>    else:</kbd>
<samp class=p>... </samp><kbd>        print("no, it's false")</kbd>
<samp class=p>...</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true(1)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true(1)</kbd> <span class=u>&#x2461;</span></a>
<samp>yes, it's true</samp>
<samp class=p>>>> </samp><kbd>is_it_true(-1)</kbd>
<samp>yes, it's true</samp>
<samp class=p>>>> </samp><kbd>is_it_true(0)</kbd>
<samp>no, it's false</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true(0.1)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true(0.1)</kbd> <span class=u>&#x2462;</span></a>
<samp>yes, it's true</samp>
<samp class=p>>>> </samp><kbd>is_it_true(0.0)</kbd>
<samp>no, it's false</samp>
<samp class=p>>>> </samp><kbd>import fractions</kbd>
<a><samp class=p>>>> </samp><kbd>is_it_true(fractions.Fraction(1, 2))</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true(fractions.Fraction(1, 2))</kbd> <span class=u>&#x2463;</span></a>
<samp>yes, it's true</samp>
<samp class=p>>>> </samp><kbd>is_it_true(fractions.Fraction(0, 1))</kbd>
<samp>no, it's false</samp></pre>
@@ -191,24 +191,24 @@ body{counter-reset:h1 2}
<h2 id=lists>Lists</h2>
<p>Lists are Python&#8217;s workhorse datatype. When I say &#8220;list,&#8221; you might be thinking &#8220;array whose size I have to declare in advance, that can only contain items of the same type, <i class=baa>&amp;</i>c.&#8221; Don&#8217;t think that. Lists are much cooler than that.
<blockquote class='note compare perl5'>
<p><span>&#x261E;</span>A list in Python is like an array in Perl 5. In Perl 5, variables that store arrays always start with the <code>@</code> character; in Python, variables can be named anything, and Python keeps track of the datatype internally.
<p><span class=u>&#x261E;</span>A list in Python is like an array in Perl 5. In Perl 5, variables that store arrays always start with the <code>@</code> character; in Python, variables can be named anything, and Python keeps track of the datatype internally.
</blockquote>
<blockquote class='note compare java'>
<p><span>&#x261E;</span>A list in Python is much more than an array in Java (although it can be used as one if that&#8217;s really all you want out of life). A better analogy would be to the <code>ArrayList</code> class, which can hold arbitrary objects and can expand dynamically as new items are added.
<p><span class=u>&#x261E;</span>A list in Python is much more than an array in Java (although it can be used as one if that&#8217;s really all you want out of life). A better analogy would be to the <code>ArrayList</code> class, which can hold arbitrary objects and can expand dynamically as new items are added.
</blockquote>
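<p>To make the comparison concrete, here is a brief sketch of a list holding values of different types and growing on demand. The contents of <code>mixed</code> are made up for illustration.

```python
# A list needs no declared size and no single element type.
mixed = ['a', 2.0, 3, True]

# It grows as items are added; even another list can be an item.
mixed.append(None)
mixed.append(['nested', 'list'])

print(len(mixed))                 # 6
print([type(item).__name__ for item in mixed])
```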
<h3 id=creatinglists>Creating A List</h3>
<p>Creating a list is easy: use square brackets to wrap a comma-separated list of values.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>a_list = ['a', 'b', 'mpilgrim', 'z', 'example']</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list = ['a', 'b', 'mpilgrim', 'z', 'example']</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
['a', 'b', 'mpilgrim', 'z', 'example']
<a><samp class=p>>>> </samp><kbd>a_list[0]</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[0]</kbd> <span class=u>&#x2461;</span></a>
<samp>'a'</samp>
<a><samp class=p>>>> </samp><kbd>a_list[4]</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[4]</kbd> <span class=u>&#x2462;</span></a>
<samp>'example'</samp>
<a><samp class=p>>>> </samp><kbd>a_list[-1]</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[-1]</kbd> <span class=u>&#x2463;</span></a>
<samp>'example'</samp>
<a><samp class=p>>>> </samp><kbd>a_list[-3]</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[-3]</kbd> <span class=u>&#x2464;</span></a>
<samp>'mpilgrim'</samp></pre>
<ol>
<li>First, you define a list of five items. Note that they retain their original order. This is not an accident. A list is an ordered set of items.
@@ -223,17 +223,17 @@ body{counter-reset:h1 2}
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 'b', 'mpilgrim', 'z', 'example']</samp>
<a><samp class=p>>>> </samp><kbd>a_list[1:3]</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[1:3]</kbd> <span class=u>&#x2460;</span></a>
<samp>['b', 'mpilgrim']</samp>
<a><samp class=p>>>> </samp><kbd>a_list[1:-1]</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[1:-1]</kbd> <span class=u>&#x2461;</span></a>
<samp>['b', 'mpilgrim', 'z']</samp>
<a><samp class=p>>>> </samp><kbd>a_list[0:3]</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[0:3]</kbd> <span class=u>&#x2462;</span></a>
<samp>['a', 'b', 'mpilgrim']</samp>
<a><samp class=p>>>> </samp><kbd>a_list[:3]</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[:3]</kbd> <span class=u>&#x2463;</span></a>
<samp>['a', 'b', 'mpilgrim']</samp>
<a><samp class=p>>>> </samp><kbd>a_list[3:]</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[3:]</kbd> <span class=u>&#x2464;</span></a>
<samp>['z', 'example']</samp>
<a><samp class=p>>>> </samp><kbd>a_list[:]</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list[:]</kbd> <span class=u>&#x2465;</span></a>
['a', 'b', 'mpilgrim', 'z', 'example']</pre>
<ol>
<li>You can get a part of a list, called a &#8220;slice&#8221;, by specifying two indices. The return value is a new list containing all the items of the list, in order, starting with the first slice index (in this case <code>a_list[1]</code>), up to but not including the second slice index (in this case <code>a_list[3]</code>).
@@ -247,16 +247,16 @@ body{counter-reset:h1 2}
<p>There are four ways to add items to a list.
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_list = ['a']</kbd>
<a><samp class=p>>>> </samp><kbd>a_list = a_list + [2.0, 3]</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list = a_list + [2.0, 3]</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 2.0, 3]</samp>
<a><samp class=p>>>> </samp><kbd>a_list.append(True)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.append(True)</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 2.0, 3, True]</samp>
<a><samp class=p>>>> </samp><kbd>a_list.extend(['four', 'e'])</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.extend(['four', 'e'])</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 2.0, 3, True, 'four', 'e']</samp>
<a><samp class=p>>>> </samp><kbd>a_list.insert(1, 'a')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.insert(1, 'a')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 'a', 2.0, 3, True, 'four', 'e']</samp></pre>
<ol>
@@ -268,17 +268,17 @@ body{counter-reset:h1 2}
<p>Let&#8217;s look closer at the difference between <code>append()</code> and <code>extend()</code>.
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_list = ['a', 'b', 'c']</kbd>
<a><samp class=p>>>> </samp><kbd>a_list.extend(['d', 'e', 'f'])</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.extend(['d', 'e', 'f'])</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 'b', 'c', 'd', 'e', 'f']</samp>
<a><samp class=p>>>> </samp><kbd>len(a_list)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>len(a_list)</kbd> <span class=u>&#x2461;</span></a>
<samp>6</samp>
<samp class=p>>>> </samp><kbd>a_list[-1]</kbd>
<samp>'f'</samp>
<a><samp class=p>>>> </samp><kbd>a_list.append(['g', 'h', 'i'])</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.append(['g', 'h', 'i'])</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['a', 'b', 'c', 'd', 'e', 'f', ['g', 'h', 'i']]</samp>
<a><samp class=p>>>> </samp><kbd>len(a_list)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>len(a_list)</kbd> <span class=u>&#x2463;</span></a>
<samp>7</samp>
<samp class=p>>>> </samp><kbd>a_list[-1]</kbd>
<samp>['g', 'h', 'i']</samp></pre>
@@ -291,15 +291,15 @@ body{counter-reset:h1 2}
<h3 id=searchinglists>Searching For Values In A List</h3>
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_list = ['a', 'b', 'new', 'mpilgrim', 'new']</kbd>
<a><samp class=p>>>> </samp><kbd>'mpilgrim' in a_list</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>'mpilgrim' in a_list</kbd> <span class=u>&#x2460;</span></a>
<samp>True</samp>
<a><samp class=p>>>> </samp><kbd>a_list.index('mpilgrim')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.index('mpilgrim')</kbd> <span class=u>&#x2461;</span></a>
<samp>3</samp>
<a><samp class=p>>>> </samp><kbd>a_list.index('new')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.index('new')</kbd> <span class=u>&#x2462;</span></a>
<samp>2</samp>
<a><samp class=p>>>> </samp><kbd>'c' in a_list</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>'c' in a_list</kbd> <span class=u>&#x2463;</span></a>
<samp>False</samp>
<a><samp class=p>>>> </samp><kbd>a_list.index('c')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list.index('c')</kbd> <span class=u>&#x2464;</span></a>
<samp class=traceback>Traceback (innermost last):
File "&lt;interactive input>", line 1, in ?
ValueError: list.index(x): x not in list</samp></pre>
@@ -320,11 +320,11 @@ ValueError: list.index(x): x not in list</samp></pre>
<samp class=p>... </samp><kbd>    else:</kbd>
<samp class=p>... </samp><kbd>        print("no, it's false")</kbd>
<samp class=p>...</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true([])</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true([])</kbd> <span class=u>&#x2461;</span></a>
<samp>no, it's false</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true(['a'])</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true(['a'])</kbd> <span class=u>&#x2462;</span></a>
<samp>yes, it's true</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true([False])</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true([False])</kbd> <span class=u>&#x2463;</span></a>
<samp>yes, it's true</samp></pre>
<ol>
<li>In a boolean context, an empty list is false.
@@ -342,19 +342,19 @@ ValueError: list.index(x): x not in list</samp></pre>
<h2 id=dictionaries>Dictionaries</h2>
<p>One of Python&#8217;s most important datatypes is the dictionary, which defines one-to-one relationships between keys and values.
<blockquote class='note compare perl5'>
<p><span>&#x261E;</span>A dictionary in Python is like a hash in Perl 5. In Perl 5, variables that store hashes always start with a <code>%</code> character. In Python, variables can be named anything, and Python keeps track of the datatype internally.
<p><span class=u>&#x261E;</span>A dictionary in Python is like a hash in Perl 5. In Perl 5, variables that store hashes always start with a <code>%</code> character. In Python, variables can be named anything, and Python keeps track of the datatype internally.
</blockquote>
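<p>The key-to-value relationship can be sketched in a few lines. The dictionary contents mirror the chapter&#8217;s own example; using <code>in</code> and <code>get()</code> to avoid a <code>KeyError</code> is an addition made here for illustration.

```python
a_dict = {'server': 'db.diveintopython3.org', 'database': 'mysql'}

# Lookups go from key to value, never from value to key.
print(a_dict['server'])                       # db.diveintopython3.org
print('server' in a_dict)                     # True
print('db.diveintopython3.org' in a_dict)     # False: that is a value, not a key

# get() returns a fallback instead of raising KeyError for a missing key.
user = a_dict.get('user', 'unknown')
print(user)                                   # unknown
```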
<h3 id=creating-dictionaries>Creating A Dictionary</h3>
<p>Creating a dictionary is easy. The syntax is similar to <a href=#sets>sets</a>, but instead of values, you have key-value pairs. Once you have a dictionary, you can look up values by their key.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>a_dict = {'server':'db.diveintopython3.org', 'database':'mysql'}</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict = {'server':'db.diveintopython3.org', 'database':'mysql'}</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>a_dict</kbd>
<samp>{'server': 'db.diveintopython3.org', 'database': 'mysql'}</samp>
<a><samp class=p>>>> </samp><kbd>a_dict['server']</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['server']</kbd> <span class=u>&#x2461;</span></a>
'db.diveintopython3.org'
<a><samp class=p>>>> </samp><kbd>a_dict['database']</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['database']</kbd> <span class=u>&#x2462;</span></a>
'mysql'
<a><samp class=p>>>> </samp><kbd>a_dict['db.diveintopython3.org']</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['db.diveintopython3.org']</kbd> <span class=u>&#x2463;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
KeyError: 'db.diveintopython3.org'</samp></pre>
@@ -369,16 +369,16 @@ KeyError: 'db.diveintopython3.org'</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_dict</kbd>
<samp>{'server': 'db.diveintopython3.org', 'database': 'mysql'}</samp>
<a><samp class=p>>>> </samp><kbd>a_dict['database'] = 'blog'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['database'] = 'blog'</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>a_dict</kbd>
<samp>{'server': 'db.diveintopython3.org', 'database': 'blog'}</samp>
<a><samp class=p>>>> </samp><kbd>a_dict['user'] = 'mark'</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['user'] = 'mark'</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict</kbd> <span class=u>&#x2462;</span></a>
<samp>{'server': 'db.diveintopython3.org', 'user': 'mark', 'database': 'blog'}</samp>
<a><samp class=p>>>> </samp><kbd>a_dict['user'] = 'dora'</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['user'] = 'dora'</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>a_dict</kbd>
<samp>{'server': 'db.diveintopython3.org', 'user': 'dora', 'database': 'blog'}</samp>
<a><samp class=p>>>> </samp><kbd>a_dict['User'] = 'mark'</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict['User'] = 'mark'</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp><kbd>a_dict</kbd>
<samp>{'User': 'mark', 'server': 'db.diveintopython3.org', 'user': 'dora', 'database': 'blog'}</samp></pre>
<ol>
@@ -391,19 +391,19 @@ KeyError: 'db.diveintopython3.org'</samp></pre>
<h3 id=mixed-value-dictionaries>Mixed-Value Dictionaries</h3>
<p>Dictionaries aren&#8217;t just for strings. Dictionary values can be any datatype, including integers, booleans, arbitrary objects, or even other dictionaries. And within a single dictionary, the values don&#8217;t all need to be the same type; you can mix and match as needed. Dictionary keys are more restricted, but they can be strings, integers, and a few other types. You can also mix and match key datatypes within a dictionary.
<p>In fact, you&#8217;ve already seen a dictionary with non-string keys and values, in <a href=your-first-python-program.html#divingin>your first Python program</a>.
<pre><code>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
<pre><code class=pp>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}</code></pre>
<p>Let's tear that apart in the interactive shell.
<pre class=screen>
<samp class=p>>>> </samp><kbd>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],</kbd>
<samp class=p>... </samp><kbd> 1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}</kbd>
<a><samp class=p>>>> </samp><kbd>len(SUFFIXES)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>len(SUFFIXES)</kbd> <span class=u>&#x2460;</span></a>
<samp>2</samp>
<a><samp class=p>>>> </samp><kbd>SUFFIXES[1000]</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>SUFFIXES[1000]</kbd> <span class=u>&#x2461;</span></a>
<samp>['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']</samp>
<a><samp class=p>>>> </samp><kbd>SUFFIXES[1024]</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>SUFFIXES[1024]</kbd> <span class=u>&#x2462;</span></a>
<samp>['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']</samp>
<a><samp class=p>>>> </samp><kbd>SUFFIXES[1000][3]</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>SUFFIXES[1000][3]</kbd> <span class=u>&#x2463;</span></a>
<samp>'TB'</samp></pre>
<ol>
<li>As with <a href=#lists>lists</a><!-- and <a href=#sets>sets</a>-->, the <code>len()</code> function gives you the number of items in a dictionary.
@@ -421,9 +421,9 @@ KeyError: 'db.diveintopython3.org'</samp></pre>
<samp class=p>... </samp><kbd>    else:</kbd>
<samp class=p>... </samp><kbd>        print("no, it's false")</kbd>
<samp class=p>...</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true({})</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true({})</kbd> <span class=u>&#x2460;</span></a>
<samp>no, it's false</samp>
<a><samp class=p>>>> </samp><kbd>is_it_true({'a': 1})</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>is_it_true({'a': 1})</kbd> <span class=u>&#x2461;</span></a>
<samp>yes, it's true</samp></pre>
<ol>
<li>In a boolean context, an empty dictionary is false.
@@ -474,7 +474,8 @@ KeyError: 'db.diveintopython3.org'</samp></pre>
<li><a href=http://www.python.org/dev/peps/pep-0237/><abbr>PEP</abbr> 237: Unifying Long Integers and Integers</a>
<li><a href=http://www.python.org/dev/peps/pep-0238/><abbr>PEP</abbr> 238: Changing the Division Operator</a>
</ul>
<p class=v><a href=your-first-python-program.html rel=prev title='back to &#8220;Your First Python Program&#8221;'><span>&#x261C;</span></a> <a href=strings.html rel=next title='onward to &#8220;Strings&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=your-first-python-program.html rel=prev title='back to &#8220;Your First Python Program&#8221;'><span class=u>&#x261C;</span></a> <a href=strings.html rel=next title='onward to &#8220;Strings&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
File diff suppressed because it is too large
+13
@@ -110,3 +110,16 @@ aside {
.w, .d, form, form + p, #level, #toc {
display: none !important;
}
/* syntax highlighting */
.str { color: #060; }
.kwd { color: #006; font-weight: bold; }
.com { color: #600; font-style: italic; }
.typ { color: #404; font-weight: bold; }
.lit { color: #044; }
.pun { color: #440; }
.pln { color: #000; }
.tag { color: #006; font-weight: bold; }
.atn { color: #404; }
.atv { color: #060; }
+4 -2
@@ -31,12 +31,14 @@ done
# minimize JS and CSS
echo "minimizing JS"
revision=`hg log|grep changeset|cut -d":" -f3|head -1`
java -jar util/yuicompressor-2.4.2.jar build/j/dip3.js > build/j/dip3-$revision.js
java -jar util/yuicompressor-2.4.2.jar build/j/prettify.js > build/j/prettify.min.js
java -jar util/yuicompressor-2.4.2.jar build/j/dip3.js > build/j/dip3.min.js
# combine jQuery and our script
echo "combining JS"
cat build/j/jquery.min.js build/j/dip3-$revision.js > build/j/$revision.js
cat build/j/jquery.min.js build/j/prettify.min.js build/j/dip3.min.js > build/j/$revision.js
sed -i -e "s|<script src=j/jquery.js></script>||g" build/*.html
sed -i -e "s|<script src=j/prettify.js></script>||g" build/*.html
sed -i -e "s|<script src=j/dip3.js>|<script src=j/${revision}.js>|g" build/*.html
echo "minimizing CSS"
+33 -34
@@ -12,18 +12,18 @@ body{counter-reset:h1 10}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#refactoring>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#refactoring>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
<h1>Refactoring</h1>
<blockquote class=q>
<p><span>&#x275D;</span> After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Fr%C3%A9d%C3%A9ric_Chopin>Fr&eacute;d&eacute;ric Chopin</a>
<p><span class=u>&#x275D;</span> After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Fr%C3%A9d%C3%A9ric_Chopin>Fr&eacute;d&eacute;ric Chopin</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
<p class=f>Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by &#8220;bug&#8221;? A bug is a test case you haven&#8217;t written yet.
<pre class=screen><samp class=p>>>> </samp><kbd>import roman7</kbd>
<a><samp class=p>>>> </samp><kbd>roman7.from_roman('')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>roman7.from_roman('')</kbd> <span class=u>&#x2460;</span></a>
<samp>0</samp></pre>
<ol>
<li>Remember in the [FIXME-xref] previous section when you kept seeing that an empty string would match the regular expression you were using to check for valid Roman numerals? Well, it turns out that this is still true for the final version of the regular expression. And that&#8217;s a bug; you want an empty string to raise an <code>InvalidRomanNumeralError</code> exception just like any other sequence of characters that don&#8217;t represent a valid Roman numeral.
@@ -31,13 +31,13 @@ body{counter-reset:h1 10}
<p>After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.
<pre><code>class FromRomanBadInput(unittest.TestCase):
<pre><code class=pp>class FromRomanBadInput(unittest.TestCase):
    .
    .
    .
    def testBlank(self):
        '''from_roman should fail with blank string'''
<a>        self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, '') <span>&#x2460;</span></a>
<a>        self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, '') <span class=u>&#x2460;</span></a></code></pre>
<ol>
<li>Pretty simple stuff here. Call <code>from_roman()</code> with an empty string and make sure it raises an <code>InvalidRomanNumeralError</code> exception. The hard part was finding the bug; now that you know about it, testing for it is the easy part.
</ol>
@@ -72,9 +72,9 @@ FAILED (failures=1)</samp></pre>
<p><em>Now</em> you can fix the bug.
<pre><code>def from_roman(s):
<pre><code class=pp>def from_roman(s):
    '''convert Roman numeral to integer'''
<a>    if not s:                                                                 <span>&#x2460;</span></a>
<a>    if not s:                                                                 <span class=u>&#x2460;</span></a>
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
@@ -92,7 +92,7 @@ FAILED (failures=1)</samp></pre>
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest8.py -v</kbd>
<a><samp>from_roman should fail with blank string ... ok</samp> <span>&#x2460;</span></a>
<a><samp>from_roman should fail with blank string ... ok</samp> <span class=u>&#x2460;</span></a>
<samp>from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
@@ -107,7 +107,7 @@ to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 11 tests in 0.156s
</samp>
<a><samp>OK</samp> <span>&#x2461;</span></a></pre>
<a><samp>OK</samp> <span class=u>&#x2461;</span></a></pre>
<ol>
<li>The blank string test case now passes, so the bug is fixed.
<li>All the other test cases still pass, which means that this bug fix didn&#8217;t break anything else. Stop coding.
@@ -123,14 +123,13 @@ Ran 11 tests in 0.156s
<p>Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember [FIXME-xref] the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 <code>M</code> characters in a row to represent <code>4000</code>. If you make this change, you&#8217;ll be able to expand the range of convertible numbers from <code>1..3999</code> to <code>1..4999</code>. But first, you need to make some changes to your test cases.
<p class=d>[<a href=examples/roman8.py>download <code>roman8.py</code></a>]
<pre><code>
class KnownValues(unittest.TestCase):
<pre><code class=pp>class KnownValues(unittest.TestCase):
    known_values = ( (1, 'I'),
                      .
                      .
                      .
                     (3999, 'MMMCMXCIX'),
<a>                     (4000, 'MMMM'),                                      <span>&#x2460;</span></a>
<a>                     (4000, 'MMMM'),                                      <span class=u>&#x2460;</span></a>
                     (4500, 'MMMMD'),
                     (4888, 'MMMMDCCCLXXXVIII'),
                     (4999, 'MMMMCMXCIX') )
@@ -138,7 +137,7 @@ class KnownValues(unittest.TestCase):
class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        '''to_roman should fail with large input'''
<a>        self.assertRaises(roman8.OutOfRangeError, roman8.to_roman, 5000) <span>&#x2461;</span></a>
<a>        self.assertRaises(roman8.OutOfRangeError, roman8.to_roman, 5000) <span class=u>&#x2461;</span></a>
.
.
@@ -147,7 +146,7 @@ class ToRomanBadInput(unittest.TestCase):
class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        '''from_roman should fail with too many repeated numerals'''
<a>        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): <span>&#x2462;</span></a>
<a>        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): <span class=u>&#x2462;</span></a>
            self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, s)
.
@@ -157,7 +156,7 @@ class FromRomanBadInput(unittest.TestCase):
class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        '''from_roman(to_roman(n))==n for all n'''
<a>        for integer in range(1, 5000): <span>&#x2463;</span></a>
<a>        for integer in range(1, 5000): <span class=u>&#x2463;</span></a>
            numeral = roman8.to_roman(integer)
            result = roman8.from_roman(numeral)
            self.assertEqual(integer, result)</code></pre>
@@ -177,9 +176,9 @@ from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
<a>from_roman should give known result with known input ... ERROR <span>&#x2460;</span></a>
<a>to_roman should give known result with known input ... ERROR <span>&#x2461;</span></a>
<a>from_roman(to_roman(n))==n for all n ... ERROR <span>&#x2462;</span></a>
<a>from_roman should give known result with known input ... ERROR <span class=u>&#x2460;</span></a>
<a>to_roman should give known result with known input ... ERROR <span class=u>&#x2461;</span></a>
<a>from_roman(to_roman(n))==n for all n ... ERROR <span class=u>&#x2462;</span></a>
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
@@ -228,10 +227,9 @@ FAILED (errors=3)</samp></pre>
<p>Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being tested is never &#8220;ahead&#8221; of the test cases. While it&#8217;s behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding.)
<p class=d>[<a href=examples/roman9.py>download <code>roman9.py</code></a>]
<pre><code>
roman_numeral_pattern = re.compile('''
<pre><code class=pp>roman_numeral_pattern = re.compile('''
^ # beginning of string
<a> M{0,4} # thousands - 0 to 4 M's <span>&#x2460;</span></a>
<a> M{0,4} # thousands - 0 to 4 M's <span class=u>&#x2460;</span></a>
(CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
# or 500-800 (D, followed by 0 to 3 C's)
(XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
@@ -243,7 +241,7 @@ roman_numeral_pattern = re.compile('''
def to_roman(n):
'''convert integer to Roman numeral'''
<a> if not (0 < n < 5000): <span>&#x2461;</span></a>
<a> if not (0 < n < 5000): <span class=u>&#x2461;</span></a>
raise OutOfRangeError('number out of range (must be 1..4999)')
if not isinstance(n, int):
raise NotIntegerError('non-integers can not be converted')
@@ -284,7 +282,7 @@ to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 12 tests in 0.203s
<a>OK <span>&#x2460;</span></a></samp></pre>
<a>OK <span class=u>&#x2460;</span></a></samp></pre>
<ol>
<li>All the test cases pass. Stop coding.
</ol>
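<p>As a rough sketch of why the widened pattern satisfies the new requirement, the snippet below compiles the same verbose pattern with <code>M{0,4}</code> in the thousands place. The helper name <code>looks_like_roman()</code> is hypothetical and is not part of <code>roman9.py</code>.

```python
import re

# Same shape as the pattern in the text, with the thousands place
# widened from M{0,3} to M{0,4}.
roman_numeral_pattern = re.compile('''
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    ''', re.VERBOSE)

def looks_like_roman(s):
    '''hypothetical helper: True if s is a non-empty, well-formed numeral'''
    return bool(s) and roman_numeral_pattern.search(s) is not None
```

<p>With this pattern, <code>'MMMM'</code> is accepted while <code>'MMMMM'</code> is still rejected, which matches the "too many repeated numerals" test shown earlier.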
@@ -306,7 +304,7 @@ Ran 12 tests in 0.203s
<p>And best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove &mdash; to yourself and to others &mdash; that the new code works just as well as the original.
<p class=d>[<a href=examples/roman10.py>download <code>roman10.py</code></a>]
<pre><code>class OutOfRangeError(ValueError): pass
<pre><code class=pp>class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
class InvalidRomanNumeralError(ValueError): pass
@@ -366,19 +364,19 @@ build_lookup_tables()</code></pre>
<p>Let&#8217;s break that down into digestible pieces. Arguably, the most important line is the last one:
<pre><code>build_lookup_tables()</code></pre>
<pre><code class=pp>build_lookup_tables()</code></pre>
<p>You will note that is a function call, but there&#8217;s no <code>if</code> statement around it. This is not an <code>if __name__ == '__main__'</code> block; it gets called <em>when the module is imported</em>. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. So this code will only get called the first time you import this module.)
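<p>If you want to convince yourself that module-level code runs only on the first import, here is a minimal sketch. It writes a throwaway module to a temporary directory and imports it twice. The module name <code>demo_tables</code> and the log file are assumptions made up for this illustration; they have nothing to do with <code>roman10.py</code>.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in module: its top-level code appends one mark to a
# log file every time the module body actually runs.
MODULE_SOURCE = '''\
import os
with open(os.path.join(os.path.dirname(__file__), 'runs.log'), 'a') as log:
    log.write('x')   # stands in for build_lookup_tables()
'''

def count_module_body_runs():
    '''import the demo module twice; return how many times its body ran'''
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'demo_tables.py'), 'w') as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import demo_tables   # first import: executes the module body
            import demo_tables   # second import: found in sys.modules, body skipped
            with open(os.path.join(tmp, 'runs.log')) as log:
                return len(log.read())
        finally:
            # Clean up so the demo leaves no trace in this interpreter.
            sys.path.remove(tmp)
            sys.modules.pop('demo_tables', None)
```

<p>The second <code>import</code> statement finds the module in <code>sys.modules</code> and reuses it, so the log ends up with a single mark.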
<p>So what does the <code>build_lookup_tables()</code> function do? I&#8217;m glad you asked.
<pre><code><a>to_roman_table = [ None ]
<pre><code class=pp>to_roman_table = [ None ]
from_roman_table = {}
.
.
.
def build_lookup_tables():
<a> def to_roman(n): <span>&#x2460;</span></a>
<a> def to_roman(n): <span class=u>&#x2460;</span></a>
result = ''
for numeral, integer in roman_numeral_map:
if n >= integer:
@@ -390,8 +388,8 @@ def build_lookup_tables():
return result
for integer in range(1, 5000):
<a> roman_numeral = to_roman(integer) <span>&#x2461;</span></a>
<a> to_roman_table.append(roman_numeral) <span>&#x2462;</span></a>
<a> roman_numeral = to_roman(integer) <span class=u>&#x2461;</span></a>
<a> to_roman_table.append(roman_numeral) <span class=u>&#x2462;</span></a>
from_roman_table[roman_numeral] = integer</code></pre>
<ol>
<li>This is a clever bit of programming&hellip; perhaps too clever. The <code>to_roman()</code> function is defined above; it looks up values in the lookup table and returns them. But the <code>build_lookup_tables()</code> function redefines the <code>to_roman()</code> function to actually do work (like the previous examples did, before you added a lookup table). Within the <code>build_lookup_tables()</code> function, calling <code>to_roman()</code> will call this redefined version. Once the <code>build_lookup_tables()</code> function exits, the redefined version disappears &mdash; it is only defined in the local scope of the <code>build_lookup_tables()</code> function.
@@ -401,13 +399,13 @@ def build_lookup_tables():
<p>Once the lookup tables are built, the rest of the code is both easy and fast.
<pre><code>def to_roman(n):
<pre><code class=pp>def to_roman(n):
'''convert integer to Roman numeral'''
if not (0 < n < 5000):
raise OutOfRangeError('number out of range (must be 1..4999)')
if int(n) != n:
raise NotIntegerError('non-integers can not be converted')
<a> return to_roman_table[n] <span>&#x2460;</span></a>
<a> return to_roman_table[n] <span class=u>&#x2460;</span></a>
def from_roman(s):
'''convert Roman numeral to integer'''
@@ -417,7 +415,7 @@ def from_roman(s):
raise InvalidRomanNumeralError('Input can not be blank')
if s not in from_roman_table:
raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
<a> return from_roman_table[s] <span>&#x2461;</span></a></code></pre>
<a> return from_roman_table[s] <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>After doing the same bounds checking as before, the <code>to_roman()</code> function simply finds the appropriate value in the lookup table and returns it.
<li>Similarly, the <code>from_roman()</code> function is reduced to some bounds checking and one line of code. No more regular expressions. No more looping. O(1) conversion to and from Roman numerals.
@@ -441,7 +439,7 @@ to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
<a>Ran 12 tests in 0.031s <span>&#x2460;</span></a>
<a>Ran 12 tests in 0.031s <span class=u>&#x2460;</span></a>
OK</samp></pre>
<ol>
@@ -473,7 +471,8 @@ OK</samp></pre>
<li>Refactoring mercilessly to improve performance, scalability, readability, maintainability, or whatever other -ility you&#8217;re lacking
</ul>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+75 -74
@@ -12,11 +12,11 @@ body{counter-reset:h1 4}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#regular-expressions>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#regular-expressions>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
<h1>Regular Expressions</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Some people, when confronted with a problem, think &#8220;I know, I&#8217;ll use regular expressions.&#8221; Now they have two problems. <span>&#x275E;</span><br>&mdash; <a href=http://www.jwz.org/hacks/marginal.html>Jamie Zawinski</a>
<p><span class=u>&#x275D;</span> Some people, when confronted with a problem, think &#8220;I know, I&#8217;ll use regular expressions.&#8221; Now they have two problems. <span class=u>&#x275E;</span><br>&mdash; <a href=http://www.jwz.org/hacks/marginal.html>Jamie Zawinski</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -24,7 +24,7 @@ body{counter-reset:h1 4}
<p>If your goal can be accomplished with string methods, you should use them. They&#8217;re fast and simple and easy to read, and there&#8217;s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with <code>if</code> statements to handle special cases, or if you&#8217;re chaining calls to <code>split()</code> and <code>join()</code> to slice-and-dice your strings, you may need to move up to regular expressions.
<p>Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing text with complex patterns of characters. Although the regular expression syntax is tight and unlike normal code, the result can end up being <em>more</em> readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
<blockquote class='note compare perl5'>
<p><span>&#x261E;</span>If you&#8217;ve used regular expressions in other languages (like Perl 5), Python&#8217;s syntax will be very familiar. Read the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
<p><span class=u>&#x261E;</span>If you&#8217;ve used regular expressions in other languages (like Perl 5), Python&#8217;s syntax will be very familiar. Read the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
</blockquote>
<p class=a>&#x2042;
@@ -32,15 +32,15 @@ body{counter-reset:h1 4}
<p>This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don&#8217;t just make this stuff up; it&#8217;s actually useful.) This example shows how I approached the problem.
<pre class=screen>
<samp class=p>>>> </samp><kbd>s = '100 NORTH MAIN ROAD'</kbd>
<a><samp class=p>>>> </samp><kbd>s.replace('ROAD', 'RD.')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>s.replace('ROAD', 'RD.')</kbd> <span class=u>&#x2460;</span></a>
<samp>'100 NORTH MAIN RD.'</samp>
<samp class=p>>>> </samp><kbd>s = '100 NORTH BROAD ROAD'</kbd>
<a><samp class=p>>>> </samp><kbd>s.replace('ROAD', 'RD.')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>s.replace('ROAD', 'RD.')</kbd> <span class=u>&#x2461;</span></a>
<samp>'100 NORTH BRD. RD.'</samp>
<a><samp class=p>>>> </samp><kbd>s[:-4] + s[-4:].replace('ROAD', 'RD.')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>s[:-4] + s[-4:].replace('ROAD', 'RD.')</kbd> <span class=u>&#x2462;</span></a>
<samp>'100 NORTH BROAD RD.'</samp>
<a><samp class=p>>>> </samp><kbd>import re</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('ROAD$', 'RD.', s)</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>import re</kbd> <span class=u>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('ROAD$', 'RD.', s)</kbd> <span class=u>&#x2464;</span></a>
<samp>'100 NORTH BROAD RD.'</samp></pre>
<ol>
<li>My goal is to standardize a street address so that <code>'ROAD'</code> is always abbreviated as <code>'RD.'</code>. At first glance, I thought this was simple enough that I could just use the string method <code>replace()</code>. After all, all the data was already uppercase, so case mismatches would not be a problem. And the search string, <code>'ROAD'</code>, was a constant. And in this deceptively simple example, <code>s.replace()</code> does indeed work.
@@ -55,14 +55,14 @@ body{counter-reset:h1 4}
<samp class=p>>>> </samp><kbd>s = '100 BROAD'</kbd>
<samp class=p>>>> </samp><kbd>re.sub('ROAD$', 'RD.', s)</kbd>
<samp>'100 BRD.'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('\\bROAD$', 'RD.', s)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub('\\bROAD$', 'RD.', s)</kbd> <span class=u>&#x2460;</span></a>
<samp>'100 BROAD'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub(r'\bROAD$', 'RD.', s)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub(r'\bROAD$', 'RD.', s)</kbd> <span class=u>&#x2461;</span></a>
<samp>'100 BROAD'</samp>
<samp class=p>>>> </samp><kbd>s = '100 BROAD ROAD APT. 3'</kbd>
<a><samp class=p>>>> </samp><kbd>re.sub(r'\bROAD$', 'RD.', s)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub(r'\bROAD$', 'RD.', s)</kbd> <span class=u>&#x2462;</span></a>
<samp>'100 BROAD ROAD APT. 3'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub(r'\bROAD\b', 'RD.', s)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.sub(r'\bROAD\b', 'RD.', s)</kbd> <span class=u>&#x2463;</span></a>
<samp>'100 BROAD RD. APT. 3'</samp></pre>
<ol>
<li>What I <em>really</em> wanted was to match <code>'ROAD'</code> when it was at the end of the string <em>and</em> it was its own word (and not a part of some larger word). To express this in a regular expression, you use <code>\b</code>, which means &#8220;a word boundary must occur right here.&#8221; In Python, this is complicated by the fact that the <code>'\'</code> character in a string must itself be escaped. This is sometimes referred to as the backslash plague, and it is one reason why regular expressions are easier in Perl than in Python. On the downside, Perl mixes regular expressions with other syntax, so if you have a bug, it may be hard to tell whether it&#8217;s a bug in syntax or a bug in your regular expression.
@@ -96,15 +96,15 @@ body{counter-reset:h1 4}
<p>What would it take to validate that an arbitrary string is a valid Roman numeral? Let&#8217;s take it one digit at a time. Since Roman numerals are always written highest to lowest, let&#8217;s start with the highest: the thousands place. For numbers 1000 and higher, the thousands are represented by a series of <code>M</code> characters.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>pattern = '^M?M?M?$'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>pattern = '^M?M?M?$'</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;SRE_Match object at 0106FB58></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MM')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MM')</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;SRE_Match object at 0106C290></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMM')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMM')</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;SRE_Match object at 0106AA38></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMM')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, '')</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMM')</kbd> <span class=u>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, '')</kbd> <span class=u>&#x2465;</span></a>
<samp>&lt;SRE_Match object at 0106F4A8></samp></pre>
<ol>
<li>This pattern has three parts. <code>^</code> matches what follows only at the beginning of the string. If this were not specified, the pattern would match no matter where the <code>M</code> characters were, which is not what you want. You want to make sure that the <code>M</code> characters, if they&#8217;re there, are at the beginning of the string. <code>M?</code> optionally matches a single <code>M</code> character. Since this is repeated three times, you&#8217;re matching anywhere from zero to three <code>M</code> characters in a row. And <code>$</code> matches the end of the string. When combined with the <code>^</code> character at the beginning, this means that the pattern must match the entire string, with no other characters before or after the <code>M</code> characters.
@@ -142,15 +142,15 @@ body{counter-reset:h1 4}
<p>This example shows how to validate the hundreds place of a Roman numeral.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCM')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCM')</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;SRE_Match object at 01070390></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MD')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MD')</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;SRE_Match object at 01073A50></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMCCC')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMCCC')</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;SRE_Match object at 010748A8></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMC')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, '')</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMC')</kbd> <span class=u>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, '')</kbd> <span class=u>&#x2465;</span></a>
<samp>&lt;SRE_Match object at 01071D98></samp></pre>
<ol>
<li>This pattern starts out the same as the previous one, checking for the beginning of the string (<code>^</code>), then the thousands place (<code>M?M?M?</code>). Then it has the new part, in parentheses, which defines a set of three mutually exclusive patterns, separated by vertical bars: <code>CM</code>, <code>CD</code>, and <code>D?C?C?C?</code> (which is an optional <code>D</code> followed by zero to three optional <code>C</code> characters). The regular expression parser checks for each of these patterns in order (from left to right), takes the first one that matches, and ignores the rest.
@@ -169,15 +169,15 @@ body{counter-reset:h1 4}
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<samp class=p>>>> </samp><kbd>pattern = '^M?M?M?$'</kbd>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span class=u>&#x2460;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EE090></samp>
<samp class=p>>>> </samp><kbd>pattern = '^M?M?M?$'</kbd>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MM')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MM')</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<samp class=p>>>> </samp><kbd>pattern = '^M?M?M?$'</kbd>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMM')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMM')</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EE090></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMM')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMM')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>This matches the start of the string, and then the first optional <code>M</code>, but not the second and third <code>M</code> (but that&#8217;s okay because they&#8217;re optional), and then the end of the string.
@@ -186,14 +186,14 @@ body{counter-reset:h1 4}
<li>This matches the start of the string, and then all three optional <code>M</code>, but then does not match the end of the string (because there is still one unmatched <code>M</code>), so the pattern does not match and returns <code>None</code>.
</ol>
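<p>As a quick sanity check, the sketch below runs the pattern used above next to the counted form <code>M{0,3}</code> that the next example introduces, and reports whether each one matches the same inputs. The helper name <code>compare()</code> is made up for this illustration.

```python
import re

optional_style = re.compile('^M?M?M?$')   # three separate optional M's
counted_style = re.compile('^M{0,3}$')    # zero to three M's, counted

def compare(candidates):
    '''return (candidate, optional_matches, counted_matches) for each string'''
    rows = []
    for candidate in candidates:
        rows.append((candidate,
                     optional_style.search(candidate) is not None,
                     counted_style.search(candidate) is not None))
    return rows

for row in compare(['', 'M', 'MM', 'MMM', 'MMMM']):
    print(row)
```

<p>For these five inputs the two patterns agree on every row; only <code>'MMMM'</code> is rejected by both.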
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>pattern = '^M{0,3}$'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>pattern = '^M{0,3}$'</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MM')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MM')</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EE090></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMM')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMM')</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEDA8></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMM')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMM')</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>This pattern says: &#8220;Match the start of the string, then anywhere from zero to three <code>M</code> characters, then the end of the string.&#8221; The 0 and 3 can be any numbers; if you want to match at least one but no more than three <code>M</code> characters, you could say <code>M{1,3}</code>.
@@ -206,15 +206,15 @@ body{counter-reset:h1 4}
<p>Now let&#8217;s expand the Roman numeral regular expression to cover the tens and ones place. This example shows the check for tens.
<pre class=screen>
<samp class=p>>>> </samp><kbd>pattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)$'</kbd>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMXL')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMXL')</kbd> <span class=u>&#x2460;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCML')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCML')</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLX')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLX')</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLXXX')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLXXX')</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLXXXX')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLXXXX')</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>This matches the start of the string, then the first optional <code>M</code>, then <code>CM</code>, then <code>XL</code>, then the end of the string. Remember, the <code>(A|B|C)</code> syntax means &#8220;match exactly one of A, B, or C&#8221;. You match <code>XL</code>, so you ignore the <code>XC</code> and <code>L?X?X?X?</code> choices, and then move on to the end of the string. <code>MCMXL</code> is the Roman numeral representation of <code>1940</code>.
@@ -230,13 +230,13 @@ body{counter-reset:h1 4}
</pre><p>So what does that look like using this alternate <code>{n,m}</code> syntax? This example shows the new syntax.
<pre class=screen>
<samp class=p>>>> </samp><kbd>pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'</kbd>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MDLV')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MDLV')</kbd> <span class=u>&#x2460;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMDCLXVI')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMDCLXVI')</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMDCCCLXXXVIII')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMDCCCLXXXVIII')</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'I')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'I')</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp></pre>
<ol>
<li>This matches the start of the string, then one of a possible three <code>M</code> characters, then <code>D?C{0,3}</code>. Of that, it matches the optional <code>D</code> and zero of three possible <code>C</code> characters. Moving on, it matches <code>L?X{0,3}</code> by matching the optional <code>L</code> and zero of three possible <code>X</code> characters. Then it matches <code>V?I{0,3}</code> by matching the optional <code>V</code> and zero of three possible <code>I</code> characters, and finally the end of the string. <code>MDLV</code> is the Roman numeral representation of <code>1555</code>.
@@ -268,13 +268,13 @@ body{counter-reset:h1 4}
# or 5-8 (V, followed by 0 to 3 I's)
$ # end of string
'''</kbd>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M', re.VERBOSE)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M', re.VERBOSE)</kbd> <span class=u>&#x2460;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLXXXIX', re.VERBOSE)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MCMLXXXIX', re.VERBOSE)</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMDCCCLXXXVIII', re.VERBOSE)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'MMMDCCCLXXXVIII', re.VERBOSE)</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;_sre.SRE_Match object at 0x008EEB48></samp>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span>&#x2463;</span></a></pre>
<a><samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span class=u>&#x2463;</span></a></pre>
<ol>
<li>The most important thing to remember when using verbose regular expressions is that you need to pass an extra argument when working with them: <code>re.VERBOSE</code> is a constant defined in the <code>re</code> module that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has quite a bit of whitespace (all of which is ignored), and several comments (all of which are ignored). Once you ignore the whitespace and the comments, this is exactly the same regular expression as you saw in the previous section, but it&#8217;s a lot more readable.
<li>This matches the start of the string, then one of a possible three <code>M</code>, then <code>CM</code>, then <code>L</code> and three of a possible three <code>X</code>, then <code>IX</code>, then the end of the string.
@@ -302,10 +302,10 @@ body{counter-reset:h1 4}
<p>Quite a variety! In each of these cases, I need to know that the area code was <code>800</code>, the trunk was <code>555</code>, and the rest of the phone number was <code>1212</code>. For those with an extension, I need to know that the extension was <code>1234</code>.
<p>Let&#8217;s work through developing a solution for phone number parsing. This example shows the first step.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})$')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})$')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212-1234')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212-1234')</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>Always read regular expressions from left to right. This one matches the beginning of the string, and then <code>(\d{3})</code>. What&#8217;s <code>\d{3}</code>? Well, <code>\d</code> means &#8220;any numeric digit&#8221; (<code>0</code> through <code>9</code>), and the <code>{3}</code> means &#8220;match exactly three of them&#8221;; it&#8217;s a variation on the <a href=#nmsyntax><code>{n,m}</code> syntax</a> you saw earlier. Putting it in parentheses means &#8220;match exactly three numeric digits, <em>and then remember them as a group that I can ask for later</em>&#8221;. Then match a literal hyphen. Then match another group of exactly three digits. Then another literal hyphen. Then another group of exactly four digits. Then match the end of the string.
@@ -313,12 +313,12 @@ body{counter-reset:h1 4}
<li>This regular expression is not the final answer, because it doesn&#8217;t handle a phone number with an extension on the end. For that, you&#8217;ll need to expand the regular expression.
</ol>
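<p>Because <code>search()</code> returns <code>None</code> when nothing matches, chaining <code>.groups()</code> directly onto it raises an <code>AttributeError</code> for input like the extension case above. One way to guard against that, sketched here with a hypothetical helper named <code>parse_phone()</code>, is to check the match object first.

```python
import re

# Same first-step pattern as in the example above.
phone_pattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})$')

def parse_phone(text):
    '''return (area_code, trunk, rest), or None if text does not match'''
    match = phone_pattern.search(text)
    if match is None:
        return None
    return match.groups()
```

<p>Returning <code>None</code> is only one possible design; raising a custom exception would work just as well if a non-matching number should stop processing.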
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})-(\d+)$')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212-1234').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})-(\d+)$')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212-1234').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800 555 1212 1234')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800 555 1212 1234')</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>This regular expression is almost identical to the previous one. Just as before, you match the beginning of the string, then a remembered group of three digits, then a hyphen, then a remembered group of three digits, then a hyphen, then a remembered group of four digits. What&#8217;s new is that you then match another hyphen, and a remembered group of one or more digits, then the end of the string.
@@ -328,14 +328,14 @@ body{counter-reset:h1 4}
</ol>
<p>The next example shows the regular expression to handle separators between the different parts of the phone number.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})\D+(\d{3})\D+(\d{4})\D+(\d+)$')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800 555 1212 1234').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})\D+(\d{3})\D+(\d{4})\D+(\d+)$')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800 555 1212 1234').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212-1234').groups()</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212-1234').groups()</kbd> <span class=u>&#x2462;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('80055512121234')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('80055512121234')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212')</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>Hang on to your hat. You&#8217;re matching the beginning of the string, then a group of three digits, then <code>\D+</code>. What the heck is that? Well, <code>\D</code> matches any character <em>except</em> a numeric digit, and <code>+</code> means &#8220;1 or more&#8221;. So <code>\D+</code> matches one or more characters that are not digits. This is what you&#8217;re using instead of a literal hyphen, to try to match different separators.
@@ -346,14 +346,14 @@ body{counter-reset:h1 4}
</ol>
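<p>To see what <code>\D+</code> actually consumes, it can help to run it on its own. The following is a minimal sketch, not part of the original example; the sample string is made up for illustration.

```python
import re

# A made-up sample with three different separators between the parts.
sample = '800 555-1212.1234'

# \D+ matches each run of one or more non-digit characters.
separators = re.findall(r'\D+', sample)

# Splitting on \D+ leaves only the runs of digits.
digit_runs = re.split(r'\D+', sample)
```

<p>Here <code>separators</code> holds the space, the hyphen, and the dot, and <code>digit_runs</code> holds the four groups of digits.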
<p>The next example shows the regular expression for handling phone numbers <em>without</em> separators.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('80055512121234').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('80055512121234').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800.555.1212 x1234').groups()</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800.555.1212 x1234').groups()</kbd> <span class=u>&#x2462;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span class=u>&#x2463;</span></a>
<samp>('800', '555', '1212', '')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('(800)5551212 x1234')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('(800)5551212 x1234')</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>The only change you&#8217;ve made since the last step is changing every <code>+</code> to <code>*</code>. Instead of <code>\D+</code> between the parts of the phone number, you now match on <code>\D*</code>, and the extension group is now <code>(\d*)</code> instead of <code>(\d+)</code>. Remember that <code>+</code> means &#8220;1 or more&#8221;? Well, <code>*</code> means &#8220;zero or more&#8221;. So now you should be able to parse phone numbers even when there is no separator character at all.
@@ -364,12 +364,12 @@ body{counter-reset:h1 4}
</ol>
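<p>The difference between <code>+</code> and <code>*</code> is easiest to see on a shortened pattern. This sketch uses two hypothetical two-group patterns (not the full phone number pattern) and compares them on input with and without a separator.

```python
import re

# Hypothetical shortened patterns: two groups of three digits each.
one_or_more = re.compile(r'^(\d{3})\D+(\d{3})$')   # separator required
zero_or_more = re.compile(r'^(\d{3})\D*(\d{3})$')  # separator optional

# With a separator, both patterns match.
plus_with_separator = one_or_more.search('800-555')
star_with_separator = zero_or_more.search('800-555')

# Without a separator, only the \D* version matches.
plus_without_separator = one_or_more.search('800555')
star_without_separator = zero_or_more.search('800555')
```

<p>Only <code>plus_without_separator</code> is <code>None</code>; the other three are match objects whose <code>groups()</code> method returns <code>('800', '555')</code>.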
<p>The next example shows how to handle leading characters in phone numbers.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('(800)5551212 ext. 1234').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('(800)5551212 ext. 1234').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span class=u>&#x2462;</span></a>
<samp>('800', '555', '1212', '')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('work 1-(800) 555.1212 #1234')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('work 1-(800) 555.1212 #1234')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li>This is the same as in the previous example, except now you&#8217;re matching <code>\D*</code>, zero or more non-numeric characters, before the first remembered group (the area code). Notice that you&#8217;re not remembering these non-numeric characters (they&#8217;re not in parentheses). If you find them, you&#8217;ll just skip over them and then start remembering the area code whenever you get to it.
@@ -379,12 +379,12 @@ body{counter-reset:h1 4}
</ol>
<p>Let&#8217;s back up for a second. So far the regular expressions have all matched from the beginning of the string. But now you see that there may be an indeterminate amount of stuff at the beginning of the string that you want to ignore. Rather than trying to match it all just so you can skip over it, let&#8217;s take a different approach: don&#8217;t explicitly match the beginning of the string at all. This approach is shown in the next example.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('work 1-(800) 555.1212 #1234').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('work 1-(800) 555.1212 #1234').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span class=u>&#x2462;</span></a>
<samp>('800', '555', '1212', '')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('80055512121234').groups()</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('80055512121234').groups()</kbd> <span class=u>&#x2463;</span></a>
<samp>('800', '555', '1212', '1234')</samp></pre>
<ol>
<li>Note the lack of <code>^</code> in this regular expression. You are not matching the beginning of the string anymore. There&#8217;s nothing that says you need to match the entire input with your regular expression. The regular expression engine will do the hard work of figuring out where the input string starts to match, and go from there.
@@ -406,9 +406,9 @@ body{counter-reset:h1 4}
(\d*) # extension is optional and can be any number of digits
$ # end of string
''', re.VERBOSE)</kbd>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('work 1-(800) 555.1212 #1234').groups()</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('work 1-(800) 555.1212 #1234').groups()</kbd> <span class=u>&#x2460;</span></a>
<samp>('800', '555', '1212', '1234')</samp>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>phonePattern.search('800-555-1212').groups()</kbd> <span class=u>&#x2461;</span></a>
<samp>('800', '555', '1212', '')</samp></pre>
<ol>
<li>Other than being spread out over multiple lines, this is exactly the same regular expression as the last step, so it&#8217;s no surprise that it parses the same inputs.
@@ -433,7 +433,8 @@ body{counter-reset:h1 4}
<li><code>(x)</code> in general is a <em>remembered group</em>. You can get the value of what matched by using the <code>groups()</code> method of the object returned by <code>re.search</code>.
</ul>
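<p>As a recap, the final pattern can be wrapped in a small helper. This is only a sketch: the function name <code>normalize_phone</code> and the output format are assumptions made for illustration, not something defined earlier in this chapter.

```python
import re

# The final verbose pattern: unanchored at the start, anchored at the end.
PHONE_PATTERN = re.compile(r'''
    (\d{3})     # area code is 3 digits (e.g. '800')
    \D*         # optional separator is any number of non-digits
    (\d{3})     # trunk is 3 digits (e.g. '555')
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits (e.g. '1212')
    \D*         # optional separator
    (\d*)       # extension is optional and can be any number of digits
    $           # end of string
    ''', re.VERBOSE)

def normalize_phone(text):
    """Hypothetical helper: return 'AAA-TTT-NNNN[ xEXT]', or None if no match."""
    match = PHONE_PATTERN.search(text)
    if match is None:
        return None
    area, trunk, rest, extension = match.groups()
    formatted = '{0}-{1}-{2}'.format(area, trunk, rest)
    if extension:
        formatted += ' x' + extension
    return formatted

with_extension = normalize_phone('work 1-(800) 555.1212 #1234')
without_extension = normalize_phone('800-555-1212')
not_a_number = normalize_phone('call me maybe')
```

<p>Checking for <code>None</code> before calling <code>groups()</code> matters here, because <code>search()</code> returns <code>None</code> when nothing matches.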
<p>Regular expressions are extremely powerful, but they are not the correct solution for every problem. You should learn enough about them to know when they are appropriate, when they will solve your problems, and when they will cause more problems than they solve.
<p class=v><a href=strings.html rel=prev title='back to &#8220;Strings&#8221;'><span>&#x261C;</span></a> <a href=generators.html rel=next title='onward to &#8220;Generators&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=strings.html rel=prev title='back to &#8220;Strings&#8221;'><span class=u>&#x261C;</span></a> <a href=generators.html rel=next title='onward to &#8220;Generators&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+132 -134
View File
@@ -23,11 +23,11 @@ td a:link, td a:visited{border:0}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#special-method-names>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#special-method-names>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=pro>&#x2666;&#x2666;&#x2666;&#x2666;&#x2666;</span>
<h1>Special Method Names</h1>
<blockquote class=q>
<p><span>&#x275D;</span> My specialty is being right when other people are wrong. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/George_Bernard_Shaw>George Bernard Shaw</a>
<p><span class=u>&#x275D;</span> My specialty is being right when other people are wrong. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/George_Bernard_Shaw>George Bernard Shaw</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
@@ -44,24 +44,24 @@ td a:link, td a:visited{border:0}
<th>And Python Calls&hellip;
<tr><th>&#x2460;
<td>to initialize an instance
<td><code>x = MyClass()</code>
<td><code>x.__init__()</code>
<td><code class=pp>x = MyClass()</code>
<td><code class=pp>x.__init__()</code>
<tr><th>&#x2461;
<td>the &#8220;official&#8221; representation as a string
<td><code>repr(x)</code>
<td><code>x.__repr__()</code>
<td><code class=pp>repr(x)</code>
<td><code class=pp>x.__repr__()</code>
<tr><th>&#x2462;
<td>the &#8220;informal&#8221; value as a string
<td><code>str(x)</code>
<td><code>x.__str__()</code>
<td><code class=pp>str(x)</code>
<td><code class=pp>x.__str__()</code>
<tr><th>&#x2463;
<td>the &#8220;informal&#8221; value as a byte array
<td><code>bytes(x)</code>
<td><code>x.__bytes__()</code>
<td><code class=pp>bytes(x)</code>
<td><code class=pp>x.__bytes__()</code>
<tr><th>&#x2464;
<td>the value as a formatted string
<td><code>format(x)</code>
<td><code>x.__format__(<var>format_spec</var>)</code>
<td><code class=pp>format(x)</code>
<td><code class=pp>x.__format__(<var>format_spec</var>)</code>
</table>
<ol>
<li>The <code>__init__()</code> method is called <em>after</em> the instance is created. If you want to control the actual creation process, use <a href=#esoterica>the <code>__new__()</code> method</a>.
@@ -82,15 +82,15 @@ td a:link, td a:visited{border:0}
<th>And Python Calls&hellip;
<tr><th>&#x2460;
<td>to iterate through a sequence
<td><code>iter(seq)</code>
<td><code class=pp>iter(seq)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__iter__><code>seq.__iter__()</code></a>
<tr><th>&#x2461;
<td>to get the next value from an iterator
<td><code>next(seq)</code>
<td><code class=pp>next(seq)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__next__><code>seq.__next__()</code></a>
<tr><th>&#x2462;
<td>to create an iterator in reverse order
<td><code>reversed(seq)</code>
<td><code class=pp>reversed(seq)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__reversed__><code>seq.__reversed__()</code></a>
</table>
<ol>
@@ -110,23 +110,23 @@ td a:link, td a:visited{border:0}
<th>And Python Calls&hellip;
<tr><th>&#x2460;
<td>to get a computed attribute (unconditionally)
<td><code>x.my_property</code>
<td><code class=pp>x.my_property</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__getattribute__><code>x.__getattribute__(<var>'my_property'</var>)</code></a>
<tr><th>&#x2461;
<td>to get a computed attribute (fallback)
<td><code>x.my_property</code>
<td><code class=pp>x.my_property</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__getattr__><code>x.__getattr__(<var>'my_property'</var>)</code></a>
<tr><th>&#x2462;
<td>to set an attribute
<td><code>x.my_property = value</code>
<td><code class=pp>x.my_property = value</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__setattr__><code>x.__setattr__(<var>'my_property'</var>, <var>value</var>)</code></a>
<tr><th>&#x2463;
<td>to delete an attribute
<td><code>del x.my_property</code>
<td><code class=pp>del x.my_property</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__delattr__><code>x.__delattr__(<var>'my_property'</var>)</code></a>
<tr><th>&#x2464;
<td>to list all attributes and methods
<td><code>dir(x)</code>
<td><code class=pp>dir(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__dir__><code>x.__dir__()</code></a>
</table>
<ol>
@@ -142,16 +142,16 @@ td a:link, td a:visited{border:0}
<pre class=screen>
<code>class Dynamo:
def __getattr__(self, key):
<a> if key == 'color': <span>&#x2460;</span></a>
<a> if key == 'color': <span class=u>&#x2460;</span></a>
return 'PapayaWhip'
else:
<a> raise AttributeError <span>&#x2461;</span></a></code>
<a> raise AttributeError <span class=u>&#x2461;</span></a></code>
<samp class=p>>>> </samp><kbd>dyn = Dynamo()</kbd>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span class=u>&#x2462;</span></a>
<samp>'PapayaWhip'</samp>
<samp class=p>>>> </samp><kbd>dyn.color = 'LemonChiffon'</kbd>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span class=u>&#x2463;</span></a>
<samp>'LemonChiffon'</samp></pre>
<ol>
<li>The attribute name is passed into the <code>__getattr__()</code> method as a string. If the name is <code>'color'</code>, the method returns a value. (In this case, it&#8217;s just a hard-coded string, but you would normally do some sort of computation and return the result.)
@@ -171,10 +171,10 @@ td a:link, td a:visited{border:0}
raise AttributeError</code>
<samp class=p>>>> </samp><kbd>dyn = SuperDynamo()</kbd>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span class=u>&#x2460;</span></a>
<samp>'PapayaWhip'</samp>
<samp class=p>>>> </samp><kbd>dyn.color = 'LemonChiffon'</kbd>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>dyn.color</kbd> <span class=u>&#x2461;</span></a>
<samp>'PapayaWhip'</samp></pre>
<ol>
<li>The <code>__getattribute__()</code> method is called to provide a value for <var>dyn.color</var>.
@@ -182,7 +182,7 @@ td a:link, td a:visited{border:0}
</ol>
<blockquote class=note>
<p><span>&#x261E;</span>If your class defines a <code>__getattribute__()</code> method, you probably also want to define a <code>__setattr__()</code> method and coordinate between them to keep track of attribute values. Otherwise, any attributes you set after creating an instance will disappear into a black hole.
<p><span class=u>&#x261E;</span>If your class defines a <code>__getattribute__()</code> method, you probably also want to define a <code>__setattr__()</code> method and coordinate between them to keep track of attribute values. Otherwise, any attributes you set after creating an instance will disappear into a black hole.
</blockquote>
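<p>One way to coordinate the two methods is sketched below. The class name <code>TrackedDynamo</code> is hypothetical, and delegating to <code>object</code> is just one possible design; the point is that <code>__getattribute__()</code> must consult whatever <code>__setattr__()</code> stored.

```python
class TrackedDynamo:
    """Hypothetical class: 'color' has a computed default but can be overridden."""

    def __setattr__(self, key, value):
        # Delegate to object so the value lands in the instance dictionary.
        object.__setattr__(self, key, value)

    def __getattribute__(self, key):
        if key == 'color':
            # Use object.__getattribute__ to reach __dict__ without
            # re-entering this method (which would recurse forever).
            stored = object.__getattribute__(self, '__dict__')
            return stored.get('color', 'PapayaWhip')
        return object.__getattribute__(self, key)

dyn = TrackedDynamo()
default_color = dyn.color       # nothing stored yet, so the default is used
dyn.color = 'LemonChiffon'
updated_color = dyn.color       # the stored value now wins
```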
<p>You need to be extra careful with the <code>__getattribute__()</code> method, because it is also called when Python looks up a method name on your class.
@@ -190,12 +190,12 @@ td a:link, td a:visited{border:0}
<pre class=screen>
<code>class Rastan:
def __getattribute__(self, key):
<a> raise AttributeError <span>&#x2460;</span></a>
<a> raise AttributeError <span class=u>&#x2460;</span></a>
def swim(self):
pass</code>
<samp class=p>>>> </samp><kbd>hero = Rastan()</kbd>
<a><samp class=p>>>> </samp><kbd>hero.swim()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>hero.swim()</kbd> <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
File "&lt;stdin>", line 3, in __getattribute__
@@ -216,26 +216,25 @@ AttributeError</samp></pre>
<th>And Python Calls&hellip;
<tr><th>
<td>to &#8220;call&#8221; an instance like a function
<td><code>my_instance()</code>
<td><code class=pp>my_instance()</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__call__><code>my_instance.__call__()</code></a>
</table>
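<p>A minimal sketch of a callable class that keeps state between calls follows. The <code>RunningTotal</code> class is made up for illustration and is unrelated to any standard library module.

```python
class RunningTotal:
    """Hypothetical callable: each call adds to, and returns, a running sum."""

    def __init__(self, start=0):
        self.total = start          # state that survives between calls

    def __call__(self, amount):
        self.total += amount
        return self.total

accumulate = RunningTotal()
# The instance can be passed anywhere a function is expected, such as map().
totals = list(map(accumulate, [1, 2, 3]))
is_callable = callable(accumulate)
```

<p>Because the state lives on the instance, two separate <code>RunningTotal</code> objects keep independent totals.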
<p>The <a href=http://docs.python.org/3.0/library/zipfile.html><code>zipfile</code> module</a> uses this to define a class that can decrypt an encrypted zip file with a given password. The zip decryption algorithm requires you to store state during decryption. Defining the decryptor as a class allows you to maintain this state within a single instance of the decryptor class. The state is initialized in the <code>__init__()</code> method and updated as the file is decrypted. But since the class is also &#8220;callable&#8221; like a function, you can pass the instance as the first argument of the <code>map()</code> function, like so:
<pre><code>
# excerpt from zipfile.py
<pre><code class=pp># excerpt from zipfile.py
class _ZipDecrypter:
.
.
.
def __init__(self, pwd):
<a> self.key0 = 305419896 <span>&#x2460;</span></a>
<a> self.key0 = 305419896 <span class=u>&#x2460;</span></a>
self.key1 = 591751049
self.key2 = 878082192
for p in pwd:
self._UpdateKeys(p)
<a> def __call__(self, c): <span>&#x2461;</span></a>
<a> def __call__(self, c): <span class=u>&#x2461;</span></a>
assert isinstance(c, int)
k = self.key2 | 2
c = c ^ (((k * (k^1)) >> 8) & 255)
@@ -244,9 +243,9 @@ class _ZipDecrypter:
.
.
.
<a>zd = _ZipDecrypter(pwd) <span>&#x2462;</span></a>
<a>zd = _ZipDecrypter(pwd) <span class=u>&#x2462;</span></a>
bytes = zef_file.read(12)
<a>h = list(map(zd, bytes[0:12])) <span>&#x2463;</span></a></code></pre>
<a>h = list(map(zd, bytes[0:12])) <span class=u>&#x2463;</span></a></code></pre>
<ol>
<li>The <code>_ZipDecrypter</code> class maintains state in the form of three rotating keys, which are later updated in the <code>_UpdateKeys()</code> method (not shown here).
<li>The class defines a <code>__call__()</code> method, which makes class instances callable like functions. In this case, the <code>__call__()</code> method decrypts a single byte of the zip file, then updates the rotating keys based on the byte that was decrypted.
@@ -265,21 +264,20 @@ bytes = zef_file.read(12)
<th>And Python Calls&hellip;
<tr><th>
<td>the length of a sequence
<td><code>len(seq)</code>
<td><code class=pp>len(seq)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__len__><code>seq.__len__()</code></a>
<tr><th>
<td>to know whether a sequence contains a specific value
<td><code>x in seq</code>
<td><code class=pp>x in seq</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__contains__><code>seq.__contains__(<var>x</var>)</code></a>
</table>
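<p>As a small sketch of these two methods on a custom class, consider the hypothetical <code>Playlist</code> below, which treats titles as case-insensitive when testing membership.

```python
class Playlist:
    """Hypothetical container supporting len() and the 'in' operator."""

    def __init__(self, titles):
        self._titles = list(titles)

    def __len__(self):
        # Called by len(playlist).
        return len(self._titles)

    def __contains__(self, title):
        # Called by `title in playlist`; compares case-insensitively.
        return any(t.lower() == title.lower() for t in self._titles)

playlist = Playlist(['Hey Jude', 'Yesterday'])
size = len(playlist)
has_song = 'hey jude' in playlist
missing_song = 'Help!' in playlist
```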
<p id=acts-like-list-example>The <a href=http://docs.python.org/3.0/library/cgi.html><code>cgi</code> module</a> uses these methods in its <code>FieldStorage</code> class, which represents all of the form fields or query parameters submitted to a dynamic web page.
<pre><code>
# A script which responds to http://example.com/search?q=cgi
<pre><code class=pp># A script which responds to http://example.com/search?q=cgi
import cgi
fs = cgi.FieldStorage()
<a>if 'q' in fs: <span>&#x2460;</span></a>
<a>if 'q' in fs: <span class=u>&#x2460;</span></a>
do_search()
# An excerpt from cgi.py that explains how that works
@@ -287,12 +285,12 @@ class FieldStorage:
.
.
.
<a> def __contains__(self, key): <span>&#x2461;</span></a>
<a> def __contains__(self, key): <span class=u>&#x2461;</span></a>
if self.list is None:
raise TypeError('not indexable')
<a> return any(item.name == key for item in self.list) <span>&#x2462;</span></a>
<a> return any(item.name == key for item in self.list) <span class=u>&#x2462;</span></a>
<a> def __len__(self): <span>&#x2463;</span></a>
<a> def __len__(self): <span class=u>&#x2463;</span></a>
return len(self.keys())</code></pre>
<ol>
<li>Once you create an instance of the <code>cgi.FieldStorage</code> class, you can use the &#8220;<code>in</code>&#8221; operator to check whether a particular parameter was included in the query string.
@@ -312,37 +310,36 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>to get a value by its key
<td><code>x[key]</code>
<td><code class=pp>x[key]</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__getitem__><code>x.__getitem__(<var>'key'</var>)</code></a>
<tr><th>
<td>to set a value by its key
<td><code>x[key] = value</code>
<td><code class=pp>x[key] = value</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__setitem__><code>x.__setitem__(<var>'key'</var>, <var>value</var>)</code></a>
<tr><th>
<td>to delete a key-value pair
<td><code>del x[key]</code>
<td><code class=pp>del x[key]</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__delitem__><code>x.__delitem__(<var>'key'</var>)</code></a>
<tr><th>
<td>to provide a default value for missing keys
<td><code>x[nonexistent_key]</code>
<td><code class=pp>x[nonexistent_key]</code>
<td><a href=http://docs.python.org/3.0/library/collections.html#collections.defaultdict.__missing__><code>x.__missing__(<var>'nonexistent_key'</var>)</code></a>
</table>
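<p>The <code>__missing__()</code> method is only consulted by <code>dict</code> subclasses, when a key lookup fails. The sketch below uses a hypothetical <code>WordCount</code> class to show that behavior.

```python
class WordCount(dict):
    """Hypothetical dict subclass: unknown words count as zero."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        # Returning a value here does not store it in the dictionary.
        return 0

counts = WordCount()
for word in 'the cat and the hat'.split():
    counts[word] = counts[word] + 1   # __getitem__, then __setitem__

the_count = counts['the']
dog_count = counts['dog']             # falls through to __missing__
dog_stored = 'dog' in counts          # still False: nothing was inserted
```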
<p>The <a href=#acts-like-list-example><code>FieldStorage</code> class</a> from the <a href=http://docs.python.org/3.0/library/cgi.html><code>cgi</code> module</a> also defines these special methods, which means you can do things like this:
<pre><code>
# A script which responds to http://example.com/search?q=cgi
<pre><code class=pp># A script which responds to http://example.com/search?q=cgi
import cgi
fs = cgi.FieldStorage()
if 'q' in fs:
<a> do_search(fs['q']) <span>&#x2460;</span></a>
<a> do_search(fs['q']) <span class=u>&#x2460;</span></a>
# An excerpt from cgi.py that shows how it works
class FieldStorage:
.
.
.
<a> def __getitem__(self, key): <span>&#x2461;</span></a>
<a> def __getitem__(self, key): <span class=u>&#x2461;</span></a>
if self.list is None:
raise TypeError('not indexable')
found = []
@@ -378,55 +375,55 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>addition
<td><code>x + y</code>
<td><code class=pp>x + y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__add__><code>x.__add__(<var>y</var>)</code></a>
<tr><th>
<td>subtraction
<td><code>x - y</code>
<td><code class=pp>x - y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__sub__><code>x.__sub__(<var>y</var>)</code></a>
<tr><th>
<td>multiplication
<td><code>x * y</code>
<td><code class=pp>x * y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__mul__><code>x.__mul__(<var>y</var>)</code></a>
<tr><th>
<td>division
<td><code>x / y</code>
<td><code class=pp>x / y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__truediv__><code>x.__truediv__(<var>y</var>)</code></a>
<tr><th>
<td>floor division
<td><code>x // y</code>
<td><code class=pp>x // y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__floordiv__><code>x.__floordiv__(<var>y</var>)</code></a>
<tr><th>
<td>modulo (remainder)
<td><code>x % y</code>
<td><code class=pp>x % y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__mod__><code>x.__mod__(<var>y</var>)</code></a>
<tr><th>
<td>floor division <i class=baa>&amp;</i> modulo
<td><code>divmod(x, y)</code>
<td><code class=pp>divmod(x, y)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__divmod__><code>x.__divmod__(<var>y</var>)</code></a>
<tr><th>
<td>raise to power
<td><code>x ** y</code>
<td><code class=pp>x ** y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__pow__><code>x.__pow__(<var>y</var>)</code></a>
<tr><th>
<td>left bit-shift
<td><code>x &lt;&lt; y</code>
<td><code class=pp>x &lt;&lt; y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__lshift__><code>x.__lshift__(<var>y</var>)</code></a>
<tr><th>
<td>right bit-shift
<td><code>x >> y</code>
<td><code class=pp>x >> y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rshift__><code>x.__rshift__(<var>y</var>)</code></a>
<tr><th>
<td>bitwise <code>and</code>
<td><code>x &amp; y</code>
<td><code class=pp>x &amp; y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__and__><code>x.__and__(<var>y</var>)</code></a>
<tr><th>
<td>bitwise <code>xor</code>
<td><code>x ^ y</code>
<td><code class=pp>x ^ y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__xor__><code>x.__xor__(<var>y</var>)</code></a>
<tr><th>
<td>bitwise <code>or</code>
<td><code>x | y</code>
<td><code class=pp>x | y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__or__><code>x.__or__(<var>y</var>)</code></a>
</table>
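<p>A short sketch of one of these operators on a custom class follows. The <code>Money</code> class is hypothetical; returning <code>NotImplemented</code> for unsupported operand types is the conventional way to let Python try other options before raising a <code>TypeError</code>.

```python
class Money:
    """Hypothetical value type that stores an amount in cents."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Called for `self + other`.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def __sub__(self, other):
        # Called for `self - other`.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents - other.cents)

total = Money(150) + Money(275)
change = Money(500) - Money(125)
```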
@@ -456,55 +453,55 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>addition
<td><code>x + y</code>
<td><code class=pp>x + y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__radd__><code>y.__radd__(<var>x</var>)</code></a>
<tr><th>
<td>subtraction
<td><code>x - y</code>
<td><code class=pp>x - y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rsub__><code>y.__rsub__(<var>x</var>)</code></a>
<tr><th>
<td>multiplication
<td><code>x * y</code>
<td><code class=pp>x * y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rmul__><code>y.__rmul__(<var>x</var>)</code></a>
<tr><th>
<td>division
<td><code>x / y</code>
<td><code class=pp>x / y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rtruediv__><code>y.__rtruediv__(<var>x</var>)</code></a>
<tr><th>
<td>floor division
<td><code>x // y</code>
<td><code class=pp>x // y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rfloordiv__><code>y.__rfloordiv__(<var>x</var>)</code></a>
<tr><th>
<td>modulo (remainder)
<td><code>x % y</code>
<td><code class=pp>x % y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rmod__><code>y.__rmod__(<var>x</var>)</code></a>
<tr><th>
<td>floor division <i class=baa>&amp;</i> modulo
<td><code>divmod(x, y)</code>
<td><code class=pp>divmod(x, y)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rdivmod__><code>y.__rdivmod__(<var>x</var>)</code></a>
<tr><th>
<td>raise to power
<td><code>x ** y</code>
<td><code class=pp>x ** y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rpow__><code>y.__rpow__(<var>x</var>)</code></a>
<tr><th>
<td>left bit-shift
<td><code>x &lt;&lt; y</code>
<td><code class=pp>x &lt;&lt; y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rlshift__><code>y.__rlshift__(<var>x</var>)</code></a>
<tr><th>
<td>right bit-shift
<td><code>x >> y</code>
<td><code class=pp>x >> y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rrshift__><code>y.__rrshift__(<var>x</var>)</code></a>
<tr><th>
<td>bitwise <code>and</code>
<td><code>x &amp; y</code>
<td><code class=pp>x &amp; y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rand__><code>y.__rand__(<var>x</var>)</code></a>
<tr><th>
<td>bitwise <code>xor</code>
<td><code>x ^ y</code>
<td><code class=pp>x ^ y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__rxor__><code>y.__rxor__(<var>x</var>)</code></a>
<tr><th>
<td>bitwise <code>or</code>
<td><code>x | y</code>
<td><code class=pp>x | y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ror__><code>y.__ror__(<var>x</var>)</code></a>
</table>
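<p>Reflected methods matter when your object is the right-hand operand. In this sketch the hypothetical <code>Vector</code> class supports both <code>vector * 3</code> and <code>3 * vector</code>.

```python
class Vector:
    """Hypothetical 2-D vector that can be scaled by a number."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __mul__(self, scalar):
        # Called for `vector * scalar`.
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):
        # Called for `scalar * vector`, after int.__mul__ returns NotImplemented.
        return self.__mul__(scalar)

right_scaled = Vector(1, 2) * 3
left_scaled = 3 * Vector(1, 2)
```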
@@ -517,51 +514,51 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>in-place addition
<td><code>x += y</code>
<td><code class=pp>x += y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__iadd__><code>x.__iadd__(<var>y</var>)</code></a>
<tr><th>
<td>in-place subtraction
<td><code>x -= y</code>
<td><code class=pp>x -= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__isub__><code>x.__isub__(<var>y</var>)</code></a>
<tr><th>
<td>in-place multiplication
<td><code>x *= y</code>
<td><code class=pp>x *= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__imul__><code>x.__imul__(<var>y</var>)</code></a>
<tr><th>
<td>in-place division
<td><code>x /= y</code>
<td><code class=pp>x /= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__itruediv__><code>x.__itruediv__(<var>y</var>)</code></a>
<tr><th>
<td>in-place floor division
<td><code>x //= y</code>
<td><code class=pp>x //= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ifloordiv__><code>x.__ifloordiv__(<var>y</var>)</code></a>
<tr><th>
<td>in-place modulo
<td><code>x %= y</code>
<td><code class=pp>x %= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__imod__><code>x.__imod__(<var>y</var>)</code></a>
<tr><th>
<td>in-place raise to power
<td><code>x **= y</code>
<td><code class=pp>x **= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ipow__><code>x.__ipow__(<var>y</var>)</code></a>
<tr><th>
<td>in-place left bit-shift
<td><code>x &lt;&lt;= y</code>
<td><code class=pp>x &lt;&lt;= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ilshift__><code>x.__ilshift__(<var>y</var>)</code></a>
<tr><th>
<td>in-place right bit-shift
<td><code>x >>= y</code>
<td><code class=pp>x >>= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__irshift__><code>x.__irshift__(<var>y</var>)</code></a>
<tr><th>
<td>in-place bitwise <code>and</code>
<td><code>x &amp;= y</code>
<td><code class=pp>x &amp;= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__iand__><code>x.__iand__(<var>y</var>)</code></a>
<tr><th>
<td>in-place bitwise <code>xor</code>
<td><code>x ^= y</code>
<td><code class=pp>x ^= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ixor__><code>x.__ixor__(<var>y</var>)</code></a>
<tr><th>
<td>in-place bitwise <code>or</code>
<td><code>x |= y</code>
<td><code class=pp>x |= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ior__><code>x.__ior__(<var>y</var>)</code></a>
</table>
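<p>An in-place method should modify the object and return it. The sketch below uses a hypothetical <code>Basket</code> class to show that <code>+=</code> keeps the same object rather than creating a new one.

```python
class Basket:
    """Hypothetical mutable container that supports `basket += item`."""

    def __init__(self):
        self.items = []

    def __iadd__(self, item):
        # Called for `basket += item`; mutate in place and return self,
        # because the result is rebound to the name on the left.
        self.items.append(item)
        return self

basket = Basket()
alias = basket
basket += 'apple'
basket += 'pear'
same_object = alias is basket
```

<p>If <code>__iadd__()</code> forgot to return <code>self</code>, the name <code>basket</code> would be rebound to <code>None</code> after the first <code>+=</code>.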
@@ -584,55 +581,55 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>negative number
<td><code>-x</code>
<td><code class=pp>-x</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__neg__><code>x.__neg__()</code></a>
<tr><th>
<td>positive number
<td><code>+x</code>
<td><code class=pp>+x</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__pos__><code>x.__pos__()</code></a>
<tr><th>
<td>absolute value
<td><code>abs(x)</code>
<td><code class=pp>abs(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__abs__><code>x.__abs__()</code></a>
<tr><th>
<td>inverse
<td><code>~x</code>
<td><code class=pp>~x</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__invert__><code>x.__invert__()</code></a>
<tr><th>
<td>complex number
<td><code>complex(x)</code>
<td><code class=pp>complex(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__complex__><code>x.__complex__()</code></a>
<tr><th>
<td>integer
<td><code>int(x)</code>
<td><code class=pp>int(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__int__><code>x.__int__()</code></a>
<tr><th>
<td>floating point number
<td><code>float(x)</code>
<td><code class=pp>float(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__float__><code>x.__float__()</code></a>
<tr><th>
<td>number rounded to nearest integer
<td><code>round(x)</code>
<td><code class=pp>round(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__round__><code>x.__round__()</code></a>
<tr><th>
<td>number rounded to nearest <var>n</var> digits
<td><code>round(x, n)</code>
<td><code class=pp>round(x, n)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__round__><code>x.__round__(n)</code></a>
<tr><th>
<td>smallest integer <code>>= x</code>
<td><code>math.ceil(x)</code>
<td><code class=pp>math.ceil(x)</code>
<td><a href=http://docs.python.org/3.0/library/math.html#math.ceil><code>x.__ceil__()</code></a>
<tr><th>
<td>largest integer <code>&lt;= x</code>
<td><code>math.floor(x)</code>
<td><code class=pp>math.floor(x)</code>
<td><a href=http://docs.python.org/3.0/library/math.html#math.floor><code>x.__floor__()</code></a>
<tr><th>
<td>truncate <code>x</code> to nearest integer toward <code>0</code>
<td><code>math.trunc(x)</code>
<td><code class=pp>math.trunc(x)</code>
<td><a href=http://docs.python.org/3.0/library/math.html#math.trunc><code>x.__trunc__()</code></a>
<tr><th><a href=http://www.python.org/dev/peps/pep-0357/>PEP 357</a>
<td>number as a list index
<td><code>a_list[<var>x</var>]</code>
<td><code class=pp>a_list[<var>x</var>]</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__index__><code>x.__index__()</code></a>
</table>
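<p>To see a few of these conversions in action, consider the sketch below. The <code>Celsius</code> class is an assumption made up for illustration; it simply forwards each special method to the number it wraps.

```python
import math


class Celsius:
    '''Hypothetical temperature wrapper, for illustration only.'''

    def __init__(self, degrees):
        self.degrees = degrees

    def __neg__(self):
        # called for: -x
        return Celsius(-self.degrees)

    def __abs__(self):
        # called for: abs(x)
        return Celsius(abs(self.degrees))

    def __float__(self):
        # called for: float(x)
        return float(self.degrees)

    def __round__(self, n=None):
        # called for: round(x) and round(x, n)
        return round(self.degrees, n)

    def __floor__(self):
        # called for: math.floor(x)
        return math.floor(self.degrees)

    def __ceil__(self):
        # called for: math.ceil(x)
        return math.ceil(self.degrees)


t = Celsius(-3.72)
rounded = round(t)         # -4
one_digit = round(t, 1)    # -3.7
floor_value = math.floor(t)  # -4
ceil_value = math.ceil(t)    # -3
as_float = float(t)        # -3.72
```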
@@ -647,31 +644,31 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>equality
<td><code>x == y</code>
<td><code class=pp>x == y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__eq__><code>x.__eq__(<var>y</var>)</code></a>
<tr><th>
<td>inequality
<td><code>x != y</code>
<td><code class=pp>x != y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ne__><code>x.__ne__(<var>y</var>)</code></a>
<tr><th>
<td>less than
<td><code>x &lt; y</code>
<td><code class=pp>x &lt; y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__lt__><code>x.__lt__(<var>y</var>)</code></a>
<tr><th>
<td>less than or equal to
<td><code>x &lt;= y</code>
<td><code class=pp>x &lt;= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__le__><code>x.__le__(<var>y</var>)</code></a>
<tr><th>
<td>greater than
<td><code>x > y</code>
<td><code class=pp>x > y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__gt__><code>x.__gt__(<var>y</var>)</code></a>
<tr><th>
<td>greater than or equal to
<td><code>x >= y</code>
<td><code class=pp>x >= y</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__ge__><code>x.__ge__(<var>y</var>)</code></a>
<tr><th>
<td>truth value in a boolean context
<td><code>if x:</code>
<td><code class=pp>if x:</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__bool__><code>x.__bool__()</code></a>
</table>
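<p>Here is a small sketch of the comparison methods. The <code>Version</code> class is hypothetical and defines only <code>__eq__()</code>, <code>__lt__()</code>, and <code>__bool__()</code>; Python 3 derives <code>!=</code> from <code>__eq__()</code>, and it evaluates <code>x > y</code> by trying the reflected <code>y.__lt__(x)</code>.

```python
class Version:
    '''Hypothetical (major, minor) version number, for illustration only.'''

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        # called for: x == y
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        # called for: x < y
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __bool__(self):
        # called for: if x:
        return self._key() != (0, 0)


same = Version(3, 0) == Version(3, 0)        # True
different = Version(3, 0) != Version(3, 1)   # True, derived from __eq__
older = Version(2, 6) < Version(3, 0)        # True
newer = Version(3, 1) > Version(3, 0)        # True, via the reflected __lt__
unreleased = bool(Version(0, 0))             # False
```

<p>Returning <code>NotImplemented</code> for unrelated types lets Python try the other operand before giving up.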
@@ -687,31 +684,31 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>a custom object copy
<td><code>copy.copy(x)</code>
<td><code class=pp>copy.copy(x)</code>
<td><a href=http://docs.python.org/3.0/library/copy.html><code>x.__copy__()</code></a>
<tr><th>
<td>a custom object deepcopy
<td><code>copy.deepcopy(x)</code>
<td><code class=pp>copy.deepcopy(x)</code>
<td><a href=http://docs.python.org/3.0/library/copy.html><code>x.__deepcopy__()</code></a>
<tr><th>
<td>to get an object&#8217;s state before pickling
<td><code>pickle.dump(x, <var>file</var>)</code>
<td><code class=pp>pickle.dump(x, <var>file</var>)</code>
<td><a href=http://docs.python.org/3.0/library/pickle.html#pickle-state><code>x.__getstate__()</code></a>
<tr><th>
<td>to serialize an object
<td><code>pickle.dump(x, <var>file</var>)</code>
<td><code class=pp>pickle.dump(x, <var>file</var>)</code>
<td><a href=http://docs.python.org/3.0/library/pickle.html#pickling-class-instances><code>x.__reduce__()</code></a>
<tr><th>
<td>to serialize an object (new pickling protocol)
<td><code>pickle.dump(x, <var>file</var>, <var>protocol_version</var>)</code>
<td><code class=pp>pickle.dump(x, <var>file</var>, <var>protocol_version</var>)</code>
<td><a href=http://docs.python.org/3.0/library/pickle.html#pickling-class-instances><code>x.__reduce_ex__(<var>protocol_version</var>)</code></a>
<tr><th>
<td>control over how an object is created during unpickling
<td><code>x = pickle.load(<var>file</var>)</code>
<td><code class=pp>x = pickle.load(<var>file</var>)</code>
<td><a href=http://docs.python.org/3.0/library/pickle.html#pickling-class-instances><code>x.__getnewargs__()</code></a>
<tr><th>
<td>to restore an object&#8217;s state after unpickling
<td><code>x = pickle.load(<var>file</var>)</code>
<td><code class=pp>x = pickle.load(<var>file</var>)</code>
<td><a href=http://docs.python.org/3.0/library/pickle.html#pickle-state><code>x.__setstate__()</code></a>
</table>
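<p>The first two rows can be sketched as follows. The <code>Playlist</code> class is a made-up example; note that in practice <code>copy.deepcopy()</code> passes a <var>memo</var> dictionary to <code>__deepcopy__()</code>, which the sketch forwards to nested copies.

```python
import copy


class Playlist:
    '''Hypothetical container, used only to illustrate custom copying.'''

    def __init__(self, name, tracks):
        self.name = name
        self.tracks = tracks

    def __copy__(self):
        # called for: copy.copy(x); shares the same track list
        return Playlist(self.name + ' (copy)', self.tracks)

    def __deepcopy__(self, memo):
        # called for: copy.deepcopy(x); duplicates the track list too
        return Playlist(self.name + ' (deep copy)',
                        copy.deepcopy(self.tracks, memo))


original = Playlist('road trip', ['track 1', 'track 2'])
shallow = copy.copy(original)
deep = copy.deepcopy(original)
```

<p>After this runs, <code>shallow.tracks</code> is the very same list object as <code>original.tracks</code>, while <code>deep.tracks</code> is an equal but separate list.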
@@ -728,11 +725,11 @@ class FieldStorage:
<th>And Python Calls&hellip;
<tr><th>
<td>do something special when entering a <code>with</code> block
<td><code>with x:</code>
<td><code class=pp>with x:</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__enter__><code>x.__enter__()</code></a>
<tr><th>
<td>do something special when leaving a <code>with</code> block
<td><code>with x:</code>
<td><code class=pp>with x:</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__exit__><code>x.__exit__()</code></a>
</table>
@@ -740,7 +737,7 @@ class FieldStorage:
<p>FIXME-xref to as-yet-unwritten section on function annotations
<pre><code># excerpt from io.py:
<pre><code class=pp># excerpt from io.py:
def _checkClosed(self, msg=None):
'''Internal: raise an ValueError if file is closed
'''
@@ -750,12 +747,12 @@ def _checkClosed(self, msg=None):
def __enter__(self) -> 'IOBase':
'''Context management protocol. Returns self.'''
<a> self._checkClosed() <span>&#x2460;</span></a>
<a> return self <span>&#x2461;</span></a>
<a> self._checkClosed() <span class=u>&#x2460;</span></a>
<a> return self <span class=u>&#x2461;</span></a>
def __exit__(self, *args) -> None:
'''Context management protocol. Calls close()'''
<a> self.close() <span>&#x2462;</span></a></code></pre>
<a> self.close() <span class=u>&#x2462;</span></a></code></pre>
<ol>
<li>The file object defines both an <code>__enter__()</code> and an <code>__exit__()</code> method. The <code>__enter__()</code> method checks that the file is open; if it&#8217;s not, the <code>_checkClosed()</code> method raises an exception.
<li>The <code>__enter__()</code> method should almost always return <var>self</var> &mdash; this is the object that the <code>with</code> block will use to dispatch properties and methods.
@@ -763,7 +760,7 @@ def __exit__(self, *args) -> None:
</ol>
<blockquote class=note>
<p><span>&#x261E;</span>The <code>__exit__()</code> method will always be called, even if an exception is raised inside the <code>with</code> block. In fact, if an exception is raised, the exception information will be passed to the <code>__exit__()</code> method. See <a href=http://www.python.org/doc/3.0/reference/datamodel.html#with-statement-context-managers>With Statement Context Managers</a> for more details.
<p><span class=u>&#x261E;</span>The <code>__exit__()</code> method will always be called, even if an exception is raised inside the <code>with</code> block. In fact, if an exception is raised, the exception information will be passed to the <code>__exit__()</code> method. See <a href=http://www.python.org/doc/3.0/reference/datamodel.html#with-statement-context-managers>With Statement Context Managers</a> for more details.
</blockquote>
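<p>The behavior described in the note can be checked with a small sketch. The <code>Tracker</code> class is hypothetical; it records the order in which things happen and remembers the exception type handed to <code>__exit__()</code>.

```python
class Tracker:
    '''Hypothetical context manager that logs what happens to it.'''

    def __init__(self):
        self.events = []
        self.seen = None

    def __enter__(self):
        self.events.append('enter')
        return self  # the object bound by "as" in the with statement

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        self.seen = exc_type  # None if the block finished normally
        return False          # a false value lets the exception propagate


tracker = Tracker()
try:
    with tracker as t:
        t.events.append('body')
        raise ValueError('oops')
except ValueError:
    tracker.events.append('propagated')
```

<p>Here <code>tracker.events</code> ends up as <code>['enter', 'body', 'exit', 'propagated']</code>, and <code>tracker.seen</code> is <code>ValueError</code>: <code>__exit__()</code> ran before the exception reached the <code>except</code> clause.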
<h2 id=esoterica>Really Esoteric Stuff</h2>
@@ -777,11 +774,11 @@ def __exit__(self, *args) -> None:
<th>And Python Calls&hellip;
<tr><th>
<td>a class constructor
<td><code>x = MyClass()</code>
<td><code class=pp>x = MyClass()</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__new__><code>x.__new__()</code></a>
<tr><th>
<td>a class destructor
<td><code>del x</code>
<td><code class=pp>del x</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__del__><code>x.__del__()</code></a>
<tr><th>
<td>only a specific set of attributes to be defined
@@ -789,35 +786,36 @@ def __exit__(self, *args) -> None:
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__slots__><code>x.__slots__</code></a>
<tr><th>
<td>a custom hash value
<td><code>hash(x)</code>
<td><code class=pp>hash(x)</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__hash__><code>x.__hash__()</code></a>
<tr><th>
<td>to get an attribute&#8217;s value
<td><code>x.color</code>
<td><code class=pp>x.color</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__get__><code>type(x).__dict__['color'].__get__(x, type(x))</code></a>
<tr><th>
<td>to set an attribute&#8217;s value
<td><code>x.color = 'PapayaWhip'</code>
<td><code class=pp>x.color = 'PapayaWhip'</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__set__><code>type(x).__dict__['color'].__set__(x, 'PapayaWhip')</code></a>
<tr><th>
<td>to delete an attribute
<td><code>del x.color</code>
<td><code class=pp>del x.color</code>
<td><a href=http://www.python.org/doc/3.0/reference/datamodel.html#object.__delete__><code>type(x).__dict__['color'].__delete__(x)</code></a>
<tr><th>
<td>to control whether an object is an instance of your class
<td><code>isinstance(x, MyClass)</code>
<td><code class=pp>isinstance(x, MyClass)</code>
<td><a href=http://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass><code>MyClass.__instancecheck__(x)</code></a>
<tr><th>
<td>to control whether a class is a subclass of your class
<td><code>issubclass(C, MyClass)</code>
<td><code class=pp>issubclass(C, MyClass)</code>
<td><a href=http://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass><code>MyClass.__subclasscheck__(C)</code></a>
<tr><th>
<td>to control whether a class is a subclass of your abstract base class
<td><code>issubclass(C, MyABC)</code>
<td><code class=pp>issubclass(C, MyABC)</code>
<td><a href=http://docs.python.org/3.0/library/abc.html#abc.ABCMeta.__subclasshook__><code>MyABC.__subclasshook__(C)</code></a>
</table>
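<p>The three attribute rows describe the descriptor protocol. The following is a minimal sketch, assuming a hypothetical <code>ColorDescriptor</code> class and <code>Widget</code> class invented for this illustration.

```python
class ColorDescriptor:
    '''Hypothetical descriptor that stores a lower-cased color name.'''

    def __get__(self, instance, owner):
        # called for: x.color
        if instance is None:
            return self  # accessed on the class itself, not an instance
        return instance.__dict__.get('_color', 'transparent')

    def __set__(self, instance, value):
        # called for: x.color = value
        instance.__dict__['_color'] = value.lower()

    def __delete__(self, instance):
        # called for: del x.color
        instance.__dict__.pop('_color', None)


class Widget:
    color = ColorDescriptor()  # descriptors live on the class, not the instance


x = Widget()
x.color = 'PapayaWhip'
after_set = x.color                                           # 'papayawhip'
explicit = type(x).__dict__['color'].__get__(x, type(x))      # same lookup, spelled out
del x.color
after_delete = x.color                                        # 'transparent'
```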
<p class=v><a href=porting-code-to-python-3-with-2to3.html rel=prev title='back to &#8220;Porting code to Python 3 with 2to3&#8221;'><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a href=porting-code-to-python-3-with-2to3.html rel=prev title='back to &#8220;Porting code to Python 3 with 2to3&#8221;'><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+48 -48
View File
@@ -12,12 +12,12 @@ body{counter-reset:h1 3}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#strings>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#strings>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
<h1>Strings</h1>
<blockquote class=q>
<p><span>&#x275D;</span> I&#8217;m telling you this &#8217;cause you&#8217;re one of my friends.<br>
My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr. Seuss, On Beyond Zebra!
<p><span class=u>&#x275D;</span> I&#8217;m telling you this &#8217;cause you&#8217;re one of my friends.<br>
My alphabet starts where your alphabet ends! <span class=u>&#x275E;</span><br>&mdash; Dr. Seuss, On Beyond Zebra!
</blockquote>
<p id=toc>&nbsp;
<h2 id=boring-stuff>Some Boring Stuff You Need To Understand Before You Can Dive In</h2>
@@ -84,12 +84,12 @@ My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr
<p>In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python string encoded as CP-1252. &#8220;Is this string UTF-8?&#8221; is an invalid question. UTF-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>s = '深入 Python'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>len(s)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>s = '深入 Python'</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>len(s)</kbd> <span class=u>&#x2461;</span></a>
<samp>9</samp>
<a><samp class=p>>>> </samp><kbd>s[0]</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>s[0]</kbd> <span class=u>&#x2462;</span></a>
<samp>'深'</samp>
<a><samp class=p>>>> </samp><kbd>s + ' 3'</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>s + ' 3'</kbd> <span class=u>&#x2463;</span></a>
<samp>'深入 Python 3'</samp></pre>
<ol>
<li>To create a string, enclose it in quotes. Python strings can be defined with either single quotes (<code>'</code>) or double quotes (<code>"</code>).<!--"-->
@@ -106,12 +106,11 @@ My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr
<p>Let&#8217;s take another look at <a href=your-first-python-program.html#divingin><code>humansize.py</code></a>:
<p class=d>[<a href=examples/humansize.py>download <code>humansize.py</code></a>]
<pre><code>
<a>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'], <span>&#x2460;</span></a>
<pre><code class=pp><a>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'], <span class=u>&#x2460;</span></a>
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<a> '''Convert a file size to human-readable form. <span>&#x2461;</span></a>
<a> '''Convert a file size to human-readable form. <span class=u>&#x2461;</span></a>
Keyword arguments:
size -- file size in bytes
@@ -120,15 +119,15 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
Returns: string
<a> ''' <span>&#x2462;</span></a>
<a> ''' <span class=u>&#x2462;</span></a>
if size &lt; 0:
<a> raise ValueError('number must be non-negative') <span>&#x2463;</span></a>
<a> raise ValueError('number must be non-negative') <span class=u>&#x2463;</span></a>
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
for suffix in SUFFIXES[multiple]:
size /= multiple
if size &lt; multiple:
<a> return '{0:.1f} {1}'.format(size, suffix) <span>&#x2464;</span></a>
<a> return '{0:.1f} {1}'.format(size, suffix) <span class=u>&#x2464;</span></a>
raise ValueError('number too large')</code></pre>
<ol>
@@ -143,8 +142,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<pre class=screen>
<samp class=p>>>> </samp><kbd>username = 'mark'</kbd>
<a><samp class=p>>>> </samp><kbd>password = 'PapayaWhip'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>"{0}'s password is {1}".format(username, password)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>password = 'PapayaWhip'</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>"{0}'s password is {1}".format(username, password)</kbd> <span class=u>&#x2461;</span></a>
<samp>"mark's password is PapayaWhip"</samp></pre>
<ol>
<li>No, my password is not really <kbd>PapayaWhip</kbd>.
@@ -157,10 +156,10 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<pre class=screen>
<samp class=p>>>> </samp><kbd>import humansize</kbd>
<a><samp class=p>>>> </samp><kbd>si_suffixes = humansize.SUFFIXES[1000]</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>si_suffixes = humansize.SUFFIXES[1000]</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>si_suffixes</kbd>
<samp>['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']</samp>
<a><samp class=p>>>> </samp><kbd>'1000{0[0]} = 1{0[1]}'.format(si_suffixes)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>'1000{0[0]} = 1{0[1]}'.format(si_suffixes)</kbd> <span class=u>&#x2461;</span></a>
<samp>'1000KB = 1MB'</samp>
</pre>
<ol>
@@ -202,13 +201,13 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<p>But wait! There&#8217;s more! Let&#8217;s take another look at that strange line of code from <code>humansize.py</code>:
<pre><code>if size &lt; multiple:
<pre><code class=pp>if size &lt; multiple:
return '{0:.1f} {1}'.format(size, suffix)</code></pre>
<p><code>{1}</code> is replaced with the second argument passed to the <code>format()</code> method, which is <var>suffix</var>. But what is <code>{0:.1f}</code>? It&#8217;s two things: <code>{0}</code>, which you recognize, and <code>:.1f</code>, which you don&#8217;t. The second half (including and after the colon) defines the <i>format specifier</i>, which further refines how the replaced variable should be formatted.
<blockquote class='note compare clang'>
<p><span>&#x261E;</span>Format specifiers allow you to munge the replacement text in a variety of useful ways, like the <code>printf()</code> function in C. You can add zero- or space-padding, align strings, control decimal precision, and even convert numbers to hexadecimal.
<p><span class=u>&#x261E;</span>Format specifiers allow you to munge the replacement text in a variety of useful ways, like the <code>printf()</code> function in C. You can add zero- or space-padding, align strings, control decimal precision, and even convert numbers to hexadecimal.
</blockquote>
<p>Within a replacement field, a colon (<code>:</code>) marks the start of the format specifier. The format specifier &#8220;<code>.1</code>&#8221; means &#8220;round to the nearest tenth&#8221; (<i>i.e.</i> display only one digit after the decimal point). The format specifier &#8220;<code>f</code>&#8221; means &#8220;fixed-point number&#8221; (as opposed to exponential notation or some other decimal representation). Thus, given a <var>size</var> of <code>698.24</code> and <var>suffix</var> of <code>'GB'</code>, the formatted string would be <code>'698.2 GB'</code>, because <code>698.24</code> gets rounded to one decimal place, then the suffix is appended after the number.
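<p>A short sketch of a few format specifiers follows. The variable names and sample values are illustrative assumptions; only the <code>'{0:.1f} {1}'</code> template comes from <code>humansize.py</code>.

```python
size = 698.24
suffix = 'GB'

# one digit after the decimal point, fixed-point notation
formatted = '{0:.1f} {1}'.format(size, suffix)    # '698.2 GB'

# right-align in a field eight characters wide
padded = '{0:>8.1f}'.format(size)                 # '   698.2'

# convert an integer to hexadecimal
as_hex = '{0:x}'.format(255)                      # 'ff'

# zero-pad an integer to five digits
zero_padded = '{0:05d}'.format(42)                # '00042'
```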
@@ -226,21 +225,21 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<p>Besides formatting, strings can do a number of other useful tricks.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>s = '''Finished files are the re-</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>s = '''Finished files are the re-</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>... </samp><kbd>sult of years of scientif-</kbd>
<samp class=p>... </samp><kbd>ic study combined with the</kbd>
<samp class=p>... </samp><kbd>experience of years.'''</kbd>
<a><samp class=p>>>> </samp><kbd>s.splitlines()</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>s.splitlines()</kbd> <span class=u>&#x2461;</span></a>
<samp>['Finished files are the re-',
'sult of years of scientif-',
'ic study combined with the',
'experience of years.']</samp>
<a><samp class=p>>>> </samp><kbd>print(s.lower())</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>print(s.lower())</kbd> <span class=u>&#x2462;</span></a>
<samp>finished files are the re-
sult of years of scientif-
ic study combined with the
experience of years.</samp>
<a><samp class=p>>>> </samp><kbd>s.lower().count('f')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>s.lower().count('f')</kbd> <span class=u>&#x2463;</span></a>
<samp>6</samp></pre>
<ol>
<li>You can input multi-line strings in the Python interactive shell. Once you start a multi-line string with triple quotation marks, just hit <kbd>ENTER</kbd> and the interactive shell will prompt you to continue the string. Typing the closing triple quotation marks ends the string, and the next <kbd>ENTER</kbd> will execute the command (in this case, assigning the string to <var>s</var>).
@@ -253,13 +252,13 @@ experience of years.</samp>
<pre class=screen>
<samp class=p>>>> </samp><kbd>query = 'user=pilgrim&amp;database=master&amp;password=PapayaWhip'</kbd>
<a><samp class=p>>>> </samp><kbd>a_list = query.split('&amp;')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list = query.split('&amp;')</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp>['user=pilgrim', 'database=master', 'password=PapayaWhip']</samp>
<a><samp class=p>>>> </samp><kbd>a_list_of_lists = [v.split('=', 1) for v in a_list]</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>a_list_of_lists = [v.split('=', 1) for v in a_list]</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>a_list_of_lists</kbd>
<samp>[['user', 'pilgrim'], ['database', 'master'], ['password', 'PapayaWhip']]</samp>
<a><samp class=p>>>> </samp><kbd>a_dict = dict(a_list_of_lists)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>a_dict = dict(a_list_of_lists)</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>a_dict</kbd>
<samp>{'password': 'PapayaWhip', 'user': 'pilgrim', 'database': 'master'}</samp></pre>
@@ -276,21 +275,21 @@ experience of years.</samp>
<p>Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a <i>string</i>. An immutable sequence of numbers-between-0-and-255 is called a <i>bytes</i> object.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>by = b'abcd\x65'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>by = b'abcd\x65'</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>by</kbd>
<samp>b'abcde'</samp>
<a><samp class=p>>>> </samp><kbd>type(by)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>type(by)</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;class 'bytes'></samp>
<a><samp class=p>>>> </samp><kbd>len(by)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>len(by)</kbd> <span class=u>&#x2462;</span></a>
<samp>5</samp>
<a><samp class=p>>>> </samp><kbd>by += b'\xff'</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>by += b'\xff'</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>by</kbd>
<samp>b'abcde\xff'</samp>
<a><samp class=p>>>> </samp><kbd>len(by)</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>len(by)</kbd> <span class=u>&#x2464;</span></a>
<samp>6</samp>
<a><samp class=p>>>> </samp><kbd>by[0]</kbd> <span>&#x2465;</span></a>
<a><samp class=p>>>> </samp><kbd>by[0]</kbd> <span class=u>&#x2465;</span></a>
<samp>97</samp>
<a><samp class=p>>>> </samp><kbd>by[0] = 102</kbd> <span>&#x2466;</span></a>
<a><samp class=p>>>> </samp><kbd>by[0] = 102</kbd> <span class=u>&#x2466;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
TypeError: 'bytes' object does not support item assignment</samp></pre>
@@ -306,12 +305,12 @@ TypeError: 'bytes' object does not support item assignment</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>by = b'abcd\x65'</kbd>
<a><samp class=p>>>> </samp><kbd>barr = bytearray(by)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>barr = bytearray(by)</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>barr</kbd>
<samp>bytearray(b'abcde')</samp>
<a><samp class=p>>>> </samp><kbd>len(barr)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>len(barr)</kbd> <span class=u>&#x2461;</span></a>
<samp>5</samp>
<a><samp class=p>>>> </samp><kbd>barr[0] = 102</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>barr[0] = 102</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>barr</kbd>
<samp>bytearray(b'fbcde')</samp></pre>
<ol>
@@ -325,15 +324,15 @@ TypeError: 'bytes' object does not support item assignment</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>by = b'd'</kbd>
<samp class=p>>>> </samp><kbd>s = 'abcde'</kbd>
<a><samp class=p>>>> </samp><kbd>by + s</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>by + s</kbd> <span class=u>&#x2460;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
TypeError: can't concat bytes to str</samp>
<a><samp class=p>>>> </samp><kbd>s.count(by)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>s.count(by)</kbd> <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
TypeError: Can't convert 'bytes' object to str implicitly</samp>
<a><samp class=p>>>> </samp><kbd>s.count(by.decode('ascii'))</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>s.count(by.decode('ascii'))</kbd> <span class=u>&#x2462;</span></a>
<samp>1</samp></pre>
<ol>
<li>You can&#8217;t concatenate bytes and strings. They are two different data types.
@@ -344,25 +343,25 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<p>And here is the link between strings and bytes: <code>bytes</code> objects have a <code>decode()</code> method that takes a character encoding and returns a string, and strings have an <code>encode()</code> method that takes a character encoding and returns a <code>bytes</code> object. In the previous example, the decoding was relatively straightforward &mdash; converting a sequence of bytes in the <abbr>ASCII</abbr> encoding into a string of characters. But the same process works with any encoding that supports the characters of the string &mdash; even legacy (non-Unicode) encodings.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>a_string = '深入 Python'</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>a_string = '深入 Python'</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>len(a_string)</kbd>
<samp>9</samp>
<a><samp class=p>>>> </samp><kbd>by = a_string.encode('utf-8')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>by = a_string.encode('utf-8')</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>by</kbd>
<samp>b'\xe6\xb7\xb1\xe5\x85\xa5 Python'</samp>
<samp class=p>>>> </samp><kbd>len(by)</kbd>
<samp>13</samp>
<a><samp class=p>>>> </samp><kbd>by = a_string.encode('gb18030')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>by = a_string.encode('gb18030')</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>by</kbd>
<samp>b'\xc9\xee\xc8\xeb Python'</samp>
<samp class=p>>>> </samp><kbd>len(by)</kbd>
<samp>11</samp>
<a><samp class=p>>>> </samp><kbd>by = a_string.encode('big5')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>by = a_string.encode('big5')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>by</kbd>
<samp>b'\xb2`\xa4J Python'</samp>
<samp class=p>>>> </samp><kbd>len(by)</kbd>
<samp>11</samp>
<a><samp class=p>>>> </samp><kbd>roundtrip = by.decode('big5')</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>roundtrip = by.decode('big5')</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp><kbd>roundtrip</kbd>
<samp>'深入 Python'</samp>
<samp class=p>>>> </samp><kbd>a_string == roundtrip</kbd>
@@ -382,16 +381,16 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<p>Python 3 assumes that your source code &mdash; <i>i.e.</i> each <code>.py</code> file &mdash; is encoded in UTF-8.
<blockquote class='note compare python2'>
<p><span>&#x261E;</span>In Python 2, the default encoding for <code>.py</code> files was <abbr>ASCII</abbr>. In Python 3, <a href=http://www.python.org/dev/peps/pep-3120/>the default encoding is UTF-8</a>.
<p><span class=u>&#x261E;</span>In Python 2, the default encoding for <code>.py</code> files was <abbr>ASCII</abbr>. In Python 3, <a href=http://www.python.org/dev/peps/pep-3120/>the default encoding is UTF-8</a>.
</blockquote>
<p>If you would like to use a different encoding within your Python code, you can put an encoding declaration on the first line of each file. This declaration defines a <code>.py</code> file to be windows-1252:
<pre><code># -*- coding: windows-1252 -*-</code></pre>
<pre><code class=pp># -*- coding: windows-1252 -*-</code></pre>
<p>Technically, the character encoding override can also be on the second line, if the first line is a <abbr>UNIX</abbr>-like hash-bang command.
<pre><code>#!/usr/bin/python3
<pre><code class=pp>#!/usr/bin/python3
# -*- coding: windows-1252 -*-</code></pre>
<p>For more information, consult <a href=http://www.python.org/dev/peps/pep-0263/><abbr>PEP</abbr> 263: Defining Python Source Code Encodings</a>.
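<p>If you are curious which encoding Python would pick for a given file, the standard library&#8217;s <code>tokenize.detect_encoding()</code> function applies the same rules. The sketch below feeds it two made-up source snippets held in memory; the snippets themselves are assumptions for illustration.

```python
import codecs
import io
import tokenize

# a hash-bang line followed by an encoding declaration on the second line
declared_source = (b"#!/usr/bin/python3\n"
                   b"# -*- coding: windows-1252 -*-\n"
                   b"print('hi')\n")
encoding, lines_read = tokenize.detect_encoding(
    io.BytesIO(declared_source).readline)

# no declaration at all: Python 3 falls back to UTF-8
plain_source = b"print('hi')\n"
default_encoding, _ = tokenize.detect_encoding(
    io.BytesIO(plain_source).readline)

# normalize the declared name to the codec Python actually uses
codec_name = codecs.lookup(encoding).name
```

<p>The declaration is found even though it sits on the second line, and the file with no declaration is reported as <code>'utf-8'</code>.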
@@ -432,8 +431,9 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<li><a href=http://www.python.org/dev/peps/pep-3101/><abbr>PEP</abbr> 3101: Advanced String Formatting</a>
</ul>
<p class=v><a href=native-datatypes.html rel=prev title='back to &#8220;Native Datatypes&#8221;'><span>&#x261C;</span></a> <a href=regular-expressions.html rel=next title='onward to &#8220;Regular Expressions&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=native-datatypes.html rel=prev title='back to &#8220;Native Datatypes&#8221;'><span class=u>&#x261C;</span></a> <a href=regular-expressions.html rel=next title='onward to &#8220;Regular Expressions&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+1 -1
View File
@@ -15,7 +15,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8><input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> Dive Into Python 3 <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> Dive Into Python 3 <span class=u>&#8227;</span>
<h1>Table of Contents</h1>
<ol start=-1>
<li id=whats-new><a href=whats-new.html>What&#8217;s New In &#8220;Dive Into Python 3&#8221;</a>
+47 -48
View File
@@ -12,11 +12,11 @@ body{counter-reset:h1 8}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#unit-testing>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#unit-testing>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=beginner>&#x2666;&#x2666;&#x2662;&#x2662;&#x2662;</span>
<h1>Unit Testing</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Certitude is not the test of certainty. We have been cocksure of many things that were not so. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Oliver_Wendell_Holmes,_Jr.>Oliver Wendell Holmes, Jr.</a>
<p><span class=u>&#x275D;</span> Certitude is not the test of certainty. We have been cocksure of many things that were not so. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Oliver_Wendell_Holmes,_Jr.>Oliver Wendell Holmes, Jr.</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>(Not) Diving In</h2>
@@ -57,10 +57,10 @@ body{counter-reset:h1 8}
</ol>
<p>It is not immediately obvious how this code does&hellip; well, <em>anything</em>. It defines a class which has no <code>__init__()</code> method. The class <em>does</em> have another method, but it is never called. The entire script has a <code>__main__</code> block, but it doesn&#8217;t reference the class or its method. But it does do something, I promise.
<p class=d>[<a href=examples/romantest1.py>download <code>romantest1.py</code></a>]
<pre><code>import roman1
<pre><code class=pp>import roman1
import unittest
<a>class KnownValues(unittest.TestCase): <span>&#x2460;</span></a>
<a>class KnownValues(unittest.TestCase): <span class=u>&#x2460;</span></a>
known_values = ( (1, 'I'),
(2, 'II'),
(3, 'III'),
@@ -116,13 +116,13 @@ import unittest
(3844, 'MMMDCCCXLIV'),
(3888, 'MMMDCCCLXXXVIII'),
(3940, 'MMMCMXL'),
<a> (3999, 'MMMCMXCIX')) <span>&#x2461;</span></a>
<a> (3999, 'MMMCMXCIX')) <span class=u>&#x2461;</span></a>
<a> def test_to_roman_known_values(self): <span>&#x2462;</span></a>
<a> def test_to_roman_known_values(self): <span class=u>&#x2462;</span></a>
'''to_roman should give known result with known input'''
for integer, numeral in self.known_values:
<a> result = roman1.to_roman(integer) <span>&#x2463;</span></a>
<a> self.assertEqual(numeral, result) <span>&#x2464;</span></a>
<a> result = roman1.to_roman(integer) <span class=u>&#x2463;</span></a>
<a> self.assertEqual(numeral, result) <span class=u>&#x2464;</span></a>
if __name__ == '__main__':
unittest.main()</code></pre>
@@ -135,18 +135,18 @@ if __name__ == '__main__':
</ol>
<aside>Write a test that fails, then code until it passes.</aside>
<p>Once you have a test case, you can start coding the <code>to_roman()</code> function. First, you should stub it out as an empty function and make sure the tests fail. If the tests succeed before you&#8217;ve written any code, you&#8217;re doing it wrong &mdash; your tests aren&#8217;t testing your code at all! Write a test that fails, then code until it passes.
<pre><code># roman1.py
<pre><code class=pp># roman1.py
def to_roman(n):
'''convert integer to Roman numeral'''
<a> pass <span>&#x2460;</span></a></code></pre>
<a> pass <span class=u>&#x2460;</span></a></code></pre>
<ol>
<li>At this stage, you want to define the <abbr>API</abbr> of the <code>to_roman()</code> function, but you don&#8217;t want to code it yet. (Your test needs to fail first.) To stub it out, use the Python reserved word <code>pass</code> [FIXME ref], which does precisely nothing.
</ol>
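<p>The &#8220;stub it out and watch it fail&#8221; step can be sketched in a single self-contained script. This is only an illustration, not one of the book&#8217;s example files: the helper name <code>run_stub_check()</code> is hypothetical, and the test checks a single known value instead of the full table.

```python
import io
import unittest

def to_roman(n):
    '''convert integer to Roman numeral'''
    pass  # stub: no return statement, so the call evaluates to None

def run_stub_check():
    '''Run one known-value test against the stub and return the TestResult.'''
    class KnownValues(unittest.TestCase):
        def test_to_roman_known_values(self):
            '''to_roman should give known result with known input'''
            self.assertEqual('I', to_roman(1))

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KnownValues)
    # Send the report to an in-memory buffer so only the summary below is printed.
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)

result = run_stub_check()
print('tests run:', result.testsRun)           # 1
print('failures:', len(result.failures))       # 1, because 'I' != None
print('successful:', result.wasSuccessful())   # False
```

<p>A failing result here is the desired outcome: it shows that the test really exercises <code>to_roman()</code> before any real code exists.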
<p>Execute <code>romantest1.py</code> on the command line to run the test. If you call it with the <code>-v</code> command-line option, it will give more verbose output so you can see exactly what&#8217;s going on as each test case runs. With any luck, your output should look like this:
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest1.py -v</kbd>
<samp><a>to_roman should give known result with known input ... FAIL <span>&#x2460;</span></a>
<samp><a>to_roman should give known result with known input ... FAIL <span class=u>&#x2460;</span></a>
======================================================================
FAIL: to_roman should give known result with known input
@@ -154,12 +154,12 @@ FAIL: to_roman should give known result with known input
Traceback (most recent call last):
File "romantest1.py", line 73, in test_to_roman_known_values
self.assertEqual(numeral, result)
<a>AssertionError: 'I' != None <span>&#x2461;</span></a>
<a>AssertionError: 'I' != None <span class=u>&#x2461;</span></a>
----------------------------------------------------------------------
<a>Ran 1 test in 0.016s <span>&#x2462;</span></a>
<a>Ran 1 test in 0.016s <span class=u>&#x2462;</span></a>
<a>FAILED (failures=1) <span>&#x2463;</span></a></samp></pre>
<a>FAILED (failures=1) <span class=u>&#x2463;</span></a></samp></pre>
<ol>
<li>Running the script runs <code>unittest.main()</code>, which runs each test case. Each test case is a method within each class in <code>romantest1.py</code> that inherits from <code>unittest.TestCase</code>. For each test case, the <code>unittest</code> module will print out the <code>docstring</code> of the method and whether that test passed or failed. As expected, this test case fails.
<li>For each failed test case, <code>unittest</code> displays the trace information showing exactly what happened. In this case, the call to <code>assertEqual()</code> raised an <code>AssertionError</code> because it was expecting <code>to_roman(1)</code> to return <code>'I'</code>, but it didn&#8217;t. (Since there was no explicit return statement, the function returned <code>None</code>, the Python null value.)
@@ -168,7 +168,7 @@ Traceback (most recent call last):
</ol>
<p><em>Now</em>, finally, you can write the <code>to_roman()</code> function.
<p class=d>[<a href=examples/roman1.py>download <code>roman1.py</code></a>]
<pre><code>roman_numeral_map = (('M', 1000),
<pre><code class=pp>roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
@@ -180,13 +180,13 @@ Traceback (most recent call last):
('IX', 9),
('V', 5),
('IV', 4),
<a> ('I', 1)) <span>&#x2460;</span></a>
<a> ('I', 1)) <span class=u>&#x2460;</span></a>
def to_roman(n):
'''convert integer to Roman numeral'''
result = ''
for numeral, integer in roman_numeral_map:
<a> while n >= integer: <span>&#x2461;</span></a>
<a> while n >= integer: <span class=u>&#x2461;</span></a>
result += numeral
n -= integer
return result</code></pre>
@@ -195,7 +195,7 @@ def to_roman(n):
<li>Here&#8217;s where the rich data structure of <var>roman_numeral_map</var> pays off, because you don&#8217;t need any special logic to handle the subtraction rule. To convert to Roman numerals, simply iterate through <var>roman_numeral_map</var> looking for the largest integer value less than or equal to the input. Once found, add the Roman numeral representation to the end of the output, subtract the corresponding integer value from the input, lather, rinse, repeat.
</ol>
<p>If you&#8217;re still not clear how the <code>to_roman()</code> function works, add a <code>print()</code> call to the end of the <code>while</code> loop:
<pre><code>
<pre><code class=pp>
while n >= integer:
result += numeral
n -= integer
@@ -234,7 +234,7 @@ OK</samp></pre>
<samp>'MMMM'</samp>
<samp class=p>>>> </samp><kbd>roman1.to_roman(5000)</kbd>
<samp>'MMMMM'</samp>
<a><samp class=p>>>> </samp><kbd>roman1.to_roman(9000)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>roman1.to_roman(9000)</kbd> <span class=u>&#x2460;</span></a>
<samp>'MMMMMMMMM'</samp></pre>
<ol>
<li>That&#8217;s definitely not what you wanted &mdash; that&#8217;s not even a valid Roman numeral! In fact, each of these numbers is outside the range of acceptable input, but the function returns a bogus value anyway. Silently returning bad values is <em>baaaaaaad</em>; if a program is going to fail, it is far better that it fail quickly and noisily. &#8220;Halt and catch fire,&#8221; as the saying goes. The Pythonic way to halt and catch fire is to raise an exception.
@@ -245,11 +245,10 @@ OK</samp></pre>
</blockquote>
<p>What would that test look like?
<p class=d>[<a href=examples/romantest2.py>download <code>romantest2.py</code></a>]
<pre><code>
<a>class ToRomanBadInput(unittest.TestCase): <span>&#x2460;</span></a>
<a> def test_too_large(self): <span>&#x2461;</span></a>
<pre><code class=pp><a>class ToRomanBadInput(unittest.TestCase): <span class=u>&#x2460;</span></a>
<a> def test_too_large(self): <span class=u>&#x2461;</span></a>
'''to_roman should fail with large input'''
<a> self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 4000) <span>&#x2462;</span></a></code></pre>
<a> self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 4000) <span class=u>&#x2462;</span></a></code></pre>
<ol>
<li>Like the previous test case, you create a class that inherits from <code>unittest.TestCase</code>. You can have more than one test per class (as you&#8217;ll see later in this chapter), but I chose to create a new class here because this test is something different than the last one. We&#8217;ll keep all the good input tests together in one class, and all the bad input tests together in another.
<li>Like the previous test case, the test itself is a method of the class, with a name starting with <code>test</code>.
@@ -261,7 +260,7 @@ OK</samp></pre>
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest2.py -v</kbd>
<samp>to_roman should give known result with known input ... ok
<a>to_roman should fail with large input ... ERROR <span>&#x2460;</span></a>
<a>to_roman should fail with large input ... ERROR <span class=u>&#x2460;</span></a>
======================================================================
ERROR: to_roman should fail with large input
@@ -269,7 +268,7 @@ ERROR: to_roman should fail with large input
Traceback (most recent call last):
File "romantest2.py", line 78, in test_too_large
self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 4000)
<a>AttributeError: 'module' object has no attribute 'OutOfRangeError' <span>&#x2461;</span></a>
<a>AttributeError: 'module' object has no attribute 'OutOfRangeError' <span class=u>&#x2461;</span></a>
----------------------------------------------------------------------
Ran 2 tests in 0.000s
@@ -280,8 +279,8 @@ FAILED (errors=1)</samp></pre>
<li>Why didn&#8217;t the code execute properly? The traceback gives the answer: the module you&#8217;re testing doesn&#8217;t have an exception called <code>OutOfRangeError</code>. Remember, you passed this exception to the <code>assertRaises()</code> method, because it&#8217;s the exception you want the function to raise given an out-of-range input. But the exception doesn&#8217;t exist, so the call to the <code>assertRaises()</code> method failed. It never got a chance to test the <code>to_roman()</code> function; it didn&#8217;t get that far.
</ol>
<p>To solve this problem, you need to define the <code>OutOfRangeError</code> exception in <code>roman2.py</code>.
<pre><code><a>class OutOfRangeError(ValueError): <span>&#x2460;</span></a>
<a> pass <span>&#x2461;</span></a></code></pre>
<pre><code class=pp><a>class OutOfRangeError(ValueError): <span class=u>&#x2460;</span></a>
<a> pass <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>Exceptions are classes. An &#8220;out of range&#8221; error is a kind of value error &mdash; the argument value is out of its acceptable range. So this exception inherits from the built-in <code>ValueError</code> exception. This is not strictly necessary (it could just inherit from the base <code>Exception</code> class), but it feels right.
<li>Exceptions don&#8217;t actually do anything, but you need at least one line of code to make a class. The <code>pass</code> statement does precisely nothing, but it&#8217;s a line of Python code, so that makes it a class.
@@ -290,7 +289,7 @@ FAILED (errors=1)</samp></pre>
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest2.py -v</kbd>
<samp>to_roman should give known result with known input ... ok
<a>to_roman should fail with large input ... FAIL <span>&#x2460;</span></a>
<a>to_roman should fail with large input ... FAIL <span class=u>&#x2460;</span></a>
======================================================================
FAIL: to_roman should fail with large input
@@ -298,7 +297,7 @@ FAIL: to_roman should fail with large input
Traceback (most recent call last):
File "romantest2.py", line 78, in test_too_large
self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 4000)
<a>AssertionError: OutOfRangeError not raised by to_roman <span>&#x2461;</span></a>
<a>AssertionError: OutOfRangeError not raised by to_roman <span class=u>&#x2461;</span></a>
----------------------------------------------------------------------
Ran 2 tests in 0.016s
@@ -310,10 +309,10 @@ FAILED (failures=1)</samp></pre>
</ol>
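<p>The difference between an <code>ERROR</code> and a <code>FAIL</code> can be reproduced in a small sketch. Everything here is a stand-in chosen for illustration: <code>roman2</code> is a hypothetical <code>types.SimpleNamespace</code> object, not the real module, and <code>run_too_large_test()</code> is a made-up helper that runs the test and returns the result object.

```python
import io
import types
import unittest

def run_too_large_test(module):
    '''Run the "too large" test against the given module-like object.'''
    class ToRomanBadInput(unittest.TestCase):
        def test_too_large(self):
            '''to_roman should fail with large input'''
            # module.OutOfRangeError is looked up before assertRaises() is called
            self.assertRaises(module.OutOfRangeError, module.to_roman, 4000)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanBadInput)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)

# Hypothetical stand-in for roman2.py before the exception class is defined.
roman2 = types.SimpleNamespace(to_roman=lambda n: 'M' * (n // 1000))

before = run_too_large_test(roman2)
print('errors:', len(before.errors), 'failures:', len(before.failures))

# Define the exception, but leave to_roman() unchanged.
class OutOfRangeError(ValueError):
    pass

roman2.OutOfRangeError = OutOfRangeError

after = run_too_large_test(roman2)
print('errors:', len(after.errors), 'failures:', len(after.failures))
```

<p>In the first run the attribute lookup raises <code>AttributeError</code>, so the test is counted as an error. In the second run the test executes properly, but <code>to_roman(4000)</code> returns a value instead of raising, so the test is counted as a failure.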
<p>Now you can write the code to make this test pass.
<p class=d>[<a href=examples/roman2.py>download <code>roman2.py</code></a>]
<pre><code>def to_roman(n):
<pre><code class=pp>def to_roman(n):
'''convert integer to Roman numeral'''
if n > 3999:
<a> raise OutOfRangeError('number out of range (must be less than 4000)') <span>&#x2460;</span></a>
<a> raise OutOfRangeError('number out of range (must be less than 4000)') <span class=u>&#x2460;</span></a>
result = ''
for numeral, integer in roman_numeral_map:
@@ -328,7 +327,7 @@ FAILED (failures=1)</samp></pre>
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest2.py -v</kbd>
<samp>to_roman should give known result with known input ... ok
<a>to_roman should fail with large input ... ok <span>&#x2460;</span></a>
<a>to_roman should fail with large input ... ok <span class=u>&#x2460;</span></a>
----------------------------------------------------------------------
Ran 2 tests in 0.000s
@@ -354,19 +353,18 @@ OK</samp></pre>
<p>Well <em>that&#8217;s</em> not good. Let&#8217;s add tests for each of these conditions.
<p class=d>[<a href=examples/romantest3.py>download <code>romantest3.py</code></a>]
<pre><code>
class ToRomanBadInput(unittest.TestCase):
<pre><code class=pp>class ToRomanBadInput(unittest.TestCase):
def test_too_large(self):
'''to_roman should fail with large input'''
<a> self.assertRaises(roman3.OutOfRangeError, roman3.to_roman, 4000) <span>&#x2460;</span></a>
<a> self.assertRaises(roman3.OutOfRangeError, roman3.to_roman, 4000) <span class=u>&#x2460;</span></a>
def test_zero(self):
'''to_roman should fail with 0 input'''
<a> self.assertRaises(roman3.OutOfRangeError, roman3.to_roman, 0) <span>&#x2461;</span></a>
<a> self.assertRaises(roman3.OutOfRangeError, roman3.to_roman, 0) <span class=u>&#x2461;</span></a>
def test_negative(self):
'''to_roman should fail with negative input'''
<a> self.assertRaises(roman3.OutOfRangeError, roman3.to_roman, -1) <span>&#x2462;</span></a></code></pre>
<a> self.assertRaises(roman3.OutOfRangeError, roman3.to_roman, -1) <span class=u>&#x2462;</span></a></code></pre>
<ol>
<li>The <code>test_too_large()</code> method has not changed since the previous step. I&#8217;m including it here to show where the new code fits.
<li>Here&#8217;s a new test: the <code>test_zero()</code> method. Like the <code>test_too_large()</code> method, it tells the <code>assertRaises()</code> method defined in <code>unittest.TestCase</code> to call our <code>to_roman()</code> function with a parameter of <code>0</code>, and check that it raises the appropriate exception, <code>OutOfRangeError</code>.
@@ -406,10 +404,10 @@ FAILED (failures=2)</samp></pre>
<p>Excellent. Both tests failed, as expected. Now let&#8217;s switch over to the code and see what we can do to make them pass.
<p class=d>[<a href=examples/roman3.py>download <code>roman3.py</code></a>]
<pre><code>def to_roman(n):
<pre><code class=pp>def to_roman(n):
'''convert integer to Roman numeral'''
<a> if not (0 < n < 4000): <span>&#x2460;</span></a>
<a> raise OutOfRangeError('number out of range (must be 1..3999)') <span>&#x2461;</span></a>
<a> if not (0 < n < 4000): <span class=u>&#x2460;</span></a>
<a> raise OutOfRangeError('number out of range (must be 1..3999)') <span class=u>&#x2461;</span></a>
result = ''
for numeral, integer in roman_numeral_map:
@@ -444,9 +442,9 @@ OK</samp></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd>import roman3</kbd>
<a><samp class=p>>>> </samp><kbd>roman3.to_roman(0.5)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>roman3.to_roman(0.5)</kbd> <span class=u>&#x2460;</span></a>
<samp>''</samp>
<a><samp class=p>>>> </samp><kbd>roman3.to_roman(1.5)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>roman3.to_roman(1.5)</kbd> <span class=u>&#x2461;</span></a>
<samp>'I'</samp></pre>
<ol>
<li>Oh, that&#8217;s bad.
@@ -455,13 +453,13 @@ OK</samp></pre>
<p>Testing for non-integers is not difficult. First, define a <code>NotIntegerError</code> exception.
<pre><code># roman4.py
<pre><code class=pp># roman4.py
class OutOfRangeError(ValueError): pass
<mark>class NotIntegerError(ValueError): pass</mark></code></pre>
<p>Next, write a test case that checks for the <code>NotIntegerError</code> exception.
<pre><code>class ToRomanBadInput(unittest.TestCase):
<pre><code class=pp>class ToRomanBadInput(unittest.TestCase):
.
.
.
@@ -494,12 +492,12 @@ FAILED (failures=1)</samp></pre>
<p>Write the code that makes the test pass.
<pre><code>def to_roman(n):
<pre><code class=pp>def to_roman(n):
'''convert integer to Roman numeral'''
if not (0 < n < 4000):
raise OutOfRangeError('number out of range (must be 1..3999)')
<a> if not isinstance(n, int): <span>&#x2460;</span></a>
<a> raise NotIntegerError('non-integers can not be converted') <span>&#x2461;</span></a>
<a> if not isinstance(n, int): <span class=u>&#x2460;</span></a>
<a> raise NotIntegerError('non-integers can not be converted') <span class=u>&#x2461;</span></a>
result = ''
for numeral, integer in roman_numeral_map:
@@ -529,7 +527,8 @@ OK</samp></pre>
<p>Now stop coding.
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+10 -2
View File
@@ -4,12 +4,20 @@ from pyquery import PyQuery as pq
import glob
import sys
# These selectors are kept regardless of whether this script thinks they are used.
# Most of these match nodes that are dynamically inserted or manipulated by script
# after the page has loaded, which is why a static analysis thinks they're unused.
SELECTOR_EXCEPTIONS = ('.w', '.b', '.str', '.kwd', '.com', '.typ', '.lit', '.pun', '.tag', '.atn', '.atv', '.dec', 'pre span.u', 'pre span.u span')
filename = sys.argv[1]
pqd = pq(filename=filename)
raw_data = open(filename, 'rb').read()
if raw_data.count('<pre><code>') or raw_data.count('<pre class=screen>'):
if raw_data.count('<pre><code') or raw_data.count('<pre class=screen>'):
def keep(s):
return s == '.w' or s.startswith('.w ') or s == '.b' or s.startswith('.b ')
for selector in SELECTOR_EXCEPTIONS:
if s == selector: return True
if s.startswith(selector + ' '): return True
return False
else:
def keep(s):
return False
+2 -4
View File
@@ -13,10 +13,10 @@ h3:before{content:''}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#whats-new>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#whats-new>Dive Into Python 3</a> <span class=u>&#8227;</span>
<h1>What&#8217;s New In &#8220;Dive Into Python 3&#8221;</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Isn&#8217;t this where we came in? <span>&#x275E;</span><br>&mdash; Pink Floyd, The Wall
<p><span class=u>&#x275D;</span> Isn&#8217;t this where we came in? <span class=u>&#x275E;</span><br>&mdash; Pink Floyd, The Wall
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin><i>a.k.a.</i> &#8220;the minus level&#8221;</h2>
@@ -40,5 +40,3 @@ h3:before{content:''}
<p>That&#8217;s it for now; the book&#8217;s not finished yet! The file I/O subsystem is totally different now; I hope to write about that soon.
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/dip3.js></script>
+3 -3
View File
@@ -12,11 +12,11 @@ body{counter-reset:h1 20}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#where-to-go-from-here>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#where-to-go-from-here>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=pro>&#x2666;&#x2666;&#x2666;&#x2666;&#x2666;</span>
<h1>Where To Go From Here</h1>
<blockquote class=q>
<p><span>&#x275D;</span> FIXME <span>&#x275E;</span><br>&mdash; FIXME
<p><span class=u>&#x275D;</span> FIXME <span class=u>&#x275E;</span><br>&mdash; FIXME
</blockquote>
<p id=toc>&nbsp;
<h2 id=things-to-read>Things to Read</h2>
@@ -63,7 +63,7 @@ body{counter-reset:h1 20}
<li><a href='http://bitbucket.org/repo/all/?name=python3'>BitBucket: list of projects matching &#8220;python3&#8221;</a> (and <a href='http://bitbucket.org/repo/all/?name=python+3'>those matching &#8220;python 3&#8221;</a>)
</ul>
<p class=v><a href=case-study-porting-chardet-to-python-3.html rel=prev title='back to &#8220;Case Study: Porting chardet to Python 3&#8221;'><span>&#x261C;</span></a> <a href=porting-code-to-python-3-with-2to3.html rel=next title='onward to &#8220;Porting Code to Python 3 with 2to3&#8221;'><span>&#x261E;</span></a>
<p class=v><a href=case-study-porting-chardet-to-python-3.html rel=prev title='back to &#8220;Case Study: Porting chardet to Python 3&#8221;'><span class=u>&#x261C;</span></a> <a href=porting-code-to-python-3-with-2to3.html rel=next title='onward to &#8220;Porting Code to Python 3 with 2to3&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
+95 -94
View File
@@ -13,11 +13,11 @@ mark{display:inline}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#xml>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#xml>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
<h1>XML</h1>
<blockquote class=q>
<p><span>&#x275D;</span> In the archonship of Aristaechmus, Draco enacted his ordinances. <span>&#x275E;</span><br>&mdash; <a href='http://www.perseus.tufts.edu/cgi-bin/ptext?doc=Perseus:text:1999.01.0046;query=chapter%3D%235;layout=;loc=3.1'>Aristotle</a>
<p><span class=u>&#x275D;</span> In the archonship of Aristaechmus, Draco enacted his ordinances. <span class=u>&#x275E;</span><br>&mdash; <a href='http://www.perseus.tufts.edu/cgi-bin/ptext?doc=Perseus:text:1999.01.0046;query=chapter%3D%235;layout=;loc=3.1'>Aristotle</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
@@ -26,7 +26,7 @@ mark{display:inline}
<p>Here, then, is the <abbr>XML</abbr> data we&#8217;ll be working with in this chapter. It&#8217;s a feed &mdash; specifically, an <a href=http://atompub.org/rfc4287.html>Atom syndication feed</a>.
<p class=d>[<a href=examples/feed.xml>download <code>feed.xml</code></a>]
<pre><code>&lt;?xml version='1.0' encoding='utf-8'?>
<pre><code class=pp>&lt;?xml version='1.0' encoding='utf-8'?>
&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
&lt;title>dive into mark&lt;/title>
&lt;subtitle>currently between addictions&lt;/subtitle>
@@ -99,8 +99,8 @@ mark{display:inline}
<p><abbr>XML</abbr> is a generalized way of describing hierarchical structured data. An <abbr>XML</abbr> <i>document</i> contains one or more <i>elements</i>, which are delimited by <i>start and end tags</i>. This is a complete (albeit boring) <abbr>XML</abbr> document:
<pre class=nd><code><a>&lt;foo> <span>&#x2460;</span></a>
<a>&lt;/foo> <span>&#x2461;</span></a></code></pre>
<pre class=nd><code class=pp><a>&lt;foo> <span class=u>&#x2460;</span></a>
<a>&lt;/foo> <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>This is the <i>start tag</i> of the <code>foo</code> element.
<li>This is the matching <i>end tag</i> of the <code>foo</code> element. Like balancing parentheses in writing or mathematics or code, every start tag must be <i>closed</i> (matched) by a corresponding end tag.
@@ -108,20 +108,20 @@ mark{display:inline}
<p>Elements can be <i>nested</i> to any depth. An element <code>bar</code> inside an element <code>foo</code> is said to be a <i>subelement</i> or <i>child</i> of <code>foo</code>.
<pre class=nd><code>&lt;foo>
<pre class=nd><code class=pp>&lt;foo>
<mark>&lt;bar>&lt;/bar></mark>
&lt;/foo>
</code></pre>
<p>The first element in every <abbr>XML</abbr> document is called the <i>root element</i>. An <abbr>XML</abbr> document can only have one root element. The following is <strong>not an <abbr>XML</abbr> document</strong>, because it has two root elements:
<pre class=nd><code>&lt;foo>&lt;/foo>
<pre class=nd><code class=pp>&lt;foo>&lt;/foo>
&lt;bar>&lt;/bar></code></pre>
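<p>One way to check the single-root rule is to hand both kinds of text to a parser. The sketch below uses <code>xml.etree.ElementTree</code> from the standard library; the helper name <code>is_well_formed()</code> and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as etree

one_root = '<foo><bar></bar></foo>'
two_roots = '<foo></foo>\n<bar></bar>'

def is_well_formed(text):
    '''Return True if the parser accepts the text as an XML document.'''
    try:
        etree.fromstring(text)
    except etree.ParseError:
        return False
    return True

print(is_well_formed(one_root))    # True
print(is_well_formed(two_roots))   # False: content after the root element
```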
<p>Elements can have <i>attributes</i>, which are name-value pairs. Attributes are listed within the start tag of an element and separated by whitespace. <i>Attribute names</i> can not be repeated within an element. <i>Attribute values</i> must be quoted.
<pre class=nd><code><a>&lt;foo <mark>lang='en'</mark>> <span>&#x2460;</span></a>
<a> &lt;bar <mark>lang='fr'</mark>>&lt;/bar> <span>&#x2461;</span></a>
<pre class=nd><code class=pp><a>&lt;foo <mark>lang='en'</mark>> <span class=u>&#x2460;</span></a>
<a> &lt;bar <mark>lang='fr'</mark>>&lt;/bar> <span class=u>&#x2461;</span></a>
&lt;/foo>
</code></pre>
<ol>
@@ -133,23 +133,23 @@ mark{display:inline}
<p>Elements can have <i>text content</i>.
<pre class=nd><code>&lt;foo lang='en'>
<pre class=nd><code class=pp>&lt;foo lang='en'>
&lt;bar lang='fr'><mark>PapayaWhip</mark>&lt;/bar>
&lt;/foo>
</code></pre>
<p>Elements that contain no text and no children are <i>empty</i>.
<pre class=nd><code>&lt;foo>&lt;/foo></code></pre>
<pre class=nd><code class=pp>&lt;foo>&lt;/foo></code></pre>
<p>There is a shorthand for writing empty elements. By putting a <code>/</code> character in the start tag, you can skip the end tag altogether. The <abbr>XML</abbr> document in the previous example could be written like this instead:
<pre class=nd><code>&lt;foo<mark>/</mark>></code></pre>
<pre class=nd><code class=pp>&lt;foo<mark>/</mark>></code></pre>
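<p>As a quick check that the two spellings describe the same empty element, both can be parsed and compared. This is a sketch using <code>xml.etree.ElementTree</code>; the variable names are arbitrary.

```python
import xml.etree.ElementTree as etree

long_form = etree.fromstring('<foo></foo>')
short_form = etree.fromstring('<foo/>')

for element in (long_form, short_form):
    # tag name, number of child elements, text content
    print(element.tag, len(element), element.text)
```

<p>Both lines of output should be identical: the same tag name, no children, and no text.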
<p>Just as Python functions can be declared in different <i>modules</i>, <abbr>XML</abbr> elements can be declared in different <i>namespaces</i>. Namespaces usually look like URLs. You use an <code>xmlns</code> declaration to define a <i>default namespace</i>. A namespace declaration looks similar to an attribute, but it has a different purpose.
<pre class=nd><code><a>&lt;feed <mark>xmlns='http://www.w3.org/2005/Atom'</mark>> <span>&#x2460;</span></a>
<a> &lt;title>dive into mark&lt;/title> <span>&#x2461;</span></a>
<pre class=nd><code class=pp><a>&lt;feed <mark>xmlns='http://www.w3.org/2005/Atom'</mark>> <span class=u>&#x2460;</span></a>
<a> &lt;title>dive into mark&lt;/title> <span class=u>&#x2461;</span></a>
&lt;/feed>
</code></pre>
<ol>
@@ -159,8 +159,8 @@ mark{display:inline}
<p>You can also use an <code>xmlns:<var>prefix</var></code> declaration to define a namespace and associate it with a <i>prefix</i>. Then each element in that namespace must be explicitly declared with the prefix.
<pre class=nd><code><a>&lt;atom:feed <mark>xmlns:atom='http://www.w3.org/2005/Atom'</mark>> <span>&#x2460;</span></a>
<a> &lt;atom:title>dive into mark&lt;/atom:title> <span>&#x2461;</span></a>
<pre class=nd><code class=pp><a>&lt;atom:feed <mark>xmlns:atom='http://www.w3.org/2005/Atom'</mark>> <span class=u>&#x2460;</span></a>
<a> &lt;atom:title>dive into mark&lt;/atom:title> <span class=u>&#x2461;</span></a>
&lt;/atom:feed></code></pre>
<ol>
<li>The <code>feed</code> element is in the <code>http://www.w3.org/2005/Atom</code> namespace.
@@ -171,7 +171,7 @@ mark{display:inline}
<p>Finally, <abbr>XML</abbr> documents can contain <a href=strings.html#one-ring-to-rule-them-all>character encoding information</a> on the first line, before the root element. (If you&#8217;re curious how a document can contain information which needs to be known before the document can be parsed, <a href=http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>Section F of the <abbr>XML</abbr> specification</a> details how to resolve this Catch-22.)
<pre class=nd><code>&lt;?xml version='1.0' <mark>encoding='utf-8'</mark>?></code></pre>
<pre class=nd><code class=pp>&lt;?xml version='1.0' <mark>encoding='utf-8'</mark>?></code></pre>
<p>And now you know just enough <abbr>XML</abbr> to be dangerous!
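<p>To see how a parser treats the namespace rules described above, here is a small sketch using <code>xml.etree.ElementTree</code> from the standard library. The two sample documents are shortened versions of the earlier examples, and the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as etree

default_ns = """<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
  <title>dive into mark</title>
</feed>"""

prefixed_ns = """<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'>
  <atom:title>dive into mark</atom:title>
</atom:feed>"""

root = etree.fromstring(default_ns)
other = etree.fromstring(prefixed_ns)

# ElementTree reports each element name as {namespace}localname,
# regardless of whether the document used a default namespace or a prefix.
print(root.tag)
print(other.tag)
print(root[0].tag, root[0].text)

# The xml: prefix is bound to a fixed namespace of its own.
print(root.attrib)
```

<p>The first two lines of output should match, which shows that the prefix itself carries no meaning. Only the namespace it is bound to matters.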
@@ -185,8 +185,8 @@ mark{display:inline}
<p>At the top level is the <i>root element</i>, which every Atom feed shares: the <code>feed</code> element in the <code>http://www.w3.org/2005/Atom</code> namespace.
<pre><code><a>&lt;feed xmlns='http://www.w3.org/2005/Atom' <span>&#x2460;</span></a>
<a> xml:lang='en'> <span>&#x2461;</span></a></code></pre>
<pre><code class=pp><a>&lt;feed xmlns='http://www.w3.org/2005/Atom' <span class=u>&#x2460;</span></a>
<a> xml:lang='en'> <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li><code>http://www.w3.org/2005/Atom</code> is the Atom namespace.
<li>Any element can contain an <code>xml:lang</code> attribute, which declares the language of the element and its children. In this case, the <code>xml:lang</code> attribute is declared once on the root element, which means the entire feed is in English.
@@ -194,12 +194,12 @@ mark{display:inline}
<p>An Atom feed contains several pieces of information about the feed itself. These are declared as children of the root-level <code>feed</code> element.
<pre><code>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
<a> &lt;title>dive into mark&lt;/title> <span>&#x2460;</span></a>
<a> &lt;subtitle>currently between addictions&lt;/subtitle> <span>&#x2461;</span></a>
<a> &lt;id>tag:diveintomark.org,2001-07-29:/&lt;/id> <span>&#x2462;</span></a>
<a> &lt;updated>2009-03-27T21:56:07Z&lt;/updated> <span>&#x2463;</span></a>
<a> &lt;link rel='alternate' type='text/html' href='http://diveintomark.org/'/> <span>&#x2464;</span></a></code></pre>
<pre><code class=pp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
<a> &lt;title>dive into mark&lt;/title> <span class=u>&#x2460;</span></a>
<a> &lt;subtitle>currently between addictions&lt;/subtitle> <span class=u>&#x2461;</span></a>
<a> &lt;id>tag:diveintomark.org,2001-07-29:/&lt;/id> <span class=u>&#x2462;</span></a>
<a> &lt;updated>2009-03-27T21:56:07Z&lt;/updated> <span class=u>&#x2463;</span></a>
<a> &lt;link rel='alternate' type='text/html' href='http://diveintomark.org/'/> <span class=u>&#x2464;</span></a></code></pre>
<ol>
<li>The title of this feed is <code>dive into mark</code>.
<li>The subtitle of this feed is <code>currently between addictions</code>.
@@ -211,30 +211,30 @@ mark{display:inline}
<p>Now we know that this is a feed for a site named &#8220;dive into mark&#8221; which is available at <a href=http://diveintomark.org/><code>http://diveintomark.org/</code></a> and was last updated on March 27, 2009.
<blockquote class=note>
<p><span>&#x261E;</span>Although the order of elements can be relevant in some <abbr>XML</abbr> documents, it is not relevant in an Atom feed.
<p><span class=u>&#x261E;</span>Although the order of elements can be relevant in some <abbr>XML</abbr> documents, it is not relevant in an Atom feed.
</blockquote>
<p>After the feed-level metadata is the list of the most recent articles. An article looks like this:
<pre><code>&lt;entry>
<a> &lt;author> <span>&#x2460;</span></a>
<pre><code class=pp>&lt;entry>
<a> &lt;author> <span class=u>&#x2460;</span></a>
&lt;name>Mark&lt;/name>
&lt;uri>http://diveintomark.org/&lt;/uri>
&lt;/author>
<a> &lt;title>Dive into history, 2009 edition&lt;/title> <span>&#x2461;</span></a>
<a> &lt;link rel='alternate' type='text/html' <span>&#x2462;</span></a>
<a> &lt;title>Dive into history, 2009 edition&lt;/title> <span class=u>&#x2461;</span></a>
<a> &lt;link rel='alternate' type='text/html' <span class=u>&#x2462;</span></a>
href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>
<a> &lt;id>tag:diveintomark.org,2009-03-27:/archives/20090327172042&lt;/id> <span>&#x2463;</span></a>
<a> &lt;updated>2009-03-27T21:56:07Z&lt;/updated> <span>&#x2464;</span></a>
<a> &lt;id>tag:diveintomark.org,2009-03-27:/archives/20090327172042&lt;/id> <span class=u>&#x2463;</span></a>
<a> &lt;updated>2009-03-27T21:56:07Z&lt;/updated> <span class=u>&#x2464;</span></a>
&lt;published>2009-03-27T17:20:42Z&lt;/published>
<a> &lt;category scheme='http://diveintomark.org' term='diveintopython'/> <span>&#x2465;</span></a>
<a> &lt;category scheme='http://diveintomark.org' term='diveintopython'/> <span class=u>&#x2465;</span></a>
&lt;category scheme='http://diveintomark.org' term='docbook'/>
&lt;category scheme='http://diveintomark.org' term='html'/>
<a> &lt;summary type='html'>Putting an entire chapter on one page sounds <span>&#x2466;</span></a>
<a> &lt;summary type='html'>Putting an entire chapter on one page sounds <span class=u>&#x2466;</span></a>
bloated, but consider this &amp;amp;mdash; my longest chapter so far
would be 75 printed pages, and it loads in under 5 seconds&amp;amp;hellip;
On dialup.&lt;/summary>
<a>&lt;/entry> <span>&#x2467;</span></a></code></pre>
<a>&lt;/entry> <span class=u>&#x2467;</span></a></code></pre>
<ol>
<li>The <code>author</code> element tells who wrote this article: some guy named Mark, whom you can find loafing at <code>http://diveintomark.org/</code>. (This is the same as the alternate link in the feed metadata, but it doesn&#8217;t have to be. Many weblogs have multiple authors, each with their own personal website.)
<li>The <code>title</code> element gives the title of the article, &#8220;Dive into history, 2009 edition&#8221;.
@@ -254,10 +254,10 @@ mark{display:inline}
<p class=d>[<a href=examples/feed.xml>download <code>feed.xml</code></a>]
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>import xml.etree.ElementTree as etree</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree = etree.parse('examples/feed.xml')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>root = tree.getroot()</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>root</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>import xml.etree.ElementTree as etree</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree = etree.parse('examples/feed.xml')</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>root = tree.getroot()</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>root</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;Element {http://www.w3.org/2005/Atom}feed at cd1eb0></samp></pre>
<ol>
<li>The ElementTree library is part of the Python standard library, in <code>xml.etree.ElementTree</code>.
@@ -267,7 +267,7 @@ mark{display:inline}
</ol>
<blockquote class=note>
<p><span>&#x261E;</span>ElementTree represents <abbr>XML</abbr> elements as <code>{<var>namespace</var>}<var>localname</var></code>. You&#8217;ll see and use this format in multiple places in the ElementTree <abbr>API</abbr>.
<p><span class=u>&#x261E;</span>ElementTree represents <abbr>XML</abbr> elements as <code>{<var>namespace</var>}<var>localname</var></code>. You&#8217;ll see and use this format in multiple places in the ElementTree <abbr>API</abbr>.
</blockquote>
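<p>To see the <code>{<var>namespace</var>}<var>localname</var></code> format in action without the sample file, here is a minimal sketch. The inline <code>SAMPLE</code> string and the <code>split_tag()</code> helper are assumptions made up for illustration; they are not part of <code>feed.xml</code> or the ElementTree <abbr>API</abbr>.

```python
import xml.etree.ElementTree as etree

# Hypothetical stand-in for examples/feed.xml, trimmed to two elements.
SAMPLE = ("<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>"
          "<title>dive into mark</title></feed>")

root = etree.fromstring(SAMPLE)

def split_tag(tag):
    """Illustrative helper: split '{namespace}localname' into two parts."""
    if tag.startswith('{'):
        namespace, _, localname = tag[1:].partition('}')
        return namespace, localname
    return None, tag  # element is not in any namespace

namespace, localname = split_tag(root.tag)
print(root.tag)    # the combined form ElementTree uses everywhere
print(namespace)   # just the namespace
print(localname)   # just the local name
```

<p>The child <code>title</code> element inherits the default namespace from its parent, so its tag carries the same <code>{...}</code> prefix.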
<h3 id=xml-elements>Elements Are Lists</h3>
@@ -276,12 +276,12 @@ mark{display:inline}
<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd>root.tag</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>root.tag</kbd> <span class=u>&#x2460;</span></a>
<samp>'{http://www.w3.org/2005/Atom}feed'</samp>
<a><samp class=p>>>> </samp><kbd>len(root)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>len(root)</kbd> <span class=u>&#x2461;</span></a>
<samp>8</samp>
<a><samp class=p>>>> </samp><kbd>for child in root:</kbd> <span>&#x2462;</span></a>
<a><samp class=p>... </samp><kbd> print(child)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>for child in root:</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>... </samp><kbd> print(child)</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>... </samp>
<samp>&lt;Element {http://www.w3.org/2005/Atom}title at e2b5d0>
&lt;Element {http://www.w3.org/2005/Atom}subtitle at e2b4e0>
@@ -306,17 +306,17 @@ mark{display:inline}
<pre class=screen>
# continuing from the previous example
<a><samp class=p>>>> </samp><kbd>root.attrib</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>root.attrib</kbd> <span class=u>&#x2460;</span></a>
<samp>{'{http://www.w3.org/XML/1998/namespace}lang': 'en'}</samp>
<a><samp class=p>>>> </samp><kbd>root[4]</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>root[4]</kbd> <span class=u>&#x2461;</span></a>
<samp>&lt;Element {http://www.w3.org/2005/Atom}link at e181b0></samp>
<a><samp class=p>>>> </samp><kbd>root[4].attrib</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>root[4].attrib</kbd> <span class=u>&#x2462;</span></a>
<samp>{'href': 'http://diveintomark.org/',
'type': 'text/html',
'rel': 'alternate'}</samp>
<a><samp class=p>>>> </samp><kbd>root[3]</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>root[3]</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;Element {http://www.w3.org/2005/Atom}updated at e2b4e0></samp>
<a><samp class=p>>>> </samp><kbd>root[3].attrib</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>root[3].attrib</kbd> <span class=u>&#x2464;</span></a>
<samp>{}</samp></pre>
<ol>
<li>The <code>attrib</code> property is a dictionary of the element&#8217;s attributes. The original markup here was <code>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'></code>. The <code>xml:</code> prefix refers to a built-in namespace that every <abbr>XML</abbr> document can use without declaring it.
@@ -336,15 +336,15 @@ mark{display:inline}
<samp class=p>>>> </samp><kbd>import xml.etree.ElementTree as etree</kbd>
<samp class=p>>>> </samp><kbd>tree = etree.parse('examples/feed.xml')</kbd>
<samp class=p>>>> </samp><kbd>root = tree.getroot()</kbd>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}entry')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}entry')</kbd> <span class=u>&#x2460;</span></a>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}entry at e2b4e0>,
&lt;Element {http://www.w3.org/2005/Atom}entry at e2b510>,
&lt;Element {http://www.w3.org/2005/Atom}entry at e2b540>]</samp>
<samp class=p>>>> </samp><kbd>root.tag</kbd>
<samp>'{http://www.w3.org/2005/Atom}feed'</samp>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}feed')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}feed')</kbd> <span class=u>&#x2461;</span></a>
<samp>[]</samp>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}author')</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}author')</kbd> <span class=u>&#x2462;</span></a>
<samp>[]</samp></pre>
<ol>
<li>The <code>findall()</code> method finds child elements that match a specific query. (More on the query format in a minute.)
@@ -353,11 +353,11 @@ mark{display:inline}
</ol>
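<p>The empty results above come from <code>findall()</code> looking only at direct children. A small sketch of one way to reach the nested <code>author</code> elements anyway, one level at a time. The inline <code>SAMPLE</code> feed is a made-up stand-in, not the real <code>feed.xml</code>.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'

# Hypothetical miniature feed: two entries, each with a nested author.
SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <title>dive into mark</title>
  <entry><author><name>Mark</name></author></entry>
  <entry><author><name>Mark</name></author></entry>
</feed>"""

root = etree.fromstring(SAMPLE)

entries = root.findall(ATOM + 'entry')    # direct children: found
authors = root.findall(ATOM + 'author')   # grandchildren: not found

# Walk down explicitly: ask each entry for its own author child.
nested = [entry.find(ATOM + 'author') for entry in entries]

print(len(entries))
print(authors)
print([author.find(ATOM + 'name').text for author in nested])
```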
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>tree.findall('{http://www.w3.org/2005/Atom}entry')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree.findall('{http://www.w3.org/2005/Atom}entry')</kbd> <span class=u>&#x2460;</span></a>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}entry at e2b4e0>,
&lt;Element {http://www.w3.org/2005/Atom}entry at e2b510>,
&lt;Element {http://www.w3.org/2005/Atom}entry at e2b540>]</samp>
<a><samp class=p>>>> </samp><kbd>tree.findall('{http://www.w3.org/2005/Atom}author')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>tree.findall('{http://www.w3.org/2005/Atom}author')</kbd> <span class=u>&#x2461;</span></a>
<samp>[]</samp>
</pre>
<ol>
@@ -368,17 +368,17 @@ mark{display:inline}
<p>There <em>is</em> a way to search for <em>descendant</em> elements, <i>i.e.</i> children, grandchildren, and any element at any nesting level.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>all_links = tree.findall('//{http://www.w3.org/2005/Atom}link')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>all_links = tree.findall('//{http://www.w3.org/2005/Atom}link')</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>all_links</kbd>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}link at e181b0>,
&lt;Element {http://www.w3.org/2005/Atom}link at e2b570>,
&lt;Element {http://www.w3.org/2005/Atom}link at e2b480>,
&lt;Element {http://www.w3.org/2005/Atom}link at e2b5a0>]</samp>
<a><samp class=p>>>> </samp><kbd>all_links[0].attrib</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>all_links[0].attrib</kbd> <span class=u>&#x2461;</span></a>
<samp>{'href': 'http://diveintomark.org/',
'type': 'text/html',
'rel': 'alternate'}</samp>
<a><samp class=p>>>> </samp><kbd>all_links[1].attrib</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>all_links[1].attrib</kbd> <span class=u>&#x2462;</span></a>
<samp>{'href': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'type': 'text/html',
'rel': 'alternate'}</samp>
@@ -400,8 +400,8 @@ mark{display:inline}
<pre class=screen>
# continuing from the previous example
<a><samp class=p>>>> </samp><kbd>it = tree.getiterator('{http://www.w3.org/2005/Atom}link')</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>next(it)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>it = tree.getiterator('{http://www.w3.org/2005/Atom}link')</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>next(it)</kbd> <span class=u>&#x2461;</span></a>
&lt;Element {http://www.w3.org/2005/Atom}link at 122f1b0>
<samp class=p>>>> </samp><kbd>next(it)</kbd>
&lt;Element {http://www.w3.org/2005/Atom}link at 122f1e0>
@@ -427,10 +427,10 @@ StopIteration</samp></pre>
<p><a href=http://codespeak.net/lxml/><code>lxml</code></a> is an open source third-party library that builds on the popular <a href=http://www.xmlsoft.org/>libxml2 parser</a>. It provides a 100% compatible ElementTree <abbr>API</abbr>, then extends it with full XPath support and a few other niceties. There are <a href=http://pypi.python.org/pypi/lxml/>installers available for Windows</a>; Linux users should always try to use distribution-specific tools like <code>yum</code> or <code>apt-get</code> to install precompiled binaries from their repositories. Otherwise you&#8217;ll need to <a href=http://codespeak.net/lxml/installation.html>install <code>lxml</code> manually</a>.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>from lxml import etree</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree = etree.parse('examples/feed.xml')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>root = tree.getroot()</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}entry')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>from lxml import etree</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree = etree.parse('examples/feed.xml')</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>root = tree.getroot()</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>root.findall('{http://www.w3.org/2005/Atom}entry')</kbd> <span class=u>&#x2463;</span></a>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}entry at e2b4e0>,
&lt;Element {http://www.w3.org/2005/Atom}entry at e2b510>,
&lt;Element {http://www.w3.org/2005/Atom}entry at e2b540>]</samp></pre>
@@ -443,7 +443,7 @@ StopIteration</samp></pre>
<p>For large <abbr>XML</abbr> documents, <code>lxml</code> is significantly faster than the built-in ElementTree library. If you&#8217;re only using the ElementTree <abbr>API</abbr> and want to use the fastest available implementation, you can try to import <code>lxml</code> and fall back to the built-in ElementTree.
<pre><code>try:
<pre><code class=pp>try:
from lxml import etree
except ImportError:
import xml.etree.ElementTree as etree</code></pre>
@@ -451,17 +451,17 @@ except ImportError:
<p>But <code>lxml</code> is more than just a faster ElementTree. Its <code>findall()</code> method includes support for more complicated expressions.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>import lxml.etree</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>import lxml.etree</kbd> <span class=u>&#x2460;</span></a>
<samp class=p>>>> </samp><kbd>tree = lxml.etree.parse('examples/feed.xml')</kbd>
<a><samp class=p>>>> </samp><kbd>tree.findall('//{http://www.w3.org/2005/Atom}*[@href]')</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>tree.findall('//{http://www.w3.org/2005/Atom}*[@href]')</kbd> <span class=u>&#x2461;</span></a>
[&lt;Element {http://www.w3.org/2005/Atom}link at eeb8a0>,
&lt;Element {http://www.w3.org/2005/Atom}link at eeb990>,
&lt;Element {http://www.w3.org/2005/Atom}link at eeb960>,
&lt;Element {http://www.w3.org/2005/Atom}link at eeb9c0>]
<a><samp class=p>>>> </samp><kbd>tree.findall("//{http://www.w3.org/2005/Atom}*[@href='http://diveintomark.org/']")</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>tree.findall("//{http://www.w3.org/2005/Atom}*[@href='http://diveintomark.org/']")</kbd> <span class=u>&#x2462;</span></a>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}link at eeb930>]</samp>
<samp class=p>>>> </samp><kbd>NS = '{http://www.w3.org/2005/Atom}'</kbd>
<a><samp class=p>>>> </samp><kbd>tree.findall('//{NS}author[{NS}uri]'.format(NS=NS))</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>tree.findall('//{NS}author[{NS}uri]'.format(NS=NS))</kbd> <span class=u>&#x2463;</span></a>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}author at eeba80>,
&lt;Element {http://www.w3.org/2005/Atom}author at eebba0>]</samp></pre>
<ol>
@@ -476,13 +476,13 @@ except ImportError:
<pre class=screen>
<samp class=p>>>> </samp><kbd>import lxml.etree</kbd>
<samp class=p>>>> </samp><kbd>tree = lxml.etree.parse('examples/feed.xml')</kbd>
<a><samp class=p>>>> </samp><kbd>NSMAP = {'atom': 'http://www.w3.org/2005/Atom'}</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>entries = tree.xpath("//atom:category[@term='accessibility']/..",</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>NSMAP = {'atom': 'http://www.w3.org/2005/Atom'}</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>entries = tree.xpath("//atom:category[@term='accessibility']/..",</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>... </samp><kbd> namespaces=NSMAP)</kbd>
<a><samp class=p>>>> </samp><kbd>entries</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>entries</kbd> <span class=u>&#x2462;</span></a>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}entry at e2b630>]</samp>
<samp class=p>>>> </samp><kbd>entry = entries[0]</kbd>
<a><samp class=p>>>> </samp><kbd>entry.xpath('./atom:title/text()', namespaces=NSMAP)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>entry.xpath('./atom:title/text()', namespaces=NSMAP)</kbd> <span class=u>&#x2463;</span></a>
<samp>['Accessibility is a harsh mistress']</samp></pre>
<ol>
<li>To perform XPath queries on namespaced elements, you need to define a namespace prefix mapping. This is just a Python dictionary.
@@ -499,9 +499,9 @@ except ImportError:
<pre class=screen>
<samp class=p>>>> </samp><kbd>import xml.etree.ElementTree as etree</kbd>
<a><samp class=p>>>> </samp><kbd>new_feed = etree.Element('{http://www.w3.org/2005/Atom}feed',</kbd> <span>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> attrib={'{http://www.w3.org/XML/1998/namespace}lang': 'en'})</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>print(etree.tostring(new_feed))</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>new_feed = etree.Element('{http://www.w3.org/2005/Atom}feed',</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> attrib={'{http://www.w3.org/XML/1998/namespace}lang': 'en'})</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>print(etree.tostring(new_feed))</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;ns0:feed xmlns:ns0='http://www.w3.org/2005/Atom' xml:lang='en'/></samp></pre>
<ol>
<li>To create a new element, instantiate the <code>Element</code> class. You pass the element name (namespace + local name) as the first argument. This statement creates a <code>feed</code> element in the Atom namespace. This will be our new document&#8217;s root element.
@@ -513,11 +513,11 @@ except ImportError:
<p>An <abbr>XML</abbr> parser won&#8217;t &#8220;see&#8221; any difference between an <abbr>XML</abbr> document with a default namespace and an <abbr>XML</abbr> document with a prefixed namespace. The resulting <abbr>DOM</abbr> of this serialization:
<pre class=nd><code>&lt;ns0:feed xmlns:ns0='http://www.w3.org/2005/Atom' xml:lang='en'/></code></pre>
<pre class=nd><code class=pp>&lt;ns0:feed xmlns:ns0='http://www.w3.org/2005/Atom' xml:lang='en'/></code></pre>
<p>is identical to the <abbr>DOM</abbr> of this serialization:
<pre class=nd><code>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'/></code></pre>
<pre class=nd><code class=pp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'/></code></pre>
<p>The only practical difference is that the second serialization is several characters shorter. If we were to recast our entire sample feed with a <code>ns0:</code> prefix in every start and end tag, it would add 4 characters per start tag &times; 79 tags + 4 characters for the namespace declaration itself, for a total of 320 characters. Assuming <a href=strings.html#byte-arrays>UTF-8 encoding</a>, that&#8217;s 320 extra bytes. (After gzipping, the difference drops to 21 bytes, but still, 21 bytes is 21 bytes.) Maybe that doesn&#8217;t matter to you, but for something like an Atom feed, which may be downloaded several thousand times whenever it changes, saving a few bytes per request can quickly add up.
@@ -525,11 +525,11 @@ except ImportError:
<pre class=screen>
<samp class=p>>>> </samp><kbd>import lxml.etree</kbd>
<a><samp class=p>>>> </samp><kbd>NSMAP = {None: 'http://www.w3.org/2005/Atom'}</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>new_feed = lxml.etree.Element('feed', nsmap=NSMAP)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed))</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>NSMAP = {None: 'http://www.w3.org/2005/Atom'}</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>new_feed = lxml.etree.Element('feed', nsmap=NSMAP)</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed))</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;feed xmlns='http://www.w3.org/2005/Atom'/></samp>
<a><samp class=p>>>> </samp><kbd>new_feed.set('{http://www.w3.org/XML/1998/namespace}lang', 'en')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>new_feed.set('{http://www.w3.org/XML/1998/namespace}lang', 'en')</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed))</kbd>
<samp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'/></samp></pre>
<ol>
@@ -542,14 +542,14 @@ except ImportError:
<p>Are <abbr>XML</abbr> documents limited to one element per document? No, of course not. You can easily create child elements, too.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>title = lxml.etree.SubElement(new_feed, 'title',</kbd> <span>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> attrib={'type':'html'})</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>title = lxml.etree.SubElement(new_feed, 'title',</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> attrib={'type':'html'})</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed))</kbd>
<samp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>&lt;title type='html'/>&lt;/feed></samp>
<a><samp class=p>>>> </samp><kbd>title.text = 'dive into &amp;hellip;'</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed))</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>title.text = 'dive into &amp;hellip;'</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed))</kbd> <span class=u>&#x2463;</span></a>
<samp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>&lt;title type='html'>dive into &amp;amp;hellip;&lt;/title>&lt;/feed></samp>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed, pretty_print=True))</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(new_feed, pretty_print=True))</kbd> <span class=u>&#x2464;</span></a>
<samp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
&lt;title type='html'>dive into &amp;amp;hellip;&lt;/title>
&lt;/feed></samp></pre>
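<p>The same child-element steps can be sketched with the built-in ElementTree, in case <code>lxml</code> isn&#8217;t installed. This is an approximation under stated assumptions: the standard library has no <code>nsmap</code> argument, so the element names are spelled out in full here, and the serialized output uses a generated prefix instead of a default namespace.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'

# Build a root element and one child, as in the lxml session above.
new_feed = etree.Element(ATOM + 'feed')
title = etree.SubElement(new_feed, ATOM + 'title', attrib={'type': 'html'})

# Text is stored as-is; the ampersand is escaped only on serialization.
title.text = 'dive into &hellip;'

serialized = etree.tostring(new_feed, encoding='unicode')
print(serialized)

# Parsing the serialized form gives back the original, unescaped text.
round_trip = etree.fromstring(serialized)
print(round_trip[0].text)
```

<p>The round trip is the point: you assign plain text, the serializer escapes the ampersand as <code>&amp;amp;</code>, and the parser undoes it on the way back in.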
@@ -574,9 +574,9 @@ except ImportError:
<p>Here is a fragment of a broken <abbr>XML</abbr> document. I&#8217;ve highlighted the wellformedness error.
<pre class=nd><code>&lt;?xml version='1.0' encoding='utf-8'?>
<pre class=nd><code class=pp>&lt;?xml version='1.0' encoding='utf-8'?>
&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
&lt;title>dive into <mark>&hellip;</mark>&lt;/title>
&lt;title>dive into <mark>&amp;hellip;</mark>&lt;/title>
...
&lt;/feed></code></pre>
@@ -600,16 +600,16 @@ lxml.etree.XMLSyntaxError: Entity 'hellip' not defined, line 3, column 28</samp>
<p>To parse this broken <abbr>XML</abbr> document, despite its wellformedness error, you need to create a custom <abbr>XML</abbr> parser.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>parser = lxml.etree.XMLParser(recover=True)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree = lxml.etree.parse('examples/feed-broken.xml', parser)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>parser.error_log</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>parser = lxml.etree.XMLParser(recover=True)</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>tree = lxml.etree.parse('examples/feed-broken.xml', parser)</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>parser.error_log</kbd> <span class=u>&#x2462;</span></a>
<samp>examples/feed-broken.xml:3:28:FATAL:PARSER:ERR_UNDECLARED_ENTITY: Entity 'hellip' not defined</samp>
<samp class=p>>>> </samp><kbd>tree.findall('{http://www.w3.org/2005/Atom}title')</kbd>
<samp>[&lt;Element {http://www.w3.org/2005/Atom}title at ead510>]</samp>
<samp class=p>>>> </samp><kbd>title = tree.findall('{http://www.w3.org/2005/Atom}title')[0]</kbd>
<a><samp class=p>>>> </samp><kbd>title.text</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>title.text</kbd> <span class=u>&#x2463;</span></a>
<samp>'dive into '</samp>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(tree.getroot()))</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>print(lxml.etree.tounicode(tree.getroot()))</kbd> <span class=u>&#x2464;</span></a>
<samp>&lt;feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
&lt;title>dive into &lt;/title>
.
@@ -640,7 +640,8 @@ lxml.etree.XMLSyntaxError: Entity 'hellip' not defined, line 3, column 28</samp>
<li><a href=http://codespeak.net/lxml/1.3/xpathxslt.html>XPath and <abbr>XSLT</abbr> with <code>lxml</code></a>
</ul>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next class=todo><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>
+38 -38
View File
@@ -4,6 +4,7 @@
<title>Your first Python program - Dive into Python 3</title>
<!--[if IE]><script src=j/html5.js></script><![endif]-->
<link rel=stylesheet href=dip3.css>
<link rel=stylesheet href=prettify.css>
<style>
body{counter-reset:h1 1}
table{border:1px solid #bbb;border-collapse:collapse;margin:auto}
@@ -15,17 +16,17 @@ th{text-align:left}
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#your-first-python-program>Dive Into Python 3</a> <span>&#8227;</span>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#your-first-python-program>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span title=novice>&#x2666;&#x2662;&#x2662;&#x2662;&#x2662;</span>
<h1>Your First Python Program</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Don&#8217;t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Buddhism>Ven. Henepola Gunaratana</a>
<p><span class=u>&#x275D;</span> Don&#8217;t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Buddhism>Ven. Henepola Gunaratana</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
<p class=f>Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let&#8217;s skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don&#8217;t worry about that, because you&#8217;re going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
<p class=d>[<a href=examples/humansize.py>download <code>humansize.py</code></a>]
<pre><code>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
<pre><code class=pp>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
@@ -74,16 +75,16 @@ if __name__ == '__main__':
<h2 id=declaringfunctions>Declaring Functions</h2>
<p>Python has functions like most other languages, but it does not have separate header files like <abbr>C++</abbr> or <code>interface</code>/<code>implementation</code> sections like Pascal. When you need a function, just declare it, like this:
<pre><code>def approximate_size(size, a_kilobyte_is_1024_bytes=True):</code></pre>
<pre><code class=pp>def approximate_size(size, a_kilobyte_is_1024_bytes=True):</code></pre>
<aside>When you need a function, just declare it.</aside>
<p>The keyword <code>def</code> starts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments are separated with commas.
<p>Also note that the function doesn&#8217;t define a return datatype. Python functions do not specify the datatype of their return value; they don&#8217;t even specify whether or not they return a value. (In fact, every Python function returns a value; if the function ever executes a <code>return</code> statement, it will return that value, otherwise it will return <code>None</code>, the Python null value.)
<blockquote class=note>
<p><span>&#x261E;</span>In some languages, functions (that return a value) start with <code>function</code>, and subroutines (that do not return a value) start with <code>sub</code>. There are no subroutines in Python. Everything is a function, all functions return a value (even if it&#8217;s <code>None</code>), and all functions start with <code>def</code>.
<p><span class=u>&#x261E;</span>In some languages, functions (that return a value) start with <code>function</code>, and subroutines (that do not return a value) start with <code>sub</code>. There are no subroutines in Python. Everything is a function, all functions return a value (even if it&#8217;s <code>None</code>), and all functions start with <code>def</code>.
</blockquote>
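<p>A quick way to convince yourself that every function returns a value, even one with no <code>return</code> statement. The two functions below are hypothetical examples written for this illustration; they are not part of <code>humansize.py</code>.

```python
def log_size(size):
    """Hypothetical helper: prints something, never executes 'return'."""
    print('size is', size)

def half(size):
    """Hypothetical helper: explicitly returns a value."""
    return size / 2

result = log_size(4000)   # the call still produces a value...
print(result is None)     # ...and that value is None
print(half(4000))
```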
<p>The <code>approximate_size()</code> function takes two arguments &mdash; <var>size</var> and <var>a_kilobyte_is_1024_bytes</var> &mdash; but neither argument specifies a datatype. In Python, variables are never explicitly typed. Python figures out what type a variable is and keeps track of it internally.
<blockquote class='note compare java'>
<p><span>&#x261E;</span>In Java and other statically-typed languages, you must specify the datatype of the function return value and each function argument. In Python, you never explicitly specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally.
<p><span class=u>&#x261E;</span>In Java and other statically-typed languages, you must specify the datatype of the function return value and each function argument. In Python, you never explicitly specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally.
</blockquote>
<h3 id=optional-arguments>Optional and Named Arguments</h3>
@@ -92,16 +93,15 @@ if __name__ == '__main__':
<p>Let&#8217;s take another look at that <code>approximate_size()</code> function declaration:
<pre><code>def approximate_size(size, a_kilobyte_is_1024_bytes=True):</code></pre>
<pre><code class=pp>def approximate_size(size, a_kilobyte_is_1024_bytes=True):</code></pre>
<p>The second argument, <var>a_kilobyte_is_1024_bytes</var>, specifies a default value of <code>True</code>. This means the argument is <i>optional</i>; you can call the function without it, and Python will act as if you had called it with <code>True</code> as a second parameter.
<p>Now look at the bottom of the script:
<pre><code>
if __name__ == '__main__':
<a> print(approximate_size(1000000000000, False)) <span>&#x2460;</span></a>
<a> print(approximate_size(1000000000000)) <span>&#x2461;</span></a></code></pre>
<pre><code class=pp>if __name__ == '__main__':
<a> print(approximate_size(1000000000000, False)) <span class=u>&#x2460;</span></a>
<a> print(approximate_size(1000000000000)) <span class=u>&#x2461;</span></a></code></pre>
<ol>
<li>This calls the <code>approximate_size()</code> function with two arguments. Within the <code>approximate_size()</code> function, <var>a_kilobyte_is_1024_bytes</var> will be <code>False</code>, since you explicitly passed <code>False</code> as the second argument.
<li>This calls the <code>approximate_size()</code> function with only one argument. But that&#8217;s OK, because the second argument is optional! Since the caller doesn&#8217;t specify, the second argument defaults to <code>True</code>, as defined by the function declaration.
@@ -111,16 +111,16 @@ if __name__ == '__main__':
<pre class=screen>
<samp class=p>>>> </samp><kbd>from humansize import approximate_size</kbd>
<a><samp class=p>>>> </samp><kbd>approximate_size(4000, a_kilobyte_is_1024_bytes=False)</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>approximate_size(4000, a_kilobyte_is_1024_bytes=False)</kbd> <span class=u>&#x2460;</span></a>
<samp>'4.0 KB'</samp>
<a><samp class=p>>>> </samp><kbd>approximate_size(size=4000, a_kilobyte_is_1024_bytes=False)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>approximate_size(size=4000, a_kilobyte_is_1024_bytes=False)</kbd> <span class=u>&#x2461;</span></a>
<samp>'4.0 KB'</samp>
<a><samp class=p>>>> </samp><kbd>approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)</kbd> <span class=u>&#x2462;</span></a>
<samp>'4.0 KB'</samp>
<a><samp class=p>>>> </samp><kbd>approximate_size(a_kilobyte_is_1024_bytes=False, 4000)</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>approximate_size(a_kilobyte_is_1024_bytes=False, 4000)</kbd> <span class=u>&#x2463;</span></a>
<samp class=traceback> File "&lt;stdin>", line 1
SyntaxError: non-keyword arg after keyword arg</samp>
<a><samp class=p>>>> </samp><kbd>approximate_size(size=4000, False)</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>approximate_size(size=4000, False)</kbd> <span class=u>&#x2464;</span></a>
<samp class=traceback> File "&lt;stdin>", line 1
SyntaxError: non-keyword arg after keyword arg</samp></pre>
<ol>
@@ -137,7 +137,7 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<p>I won&#8217;t bore you with a long finger-wagging speech about the importance of documenting your code. Just know that code is written once but read many times, and the most important audience for your code is yourself, six months after writing it (i.e. after you&#8217;ve forgotten everything but need to fix something). Python makes it easy to write readable code, so take advantage of it. You&#8217;ll thank me in six months.
<h3 id=docstrings>Documentation Strings</h3>
<p>You can document a Python function by giving it a documentation string (<code>docstring</code> for short). In this program, the <code>approximate_size()</code> function has a <code>docstring</code>:
<pre><code>def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<pre><code class=pp>def approximate_size(size, a_kilobyte_is_1024_bytes=True):
'''Convert a file size to human-readable form.
Keyword arguments:
@@ -151,11 +151,11 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<aside>Every function deserves a decent docstring.</aside>
<p>Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including carriage returns, leading white space, and other quote characters. You can use them anywhere, but you&#8217;ll see them most often used when defining a <code>docstring</code>.
<blockquote class='note compare perl5'>
<p><span>&#x261E;</span>Triple quotes are also an easy way to define a string with both single and double quotes, like <code>qq/.../</code> in Perl 5.
<p><span class=u>&#x261E;</span>Triple quotes are also an easy way to define a string with both single and double quotes, like <code>qq/.../</code> in Perl 5.
</blockquote>
<p>Everything between the triple quotes is the function&#8217;s <code>docstring</code>, which documents what the function does. A <code>docstring</code>, if it exists, must be the first thing defined in a function (that is, on the next line after the function declaration). You don&#8217;t technically need to give your function a <code>docstring</code>, but you always should. I know you&#8217;ve heard this in every programming class you&#8217;ve ever taken, but Python gives you an added incentive: the <code>docstring</code> is available at runtime as an attribute of the function.
<blockquote class=note>
<p><span>&#x261E;</span>Many Python <abbr>IDE</abbr>s use the <code>docstring</code> to provide context-sensitive documentation, so that when you type a function name, its <code>docstring</code> appears as a tooltip. This can be incredibly helpful, but it&#8217;s only as good as the <code>docstring</code>s you write.
<p><span class=u>&#x261E;</span>Many Python <abbr>IDE</abbr>s use the <code>docstring</code> to provide context-sensitive documentation, so that when you type a function name, its <code>docstring</code> appears as a tooltip. This can be incredibly helpful, but it&#8217;s only as good as the <code>docstring</code>s you write.
</blockquote>
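<p>As a rough sketch of how a tool might read a <code>docstring</code>, the standard library&#8217;s <code>inspect.getdoc()</code> returns it with the leading indentation cleaned up. The <code>greet()</code> function below is hypothetical and exists only for this illustration; whether any particular <abbr>IDE</abbr> works this way is an assumption.

```python
import inspect

def greet(name):
    '''Return a greeting.

    Keyword arguments:
    name -- who to greet
    '''
    return 'Hello, ' + name

# The raw attribute keeps the indentation exactly as typed in the source.
raw = greet.__doc__

# inspect.getdoc() strips the common indentation and surrounding blank lines.
cleaned = inspect.getdoc(greet)

print(repr(raw.splitlines()[2]))      # '    Keyword arguments:'
print(repr(cleaned.splitlines()[2]))  # 'Keyword arguments:'
```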
<p class=a>&#x2042;
@@ -163,10 +163,10 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<p>In case you missed it, I just said that Python functions have attributes, and that those attributes are available at runtime. A function, like everything else in Python, is an object.
<p>Run the interactive Python shell and follow along:
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>import humansize</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>print(humansize.approximate_size(4096, True))</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>import humansize</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>print(humansize.approximate_size(4096, True))</kbd> <span class=u>&#x2461;</span></a>
<samp>4.0 KiB</samp>
<a><samp class=p>>>> </samp><kbd>print(humansize.approximate_size.__doc__)</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>print(humansize.approximate_size.__doc__)</kbd> <span class=u>&#x2462;</span></a>
<samp>Convert a file size to human-readable form.
Keyword arguments:
@@ -183,13 +183,13 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<li>Instead of calling the function as you would expect to, you asked for one of the function&#8217;s attributes, <code>__doc__</code>.
</ol>
<blockquote class='note compare perl5'>
<p><span>&#x261E;</span><code>import</code> in Python is like <code>require</code> in Perl. Once you <code>import</code> a Python module, you access its functions with <code><var>module</var>.<var>function</var></code>; once you <code>require</code> a Perl module, you access its functions with <code><var>module</var>::<var>function</var></code>.
<p><span class=u>&#x261E;</span><code>import</code> in Python is like <code>require</code> in Perl. Once you <code>import</code> a Python module, you access its functions with <code><var>module</var>.<var>function</var></code>; once you <code>require</code> a Perl module, you access its functions with <code><var>module</var>::<var>function</var></code>.
</blockquote>
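<p>The same ideas can be tried without <code>humansize.py</code> on hand. This sketch uses the standard <code>math</code> module purely as a stand-in; it shows the <code><var>module</var>.<var>function</var></code> syntax and that both functions and modules are objects with attributes.

```python
import math                   # stand-in module; any importable module would do

root = math.sqrt(16)          # module.function syntax
print(root)                   # 4.0

# Functions are objects, so they carry attributes you can read at runtime.
print(math.sqrt.__name__)     # sqrt

# Modules are objects too.
print(math.__name__)          # math
```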
<h3 id=importsearchpath>The <code>import</code> Search Path</h3>
<p>Before this goes any further, I want to briefly mention the library search path. Python looks in several places when you try to import a module. Specifically, it looks in all the directories defined in <code>sys.path</code>. This is just a list, and you can easily view it or modify it with standard list methods. (You&#8217;ll learn more about lists in <a href=native-datatypes.html#lists>Native Datatypes</a>.)
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>import sys</kbd> <span>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>sys.path</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>import sys</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd>sys.path</kbd> <span class=u>&#x2461;</span></a>
<samp>['',
'/usr/lib/python30.zip',
'/usr/lib/python3.0',
@@ -197,10 +197,10 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
'/usr/lib/python3.0/lib-dynload',
'/usr/lib/python3.0/dist-packages',
'/usr/local/lib/python3.0/dist-packages']</samp>
<a><samp class=p>>>> </samp><kbd>sys</kbd> <span>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd>sys</kbd> <span class=u>&#x2462;</span></a>
<samp>&lt;module 'sys' (built-in)></samp>
<a><samp class=p>>>> </samp><kbd>sys.path.insert(0, '/home/mark/py')</kbd> <span>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>sys.path</kbd> <span>&#x2464;</span></a>
<a><samp class=p>>>> </samp><kbd>sys.path.insert(0, '/home/mark/py')</kbd> <span class=u>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd>sys.path</kbd> <span class=u>&#x2464;</span></a>
<samp>['/home/mark/py',
'',
'/usr/lib/python30.zip',
@@ -225,13 +225,12 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<h2 id=indentingcode>Indenting Code</h2>
<p>Python functions have no explicit <code>begin</code> or <code>end</code>, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (<code>:</code>) and the indentation of the code itself.
<pre><code>
<a>def approximate_size(size, a_kilobyte_is_1024_bytes=True): <span>&#x2460;</span></a>
<a> if size &lt; 0: <span>&#x2461;</span></a>
<a> raise ValueError('number must be non-negative') <span>&#x2462;</span></a>
<a> <span>&#x2463;</span></a>
<pre><code class=pp><a>def approximate_size(size, a_kilobyte_is_1024_bytes=True): <span class=u>&#x2460;</span></a>
<a> if size &lt; 0: <span class=u>&#x2461;</span></a>
<a> raise ValueError('number must be non-negative') <span class=u>&#x2462;</span></a>
<a> <span class=u>&#x2463;</span></a>
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
<a> for suffix in SUFFIXES[multiple]: <span>&#x2464;</span></a>
<a> for suffix in SUFFIXES[multiple]: <span class=u>&#x2464;</span></a>
size /= multiple
if size &lt; multiple:
return '{0:.1f} {1}'.format(size, suffix)
@@ -246,19 +245,19 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
</ol>
<p>After some initial protests and several snide analogies to Fortran, you will make peace with this and start seeing its benefits. One major benefit is that all Python programs look similar, since indentation is a language requirement and not a matter of style. This makes it easier to read and understand other people&#8217;s Python code.
<blockquote class='note compare java'>
<p><span>&#x261E;</span>Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. <abbr>C++</abbr> and Java use semicolons to separate statements and curly braces to separate code blocks.
<p><span class=u>&#x261E;</span>Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. <abbr>C++</abbr> and Java use semicolons to separate statements and curly braces to separate code blocks.
</blockquote>
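<p>Here is a short sketch of indentation doing the work that braces do elsewhere. The function <code>count_non_negative()</code> is hypothetical, written only to show three nested levels.

```python
def count_non_negative(numbers):
    count = 0
    for number in numbers:    # the colon opens the for block
        if number >= 0:       # one level deeper: the if block
            count += 1        # runs only when the test passes
    return count              # back at function level: runs after the loop

print(count_non_negative([3, -1, 0, 7]))   # 3
```

<p>Moving the <code>return</code> line one level to the right would put it inside the loop and change what the function does, which is why indentation here is syntax rather than style.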
<p class=a>&#x2042;
<h2 id=runningscripts>Running Scripts</h2>
<aside>Everything in Python is an object.</aside>
<p>Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them, by including a special block of code that executes when you run the Python file on the command line. Take the last few lines of <code>humansize.py</code>:
<pre><code>
<pre><code class=pp>
if __name__ == '__main__':
print(approximate_size(1000000000000, False))
print(approximate_size(1000000000000))</code></pre>
<blockquote class='note compare clang'>
<p><span>&#x261E;</span>Like <abbr>C</abbr>, Python uses <code>==</code> for comparison and <code>=</code> for assignment. Unlike <abbr>C</abbr>, Python does not allow <code>=</code> inside an expression, so there&#8217;s no chance of accidentally assigning the value you thought you were comparing.
<p><span class=u>&#x261E;</span>Like <abbr>C</abbr>, Python uses <code>==</code> for comparison and <code>=</code> for assignment. Unlike <abbr>C</abbr>, Python does not allow <code>=</code> inside an expression, so there&#8217;s no chance of accidentally assigning the value you thought you were comparing.
</blockquote>
<p>So what makes this <code>if</code> statement special? Well, modules are objects, and all modules have a built-in attribute <code>__name__</code>. A module&#8217;s <code>__name__</code> depends on how you&#8217;re using the module. If you <code>import</code> the module, then <code>__name__</code> is the module&#8217;s filename, without a directory path or file extension.
<pre class=screen>
@@ -280,7 +279,8 @@ if __name__ == '__main__':
<li><a href=http://www.python.org/dev/peps/pep-0008/>PEP 8: Style Guide for Python Code</a> discusses good indentation style.
<li><a href=http://docs.python.org/3.0/reference/><cite>Python Reference Manual</cite></a> explains what it means to say that <a href=http://docs.python.org/3.0/reference/datamodel.html#objects-values-and-types>everything in Python is an object</a>, because some people are <a href=http://www.douglasadams.com/dna/pedants.html>pedants</a> and like to discuss that sort of thing at great length.
</ul>
<p class=v><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next href=native-datatypes.html title='onward to &#8220;Native Datatypes&#8221;'><span>&#x261E;</span></a>
<p class=v><a rel=prev class=todo><span class=u>&#x261C;</span></a> <a rel=next href=native-datatypes.html title='onward to &#8220;Native Datatypes&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>