iterators and generators chapter

--HG--
rename : humansize.py => examples/humansize.py
rename : roman1.py => examples/roman1.py
rename : roman2.py => examples/roman2.py
rename : roman3.py => examples/roman3.py
rename : roman4.py => examples/roman4.py
rename : roman5.py => examples/roman5.py
rename : roman6.py => examples/roman6.py
rename : roman7.py => examples/roman7.py
rename : roman8.py => examples/roman8.py
rename : romantest1.py => examples/romantest1.py
rename : romantest2.py => examples/romantest2.py
rename : romantest3.py => examples/romantest3.py
rename : romantest4.py => examples/romantest4.py
rename : romantest5.py => examples/romantest5.py
rename : romantest6.py => examples/romantest6.py
rename : romantest7.py => examples/romantest7.py
rename : romantest8.py => examples/romantest8.py
Mark Pilgrim
2009-03-27 01:43:33 -05:00
parent 18b0144075
commit 933dc9459a
52 changed files with 2247 additions and 695 deletions
+2 -2
@@ -14,9 +14,9 @@ mark{background:#ff8;font-weight:bold}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#case-study-porting-chardet-to-python-3>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Case study: porting <code>chardet</code> to Python 3</h1>
<h1>Case Study: Porting <code>chardet</code> to Python 3</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Words, words. They&#8217;re all we have to go on. <span>&#x275E;</span><br>&mdash; <cite>Rosencrantz and Guildenstern are Dead</cite>
<p><span>&#x275D;</span> Words, words. They&#8217;re all we have to go on. <span>&#x275E;</span><br>&mdash; <a href=http://www.imdb.com/title/tt0100519/quotes>Rosencrantz and Guildenstern are Dead</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
-463
@@ -7541,469 +7541,6 @@ if __name__ == "__main__":
<div class=footnote>
<p><sup>[<a name="ftn.d0e36079" href="#d0e36079">8</a>] </sup>Again, I should point out that <code>map</code> can take a list, a tuple, or any object that acts like a sequence. See previous footnote about <code>filter</code>.
<div class=chapter>
<h2 id="plural">Chapter 17. Dynamic functions</h2>
<h2 id="plural.divein">17.1. Diving in</h2>
<p>I want to talk about plural nouns. Also, functions that return other functions, advanced regular expressions, and generators.
Generators are new in Python 2.3. But first, let's talk about how to make plural nouns.
<p>If you haven't read <a href="#re" title="Chapter 7. Regular Expressions">Chapter 7, <i>Regular Expressions</i></a>, now would be a good time. This chapter assumes you understand the basics of regular expressions, and quickly descends into
more advanced uses.
<p>English is a schizophrenic language that borrows from a lot of other languages, and the rules for making singular nouns into
plural nouns are varied and complex. There are rules, and then there are exceptions to those rules, and then there are exceptions
to the exceptions.
<p>If you grew up in an English-speaking country or learned English in a formal school setting, you're probably familiar with
the basic rules:
<div class=orderedlist>
<ol>
<li>If a word ends in S, X, or Z, add ES. &#8220;Bass&#8221; becomes &#8220;basses&#8221;, &#8220;fax&#8221; becomes &#8220;faxes&#8221;, and &#8220;waltz&#8221; becomes &#8220;waltzes&#8221;.
<li>If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What's a noisy H? One that gets combined with
other letters to make a sound that you can hear. So &#8220;coach&#8221; becomes &#8220;coaches&#8221; and &#8220;rash&#8221; becomes &#8220;rashes&#8221;, because you can hear the CH and SH sounds when you say them. But &#8220;cheetah&#8221; becomes &#8220;cheetahs&#8221;, because the H is silent.
<li>If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else,
just add S. So &#8220;vacancy&#8221; becomes &#8220;vacancies&#8221;, but &#8220;day&#8221; becomes &#8220;days&#8221;.
<li>If all else fails, just add S and hope for the best.
</ol>
<p>(I know, there are a lot of exceptions. &#8220;Man&#8221; becomes &#8220;men&#8221; and &#8220;woman&#8221; becomes &#8220;women&#8221;, but &#8220;human&#8221; becomes &#8220;humans&#8221;. &#8220;Mouse&#8221; becomes &#8220;mice&#8221; and &#8220;louse&#8221; becomes &#8220;lice&#8221;, but &#8220;house&#8221; becomes &#8220;houses&#8221;. &#8220;Knife&#8221; becomes &#8220;knives&#8221; and &#8220;wife&#8221; becomes &#8220;wives&#8221;, but &#8220;lowlife&#8221; becomes &#8220;lowlifes&#8221;. And don't even get me started on words that are their own plural, like &#8220;sheep&#8221;, &#8220;deer&#8221;, and &#8220;haiku&#8221;.)
<p>Other languages are, of course, completely different.
<p>Let's design a module that pluralizes nouns. Start with just English nouns, and just these four rules, but keep in mind that
you'll inevitably need to add more rules, and you may eventually need to add more languages.
<h2 id="plural.stage1">17.2. <code>plural.py</code>, stage 1</h2>
<p>So you're looking at words, which at least in English are strings of characters. And you have rules that say you need to
find different combinations of characters, and then do different things to them. This sounds like a job for regular expressions.
<div class=example><h3>Example 17.1. <code>plural1.py</code></h3><pre><code>
import re

def plural(noun):
    if re.search('[sxz]$', noun):             <span>&#x2460;</span>
        return re.sub('$', 'es', noun)        <span>&#x2461;</span>
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'
</pre><div class=calloutlist>
<ol>
<li>OK, this is a regular expression, but it uses a syntax you didn't see in <a href="#re" title="Chapter 7. Regular Expressions">Chapter 7, <i>Regular Expressions</i></a>. The square brackets mean &#8220;match exactly one of these characters&#8221;. So <code>[sxz]</code> means &#8220;<code>s</code>, or <code>x</code>, or <code>z</code>&#8221;, but only one of them. The <code>$</code> should be familiar; it matches the end of string. So you're checking to see if <var>noun</var> ends with <code>s</code>, <code>x</code>, or <code>z</code>.
<li>This <code>re.sub</code> function performs regular expression-based string substitutions. Let's look at it in more detail.
<div class=example><h3>Example 17.2. Introducing <code>re.sub</code></h3><pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<samp class=p>>>> </samp><kbd>re.search('[abc]', 'Mark')</kbd> <span>&#x2460;</span>
&lt;_sre.SRE_Match object at 0x001C1FA8>
<samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'Mark')</kbd> <span>&#x2461;</span>
'Mork'
<samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'rock')</kbd> <span>&#x2462;</span>
'rook'
<samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'caps')</kbd> <span>&#x2463;</span>
'oops'
</pre><div class=calloutlist>
<ol>
<li>Does the string <code>Mark</code> contain <code>a</code>, <code>b</code>, or <code>c</code>? Yes, it contains <code>a</code>.
<li>OK, now find <code>a</code>, <code>b</code>, or <code>c</code>, and replace it with <code>o</code>. <code>Mark</code> becomes <code>Mork</code>.
<li>The same function turns <code>rock</code> into <code>rook</code>.
<li>You might think this would turn <code>caps</code> into <code>oaps</code>, but it doesn't. <code>re.sub</code> replaces <em>all</em> of the matches, not just the first one. So this regular expression turns <code>caps</code> into <code>oops</code>, because both the <code>c</code> and the <code>a</code> get turned into <code>o</code>.
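<p>As a hedged aside, assuming Python 3's <code>re</code> module: if you did want only the first match replaced, <code>re.sub</code> accepts an optional <code>count</code> argument. A minimal sketch (the variable names are illustrative only):

```python
import re

# By default, re.sub replaces every non-overlapping match.
replace_all = re.sub('[abc]', 'o', 'caps')

# count=1 limits the substitution to the first match only.
replace_first = re.sub('[abc]', 'o', 'caps', count=1)

print(replace_all)    # oops
print(replace_first)  # oaps
```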
<div class=example><h3>Example 17.3. Back to <code>plural1.py</code></h3><pre><code>
import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)         <span>&#x2460;</span>
    elif re.search('[^aeioudgkprt]h$', noun):  <span>&#x2461;</span>
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):        <span>&#x2462;</span>
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'
</pre><div class=calloutlist>
<ol>
<li>Back to the <code>plural</code> function. What are you doing? You're replacing the end of string with <code>es</code>. In other words, adding <code>es</code> to the string. You could accomplish the same thing with string concatenation, for example <code>noun + 'es'</code>, but I'm using regular expressions for everything, for consistency, for reasons that will become clear later in the chapter.
<li>Look closely, this is another new variation. The <code>^</code> as the first character inside the square brackets means something special: negation. <code>[^abc]</code> means &#8220;any single character <em>except</em> <code>a</code>, <code>b</code>, or <code>c</code>&#8221;. So <code>[^aeioudgkprt]</code> means any character except <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, <code>u</code>, <code>d</code>, <code>g</code>, <code>k</code>, <code>p</code>, <code>r</code>, or <code>t</code>. Then that character needs to be followed by <code>h</code>, followed by end of string. You're looking for words that end in H where the H can be heard.
<li>Same pattern here: match words that end in Y, where the character before the Y is <em>not</em> <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>. You're looking for words that end in Y that sounds like I.
<div class=example><h3>Example 17.4. More on negation regular expressions</h3><pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'vacancy')</kbd> <span>&#x2460;</span>
&lt;_sre.SRE_Match object at 0x001C1FA8>
<samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'boy')</kbd> <span>&#x2461;</span>
<samp class=p>>>> </samp><kbd></kbd>
<samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'day')</kbd>
<samp class=p>>>> </samp><kbd></kbd>
<samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'pita')</kbd> <span>&#x2462;</span>
<samp class=p>>>> </samp><kbd></kbd>
</pre><div class=calloutlist>
<ol>
<li><code>vacancy</code> matches this regular expression, because it ends in <code>cy</code>, and <code>c</code> is not <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>.
<li><code>boy</code> does not match, because it ends in <code>oy</code>, and you specifically said that the character before the <code>y</code> could not be <code>o</code>. <code>day</code> does not match, because it ends in <code>ay</code>.
<li><code>pita</code> does not match, because it does not end in <code>y</code>.
<div class=example><h3>Example 17.5. More on <code>re.sub</code></h3><pre class=screen>
<samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'vacancy')</kbd> <span>&#x2460;</span>
'vacancies'
<samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'agency')</kbd>
'agencies'
<samp class=p>>>> </samp><kbd>re.sub('([^aeiou])y$', r'\1ies', 'vacancy')</kbd> <span>&#x2461;</span>
'vacancies'
</pre><div class=calloutlist>
<ol>
<li>This regular expression turns <code>vacancy</code> into <code>vacancies</code> and <code>agency</code> into <code>agencies</code>, which is what you wanted. Note that it would also turn <code>boy</code> into <code>boies</code>, but that will never happen in the function because you did that <code>re.search</code> first to find out whether you should do this <code>re.sub</code>.
<li>Just in passing, I want to point out that it is possible to combine these two regular expressions (one to find out if the
rule applies, and another to actually apply it) into a single regular expression. Here's what that would look like. Most
of it should look familiar: you're using a remembered group, which you learned in <a href="#re.phone" title="7.6. Case study: Parsing Phone Numbers">Section 7.6, &#8220;Case study: Parsing Phone Numbers&#8221;</a>, to remember the character before the <code>y</code>. Then in the substitution string, you use a new syntax, <code>\1</code>, which means &#8220;hey, that first group you remembered? put it here&#8221;. In this case, you remember the <code>c</code> before the <code>y</code>, and then when you do the substitution, you substitute <code>c</code> in place of <code>c</code>, and <code>ies</code> in place of <code>y</code>. (If you have more than one remembered group, you can use <code>\2</code> and <code>\3</code> and so on.)
<p>Regular expression substitutions are extremely powerful, and the <code>\1</code> syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder
to read, and it doesn't directly map to the way you first described the pluralizing rules. You originally laid out rules
like &#8220;if the word ends in S, X, or Z, then add ES&#8221;. And if you look at this function, you have two lines of code that say &#8220;if the word ends in S, X, or Z, then add ES&#8221;. It doesn't get much more direct than that.
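<p>For comparison only, here is a rough sketch of what the combined, one-expression-per-rule approach might look like in Python 3. It is not the chapter's method: the <code>COMBINED_RULES</code> name and the use of <code>re.subn</code> (which returns the new string together with the number of substitutions made) are assumptions made for this illustration.

```python
import re

# Each rule is a single (pattern, replacement) pair. A remembered group and
# \1 put back whatever the pattern had to inspect but must not discard.
COMBINED_RULES = (
    (r'([sxz])$', r'\1es'),
    (r'([^aeioudgkprt]h)$', r'\1es'),
    (r'([^aeiou])y$', r'\1ies'),
    (r'$', 's'),
)

def plural(noun):
    for pattern, replacement in COMBINED_RULES:
        # re.subn returns (new_string, number_of_substitutions_made).
        result, count = re.subn(pattern, replacement, noun)
        if count:
            return result
    return noun  # not reached in practice: the last rule always matches

print(plural('vacancy'))  # vacancies
print(plural('cheetah'))  # cheetahs
```

<p>The rules are shorter this way, but each one now hides the "does it apply?" question inside the substitution, which is the readability cost described above.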
<h2 id="plural.stage2">17.3. <code>plural.py</code>, stage 2</h2>
<p>Now you're going to add a level of abstraction. You started by defining a list of rules: if this, then do that, otherwise
go to the next rule. Let's temporarily complicate part of the program so you can simplify another part.
<div class=example><h3>Example 17.6. <code>plural2.py</code></h3><pre><code>
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

def match_y(noun):
    return re.search('[^aeiou]y$', noun)

def apply_y(noun):
    return re.sub('y$', 'ies', noun)

def match_default(noun):
    return 1

def apply_default(noun):
    return noun + 's'

rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )                                     <span>&#x2460;</span>

def plural(noun):
    for matchesRule, applyRule in rules:       <span>&#x2461;</span>
        if matchesRule(noun):                  <span>&#x2462;</span>
            return applyRule(noun)             <span>&#x2463;</span>
</pre><div class=calloutlist>
<ol>
<li>This version looks more complicated (it's certainly longer), but it does exactly the same thing: try to match four different
rules, in order, and apply the appropriate regular expression when a match is found. The difference is that each individual
match and apply rule is defined in its own function, and the functions are then listed in this <var>rules</var> variable, which is a tuple of tuples.
<li>Using a <code>for</code> loop, you can pull out the match and apply rules two at a time (one match, one apply) from the <var>rules</var> tuple. On the first iteration of the <code>for</code> loop, <var>matchesRule</var> will get <code>match_sxz</code>, and <var>applyRule</var> will get <code>apply_sxz</code>. On the second iteration (assuming you get that far), <var>matchesRule</var> will be assigned <code>match_h</code>, and <var>applyRule</var> will be assigned <code>apply_h</code>.
<li>Remember that <a href="#odbchelper.objects" title="2.4. Everything Is an Object">everything in Python is an object</a>, including functions. <var>rules</var> contains actual functions; not names of functions, but actual functions. When they get assigned in the <code>for</code> loop, then <var>matchesRule</var> and <var>applyRule</var> are actual functions that you can call. So on the first iteration of the <code>for</code> loop, this is equivalent to calling <code>match_sxz(noun)</code>.
<li>On the first iteration of the <code>for</code> loop, this is equivalent to calling <code>apply_sxz(noun)</code>, and so forth.
<p>If this additional level of abstraction is confusing, try unrolling the function to see the equivalence. This <code>for</code> loop is equivalent to the following:
<div class=example><h3>Example 17.7. Unrolling the <code>plural</code> function</h3><pre><code>
def plural(noun):
    if match_sxz(noun):
        return apply_sxz(noun)
    if match_h(noun):
        return apply_h(noun)
    if match_y(noun):
        return apply_y(noun)
    if match_default(noun):
        return apply_default(noun)
</pre><p>The benefit here is that the <code>plural</code> function is now simplified. It takes a list of rules, defined elsewhere, and iterates through them in a generic fashion.
Get a match rule; does it match? Then call the apply rule. The rules could be defined anywhere, in any way. The <code>plural</code> function doesn't care.
<p>Now, was adding this level of abstraction worth it? Well, not yet. Let's consider what it would take to add a new rule to
the function. Well, in the previous example, it would require adding an <code>if</code> statement to the <code>plural</code> function. In this example, it would require adding two functions, <code>match_foo</code> and <code>apply_foo</code>, and then updating the <var>rules</var> list to specify where in the order the new match and apply functions should be called relative to the other rules.
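<p>To make that concrete, here is a trimmed-down Python 3 sketch of adding one rule to a stage 2 style rules tuple. The <code>match_fe</code> and <code>apply_fe</code> functions are hypothetical and deliberately naive: they handle &#8220;knife&#8221; but also turn &#8220;lowlife&#8221; into &#8220;lowlives&#8221;, which is exactly the kind of exception mentioned earlier.

```python
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

# Hypothetical new rule (not part of the original four): "fe" becomes "ves".
def match_fe(noun):
    return re.search('fe$', noun)

def apply_fe(noun):
    return re.sub('fe$', 'ves', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

# Position in the tuple decides priority; the default rule stays last.
rules = (
    (match_sxz, apply_sxz),
    (match_fe, apply_fe),
    (match_default, apply_default),
)

def plural(noun):
    # Unchanged by the new rule: it only walks whatever is in rules.
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('knife'))  # knives
```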
<p>This is really just a stepping stone to the next section. Let's move on.
<h2 id="plural.stage3">17.4. <code>plural.py</code>, stage 3</h2>
<p>Defining separate named functions for each match and apply rule isn't really necessary. You never call them directly; you
define them in the <var>rules</var> list and call them through there. Let's streamline the rules definition by anonymizing those functions.
<div class=example><h3>Example 17.8. <code>plural3.py</code></h3><pre><code>
import re

rules = \
    (
        (
            lambda word: re.search('[sxz]$', word),
            lambda word: re.sub('$', 'es', word)
        ),
        (
            lambda word: re.search('[^aeioudgkprt]h$', word),
            lambda word: re.sub('$', 'es', word)
        ),
        (
            lambda word: re.search('[^aeiou]y$', word),
            lambda word: re.sub('y$', 'ies', word)
        ),
        (
            lambda word: re.search('$', word),
            lambda word: re.sub('$', 's', word)
        )
    )                                          <span>&#x2460;</span>

def plural(noun):
    for matchesRule, applyRule in rules:       <span>&#x2461;</span>
        if matchesRule(noun):
            return applyRule(noun)
</pre><div class=calloutlist>
<ol>
<li>This is the same set of rules as you defined in stage 2. The only difference is that instead of defining named functions
like <code>match_sxz</code> and <code>apply_sxz</code>, you have &#8220;inlined&#8221; those function definitions directly into the <var>rules</var> list itself, using <a href="#apihelper.lambda" title="4.7. Using lambda Functions">lambda functions</a>.
<li>Note that the <code>plural</code> function hasn't changed at all. It iterates through a set of rule functions, checks the first rule, and if it returns a
true value, calls the second rule and returns the value. Same as above, word for word. The only difference is that the rule
functions were defined inline, anonymously, using lambda functions. But the <code>plural</code> function doesn't care how they were defined; it just gets a list of rules and blindly works through them.
<p>Now to add a new rule, all you need to do is define the functions directly in the <var>rules</var> list itself: one match rule, and one apply rule. But defining the rule functions inline like this makes it very clear that
you have some unnecessary duplication here. You have four pairs of functions, and they all follow the same pattern. The
match function is a single call to <code>re.search</code>, and the apply function is a single call to <code>re.sub</code>. Let's factor out these similarities.
<h2 id="plural.stage4">17.5. <code>plural.py</code>, stage 4</h2>
<p>Let's factor out the duplication in the code so that defining new rules can be easier.
<div class=example><h3 id="plural.stage4.example.1">Example 17.9. <code>plural4.py</code></h3><pre><code>
import re

def buildMatchAndApplyFunctions((pattern, search, replace)):
    matchFunction = lambda word: re.search(pattern, word)      <span>&#x2460;</span>
    applyFunction = lambda word: re.sub(search, replace, word) <span>&#x2461;</span>
    return (matchFunction, applyFunction)                      <span>&#x2462;</span>
</pre><div class=calloutlist>
<ol>
<li><code>buildMatchAndApplyFunctions</code> is a function that builds other functions dynamically. It takes <var>pattern</var>, <var>search</var> and <var>replace</var> (actually it takes a tuple, but more on that in a minute), and you can build the match function using the <code>lambda</code> syntax to be a function that takes one parameter (<var>word</var>) and calls <code>re.search</code> with the <var>pattern</var> that was passed to the <code>buildMatchAndApplyFunctions</code> function, and the <var>word</var> that was passed to the match function you're building. Whoa.
<li>Building the apply function works the same way. The apply function is a function that takes one parameter, and calls <code>re.sub</code> with the <var>search</var> and <var>replace</var> parameters that were passed to the <code>buildMatchAndApplyFunctions</code> function, and the <var>word</var> that was passed to the apply function you're building. This technique of using the values of outside parameters within a
dynamic function is called <em>closures</em>. You're essentially defining constants within the apply function you're building: it takes one parameter (<var>word</var>), but it then acts on that plus two other values (<var>search</var> and <var>replace</var>) which were set when you defined the apply function.
<li>Finally, the <code>buildMatchAndApplyFunctions</code> function returns a tuple of two values: the two functions you just created. The constants you defined within those functions
(<var>pattern</var> within <var>matchFunction</var>, and <var>search</var> and <var>replace</var> within <var>applyFunction</var>) stay with those functions, even after you return from <code>buildMatchAndApplyFunctions</code>. That's insanely cool.
<p>If this is incredibly confusing (and it should be, this is weird stuff), it may become clearer when you see how to use it.
<div class=example><h3>Example 17.10. <code>plural4.py</code> continued</h3><pre><code>
patterns = \
    (
        ('[sxz]$', '$', 'es'),
        ('[^aeioudgkprt]h$', '$', 'es'),
        ('(qu|[^aeiou])y$', 'y$', 'ies'),
        ('$', '$', 's')
    )                                                  <span>&#x2460;</span>
rules = map(buildMatchAndApplyFunctions, patterns)     <span>&#x2461;</span>
</pre><div class=calloutlist>
<ol>
<li>Our pluralization rules are now defined as a series of strings (not functions). The first string is the regular expression
that you would use in <code>re.search</code> to see if this rule matches; the second and third are the search and replace expressions you would use in <code>re.sub</code> to actually apply the rule to turn a noun into its plural.
<li>This line is magic. It takes the list of strings in <var>patterns</var> and turns them into a list of functions. How? By mapping the strings to the <code>buildMatchAndApplyFunctions</code> function, which just happens to take three strings as parameters and return a tuple of two functions. This means that <var>rules</var> ends up being exactly the same as the previous example: a list of tuples, where each tuple is a pair of functions, where
the first function is the match function that calls <code>re.search</code>, and the second function is the apply function that calls <code>re.sub</code>.
<p>I swear I am not making this up: <var>rules</var> ends up with exactly the same list of functions as the previous example. Unroll the <var>rules</var> definition, and you'll get this:
<div class=example><h3>Example 17.11. Unrolling the rules definition</h3><pre><code>
rules = \
    (
        (
            lambda word: re.search('[sxz]$', word),
            lambda word: re.sub('$', 'es', word)
        ),
        (
            lambda word: re.search('[^aeioudgkprt]h$', word),
            lambda word: re.sub('$', 'es', word)
        ),
        (
            lambda word: re.search('[^aeiou]y$', word),
            lambda word: re.sub('y$', 'ies', word)
        ),
        (
            lambda word: re.search('$', word),
            lambda word: re.sub('$', 's', word)
        )
    )
</pre><div class=example><h3 id="plural.finishing.up">Example 17.12. <code>plural4.py</code>, finishing up</h3><pre><code>
def plural(noun):
    for matchesRule, applyRule in rules:  <span>&#x2460;</span>
        if matchesRule(noun):
            return applyRule(noun)
</pre><div class=calloutlist>
<ol>
<li>Since the <var>rules</var> list is the same as the previous example, it should come as no surprise that the <code>plural</code> function hasn't changed. Remember, it's completely generic; it takes a list of rule functions and calls them in order.
It doesn't care how the rules are defined. In <a href="#plural.stage2" title="17.3. plural.py, stage 2">stage 2</a>, they were defined as separate named functions. In <a href="#plural.stage3" title="17.4. plural.py, stage 3">stage 3</a>, they were defined as anonymous <code>lambda</code> functions. Now in stage 4, they are built dynamically by mapping the <code>buildMatchAndApplyFunctions</code> function onto a list of raw strings. Doesn't matter; the <code>plural</code> function still works the same way.
<p>Just in case that wasn't mind-blowing enough, I must confess that there was a subtlety in the definition of <code>buildMatchAndApplyFunctions</code> that I skipped over. Let's go back and take another look.
<div class=example><h3>Example 17.13. Another look at <code>buildMatchAndApplyFunctions</code></h3><pre><code>
def buildMatchAndApplyFunctions((pattern, search, replace)): <span>&#x2460;</span>
</pre><div class=calloutlist>
<ol>
<li>Notice the double parentheses? This function doesn't actually take three parameters; it actually takes one parameter, a tuple
of three elements. But the tuple is expanded when the function is called, and the three elements of the tuple are each assigned
to different variables: <var>pattern</var>, <var>search</var>, and <var>replace</var>. Confused yet? Let's see it in action.
<div class=example><h3>Example 17.14. Expanding tuples when calling functions</h3><pre class=screen>
<samp class=p>>>> </samp><kbd>def foo((a, b, c)):</kbd>
<samp class=p>... </samp>    print c
<samp class=p>... </samp>    print b
<samp class=p>... </samp>    print a
<samp class=p>>>> </samp><kbd>parameters = ('apple', 'bear', 'catnap')</kbd>
<samp class=p>>>> </samp><kbd>foo(parameters)</kbd> <span>&#x2460;</span>
catnap
bear
apple
</pre><div class=calloutlist>
<ol>
<li>The proper way to call the function <code>foo</code> is with a tuple of three elements. When the function is called, the elements are assigned to different local variables within
<code>foo</code>.
<p>Now let's go back and see why this auto-tuple-expansion trick was necessary. <var>patterns</var> was a list of tuples, and each tuple had three elements. When you called <code>map(buildMatchAndApplyFunctions, patterns)</code>, that means that <code>buildMatchAndApplyFunctions</code> is <em>not</em> getting called with three parameters. Using <code>map</code> to map a single list onto a function always calls the function with a single parameter: each element of the list. In the
case of <var>patterns</var>, each element of the list is a tuple, so <code>buildMatchAndApplyFunctions</code> always gets called with the tuple, and you use the auto-tuple-expansion trick in the definition of <code>buildMatchAndApplyFunctions</code> to assign the elements of that tuple to named variables that you can work with.
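<p>One caveat if you try this in Python 3: tuple parameters in function definitions (the double parentheses) were removed from the language, and <code>map</code> returns a one-shot iterator rather than a list. The following is only a sketch of how the same closure technique might be written there; the snake_case names are this sketch's own, and the patterns are copied from the stage 4 example.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    # Both inner functions are closures: they keep using pattern, search,
    # and replace after this outer function has returned.
    def matches_rule(word):
        return re.search(pattern, word)

    def apply_rule(word):
        return re.sub(search, replace, word)

    return (matches_rule, apply_rule)

patterns = (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's'),
)

# The * unpacks each three-element tuple into three separate arguments,
# doing explicitly what the double parentheses did implicitly.
# A list (not a bare map object) lets plural() loop over rules repeatedly.
rules = [build_match_and_apply_functions(*pattern_tuple)
         for pattern_tuple in patterns]

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('coach'))    # coaches
print(plural('vacancy'))  # vacancies
```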
<h2 id="plural.stage5">17.6. <code>plural.py</code>, stage 5</h2>
<p>You've factored out all the duplicate code and added enough abstractions so that the pluralization rules are defined in a
list of strings. The next logical step is to take these strings and put them in a separate file, where they can be maintained
separately from the code that uses them.
<p>First, let's create a text file that contains the rules you want. No fancy data structures, just space- (or tab-)delimited
strings in three columns. You'll call it <code>rules.en</code>; &#8220;en&#8221; stands for English. These are the rules for pluralizing English nouns. You could add other rule files for other languages
later.
<div class=example><h3>Example 17.15. <code>rules.en</code></h3><pre><code>
[sxz]$               $    es
[^aeioudgkprt]h$     $    es
[^aeiou]y$           y$   ies
$                    $    s
</pre><p>Now let's see how you can use this rules file.
<div class=example><h3>Example 17.16. <code>plural5.py</code></h3><pre><code>
import re
import string

def buildRule((pattern, search, replace)):
    return lambda word: re.search(pattern, word) and re.sub(search, replace, word) <span>&#x2460;</span>

def plural(noun, language='en'):                             <span>&#x2461;</span>
    lines = file('rules.%s' % language).readlines()          <span>&#x2462;</span>
    patterns = map(string.split, lines)                      <span>&#x2463;</span>
    rules = map(buildRule, patterns)                         <span>&#x2464;</span>
    for rule in rules:
        result = rule(noun)                                  <span>&#x2465;</span>
        if result: return result
</pre><div class=calloutlist>
<ol>
<li>You're still using the closures technique here (building a function dynamically that uses variables defined outside the function),
but now you've combined the separate match and apply functions into one. (The reason for this change will become clear in
the next section.) This will let you accomplish the same thing as having two functions, but you'll need to call it differently,
as you'll see in a minute.
<li>Our <code>plural</code> function now takes an optional second parameter, <var>language</var>, which defaults to <code>en</code>.
<li>You use the <var>language</var> parameter to construct a filename, then open the file and read the contents into a list. If <var>language</var> is <code>en</code>, then you'll open the <code>rules.en</code> file, read the entire thing, split it at the line breaks, and return a list. Each line of the file will be one element
in the list.
<li>As you saw, each line in the file really has three values, but they're separated by whitespace (tabs or spaces, it makes no
difference). Mapping the <code>string.split</code> function onto this list will create a new list where each element is itself a list of three strings. So a line like <code>[sxz]$ $ es</code> will be broken up into the list <code>['[sxz]$', '$', 'es']</code>. This means that <var>patterns</var> will end up as a list of three-element sequences, just like you hard-coded it in <a href="#plural.stage4" title="17.5. plural.py, stage 4">stage 4</a> (there they were tuples, but the tuple-expansion trick in <code>buildRule</code> unpacks a three-element list just as well).
<li>If <var>patterns</var> is a list of three-element sequences, then <var>rules</var> will be a list of the functions created dynamically by each call to <code>buildRule</code>. Calling <code>buildRule(('[sxz]$', '$', 'es'))</code> returns a function that takes a single parameter, <var>word</var>. When this returned function is called, it will execute <code>re.search('[sxz]$', word) and re.sub('$', 'es', word)</code>.
<li>Because you're now building a combined match-and-apply function, you need to call it differently. Just call the function,
and if it returns something, then that's the plural; if it returns nothing (<code>None</code>), then the rule didn't match and you need to try another rule.
<p>So the improvement here is that you've completely separated the pluralization rules into an external file. Not only can the
file be maintained separately from the code, but you've set up a naming scheme where the same <code>plural</code> function can use different rule files, based on the <var>language</var> parameter.
<p>The downside here is that you're reading that file every time you call the <code>plural</code> function. I thought I could get through this entire book without using the phrase &#8220;left as an exercise for the reader&#8221;, but here you go: building a caching mechanism for the language-specific rule files that auto-refreshes itself if the rule
files change between calls <em>is left as an exercise for the reader</em>. Have fun.
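<p>The stage 5 code above relies on <code>file()</code> and <code>string.split</code>, which Python 3 no longer provides. As a rough Python 3 sketch of the same idea (no caching, so the exercise still stands), the version below uses <code>open()</code> and the <code>str.split</code> method. The <code>rules_dir</code> parameter and the temporary demo directory are assumptions added here so the sketch can run on its own; they are not part of the original design.

```python
import os
import re
import tempfile

def build_rule(pattern, search, replace):
    def rule(word):
        # Returns the plural if the pattern matches, otherwise None.
        return re.search(pattern, word) and re.sub(search, replace, word)
    return rule

def plural(noun, language='en', rules_dir='.'):
    # rules_dir is a hypothetical extra parameter, used only so the demo
    # below can point at a temporary directory.
    path = os.path.join(rules_dir, 'rules.%s' % language)
    with open(path, encoding='utf-8') as rules_file:
        # Skip blank lines; split each remaining line on whitespace.
        patterns = [line.split() for line in rules_file if line.strip()]
    rules = [build_rule(*pattern) for pattern in patterns]
    for rule in rules:
        result = rule(noun)
        if result:
            return result
    return None

# Demo setup: write a rules.en file with the four rules from this chapter.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'rules.en'), 'w', encoding='utf-8') as f:
    f.write('[sxz]$ $ es\n'
            '[^aeioudgkprt]h$ $ es\n'
            '[^aeiou]y$ y$ ies\n'
            '$ $ s\n')

print(plural('fax', rules_dir=demo_dir))     # faxes
print(plural('agency', rules_dir=demo_dir))  # agencies
```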
<h2 id="plural.stage6">17.7. <code>plural.py</code>, stage 6</h2>
<p>Now you're ready to talk about generators.
<div class=example><h3>Example 17.17. <code>plural6.py</code></h3><pre><code>
import re

def rules(language):
    for line in file('rules.%s' % language):
        pattern, search, replace = line.split()
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):
    for applyRule in rules(language):
        result = applyRule(noun)
        if result: return result
</pre><p>This uses a technique called generators, which I'm not even going to try to explain until you look at a simpler example first.
<div class=example><h3 id="plural.introducing.generators">Example 17.18. Introducing generators</h3><pre class=screen>
<samp class=p>>>> </samp><kbd>def make_counter(x):</kbd>
<samp class=p>... </samp>    print 'entering make_counter'
<samp class=p>... </samp>    while 1:
<samp class=p>... </samp>        yield x               <span>&#x2460;</span>
<samp class=p>... </samp>        print 'incrementing x'
<samp class=p>... </samp>        x = x + 1
<samp class=p>... </samp>
<samp class=p>>>> </samp><kbd>counter = make_counter(2)</kbd> <span>&#x2461;</span>
<samp class=p>>>> </samp><kbd>counter</kbd> <span>&#x2462;</span>
&lt;generator object at 0x001C9C10>
<samp class=p>>>> </samp><kbd>counter.next()</kbd> <span>&#x2463;</span>
<samp>entering make_counter
2</samp>
<samp class=p>>>> </samp><kbd>counter.next()</kbd> <span>&#x2464;</span>
<samp>incrementing x
3</samp>
<samp class=p>>>> </samp><kbd>counter.next()</kbd> <span>&#x2465;</span>
<samp>incrementing x
4</samp>
</pre><div class=calloutlist>
<ol>
<li>The presence of the <code>yield</code> keyword in <code>make_counter</code> means that this is not a normal function. It is a special kind of function which generates values one at a time. You can
think of it as a resumable function. Calling it will return a generator that can be used to generate successive values of
<var>x</var>.
<li>To create an instance of the <code>make_counter</code> generator, just call it like any other function. Note that this does not actually execute the function code. You can tell
this because the first line of <code>make_counter</code> is a <code>print</code> statement, but nothing has been printed yet.
<li>The <code>make_counter</code> function returns a generator object.
<li>The first time you call the <code>next()</code> method on the generator object, it executes the code in <code>make_counter</code> up to the first <code>yield</code> statement, and then returns the value that was yielded. In this case, that will be <code>2</code>, because you originally created the generator by calling <code>make_counter(2)</code>.
<li>Repeatedly calling <code>next()</code> on the generator object <em>resumes where you left off</em> and continues until you hit the next <code>yield</code> statement. The next line of code waiting to be executed is the <code>print</code> statement that prints <code>incrementing x</code>, and then after that the <code>x = x + 1</code> statement that actually increments it. Then you loop through the <code>while</code> loop again, and the first thing you do is <code>yield x</code>, which returns the current value of <var>x</var> (now 3).
<li>The third time you call <code>counter.next()</code>, you do all the same things again, but this time <var>x</var> is now <code>4</code>. And so forth. Since <code>make_counter</code> sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing <var>x</var> and spitting out values. But let's look at more productive uses of generators instead.
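<p>The session above uses Python 2 syntax. As a rough Python 3 equivalent (a sketch, not part of the original example), the <code>print</code> statement becomes a function call, and the built-in <code>next()</code> function replaces the <code>counter.next()</code> method. The variables <var>first</var>, <var>second</var>, and <var>third</var> are only there for illustration.

```python
def make_counter(x):
    print('entering make_counter')
    while True:
        yield x                    # pause here and hand x back to the caller
        print('incrementing x')    # execution resumes here on the next call
        x = x + 1

counter = make_counter(2)   # creates the generator; nothing is printed yet
first = next(counter)       # prints 'entering make_counter', then yields 2
second = next(counter)      # prints 'incrementing x', then yields 3
third = next(counter)       # prints 'incrementing x', then yields 4
```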
<div class=example><h3 id="plural.fib.example">Example 17.19. Using generators instead of recursion</h3><pre><code>
def fibonacci(max):
    a, b = 0, 1       <span>&#x2460;</span>
    while a &lt; max:
        yield a       <span>&#x2461;</span>
        a, b = b, a+b <span>&#x2462;</span>
</pre><div class=calloutlist>
<ol>
<li>The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with
<code>0</code> and <code>1</code>, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: <var>a</var> starts at <code>0</code>, and <var>b</var> starts at <code>1</code>.
<li><var>a</var> is the current number in the sequence, so yield it.
<li><var>b</var> is the next number in the sequence, so assign that to <var>a</var>, but also calculate the next value (<code>a+b</code>) and assign that to <var>b</var> for later use. Note that this happens in parallel; if <var>a</var> is <code>3</code> and <var>b</var> is <code>5</code>, then <code>a, b = b, a+b</code> will set <var>a</var> to <code>5</code> (the previous value of <var>b</var>) and <var>b</var> to <code>8</code> (the sum of the previous values of <var>a</var> and <var>b</var>).
<p>So you have a function that spits out successive Fibonacci numbers. Sure, you could do that with recursion, but this way
is easier to read. Also, it works well with <code>for</code> loops.
<div class=example><h3>Example 17.20. Generators in <code>for</code> loops</h3><pre class=screen>
<samp class=p>>>> </samp><kbd>for n in fibonacci(1000):</kbd> <span>&#x2460;</span>
<samp class=p>... </samp>    print n,              <span>&#x2461;</span>
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
</pre><div class=calloutlist>
<ol>
<li>You can use a generator like <code>fibonacci</code> in a <code>for</code> loop directly. Calling <code>fibonacci(1000)</code> creates the generator object, and the <code>for</code> loop successively calls its <code>next()</code> method to get values to assign to the <code>for</code> loop index variable (<var>n</var>).
<li>Each time through the <code>for</code> loop, <var>n</var> gets a new value from the <code>yield</code> statement in <code>fibonacci</code>, and all you do is print it out. Once <code>fibonacci</code> runs out of numbers (<var>a</var> is no longer less than <var>max</var>, which in this case is <code>1000</code>), then the <code>for</code> loop exits gracefully.
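<p>What the <code>for</code> loop does behind the scenes can be approximated by hand. The sketch below is written for Python 3, where the built-in <code>next()</code> function takes the place of the <code>next()</code> method. <code>collect</code> is a hypothetical helper name, and the real <code>for</code> loop is implemented inside the interpreter rather than like this.

```python
def fibonacci(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

def collect(iterable):
    """Do by hand roughly what a for loop does with an iterable."""
    results = []
    iterator = iter(iterable)        # ask the object for its iterator
    while True:
        try:
            n = next(iterator)       # get the next yielded value
        except StopIteration:        # raised when the generator function returns
            break
        results.append(n)
    return results

numbers = collect(fibonacci(1000))
```

<p>The &#8220;graceful exit&#8221; is the <code>StopIteration</code> exception: when the generator function falls off the end of its <code>while</code> loop, the exception is raised and the loop stops.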
<p>OK, let's go back to the <code>plural</code> function and see how you're using this.
<div class=example><h3>Example 17.21. Generators that generate dynamic functions</h3><pre><code>
def rules(language):
    for line in file('rules.%s' % language):                                           <span>&#x2460;</span>
        pattern, search, replace = line.split()                                        <span>&#x2461;</span>
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)  <span>&#x2462;</span>

def plural(noun, language='en'):
    for applyRule in rules(language):                                                  <span>&#x2463;</span>
        result = applyRule(noun)
        if result: return result
</pre><div class=calloutlist>
<ol>
<li><code>for line in file(...)</code> is a common idiom for reading lines from a file, one line at a time. It works because <em><code>file</code> actually returns an iterator</em> whose <code>next()</code> method returns the next line of the file, which is the same protocol a generator object follows. That is so insanely cool, I wet myself just thinking about it.
<li>No magic here. Remember that the lines of the rules file have three values separated by whitespace, so <code>line.split()</code> returns a list of 3 values, and you assign those values to 3 local variables.
<li><em>And then you yield.</em> What do you yield? A function, built dynamically with <code>lambda</code>, that is actually a closure (it uses the local variables <var>pattern</var>, <var>search</var>, and <var>replace</var> as constants). In other words, <code>rules</code> is a generator that spits out rule functions.
<li>Since <code>rules</code> is a generator, you can use it directly in a <code>for</code> loop. The first time through the <code>for</code> loop, you will call the <code>rules</code> function, which will open the rules file, read the first line out of it, dynamically build a function that matches and applies
the first rule defined in the rules file, and yield the dynamically built function. The second time through the <code>for</code> loop, you will pick up where you left off in <code>rules</code> (which was in the middle of the <code>for line in file(...)</code> loop), read the second line of the rules file, dynamically build another function that matches and applies the second rule
defined in the rules file, and yield it. And so forth.
<p>What have you gained over <a href="#plural.stage5" title="17.6. plural.py, stage 5">stage 5</a>? In stage 5, you read the entire rules file and built a list of all the possible rules before you even tried the first one.
Now with generators, you can do everything lazily: you open the file, read the first rule, and create a function to try
it, but if that works you don't ever read the rest of the file or create any other functions.
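<p>To see the laziness for yourself, you could instrument the generator. The sketch below is written for Python 3 and makes two assumptions that are not in the original code: <code>rules</code> takes an already-open file-like object instead of a language name (so it can run against an in-memory <code>io.StringIO</code>), and <var>lines_read</var> is a hypothetical list that records every line the generator actually consumes.

```python
import io
import re

RULES_TEXT = "[sxz]$ $ es\n[^aeioudgkprt]h$ $ es\n[^aeiou]y$ y$ ies\n$ $ s\n"

lines_read = []   # hypothetical instrumentation: every line the generator consumes

def rules(rule_file):
    for line in rule_file:
        lines_read.append(line)
        pattern, search, replace = line.split()
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, rule_file):
    for apply_rule in rules(rule_file):
        result = apply_rule(noun)
        if result:
            return result    # stop early; the generator is never resumed

result = plural('box', io.StringIO(RULES_TEXT))
```

<p>After this call, <var>result</var> is <code>'boxes'</code> and <var>lines_read</var> holds only the first line of the rules. A word that falls through to the last rule would cause all four lines to be read.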
<div class=itemizedlist>
<h3>Further reading</h3>
<ul>
<li><a href="http://www.python.org/peps/pep-0255.html">PEP 255</a> defines generators.
<li><a href="http://www.activestate.com/ASPN/Python/Cookbook/" title="growing archive of annotated code samples">Python Cookbook</a> has <a href="http://www.google.com/search?q=generators+cookbook+site:aspn.activestate.com">many more examples of generators</a>.
</ul>
<h2 id="plural.summary">17.8. Summary</h2>
<p>This chapter covered several different advanced techniques. Not all of them are appropriate for every situation.
<p>You should now be comfortable with all of these techniques:
<div class=itemizedlist>
<ul>
<li>Performing <a href="#plural.stage1" title="17.2. plural.py, stage 1">string substitution with regular expressions</a>.
<li>Treating <a href="#plural.stage2" title="17.3. plural.py, stage 2">functions as objects</a>, storing them in lists, assigning them to variables, and calling them through those variables.
<li>Building <a href="#plural.stage3" title="17.4. plural.py, stage 3">dynamic functions with <code>lambda</code></a>.
<li>Building <a href="#plural.stage4" title="17.5. plural.py, stage 4">closures</a>, dynamic functions that contain surrounding variables as constants.
<li>Building <a href="#plural.stage6" title="17.7. plural.py, stage 6">generators</a>, resumable functions that perform incremental logic and return different values each time you call them.
</ul>
<p>Adding abstractions, building functions dynamically, building closures, and using generators can all make your code simpler,
more readable, and more flexible. But they can also end up making it more difficult to debug later. It's up to you to find
the right balance between simplicity and power.
<div class=chapter>
<h2 id="soundex">Chapter 18. Performance Tuning</h2>
<p>Performance tuning is a many-splendored thing. Just because Python is an interpreted language doesn't mean you shouldn't worry about code optimization. But don't worry about it <em>too</em> much.
<h2 id="soundex.divein">18.1. Diving in</h2>
+3 -7
View File
@@ -2,8 +2,6 @@
"Dive Into Python 3" stylesheet
vv-- begin MIT open source license --vv
Copyright (c) 2009, Mark Pilgrim, All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
@@ -27,8 +25,6 @@ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
^^-- end MIT open source license --^^
Classname Legend
@@ -56,14 +52,14 @@ Acknowledgements & Inspirations
/* typography */
body,.w a{font:medium 'Gill Sans','Gill Sans MT',Corbel,Helvetica,Jara,'Nimbus Sans L',sans-serif;line-height:1.75;word-spacing:0.1em}
pre,kbd,code,samp{font-family:Consolas,'Andale Mono',Monaco,'Liberation Mono','Bitstream Vera Sans Mono','DejaVu Sans Mono',monospace;font-size:medium;line-height:1.75;word-spacing:0}
pre,kbd,samp,code,var{font-family:Consolas,'Andale Mono',Monaco,'Liberation Mono','Bitstream Vera Sans Mono','DejaVu Sans Mono',monospace;font-size:medium;line-height:1.75;word-spacing:0}
span{font:medium 'Arial Unicode MS',FreeSerif,OpenSymbol,'DejaVu Sans',sans-serif}
pre span{font-family:'Arial Unicode MS','DejaVu Sans',FreeSerif,OpenSymbol,sans-serif}
.baa{font:oblique large Constantia,Baskerville,Palatino,'Palatino Linotype','URW Palladio L',serif}
abbr{font-variant:small-caps;text-transform:lowercase;letter-spacing:1px}
abbr{font-variant:small-caps;text-transform:lowercase;letter-spacing:0.1em}
.q{text-align:right;font-style:oblique}
.q span{font-size:large}
.note{margin-left:4.94em}
.note{margin:3.5em 4.94em}
.note span{display:block;float:left;font-size:xx-large;line-height:0.875;margin:0 0.22em 0 -1.22em}
.c,pre,.w,.w a,.d{line-height:2.154}
.f:first-letter{float:left;color:#ddd;padding:0.11em 4px 0 0;font:normal 4em/0.68 serif}
+2 -2
View File
@@ -13,8 +13,8 @@ $(document).ready(function() {
});
$("pre.code, pre.screen").each(function(i) {
this.id = "autopre" + i;
$(this).wrapInner('<div class="b"></div>');
$(this).prepend('<div class="w">[<a class="toggle" href="javascript:toggleCodeBlock(\'' + this.id + '\')">' + HS['visible'] + '</a>] [<a href="javascript:plainTextOnClick(\'' + this.id + '\')">open in new window</a>]</div>');
$(this).wrapInner('<div class=b></div>');
$(this).prepend('<div class=w>[<a class=toggle href="javascript:toggleCodeBlock(\'' + this.id + '\')">' + HS['visible'] + '</a>] [<a href="javascript:plainTextOnClick(\'' + this.id + '\')">open in new window</a>]</div>');
$(this).prev("p.d").each(function(i) {
$(this).next("pre").find("div.w").append(" " + $(this).html());
+30
View File
@@ -0,0 +1,30 @@
"""Fibonacci generator"""
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+39
View File
@@ -0,0 +1,39 @@
"""Fibonacci iterator"""
class fib:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a, self.b = 0, 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+48
View File
@@ -0,0 +1,48 @@
"""Pluralize English nouns (stage 1)
Command line usage:
$ python3 plural1.py noun
nouns
"""

import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print(plural(sys.argv[1]))
    else:
        print(__doc__)

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+73
View File
@@ -0,0 +1,73 @@
"""Pluralize English nouns (stage 2)
Command line usage:
$ python plural2.py noun
nouns
"""

import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

def match_y(noun):
    return re.search('[^aeiou]y$', noun)

def apply_y(noun):
    return re.sub('y$', 'ies', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print(plural(sys.argv[1]))
    else:
        print(__doc__)

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+60
View File
@@ -0,0 +1,60 @@
"""Pluralize English nouns (stage 3)
Command line usage:
$ python plural3.py noun
nouns
"""

import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

patterns = \
  [
    ['[sxz]$', '$', 'es'],
    ['[^aeioudgkprt]h$', '$', 'es'],
    ['(qu|[^aeiou])y$', 'y$', 'ies'],
    ['$', '$', 's']
  ]
rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print(plural(sys.argv[1]))
    else:
        print(__doc__)

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+4
View File
@@ -0,0 +1,4 @@
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s
+60
View File
@@ -0,0 +1,60 @@
"""Pluralize English nouns (stage 4)
Command line usage:
$ python plural4.py noun
nouns
"""

import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

rules = []
pattern_file = open('plural4-rules.txt')
try:
    for line in pattern_file:
        pattern, search, replace = line.split(None, 3)
        rules.append(build_match_and_apply_functions(
                pattern, search, replace))
finally:
    pattern_file.close()

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print(plural(sys.argv[1]))
    else:
        print(__doc__)

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+4
View File
@@ -0,0 +1,4 @@
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s
+55
View File
@@ -0,0 +1,55 @@
"""Pluralize English nouns (stage 5)
Command line usage:
$ python plural5.py noun
nouns
"""

import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

def rules():
    for line in open('plural5-rules.txt'):
        pattern, search, replace = line.split(None, 3)
        yield build_match_and_apply_functions(pattern, search, replace)

def plural(noun):
    for matches_rule, apply_rule in rules():
        if matches_rule(noun):
            return apply_rule(noun)

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print(plural(sys.argv[1]))
    else:
        print(__doc__)

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+19
View File
@@ -0,0 +1,19 @@
^(sheep|deer|fish|moose|aircraft|series|haiku)$ ($) \1
[ml]ouse$ ouse$ ice
child$ $ ren
booth$ $ s
foot$ oot$ eet
ooth$ ooth$ eeth
l[eo]af$ af$ aves
sis$ sis$ ses
^(hu|ro)man$ $ s
man$ man$ men
^lowlife$ $ s
ife$ ife$ ives
eau$ $ x
^[dp]elf$ $ s
lf$ lf$ lves
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
(qu|[^aeiou])y$ y$ ies
$ $ s
+80
View File
@@ -0,0 +1,80 @@
"""Pluralize English nouns (stage 6)
Command line usage:
$ python plural6.py noun
nouns
"""

import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

class LazyRules:
    def __init__(self):
        self.pattern_file = open('plural6-rules.txt')
        self.cache = []

    def __iter__(self):
        self.cache_index = 0
        return self

    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
            return self.cache[self.cache_index - 1]

        if self.pattern_file.closed:
            raise StopIteration

        line = self.pattern_file.readline()
        if not line:
            self.pattern_file.close()
            raise StopIteration

        pattern, search, replace = line.split(None, 3)
        funcs = build_match_and_apply_functions(
            pattern, search, replace)
        self.cache.append(funcs)
        return funcs

rules = LazyRules()

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print(plural(sys.argv[1]))
    else:
        print(__doc__)

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+81
View File
@@ -0,0 +1,81 @@
"""Unit test for plural1.py"""
import plural1
import unittest

class KnownValues(unittest.TestCase):
    def test_sxz(self):
        "words ending in S, X, and Z"
        nouns = {
            'bass': 'basses',
            'bus': 'buses',
            'walrus': 'walruses',
            'box': 'boxes',
            'fax': 'faxes',
            'suffix': 'suffixes',
            'mailbox': 'mailboxes',
            'buzz': 'buzzes',
            'waltz': 'waltzes'
            }
        for singular, plural in nouns.items():
            self.assertEqual(plural1.plural(singular), plural)

    def test_h(self):
        "words ending in H"
        nouns = {
            'coach': 'coaches',
            'glitch': 'glitches',
            'rash': 'rashes',
            'watch': 'watches',
            'cheetah': 'cheetahs',
            'cough': 'coughs'
            }
        for singular, plural in nouns.items():
            self.assertEqual(plural1.plural(singular), plural)

    def test_y(self):
        "words ending in Y"
        nouns = {
            'utility': 'utilities',
            'vacancy': 'vacancies',
            'boy': 'boys',
            'day': 'days'
            }
        for singular, plural in nouns.items():
            self.assertEqual(plural1.plural(singular), plural)

    def test_default(self):
        "unexceptional words"
        nouns = {
            'papaya': 'papayas',
            'whip': 'whips',
            'palimpsest': 'palimpsests'
            }
        for singular, plural in nouns.items():
            self.assertEqual(plural1.plural(singular), plural)

if __name__ == "__main__":
    unittest.main()

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+73
View File
@@ -0,0 +1,73 @@
"""Unit test for plural2.py

This program is part of "Dive Into Python", a free Python book for
experienced programmers. Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.2 $"
__date__ = "$Date: 2004/03/17 14:34:40 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

from plural2 import plural
import unittest

class KnownValues(unittest.TestCase):
    nouns = {'bass': 'basses',
             'bus': 'buses',
             'walrus': 'walruses',
             'box': 'boxes',
             'fax': 'faxes',
             'suffix': 'suffixes',
             'mailbox': 'mailboxes',
             'buzz': 'buzzes',
             'waltz': 'waltzes',
             'coach': 'coaches',
             'glitch': 'glitches',
             'rash': 'rashes',
             'watch': 'watches',
             'cheetah': 'cheetahs',
             'cough': 'coughs',
             'utility': 'utilities',
             'vacancy': 'vacancies',
             'boy': 'boys',
             'day': 'days',
             'computer': 'computers',
             'rock': 'rocks',
             'paper': 'papers',
             }

# Attach one test method per noun. The Python 2 "new" module no longer exists
# in Python 3; a plain function assigned to a class attribute already behaves
# as a method, so setattr() is enough.
for noun, pluralnoun in KnownValues.nouns.items():
    func = lambda self, noun=noun, pluralnoun=pluralnoun: \
        self.assertEqual(plural(noun), pluralnoun)
    func.__doc__ = "%s --> %s" % (noun, pluralnoun)
    setattr(KnownValues, "test_%s" % noun, func)

if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+81
View File
@@ -0,0 +1,81 @@
"""Unit test for plural3.py"""

import plural3
import unittest

class KnownValues(unittest.TestCase):
    def test_sxz(self):
        "words ending in S, X, and Z"
        nouns = {
            'bass': 'basses',
            'bus': 'buses',
            'walrus': 'walruses',
            'box': 'boxes',
            'fax': 'faxes',
            'suffix': 'suffixes',
            'mailbox': 'mailboxes',
            'buzz': 'buzzes',
            'waltz': 'waltzes'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural3.plural(singular), plural)

    def test_h(self):
        "words ending in H"
        nouns = {
            'coach': 'coaches',
            'glitch': 'glitches',
            'rash': 'rashes',
            'watch': 'watches',
            'cheetah': 'cheetahs',
            'cough': 'coughs'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural3.plural(singular), plural)

    def test_y(self):
        "words ending in Y"
        nouns = {
            'utility': 'utilities',
            'vacancy': 'vacancies',
            'boy': 'boys',
            'day': 'days'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural3.plural(singular), plural)

    def test_default(self):
        "unexceptional words"
        nouns = {
            'papaya': 'papayas',
            'whip': 'whips',
            'palimpsest': 'palimpsests'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural3.plural(singular), plural)

if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+81
View File
@@ -0,0 +1,81 @@
"""Unit test for plural4.py"""

import plural4
import unittest

class KnownValues(unittest.TestCase):
    def test_sxz(self):
        "words ending in S, X, and Z"
        nouns = {
            'bass': 'basses',
            'bus': 'buses',
            'walrus': 'walruses',
            'box': 'boxes',
            'fax': 'faxes',
            'suffix': 'suffixes',
            'mailbox': 'mailboxes',
            'buzz': 'buzzes',
            'waltz': 'waltzes'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural4.plural(singular), plural)

    def test_h(self):
        "words ending in H"
        nouns = {
            'coach': 'coaches',
            'glitch': 'glitches',
            'rash': 'rashes',
            'watch': 'watches',
            'cheetah': 'cheetahs',
            'cough': 'coughs'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural4.plural(singular), plural)

    def test_y(self):
        "words ending in Y"
        nouns = {
            'utility': 'utilities',
            'vacancy': 'vacancies',
            'boy': 'boys',
            'day': 'days'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural4.plural(singular), plural)

    def test_default(self):
        "unexceptional words"
        nouns = {
            'papaya': 'papayas',
            'whip': 'whips',
            'palimpsest': 'palimpsests'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural4.plural(singular), plural)

if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+84
View File
@@ -0,0 +1,84 @@
"""Unit test for plural5.py"""

import plural5
import unittest

class KnownValues(unittest.TestCase):
    def test_sxz(self):
        "words ending in S, X, and Z"
        nouns = {
            'bass': 'basses',
            'bus': 'buses',
            'walrus': 'walruses',
            'box': 'boxes',
            'fax': 'faxes',
            'suffix': 'suffixes',
            'mailbox': 'mailboxes',
            'buzz': 'buzzes',
            'waltz': 'waltzes'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural5.plural(singular), plural)

    def test_h(self):
        "words ending in H"
        nouns = {
            'coach': 'coaches',
            'glitch': 'glitches',
            'rash': 'rashes',
            'watch': 'watches',
            'cheetah': 'cheetahs',
            'cough': 'coughs'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural5.plural(singular), plural)

    def test_y(self):
        "words ending in Y"
        nouns = {
            'utility': 'utilities',
            'vacancy': 'vacancies',
            'boy': 'boys',
            'day': 'days'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural5.plural(singular), plural)

    def test_default(self):
        "unexceptional words"
        nouns = {
            'papaya': 'papayas',
            'whip': 'whips',
            'palimpsest': 'palimpsests'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural5.plural(singular), plural)

if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+189
View File
@@ -0,0 +1,189 @@
"""Unit test for plural6.py"""

import plural6
import unittest

class KnownValues(unittest.TestCase):
    def test_sxz(self):
        "words ending in S, X, and Z"
        nouns = {
            'bass': 'basses',
            'bus': 'buses',
            'walrus': 'walruses',
            'box': 'boxes',
            'fax': 'faxes',
            'suffix': 'suffixes',
            'mailbox': 'mailboxes',
            'buzz': 'buzzes',
            'waltz': 'waltzes'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_h(self):
        "words ending in H"
        nouns = {
            'coach': 'coaches',
            'glitch': 'glitches',
            'rash': 'rashes',
            'watch': 'watches',
            'cheetah': 'cheetahs',
            'cough': 'coughs'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_y(self):
        "words ending in Y"
        nouns = {
            'utility': 'utilities',
            'vacancy': 'vacancies',
            'boy': 'boys',
            'day': 'days'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_ouse(self):
        "words ending in OUSE"
        nouns = {
            'mouse': 'mice',
            'louse': 'lice'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_child(self):
        "special case: child"
        nouns = {
            'child': 'children'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_oot(self):
        "special case: foot"
        nouns = {
            'foot': 'feet'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_ooth(self):
        "words ending in OOTH"
        nouns = {
            'booth': 'booths',
            'tooth': 'teeth'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_f_ves(self):
        "words ending in F that become VES"
        nouns = {
            'leaf': 'leaves',
            'loaf': 'loaves'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_sis(self):
        "words ending in SIS"
        nouns = {
            'thesis': 'theses'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_man(self):
        "words ending in MAN"
        nouns = {
            'man': 'men',
            'mailman': 'mailmen',
            'human': 'humans',
            'roman': 'romans'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_ife(self):
        "words ending in IFE"
        nouns = {
            'knife': 'knives',
            'wife': 'wives',
            'lowlife': 'lowlifes'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_eau(self):
        "words ending in EAU"
        nouns = {
            'tableau': 'tableaux'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_elf(self):
        "words ending in ELF"
        nouns = {
            'elf': 'elves',
            'shelf': 'shelves',
            'delf': 'delfs',
            'pelf': 'pelfs'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_same(self):
        "words that are their own plural"
        nouns = {
            'sheep': 'sheep',
            'deer': 'deer',
            'fish': 'fish',
            'moose': 'moose',
            'aircraft': 'aircraft',
            'series': 'series',
            'haiku': 'haiku'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

    def test_default(self):
        "unexceptional words"
        nouns = {
            'papaya': 'papayas',
            'whip': 'whips',
            'palimpsest': 'palimpsests'
        }
        for singular, plural in nouns.items():
            self.assertEqual(plural6.plural(singular), plural)

if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+52
View File
@@ -0,0 +1,52 @@
"""Convert to and from Roman numerals

This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""

roman_numeral_map = (('M', 1000),
                     ('CM', 900),
                     ('D', 500),
                     ('CD', 400),
                     ('C', 100),
                     ('XC', 90),
                     ('L', 50),
                     ('XL', 40),
                     ('X', 10),
                     ('IX', 9),
                     ('V', 5),
                     ('IV', 4),
                     ('I', 1))

def to_roman(n):
    """convert integer to Roman numeral"""
    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
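The `to_roman()` function in the file above works greedily: it walks `roman_numeral_map` from the largest value to the smallest and subtracts each value as often as it fits. The sketch below traces those subtractions for one input. `trace_to_roman` and `ROMAN_NUMERAL_MAP` are hypothetical names introduced only for this illustration and are not part of the commit; the table is copied from the file above.

```python
# Illustrative trace of the greedy conversion used by to_roman().
# trace_to_roman is a hypothetical helper, not part of the commit.
ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))


def trace_to_roman(n):
    """Return (numeral, steps); steps records every subtraction made."""
    result = ""
    steps = []
    for numeral, integer in ROMAN_NUMERAL_MAP:
        # Take the current symbol as many times as it still fits.
        while n >= integer:
            result += numeral
            n -= integer
            steps.append((numeral, integer, n))
    return result, steps


if __name__ == "__main__":
    numeral, steps = trace_to_roman(1424)
    for symbol, value, remaining in steps:
        print("subtract {0:>4} ({1:<2}) -> {2} left".format(value, symbol, remaining))
    print(numeral)
```

Because the two-character entries such as `CM` and `IV` sit in the table directly before the smaller symbol they are built from, the greedy loop produces the subtractive forms without any special-case code.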
+57
View File
@@ -0,0 +1,57 @@
"""Convert to and from Roman numerals

This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""

class OutOfRangeError(ValueError):
    pass

roman_numeral_map = (('M', 1000),
                     ('CM', 900),
                     ('D', 500),
                     ('CD', 400),
                     ('C', 100),
                     ('XC', 90),
                     ('L', 50),
                     ('XL', 40),
                     ('X', 10),
                     ('IX', 9),
                     ('V', 5),
                     ('IV', 4),
                     ('I', 1))

def to_roman(n):
    """convert integer to Roman numeral"""
    if n > 3999:
        raise OutOfRangeError("number out of range (must be less than 4000)")
    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+56
View File
@@ -0,0 +1,56 @@
"""Convert to and from Roman numerals

This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""

class OutOfRangeError(ValueError): pass

roman_numeral_map = (('M', 1000),
                     ('CM', 900),
                     ('D', 500),
                     ('CD', 400),
                     ('C', 100),
                     ('XC', 90),
                     ('L', 50),
                     ('XL', 40),
                     ('X', 10),
                     ('IX', 9),
                     ('V', 5),
                     ('IV', 4),
                     ('I', 1))

def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):
        raise OutOfRangeError("number out of range (must be 1..3999)")
    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+59
View File
@@ -0,0 +1,59 @@
"""Convert to and from Roman numerals

This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""

class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass

roman_numeral_map = (('M', 1000),
                     ('CM', 900),
                     ('D', 500),
                     ('CD', 400),
                     ('C', 100),
                     ('XC', 90),
                     ('L', 50),
                     ('XL', 40),
                     ('X', 10),
                     ('IX', 9),
                     ('V', 5),
                     ('IV', 4),
                     ('I', 1))

def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):
        raise OutOfRangeError("number out of range (must be 1..3999)")
    if not isinstance(n, int):
        raise NotIntegerError("non-integers can not be converted")
    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
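The two input checks in the `to_roman()` above can be exercised with `unittest.TestCase.assertRaises`. The following is a minimal, self-contained sketch rather than the commit's own test file: it repeats the exception classes and the function so that it runs on its own, and the test class name `ToRomanBadInputSketch` and the chosen sample values are assumptions made for illustration.

```python
import unittest


class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass


roman_numeral_map = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))


def to_roman(n):
    """convert integer to Roman numeral (copy of the function above)"""
    if not (0 < n < 4000):
        raise OutOfRangeError("number out of range (must be 1..3999)")
    if not isinstance(n, int):
        raise NotIntegerError("non-integers can not be converted")
    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result


class ToRomanBadInputSketch(unittest.TestCase):
    """Hypothetical test case; the name is not taken from the commit."""

    def test_too_large(self):
        "to_roman should fail with large input"
        self.assertRaises(OutOfRangeError, to_roman, 4000)

    def test_zero(self):
        "to_roman should fail with 0 input"
        self.assertRaises(OutOfRangeError, to_roman, 0)

    def test_negative(self):
        "to_roman should fail with negative input"
        self.assertRaises(OutOfRangeError, to_roman, -1)

    def test_non_integer(self):
        "to_roman should fail with non-integer input inside the range"
        self.assertRaises(NotIntegerError, to_roman, 0.5)
```

Saved to a file, the sketch can be run with `python -m unittest` followed by the file name. Since the range check runs first, a non-integer outside 1..3999 (for example `4000.5`) is reported as `OutOfRangeError`, not `NotIntegerError`.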
+23
View File
@@ -44,3 +44,26 @@ def from_roman(s):
result += integer
index += len(numeral)
return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -62,3 +62,26 @@ def from_roman(s):
result += integer
index += len(numeral)
return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -64,3 +64,26 @@ def from_roman(s):
result += integer
index += len(numeral)
return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -66,3 +66,26 @@ def from_roman(s):
result += integer
index += len(numeral)
return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -74,3 +74,26 @@ class KnownValues(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -79,3 +79,26 @@ class ToRomanBadInput(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -87,3 +87,26 @@ class ToRomanBadInput(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -91,3 +91,26 @@ class ToRomanBadInput(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -105,3 +105,26 @@ class SanityCheck(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -122,3 +122,26 @@ class SanityCheck(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -126,3 +126,26 @@ class SanityCheck(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+23
View File
@@ -130,3 +130,26 @@ class SanityCheck(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+7 -7
View File
@@ -23,27 +23,27 @@ h1:before{content:""}
<ol start=0>
<li class=todo>Installing Python
<li><a href=your-first-python-program.html>Your first Python program</a>
<li><a href=native-datatypes.html>Native datatypes</a>
<li><a href=your-first-python-program.html>Your First Python Program</a>
<li><a href=native-datatypes.html>Native Datatypes</a>
<li><a href=strings.html>Strings</a>
<li><a href=regular-expressions.html>Regular expressions</a>
<li><a href=regular-expressions.html>Regular Expressions</a>
<li class=todo>The power of introspection
<li class=todo>Objects and object-orientation
<li><a href=unit-testing.html>Unit testing</a>
<li><a href=unit-testing.html>Unit Testing</a>
<li class=todo>Test-first programming
<li class=todo>Refactoring your code
<li class=todo>Files
<li><a href=iterators-and-generators.html>Iterators <i class=baa>&amp;</i> Generators</a>
<li class=todo>HTML processing
<li class=todo>XML processing
<li class=todo>Web services
<li class=todo>Dynamic functions
<li class=todo>Metaclasses
<li class=todo>Performance tuning
<li class=todo>Packaging Python libraries
<li class=todo>Creating graphics with the Python Imaging Library
<li class=todo>Where to go from here
<li><a href=case-study-porting-chardet-to-python-3.html>Case study: porting <code>chardet</code> to Python 3</a>
<li id=a><a href=porting-code-to-python-3-with-2to3.html>Porting code to Python 3 with <code>2to3</code></a>
<li><a href=case-study-porting-chardet-to-python-3.html>Case Study: Porting <code>chardet</code> to Python 3</a>
<li id=a><a href=porting-code-to-python-3-with-2to3.html>Porting Code to Python 3 with <code>2to3</code></a>
</ol>
<p>There is a <a href=http://hg.diveintopython3.org/>changelog</a>, a <a type=application/atom+xml href=http://hg.diveintopython3.org/atom-log>feed</a>, and <a href="http://www.reddit.com/search?q=%22Dive+Into+Python+3%22&amp;sort=new">discussion on Reddit</a>. During development, you can download the book by cloning the Mercurial repository:
+569
View File
@@ -0,0 +1,569 @@
<!DOCTYPE html>
<head>
<meta charset=utf-8>
<title>Iterators &amp; generators - Dive into Python 3</title>
<link rel=stylesheet type=text/css href=dip3.css>
<style>
body{counter-reset:h1 11}
</style>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#iterators-and-generators>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Iterators <i class=baa>&amp;</i> Generators</h1>
<blockquote class=q>
<p><span>&#x275D;</span> East is East, and West is West, and never the twain shall meet. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Rudyard_Kipling>Rudyard Kipling</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
<p class=f>Let&#8217;s talk about plural nouns. Also, functions that return other functions, advanced regular expressions, iterators, and generators. But first, let&#8217;s talk about how to make plural nouns. (If you haven&#8217;t read <a href=regular-expressions.html>the chapter on regular expressions</a>, now would be a good time. This chapter assumes you understand the basics of regular expressions, and quickly descends into more advanced uses.)
<p>English is a schizophrenic language that borrows from a lot of other languages, and the rules for making singular nouns into plural nouns are varied and complex. There are rules, and then there are exceptions to those rules, and then there are exceptions to the exceptions.
<p>If you grew up in an English-speaking country or learned English in a formal school setting, you&#8217;re probably familiar with the basic rules:
<ul>
<li>If a word ends in S, X, or Z, add ES. <i>Bass</i> becomes <i>basses</i>, <i>fax</i> becomes <i>faxes</i>, and <i>waltz</i> becomes <i>waltzes</i>.
<li>If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What&#8217;s a noisy H? One that gets combined with other letters to make a sound that you can hear. So <i>coach</i> becomes <i>coaches</i> and <i>rash</i> becomes <i>rashes</i>, because you can hear the CH and SH sounds when you say them. But <i>cheetah</i> becomes <i>cheetahs</i>, because the H is silent.
<li>If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S. So <i>vacancy</i> becomes <i>vacancies</i>, but <i>day</i> becomes <i>days</i>.
<li>If all else fails, just add S and hope for the best.
</ul>
<p>(I know, there are a lot of exceptions. <i>Man</i> becomes <i>men</i> and <i>woman</i> becomes <i>women</i>, but <i>human</i> becomes <i>humans</i>. <i>Mouse</i> becomes <i>mice</i> and <i>louse</i> becomes <i>lice</i>, but <i>house</i> becomes <i>houses</i>. <i>Knife</i> becomes <i>knives</i> and <i>wife</i> becomes <i>wives</i>, but <i>lowlife</i> becomes <i>lowlifes</i>. And don&#8217;t even get me started on words that are their own plural, like <i>sheep</i>, <i>deer</i>, and <i>haiku</i>.)
<p>Other languages, of course, are completely different.
<p>Let&#8217;s design a Python library that automatically pluralizes English nouns. We&#8217;ll start with just these four rules, but keep in mind that you&#8217;ll inevitably need to add more.
<h2 id=i-know>I know, let&#8217;s use regular expressions!</h2>
<p>So you&#8217;re looking at words, which, at least in English, means you&#8217;re looking at strings of characters. You have rules that say you need to find different combinations of characters, then do different things to them. This sounds like a job for regular expressions!
<p class=d>[<a href=examples/plural1.py>download <code>plural1.py</code></a>]
<pre><code>import re

def plural(noun):
<a>    if re.search('[sxz]$', noun):             <span>&#x2460;</span></a>
<a>        return re.sub('$', 'es', noun)        <span>&#x2461;</span></a>
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'</code></pre>
<ol>
<li>This is a regular expression, but it uses a syntax you didn&#8217;t see in <a href=regular-expressions.html><i>Regular Expressions</i></a>. The square brackets mean &#8220;match exactly one of these characters.&#8221; So <code>[sxz]</code> means &#8220;<code>s</code>, or <code>x</code>, or <code>z</code>&#8221;, but only one of them. The <code>$</code> should be familiar; it matches the end of the string. Combined, this regular expression tests whether <var>noun</var> ends with <code>s</code>, <code>x</code>, or <code>z</code>.
<li>This <code>re.sub</code> function performs regular expression-based string substitutions.
</ol>
<p>Let&#8217;s look at regular expression substitutions in more detail.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>re.search('[abc]', 'Mark')</kbd> <span>&#x2460;</span></a>
&lt;_sre.SRE_Match object at 0x001C1FA8>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'Mark')</kbd> <span>&#x2461;</span></a>
<samp>'Mork'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'rock')</kbd> <span>&#x2462;</span></a>
<samp>'rook'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('[abc]', 'o', 'caps')</kbd> <span>&#x2463;</span></a>
<samp>'oops'</samp></pre>
<ol>
<li>Does the string <code>Mark</code> contain <code>a</code>, <code>b</code>, or <code>c</code>? Yes, it contains <code>a</code>.
<li>OK, now find <code>a</code>, <code>b</code>, or <code>c</code>, and replace it with <code>o</code>. <code>Mark</code> becomes <code>Mork</code>.
<li>The same function turns <code>rock</code> into <code>rook</code>.
<li>You might think this would turn <code>caps</code> into <code>oaps</code>, but it doesn&#8217;t. <code>re.sub</code> replaces <em>all</em> of the matches, not just the first one. So this regular expression turns <code>caps</code> into <code>oops</code>, because both the <code>c</code> and the <code>a</code> get turned into <code>o</code>.
</ol>
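<p>If you ever do want only the first match replaced, <code>re.sub()</code> accepts an optional <code>count</code> argument. The following is a small sketch of that behavior; the variable names <var>all_replaced</var> and <var>first_only</var> are just illustrative and are not part of the chapter&#8217;s scripts.

```python
import re

# By default, re.sub() replaces every match it finds.
all_replaced = re.sub('[abc]', 'o', 'caps')

# The optional count argument limits how many matches get replaced.
first_only = re.sub('[abc]', 'o', 'caps', count=1)

print(all_replaced)  # both the 'c' and the 'a' were replaced
print(first_only)    # only the leading 'c' was replaced
```

<p>The <code>plural()</code> function never needs <code>count</code>, because each of its patterns is anchored with <code>$</code> and so matches at most once per noun.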
<p>And now, back to the <code>plural()</code> function&hellip;
<pre><code>def plural(noun):
    if re.search('[sxz]$', noun):
<a>        return re.sub('$', 'es', noun)         <span>&#x2460;</span></a>
<a>    elif re.search('[^aeioudgkprt]h$', noun):  <span>&#x2461;</span></a>
        return re.sub('$', 'es', noun)
<a>    elif re.search('[^aeiou]y$', noun):        <span>&#x2462;</span></a>
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'</code></pre>
<ol>
<li>Here, you&#8217;re replacing the end of the string (matched by <code>$</code>) with the string <code>es</code>. In other words, adding <code>es</code> to the string. You could accomplish the same thing with string concatenation, for example <code>noun + 'es'</code>, but I chose to use regular expressions for each rule, for reasons that will become clear later in the chapter.
<li>Look closely, this is another new variation. The <code>^</code> as the first character inside the square brackets means something special: negation. <code>[^abc]</code> means &#8220;any single character <em>except</em> <code>a</code>, <code>b</code>, or <code>c</code>&#8221;. So <code>[^aeioudgkprt]</code> means any character except <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, <code>u</code>, <code>d</code>, <code>g</code>, <code>k</code>, <code>p</code>, <code>r</code>, or <code>t</code>. Then that character needs to be followed by <code>h</code>, followed by end of string. You&#8217;re looking for words that end in H where the H can be heard.
<li>Same pattern here: match words that end in Y, where the character before the Y is <em>not</em> <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>. You&#8217;re looking for words that end in Y that sounds like I.
</ol>
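<p>To see the four rules working together, here is a quick sketch that runs the <code>plural()</code> function over the sample words from the start of the chapter. The function body is the one shown above; the <var>samples</var> list and the <var>results</var> dictionary are only there for the demonstration.

```python
import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'

# Sample nouns taken from the list of rules earlier in the chapter.
samples = ['bass', 'fax', 'waltz', 'coach', 'rash', 'cheetah', 'vacancy', 'day']
results = {noun: plural(noun) for noun in samples}

for noun, plural_form in results.items():
    print(noun, '->', plural_form)
```

<p>Note that <i>cheetah</i> and <i>day</i> fall all the way through to the final <code>else</code> clause, which is exactly what the &#8220;silent H&#8221; and &#8220;Y combined with a vowel&#8221; rules call for.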
<p>Let&#8217;s look at negation regular expressions in more detail.
<pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'vacancy')</kbd> <span>&#x2460;</span></a>
&lt;_sre.SRE_Match object at 0x001C1FA8>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'boy')</kbd> <span>&#x2461;</span></a>
<samp class=p>>>> </samp>
<samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'day')</kbd>
<samp class=p>>>> </samp>
<a><samp class=p>>>> </samp><kbd>re.search('[^aeiou]y$', 'pita')</kbd> <span>&#x2462;</span></a>
<samp class=p>>>> </samp></pre>
<ol>
<li><code>vacancy</code> matches this regular expression, because it ends in <code>cy</code>, and <code>c</code> is not <code>a</code>, <code>e</code>, <code>i</code>, <code>o</code>, or <code>u</code>.
<li><code>boy</code> does not match, because it ends in <code>oy</code>, and you specifically said that the character before the <code>y</code> could not be <code>o</code>. <code>day</code> does not match, because it ends in <code>ay</code>.
<li><code>pita</code> does not match, because it does not end in <code>y</code>.
</ol>
<pre class=screen>
<a><samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'vacancy')</kbd> <span>&#x2460;</span></a>
<samp>'vacancies'</samp>
<samp class=p>>>> </samp><kbd>re.sub('y$', 'ies', 'agency')</kbd>
<samp>'agencies'</samp>
<a><samp class=p>>>> </samp><kbd>re.sub('([^aeiou])y$', r'\1ies', 'vacancy')</kbd> <span>&#x2461;</span></a>
<samp>'vacancies'</samp></pre>
<ol>
<li>This regular expression turns <code>vacancy</code> into <code>vacancies</code> and <code>agency</code> into <code>agencies</code>, which is what you wanted. Note that it would also turn <code>boy</code> into <code>boies</code>, but that will never happen in the function because you did that <code>re.search</code> first to find out whether you should do this <code>re.sub</code>.
<li>Just in passing, I want to point out that it is possible to combine these two regular expressions (one to find out if the rule applies, and another to actually apply it) into a single regular expression. Here&#8217;s what that would look like. Most of it should look familiar: you&#8217;re using a remembered group, which you learned in <a href=regular-expressions.html#phonenumbers>Case study: Parsing Phone Numbers</a>. The group is used to remember the character before the letter <code>y</code>. Then in the substitution string, you use a new syntax, <code>\1</code>, which means &#8220;hey, that first group you remembered? put it right here.&#8221; In this case, you remember the <code>c</code> before the <code>y</code>; when you do the substitution, you substitute <code>c</code> in place of <code>c</code>, and <code>ies</code> in place of <code>y</code>. (If you have more than one remembered group, you can use <code>\2</code> and <code>\3</code> and so on.)
</ol>
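<p>One way to use the combined form on its own is <code>re.subn()</code>, which returns both the new string and the number of substitutions it made, so you can tell whether the rule applied at all. This is only a sketch: <code>pluralize_y()</code> is a hypothetical helper, not something the chapter&#8217;s scripts define.

```python
import re

def pluralize_y(noun):
    """Hypothetical helper: apply the 'consonant + y' rule in one step.

    Returns the plural if the rule applies, or None if it does not.
    """
    # re.subn() returns a (new_string, number_of_substitutions) tuple.
    new_noun, substitutions = re.subn('([^aeiou])y$', r'\1ies', noun)
    return new_noun if substitutions else None

print(pluralize_y('vacancy'))  # the rule applies
print(pluralize_y('boy'))      # the rule does not apply, so you get None
```

<p>Because the pattern itself requires a non-vowel before the <code>y</code>, there is no danger of turning <code>boy</code> into <code>boies</code>, even without a separate <code>re.search()</code> call.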
<p>Regular expression substitutions are extremely powerful, and the <code>\1</code> syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder to read, and it doesn&#8217;t directly map to the way you first described the pluralizing rules. You originally laid out rules like &#8220;if the word ends in S, X, or Z, then add ES&#8221;. If you look at this function, you have two lines of code that say &#8220;if the word ends in S, X, or Z, then add ES&#8221;. It doesn&#8217;t get much more direct than that.
<h2 id=a-list-of-functions>A list of functions</h2>
<p>Now you&#8217;re going to add a level of abstraction. You started by defining a list of rules: if this, do that, otherwise go to the next rule. Let&#8217;s temporarily complicate part of the program so you can simplify another part.
<p class=d>[<a href=examples/plural2.py>download <code>plural2.py</code></a>]
<pre><code>import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

<a>def match_y(noun):                             <span>&#x2460;</span></a>
    return re.search('[^aeiou]y$', noun)

<a>def apply_y(noun):                             <span>&#x2461;</span></a>
    return re.sub('y$', 'ies', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

<a>rules = ((match_sxz, apply_sxz),               <span>&#x2462;</span></a>
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )

def plural(noun):
<a>    for matches_rule, apply_rule in rules:     <span>&#x2463;</span></a>
        if matches_rule(noun):
            return apply_rule(noun)</code></pre>
<ol>
<li>Now, each match rule is its own function, which returns the result of calling the <code>re.search()</code> function.
<li>Each apply rule is also its own function, which calls the <code>re.sub()</code> function to apply the appropriate pluralization rule.
<li>Instead of having one function (<code>plural()</code>) with multiple rules, you have the <code>rules</code> data structure, which is a sequence of pairs of functions.
<li>Since the rules have been broken out into a separate data structure, the new <code>plural()</code> function can be reduced to a few lines of code. Using a <code>for</code> loop, you can pull out the match and apply rules two at a time (one match, one apply) from the <var>rules</var> structure. On the first iteration of the <code>for</code> loop, <var>matches_rule</var> will get <code>match_sxz</code>, and <var>apply_rule</var> will get <code>apply_sxz</code>. On the second iteration (assuming you get that far), <var>matches_rule</var> will be assigned <code>match_h</code>, and <var>apply_rule</var> will be assigned <code>apply_h</code>. The function is guaranteed to return something eventually, because the final match rule (<code>match_default</code>) simply returns <code>True</code>, meaning the corresponding apply rule (<code>apply_default</code>) will always be applied.
</ol>
<p>The reason this technique works is that <a href=your-first-python-program.html#everythingisanobject>everything in Python is an object</a>, including functions. The <var>rules</var> data structure contains functions &mdash; not names of functions, but actual function objects. When they get assigned in the <code>for</code> loop, then <var>matches_rule</var> and <var>apply_rule</var> are actual functions that you can call. On the first iteration of the <code>for</code> loop, this is equivalent to calling <code>match_sxz(noun)</code>, and if it returns a match, calling <code>apply_sxz(noun)</code>.
<p>If this additional level of abstraction is confusing, try unrolling the function to see the equivalence. The entire <code>for</code> loop is equivalent to the following:
<pre><code>
def plural(noun):
    if match_sxz(noun):
        return apply_sxz(noun)
    if match_h(noun):
        return apply_h(noun)
    if match_y(noun):
        return apply_y(noun)
    if match_default(noun):
        return apply_default(noun)</code></pre>
<p>The benefit here is that the <code>plural()</code> function is now simplified. It takes a list of rules, defined elsewhere, and iterates through them in a generic fashion.
<ol>
<li>Get a match rule
<li>Does it match? Then call the apply rule and return the result.
<li>No match? Go to step 1.
</ol>
<p>The rules could be defined anywhere, in any way. The <code>plural()</code> function doesn&#8217;t care.
<p>Now, was adding this level of abstraction worth it? Well, not yet. Let&#8217;s consider what it would take to add a new rule to the function. In the first example, it would require adding an <code>if</code> statement to the <code>plural</code> function. In this second example, it would require adding two functions, <code>match_foo()</code> and <code>apply_foo()</code>, and then updating the <var>rules</var> list to specify where in the order the new match and apply functions should be called relative to the other rules.
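<p>To make that concrete, here is a sketch of what adding one more rule might look like with the list-of-functions design. The <code>match_fe()</code> and <code>apply_fe()</code> functions are hypothetical (they are not in <code>plural2.py</code>), and the rule itself is deliberately naive: it handles <i>knife</i> and <i>wife</i>, but it gets <i>lowlife</i> wrong.

```python
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

def match_y(noun):
    return re.search('[^aeiou]y$', noun)

def apply_y(noun):
    return re.sub('y$', 'ies', noun)

# Hypothetical new rule, added for illustration: a final "fe" becomes "ves".
def match_fe(noun):
    return re.search('fe$', noun)

def apply_fe(noun):
    return re.sub('fe$', 'ves', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

# The new pair has to go before the default rule, which always matches.
rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_fe, apply_fe),
         (match_default, apply_default))

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('knife'), plural('wife'), plural('fax'), plural('lowlife'))
```

<p>Notice that the <code>plural()</code> function itself did not change; only the <var>rules</var> structure and the two new functions did.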
<p>But this is really just a stepping stone to the next section. Let&#8217;s move on&hellip;
<h2 id=a-list-of-patterns>A list of patterns</h2>
<p>Defining separate named functions for each match and apply rule isn&#8217;t really necessary. You never call them directly; you add them to the <var>rules</var> list and call them through there. Furthermore, each function follows one of two patterns. All the match functions call <code>re.search()</code>, and all the apply functions call <code>re.sub()</code>. Let&#8217;s factor out the patterns so that defining new rules can be easier.
<p class=d>[<a href=examples/plural3.py>download <code>plural3.py</code></a>]
<pre><code>import re

def build_match_and_apply_functions(pattern, search, replace):
<a>    def matches_rule(word):                                     <span>&#x2460;</span></a>
        return re.search(pattern, word)
<a>    def apply_rule(word):                                       <span>&#x2461;</span></a>
        return re.sub(search, replace, word)
<a>    return (matches_rule, apply_rule)                           <span>&#x2462;</span></a></code></pre>
<ol>
<li><code>build_match_and_apply_functions</code> is a function that builds other functions dynamically. It takes <var>pattern</var>, <var>search</var> and <var>replace</var>, then defines a <code>matches_rule()</code> function which calls <code>re.search()</code> with the <var>pattern</var> that was passed to the <code>build_match_and_apply_functions()</code> function, and the <var>word</var> that was passed to the <code>matches_rule()</code> function you&#8217;re building. Whoa.
<li>Building the apply function works the same way. The apply function is a function that takes one parameter, and calls <code>re.sub()</code> with the <var>search</var> and <var>replace</var> parameters that were passed to the <code>build_match_and_apply_functions</code> function, and the <var>word</var> that was passed to the <code>apply_rule()</code> function you&#8217;re building. This technique of using the values of outside parameters within a dynamic function is called <em>closures</em>. You&#8217;re essentially defining constants within the apply function you&#8217;re building: it takes one parameter (<var>word</var>), but it then acts on that plus two other values (<var>search</var> and <var>replace</var>) which were set when you defined the apply function.
<li>Finally, the <code>build_match_and_apply_functions()</code> function returns a tuple of two values: the two functions you just created. The constants you defined within those functions (<var>pattern</var> within the <code>matches_rule()</code> function, and <var>search</var> and <var>replace</var> within the <code>apply_rule()</code> function) stay with those functions, even after you return from <code>build_match_and_apply_functions()</code>. That&#8217;s insanely cool.
</ol>
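<p>Here is a small sketch of those closures in action. It calls <code>build_match_and_apply_functions()</code> twice with different strings and shows that each returned function remembers its own values. The names <var>matches_sxz</var>, <var>apply_sxz</var>, <var>matches_y</var>, <var>apply_y</var>, and <var>captured</var> are just for this demonstration; peeking at <code>__closure__</code> is a CPython-flavored way to look inside, not something you would normally do in real code.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

# Build two independent pairs of functions from two sets of strings.
matches_sxz, apply_sxz = build_match_and_apply_functions('[sxz]$', '$', 'es')
matches_y, apply_y = build_match_and_apply_functions('[^aeiou]y$', 'y$', 'ies')

# Each function still "sees" the strings it was built with.
print(bool(matches_sxz('fax')), apply_sxz('fax'))
print(bool(matches_y('fax')), apply_y('vacancy'))

# The remembered values live in the function's closure cells.
captured = dict(zip(apply_y.__code__.co_freevars,
                    (cell.cell_contents for cell in apply_y.__closure__)))
print(captured)
```

<p>Both <code>apply_sxz()</code> and <code>apply_y()</code> come from the same inner <code>def</code>, yet they behave differently, because each one closed over a different <var>search</var> and <var>replace</var>.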
<p>If this is incredibly confusing (and it should be, this is weird stuff), it may become clearer when you see how to use it.
<pre><code>
<a>patterns = \ <span>&#x2460;</span></a>
[
['[sxz]$', '$', 'es'],
['[^aeioudgkprt]h$', '$', 'es'],
['(qu|[^aeiou])y$', 'y$', 'ies'],
['$', '$', 's']
]
<a>rules = [build_match_and_apply_functions(pattern, search, replace) <span>&#x2461;</span></a>
for (pattern, search, replace) in patterns]</code></pre>
<ol>
<li>Our pluralization rules are now defined as a list of lists of strings (not functions). The first string in each group is the regular expression pattern that you would use in <code>re.search()</code> to see if this rule matches. The second and third strings in each group are the search and replace expressions you would use in <code>re.sub()</code> to actually apply the rule to turn a noun into its plural.
<li>This line is magic. It takes the list of strings in <var>patterns</var> and turns them into a list of functions. How? By mapping the strings to the <code>build_match_and_apply_functions</code> function, which just happens to take three strings as parameters and return a tuple of two functions. This means that <var>rules</var> ends up being exactly the same as the previous example: a list of tuples, where each tuple is a pair of functions, where the first function is the match function that calls <code>re.search()</code>, and the second function is the apply function that calls <code>re.sub()</code>.
</ol>
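<p>If the one-line version feels too magical, it may help to see it next to an explicit loop. The sketch below builds the rules both ways and checks that they pluralize the same words identically; <var>rules_via_loop</var> and <code>plural_with()</code> are hypothetical names introduced only for the comparison.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

patterns = [
    ['[sxz]$', '$', 'es'],
    ['[^aeioudgkprt]h$', '$', 'es'],
    ['(qu|[^aeiou])y$', 'y$', 'ies'],
    ['$', '$', 's'],
]

# The list comprehension from the chapter...
rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

# ...and the same thing spelled out as an ordinary loop.
rules_via_loop = []
for pattern, search, replace in patterns:
    rules_via_loop.append(
        build_match_and_apply_functions(pattern, search, replace))

def plural_with(some_rules, noun):
    """Hypothetical helper: pluralize using whichever rule list is passed in."""
    for matches_rule, apply_rule in some_rules:
        if matches_rule(noun):
            return apply_rule(noun)

for noun in ['fax', 'coach', 'soliloquy', 'day']:
    print(noun, plural_with(rules, noun), plural_with(rules_via_loop, noun))
```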
<p>Rounding out this version of the script is the main entry point, the <code>plural()</code> function.
<pre><code>def plural(noun):
<a>    for matches_rule, apply_rule in rules:  <span>&#x2460;</span></a>
        if matches_rule(noun):
            return apply_rule(noun)</code></pre>
<ol>
<li>Since the <var>rules</var> list is the same as the previous example (really, it is), it should come as no surprise that the <code>plural()</code> function hasn&#8217;t changed at all. It&#8217;s completely generic; it takes a list of rule functions and calls them in order. It doesn&#8217;t care how the rules are defined. In the previous example, they were defined as separate named functions. Now they are built dynamically by mapping the output of the <code>build_match_and_apply_functions()</code> function onto a list of raw strings. It doesn&#8217;t matter; the <code>plural()</code> function still works the same way.
</ol>
<h2 id=a-file-of-patterns>A file of patterns</h2>
<p>You&#8217;ve factored out all the duplicate code and added enough abstractions so that the pluralization rules are defined in a list of strings. The next logical step is to take these strings and put them in a separate file, where they can be maintained separately from the code that uses them.
<p>First, let&#8217;s create a text file that contains the rules you want. No fancy data structures, just whitespace-delimited strings in three columns. Let&#8217;s call it <code>plural4-rules.txt</code>.
<p class=d>[<a href=examples/plural4-rules.txt>download <code>plural4-rules.txt</code></a>]
<pre><code>[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s</code></pre>
<p>Now let&#8217;s see how you can use this rules file.
<p class=d>[<a href=examples/plural4.py>download <code>plural4.py</code></a>]
<pre><code>import re

<a>def build_match_and_apply_functions(pattern, search, replace):  <span>&#x2460;</span></a>
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

rules = []
<a>pattern_file = open('plural4-rules.txt')                         <span>&#x2461;</span></a>
try:
<a>    for line in pattern_file:                                    <span>&#x2462;</span></a>
<a>        pattern, search, replace = line.split(None, 3)           <span>&#x2463;</span></a>
<a>        rules.append(build_match_and_apply_functions(            <span>&#x2464;</span></a>
                pattern, search, replace))
finally:
<a>    pattern_file.close()                                         <span>&#x2465;</span></a></code></pre>
<ol>
<li>The <code>build_match_and_apply_functions()</code> function has not changed. You&#8217;re still using closures to build two functions dynamically that use variables defined in the outer function.
<li>Open the file that contains the pattern strings.
<li>Read through the file one line at a time, using the <code>for line in &lt;fileobject&gt;</code> idiom.
<li>Each line in the file really has three values, but they&#8217;re separated by whitespace (tabs or spaces, it makes no difference). To split it out, use the <code>split()</code> string method. The first argument to the <code>split()</code> method is <code>None</code>, which means &#8220;split on any whitespace.&#8221; The second argument is <code>3</code>, which means &#8220;split on whitespace at most 3 times, then leave the rest of the line alone.&#8221; A line like <code>[sxz]$ $ es</code> will be broken up into the list <code>['[sxz]$', '$', 'es']</code>, which means that <var>pattern</var> will get <code>'[sxz]$'</code>, <var>search</var> will get <code>'$'</code>, and <var>replace</var> will get <code>'es'</code>. That&#8217;s a lot of power in one little line of code.
<li>Pass the three strings to <code>build_match_and_apply_functions()</code>, which returns a tuple of two functions, and append that tuple to the <var>rules</var> list.
<li>Use a <code>try..finally</code> block to ensure the file object is closed, even if an exception is raised while reading the rules.
</ol>
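<p>The behavior of <code>split(None, 3)</code> is easy to check in isolation. The sketch below uses <code>io.StringIO</code> as an in-memory stand-in for the rules file (an assumption, so the example runs without <code>plural4-rules.txt</code> on disk); the variable names <var>parsed</var> and <var>leftover</var> are illustrative only.

```python
import io

# Stand-in for the rules file: io.StringIO can be read line by line like an open text file.
# One line uses tabs and the other uses runs of spaces, to show that both work.
pattern_file = io.StringIO('[sxz]$\t$\tes\n[^aeiou]y$    y$   ies\n')

# Splitting on None treats any run of whitespace as one separator
# and drops the trailing newline along the way.
parsed = [line.split(None, 3) for line in pattern_file]
print(parsed)
# [['[sxz]$', '$', 'es'], ['[^aeiou]y$', 'y$', 'ies']]

# The second argument caps the number of splits; whatever is left stays in the last item.
leftover = 'a b c d e'.split(None, 3)
print(leftover)
# ['a', 'b', 'c', 'd e']
```

<p>One consequence worth knowing: if a line of the rules file ever had a fourth column, <code>split(None, 3)</code> would return four items, and unpacking them into three names would raise a <code>ValueError</code>.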
<p>The improvement here is that you&#8217;ve completely separated the pluralization rules into an external file, so it can be maintained separately from the code that uses it. Code is code, data is data, and life is good.
<h2 id=generators>Generators</h2>
<p>Now you&#8217;re ready to learn about generators.
<p class=d>[<a href=examples/plural5.py>download <code>plural5.py</code></a>]
<pre><code>def rules():
    for line in open('plural5-rules.txt'):
        pattern, search, replace = line.split(None, 3)
        yield build_match_and_apply_functions(pattern, search, replace)
def plural(noun):
    for matches_rule, apply_rule in rules():
        if matches_rule(noun):
            return apply_rule(noun)</code></pre>
<p>How the heck does <em>that</em> work? Let&#8217;s look at an interactive example first.
<pre class=screen>
<samp class=p>>>> </samp><kbd>def make_counter(x):</kbd>
<samp class=p>... </samp><kbd>    print('entering make_counter')</kbd>
<samp class=p>... </samp><kbd>    while True:</kbd>
<a><samp class=p>... </samp><kbd>        yield x</kbd>                    <span>&#x2460;</span></a>
<samp class=p>... </samp><kbd>        print('incrementing x')</kbd>
<samp class=p>... </samp><kbd>        x = x + 1</kbd>
<samp class=p>... </samp>
<a><samp class=p>>>> </samp><kbd>counter = make_counter(2)</kbd> <span>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd>counter</kbd> <span>&#x2462;</span></a>
&lt;generator object make_counter at 0x001C9C10>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span>&#x2463;</span></a>
<samp>entering make_counter
2</samp>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span>&#x2464;</span></a>
<samp>incrementing x
3</samp>
<a><samp class=p>>>> </samp><kbd>next(counter)</kbd> <span>&#x2465;</span></a>
<samp>incrementing x
4</samp></pre>
<ol>
<li>The presence of the <code>yield</code> keyword in <code>make_counter</code> means that this is not a normal function. It is a special kind of function which generates values one at a time. You can think of it as a resumable function. Calling it will return a <i>generator</i> that can be used to generate successive values of <var>x</var>.
<li>To create an instance of the <code>make_counter</code> generator, just call it like any other function. Note that this does not actually execute the function code. You can tell this because the first line of the <code>make_counter()</code> function calls <code>print()</code>, but nothing has been printed yet.
<li>The <code>make_counter()</code> function returns a generator object.
<li>The <code>next()</code> function takes a generator object and returns its next value. The first time you call <code>next()</code> with the <var>counter</var> generator, it executes the code in <code>make_counter()</code> up to the first <code>yield</code> statement, then returns the value that was yielded. In this case, that will be <code>2</code>, because you originally created the generator by calling <code>make_counter(2)</code>.
<li>Repeatedly calling <code>next()</code> with the same generator object resumes exactly where it left off and continues until it hits the next <code>yield</code> statement. All variables, local state, <i class=baa>&amp;</i>c. are saved on <code>yield</code> and restored on <code>next()</code>. The next line of code waiting to be executed calls <code>print()</code>, which prints <samp>incrementing x</samp>. After that, it executes the statement <code>x = x + 1</code>. Then it loops through the <code>while</code> loop again, and the first thing it hits is the statement <code>yield x</code>, which saves the state of everything and returns the current value of <var>x</var> (now <code>3</code>).
<li>The third time you call <code>next(counter)</code>, you do all the same things again, but this time <var>x</var> is now <code>4</code>.
</ol>
<p>Since <code>make_counter</code> sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing <var>x</var> and spitting out values. But let&#8217;s look at more productive uses of generators instead.
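<p>If you only want the first few values of an endless generator, one option is <code>itertools.islice()</code> from the standard library. This is a small sketch, not part of the chapter&#8217;s examples: the <code>print()</code> calls are left out of <code>make_counter()</code> to keep the output short, and <var>first_three</var> is an illustrative name.

```python
from itertools import islice

def make_counter(x):
    # Same generator as above, minus the print() calls
    while True:
        yield x
        x = x + 1

# islice() stops asking for values after three, so the infinite loop never runs away
first_three = list(islice(make_counter(2), 3))
print(first_three)
# [2, 3, 4]
```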
<h3 id=a-fibonacci-generator>A Fibonacci generator</h3>
<p class=d>[<a href=examples/fibonacci.py>download <code>fibonacci.py</code></a>]
<pre><code>def fib(max):
<a>    a, b = 0, 1          <span>&#x2460;</span></a>
    while a &lt; max:
<a>        yield a          <span>&#x2461;</span></a>
<a>        a, b = b, a + b  <span>&#x2462;</span></a></code></pre>
<ol>
<li>The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with <code>0</code> and <code>1</code>, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: <var>a</var> starts at <code>0</code>, and <var>b</var> starts at <code>1</code>.
<li><var>a</var> is the current number in the sequence, so yield it.
<li><var>b</var> is the next number in the sequence, so assign that to <var>a</var>, but also calculate the next value (<code>a + b</code>) and assign that to <var>b</var> for later use. Note that this happens in parallel; if <var>a</var> is <code>3</code> and <var>b</var> is <code>5</code>, then <code>a, b = b, a + b</code> will set <var>a</var> to <code>5</code> (the previous value of <var>b</var>) and <var>b</var> to <code>8</code> (the sum of the previous values of <var>a</var> and <var>b</var>).
</ol>
<p>So you have a function that spits out successive Fibonacci numbers. Sure, you could do that with recursion, but this way is easier to read. Also, it works well with <code>for</code> loops.
<pre class=screen>
<samp class=p>>>> </samp><kbd>from fibonacci import fib</kbd>
<a><samp class=p>>>> </samp><kbd>for n in fib(1000):</kbd> <span>&#x2460;</span></a>
<a><samp class=p>... </samp><kbd> print(n, end=' ')</kbd> <span>&#x2461;</span></a>
<samp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp></pre>
<ol>
<li>You can use a generator like <code>fib()</code> in a <code>for</code> loop directly. The <code>for</code> loop will automatically call the <code>next()</code> function to get values from the <code>fib()</code> generator and assign them to the <code>for</code> loop index variable (<var>n</var>).
<li>Each time through the <code>for</code> loop, <var>n</var> gets a new value from the <code>yield</code> statement in <code>fib()</code>, and all you have to do is print it out. Once <code>fib()</code> runs out of numbers (<var>a</var> is no longer less than <var>max</var>, which in this case is <code>1000</code>), then the <code>for</code> loop exits gracefully.
</ol>
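<p>You can also drive the same generator by hand to see what &#8220;running out of numbers&#8221; looks like. In this sketch the names <var>gen</var>, <var>values</var>, and <var>exhausted</var> are illustrative; the point is that a finished generator raises <code>StopIteration</code> when asked for another value.

```python
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

gen = fib(3)

# Four values are below 3: 0, 1, 1, 2
values = [next(gen), next(gen), next(gen), next(gen)]

# Asking for a fifth value falls off the end of the while loop
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(values, exhausted)
# [0, 1, 1, 2] True
```

<p>A <code>for</code> loop does this bookkeeping for you, which is why the loop version needs no <code>try</code> block.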
<h3 id=a-plural-rule-generator>A plural rule generator</h3>
<p>Let&#8217;s go back to <code>plural5.py</code> and see how this version of the <code>plural()</code> function works.
<pre><code>def rules():
<a>    for line in open('plural5-rules.txt'):  <span>&#x2460;</span></a>
<a>        pattern, search, replace = line.split(None, 3)  <span>&#x2461;</span></a>
<a>        yield build_match_and_apply_functions(pattern, search, replace)  <span>&#x2462;</span></a>
def plural(noun):
<a>    for matches_rule, apply_rule in rules():  <span>&#x2463;</span></a>
        if matches_rule(noun):
            return apply_rule(noun)</code></pre>
<ol>
<li>As you&#8217;ve seen, <code>for line in open(...)</code> is a common idiom for reading from a file one line at a time. But here&#8217;s what you might not know: the reason this idiom works is that <em><code>open()</code> returns a file object that is itself an iterator, and calling <code>next()</code> on it returns the next line of the file.</em>
<li>No magic here. Remember that the lines of the rules file have three values separated by whitespace, so you use <code>line.split(None, 3)</code> to get the three &#8220;columns&#8221; and assign them to three local variables.
<li><em>And then you yield.</em> What do you yield? Two functions, built dynamically with your old friend, <code>build_match_and_apply_functions()</code>, which is identical to the previous examples. In other words, <code>rules()</code> is a generator that spits out match and apply functions <em>on demand</em>.
<li>Since <code>rules()</code> is a generator, you can use it directly in a <code>for</code> loop. The first time through the <code>for</code> loop, you will call the <code>rules()</code> function, which will open the pattern file, read the first line, dynamically build a match function and an apply function from the patterns on that line, and yield the dynamically built functions. The second time through the <code>for</code> loop, you will pick up exactly where you left off in <code>rules()</code> (which was in the middle of the <code>for line in open(...)</code> loop). The first thing it will do is read the next line of the file (which is still open), dynamically build another match and apply function based on the patterns on that line in the file, and yield the two functions.
</ol>
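<p>Strictly speaking, the object that <code>open()</code> returns is a file object that implements the iterator protocol, not a generator, although a <code>for</code> loop treats the two alike. The sketch below checks this against a throwaway file; the file name <code>rules.txt</code> and its two lines are assumptions made up for the demonstration.

```python
import os
import tempfile
import types

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical two-line rules file, created only for this demonstration
    path = os.path.join(tmpdir, 'rules.txt')
    with open(path, 'w') as out:
        out.write('[sxz]$ $ es\n$ $ s\n')

    with open(path) as pattern_file:
        # A file object is its own iterator...
        is_own_iterator = iter(pattern_file) is pattern_file
        # ...but it is not a generator object
        is_generator = isinstance(pattern_file, types.GeneratorType)
        # Each call to next() returns one line, newline included
        first_line = next(pattern_file)
        second_line = next(pattern_file)

print(is_own_iterator, is_generator)
print(repr(first_line), repr(second_line))
```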
<p>What have you gained over stage 4? Startup time. In stage 4, when you imported the <code>plural4</code> module, it read the entire patterns file and built a list of all the possible rules, before you could even think about calling the <code>plural()</code> function. With generators, you can do everything lazily: you read the first rule and create functions and try them, and if that works you don&#8217;t ever read the rest of the file or create any other functions.
<p>What have you lost? Performance! Every time you call the <code>plural()</code> function, the <code>rules()</code> generator starts over from the beginning &mdash; which means re-opening the patterns file and reading from the beginning, one line at a time.
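<p>Both effects can be measured with a counter. This sketch swaps the rules file for an in-memory list (<var>RULE_LINES</var>) and adds a global <var>lines_read</var> counter; both are assumptions introduced for the demonstration and are not part of <code>plural5.py</code>.

```python
import re

# Stand-in for the lines of plural5-rules.txt
RULE_LINES = [
    '[sxz]$ $ es\n',
    '[^aeioudgkprt]h$ $ es\n',
    '[^aeiou]y$ y$ ies\n',
    '$ $ s\n',
]
lines_read = 0  # how many rule lines have been consumed so far

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

def rules():
    global lines_read
    for line in RULE_LINES:
        lines_read += 1
        pattern, search, replace = line.split(None, 3)
        yield build_match_and_apply_functions(pattern, search, replace)

def plural(noun):
    for matches_rule, apply_rule in rules():
        if matches_rule(noun):
            return apply_rule(noun)

plural('box')
after_box = lines_read         # 1: the first rule matched, nothing else was read
plural('dog')
after_dog = lines_read         # 5: a fresh generator had to read all four lines
plural('box')
after_second_box = lines_read  # 6: the first line was read and rebuilt yet again
print(after_box, after_dog, after_second_box)
```

<p>The first call shows the laziness you gained; the later calls show the repeated work you pay for it.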
<p>What if you could have the best of both worlds: minimal startup cost (don&#8217;t execute any code on <code>import</code>), <em>and</em> maximum performance (don&#8217;t build the same functions over and over again)? Oh, and you still want to keep the rules in a separate file (because code is code and data is data), just as long as you never have to read the same line twice.
<h2 id=iterators>Iterators</h2>
<p>In truth, generators are a special case of <i>iterators</i>. A function that <code>yield</code>s values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that.
<h3 id=a-fibonacci-iterator>A Fibonacci iterator</h3>
<p>Remember <a href=#a-fibonacci-generator>the Fibonacci generator</a>? Here it is as a built-from-scratch iterator:
<p class=d>[<a href=examples/fibonacci2.py>download <code>fibonacci2.py</code></a>]
<pre><code><a>class fib:  <span>&#x2460;</span></a>
<a>    def __init__(self, max):  <span>&#x2461;</span></a>
        self.max = max
<a>    def __iter__(self):  <span>&#x2462;</span></a>
        self.a, self.b = 0, 1
        return self
<a>    def __next__(self):  <span>&#x2463;</span></a>
        fib = self.a
        if fib > self.max:
<a>            raise StopIteration  <span>&#x2464;</span></a>
        self.a, self.b = self.b, self.a + self.b
<a>        return fib  <span>&#x2465;</span></a></code></pre>
<ol>
<li>To build an iterator from scratch, <code>fib</code> needs to be a class, not a function.
<li>&#8220;Calling&#8221; <code>fib(max)</code> is really creating an instance of this class and calling its <code>__init__()</code> method with <var>max</var>. The <code>__init__()</code> method saves the maximum value as an instance variable so other methods can refer to it later.
<li>The <code>__iter__()</code> method is called whenever someone calls <code>iter(fib)</code>. (As you&#8217;ll see in a minute, a <code>for</code> loop will call this automatically, but you can also call it yourself manually.) After performing beginning-of-iteration initialization (in this case, resetting <code>self.a</code> and <code>self.b</code>, our two counters), the <code>__iter__()</code> method can return any object that implements a <code>__next__()</code> method. In this case (and in most cases), <code>__iter__()</code> simply returns <code>self</code>, since this class implements its own <code>__next__()</code> method.
<li>The <code>__next__()</code> method is called whenever someone calls <code>next()</code> on an iterator of an instance of a class. That will make more sense in a minute.
<li>When the <code>__next__()</code> method raises a <code>StopIteration</code> exception, this signals to the caller that the iteration is over; no more values are available. If the caller is a <code>for</code> loop, it will notice this <code>StopIteration</code> exception and gracefully exit the loop. (In other words, it will swallow the exception.) This little bit of magic is actually the key to using iterators in <code>for</code> loops.
<li>To spit out the next value, an iterator&#8217;s <code>__next__()</code> method simply <code>return</code>s the value. Do not use <code>yield</code> here; that&#8217;s a bit of syntactic sugar that only applies when you&#8217;re using generators. Here you&#8217;re creating your own iterator from scratch; use <code>return</code> instead.
</ol>
<p>Thoroughly confused yet? Excellent. Let&#8217;s see how to call this iterator:</p>
<pre class=screen>
<samp class=p>>>> </samp><kbd>from fibonacci2 import fib</kbd>
<samp class=p>>>> </samp><kbd>for n in fib(1000):</kbd>
<samp class=p>... </samp><kbd> print(n, end=' ')</kbd>
<samp>0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</samp></pre>
<p>Why, it&#8217;s exactly the same! Byte for byte identical to how you called Fibonacci-as-a-generator! But how?
<p>I told you there was a bit of magic involved in <code>for</code> loops. Here&#8217;s what happens:
<ul>
<li>The <code>for</code> loop calls <code>fib(1000)</code>, as shown. This returns an instance of the <code>fib</code> class. Call this <var>fib_inst</var>.
<li>Secretly, and quite cleverly, the <code>for</code> loop calls <code>iter(fib_inst)</code>, which returns an iterator object. Call this <var>fib_iter</var>. In this case, <var>fib_iter</var> == <var>fib_inst</var>, because the <code>__iter__()</code> method returns <code>self</code>, but the <code>for</code> loop doesn&#8217;t know (or care) about that.
<li>To &#8220;loop through&#8221; the iterator, the <code>for</code> loop calls <code>next(fib_iter)</code>, which calls the <code>__next__()</code> method on the <code>fib_iter</code> object, which does the next-Fibonacci-number calculations and returns a value. The <code>for</code> loop takes this value and assigns it to <var>n</var>, then executes the body of the <code>for</code> loop for that value of <var>n</var>.
<li>How does the <code>for</code> loop know when to stop? I&#8217;m glad you asked! When <code>next(fib_iter)</code> raises a <code>StopIteration</code> exception, the <code>for</code> loop will swallow the exception and gracefully exit. (Any other exception will pass through and be raised as usual.) And where have you seen a <code>StopIteration</code> exception? In the <code>__next__()</code> method, of course!
</ul>
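<p>The steps above can be written out by hand. This sketch repeats the <code>fib</code> class and replaces the <code>for</code> loop with explicit calls to <code>iter()</code> and <code>next()</code>; the names <var>same_object</var> and <var>seen</var> are illustrative, and a small maximum of <code>20</code> keeps the result short.

```python
class fib:
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        self.a, self.b = 0, 1
        return self
    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

fib_inst = fib(20)                 # create the instance
fib_iter = iter(fib_inst)          # calls fib_inst.__iter__()
same_object = fib_iter is fib_inst # True, because __iter__() returns self

seen = []
while True:
    try:
        n = next(fib_iter)         # calls fib_iter.__next__()
    except StopIteration:
        break                      # what the for loop does silently
    seen.append(n)                 # stands in for the body of the for loop

print(same_object, seen)
# True [0, 1, 1, 2, 3, 5, 8, 13]
```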
<h3 id=a-plural-rule-iterator>A plural rule iterator</h3>
<p>Now it&#8217;s time for the finale&hellip;
<p class=d>[<a href=examples/plural6.py>download <code>plural6.py</code></a>]
<pre><code>class LazyRules:
    def __init__(self):
        self.pattern_file = open('plural6-rules.txt')
        self.cache = []
    def __iter__(self):
        self.cache_index = 0
        return self
    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
            return self.cache[self.cache_index - 1]
        if self.pattern_file.closed:
            raise StopIteration
        line = self.pattern_file.readline()
        if not line:
            self.pattern_file.close()
            raise StopIteration
        pattern, search, replace = line.split(None, 3)
        funcs = build_match_and_apply_functions(
            pattern, search, replace)
        self.cache.append(funcs)
        return funcs
rules = LazyRules()</code></pre>
<p>So this is a class that implements <code>__iter__()</code> and <code>__next__()</code>, so it can be used as an iterator. Then, you instantiate the class and assign it to <var>rules</var>. This happens just once, on import.
<p>Let&#8217;s take the class one bite at a time.
<pre><code>class LazyRules:
<a>    def __init__(self):  <span>&#x2460;</span></a>
<a>        self.pattern_file = open('plural6-rules.txt')  <span>&#x2461;</span></a>
<a>        self.cache = []  <span>&#x2462;</span></a></code></pre>
<ol>
<li>The <code>__init__()</code> method is only going to be called once, when you instantiate the class and assign it to <var>rules</var>.
<li>Since this is only going to get called once, it&#8217;s the perfect place to open the pattern file. You&#8217;ll read it later; no point doing more than you absolutely have to until absolutely necessary!
<li>Also, this is a good place to initialize the cache, which you&#8217;ll use later as you read the patterns from the pattern file.
</ol>
<pre><code><a>    def __iter__(self):  <span>&#x2460;</span></a>
<a>        self.cache_index = 0  <span>&#x2461;</span></a>
<a>        return self  <span>&#x2462;</span></a>
</code></pre>
<ol>
<li>The <code>__iter__()</code> method will be called every time someone &mdash; say, a <code>for</code> loop &mdash; calls <code>iter(rules)</code>.
<li>This is the place to reset the counter that we&#8217;re going to use to retrieve items from the cache (that we haven&#8217;t built yet &mdash; patience, grasshopper).
<li>Finally, the <code>__iter__()</code> method returns <code>self</code>, which signals that this class will take care of returning its own values throughout an iteration.
</ol>
<pre><code><a>    def __next__(self):  <span>&#x2460;</span></a>
        .
        .
        .
        pattern, search, replace = line.split(None, 3)
<a>        funcs = build_match_and_apply_functions(  <span>&#x2461;</span></a>
            pattern, search, replace)
<a>        self.cache.append(funcs)  <span>&#x2462;</span></a>
        return funcs</code></pre>
<ol>
<li>The <code>__next__()</code> method gets called whenever someone &mdash; say, a <code>for</code> loop &mdash; calls <code>next(rules)</code>. This method will only make sense if we start at the end and work backwards. So let&#8217;s do that.
<li>The last part of this function should look familiar, at least. The <code>build_match_and_apply_functions()</code> function hasn&#8217;t changed; it&#8217;s the same as it ever was. <em>Each line of the pattern file will be read exactly once, as late as possible.</em>
<li>The only difference is that, before returning the match and apply functions (which are stored in the tuple <var>funcs</var>), we&#8217;re going to save them in <code>self.cache</code>. <em>Each match and apply function will be built exactly once, as late as possible, then cached.</em>
</ol>
<p>Moving backwards&hellip;
<pre><code>    def __next__(self):
        .
        .
        .
<a>        line = self.pattern_file.readline()  <span>&#x2460;</span></a>
<a>        if not line:  <span>&#x2461;</span></a>
            self.pattern_file.close()
<a>            raise StopIteration  <span>&#x2462;</span></a>
        .
        .
        .</code></pre>
<ol>
<li>A bit of advanced file trickery here. The <code>readline()</code> method (note: singular, not the plural <code>readlines()</code>) reads exactly one line from an open file. Specifically, the next line. (<em>File objects are iterators too! It&#8217;s iterators all the way down&hellip;</em>)
<li>If there was a line for <code>readline()</code> to read, <var>line</var> will not be an empty string. Even if the file contained a blank line, <var>line</var> would end up as the one-character string <code>'\n'</code> (a newline). If <var>line</var> is really an empty string, that means there are no more lines to read from the file.
<li>When we reach the end of the file, we should close the file and raise the magic <code>StopIteration</code> exception. Remember, we got to this point because we needed a match and apply function for the next rule. The next rule comes from the next line of the file&hellip; but there is no next line! Therefore, we have no value to return. The iteration is over. (<span>&#x266B;</span> The party&#8217;s over&hellip; <span>&#x266B;</span>)
</ol>
<p>Moving backwards all the way to the start of the <code>__next__()</code> method&hellip;
<pre><code>    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
<a>            return self.cache[self.cache_index - 1]  <span>&#x2460;</span></a>
        if self.pattern_file.closed:
<a>            raise StopIteration  <span>&#x2461;</span></a>
        .
        .
        .</code></pre>
<ol>
<li><code>self.cache</code> will be a list of the functions we need to match and apply individual rules. (At least <em>that</em> should sound familiar!) <code>self.cache_index</code> keeps track of which cached item we should return next. If we haven&#8217;t exhausted the cache yet (<i>i.e.</i> if the length of <code>self.cache</code> is greater than or equal to <code>self.cache_index</code>, which has just been incremented), then we have a cache hit! Hooray! We can return the match and apply functions from the cache instead of building them from scratch.
<li>On the other hand, if we don&#8217;t get a hit from the cache, <em>and</em> the file object has been closed (which could happen, further down the method, as you saw in the previous code snippet), then there&#8217;s nothing more we can do. If the file is closed, it means we&#8217;ve exhausted it &mdash; we&#8217;ve already read through every line from the pattern file, and we&#8217;ve already built and cached the match and apply functions for each pattern. The file is exhausted; the cache is exhausted; I&#8217;m exhausted. Wait, what? Hang in there, we&#8217;re almost done.
</ol>
<p>Putting it all together, here&#8217;s what happens when:
<ul>
<li>When the module is imported, it creates a single instance of the <code>LazyRules</code> class, called <var>rules</var>, which opens the pattern file but does not read from it.
<li>When asked for the first match and apply function, it checks its cache but finds the cache is empty. So it reads a single line from the pattern file, builds the match and apply functions from those patterns, and caches them.
<li>Let&#8217;s say, for the sake of argument, that the very first rule matched. If so, no further match and apply functions are built, and no further lines are read from the pattern file.
<li>Furthermore, for the sake of argument, suppose that the caller calls the <code>plural()</code> function <em>again</em> to pluralize a different word. The <code>for</code> loop in the <code>plural()</code> function will call <code>iter(rules)</code>, which will reset the cache index but will not reset the open file object.
<li>The first time through, the <code>for</code> loop will ask for a value from <var>rules</var>, which will invoke its <code>__next__()</code> method. This time, however, the cache is primed with a single pair of match and apply functions, corresponding to the patterns in the first line of the pattern file. Since they were built and cached in the course of pluralizing the previous word, they&#8217;re retrieved from the cache. The cache index increments, and the open file is never touched.
<li>Let&#8217;s say, for the sake of argument, that the first rule does <em>not</em> match this time around. So the <code>for</code> loop comes around again and asks for another value from <var>rules</var>. This invokes the <code>__next__()</code> method a second time. This time, the cache is exhausted &mdash; it only contained one item, and we&#8217;re asking for a second &mdash; so the <code>__next__()</code> method continues. It reads another line from the open file, builds match and apply functions out of the patterns, and caches them.
<li>This read-build-and-cache process will continue as long as the rules being read from the pattern file don&#8217;t match the word we&#8217;re trying to pluralize. If we do find a matching rule before the end of the file, we simply use it and stop, with the file still open. The file pointer will stay wherever we stopped reading, waiting for the next <code>readline()</code> command. In the meantime, the cache now has more items in it, and if we start all over again trying to pluralize a new word, each of those items in the cache will be tried before reading the next line from the pattern file.
</ul>
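<p>The sequence of events above can be traced with a short script. This is a sketch under two assumptions: <code>LazyRules</code> is changed to accept an already-open file object (the hypothetical <var>pattern_file</var> parameter) instead of opening <code>plural6-rules.txt</code> itself, and <code>io.StringIO</code> plays the part of the file. The body of <code>__next__()</code> is unchanged.

```python
import io
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

class LazyRules:
    def __init__(self, pattern_file):
        # Assumption: the caller passes in the open file object
        self.pattern_file = pattern_file
        self.cache = []
    def __iter__(self):
        self.cache_index = 0
        return self
    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
            return self.cache[self.cache_index - 1]
        if self.pattern_file.closed:
            raise StopIteration
        line = self.pattern_file.readline()
        if not line:
            self.pattern_file.close()
            raise StopIteration
        pattern, search, replace = line.split(None, 3)
        funcs = build_match_and_apply_functions(pattern, search, replace)
        self.cache.append(funcs)
        return funcs

rules = LazyRules(io.StringIO(
    '[sxz]$ $ es\n[^aeioudgkprt]h$ $ es\n[^aeiou]y$ y$ ies\n$ $ s\n'))

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

# Record each plural alongside the number of rules cached so far
trace = []
for noun in ('box', 'fax', 'vacancy', 'dog', 'coach'):
    trace.append((plural(noun), len(rules.cache)))
print(trace)
# [('boxes', 1), ('faxes', 1), ('vacancies', 3), ('dogs', 4), ('coaches', 4)]
```

<p>The cache only grows when a word gets past every rule already cached. Because the last rule in this file matches every word, <code>readline()</code> never reaches the end of the file, so the file object is still open at the end of the trace.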
<p>Thus, we have achieved our combined goal:
<ol>
<li><strong>Minimal startup cost.</strong> The only thing that happens on <code>import</code> is instantiating a single class and opening a file (but not reading from it).
<li><strong>Maximum performance.</strong> The previous example would read through the file and build functions dynamically every time you wanted to pluralize a word. This version will cache functions as soon as they&#8217;re built, and in the worst case, it will only read through the pattern file once, no matter how many words you pluralize.
<li><strong>Separation of code and data.</strong> All the patterns are stored in a separate file. Code is code, and data is data, and never the twain shall meet.
</ol>
<h2 id=furtherreading>Further reading</h2>
<ul>
<li><a href=http://www.python.org/dev/peps/pep-0234/>PEP 234: Iterators</a>
<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
</ul>
<p class=c>&copy; 2001&ndash;9 <a href=about.html><span>&#x2133;</span>ark Pilgrim</a>
<script src=jquery.js></script>
<script src=dip3.js></script>
+2 -2
View File
@@ -9,9 +9,9 @@ body{counter-reset:h1 2}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#native-datatypes>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Native datatypes</h1>
<h1>Native Datatypes</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Wonder is the foundation of all philosophy, research its progress, ignorance its end. <span>&#x275E;</span><br>&mdash; <cite>Michel de Montaigne</cite>
<p><span>&#x275D;</span> Wonder is the foundation of all philosophy, inquiry its progress, ignorance its end. <span>&#x275E;</span><br>&mdash; Michel de Montaigne
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
+2 -1
View File
@@ -19,7 +19,7 @@ td pre{padding:0;border:0}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#porting-code-to-python-3-with-2to3>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Porting code to Python 3 with <code>2to3</code></h1>
<h1>Porting Code to Python 3 with <code>2to3</code></h1>
<blockquote class=q>
<p><span>&#x275D;</span> Life is pleasant. Death is peaceful. It&#8217;s the transition that&#8217;s troublesome. <span>&#x275E;</span><br>&mdash; Isaac Asimov (attributed)
</blockquote>
@@ -495,6 +495,7 @@ for an_iterator in a_sequence_of_iterators:
reduce(a, b, c)</code></pre></td></tr>
</table>
<blockquote>
<!-- FIXME reduce() removal from Guido: http://www.artima.com/weblogs/viewpost.jsp?thread=98196 -->
<p><span>&#x261E;</span>The version of <code>2to3</code> that shipped with Python 3.0 would not fix the <code>reduce()</code> function automatically. The fix first appeared in the <code>2to3</code> script that shipped with Python 3.1.
</blockquote>
<h2 id=apply><code>apply()</code> global function</h2>
+4 -3
View File
@@ -3,7 +3,8 @@
# make build directory and copy original files there for preflighting
rm -rf build
mkdir build
cp *.py robots.txt *.js *.css build/
cp robots.txt *.js *.css build/
cp -R examples build/
# minimize HTML (note: this script is quite fragile and relies on knowledge of how I write HTML)
for f in *.html; do
@@ -34,8 +35,8 @@ sed -i -e "s|=http:|=|g" build/*.html
sed -i -e "s|href=index.html|href=/|g" build/*.html
# set file permissions (hg resets these, don't know why)
chmod 644 build/*.html build/*.css build/*.js build/*.py build/*.txt
chmod 644 build/*.html build/*.css build/*.js build/examples/*.py build/examples/*.txt build/*.txt
# ship it!
rsync -essh -avzP build/$revision.js build/html5.js diveintomark.org:~/web/wearehugh.com/dip3/
rsync -essh -avzP build/*.html build/*.py build/*.txt diveintomark.org:~/web/diveintopython3.org/
rsync -essh -avzP build/*.html build/examples build/*.txt diveintomark.org:~/web/diveintopython3.org/
+2 -2
View File
@@ -9,9 +9,9 @@ body{counter-reset:h1 4}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#regular-expressions>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Regular expressions</h1>
<h1>Regular Expressions</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Some people, when confronted with a problem, think &#8220;I know, I&#8217;ll use regular expressions.&#8221; Now they have two problems. <span>&#x275E;</span><br>&mdash; <cite>Jamie Zawinski</cite>
<p><span>&#x275D;</span> Some people, when confronted with a problem, think &#8220;I know, I&#8217;ll use regular expressions.&#8221; Now they have two problems. <span>&#x275E;</span><br>&mdash; <a href=http://www.jwz.org/hacks/marginal.html>Jamie Zawinski</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
-29
View File
@@ -1,29 +0,0 @@
"""Convert to and from Roman numerals
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
def to_roman(n):
"""convert integer to Roman numeral"""
result = ""
for numeral, integer in roman_numeral_map:
while n >= integer:
result += numeral
n -= integer
return result
-34
View File
@@ -1,34 +0,0 @@
"""Convert to and from Roman numerals
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
class OutOfRangeError(ValueError):
pass
roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
def to_roman(n):
"""convert integer to Roman numeral"""
if n > 3999:
raise OutOfRangeError("number out of range (must be less than 3999)")
result = ""
for numeral, integer in roman_numeral_map:
while n >= integer:
result += numeral
n -= integer
return result
-33
@@ -1,33 +0,0 @@
"""Convert to and from Roman numerals
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
class OutOfRangeError(ValueError): pass
roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
def to_roman(n):
"""convert integer to Roman numeral"""
if not (0 < n < 4000):
raise OutOfRangeError("number out of range (must be 1..3999)")
result = ""
for numeral, integer in roman_numeral_map:
while n >= integer:
result += numeral
n -= integer
return result
-36
@@ -1,36 +0,0 @@
"""Convert to and from Roman numerals
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
def to_roman(n):
"""convert integer to Roman numeral"""
if not (0 < n < 4000):
raise OutOfRangeError("number out of range (must be 1..3999)")
if not isinstance(n, int):
raise NotIntegerError("non-integers can not be converted")
result = ""
for numeral, integer in roman_numeral_map:
while n >= integer:
result += numeral
n -= integer
return result
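A quick usage check of the last to_roman() version above may help when reading the four variants side by side. This is only a sketch: the function body is repeated from the listing so the snippet runs on its own, and the sample inputs (1994, 3999, 4000, 0, 0.5) are illustrative choices, not values taken from the book's test suite.

```python
class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass

roman_numeral_map = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):
        raise OutOfRangeError("number out of range (must be 1..3999)")
    if not isinstance(n, int):
        raise NotIntegerError("non-integers can not be converted")
    result = ""
    # Greedily subtract the largest value that still fits.
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def error_name(value):
    """Return the name of the exception to_roman() raises, or None."""
    try:
        to_roman(value)
    except ValueError as e:
        return type(e).__name__
    return None

print(to_roman(1994))    # MCMXCIV
print(to_roman(3999))    # MMMCMXCIX
print(error_name(4000))  # OutOfRangeError
print(error_name(0))     # OutOfRangeError
print(error_name(0.5))   # NotIntegerError
```

Because both custom exceptions subclass ValueError, a caller that does not care which check failed can catch ValueError alone, as the hypothetical error_name() helper does.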
+58 -47
@@ -13,13 +13,13 @@ body{counter-reset:h1 3}
<h1>Strings</h1>
<blockquote class=q>
<p><span>&#x275D;</span> I&#8217;m telling you this &#8217;cause you&#8217;re one of my friends.<br>
My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; <cite>Dr. Seuss, On Beyond Zebra!</cite>
My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr. Seuss, On Beyond Zebra!
</blockquote>
<p id=toc>&nbsp;
<h2 id=boring-stuff>Some boring stuff you need to understand before you can dive in</h2>
<p class=f>Did you know that the people of <a href="http://en.wikipedia.org/wiki/Bougainville_Province">Bougainville</a> have the smallest alphabet in the world? Their <a href="http://en.wikipedia.org/wiki/Rotokas_alphabet">Rotokas alphabet</a> is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters &mdash; 52 if you count uppercase and lowercase separately &mdash; plus a handful of <i class=baa>!@#$%&</i> punctuation marks.
<p>When people talk about &#8220;text,&#8221; they&#8217;re thinking of &#8220;characters and symbols on the computer screen.&#8221; But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <i>character encoding</i>. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
<p>When people talk about &#8220;text,&#8221; they&#8217;re thinking of &#8220;characters and symbols on the computer screen.&#8221; But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <i>character encoding</i>. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages.
<p>In reality, it&#8217;s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key. Whenever someone gives you a sequence of bytes &mdash; a file, a web page, whatever &mdash; and claims it&#8217;s &#8220;text,&#8221; you need to know what character encoding they used so you can decode the bytes into characters. If they give you the wrong key or no key at all, you&#8217;re left with the unenviable task of cracking the code yourself. Chances are you&#8217;ll get it wrong, and the result will be gibberish.
@@ -101,7 +101,7 @@ La Pe&ntilde;a</pre>
<p>Let's take another look at <a href=your-first-python-program.html#divingin><code>humansize.py</code></a>:
<p class=d>[<a href=humansize.py>download <code>humansize.py</code></a>]
<p class=d>[<a href=examples/humansize.py>download <code>humansize.py</code></a>]
<pre><code>
<a>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'], <span>&#x2460;</span></a>
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
@@ -149,6 +149,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<li>There's a lot going on here. First, that's a method call on a string literal. <em>Strings are objects</em>, and objects have methods. Second, the whole expression evaluates to a string. Third, <code>{0}</code> and <code>{1}</code> are <i>replacement fields</i>, which are replaced by the arguments passed to the <code>format()</code> method.
</ol>
<h3 id=compound-field-names>Compound field names</h3>
<p>The previous example shows the simplest case, where the replacement fields are simply integers. Integer replacement fields are treated as positional indices into the argument list of the <code>format()</code> method. That means that <code>{0}</code> is replaced by the first argument (<var>username</var> in this case), <code>{1}</code> is replaced by the second argument (<var>password</var>), <i class=baa>&amp;</i>c. You can have as many positional indices as you have arguments, and you can have as many arguments as you want. But replacement fields are much more powerful than that.
<pre class=screen>
@@ -160,11 +162,11 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<samp>'1000KB = 1MB'</samp>
</pre>
<ol>
<li>Rather than calling any function in the <code>humansize</code> module, you'll just grab one of the data structures it defines: the list of "SI" (powers-of-1000) suffixes.
<li>Rather than calling any function in the <code>humansize</code> module, you're just grabbing one of the data structures it defines: the list of "SI" (powers-of-1000) suffixes.
<li>This looks complicated, but it's not. <code>{0}</code> would refer to the first argument passed to the <code>format()</code> method, <var>si_suffixes</var>. But <var>si_suffixes</var> is a list. So <code>{0[0]}</code> refers to the first item of the list which is the first argument passed to the <code>format()</code> method: <code>'KB'</code>. Meanwhile, <code>{0[1]}</code> refers to the second item of the same list: <code>'MB'</code>. Everything outside the curly braces &mdash; including <code>1000</code>, the equals sign, and the spaces &mdash; is untouched. The final result is the string <code>'1000KB = 1MB'</code>.
</ol>
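<p>A minimal sketch of the list-indexing case, assuming a local copy of the <abbr>SI</abbr> suffix list rather than an import of <code>humansize</code> (the name <var>si_suffixes</var> follows the text; the list contents are copied from the <code>humansize.py</code> listing):

```python
# Local stand-in for humansize.SUFFIXES[1000].
si_suffixes = ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']

# {0[0]} is item 0 of argument 0; {0[1]} is item 1 of the same argument.
line = '1000{0[0]} = 1{0[1]}'.format(si_suffixes)
print(line)  # 1000KB = 1MB

# With two arguments, the number before the bracket picks the argument.
both = '{0[0]} vs. {1[0]}'.format(si_suffixes, ['KiB', 'MiB'])
print(both)  # KB vs. KiB
```

<p>The number before the bracket always selects the argument; the number inside the bracket selects the item within it.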
<p>What this example shows is that <em>format specifiers can access items and properties of data structures using (almost) Python syntax</em>. The following things "just work":
<p>What this example shows is that <em>format specifiers can access items and properties of data structures using (almost) Python syntax</em>. This is called <i>compound field names</i>. The following compound field names "just work":
<ul>
<li>Passing a list, and accessing an item of the list by index (as in the previous example)
@@ -193,6 +195,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<li><code>sys.modules["humansize"].SUFFIXES[1000][0]</code> is the first item of the list of <abbr>SI</abbr> suffixes: <code>'KB'</code>. Therefore, the complete replacement field <code>{0.modules[humansize].SUFFIXES[1000][0]}</code> is replaced by the two-character string <code>KB</code>.
</ul>
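<p>The chained lookup can be tried without importing anything from the book's examples. The sketch below is an assumption-laden stand-in: <var>fake_sys</var> and <var>fake_humansize</var> are hypothetical objects built with <code>types.SimpleNamespace</code> to play the roles of <code>sys</code> and the <code>humansize</code> module, so the same replacement field can be evaluated on its own.

```python
from types import SimpleNamespace

# Hypothetical stand-ins: fake_humansize mimics the humansize module,
# fake_sys mimics sys with its modules dictionary.
fake_humansize = SimpleNamespace(
    SUFFIXES={1000: ['KB', 'MB', 'GB'], 1024: ['KiB', 'MiB', 'GiB']}
)
fake_sys = SimpleNamespace(modules={'humansize': fake_humansize})

# .name is attribute access; [key] is item access. Inside a replacement
# field, dictionary keys are written without quotes, and an all-digit
# key such as 1000 is treated as an integer.
field = '{0.modules[humansize].SUFFIXES[1000][0]}'
result = field.format(fake_sys)
print(result)  # KB
```

<p>The unquoted key is the main place where the field syntax departs from ordinary Python, which is why the text says &#8220;(almost) Python syntax.&#8221;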
<h3 id=format-specifiers>Format specifiers</h3>
<p>But wait! There's more! Let's take another look at that strange line of code from <code>humansize.py</code>:
<pre><code>if size < multiple:
@@ -210,58 +214,54 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<samp class=p>>>> </samp><kbd>"{0:.1f} {1}".format(698.25, 'GB')</kbd>
<samp>'698.2 GB'</samp></pre>
<p>For all the gory details on presentation types, check the <a href="http://docs.python.org/dev/3.0/library/string.html#format-specification-mini-language">Format Specification Mini-Language</a> in the official Python documentation.
<p>For all the gory details on format specifiers, consult the <a href="http://docs.python.org/dev/3.0/library/string.html#format-specification-mini-language">Format Specification Mini-Language</a> in the official Python documentation.
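<p>A few more format specifiers, as a hedged sketch: the sample values below are illustrative and not taken from <code>humansize.py</code>. Everything after the colon in a replacement field is the format specifier.

```python
# .1f  -> fixed-point notation with one digit after the decimal point
one_place = '{0:.1f} {1}'.format(698.24, 'GB')
print(one_place)   # 698.2 GB

# .2f  -> two digits after the decimal point (the value is rounded)
two_places = '{0:.2f}'.format(50.4625)
print(two_places)  # 50.46

# +    -> always show the sign, combined here with .2f
signed = '{0:+.2f}'.format(1.5)
print(signed)      # +1.50

# >6   -> right-align in a field six characters wide
padded = '{0:>6.1f}'.format(3.14159)
print(repr(padded))  # '   3.1'
```

<p>Specifiers combine left to right: alignment and width first, then sign, then precision and type.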
<div class=s>
<p>Note that <code>(k, v)</code> is a tuple. I told you they were good for something.
<h2 id=common-string-methods>Other common string methods</h2>
<p>You might be thinking that this is a lot of work just to do simple string concatenation, and you would be right, except that
string formatting isn't just concatenation. It's not even just formatting. It's also type coercion.
<p>Besides formatting, strings can do a number of other useful tricks.
<pre class=screen>
<samp class=p>>>> </samp><kbd>uid = "sa"</kbd>
<samp class=p>>>> </samp><kbd>pwd = "secret"</kbd>
<samp class=p>>>> </samp><kbd>print pwd + " is not a good password for " + uid</kbd> <span>&#x2460;</span>
secret is not a good password for sa
<samp class=p>>>> </samp><kbd>print "%s is not a good password for %s" % (pwd, uid)</kbd> <span>&#x2461;</span>
secret is not a good password for sa
<samp class=p>>>> </samp><kbd>userCount = 6</kbd>
<samp class=p>>>> </samp><kbd>print "Users connected: %d" % (userCount, )</kbd> <span>&#x2462;</span> <span>&#x2463;</span>
Users connected: 6
<samp class=p>>>> </samp><kbd>print "Users connected: " + userCount</kbd> <span>&#x2464;</span>
<samp class=traceback>Traceback (innermost last):
File "&lt;interactive input>", line 1, in ?
TypeError: cannot concatenate 'str' and 'int' objects</samp></pre>
<a><samp class=p>>>> </samp><kbd>s = """Finished files are the re-</kbd> <span>&#x2460;</span></a>
<samp class=p>... </samp><kbd>sult of years of scientif-</kbd>
<samp class=p>... </samp><kbd>ic study combined with the</kbd>
<samp class=p>... </samp><kbd>experience of years."""</kbd>
<a><samp class=p>>>> </samp><kbd>s.splitlines()</kbd> <span>&#x2461;</span></a>
<samp>['Finished files are the re-',
'sult of years of scientif-',
'ic study combined with the',
'experience of years.']</samp>
<a><samp class=p>>>> </samp><kbd>print(s.lower())</kbd> <span>&#x2462;</span></a>
<samp>finished files are the re-
sult of years of scientif-
ic study combined with the
experience of years.</samp>
<a><samp class=p>>>> </samp><kbd>s.lower().count("f")</kbd> <span>&#x2463;</span></a>
<samp>6</samp></pre>
<ol>
<li><code>+</code> is the string concatenation operator.
<li>In this trivial case, string formatting accomplishes the same result as concatenation.
<li><code>(userCount, )</code> is a tuple with one element. Yes, the syntax is a little strange, but there's a good reason for it: it's unambiguously a tuple. In fact, you can always include a comma after the last element when defining a list, tuple, or dictionary, but the comma is required when defining a tuple with one element. If the comma weren't required, Python wouldn't know whether <code>(userCount)</code> was a tuple with one element or just the value of <var>userCount</var>.
<li>String formatting works with integers by specifying <code>%d</code> instead of <code>%s</code>.
<li>Trying to concatenate a string with a non-string raises an exception. Unlike string formatting, string concatenation works only when everything is already a string.
<li>You can input multi-line strings in the Python interactive shell. Once you start a multi-line string with triple quotation marks, just hit <kbd>ENTER</kbd> and the interactive shell will prompt you to continue the string. Typing the closing triple quotation marks ends the string, and the next <kbd>ENTER</kbd> will execute the command (in this case, assigning the string to <var>s</var>).
<li>The <code>splitlines()</code> method takes one multi-line string and returns a list of strings, one for each line of the original. Note that the line-ending characters at the end of each line are not included.
<li>The <code>lower()</code> method converts the entire string to lowercase. (Similarly, the <code>upper()</code> method converts a string to uppercase.)
<li>The <code>count()</code> method counts the number of occurrences of a substring. Yes, there really are six &#8220;f&#8221;s in that sentence!
</ol>
<p>As with <code>printf</code> in <abbr>C</abbr>, string formatting in Python is like a Swiss Army knife. There are options galore, and modifier strings to specially format many different types of values.
<!--
<p>What else can strings do? Here's a common idiom I use for getting bits of data out of semi-structured strings.
<pre class=screen>
<samp class=p>>>> </samp><kbd>print "Today's stock price: %f" % 50.4625</kbd> <span>&#x2460;</span>
<samp>Today's stock price: 50.462500</samp>
<samp class=p>>>> </samp><kbd>print "Today's stock price: %.2f" % 50.4625</kbd> <span>&#x2461;</span>
<samp>Today's stock price: 50.46</samp>
<samp class=p>>>> </samp><kbd>print "Change since yesterday: %+.2f" % 1.5</kbd> <span>&#x2462;</span>
<samp>Change since yesterday: +1.50</samp></pre>
<ol>
<li>The <code>%f</code> string formatting option treats the value as a decimal, and prints it to six decimal places.
<li>The ".2" modifier of the <code>%f</code> option rounds the value to two decimal places.
<li>You can even combine modifiers. Adding the <code>+</code> modifier displays a plus or minus sign before the value. Note that the ".2" modifier is still in place, and is padding the value to exactly two decimal places.
</ol>
</div>
<h2 id=common-string-methods>Common string methods</h2>
<samp class=p>>>> </samp><kbd>import subprocess</kbd>
<samp class=p>>>> </samp><kbd>df = subprocess.getoutput('df -x tmpfs')</kbd>
<samp class=p>>>> </samp><kbd>print(df)</kbd>
<samp>Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 461215812 73256908 364529712 17% /
/dev/sdb1 721075720 620495832 63951288 91% /backup</samp>
<samp class=p>>>> </samp><kbd>
-->
<!--
['capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
-->
<!--
<p>[FIXME is it worth keeping this section on joining lists / splitting strings? All the examples are from an old code sample that isn't used at all anymore.]
<div class=s>
@@ -275,7 +275,13 @@ TypeError: cannot concatenate 'str' and 'int' objects</samp></pre>
is an object. You might have thought I meant that string <em>variables</em> are objects. But no, look closely at this example and you'll see that the string <code>";"</code> itself is an object, and you are calling its <code>join</code> method.
<p>The <code>join</code> method joins the elements of the list into a single string, with each element separated by a semi-colon. The delimiter doesn't need to be a semi-colon; it doesn't even need to be a single character. It can be any string.
<!--<code>join</code> works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception.-->
<code>join</code> works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception.
<pre class=screen>
<samp class=p>>>> </samp><kbd>params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}</kbd>
@@ -302,10 +308,15 @@ is an object. You might have thought I meant that string <em>variables</em> are
<li><code>split</code> takes an optional second argument, which is the number of times to split. (&#8220;Oooooh, optional arguments...&#8221; You'll learn how to do this in your own functions in the next chapter.)
</ol>
<!--<code><var>anystring</var>.<code>split</code>(<var>delimiter</var>, 1)</code> is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).-->
</div>
<h2 id=string-operations>Common string operations</h2>
<code><var>anystring</var>.<code>split</code>(<var>delimiter</var>, 1)</code> is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).
</div>
-->
<h2 id=string-module>The <code>string</code> module</h2>
+26 -18
@@ -36,7 +36,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
<li><a href=your-first-python-program.html#everythingisanobject>Everything is an object</a>
<ol>
<li><a href=your-first-python-program.html#importsearchpath>The <code>import</code> search path</a>
<li><a href=your-first-python-program.html#whatsanobject>What's an object?</a>
<li><a href=your-first-python-program.html#whatsanobject>What&#8217;s an object?</a>
</ol>
<li><a href=your-first-python-program.html#indentingcode>Indenting code</a>
<li><a href=your-first-python-program.html#runningscripts>Running scripts</a>
@@ -178,6 +178,25 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
<li>Handling errors (exceptions)
<li>Writing to files
</ol>
<li id=iterators-and-generators><a href=iterators-and-generators.html>Iterators <i class=baa>&amp;</i> generators</a>
<ol>
<li><a href=iterators-and-generators.html#divingin>Diving in</a>
<li><a href=iterators-and-generators.html#i-know>I know, let&#8217;s use regular expressions!</a>
<li><a href=iterators-and-generators.html#a-list-of-functions>A list of functions</a>
<li><a href=iterators-and-generators.html#a-list-of-patterns>A list of patterns</a>
<li><a href=iterators-and-generators.html#a-file-of-patterns>A file of patterns</a>
<li><a href=iterators-and-generators.html#generators>Generators</a>
<ol>
<li><a href=iterators-and-generators.html#a-fibonacci-generator>A Fibonacci generator</a>
<li><a href=iterators-and-generators.html#a-plural-rule-generator>A plural rule generator</a>
</ol>
<li><a href=iterators-and-generators.html#iterators>Iterators</a>
<ol>
<li><a href=iterators-and-generators.html#a-fibonacci-iterator>A Fibonacci iterator</a>
<li><a href=iterators-and-generators.html#a-plural-rule-iterator>A plural rule iterator</a>
</ol>
<li><a href=iterators-and-generators.html#furtherreading>Further reading</a>
</ol>
<li>HTML processing
<ol>
<li>Diving in
@@ -222,17 +241,6 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
<li>Putting it all together
<li>Summary
</ol>
<li>Dynamic functions
<ol>
<li>Diving in
<li>plural.py, stage 1
<li>plural.py, stage 2
<li>plural.py, stage 3
<li>plural.py, stage 4
<li>plural.py, stage 5
<li>plural.py, stage 6
<li>Summary
</ol>
<li>Metaclasses
<ol>
<li>...once I figure out WTF metaclasses are...
@@ -282,10 +290,10 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
<li><a href=case-study-porting-chardet-to-python-3.html#divingin>Introducing <code>chardet</code>: a mini-<abbr>FAQ</abbr></a>
<ol>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.what>What is character encoding auto-detection?</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.impossible>Isn't that impossible?</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.impossible>Isn&#8217;t that impossible?</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.who>Who wrote this detection algorithm?</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.yippie>Yippie! Screw the standards, I'll just auto-detect everything!</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.why>Why bother with auto-detection if it's slow, inaccurate, and non-standard?</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.yippie>Yippie! Screw the standards, I&#8217;ll just auto-detect everything!</a>
<li><a href=case-study-porting-chardet-to-python-3.html#faq.why>Why bother with auto-detection if it&#8217;s slow, inaccurate, and non-standard?</a>
</ol>
<li><a href=case-study-porting-chardet-to-python-3.html#divingin2>Diving in</a>
<ol>
@@ -296,13 +304,13 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
<li><a href=case-study-porting-chardet-to-python-3.html#how.windows1252><code>windows-1252</code></a>
</ol>
<li><a href=case-study-porting-chardet-to-python-3.html#running2to3>Running <code>2to3</code></a>
<li><a href=case-study-porting-chardet-to-python-3.html#manual>Fixing what <code>2to3</code> can't</a>
<li><a href=case-study-porting-chardet-to-python-3.html#manual>Fixing what <code>2to3</code> can&#8217;t</a>
<ol>
<li><a href=case-study-porting-chardet-to-python-3.html#falseisinvalidsyntax><code>False</code> is invalid syntax</a>
<li><a href=case-study-porting-chardet-to-python-3.html#nomodulenamedconstants>No module named <code>constants</code></a>
<li><a href=case-study-porting-chardet-to-python-3.html#namefileisnotdefined>Name '<var>file</var>' is not defined</a>
<li><a href=case-study-porting-chardet-to-python-3.html#cantuseastringpattern>Can't use a string pattern on a bytes-like object</a>
<li><a href=case-study-porting-chardet-to-python-3.html#cantconvertbytesobject>Can't convert '<code>bytes</code>' object to <code>str</code> implicitly</a>
<li><a href=case-study-porting-chardet-to-python-3.html#cantuseastringpattern>Can&#8217;t use a string pattern on a bytes-like object</a>
<li><a href=case-study-porting-chardet-to-python-3.html#cantconvertbytesobject>Can&#8217;t convert '<code>bytes</code>' object to <code>str</code> implicitly</a>
</ol>
</ol>
</ol>
+6 -6
@@ -9,9 +9,9 @@ body{counter-reset:h1 7}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#unit-testing>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Unit testing</h1>
<h1>Unit Testing</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Certitude is not the test of certainty. We have been cocksure of many things that were not so. <span>&#x275E;</span><br>&mdash; <cite>Oliver Wendell Holmes, Jr.</cite>
<p><span>&#x275D;</span> Certitude is not the test of certainty. We have been cocksure of many things that were not so. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Oliver_Wendell_Holmes,_Jr.>Oliver Wendell Holmes, Jr.</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>(Not) diving in</h2>
@@ -49,7 +49,7 @@ body{counter-reset:h1 7}
<li>The <code>to_roman()</code> function should return the Roman numeral representation for all integers <code>1</code> to <code>3999</code>.
</ol>
<p>It is not immediately obvious how this code does&hellip; well, <em>anything</em>. It defines a class which has no <code>__init__()</code> method. The class <em>does</em> have another method, but it is never called. The entire script has a <code>__main__</code> block, but it doesn't reference the class or its method. But it does do something, I promise.
<p class=d>[<a href=romantest1.py>download <code>romantest1.py</code></a>]
<p class=d>[<a href=examples/romantest1.py>download <code>romantest1.py</code></a>]
<pre><code>import roman1
import unittest
@@ -159,7 +159,7 @@ Traceback (most recent call last):
<li>Overall, the unit test failed because at least one test case did not pass. When a test case doesn't pass, <code>unittest</code> distinguishes between failures and errors. A failure is a call to an <code>assertXYZ</code> method, like <code>assertEqual</code> or <code>assertRaises</code>, that fails because the asserted condition is not true or the expected exception was not raised. An error is any other sort of exception raised in the code you're testing or the unit test case itself.
</ol>
<p><em>Now</em>, finally, you can write the <code>to_roman()</code> function.
<p class=d>[<a href=roman1.py>download <code>roman1.py</code></a>]
<p class=d>[<a href=examples/roman1.py>download <code>roman1.py</code></a>]
<pre><code>roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
@@ -233,7 +233,7 @@ OK</samp></pre>
<p>The <code>to_roman()</code> function should raise an <code>OutOfRangeError</code> when given an integer greater than <code>3999</code>.
</blockquote>
<p>What would that test look like?
<p class=d>[<a href=romantest2.py>download <code>romantest2.py</code></a>]
<p class=d>[<a href=examples/romantest2.py>download <code>romantest2.py</code></a>]
<pre><code>
<a>class ToRomanBadInput(unittest.TestCase): <span>&#x2460;</span></a>
<a> def test_too_large(self): <span>&#x2461;</span></a>
@@ -298,7 +298,7 @@ FAILED (failures=1)</samp></pre>
<li>Of course, the <code>to_roman()</code> function isn't raising the <code>OutOfRangeError</code> exception you just defined, because you haven't told it to do that yet. That's excellent news! It means this is a valid test case &mdash; it fails before you write the code to make it pass.
</ol>
<p>Now you can write the code to make this test pass.
<p class=d>[<a href=roman2.py>download <code>roman2.py</code></a>]
<p class=d>[<a href=examples/roman2.py>download <code>roman2.py</code></a>]
<pre><code>def to_roman(n):
"""convert integer to Roman numeral"""
if n > 3999:
+3 -3
@@ -10,14 +10,14 @@ th{font-family:inherit !important}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=31>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#your-first-python-program>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Your first Python program</h1>
<h1>Your First Python Program</h1>
<blockquote class=q>
<p><span>&#x275D;</span> Don&#8217;t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. <span>&#x275E;</span><br>&mdash; <cite>Ven. Henepola Gunararatana</cite>
<p><span>&#x275D;</span> Don&#8217;t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. <span>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Buddhism>Ven. Henepola Gunaratana</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving in</h2>
<p class=f>Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
<p class=d>[<a href=humansize.py>download <code>humansize.py</code></a>]
<p class=d>[<a href=examples/humansize.py>download <code>humansize.py</code></a>]
<pre><code>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}