finished refactoring chapter

Mark Pilgrim
2009-04-12 01:09:17 -04:00
parent f520af8afe
commit 94a3353f45
15 changed files with 996 additions and 727 deletions
+1 -2
@@ -6,10 +6,9 @@
<link rel=stylesheet type=text/css href=dip3.css>
<style>
body{counter-reset:h1 20}
ins,del,mark{line-height:2.154;text-decoration:none;font-style:normal;display:inline-block;width:100%}
ins,del{line-height:2.154;text-decoration:none;font-style:normal;display:inline-block;width:100%}
ins{background:#9f9}
del{background:#f87}
mark{background:#ff8;font-weight:bold}
</style>
<link rel=stylesheet type=text/css media='only screen and (max-device-width: 480px)' href=mobile.css>
</head>
-704
@@ -6387,710 +6387,6 @@ OK </span><span>&#x2463;</span></pre><div class=calloutlist>
<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">When all of your tests pass, stop coding.
<div class=chapter>
<h2 id="roman2">Chapter 15. Refactoring</h2>
<h2 id="roman.bugs">15.1. Handling bugs</h2>
<p>Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by &#8220;bug&#8221;? A bug is a test case you haven't written yet.
<div class=example><h3>Example 15.1. The bug</h3><pre class=screen><samp class=p>>>> </samp><kbd>import roman5</kbd>
<samp class=p>>>> </samp><kbd>roman5.from_roman("")</kbd> <span>&#x2460;</span>
0</pre><div class=calloutlist>
<ol>
<li>Remember in the <a href="#roman.stage5" title="14.5. roman.py, stage 5">previous section</a> when you kept seeing that an empty string would match the regular expression you were using to check for valid Roman numerals?
Well, it turns out that this is still true for the final version of the regular expression. And that's a bug; you want an
empty string to raise an <code>InvalidRomanNumeralError</code> exception just like any other sequence of characters that doesn't represent a valid Roman numeral.
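<p>To see why the empty string slips through, note that every piece of the pattern is optional. The sketch below assumes the three-<code>M</code> form of the pattern from the previous chapter (reconstructed from the description in this chapter, not copied from it); <code>matches_pattern</code> is a hypothetical helper used only for illustration.

```python
import re

# Assumed form of the pattern before the range was extended (1..3999).
ROMAN_NUMERAL_PATTERN = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'


def matches_pattern(s):
    """Return True if s passes the regular expression check (hypothetical helper)."""
    return re.search(ROMAN_NUMERAL_PATTERN, s) is not None


print(matches_pattern('MCMXCIII'))  # True: a valid numeral
print(matches_pattern('IIII'))      # False: too many repeated numerals
print(matches_pattern(''))          # True: nothing is required, so nothing matches too
```

<p>Because the pattern alone can't reject the empty string, the check has to happen somewhere else.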
<p>After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.
<div class=example><h3>Example 15.2. Testing for the bug (<code>romantest61.py</code>)</h3><pre><code>
class FromRomanBadInput(unittest.TestCase):
    # previous test cases omitted for clarity (they haven't changed)
    def testBlank(self):
        """from_roman should fail with blank string"""
        self.assertRaises(roman61.InvalidRomanNumeralError, roman61.from_roman, "") <span>&#x2460;</span>
</pre><div class=calloutlist>
<ol>
<li>Pretty simple stuff here. Call <code>from_roman()</code> with an empty string and make sure it raises an <code>InvalidRomanNumeralError</code> exception. The hard part was finding the bug; now that you know about it, testing for it is the easy part.
<p>Since your code has a bug, and you now have a test case that tests this bug, the test case will fail:
<div class=example><h3>Example 15.3. Output of <code>romantest61.py</code> against <code>roman61.py</code></h3><pre class=screen><samp>from_roman should only accept uppercase input ... ok
to_roman should always return uppercase ... ok
from_roman should fail with blank string ... FAIL
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
======================================================================
FAIL: from_roman should fail with blank string
----------------------------------------------------------------------
</span><samp class=traceback>Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage6\romantest61.py", line 137, in testBlank
self.assertRaises(roman61.InvalidRomanNumeralError, roman61.from_roman, "")
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError</span><samp>
----------------------------------------------------------------------
Ran 13 tests in 2.864s
FAILED (failures=1)</span></pre><p><em>Now</em> you can fix the bug.
<div class=example><h3>Example 15.4. Fixing the bug (<code>roman62.py</code>)</h3>
<p>This file is available in <code>py/roman/stage6/</code> in the examples directory.
<pre><code>
def from_roman(s):
    """convert Roman numeral to integer"""
    if not s: <span>&#x2460;</span>
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
</pre><div class=calloutlist>
<ol>
<li>Only two lines of code are required: an explicit check for an empty string, and a <code>raise</code> statement.
<div class=example><h3>Example 15.5. Output of <code>romantest62.py</code> against <code>roman62.py</code></h3><pre class=screen><samp>from_roman should only accept uppercase input ... ok
to_roman should always return uppercase ... ok
from_roman should fail with blank string ... ok </span><span>&#x2460;</span><samp>
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 13 tests in 2.834s
OK</span> <span>&#x2461;</span></pre><div class=calloutlist>
<ol>
<li>The blank string test case now passes, so the bug is fixed.
<li>All the other test cases still pass, which means that this bug fix didn't break anything else. Stop coding.
<p>Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs
will require complex test cases. In a testing-centric environment, it may <em>seem</em> like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case),
then fix the bug itself. Then if the test case doesn't pass right away, you need to figure out whether the fix was wrong,
or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code
tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can
easily re-run <em>all</em> the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test
is tomorrow's regression test.
<h2 id="roman.change">15.2. Handling changing requirements</h2>
<p>Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible
nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they
see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even
if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.
<p>Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember <a href="#roman.divein" title="13.2. Diving in">the rule</a> that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception
to that rule by having 4 <code>M</code> characters in a row to represent <code>4000</code>. If you make this change, you'll be able to expand the range of convertible numbers from <code>1..3999</code> to <code>1..4999</code>. But first, you need to make some changes to the test cases.
<div class=example><h3>Example 15.6. Modifying test cases for new requirements (<code>romantest71.py</code>)</h3>
<p>This file is available in <code>py/roman/stage7/</code> in the examples directory.
<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
import roman71
import unittest

class KnownValues(unittest.TestCase):
    knownValues = ( (1, 'I'),
                    (2, 'II'),
                    (3, 'III'),
                    (4, 'IV'),
                    (5, 'V'),
                    (6, 'VI'),
                    (7, 'VII'),
                    (8, 'VIII'),
                    (9, 'IX'),
                    (10, 'X'),
                    (50, 'L'),
                    (100, 'C'),
                    (500, 'D'),
                    (1000, 'M'),
                    (31, 'XXXI'),
                    (148, 'CXLVIII'),
                    (294, 'CCXCIV'),
                    (312, 'CCCXII'),
                    (421, 'CDXXI'),
                    (528, 'DXXVIII'),
                    (621, 'DCXXI'),
                    (782, 'DCCLXXXII'),
                    (870, 'DCCCLXX'),
                    (941, 'CMXLI'),
                    (1043, 'MXLIII'),
                    (1110, 'MCX'),
                    (1226, 'MCCXXVI'),
                    (1301, 'MCCCI'),
                    (1485, 'MCDLXXXV'),
                    (1509, 'MDIX'),
                    (1607, 'MDCVII'),
                    (1754, 'MDCCLIV'),
                    (1832, 'MDCCCXXXII'),
                    (1993, 'MCMXCIII'),
                    (2074, 'MMLXXIV'),
                    (2152, 'MMCLII'),
                    (2212, 'MMCCXII'),
                    (2343, 'MMCCCXLIII'),
                    (2499, 'MMCDXCIX'),
                    (2574, 'MMDLXXIV'),
                    (2646, 'MMDCXLVI'),
                    (2723, 'MMDCCXXIII'),
                    (2892, 'MMDCCCXCII'),
                    (2975, 'MMCMLXXV'),
                    (3051, 'MMMLI'),
                    (3185, 'MMMCLXXXV'),
                    (3250, 'MMMCCL'),
                    (3313, 'MMMCCCXIII'),
                    (3408, 'MMMCDVIII'),
                    (3501, 'MMMDI'),
                    (3610, 'MMMDCX'),
                    (3743, 'MMMDCCXLIII'),
                    (3844, 'MMMDCCCXLIV'),
                    (3888, 'MMMDCCCLXXXVIII'),
                    (3940, 'MMMCMXL'),
                    (3999, 'MMMCMXCIX'),
                    (4000, 'MMMM'), <span>&#x2460;</span>
                    (4500, 'MMMMD'),
                    (4888, 'MMMMDCCCLXXXVIII'),
                    (4999, 'MMMMCMXCIX'))

    def testToRomanKnownValues(self):
        """to_roman should give known result with known input"""
        for integer, numeral in self.knownValues:
            result = roman71.to_roman(integer)
            self.assertEqual(numeral, result)

    def testFromRomanKnownValues(self):
        """from_roman should give known result with known input"""
        for integer, numeral in self.knownValues:
            result = roman71.from_roman(numeral)
            self.assertEqual(integer, result)

class ToRomanBadInput(unittest.TestCase):
    def testTooLarge(self):
        """to_roman should fail with large input"""
        self.assertRaises(roman71.OutOfRangeError, roman71.to_roman, 5000) <span>&#x2461;</span>

    def testZero(self):
        """to_roman should fail with 0 input"""
        self.assertRaises(roman71.OutOfRangeError, roman71.to_roman, 0)

    def testNegative(self):
        """to_roman should fail with negative input"""
        self.assertRaises(roman71.OutOfRangeError, roman71.to_roman, -1)

    def testNonInteger(self):
        """to_roman should fail with non-integer input"""
        self.assertRaises(roman71.NotIntegerError, roman71.to_roman, 0.5)

class FromRomanBadInput(unittest.TestCase):
    def testTooManyRepeatedNumerals(self):
        """from_roman should fail with too many repeated numerals"""
        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): <span>&#x2462;</span>
            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, s)

    def testRepeatedPairs(self):
        """from_roman should fail with repeated pairs of numerals"""
        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):
            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, s)

    def testMalformedAntecedent(self):
        """from_roman should fail with malformed antecedents"""
        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
                  'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, s)

    def testBlank(self):
        """from_roman should fail with blank string"""
        self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, "")

class SanityCheck(unittest.TestCase):
    def testSanity(self):
        """from_roman(to_roman(n))==n for all n"""
        for integer in range(1, 5000):<span>&#x2463;</span>
            numeral = roman71.to_roman(integer)
            result = roman71.from_roman(numeral)
            self.assertEqual(integer, result)

class CaseCheck(unittest.TestCase):
    def testToRomanCase(self):
        """to_roman should always return uppercase"""
        for integer in range(1, 5000):
            numeral = roman71.to_roman(integer)
            self.assertEqual(numeral, numeral.upper())

    def testFromRomanCase(self):
        """from_roman should only accept uppercase input"""
        for integer in range(1, 5000):
            numeral = roman71.to_roman(integer)
            roman71.from_roman(numeral.upper())
            self.assertRaises(roman71.InvalidRomanNumeralError,
                              roman71.from_roman, numeral.lower())

if __name__ == "__main__":
    unittest.main()
</pre><div class=calloutlist>
<ol>
<li>The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the
<code>4000</code> range. Here I've included <code>4000</code> (the shortest), <code>4500</code> (the second shortest), <code>4888</code> (the longest), and <code>4999</code> (the largest).
<li>The definition of &#8220;large input&#8221; has changed. This test used to call <code>to_roman()</code> with <code>4000</code> and expect an error; now that <code>4000-4999</code> are good values, you need to bump this up to <code>5000</code>.
<li>The definition of &#8220;too many repeated numerals&#8221; has also changed. This test used to call <code>from_roman()</code> with <code>'MMMM'</code> and expect an error; now that <code>MMMM</code> is considered a valid Roman numeral, you need to bump this up to <code>'MMMMM'</code>.
<li>The sanity check and case checks loop through every number in the range, from <code>1</code> to <code>3999</code>. Since the range has now expanded, these <code>for</code> loops need to be updated as well to go up to <code>4999</code>.
<p>Now your test cases are up to date with the new requirements, but your code is not, so you expect several of the test cases
to fail.
<div class=example><h3>Example 15.7. Output of <code>romantest71.py</code> against <code>roman71.py</code></h3><pre class=screen><samp>
from_roman should only accept uppercase input ... ERROR </span><span>&#x2460;</span><samp>
to_roman should always return uppercase ... ERROR
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ERROR </span><span>&#x2461;</span><samp>
to_roman should give known result with known input ... ERROR </span><span>&#x2462;</span><samp>
from_roman(to_roman(n))==n for all n ... ERROR</span><span>&#x2463;</span><samp>
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
</span></pre><div class=calloutlist>
<ol>
<li>Our case checks now fail because they loop from <code>1</code> to <code>4999</code>, but <code>to_roman()</code> only accepts numbers from <code>1</code> to <code>3999</code>, so it will fail as soon as the test case hits <code>4000</code>.
<li>The <code>from_roman()</code> known values test will fail as soon as it hits <code>'MMMM'</code>, because <code>from_roman()</code> still thinks this is an invalid Roman numeral.
<li>The <code>to_roman()</code> known values test will fail as soon as it hits <code>4000</code>, because <code>to_roman()</code> still thinks this is out of range.
<li>The sanity check will also fail as soon as it hits <code>4000</code>, because <code>to_roman()</code> still thinks this is out of range.
<pre class=screen><samp>
======================================================================
ERROR: from_roman should only accept uppercase input
----------------------------------------------------------------------
</span><samp class=traceback>Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 161, in testFromRomanCase
numeral = roman71.to_roman(integer)
File "roman71.py", line 28, in to_roman
raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)</span><samp>
======================================================================
ERROR: to_roman should always return uppercase
----------------------------------------------------------------------
</span><samp class=traceback>Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 155, in testToRomanCase
numeral = roman71.to_roman(integer)
File "roman71.py", line 28, in to_roman
raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)</span><samp>
======================================================================
ERROR: from_roman should give known result with known input
----------------------------------------------------------------------
</span><samp class=traceback>Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 102, in testFromRomanKnownValues
result = roman71.from_roman(numeral)
File "roman71.py", line 47, in from_roman
raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
InvalidRomanNumeralError: Invalid Roman numeral: MMMM</span><samp>
======================================================================
ERROR: to_roman should give known result with known input
----------------------------------------------------------------------
</span><samp class=traceback>Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 96, in testToRomanKnownValues
result = roman71.to_roman(integer)
File "roman71.py", line 28, in to_roman
raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)</span><samp>
======================================================================
ERROR: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
</span><samp class=traceback>Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 147, in testSanity
numeral = roman71.to_roman(integer)
File "roman71.py", line 28, in to_roman
raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)</span><samp>
----------------------------------------------------------------------
Ran 13 tests in 2.213s
FAILED (errors=5)</span></pre><p>Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line
with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being
tested is never &#8220;ahead&#8221; of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you
stop coding.)
<div class=example><h3>Example 15.8. Coding the new requirements (<code>roman72.py</code>)</h3>
<p>This file is available in <code>py/roman/stage7/</code> in the examples directory.
<pre><code>
"""Convert to and from Roman numerals"""
import re

#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass

#Define digit mapping
romanNumeralMap = (('M',  1000),
                   ('CM', 900),
                   ('D',  500),
                   ('CD', 400),
                   ('C',  100),
                   ('XC', 90),
                   ('L',  50),
                   ('XL', 40),
                   ('X',  10),
                   ('IX', 9),
                   ('V',  5),
                   ('IV', 4),
                   ('I',  1))

def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 &lt; n &lt; 5000): <span>&#x2460;</span>
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")
    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result

#Define pattern to detect valid Roman numerals
romanNumeralPattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$' <span>&#x2461;</span>

def from_roman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
</pre><div class=calloutlist>
<ol>
<li><code>to_roman()</code> only needs one small change, in the range check. Where you used to check <code>0 &lt; n &lt; 4000</code>, you now check <code>0 &lt; n &lt; 5000</code>. And you change the error message that you <code>raise</code> to reflect the new acceptable range (<code>1..4999</code> instead of <code>1..3999</code>). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds <code>'M'</code> for each thousand that it finds; given <code>4000</code>, it will spit out <code>'MMMM'</code>. The only reason it didn't do this before is that you explicitly stopped it with the range check.)
<li>You don't need to make any changes to <code>from_roman()</code> at all. The only change is to <var>romanNumeralPattern</var>; if you look closely, you'll notice that you added another optional <code>M</code> in the first section of the regular expression. This will allow up to 4 <code>M</code> characters instead of 3, meaning you will allow the Roman numeral equivalents of <code>4999</code> instead of <code>3999</code>. The actual <code>from_roman()</code> function is completely general; it just looks for repeated Roman numeral characters and adds them up, without caring how
many times they repeat. The only reason it didn't handle <code>'MMMM'</code> before is that you explicitly stopped it with the regular expression pattern matching.
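<p>If you'd like to poke at the two changes outside the test suite, here is a compact sketch of the same logic that you can paste into a current Python interpreter. It follows the listing closely, but the uppercase constant names and the single-file layout are choices made for this sketch, not part of the example files.

```python
import re


class RomanError(Exception):
    pass


class OutOfRangeError(RomanError):
    pass


class NotIntegerError(RomanError):
    pass


class InvalidRomanNumeralError(RomanError):
    pass


ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

# Change 2: a fourth optional M at the front of the pattern.
ROMAN_NUMERAL_PATTERN = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'


def to_roman(n):
    """Convert an integer in 1..4999 to a Roman numeral."""
    if not (0 < n < 5000):  # Change 1: the upper bound was 4000
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if int(n) != n:
        raise NotIntegerError('non-integers can not be converted')
    result = ''
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result


def from_roman(s):
    """Convert a Roman numeral to an integer."""
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(ROMAN_NUMERAL_PATTERN, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: {}'.format(s))
    result = 0
    index = 0
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result


print(to_roman(4000))          # MMMM
print(from_roman('MMMMCMXCIX'))  # 4999
```

<p>The conversion loops are untouched; only the two gatekeepers (the range check and the pattern) changed.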
<p>You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself:
<div class=example><h3 id="roman.roman72.output">Example 15.9. Output of <code>romantest72.py</code> against <code>roman72.py</code></h3><pre class=screen><samp>from_roman should only accept uppercase input ... ok
to_roman should always return uppercase ... ok
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 13 tests in 3.685s
OK</span> <span>&#x2460;</span></pre><div class=calloutlist>
<ol>
<li>All the test cases pass. Stop coding.
<p>Comprehensive unit testing means never having to rely on a programmer who says &#8220;Trust me.&#8221;
<h2 id="roman.refactoring">15.3. Refactoring</h2>
<p>The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even
the feeling you get when someone else blames you for breaking their code and you can actually <em>prove</em> that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
<p>Refactoring is the process of taking working code and making it work better. Usually, &#8220;better&#8221; means &#8220;faster&#8221;, although it can also mean &#8220;using less memory&#8221;, or &#8220;using less disk space&#8221;, or simply &#8220;more elegantly&#8221;. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any
program.
<p>Here, &#8220;better&#8221; means &#8220;faster&#8221;. Specifically, the <code>from_roman()</code> function is slower than it needs to be, because of that big nasty regular expression that you use to validate Roman numerals.
It's probably not worth trying to do away with the regular expression altogether (it would be difficult, and it might not
end up any faster), but you can speed up the function by precompiling the regular expression.
<div class=example><h3>Example 15.10. Compiling regular expressions</h3><pre class=screen>
<samp class=p>>>> </samp><kbd>import re</kbd>
<samp class=p>>>> </samp><kbd>pattern = '^M?M?M?$'</kbd>
<samp class=p>>>> </samp><kbd>re.search(pattern, 'M')</kbd> <span>&#x2460;</span>
&lt;SRE_Match object at 01090490>
<samp class=p>>>> </samp><kbd>compiledPattern = re.compile(pattern)</kbd> <span>&#x2461;</span>
<samp class=p>>>> </samp><kbd>compiledPattern</kbd>
&lt;SRE_Pattern object at 00F06E28>
<samp class=p>>>> </samp><kbd>dir(compiledPattern)</kbd><span>&#x2462;</span>
['findall', 'match', 'scanner', 'search', 'split', 'sub', 'subn']
<samp class=p>>>> </samp><kbd>compiledPattern.search('M')</kbd> <span>&#x2463;</span>
&lt;SRE_Match object at 01104928></pre><div class=calloutlist>
<ol>
<li>This is the syntax you've seen before: <code>re.search</code> takes a regular expression as a string (<var>pattern</var>) and a string to match against it (<code>'M'</code>). If the pattern matches, the function returns a match object which can be queried to find out exactly what matched and
how.
<li>This is the new syntax: <code>re.compile</code> takes a regular expression as a string and returns a pattern object. Note there is no string to match here. Compiling a
regular expression has nothing to do with matching it against any specific strings (like <code>'M'</code>); it only involves the regular expression itself.
<li>The compiled pattern object returned from <code>re.compile</code> has several useful-looking functions, including several (like <code>search</code> and <code>sub</code>) that are available directly in the <code>re</code> module.
<li>Calling the compiled pattern object's <code>search</code> function with the string <code>'M'</code> accomplishes the same thing as calling <code>re.search</code> with both the regular expression and the string <code>'M'</code>. Only much, much faster. (In fact, the <code>re.search</code> function simply compiles the regular expression and calls the resulting pattern object's <code>search</code> method for you.)
<table class=note border="0" summary="">
<td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"><td colspan="2" align="left" valign="top" width="99%">Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call
the methods on the pattern object directly.
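<p>The same comparison as a short script, in case you want to run it on a current interpreter. The object reprs shown in the session above come from an older Python and will likely look different today, so this sketch checks results rather than printing the objects themselves.

```python
import re

pattern = '^M?M?M?$'
compiled_pattern = re.compile(pattern)

# Module-level function: pass the pattern string and the string to search.
print(re.search(pattern, 'M') is not None)          # True

# Method on the compiled object: pass only the string to search.
print(compiled_pattern.search('M') is not None)     # True
print(compiled_pattern.search('MMMM') is not None)  # False

# The compiled object remembers the pattern it was built from.
print(compiled_pattern.pattern)                     # ^M?M?M?$
```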
<div class=example><h3>Example 15.11. Compiled regular expressions in <code>roman81.py</code></h3>
<p>This file is available in <code>py/roman/stage8/</code> in the examples directory.
<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
# to_roman and rest of module omitted for clarity

romanNumeralPattern = \
    re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$') <span>&#x2460;</span>

def from_roman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if not romanNumeralPattern.search(s):<span>&#x2461;</span>
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
</pre><div class=calloutlist>
<ol>
<li>This looks very similar, but in fact a lot has changed. <var>romanNumeralPattern</var> is no longer a string; it is a pattern object which was returned from <code>re.compile</code>.
<li>That means that you can call methods on <var>romanNumeralPattern</var> directly. This will be much, much faster than calling <code>re.search</code> every time. The regular expression is compiled once and stored in <var>romanNumeralPattern</var> when the module is first imported; then, every time you call <code>from_roman()</code>, you can immediately match the input string against the regular expression, without any intermediate steps occurring under
the covers.
<p>So how much faster is it to compile regular expressions? See for yourself:
<div class=example><h3 id="roman.stage8.1.output">Example 15.12. Output of <code>romantest81.py</code> against <code>roman81.py</code></h3><pre class=screen>............. <span>&#x2460;</span><samp>
----------------------------------------------------------------------
Ran 13 tests in 3.385s </span><span>&#x2461;</span><samp>
OK</span> <span>&#x2462;</span></pre><div class=calloutlist>
<ol>
<li>Just a note in passing here: this time, I ran the unit test <em>without</em> the <code>-v</code> option, so instead of the full <code>docstring</code> for each test, you only get a dot for each test that passes. (If a test failed, you'd get an <code>F</code>, and if it had an error, you'd get an <code>E</code>. You'd still get complete tracebacks for each failure and error, so you could track down any problems.)
<li>You ran <code>13</code> tests in <code>3.385</code> seconds, compared to <a href="#roman.roman72.output" title="Example 15.9. Output of romantest72.py against roman72.py"><code>3.685</code> seconds</a> without precompiling the regular expressions. That's an <code>8%</code> improvement overall, and remember that most of the time spent during the unit test is spent doing other things. (Separately,
I time-tested the regular expressions by themselves, apart from the rest of the unit tests, and found that compiling this
regular expression speeds up the <code>search</code> by an average of <code>54%</code>.) Not bad for such a simple fix.
<li>Oh, and in case you were wondering, precompiling the regular expression didn't break anything, and you just proved it.
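<p>If you want to measure the regular expression on its own, something like the following sketch would do it with the standard <code>timeit</code> module. The sample numeral, the repetition count, and the <code>time_search</code> helper are all assumptions made for illustration. Also keep in mind that the <code>re</code> module caches recently compiled patterns, so on your machine the gap may be smaller or larger than the figures quoted above.

```python
import re
import timeit

PATTERN = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
COMPILED = re.compile(PATTERN)
SAMPLE = 'MMMDCCCLXXXVIII'  # the longest numeral below 4000


def time_search(number=100000):
    """Return (seconds using re.search, seconds using the compiled pattern)."""
    uncompiled = timeit.timeit(lambda: re.search(PATTERN, SAMPLE), number=number)
    precompiled = timeit.timeit(lambda: COMPILED.search(SAMPLE), number=number)
    return uncompiled, precompiled


if __name__ == '__main__':
    uncompiled, precompiled = time_search()
    print('re.search with a string: {:.3f}s'.format(uncompiled))
    print('compiled pattern:        {:.3f}s'.format(precompiled))
```

<p>Run it a few times before drawing conclusions; timings on a busy machine can vary from one run to the next.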
<p>There is one other performance optimization that I want to try. Given the complexity of regular expression syntax, it should
come as no surprise that there is frequently more than one way to write the same expression. After some discussion about
this module on <a href="http://groups.google.com/groups?group=comp.lang.python">comp.lang.python</a>, someone suggested that I try using the <code>{<var>m</var>,<var>n</var>}</code> syntax for the optional repeated characters.
<div class=example><h3>Example 15.13. <code>roman82.py</code></h3>
<p>This file is available in <code>py/roman/stage8/</code> in the examples directory.
<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
# rest of program omitted for clarity

#old version
#romanNumeralPattern = \
#    re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')

#new version
romanNumeralPattern = \
    re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$') <span>&#x2460;</span>
</pre><div class=calloutlist>
<ol>
<li>You have replaced <code>M?M?M?M?</code> with <code>M{0,4}</code>. Both mean the same thing: &#8220;match 0 to 4 <code>M</code> characters&#8221;. Similarly, <code>C?C?C?</code> became <code>C{0,3}</code> (&#8220;match 0 to 3 <code>C</code> characters&#8221;) and so forth for <code>X</code> and <code>I</code>.
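<p>You don't have to take the equivalence on faith. As a rough check (bounded, so not a proof), this sketch tries every string of up to five Roman numeral characters against both patterns and reports whether they ever disagree. The <code>patterns_agree</code> helper and the length limit are assumptions made for this sketch.

```python
import re
from itertools import product

OLD = re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')
NEW = re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')


def patterns_agree(max_length=5):
    """Return True if both patterns accept and reject the same short strings."""
    for length in range(max_length + 1):
        for letters in product('IVXLCDM', repeat=length):
            s = ''.join(letters)
            if bool(OLD.search(s)) != bool(NEW.search(s)):
                return False
    return True


print(patterns_agree())  # True
```

<p>The unit tests remain the real safety net; this is just a quick way to convince yourself that the rewrite changed the spelling of the pattern and not its meaning.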
<p>This form of the regular expression is a little shorter (though not any more readable). The big question is, is it any faster?
<div class=example><h3>Example 15.14. Output of <code>romantest82.py</code> against <code>roman82.py</code></h3><pre class=screen><samp>.............
----------------------------------------------------------------------
Ran 13 tests in 3.315s </span><span>&#x2460;</span><samp>
OK</span> <span>&#x2461;</span></pre><div class=calloutlist>
<ol>
<li>Overall, the unit tests run 2% faster with this form of regular expression. That doesn't sound exciting, but remember that
the <code>search</code> function is a small part of the overall unit test; most of the time is spent doing other things. (Separately, I time-tested
just the regular expressions, and found that the <code>search</code> function is <code>11%</code> faster with this syntax.) By precompiling the regular expression and rewriting part of it to use this new syntax, you've
improved the regular expression performance by over <code>60%</code>, and improved the overall performance of the entire unit test by over <code>10%</code>.
<li>More important than any performance boost is the fact that the module still works perfectly. This is the freedom I was talking
about earlier: the freedom to tweak, change, or rewrite any piece of it and verify that you haven't messed anything up in
the process. This is not a license to endlessly tweak your code just for the sake of tweaking it; you had a very specific
objective (&#8220;make <code>from_roman()</code> faster&#8221;), and you were able to accomplish that objective without any lingering doubts about whether you introduced new bugs in the
process.
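<p>The claim that the two spellings mean the same thing can be checked directly, and the <code>search</code> call can be timed on its own. The following is a minimal sketch, not the measurement used in this chapter: the sample strings, the helper names <code>agree</code> and <code>time_pattern</code>, and the repeat count are all assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import re
import timeit

# The two spellings of the pattern discussed above.
OLD = re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')
NEW = re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

# Hypothetical sample inputs: a mix of valid numerals and invalid strings.
SAMPLES = ['', 'I', 'IV', 'MCMXCIII', 'MMMMDCCCLXXXVIII',
           'MMMMM', 'IIII', 'VX', 'abc']


def agree(samples):
    """Return True if both patterns accept and reject the same strings."""
    return all(bool(OLD.search(s)) == bool(NEW.search(s)) for s in samples)


def time_pattern(pattern, samples, repeat=2000):
    """Time `repeat` passes of pattern.search over every sample string."""
    return timeit.timeit(lambda: [pattern.search(s) for s in samples],
                         number=repeat)


if __name__ == '__main__':
    print('patterns agree:', agree(SAMPLES))
    print('old: {0:.4f}s'.format(time_pattern(OLD, SAMPLES)))
    print('new: {0:.4f}s'.format(time_pattern(NEW, SAMPLES)))
```

<p>A check like this only covers the strings you feed it; the unit tests remain the real proof that nothing broke.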
<p>One other tweak I would like to make, and then I promise I'll stop refactoring and put this module to bed. As you've seen
repeatedly, regular expressions can get pretty hairy and unreadable pretty quickly. I wouldn't like to come back to this
module in six months and try to maintain it. Sure, the test cases pass, so I know that it works, but if I can't figure out
<em>how</em> it works, it's still going to be difficult to add new features, fix new bugs, or otherwise maintain it. As you saw in <a href="#re.verbose" title="7.5. Verbose Regular Expressions">Section 7.5, &#8220;Verbose Regular Expressions&#8221;</a>, Python provides a way to document your logic line-by-line.
<div class=example><h3>Example 15.15. <code>roman83.py</code></h3>
<p>This file is available in <code>py/roman/stage8/</code> in the examples directory.
<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
# rest of program omitted for clarity
#old version
#romanNumeralPattern = \
# re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')
#new version
romanNumeralPattern = re.compile('''
^ # beginning of string
M{0,4} # thousands - 0 to 4 M's
(CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
# or 500-800 (D, followed by 0 to 3 C's)
(XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
# or 50-80 (L, followed by 0 to 3 X's)
(IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
# or 5-8 (V, followed by 0 to 3 I's)
$ # end of string
''', re.VERBOSE) <span>&#x2460;</span>
</pre><div class=calloutlist>
<ol>
<li>The <code>re.compile</code> function can take an optional second argument, which is a set of one or more flags that control various options about the
compiled regular expression. Here you're specifying the <code>re.VERBOSE</code> flag, which tells Python that there are in-line comments within the regular expression itself. The comments and all the whitespace around them are
<em>not</em> considered part of the regular expression; the <code>re.compile</code> function simply strips them all out when it compiles the expression. This new, &#8220;verbose&#8221; version is identical to the old version, but it is infinitely more readable.
<div class=example><h3>Example 15.16. Output of <code>romantest83.py</code> against <code>roman83.py</code></h3><pre class=screen><samp>.............
----------------------------------------------------------------------
Ran 13 tests in 3.315s </span><span>&#x2460;</span><samp>
OK</span> <span>&#x2461;</span></pre><div class=calloutlist>
<ol>
<li>This new, &#8220;verbose&#8221; version runs at exactly the same speed as the old version. In fact, the two compiled patterns match exactly the same strings, since the
<code>re.compile</code> function strips out all the stuff you added.
<li>This new, &#8220;verbose&#8221; version passes all the same tests as the old version. Nothing has changed, except that the programmer who comes back to
this module in six months stands a fighting chance of understanding how the function works.
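<p>As a rough illustration of what the <code>re.VERBOSE</code> flag does, the sketch below compiles the compact and the commented spelling of the pattern and confirms that they give the same verdict on a handful of strings. The names <code>COMPACT</code>, <code>VERBOSE</code>, and <code>same_verdict</code>, and the sample strings, are assumptions made for this example only.

```python
import re

# Compact spelling, as in the "old version" above.
COMPACT = re.compile(
    '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

# Commented spelling; whitespace and comments are ignored by re.VERBOSE.
VERBOSE = re.compile('''
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    ''', re.VERBOSE)


def same_verdict(s):
    """Return True if both patterns agree on whether s is a match."""
    return bool(COMPACT.search(s)) == bool(VERBOSE.search(s))


if __name__ == '__main__':
    for s in ('MCMXCIII', 'MMMMCMXCIX', 'IIII', 'XCX', 'hello'):
        print(repr(s), same_verdict(s))
```

<p>Agreement on a few samples is only a spot check, which is why the full test suite is run again after the rewrite.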
<h2 id="roman.postscript">15.4. Postscript</h2>
<p>A clever reader read the <a href="#roman.refactoring" title="15.3. Refactoring">previous section</a> and took it to the next level. The biggest headache (and performance drain) in the program as it is currently written is
the regular expression, which is required because you have no other way of breaking down a Roman numeral. But there are fewer
than 5000 of them; why don't you just build a lookup table once, then simply read that? This idea gets even better when you realize
that you don't need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals,
you can build the reverse lookup table to convert Roman numerals to integers.
<p>And best of all, he already had a complete set of unit tests. He changed over half the code in the module, but the unit tests
stayed the same, so he could prove that his code worked just as well as the original.
<div class=example><h3>Example 15.17. <code>roman9.py</code></h3>
<p>This file is available in <code>py/roman/stage9/</code> in the examples directory.
<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass
#Roman numerals must be less than 5000
MAX_ROMAN_NUMERAL = 4999
#Define digit mapping
romanNumeralMap = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
#Create tables for fast conversion of roman numerals.
#See fillLookupTables() below.
to_romanTable = [ None ] # Skip an index since Roman numerals have no zero
from_romanTable = {}
def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 &lt; n &lt;= MAX_ROMAN_NUMERAL):
        raise OutOfRangeError("number out of range (must be 1..{0})".format(MAX_ROMAN_NUMERAL))
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")
    return to_romanTable[n]

def from_roman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError("Input can not be blank")
    if s not in from_romanTable:
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
    return from_romanTable[s]

def to_romanDynamic(n):
    """convert integer to Roman numeral using dynamic programming"""
    result = ""
    for numeral, integer in romanNumeralMap:
        if n >= integer:
            result = numeral
            n -= integer
            break
    if n > 0:
        result += to_romanTable[n]
    return result

def fillLookupTables():
    """compute all the possible roman numerals"""
    #Save the values in two global tables to convert to and from integers.
    for integer in range(1, MAX_ROMAN_NUMERAL + 1):
        romanNumber = to_romanDynamic(integer)
        to_romanTable.append(romanNumber)
        from_romanTable[romanNumber] = integer

fillLookupTables()
</pre><p>So how fast is it?
<div class=example><h3>Example 15.18. Output of <code>romantest9.py</code> against <code>roman9.py</code></h3><pre class=screen>
<samp>
.............
----------------------------------------------------------------------
Ran 13 tests in 0.791s
OK
</span>
</pre><p>Remember, the best performance you ever got in the original version was 13 tests in 3.315 seconds. Of course, it's not entirely
a fair comparison, because this version will take longer to import (when it fills the lookup tables). But since import is
only done once, this is negligible in the long run.
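<p>To get a feel for the one-time cost of filling the tables, you could time the build step by itself. The sketch below is a self-contained stand-in for the module above, not the reader's code: <code>build_tables</code> and <code>timed_build</code> are hypothetical names, and the elapsed time it reports depends entirely on your machine.

```python
import time

ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))


def build_tables(limit=4999):
    """Build integer->numeral and numeral->integer tables for 1..limit."""
    to_table = [None]   # index 0 is unused: Roman numerals have no zero
    from_table = {}
    for n in range(1, limit + 1):
        for numeral, value in ROMAN_NUMERAL_MAP:
            if n >= value:
                # Take the largest symbol that fits, then reuse the entry
                # already computed for the (smaller) remainder.
                rest = n - value
                roman = numeral + (to_table[rest] if rest else '')
                break
        to_table.append(roman)
        from_table[roman] = n
    return to_table, from_table


def timed_build():
    """Return the tables and the seconds spent building them."""
    start = time.perf_counter()
    tables = build_tables()
    return tables, time.perf_counter() - start


if __name__ == '__main__':
    (to_table, from_table), elapsed = timed_build()
    print('built {0} entries in {1:.4f}s'.format(len(from_table), elapsed))
```

<p>Because each entry reuses an earlier one, the build does a small, fixed amount of work per number, which is why the import penalty stays small.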
<p>The moral of the story?
<div class=itemizedlist>
<ul>
<li>Simplicity is a virtue.
<li>Especially when regular expressions are involved.
<li>And unit tests can give you the confidence to do large-scale refactoring... even if you didn't write the original code.
</ul>
<h2 id="roman.summary">15.5. Summary</h2>
<p>Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility
in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver,
or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers
are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional
testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it
work, you'll wonder how you ever got along without it.
<p>This chapter covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts:
<div class=highlights>
<div class=itemizedlist>
<ul>
<li>Designing test cases that are specific, automated, and independent
<li>Writing test cases <em>before</em> the code they are testing
<li>Writing tests that <a href="#roman.success" title="13.4. Testing for success">test good input</a> and check for proper results
<li>Writing tests that <a href="#roman.failure" title="13.5. Testing for failure">test bad input</a> and check for proper failures
<li>Writing and updating test cases to <a href="#roman.bugs" title="15.1. Handling bugs">illustrate bugs</a> or <a href="#roman.change" title="15.2. Handling changing requirements">reflect new requirements</a>
<li><a href="#roman.refactoring" title="15.3. Refactoring">Refactoring</a> mercilessly to improve performance, scalability, readability, maintainability, or whatever other -ility you're lacking
</ul>
<p>Additionally, you should be comfortable doing all of the following Python-specific things:
<div class=highlights>
<div class=itemizedlist>
<ul>
<li><a href="#roman.testtoromanknownvalues.example" title="Example 13.2. testToRomanKnownValues">Subclassing <code>unittest.TestCase</code></a> and writing methods for individual test cases
<li>Using <a href="#roman.testtoromanknownvalues.example" title="Example 13.2. testToRomanKnownValues"><code>assertEqual</code></a> to check that a function returns a known value
<li>Using <a href="#roman.tobadinput.example" title="Example 13.3. Testing bad input to to_roman"><code>assertRaises</code></a> to check that a function raises a known exception
<li>Calling <a href="#roman.stage1.output" title="Example 14.2. Output of romantest1.py against roman1.py"><code>unittest.main()</code></a> in your <code>if __name__</code> clause to run all your test cases at once
<li>Running unit tests in <a href="#roman.stage1.output" title="Example 14.2. Output of romantest1.py against roman1.py">verbose</a> or <a href="#roman.stage8.1.output" title="Example 15.12. Output of romantest81.py against roman81.py">regular</a> mode
</ul>
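<p>The Python-specific items above fit into a few lines. This is a minimal sketch under stated assumptions: the tiny <code>to_roman</code> here is a stand-in written for the example (it only checks the range), and <code>ToRomanTests</code> and <code>run_tests</code> are hypothetical names. In a real test script you would normally call <code>unittest.main()</code> in the <code>if __name__</code> clause instead of building the suite by hand.

```python
import unittest


class OutOfRangeError(ValueError):
    pass


def to_roman(n):
    """Stand-in converter for this example; checks the range only."""
    if not (0 < n < 5000):
        raise OutOfRangeError('number out of range (must be 1..4999)')
    result = ''
    for numeral, value in (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                           ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                           ('X', 10), ('IX', 9), ('V', 5), ('IV', 4),
                           ('I', 1)):
        while n >= value:
            result += numeral
            n -= value
    return result


class ToRomanTests(unittest.TestCase):
    def test_known_values(self):
        """to_roman should give known result with known input"""
        for integer, numeral in ((1, 'I'), (4, 'IV'), (1993, 'MCMXCIII'),
                                 (4999, 'MMMMCMXCIX')):
            self.assertEqual(numeral, to_roman(integer))

    def test_zero(self):
        """to_roman should fail with 0 input"""
        self.assertRaises(OutOfRangeError, to_roman, 0)

    def test_too_large(self):
        """to_roman should fail with large input"""
        self.assertRaises(OutOfRangeError, to_roman, 5000)


def run_tests():
    """Load and run the test case in verbose mode; return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == '__main__':
    run_tests()
```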
<div class=itemizedlist>
<h3>Further reading</h3>
<ul>
<li><a href="http://www.xprogramming.com/">XProgramming.com</a> has links to <a href="http://www.xprogramming.com/software.htm">download unit testing frameworks</a> for many different languages.
</ul>
<div class=chapter>
<h2 id="regression">Chapter 16. Functional Programming</h2>
<h2 id="regression.divein">16.1. Diving in</h2>
<p>In <a href="#roman" title="Chapter 13. Unit Testing">Chapter 13, <i>Unit Testing</i></a>, you learned about the philosophy of unit testing. In <a href="#roman1.5" title="Chapter 14. Test-First Programming">Chapter 14, <i>Test-First Programming</i></a>, you stepped through the implementation of basic unit tests in Python. In <a href="#roman2" title="Chapter 15. Refactoring">Chapter 15, <i>Refactoring</i></a>, you saw how unit testing makes large-scale refactoring easier. This chapter will build on those sample programs, but here
+3 -2
View File
@@ -65,7 +65,7 @@ abbr{font-variant:small-caps;text-transform:lowercase;letter-spacing:0.1em}
p,ul,ol{margin:1.75em 0;font-size:medium}
/* basics */
html{background:#fff;color:#333}
html{background:#fff;color:#222}
body{margin:1.75em 28px}
form div{float:right}
.c{text-align:center;margin:2.154em 0}
@@ -84,7 +84,8 @@ pre{white-space:pre-wrap;padding-left:2.154em;border-left:1px solid #ddd}
.b,ol,p,blockquote,h1,h2,h3{clear:left}
pre a,.w a{padding:0.4375em 0}
.w a{text-decoration:underline}
kbd{font-weight:bold}
kbd,mark{font-weight:bold}
mark{display:inline-block;width:100%;background:#ff8}
.p{color:#667}
/* overrides */
+87
View File
@@ -0,0 +1,87 @@
"""Convert to and from Roman numerals
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
class InvalidRomanNumeralError(ValueError): pass
roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
to_roman_table = [ None ]
from_roman_table = {}
def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 5000):
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")
    return to_roman_table[n]


def from_roman(s):
    """convert Roman numeral to integer"""
    if not isinstance(s, str):
        raise InvalidRomanNumeralError("Input must be a string")
    if not s:
        raise InvalidRomanNumeralError("Input can not be blank")
    if s not in from_roman_table:
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
    return from_roman_table[s]


def build_lookup_tables():
    def to_roman(n):
        result = ""
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        roman_numeral = to_roman(integer)
        to_roman_table.append(roman_numeral)
        from_roman_table[roman_numeral] = integer


build_lookup_tables()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+2 -2
View File
@@ -52,8 +52,8 @@ def to_roman(n):
def from_roman(s):
"""convert Roman numeral to integer"""
if not s:
raise InvalidRomanNumeralError("Input can not be blank")
if not isinstance(s, str):
raise InvalidRomanNumeralError("Input must be a string")
if not roman_numeral_pattern.search(s):
raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
+91
View File
@@ -0,0 +1,91 @@
"""Convert to and from Roman numerals
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
import re
class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
class InvalidRomanNumeralError(ValueError): pass
roman_numeral_map = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
roman_numeral_pattern = re.compile("""
^ # beginning of string
M{0,4} # thousands - 0 to 4 M's
(CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
# or 500-800 (D, followed by 0 to 3 C's)
(XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
# or 50-80 (L, followed by 0 to 3 X's)
(IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
# or 5-8 (V, followed by 0 to 3 I's)
$ # end of string
""", re.VERBOSE)
def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 5000):
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if not isinstance(n, int):
        raise NotIntegerError("non-integers can not be converted")

    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result


def from_roman(s):
    """convert Roman numeral to integer"""
    if not isinstance(s, str):
        raise InvalidRomanNumeralError("Input must be a string")
    if not s:
        raise InvalidRomanNumeralError("Input can not be blank")
    if not roman_numeral_pattern.search(s):
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))

    result = 0
    index = 0
    for numeral, integer in roman_numeral_map:
        while s[index : index + len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+159
View File
@@ -0,0 +1,159 @@
"""Unit test for roman10.py
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
import roman10
import unittest


class KnownValues(unittest.TestCase):
    known_values = ( (1, 'I'),
                     (2, 'II'),
                     (3, 'III'),
                     (4, 'IV'),
                     (5, 'V'),
                     (6, 'VI'),
                     (7, 'VII'),
                     (8, 'VIII'),
                     (9, 'IX'),
                     (10, 'X'),
                     (50, 'L'),
                     (100, 'C'),
                     (500, 'D'),
                     (1000, 'M'),
                     (31, 'XXXI'),
                     (148, 'CXLVIII'),
                     (294, 'CCXCIV'),
                     (312, 'CCCXII'),
                     (421, 'CDXXI'),
                     (528, 'DXXVIII'),
                     (621, 'DCXXI'),
                     (782, 'DCCLXXXII'),
                     (870, 'DCCCLXX'),
                     (941, 'CMXLI'),
                     (1043, 'MXLIII'),
                     (1110, 'MCX'),
                     (1226, 'MCCXXVI'),
                     (1301, 'MCCCI'),
                     (1485, 'MCDLXXXV'),
                     (1509, 'MDIX'),
                     (1607, 'MDCVII'),
                     (1754, 'MDCCLIV'),
                     (1832, 'MDCCCXXXII'),
                     (1993, 'MCMXCIII'),
                     (2074, 'MMLXXIV'),
                     (2152, 'MMCLII'),
                     (2212, 'MMCCXII'),
                     (2343, 'MMCCCXLIII'),
                     (2499, 'MMCDXCIX'),
                     (2574, 'MMDLXXIV'),
                     (2646, 'MMDCXLVI'),
                     (2723, 'MMDCCXXIII'),
                     (2892, 'MMDCCCXCII'),
                     (2975, 'MMCMLXXV'),
                     (3051, 'MMMLI'),
                     (3185, 'MMMCLXXXV'),
                     (3250, 'MMMCCL'),
                     (3313, 'MMMCCCXIII'),
                     (3408, 'MMMCDVIII'),
                     (3501, 'MMMDI'),
                     (3610, 'MMMDCX'),
                     (3743, 'MMMDCCXLIII'),
                     (3844, 'MMMDCCCXLIV'),
                     (3888, 'MMMDCCCLXXXVIII'),
                     (3940, 'MMMCMXL'),
                     (3999, 'MMMCMXCIX'),
                     (4000, 'MMMM'),
                     (4500, 'MMMMD'),
                     (4888, 'MMMMDCCCLXXXVIII'),
                     (4999, 'MMMMCMXCIX'))

    def test_to_roman_known_values(self):
        """to_roman should give known result with known input"""
        for integer, numeral in self.known_values:
            result = roman10.to_roman(integer)
            self.assertEqual(numeral, result)

    def test_from_roman_known_values(self):
        """from_roman should give known result with known input"""
        for integer, numeral in self.known_values:
            result = roman10.from_roman(numeral)
            self.assertEqual(integer, result)


class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        """to_roman should fail with large input"""
        self.assertRaises(roman10.OutOfRangeError, roman10.to_roman, 5000)

    def test_zero(self):
        """to_roman should fail with 0 input"""
        self.assertRaises(roman10.OutOfRangeError, roman10.to_roman, 0)

    def test_negative(self):
        """to_roman should fail with negative input"""
        self.assertRaises(roman10.OutOfRangeError, roman10.to_roman, -1)

    def test_non_integer(self):
        """to_roman should fail with non-integer input"""
        self.assertRaises(roman10.NotIntegerError, roman10.to_roman, 0.5)


class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        """from_roman should fail with too many repeated numerals"""
        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
            self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, s)

    def test_repeated_pairs(self):
        """from_roman should fail with repeated pairs of numerals"""
        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):
            self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, s)

    def test_malformed_antecedents(self):
        """from_roman should fail with malformed antecedents"""
        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
                  'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
            self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, s)

    def test_blank(self):
        """from_roman should fail with blank string"""
        self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, "")

    def test_non_string(self):
        """from_roman should fail with non-string input"""
        self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, 1)


class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        """from_roman(to_roman(n))==n for all n"""
        for integer in range(1, 5000):
            numeral = roman10.to_roman(integer)
            result = roman10.from_roman(numeral)
            self.assertEqual(integer, result)


if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+2 -2
View File
@@ -95,8 +95,8 @@ class ToRomanBadInput(unittest.TestCase):
"""to_roman should fail with non-integer input"""
self.assertRaises(roman5.NotIntegerError, roman5.to_roman, 0.5)
class SanityCheck(unittest.TestCase):
def testSanity(self):
class RoundtripCheck(unittest.TestCase):
def test_roundtrip(self):
"""from_roman(to_roman(n))==n for all n"""
for integer in range(1, 4000):
numeral = roman5.to_roman(integer)
+3 -3
View File
@@ -98,7 +98,7 @@ class ToRomanBadInput(unittest.TestCase):
class FromRomanBadInput(unittest.TestCase):
def test_too_many_repeated_numerals(self):
"""from_roman should fail with too many repeated numerals"""
for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)
def test_repeated_pairs(self):
@@ -112,8 +112,8 @@ class FromRomanBadInput(unittest.TestCase):
'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)
class SanityCheck(unittest.TestCase):
def testSanity(self):
class RoundtripCheck(unittest.TestCase):
def test_roundtrip(self):
"""from_roman(to_roman(n))==n for all n"""
for integer in range(1, 4000):
numeral = roman6.to_roman(integer)
+6 -6
View File
@@ -98,7 +98,7 @@ class ToRomanBadInput(unittest.TestCase):
class FromRomanBadInput(unittest.TestCase):
def test_too_many_repeated_numerals(self):
"""from_roman should fail with too many repeated numerals"""
for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, s)
def test_repeated_pairs(self):
@@ -112,12 +112,12 @@ class FromRomanBadInput(unittest.TestCase):
'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, s)
def test_blank(self):
"""from_roman should fail with blank string"""
self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, "")
def test_non_string(self):
"""from_roman should fail with non-string input"""
self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, 1)
class SanityCheck(unittest.TestCase):
def testSanity(self):
class RoundtripCheck(unittest.TestCase):
def test_roundtrip(self):
"""from_roman(to_roman(n))==n for all n"""
for integer in range(1, 4000):
numeral = roman7.to_roman(integer)
+3 -3
View File
@@ -98,7 +98,7 @@ class ToRomanBadInput(unittest.TestCase):
class FromRomanBadInput(unittest.TestCase):
def test_too_many_repeated_numerals(self):
"""from_roman should fail with too many repeated numerals"""
for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, s)
def test_repeated_pairs(self):
@@ -120,8 +120,8 @@ class FromRomanBadInput(unittest.TestCase):
"""from_roman should fail with non-string input"""
self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, 1)
class SanityCheck(unittest.TestCase):
def testSanity(self):
class RoundtripCheck(unittest.TestCase):
def test_roundtrip(self):
"""from_roman(to_roman(n))==n for all n"""
for integer in range(1, 4000):
numeral = roman8.to_roman(integer)
+159
View File
@@ -0,0 +1,159 @@
"""Unit test for roman9.py
This program is part of "Dive Into Python 3", a free Python book for
experienced programmers. Visit http://diveintopython3.org/ for the
latest version.
"""
import roman9
import unittest


class KnownValues(unittest.TestCase):
    known_values = ( (1, 'I'),
                     (2, 'II'),
                     (3, 'III'),
                     (4, 'IV'),
                     (5, 'V'),
                     (6, 'VI'),
                     (7, 'VII'),
                     (8, 'VIII'),
                     (9, 'IX'),
                     (10, 'X'),
                     (50, 'L'),
                     (100, 'C'),
                     (500, 'D'),
                     (1000, 'M'),
                     (31, 'XXXI'),
                     (148, 'CXLVIII'),
                     (294, 'CCXCIV'),
                     (312, 'CCCXII'),
                     (421, 'CDXXI'),
                     (528, 'DXXVIII'),
                     (621, 'DCXXI'),
                     (782, 'DCCLXXXII'),
                     (870, 'DCCCLXX'),
                     (941, 'CMXLI'),
                     (1043, 'MXLIII'),
                     (1110, 'MCX'),
                     (1226, 'MCCXXVI'),
                     (1301, 'MCCCI'),
                     (1485, 'MCDLXXXV'),
                     (1509, 'MDIX'),
                     (1607, 'MDCVII'),
                     (1754, 'MDCCLIV'),
                     (1832, 'MDCCCXXXII'),
                     (1993, 'MCMXCIII'),
                     (2074, 'MMLXXIV'),
                     (2152, 'MMCLII'),
                     (2212, 'MMCCXII'),
                     (2343, 'MMCCCXLIII'),
                     (2499, 'MMCDXCIX'),
                     (2574, 'MMDLXXIV'),
                     (2646, 'MMDCXLVI'),
                     (2723, 'MMDCCXXIII'),
                     (2892, 'MMDCCCXCII'),
                     (2975, 'MMCMLXXV'),
                     (3051, 'MMMLI'),
                     (3185, 'MMMCLXXXV'),
                     (3250, 'MMMCCL'),
                     (3313, 'MMMCCCXIII'),
                     (3408, 'MMMCDVIII'),
                     (3501, 'MMMDI'),
                     (3610, 'MMMDCX'),
                     (3743, 'MMMDCCXLIII'),
                     (3844, 'MMMDCCCXLIV'),
                     (3888, 'MMMDCCCLXXXVIII'),
                     (3940, 'MMMCMXL'),
                     (3999, 'MMMCMXCIX'),
                     (4000, 'MMMM'),
                     (4500, 'MMMMD'),
                     (4888, 'MMMMDCCCLXXXVIII'),
                     (4999, 'MMMMCMXCIX'))

    def test_to_roman_known_values(self):
        """to_roman should give known result with known input"""
        for integer, numeral in self.known_values:
            result = roman9.to_roman(integer)
            self.assertEqual(numeral, result)

    def test_from_roman_known_values(self):
        """from_roman should give known result with known input"""
        for integer, numeral in self.known_values:
            result = roman9.from_roman(numeral)
            self.assertEqual(integer, result)


class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        """to_roman should fail with large input"""
        self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, 5000)

    def test_zero(self):
        """to_roman should fail with 0 input"""
        self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, 0)

    def test_negative(self):
        """to_roman should fail with negative input"""
        self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, -1)

    def test_non_integer(self):
        """to_roman should fail with non-integer input"""
        self.assertRaises(roman9.NotIntegerError, roman9.to_roman, 0.5)


class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        """from_roman should fail with too many repeated numerals"""
        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
            self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s)

    def test_repeated_pairs(self):
        """from_roman should fail with repeated pairs of numerals"""
        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):
            self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s)

    def test_malformed_antecedents(self):
        """from_roman should fail with malformed antecedents"""
        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
                  'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
            self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s)

    def test_blank(self):
        """from_roman should fail with blank string"""
        self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, "")

    def test_non_string(self):
        """from_roman should fail with non-string input"""
        self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, 1)


class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        """from_roman(to_roman(n))==n for all n"""
        for integer in range(1, 5000):
            numeral = roman9.to_roman(integer)
            result = roman9.from_roman(numeral)
            self.assertEqual(integer, result)


if __name__ == "__main__":
    unittest.main()
# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
+2 -2
View File
@@ -14,7 +14,7 @@ h1:before{content:""}
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8><input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here:&nbsp;&nbsp;<span title="Ce n'est pas un point">&bull;</span>
<p>You are here:&nbsp;&nbsp;<span title="Ce n'est pas un point" style="cursor:default">&bull;</span>
<h1>Dive Into Python 3</h1>
@@ -33,7 +33,7 @@ h1:before{content:""}
<li class=todo>Objects and object-orientation
<li><a href=unit-testing.html>Unit Testing</a>
<li class=todo>Test-first programming
<li class=todo>Refactoring your code
<li><a href=refactoring.html>Refactoring</a>
<li class=todo>Files
<li class=todo>HTML processing
<li class=todo>XML processing
+468
View File
@@ -0,0 +1,468 @@
<!DOCTYPE html>
<head>
<meta charset=utf-8>
<title>Refactoring - Dive into Python 3</title>
<link rel=stylesheet type=text/css href=dip3.css>
<style>
body{counter-reset:h1 10}
</style>
<link rel=stylesheet type=text/css media='only screen and (max-device-width: 480px)' href=mobile.css>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span>&#8227;</span> <a href=table-of-contents.html#refactoring>Dive Into Python 3</a> <span>&#8227;</span>
<h1>Refactoring</h1>
<blockquote class=q>
<p><span>&#x275D;</span> FIXME <span>&#x275E;</span><br>&mdash; FIXME
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
<p class=f>Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by &#8220;bug&#8221;? A bug is a test case you haven't written yet.
<pre class=screen><samp class=p>>>> </samp><kbd>import roman6</kbd>
<a><samp class=p>>>> </samp><kbd>roman6.from_roman("")</kbd> <span>&#x2460;</span></a>
<samp>0</samp></pre>
<ol>
<li>Remember in the [FIXME-xref] previous section when you kept seeing that an empty string would match the regular expression you were using to check for valid Roman numerals? Well, it turns out that this is still true for the final version of the regular expression. And that's a bug; you want an empty string to raise an <code>InvalidRomanNumeralError</code> exception just like any other sequence of characters that doesn't represent a valid Roman numeral.
</ol>
<p>After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.
<pre><code>class FromRomanBadInput(unittest.TestCase):
    .
    .
    .
    def test_blank(self):
        """from_roman should fail with blank string"""
<a>        self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, "") <span>&#x2460;</span></a></code></pre>
<ol>
<li>Pretty simple stuff here. Call <code>from_roman()</code> with an empty string and make sure it raises an <code>InvalidRomanNumeralError</code> exception. The hard part was finding the bug; now that you know about it, testing for it is the easy part.
</ol>
<p>Since your code has a bug, and you now have a test case that tests this bug, the test case will fail:
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest8.py -v</kbd>
<samp>from_roman should fail with blank string ... FAIL
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
======================================================================
FAIL: from_roman should fail with blank string
----------------------------------------------------------------------
Traceback (most recent call last):
File "romantest8.py", line 117, in test_blank
self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, "")
<mark>AssertionError: InvalidRomanNumeralError not raised by from_roman</mark>
----------------------------------------------------------------------
Ran 11 tests in 0.171s
FAILED (failures=1)</samp></pre>
<p><em>Now</em> you can fix the bug.
<pre><code>def from_roman(s):
    """convert Roman numeral to integer"""
<a>    if not s:                                                                  <span>&#x2460;</span></a>
        raise InvalidRomanNumeralError("Input can not be blank")
    if not re.search(roman_numeral_pattern, s):
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
    result = 0
    index = 0
    for numeral, integer in roman_numeral_map:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result</code></pre>
<ol>
<li>Only two lines of code are required: an explicit check for an empty string, and a <code>raise</code> statement.
</ol>
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest8.py -v</kbd>
<a><samp>from_roman should fail with blank string ... ok</samp> <span>&#x2460;</span></a>
<samp>from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 11 tests in 0.156s
</samp>
<a><samp>OK</samp> <span>&#x2461;</span></a></pre>
<ol>
<li>The blank string test case now passes, so the bug is fixed.
<li>All the other test cases still pass, which means that this bug fix didn't break anything else. Stop coding.
</ol>
<p>Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may <em>seem</em> like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn't pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run <em>all</em> the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test is tomorrow's regression test.
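<p>The whole cycle (reproduce the bug, write the failing test, fix the code, re-run everything) can be sketched in one self-contained file. This is only an illustration, not the book's <code>roman</code> module: the names <code>ROMAN_PATTERN</code>, <code>ROMAN_MAP</code>, <code>BlankInputRegression</code> and <code>run_regression_suite</code> are made up for this sketch, and the pattern is the pre-expansion one (up to three <code>M</code> characters).

```python
import re
import unittest


class InvalidRomanNumeralError(ValueError):
    pass


# Hypothetical stand-ins for the module's pattern and map (range 1..3999).
ROMAN_PATTERN = re.compile(
    r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")
ROMAN_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
             ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
             ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))


def from_roman(s):
    """convert Roman numeral to integer"""
    if not s:  # the two-line fix: the pattern alone accepts ""
        raise InvalidRomanNumeralError("Input can not be blank")
    if not ROMAN_PATTERN.search(s):
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
    result = 0
    index = 0
    for numeral, integer in ROMAN_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result


class BlankInputRegression(unittest.TestCase):
    def test_blank(self):
        """from_roman should fail with blank string"""
        self.assertRaises(InvalidRomanNumeralError, from_roman, "")

    def test_known_value_still_works(self):
        """the fix should not break valid input"""
        self.assertEqual(from_roman("MCMXLIV"), 1944)


def run_regression_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BlankInputRegression)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_regression_suite()
print(result.wasSuccessful())
```

<p>Delete the <code>if not s</code> check and <code>test_blank</code> fails again, which is exactly what a regression test is for.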
<h2 id=changing-requirements>Handling Changing Requirements</h2>
<p>Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.
<p>Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember [FIXME-xref] the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 <code>M</code> characters in a row to represent <code>4000</code>. If you make this change, you'll be able to expand the range of convertible numbers from <code>1..3999</code> to <code>1..4999</code>. But first, you need to make some changes to your test cases.
<p class=d>[<a href=examples/roman8.py>download <code>roman8.py</code></a>]
<pre><code>
class KnownValues(unittest.TestCase):
    known_values = ( (1, 'I'),
                      .
                      .
                      .
                     (3999, 'MMMCMXCIX'),
<a>                     (4000, 'MMMM'),                                      <span>&#x2460;</span></a>
                     (4500, 'MMMMD'),
                     (4888, 'MMMMDCCCLXXXVIII'),
                     (4999, 'MMMMCMXCIX') )

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        """to_roman should fail with large input"""
<a>        self.assertRaises(roman8.OutOfRangeError, roman8.to_roman, 5000)  <span>&#x2461;</span></a>

    .
    .
    .

class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        """from_roman should fail with too many repeated numerals"""
<a>        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):     <span>&#x2462;</span></a>
            self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, s)

    .
    .
    .

class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        """from_roman(to_roman(n))==n for all n"""
<a>        for integer in range(1, 5000):                                    <span>&#x2463;</span></a>
            numeral = roman8.to_roman(integer)
            result = roman8.from_roman(numeral)
            self.assertEqual(integer, result)</code></pre>
<ol>
<li>The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the <code>4000</code> range. Here I've included <code>4000</code> (the shortest), <code>4500</code> (the second shortest), <code>4888</code> (the longest), and <code>4999</code> (the largest).
<li>The definition of &#8220;large input&#8221; has changed. This test used to call <code>to_roman()</code> with <code>4000</code> and expect an error; now that <code>4000-4999</code> are good values, you need to bump this up to <code>5000</code>.
<li>The definition of &#8220;too many repeated numerals&#8221; has also changed. This test used to call <code>from_roman()</code> with <code>'MMMM'</code> and expect an error; now that <code>MMMM</code> is considered a valid Roman numeral, you need to bump this up to <code>'MMMMM'</code>.
<li>The roundtrip check loops through every number in the range, from <code>1</code> to <code>3999</code>. Since the range has now expanded, this <code>for</code> loop needs to be updated as well to go up to <code>4999</code>.
</ol>
<p>Now your test cases are up to date with the new requirements, but your code is not, so you expect several of the test cases to fail.
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest9.py -v</kbd>
<samp>from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
<a>from_roman should give known result with known input ... ERROR <span>&#x2460;</span></a>
<a>to_roman should give known result with known input ... ERROR <span>&#x2461;</span></a>
<a>from_roman(to_roman(n))==n for all n ... ERROR <span>&#x2462;</span></a>
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
======================================================================
ERROR: from_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
File "romantest9.py", line 82, in test_from_roman_known_values
result = roman9.from_roman(numeral)
File "C:\home\diveintopython3\examples\roman9.py", line 60, in from_roman
raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
<mark>roman9.InvalidRomanNumeralError: Invalid Roman numeral: MMMM</mark>
======================================================================
ERROR: to_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
File "romantest9.py", line 76, in test_to_roman_known_values
result = roman9.to_roman(integer)
File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
raise OutOfRangeError("number out of range (must be 0..3999)")
<mark>roman9.OutOfRangeError: number out of range (must be 0..3999)</mark>
======================================================================
ERROR: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
File "romantest9.py", line 131, in test_roundtrip
numeral = roman9.to_roman(integer)
File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
raise OutOfRangeError("number out of range (must be 0..3999)")
<mark>roman9.OutOfRangeError: number out of range (must be 0..3999)</mark>
----------------------------------------------------------------------
Ran 12 tests in 0.171s
FAILED (errors=3)</samp></pre>
<ol>
<li>The <code>from_roman()</code> known values test will fail as soon as it hits <code>'MMMM'</code>, because <code>from_roman()</code> still thinks this is an invalid Roman numeral.
<li>The <code>to_roman()</code> known values test will fail as soon as it hits <code>4000</code>, because <code>to_roman()</code> still thinks this is out of range.
<li>The roundtrip check will also fail as soon as it hits <code>4000</code>, because <code>to_roman()</code> still thinks this is out of range.
</ol>
<p>Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being tested is never &#8220;ahead&#8221; of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding.)
<p class=d>[<a href=examples/roman9.py>download <code>roman9.py</code></a>]
<pre><code>
roman_numeral_pattern = re.compile("""
    ^                   # beginning of string
<a>    M{0,4}              # thousands - 0 to 4 M's  <span>&#x2460;</span></a>
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """, re.VERBOSE)

def to_roman(n):
    """convert integer to Roman numeral"""
<a>    if not (0 < n < 5000):                        <span>&#x2461;</span></a>
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if not isinstance(n, int):
        raise NotIntegerError("non-integers can not be converted")

    result = ""
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def from_roman(s):
    .
    .
    .</code></pre>
<ol>
<li>You don't need to make any changes to the <code>from_roman()</code> function at all. The only change is to <var>roman_numeral_pattern</var>. If you look closely, you'll notice that I changed the maximum number of optional <code>M</code> characters from <code>3</code> to <code>4</code> in the first section of the regular expression. This will allow the Roman numeral equivalents of <code>4999</code> instead of <code>3999</code>. The actual <code>from_roman()</code> function is completely generic; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn't handle <code>'MMMM'</code> before is that you explicitly stopped it with the regular expression pattern matching.
<li>The <code>to_roman()</code> function only needs one small change, in the range check. Where you used to check <code>0 &lt; n &lt; 4000</code>, you now check <code>0 &lt; n &lt; 5000</code>. And you change the error message that you <code>raise</code> to reflect the new acceptable range (<code>1..4999</code> instead of <code>1..3999</code>). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds <code>'M'</code> for each thousand that it finds; given <code>4000</code>, it will spit out <code>'MMMM'</code>. The only reason it didn't do this before is that you explicitly stopped it with the range check.)
</ol>
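<p>If you would rather poke at the pattern change in isolation, here is a small sketch. The pattern is the one shown above; the helper <code>is_valid()</code> is a made-up name for this illustration, and its explicit blank-string check mirrors the fix from the beginning of this chapter.

```python
import re

roman_numeral_pattern = re.compile("""
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """, re.VERBOSE)


def is_valid(s):
    """Hypothetical helper: True if s is a non-blank, well-formed Roman numeral."""
    return bool(s) and roman_numeral_pattern.search(s) is not None


for candidate in ("MMM", "MMMM", "MMMMCMXCIX", "MMMMM", ""):
    print(repr(candidate), is_valid(candidate))
```

<p>Four <code>M</code> characters now pass and five still fail, which is what the updated &#8220;too many repeated numerals&#8221; test expects.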
<p>You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself.
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest9.py -v</kbd>
<samp>from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 12 tests in 0.203s
<a>OK <span>&#x2460;</span></a></samp></pre>
<ol>
<li>All the test cases pass. Stop coding.
</ol>
<p>Comprehensive unit testing means never having to rely on a programmer who says &#8220;Trust me.&#8221;
<h2 id=refactoring>Refactoring</h2>
<p>The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually <em>prove</em> that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
<p>Refactoring is the process of taking working code and making it work better. Usually, &#8220;better&#8221; means &#8220;faster&#8221;, although it can also mean &#8220;using less memory&#8221;, or &#8220;using less disk space&#8221;, or simply &#8220;more elegantly&#8221;. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program.
<p>Here, &#8220;better&#8221; means both &#8220;faster&#8221; and &#8220;easier to maintain.&#8221; Specifically, the <code>from_roman()</code> function is slower and more complex than I'd like, because of that big nasty regular expression that you use to validate Roman numerals. Now, you might think, &#8220;Sure, the regular expression is big and hairy, but how else am I supposed to validate that an arbitrary string is a valid Roman numeral?&#8221;
<p>Answer: there are only 4,999 of them; why don't you just build a lookup table? This idea gets even better when you realize that <em>you don't need to use regular expressions at all</em>. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. &#8220;Validating&#8221; is reduced to a single dictionary lookup.
<p>And best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove &mdash; to yourself and to others &mdash; that the new code works just as well as the original.
<p class=d>[<a href=examples/roman10.py>download <code>roman10.py</code></a>]
<pre><code>class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
class InvalidRomanNumeralError(ValueError): pass

roman_numeral_map = (('M',  1000),
                     ('CM', 900),
                     ('D',  500),
                     ('CD', 400),
                     ('C',  100),
                     ('XC', 90),
                     ('L',  50),
                     ('XL', 40),
                     ('X',  10),
                     ('IX', 9),
                     ('V',  5),
                     ('IV', 4),
                     ('I',  1))

to_roman_table = [ None ]
from_roman_table = {}

def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 5000):
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")
    return to_roman_table[n]

def from_roman(s):
    """convert Roman numeral to integer"""
    if not isinstance(s, str):
        raise InvalidRomanNumeralError("Input must be a string")
    if not s:
        raise InvalidRomanNumeralError("Input can not be blank")
    if s not in from_roman_table:
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
    return from_roman_table[s]

def build_lookup_tables():
    def to_roman(n):
        result = ""
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        roman_numeral = to_roman(integer)
        to_roman_table.append(roman_numeral)
        from_roman_table[roman_numeral] = integer

build_lookup_tables()</code></pre>
<p>Let's break that down into digestible pieces. Arguably, the most important line is the last one:
<pre><code>build_lookup_tables()</code></pre>
<p>Note that this is a function call, but there's no <code>if</code> statement around it. This is not an <code>if __name__ == '__main__'</code> block; it gets called <em>when the module is imported</em>. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. So this code will only get called the first time you import this module.)
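<p>You can watch the import cache at work with a throwaway module. The sketch below is not part of the chapter's examples: the module name <code>demo_tables</code>, its <code>build_count</code> counter, and the helper <code>import_twice()</code> are all invented for the demonstration. It writes a tiny module to a temporary directory, imports it twice, and reports how many times the module-level call actually ran.

```python
import importlib
import os
import sys
import tempfile

# Source for a throwaway module (hypothetical name: demo_tables).
MODULE_SOURCE = """\
build_count = 0

def build_lookup_tables():
    global build_count
    build_count += 1

build_lookup_tables()   # module-level call, like the last line of roman10.py
"""


def import_twice():
    """Import the throwaway module twice; return (same object?, build count)."""
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "demo_tables.py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            first = importlib.import_module("demo_tables")
            # The second import is served from the sys.modules cache.
            second = importlib.import_module("demo_tables")
            return first is second, first.build_count
        finally:
            # Clean up so the demonstration leaves no trace behind.
            sys.path.remove(folder)
            sys.modules.pop("demo_tables", None)


same_object, builds = import_twice()
print(same_object, builds)
```

<p>Both imports hand back the same module object, and the counter stays at <code>1</code>: the module body ran only once.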
<p>So what does the <code>build_lookup_tables()</code> function do? I'm glad you asked.
<pre><code>to_roman_table = [ None ]
from_roman_table = {}
.
.
.
def build_lookup_tables():
<a>    def to_roman(n):                                <span>&#x2460;</span></a>
        result = ""
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
<a>        roman_numeral = to_roman(integer)          <span>&#x2461;</span></a>
<a>        to_roman_table.append(roman_numeral)       <span>&#x2462;</span></a>
        from_roman_table[roman_numeral] = integer</code></pre>
<ol>
<li>This is a clever bit of programming&hellip; perhaps too clever. The <code>to_roman()</code> function is defined above; it looks up values in the lookup table and returns them. But the <code>build_lookup_tables()</code> function redefines the <code>to_roman()</code> function to actually do work (like the previous examples did, before you added a lookup table). Within the <code>build_lookup_tables()</code> function, calling <code>to_roman()</code> will call this redefined version. Once the <code>build_lookup_tables()</code> function exits, the redefined version disappears &mdash; it is only defined in the local scope of the <code>build_lookup_tables()</code> function.
<li>This line of code will call the redefined <code>to_roman()</code> function, which actually calculates the Roman numeral.
<li>Once you have the result (from the redefined <code>to_roman()</code> function), you add the integer and its Roman numeral equivalent to both lookup tables.
</ol>
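<p>The scoping trick is easier to see in miniature. In this sketch (the names <code>table</code> and <code>build_table()</code> are invented, and the &#8220;conversion&#8221; is deliberately naive, so it is only right for 1 to 3), a function defined inside another function shadows the module-level function of the same name, but only inside that enclosing function.

```python
table = [None]


def to_roman(n):
    """Module-level version: nothing but a table lookup."""
    return table[n]


def build_table(limit):
    def to_roman(n):
        """Local version: does the real work, visible only inside build_table()."""
        return "I" * n  # naive on purpose; only correct for 1..3

    for i in range(1, limit + 1):
        # Inside build_table(), the name to_roman refers to the local function.
        table.append(to_roman(i))


build_table(3)

# Out here, to_roman is the module-level lookup again.
print(to_roman(2))
```

<p>The module-level <code>to_roman()</code> is never replaced; the local one simply takes precedence inside <code>build_table()</code> and is gone once that function returns.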
<p>Once the lookup tables are built, the rest of the code is both easy and fast.
<pre><code>def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 5000):
        raise OutOfRangeError("number out of range (must be 1..4999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")
<a>    return to_roman_table[n]                                            <span>&#x2460;</span></a>

def from_roman(s):
    """convert Roman numeral to integer"""
    if not isinstance(s, str):
        raise InvalidRomanNumeralError("Input must be a string")
    if not s:
        raise InvalidRomanNumeralError("Input can not be blank")
    if s not in from_roman_table:
        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
<a>    return from_roman_table[s]                                          <span>&#x2461;</span></a></code></pre>
<ol>
<li>After doing the same bounds checking as before, the <code>to_roman()</code> function simply finds the appropriate value in the lookup table and returns it.
<li>Similarly, the <code>from_roman()</code> function is reduced to some bounds checking and one line of code. No more regular expressions. No more looping. O(1) conversion to and from Roman numerals.
</ol>
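<p>To convince yourself that a dictionary lookup really can stand in for the regular expression, here is a compact sketch of the same idea. It is a variation, not the code from <code>roman10.py</code>: <code>build_tables()</code> is a made-up function that returns the two tables instead of filling in module-level ones.

```python
ROMAN_NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                     ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                     ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))


def build_tables(limit=4999):
    """Return (to_table, from_table) covering 1..limit."""
    to_table = [None]   # index 0 is unused, so to_table[n] lines up with n
    from_table = {}
    for n in range(1, limit + 1):
        for numeral, integer in ROMAN_NUMERAL_MAP:
            if n >= integer:
                rest = n - integer
                # The remainder is smaller than n, so it is already in the table.
                roman = numeral + (to_table[rest] if rest else "")
                break
        to_table.append(roman)
        from_table[roman] = n
    return to_table, from_table


to_table, from_table = build_tables()
print(to_table[4999])
print("MMMM" in from_table, "IIII" in from_table)
```

<p>Every string the old pattern would have rejected is simply absent from <code>from_table</code>, so a membership test is all the validation you need.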
<p>But does it work? Why yes, yes it does. And I can prove it.
<pre class=screen>
<samp class=p>you@localhost:~$ </samp><kbd>python3 romantest10.py -v</kbd>
<samp>from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
<a>Ran 12 tests in 0.031s <span>&#x2460;</span></a>
OK</samp></pre>
<ol>
<li>Not that you asked, but it's fast, too! Like, more than 6&times; as fast (0.031 seconds here versus 0.203 seconds for the previous version). Of course, it's not entirely a fair comparison, because this version takes longer to import (when it builds the lookup tables). But since the import is only done once, the startup cost is amortized over all the calls to the <code>to_roman()</code> and <code>from_roman()</code> functions. Since the tests make several thousand function calls (the roundtrip test alone makes nearly 10,000), these savings add up in a hurry!
</ol>
<p>The moral of the story?
<ul>
<li>Simplicity is a virtue.
<li>Especially when regular expressions are involved.
<li>Unit tests can give you the confidence to do large-scale refactoring.
</ul>
<h2 id=summary>Summary</h2>
<p>Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever got along without it.
<p>These few chapters have covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts:
<ul>
<li>Designing test cases that are specific, automated, and independent
<li>Writing test cases <em>before</em> the code they are testing
<li>Writing tests that test good input and check for proper results
<li>Writing tests that test bad input and check for proper failure responses
<li>Writing and updating test cases to reflect new requirements
<li>Refactoring mercilessly to improve performance, scalability, readability, maintainability, or whatever other -ility you're lacking
</ul>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=jquery.js></script>
<script src=dip3.js></script>
+10 -1
View File
@@ -5,7 +5,6 @@
<link rel=stylesheet type=text/css href=dip3.css>
<style>
body{counter-reset:h1 8}
mark{background:#ff8;font-weight:bold;line-height:2.154;text-decoration:none;font-style:normal;display:inline-block;width:100%}
</style>
<link rel=stylesheet type=text/css media='only screen and (max-device-width: 480px)' href=mobile.css>
</head>
@@ -544,6 +543,16 @@ For instance, the <code>testFromRomanCase</code> method (&#8220;<code>from_roman
<li><code>from_roman</code> should only accept uppercase Roman numerals (<i class=foreignphrase><abbr>i.e.</abbr></i> it should fail when given lowercase input).
</ol>
-->
<!--
<ol>
<li>The <code>re.compile</code> function can take an optional second argument, which is a set of one or more flags that control various options about the
compiled regular expression. Here you're specifying the <code>re.VERBOSE</code> flag, which tells Python that there are in-line comments within the regular expression itself. The comments and all the whitespace around them are
<em>not</em> considered part of the regular expression; the <code>re.compile</code> function simply strips them all out when it compiles the expression. This new, &#8220;verbose&#8221; version is identical to the old version, but it is infinitely more readable.
</ol>
-->
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=jquery.js></script>
<script src=dip3.js></script>