wrote advanced-unit-testing chapter, decided to merge it into unit-testing. renumbered chapters and fixed up TOC and navigation

2026-06-05 23:10:17 +00:00 · 2009-07-25 15:31:55 -04:00
parent 71821cfadc
commit e5b43fb442
19 changed files with 187 additions and 1933 deletions
@@ -537,7 +537,7 @@ OK</samp></pre>

 <p>But first, the tests. We&#8217;ll need a &#8220;known values&#8221; test to spot-check for accuracy. Our test suite already contains <a href=#romantest1>a mapping of known values</a>; let&#8217;s reuse that.

-<pre class=nd><code>    def test_from_roman_known_values(self):
+<pre class=nd><code class=pp>    def test_from_roman_known_values(self):
        '''from_roman should give known result with known input'''
        for integer, numeral in self.known_values:
            result = roman5.from_roman(numeral)
@@ -545,11 +545,11 @@ OK</samp></pre>

 <p>There&#8217;s a pleasing symmetry here. The <code>to_roman()</code> and <code>from_roman()</code> functions are inverses of each other. The first converts integers to specially-formatted strings, the second converts specially-formatted strings to integers. In theory, we should be able to &#8220;round-trip&#8221; a number by passing it to the <code>to_roman()</code> function to get a string, then passing that string to the <code>from_roman()</code> function to get an integer, and end up with the same number. In mathematical terms,

-<pre class=nd><code>x = f(g(x)) for all values of x</code></pre>
+<pre class=nd><code class=pp>x = f(g(x)) for all values of x</code></pre>

 <p>In this case, &#8220;all values&#8221; means any number between <code>1..3999</code>, since that is the valid range of inputs to the <code>to_roman()</code> function. We can express this symmetry in a test case that runs through all the values <code>1..3999</code>, calls <code>to_roman()</code>, calls <code>from_roman()</code>, and checks that the output is the same as the original input.

-<pre class=nd><code>class RoundtripCheck(unittest.TestCase):
+<pre class=nd><code class=pp>class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        '''from_roman(to_roman(n))==n for all n'''
        for integer in range(1, 4000):
@@ -587,7 +587,7 @@ FAILED (errors=2)</samp></pre>

 <p>A quick stub function will solve that problem.

-<pre class=nd><code># roman5.py
+<pre class=nd><code class=pp># roman5.py
 def from_roman(s):
    '''convert Roman numeral to integer'''</code></pre>

@@ -621,7 +621,7 @@ FAILED (failures=2)</samp></pre>

 <p>Now it&#8217;s time to write the <code>from_roman()</code> function.

-<pre><code>def from_roman(s):
+<pre><code class=pp>def from_roman(s):
    """convert Roman numeral to integer"""
    result = 0
    index = 0
@@ -636,7 +636,7 @@ FAILED (failures=2)</samp></pre>

 <p>If you're not clear on how <code>from_roman()</code> works, add a call to <code>print()</code> at the end of the <code>while</code> loop:

-<pre><code>def from_roman(s):
+<pre><code class=pp>def from_roman(s):
    """convert Roman numeral to integer"""
    result = 0
    index = 0
@@ -646,7 +646,7 @@ FAILED (failures=2)</samp></pre>
            index += len(numeral)
 <mark>            print('found', numeral, 'of length', len(numeral), ', adding', integer)</mark></code></pre>

-<pre class=screen>
+<pre class='nd screen'>
 <samp class=p>>>> </samp><kbd class=pp>import roman5</kbd>
 <samp class=p>>>> </samp><kbd class=pp>roman5.from_roman('MCMLXXII')</kbd>
 <samp class=pp>found M of length 1 , adding 1000
@@ -670,6 +670,126 @@ OK</samp></pre>

 <p>Two pieces of exciting news here. The first is that the <code>from_roman()</code> function works for good input, at least for all the <a href=#romantest1>known values</a>. The second is that the &#8220;round trip&#8221; test also passed. Combined with the known values tests, you can be reasonably sure that both the <code>to_roman()</code> and <code>from_roman()</code> functions work properly for all possible good values. (This is not guaranteed; it is theoretically possible that <code>to_roman()</code> has a bug that produces the wrong Roman numeral for some particular set of inputs, <em>and</em> that <code>from_roman()</code> has a reciprocal bug that produces the same wrong integer values for exactly that set of Roman numerals that <code>to_roman()</code> generated incorrectly. Depending on your application and your requirements, this possibility may bother you; if so, write more comprehensive test cases until it doesn't bother you.)
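+
+<p>To make the &#8220;reciprocal bug&#8221; caveat concrete, here is a minimal sketch. It assumes both functions share one lookup table, as the greedy loops in this chapter do, and it deliberately corrupts that table: <code>BUGGY_MAP</code> is a hypothetical name, and the value <code>60</code> for <code>L</code> is an intentional error. The round trip still succeeds for every integer from 1 to 3999, even though the numerals are wrong.

```python
# Hypothetical, deliberately wrong table: 'L' is given the value 60 instead of 50.
BUGGY_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
             ('C', 100), ('XC', 90), ('L', 60), ('XL', 40),
             ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def to_roman(n):
    '''convert integer to Roman numeral (greedy, using the buggy table)'''
    result = ''
    for numeral, integer in BUGGY_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def from_roman(s):
    '''convert Roman numeral to integer (greedy, using the same buggy table)'''
    result = 0
    index = 0
    for numeral, integer in BUGGY_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

# The round-trip check passes for every value, because the two bugs cancel out.
round_trip_ok = all(from_roman(to_roman(n)) == n for n in range(1, 4000))
print(round_trip_ok)   # True
print(to_roman(60))    # L    (wrong: 60 should be LX)
print(to_roman(50))    # XLX  (wrong: 50 should be L)
```

+<p>Only a test against independently known values would notice that <code>60</code> came out as <code>L</code>, which is why the round-trip test complements the known-values test rather than replacing it.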

+<p class=a>&#x2042;
+
+<h2 id=romantest6>More Bad Input</h2>
+
+<p>Now that the <code>from_roman()</code> function works properly with good input, it's time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it's a valid Roman numeral. This is inherently more difficult than <a href=#romantest3>validating numeric input</a> in the <code>to_roman()</code> function, but you have a powerful tool at your disposal: regular expressions. (If you&#8217;re not familiar with regular expressions, now would be a good time to read <a href=regular-expressions.html>the regular expressions chapter</a>.)
+
+<p>As you saw in <a href=regular-expressions.html#romannumerals>Case Study: Roman Numerals</a>, there are several simple rules for constructing a Roman numeral, using the letters <code>M</code>, <code>D</code>, <code>C</code>, <code>L</code>, <code>X</code>, <code>V</code>, and <code>I</code>. Let's review the rules:
+
+<ol>
+<li>Characters are additive. <code>I</code> is <code>1</code>, <code>II</code> is <code>2</code>, and <code>III</code> is <code>3</code>. <code>VI</code> is <code>6</code> (literally, &#8220;<code>5</code> and <code>1</code>&#8221;), <code>VII</code> is <code>7</code>, and <code>VIII</code> is <code>8</code>.
+<li>The tens characters (<code>I</code>, <code>X</code>, <code>C</code>, and <code>M</code>) can be repeated up to three times. At <code>4</code>, you need to subtract from the next highest fives character. You can't represent <code>4</code> as <code>IIII</code>; instead, it is represented as <code>IV</code> (&#8220;<code>1</code> less than <code>5</code>&#8221;). <code>40</code> is written as <code>XL</code> (&#8220;<code>10</code> less than <code>50</code>&#8221;), <code>41</code> as <code>XLI</code>, <code>42</code> as <code>XLII</code>, <code>43</code> as <code>XLIII</code>, and then <code>44</code> as <code>XLIV</code> (&#8220;<code>10</code> less than <code>50</code>, then <code>1</code> less than <code>5</code>&#8221;).
+<li>Similarly, at <code>9</code>, you need to subtract from the next highest tens character: <code>8</code> is <code>VIII</code>, but <code>9</code> is <code>IX</code> (&#8220;<code>1</code> less than <code>10</code>&#8221;), not <code>VIIII</code> (since the <code>I</code> character can not be repeated four times). <code>90</code> is <code>XC</code>, <code>900</code> is <code>CM</code>.
+<li>The fives characters can not be repeated. <code>10</code> is always represented as <code>X</code>, never as <code>VV</code>. <code>100</code> is always <code>C</code>, never <code>LL</code>.
+<li>Roman numerals are always written highest to lowest, and read left to right, so order of characters matters very much. <code>DC</code> is <code>600</code>; <code>CD</code> is a completely different number (<code>400</code>, &#8220;<code>100</code> less than <code>500</code>&#8221;). <code>CI</code> is <code>101</code>; <code>IC</code> is not even a valid Roman numeral (because you can't subtract <code>1</code> directly from <code>100</code>; you would need to write it as <code>XCIX</code>, &#8220;<code>10</code> less than <code>100</code>, then <code>1</code> less than <code>10</code>&#8221;).
+</ol>
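+
+<p>The examples in these rules can be checked with a short sketch. It assumes a lookup table ordered from highest to lowest value, with the subtractive pairs listed as entries in their own right; the names <code>ROMAN_NUMERAL_MAP</code> and <code>to_roman()</code> here are local to the sketch and stand in for whatever your own module defines.

```python
# Assumed table: highest value first, subtractive pairs (CM, CD, XC, ...) included.
ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def to_roman(n):
    '''convert integer to Roman numeral by repeatedly taking the largest entry that fits'''
    result = ''
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result

# Spot-check the values mentioned in the rules.
for n in (4, 8, 9, 44, 99, 400, 600):
    print(n, to_roman(n))
```

+<p>Because the table is walked from highest to lowest, the output never repeats a tens character four times and never repeats a fives character, which matches rules 2 through 4.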
+
+<p>Thus, one useful test would be to ensure that the <code>from_roman()</code> function fails when you pass it a string with too many repeated numerals. How many is &#8220;too many&#8221; depends on the numeral.
+
+<pre class=nd><code class=pp>class FromRomanBadInput(unittest.TestCase):
+    def test_too_many_repeated_numerals(self):
+        '''from_roman should fail with too many repeated numerals'''
+        for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
+            self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)</code></pre>
+
+<p>Another useful test would be to check that certain patterns aren&#8217;t repeated. For example, <code>IX</code> is <code>9</code>, but <code>IXIX</code> is never valid.
+
+<pre class=nd><code class=pp>    def test_repeated_pairs(self):
+        '''from_roman should fail with repeated pairs of numerals'''
+        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):
+            self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)</code></pre>
+
+<p>A third test could check that numerals appear in the correct order, from highest to lowest value. For example, <code>CL</code> is <code>150</code>, but <code>LC</code> is never valid, because the numeral for <code>50</code> can never come before the numeral for <code>100</code>.
+
+<pre class=nd><code class=pp>    def test_malformed_antecedents(self):
+        '''from_roman should fail with malformed antecedents'''
+        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
+                  'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
+            self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)</code></pre>
+
+<p>Each of these tests relies on the <code>from_roman()</code> function raising a new exception, <code>InvalidRomanNumeralError</code>, which we haven&#8217;t defined yet.
+
+<pre class=nd><code class=pp># roman6.py
+class InvalidRomanNumeralError(ValueError): pass</code></pre>
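+
+<p>One consequence of subclassing <code>ValueError</code> is that callers can catch either the specific exception or the built-in one. The sketch below illustrates this; <code>reject()</code> is a hypothetical stand-in for a validating function, not part of the module.

```python
class InvalidRomanNumeralError(ValueError):
    pass

def reject(s):
    '''hypothetical stand-in for a function that refuses bad input'''
    raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))

# Code that only knows about ValueError still catches the new exception.
try:
    reject('IIII')
except ValueError as e:
    caught = e

print(type(caught).__name__)   # InvalidRomanNumeralError
print(caught)                  # Invalid Roman numeral: IIII
```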
+
+<p>All three of these tests should fail, since the <code>from_roman()</code> function doesn&#8217;t currently have any validity checking. (If they don&#8217;t fail now, then what the heck are they testing?)
+
+<pre class='nd screen'>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 romantest6.py</kbd>
+<samp>FFF.......
+======================================================================
+FAIL: test_malformed_antecedents (__main__.FromRomanBadInput)
+from_roman should fail with malformed antecedents
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "romantest6.py", line 113, in test_malformed_antecedents
+    self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)
+AssertionError: InvalidRomanNumeralError not raised by from_roman
+
+======================================================================
+FAIL: test_repeated_pairs (__main__.FromRomanBadInput)
+from_roman should fail with repeated pairs of numerals
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "romantest6.py", line 107, in test_repeated_pairs
+    self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)
+AssertionError: InvalidRomanNumeralError not raised by from_roman
+
+======================================================================
+FAIL: test_too_many_repeated_numerals (__main__.FromRomanBadInput)
+from_roman should fail with too many repeated numerals
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "romantest6.py", line 102, in test_too_many_repeated_numerals
+    self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s)
+AssertionError: InvalidRomanNumeralError not raised by from_roman
+
+----------------------------------------------------------------------
+Ran 10 tests in 0.058s
+
+FAILED (failures=3)</samp></pre>
+
+<p>Good deal. Now, all we need to do is add the <a href=regular-expressions.html#romannumerals>regular expression to test for valid Roman numerals</a> into the <code>from_roman()</code> function.
+
+<pre class=nd><code class=pp>roman_numeral_pattern = re.compile('''
+    ^                   # beginning of string
+    M{0,3}              # thousands - 0 to 3 Ms
+    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
+                        #            or 500-800 (D, followed by 0 to 3 Cs)
+    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
+                        #        or 50-80 (L, followed by 0 to 3 Xs)
+    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
+                        #        or 5-8 (V, followed by 0 to 3 Is)
+    $                   # end of string
+    ''', re.VERBOSE)
+
+def from_roman(s):
+    '''convert Roman numeral to integer'''
+<mark>    if not roman_numeral_pattern.search(s):
+        raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))</mark>
+
+    result = 0
+    index = 0
+    for numeral, integer in roman_numeral_map:
+        while s[index : index + len(numeral)] == numeral:
+            result += integer
+            index += len(numeral)
+    return result</code></pre>
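+
+<p>Before running the whole suite, you can try the pattern on its own. This sketch compiles the same regular expression (condensed onto one line, which <code>re.VERBOSE</code> allows) and feeds it every bad string from the three test cases, plus one good numeral. The names <code>bad_inputs</code> and <code>rejected</code> are local to the sketch.

```python
import re

# Same pattern as above; whitespace is ignored because of re.VERBOSE.
roman_numeral_pattern = re.compile('''
    ^ M{0,3} (CM|CD|D?C{0,3}) (XC|XL|L?X{0,3}) (IX|IV|V?I{0,3}) $
    ''', re.VERBOSE)

# Every string used by the three FromRomanBadInput tests.
bad_inputs = ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII',
              'CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV',
              'IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
              'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC')

rejected = [s for s in bad_inputs if not roman_numeral_pattern.search(s)]
print(len(rejected) == len(bad_inputs))                     # True
print(bool(roman_numeral_pattern.search('MCMLXXII')))       # True

# Every part of the pattern is optional, so the empty string matches too.
print(bool(roman_numeral_pattern.search('')))               # True
```

+<p>The last line is worth noticing: as written, the pattern accepts an empty string, so the three tests above do not cover that case.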
+
+<p>And re-run the tests&hellip;
+
+<pre class='nd screen'>
+<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>python3 romantest6.py</kbd>
+<samp>..........
+----------------------------------------------------------------------
+Ran 10 tests in 0.066s
+
+OK</samp></pre>
+
+<p>And the anticlimax award of the year goes to&hellip; the word &#8220;<code>OK</code>&#8221;, which is printed by the <code>unittest</code> module when all the tests pass.
+
 <p class=v><a href=advanced-iterators.html rel=prev title='back to &#8220;Advanced Iterators&#8221;'><span class=u>&#x261C;</span></a> <a href=unit-testing.html rel=next title='onward to &#8220;Unit Testing&#8221;'><span class=u>&#x261E;</span></a>
 <p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
 <script src=j/jquery.js></script>