From 94a3353f451cab04b55d79511dfc842375fd813e Mon Sep 17 00:00:00 2001 From: Mark Pilgrim Date: Sun, 12 Apr 2009 01:09:17 -0400 Subject: [PATCH] finished refactoring chapter --- case-study-porting-chardet-to-python-3.html | 3 +- dip2 | 704 -------------------- dip3.css | 5 +- examples/roman10.py | 87 +++ examples/roman7.py | 4 +- examples/roman9.py | 91 +++ examples/romantest10.py | 159 +++++ examples/romantest5.py | 4 +- examples/romantest6.py | 6 +- examples/romantest7.py | 12 +- examples/romantest8.py | 6 +- examples/romantest9.py | 159 +++++ index.html | 4 +- refactoring.html | 468 +++++++++++++ unit-testing.html | 11 +- 15 files changed, 996 insertions(+), 727 deletions(-) create mode 100644 examples/roman10.py create mode 100644 examples/roman9.py create mode 100644 examples/romantest10.py create mode 100644 examples/romantest9.py create mode 100644 refactoring.html diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html index 6e5b231..55652c3 100644 --- a/case-study-porting-chardet-to-python-3.html +++ b/case-study-porting-chardet-to-python-3.html @@ -6,10 +6,9 @@ diff --git a/dip2 b/dip2 index 64419e7..3b2f2c5 100644 --- a/dip2 +++ b/dip2 @@ -6387,710 +6387,6 @@ OK
NoteWhen all of your tests pass, stop coding.
-

Chapter 15. Refactoring

-

15.1. Handling bugs

-

Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven't written yet. -

Example 15.1. The bug

>>> import roman5
->>> roman5.from_roman("") 
-0
-
    -
  1. Remember in the previous section when you kept seeing that an empty string would match the regular expression you were using to check for valid Roman numerals? - Well, it turns out that this is still true for the final version of the regular expression. And that's a bug; you want an - empty string to raise an InvalidRomanNumeralError exception just like any other sequence of characters that don't represent a valid Roman numeral. -

    After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug. -

    Example 15.2. Testing for the bug (romantest61.py)

    
    -class FromRomanBadInput(unittest.TestCase):  
    -
    -    # previous test cases omitted for clarity (they haven't changed)
    -
    -    def testBlank(self):
    -        """from_roman should fail with blank string"""
    -        self.assertRaises(roman.InvalidRomanNumeralError, roman.from_roman, "") 
    -
    -
      -
    1. Pretty simple stuff here. Call from_roman() with an empty string and make sure it raises an InvalidRomanNumeralError exception. The hard part was finding the bug; now that you know about it, testing for it is the easy part. -

      Since your code has a bug, and you now have a test case that tests this bug, the test case will fail: -

      Example 15.3. Output of romantest61.py against roman61.py

      from_roman should only accept uppercase input ... ok
      -to_roman should always return uppercase ... ok
      -from_roman should fail with blank string ... FAIL
      -from_roman should fail with malformed antecedents ... ok
      -from_roman should fail with repeated pairs of numerals ... ok
      -from_roman should fail with too many repeated numerals ... ok
      -from_roman should give known result with known input ... ok
      -to_roman should give known result with known input ... ok
      -from_roman(to_roman(n))==n for all n ... ok
      -to_roman should fail with non-integer input ... ok
      -to_roman should fail with negative input ... ok
      -to_roman should fail with large input ... ok
      -to_roman should fail with 0 input ... ok
      -
      -======================================================================
      -FAIL: from_roman should fail with blank string
      -----------------------------------------------------------------------
      -Traceback (most recent call last):
      -  File "C:\docbook\dip\py\roman\stage6\romantest61.py", line 137, in testBlank
      -    self.assertRaises(roman61.InvalidRomanNumeralError, roman61.from_roman, "")
      -  File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
      -    raise self.failureException, excName
      -AssertionError: InvalidRomanNumeralError
      -----------------------------------------------------------------------
      -Ran 13 tests in 2.864s
      -
      -FAILED (failures=1)

      Now you can fix the bug. -

      Example 15.4. Fixing the bug (roman62.py)

      -

      This file is available in py/roman/stage6/ in the examples directory. -

      
      -def from_roman(s):
      -    """convert Roman numeral to integer"""
      -    if not s: 
      -        raise InvalidRomanNumeralError, 'Input can not be blank'
      -    if not re.search(romanNumeralPattern, s):
      -        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
      -
      -    result = 0
      -    index = 0
      -    for numeral, integer in romanNumeralMap:
      -        while s[index:index+len(numeral)] == numeral:
      -            result += integer
      -            index += len(numeral)
      -    return result
      -
      -
        -
      1. Only two lines of code are required: an explicit check for an empty string, and a raise statement. -

        Example 15.5. Output of romantest62.py against roman62.py

        from_roman should only accept uppercase input ... ok
        -to_roman should always return uppercase ... ok
        -from_roman should fail with blank string ... ok 
        -from_roman should fail with malformed antecedents ... ok
        -from_roman should fail with repeated pairs of numerals ... ok
        -from_roman should fail with too many repeated numerals ... ok
        -from_roman should give known result with known input ... ok
        -to_roman should give known result with known input ... ok
        -from_roman(to_roman(n))==n for all n ... ok
        -to_roman should fail with non-integer input ... ok
        -to_roman should fail with negative input ... ok
        -to_roman should fail with large input ... ok
        -to_roman should fail with 0 input ... ok
        -
        -----------------------------------------------------------------------
        -Ran 13 tests in 2.834s
        -
        -OK 
        -
          -
        1. The blank string test case now passes, so the bug is fixed. -
        2. All the other test cases still pass, which means that this bug fix didn't break anything else. Stop coding. -

          Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs -will require complex test cases. In a testing-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), -then fix the bug itself. Then if the test case doesn't pass right away, you need to figure out whether the fix was wrong, -or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code -tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can -easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test -is tomorrow's regression test. -

          15.2. Handling changing requirements

          -

          Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible - nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they - see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even - if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change. -

          Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception -to that rule by having 4 M characters in a row to represent 4000. If you make this change, you'll be able to expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to the test cases. -

          Example 15.6. Modifying test cases for new requirements (romantest71.py)

          -

          This file is available in py/roman/stage7/ in the examples directory. -

          If you have not already done so, you can download this and other examples used in this book. -

          
          -import roman71
          -import unittest
          -
          -class KnownValues(unittest.TestCase):
          -    knownValues = ( (1, 'I'),
          -  (2, 'II'),
          -  (3, 'III'),
          -  (4, 'IV'),
          -  (5, 'V'),
          -  (6, 'VI'),
          -  (7, 'VII'),
          -  (8, 'VIII'),
          -  (9, 'IX'),
          -  (10, 'X'),
          -  (50, 'L'),
          -  (100, 'C'),
          -  (500, 'D'),
          -  (1000, 'M'),
          -  (31, 'XXXI'),
          -  (148, 'CXLVIII'),
          -  (294, 'CCXCIV'),
          -  (312, 'CCCXII'),
          -  (421, 'CDXXI'),
          -  (528, 'DXXVIII'),
          -  (621, 'DCXXI'),
          -  (782, 'DCCLXXXII'),
          -  (870, 'DCCCLXX'),
          -  (941, 'CMXLI'),
          -  (1043, 'MXLIII'),
          -  (1110, 'MCX'),
          -  (1226, 'MCCXXVI'),
          -  (1301, 'MCCCI'),
          -  (1485, 'MCDLXXXV'),
          -  (1509, 'MDIX'),
          -  (1607, 'MDCVII'),
          -  (1754, 'MDCCLIV'),
          -  (1832, 'MDCCCXXXII'),
          -  (1993, 'MCMXCIII'),
          -  (2074, 'MMLXXIV'),
          -  (2152, 'MMCLII'),
          -  (2212, 'MMCCXII'),
          -  (2343, 'MMCCCXLIII'),
          -  (2499, 'MMCDXCIX'),
          -  (2574, 'MMDLXXIV'),
          -  (2646, 'MMDCXLVI'),
          -  (2723, 'MMDCCXXIII'),
          -  (2892, 'MMDCCCXCII'),
          -  (2975, 'MMCMLXXV'),
          -  (3051, 'MMMLI'),
          -  (3185, 'MMMCLXXXV'),
          -  (3250, 'MMMCCL'),
          -  (3313, 'MMMCCCXIII'),
          -  (3408, 'MMMCDVIII'),
          -  (3501, 'MMMDI'),
          -  (3610, 'MMMDCX'),
          -  (3743, 'MMMDCCXLIII'),
          -  (3844, 'MMMDCCCXLIV'),
          -  (3888, 'MMMDCCCLXXXVIII'),
          -  (3940, 'MMMCMXL'),
          -  (3999, 'MMMCMXCIX'),
          -  (4000, 'MMMM'),   
          -  (4500, 'MMMMD'),
          -  (4888, 'MMMMDCCCLXXXVIII'),
          -  (4999, 'MMMMCMXCIX'))
          -
          -    def testToRomanKnownValues(self):
          -        """to_roman should give known result with known input"""
          -        for integer, numeral in self.knownValues:
          -            result = roman71.to_roman(integer)
          -            self.assertEqual(numeral, result)
          -
          -    def testFromRomanKnownValues(self):
          -        """from_roman should give known result with known input"""
          -        for integer, numeral in self.knownValues:
          -            result = roman71.from_roman(numeral)
          -            self.assertEqual(integer, result)
          -
          -class ToRomanBadInput(unittest.TestCase):
          -    def testTooLarge(self):
          -        """to_roman should fail with large input"""
          -        self.assertRaises(roman71.OutOfRangeError, roman71.to_roman, 5000) 
          -
          -    def testZero(self):
          -        """to_roman should fail with 0 input"""
          -        self.assertRaises(roman71.OutOfRangeError, roman71.to_roman, 0)
          -
          -    def testNegative(self):
          -        """to_roman should fail with negative input"""
          -        self.assertRaises(roman71.OutOfRangeError, roman71.to_roman, -1)
          -
          -    def testNonInteger(self):
          -        """to_roman should fail with non-integer input"""
          -        self.assertRaises(roman71.NotIntegerError, roman71.to_roman, 0.5)
          -
          -class FromRomanBadInput(unittest.TestCase):
          -    def testTooManyRepeatedNumerals(self):
          -        """from_roman should fail with too many repeated numerals"""
          -        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):     
          -            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, s)
          -
          -    def testRepeatedPairs(self):
          -        """from_roman should fail with repeated pairs of numerals"""
          -        for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'):
          -            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, s)
          -
          -    def testMalformedAntecedent(self):
          -        """from_roman should fail with malformed antecedents"""
          -        for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV',
          -'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'):
          -            self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, s)
          -
          -    def testBlank(self):
          -        """from_roman should fail with blank string"""
          -        self.assertRaises(roman71.InvalidRomanNumeralError, roman71.from_roman, "")
          -
          -class SanityCheck(unittest.TestCase):
          -    def testSanity(self):
          -        """from_roman(to_roman(n))==n for all n"""
          -        for integer in range(1, 5000):
          -            numeral = roman71.to_roman(integer)
          -            result = roman71.from_roman(numeral)
          -            self.assertEqual(integer, result)
          -
          -class CaseCheck(unittest.TestCase):
          -    def testToRomanCase(self):
          -        """to_roman should always return uppercase"""
          -        for integer in range(1, 5000):
          -            numeral = roman71.to_roman(integer)
          -            self.assertEqual(numeral, numeral.upper())
          -
          -    def testFromRomanCase(self):
          -        """from_roman should only accept uppercase input"""
          -        for integer in range(1, 5000):
          -            numeral = roman71.to_roman(integer)
          -            roman71.from_roman(numeral.upper())
          -            self.assertRaises(roman71.InvalidRomanNumeralError,
          -            roman71.from_roman, numeral.lower())
          -
          -if __name__ == "__main__":
          -    unittest.main()
          -
          -
            -
          1. The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the -4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest). -
          2. The definition of “large input” has changed. This test used to call to_roman() with 4000 and expect an error; now that 4000-4999 are good values, you need to bump this up to 5000. -
          3. The definition of “too many repeated numerals” has also changed. This test used to call from_roman() with 'MMMM' and expect an error; now that MMMM is considered a valid Roman numeral, you need to bump this up to 'MMMMM'. -
          4. The sanity check and case checks loop through every number in the range, from 1 to 3999. Since the range has now expanded, these for loops need to be updated as well to go up to 4999. -

            Now your test cases are up to date with the new requirements, but your code is not, so you expect several of the test cases -to fail. -

            Example 15.7. Output of romantest71.py against roman71.py

            
            -from_roman should only accept uppercase input ... ERROR        
            -to_roman should always return uppercase ... ERROR
            -from_roman should fail with blank string ... ok
            -from_roman should fail with malformed antecedents ... ok
            -from_roman should fail with repeated pairs of numerals ... ok
            -from_roman should fail with too many repeated numerals ... ok
            -from_roman should give known result with known input ... ERROR 
            -to_roman should give known result with known input ... ERROR   
            -from_roman(to_roman(n))==n for all n ... ERROR
            -to_roman should fail with non-integer input ... ok
            -to_roman should fail with negative input ... ok
            -to_roman should fail with large input ... ok
            -to_roman should fail with 0 input ... ok
            -
            -
              -
            1. Our case checks now fail because they loop from 1 to 4999, but to_roman() only accepts numbers from 1 to 3999, so it will fail as soon the test case hits 4000. -
            2. The from_roman() known values test will fail as soon as it hits 'MMMM', because from_roman() still thinks this is an invalid Roman numeral. -
            3. The to_roman() known values test will fail as soon as it hits 4000, because to_roman() still thinks this is out of range. -
            4. The sanity check will also fail as soon as it hits 4000, because to_roman() still thinks this is out of range. -
              
              -======================================================================
              -ERROR: from_roman should only accept uppercase input
              -----------------------------------------------------------------------
              -Traceback (most recent call last):
              -  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 161, in testFromRomanCase
              -    numeral = roman71.to_roman(integer)
              -  File "roman71.py", line 28, in to_roman
              -    raise OutOfRangeError, "number out of range (must be 1..3999)"
              -OutOfRangeError: number out of range (must be 1..3999)
              -======================================================================
              -ERROR: to_roman should always return uppercase
              -----------------------------------------------------------------------
              -Traceback (most recent call last):
              -  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 155, in testToRomanCase
              -    numeral = roman71.to_roman(integer)
              -  File "roman71.py", line 28, in to_roman
              -    raise OutOfRangeError, "number out of range (must be 1..3999)"
              -OutOfRangeError: number out of range (must be 1..3999)
              -======================================================================
              -ERROR: from_roman should give known result with known input
              -----------------------------------------------------------------------
              -Traceback (most recent call last):
              -  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 102, in testFromRomanKnownValues
              -    result = roman71.from_roman(numeral)
              -  File "roman71.py", line 47, in from_roman
              -    raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
              -InvalidRomanNumeralError: Invalid Roman numeral: MMMM
              -======================================================================
              -ERROR: to_roman should give known result with known input
              -----------------------------------------------------------------------
              -Traceback (most recent call last):
              -  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 96, in testToRomanKnownValues
              -    result = roman71.to_roman(integer)
              -  File "roman71.py", line 28, in to_roman
              -    raise OutOfRangeError, "number out of range (must be 1..3999)"
              -OutOfRangeError: number out of range (must be 1..3999)
              -======================================================================
              -ERROR: from_roman(to_roman(n))==n for all n
              -----------------------------------------------------------------------
              -Traceback (most recent call last):
              -  File "C:\docbook\dip\py\roman\stage7\romantest71.py", line 147, in testSanity
              -    numeral = roman71.to_roman(integer)
              -  File "roman71.py", line 28, in to_roman
              -    raise OutOfRangeError, "number out of range (must be 1..3999)"
              -OutOfRangeError: number out of range (must be 1..3999)
              -----------------------------------------------------------------------
              -Ran 13 tests in 2.213s
              -
              -FAILED (errors=5)

              Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line -with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being -tested is never “ahead” of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you -stop coding.) -

              Example 15.8. Coding the new requirements (roman72.py)

              -

              This file is available in py/roman/stage7/ in the examples directory. -

              
              -"""Convert to and from Roman numerals"""
              -import re
              -
              -#Define exceptions
              -class RomanError(Exception): pass
              -class OutOfRangeError(RomanError): pass
              -class NotIntegerError(RomanError): pass
              -class InvalidRomanNumeralError(RomanError): pass
              -
              -#Define digit mapping
              -romanNumeralMap = (('M',  1000),
              - ('CM', 900),
              - ('D',  500),
              - ('CD', 400),
              - ('C',  100),
              - ('XC', 90),
              - ('L',  50),
              - ('XL', 40),
              - ('X',  10),
              - ('IX', 9),
              - ('V',  5),
              - ('IV', 4),
              - ('I',  1))
              -
              -def to_roman(n):
              -    """convert integer to Roman numeral"""
              -    if not (0 < n < 5000):   
              -        raise OutOfRangeError, "number out of range (must be 1..4999)"
              -    if int(n) <> n:
              -        raise NotIntegerError, "non-integers can not be converted"
              -
              -    result = ""
              -    for numeral, integer in romanNumeralMap:
              -        while n >= integer:
              -            result += numeral
              -            n -= integer
              -    return result
              -
              -#Define pattern to detect valid Roman numerals
              -romanNumeralPattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$' 
              -
              -def from_roman(s):
              -    """convert Roman numeral to integer"""
              -    if not s:
              -        raise InvalidRomanNumeralError, 'Input can not be blank'
              -    if not re.search(romanNumeralPattern, s):
              -        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
              -
              -    result = 0
              -    index = 0
              -    for numeral, integer in romanNumeralMap:
              -        while s[index:index+len(numeral)] == numeral:
              -            result += integer
              -            index += len(numeral)
              -    return result
              -
              -
                -
              1. to_roman() only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn't do this before is that you explicitly stopped it with the range check.) -
              2. You don't need to make any changes to from_roman() at all. The only change is to romanNumeralPattern; if you look closely, you'll notice that you added another optional M in the first section of the regular expression. This will allow up to 4 M characters instead of 3, meaning you will allow the Roman numeral equivalents of 4999 instead of 3999. The actual from_roman() function is completely general; it just looks for repeated Roman numeral characters and adds them up, without caring how - many times they repeat. The only reason it didn't handle 'MMMM' before is that you explicitly stopped it with the regular expression pattern matching. -

                You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself: -

                Example 15.9. Output of romantest72.py against roman72.py

                from_roman should only accept uppercase input ... ok
                -to_roman should always return uppercase ... ok
                -from_roman should fail with blank string ... ok
                -from_roman should fail with malformed antecedents ... ok
                -from_roman should fail with repeated pairs of numerals ... ok
                -from_roman should fail with too many repeated numerals ... ok
                -from_roman should give known result with known input ... ok
                -to_roman should give known result with known input ... ok
                -from_roman(to_roman(n))==n for all n ... ok
                -to_roman should fail with non-integer input ... ok
                -to_roman should fail with negative input ... ok
                -to_roman should fail with large input ... ok
                -to_roman should fail with 0 input ... ok
                -
                -----------------------------------------------------------------------
                -Ran 13 tests in 3.685s
                -
                -OK 
                -
                  -
                1. All the test cases pass. Stop coding. -

                  Comprehensive unit testing means never having to rely on a programmer who says “Trust me.” -

                  15.3. Refactoring

                  -

                  The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even - the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly. -

                  Refactoring is the process of taking working code and making it work better. Usually, “better” means “faster”, although it can also mean “using less memory”, or “using less disk space”, or simply “more elegantly”. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any -program. -

                  Here, “better” means “faster”. Specifically, the from_roman() function is slower than it needs to be, because of that big nasty regular expression that you use to validate Roman numerals. -It's probably not worth trying to do away with the regular expression altogether (it would be difficult, and it might not -end up any faster), but you can speed up the function by precompiling the regular expression. -

                  Example 15.10. Compiling regular expressions

                  ->>> import re
                  ->>> pattern = '^M?M?M?$'
                  ->>> re.search(pattern, 'M')               
                  -<SRE_Match object at 01090490>
                  ->>> compiledPattern = re.compile(pattern) 
                  ->>> compiledPattern
                  -<SRE_Pattern object at 00F06E28>
                  ->>> dir(compiledPattern)
                  -['findall', 'match', 'scanner', 'search', 'split', 'sub', 'subn']
                  ->>> compiledPattern.search('M')           
                  -<SRE_Match object at 01104928>
                  -
                    -
                  1. This is the syntax you've seen before: re.search takes a regular expression as a string (pattern) and a string to match against it ('M'). If the pattern matches, the function returns a match object which can be queried to find out exactly what matched and - how. -
                  2. This is the new syntax: re.compile takes a regular expression as a string and returns a pattern object. Note there is no string to match here. Compiling a - regular expression has nothing to do with matching it against any specific strings (like 'M'); it only involves the regular expression itself. -
                  3. The compiled pattern object returned from re.compile has several useful-looking functions, including several (like search and sub) that are available directly in the re module. -
                  4. Calling the compiled pattern object's search function with the string 'M' accomplishes the same thing as calling re.search with both the regular expression and the string 'M'. Only much, much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.) - - -
                    NoteWhenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call - the methods on the pattern object directly. -

                    Example 15.11. Compiled regular expressions in roman81.py

                    -

                    This file is available in py/roman/stage8/ in the examples directory. -

                    If you have not already done so, you can download this and other examples used in this book. -

                    
                    -# to_roman and rest of module omitted for clarity
                    -
                    -romanNumeralPattern = \
                    -    re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$') 
                    -
                    -def from_roman(s):
                    -    """convert Roman numeral to integer"""
                    -    if not s:
                    -        raise InvalidRomanNumeralError, 'Input can not be blank'
                    -    if not romanNumeralPattern.search(s):
                    -        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
                    -
                    -    result = 0
                    -    index = 0
                    -    for numeral, integer in romanNumeralMap:
                    -        while s[index:index+len(numeral)] == numeral:
                    -            result += integer
                    -            index += len(numeral)
                    -    return result
                    -
                    -
                      -
                    1. This looks very similar, but in fact a lot has changed. romanNumeralPattern is no longer a string; it is a pattern object which was returned from re.compile. -
                    2. That means that you can call methods on romanNumeralPattern directly. This will be much, much faster than calling re.search every time. The regular expression is compiled once and stored in romanNumeralPattern when the module is first imported; then, every time you call from_roman(), you can immediately match the input string against the regular expression, without any intermediate steps occurring under - the covers. -

                      So how much faster is it to compile regular expressions? See for yourself: -

                      Example 15.12. Output of romantest81.py against roman81.py

                      .............         
                      -----------------------------------------------------------------------
                      -Ran 13 tests in 3.385s 
                      -
                      -OK   
                      -
                        -
                      1. Just a note in passing here: this time, I ran the unit test without the -v option, so instead of the full docstring for each test, you only get a dot for each test that passes. (If a test failed, you'd get an F, and if it had an error, you'd get an E. You'd still get complete tracebacks for each failure and error, so you could track down any problems.) -
                      2. You ran 13 tests in 3.385 seconds, compared to 3.685 seconds without precompiling the regular expressions. That's an 8% improvement overall, and remember that most of the time spent during the unit test is spent doing other things. (Separately, - I time-tested the regular expressions by themselves, apart from the rest of the unit tests, and found that compiling this - regular expression speeds up the search by an average of 54%.) Not bad for such a simple fix. -
                      3. Oh, and in case you were wondering, precompiling the regular expression didn't break anything, and you just proved it. -

                        There is one other performance optimization that I want to try. Given the complexity of regular expression syntax, it should -come as no surprise that there is frequently more than one way to write the same expression. After some discussion about -this module on comp.lang.python, someone suggested that I try using the {m,n} syntax for the optional repeated characters. -

                        Example 15.13. roman82.py

                        -

                        This file is available in py/roman/stage8/ in the examples directory. -

                        If you have not already done so, you can download this and other examples used in this book. -

                        
                        -# rest of program omitted for clarity
                        -
                        -#old version
                        -#romanNumeralPattern = \
                        -#   re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')
                        -
                        -#new version
                        -romanNumeralPattern = \
                        -    re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$') 
                        -
                        -
                          -
                        1. You have replaced M?M?M?M? with M{0,4}. Both mean the same thing: “match 0 to 4 M characters”. Similarly, C?C?C? became C{0,3} (“match 0 to 3 C characters”) and so forth for X and I. -

                          This form of the regular expression is a little shorter (though not any more readable). The big question is, is it any faster? -

                          Example 15.14. Output of romantest82.py against roman82.py

                          .............
                          -----------------------------------------------------------------------
                          -Ran 13 tests in 3.315s 
                          -
                          -OK   
                          -
                            -
                          1. Overall, the unit tests run 2% faster with this form of regular expression. That doesn't sound exciting, but remember that - the search function is a small part of the overall unit test; most of the time is spent doing other things. (Separately, I time-tested - just the regular expressions, and found that the search function is 11% faster with this syntax.) By precompiling the regular expression and rewriting part of it to use this new syntax, you've - improved the regular expression performance by over 60%, and improved the overall performance of the entire unit test by over 10%. -
                          2. More important than any performance boost is the fact that the module still works perfectly. This is the freedom I was talking - about earlier: the freedom to tweak, change, or rewrite any piece of it and verify that you haven't messed anything up in - the process. This is not a license to endlessly tweak your code just for the sake of tweaking it; you had a very specific - objective (“make from_roman() faster”), and you were able to accomplish that objective without any lingering doubts about whether you introduced new bugs in the - process. -

                            One other tweak I would like to make, and then I promise I'll stop refactoring and put this module to bed. As you've seen -repeatedly, regular expressions can get pretty hairy and unreadable pretty quickly. I wouldn't like to come back to this -module in six months and try to maintain it. Sure, the test cases pass, so I know that it works, but if I can't figure out -how it works, it's still going to be difficult to add new features, fix new bugs, or otherwise maintain it. As you saw in Section 7.5, “Verbose Regular Expressions”, Python provides a way to document your logic line-by-line. -

                            Example 15.15. roman83.py

                            -

                            This file is available in py/roman/stage8/ in the examples directory. -

                            If you have not already done so, you can download this and other examples used in this book. -

                            
                            -# rest of program omitted for clarity
                            -
                            -#old version
                            -#romanNumeralPattern = \
                            -#   re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')
                            -
                            -#new version
                            -romanNumeralPattern = re.compile('''
                            -    ^ # beginning of string
                            -    M{0,4}              # thousands - 0 to 4 M's
                            -    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                            -      #            or 500-800 (D, followed by 0 to 3 C's)
                            -    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                            -      #        or 50-80 (L, followed by 0 to 3 X's)
                            -    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                            -      #        or 5-8 (V, followed by 0 to 3 I's)
                            -    $ # end of string
                            -    ''', re.VERBOSE) 
                            -
                            -
                              -
                            1. The re.compile function can take an optional second argument, which is a set of one or more flags that control various options about the - compiled regular expression. Here you're specifying the re.VERBOSE flag, which tells Python that there are in-line comments within the regular expression itself. The comments and all the whitespace around them are -not considered part of the regular expression; the re.compile function simply strips them all out when it compiles the expression. This new, “verbose” version is identical to the old version, but it is infinitely more readable. -

                              Example 15.16. Output of romantest83.py against roman83.py

                              .............
                              -----------------------------------------------------------------------
                              -Ran 13 tests in 3.315s 
                              -
                              -OK   
                              -
                                -
                              1. This new, “verbose” version runs at exactly the same speed as the old version. In fact, the compiled pattern objects are the same, since the -re.compile function strips out all the stuff you added. -
                              2. This new, “verbose” version passes all the same tests as the old version. Nothing has changed, except that the programmer who comes back to - this module in six months stands a fighting chance of understanding how the function works. -

                                15.4. Postscript

                                -

                                A clever reader read the previous section and took it to the next level. The biggest headache (and performance drain) in the program as it is currently written is - the regular expression, which is required because you have no other way of breaking down a Roman numeral. But there's only - 5000 of them; why don't you just build a lookup table once, then simply read that? This idea gets even better when you realize - that you don't need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, - you can build the reverse lookup table to convert Roman numerals to integers. -

                                And best of all, he already had a complete set of unit tests. He changed over half the code in the module, but the unit tests -stayed the same, so he could prove that his code worked just as well as the original. -

                                Example 15.17. roman9.py

                                -

                                This file is available in py/roman/stage9/ in the examples directory. -

                                If you have not already done so, you can download this and other examples used in this book. -

                                
                                -#Define exceptions
                                -class RomanError(Exception): pass
                                -class OutOfRangeError(RomanError): pass
                                -class NotIntegerError(RomanError): pass
                                -class InvalidRomanNumeralError(RomanError): pass
                                -
                                -#Roman numerals must be less than 5000
                                -MAX_ROMAN_NUMERAL = 4999
                                -
                                -#Define digit mapping
                                -romanNumeralMap = (('M',  1000),
                                - ('CM', 900),
                                - ('D',  500),
                                - ('CD', 400),
                                - ('C',  100),
                                - ('XC', 90),
                                - ('L',  50),
                                - ('XL', 40),
                                - ('X',  10),
                                - ('IX', 9),
                                - ('V',  5),
                                - ('IV', 4),
                                - ('I',  1))
                                -
                                -#Create tables for fast conversion of roman numerals.
                                -#See fillLookupTables() below.
                                -to_romanTable = [ None ]  # Skip an index since Roman numerals have no zero
                                -from_romanTable = {}
                                -
                                -def to_roman(n):
                                -    """convert integer to Roman numeral"""
                                -    if not (0 < n <= MAX_ROMAN_NUMERAL):
                                -        raise OutOfRangeError, "number out of range (must be 1..%s)" % MAX_ROMAN_NUMERAL
                                -    if int(n) <> n:
                                -        raise NotIntegerError, "non-integers can not be converted"
                                -    return to_romanTable[n]
                                -
                                -def from_roman(s):
                                -    """convert Roman numeral to integer"""
                                -    if not s:
                                -        raise InvalidRomanNumeralError, "Input can not be blank"
                                -    if not from_romanTable.has_key(s):
                                -        raise InvalidRomanNumeralError, "Invalid Roman numeral: %s" % s
                                -    return from_romanTable[s]
                                -
                                -def to_romanDynamic(n):
                                -    """convert integer to Roman numeral using dynamic programming"""
                                -    result = ""
                                -    for numeral, integer in romanNumeralMap:
                                -        if n >= integer:
                                -            result = numeral
                                -            n -= integer
                                -            break
                                -    if n > 0:
                                -        result += to_romanTable[n]
                                -    return result
                                -
                                -def fillLookupTables():
                                -    """compute all the possible roman numerals"""
                                -    #Save the values in two global tables to convert to and from integers.
                                -    for integer in range(1, MAX_ROMAN_NUMERAL + 1):
                                -        romanNumber = to_romanDynamic(integer)
                                -        to_romanTable.append(romanNumber)
                                -        from_romanTable[romanNumber] = integer
                                -
                                -fillLookupTables()
                                -

                                So how fast is it? -

                                Example 15.18. Output of romantest9.py against roman9.py

                                -
                                -.............
                                -----------------------------------------------------------------------
                                -Ran 13 tests in 0.791s
                                -
                                -OK
                                -
                                -

                                Remember, the best performance you ever got in the original version was 13 tests in 3.315 seconds. Of course, it's not entirely -a fair comparison, because this version will take longer to import (when it fills the lookup tables). But since import is -only done once, this is negligible in the long run. -

                                The moral of the story? -

                                -
                                  -
                                • Simplicity is a virtue. -
                                • Especially when regular expressions are involved. -
                                • And unit tests can give you the confidence to do large-scale refactoring... even if you didn't write the original code. -
                                -

                                15.5. Summary

                                -

                                Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility - in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, - or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers - are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional - testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it - work, you'll wonder how you ever got along without it. -

                                This chapter covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts: -

                                -
                                -
                                  -
                                • Designing test cases that are specific, automated, and independent -
                                • Writing test cases before the code they are testing - -
                                • Writing tests that test good input and check for proper results - -
                                • Writing tests that test bad input and check for proper failures - -
                                • Writing and updating test cases to illustrate bugs or reflect new requirements -
                                • Refactoring mercilessly to improve performance, scalability, readability, maintainability, or whatever other -ility you're lacking - -
                                -

                                Additionally, you should be comfortable doing all of the following Python-specific things: -

                                -
                                - -
                                -

                                Further reading

                                - -

                                Chapter 16. Functional Programming

                                16.1. Diving in

                                In Chapter 13, Unit Testing, you learned about the philosophy of unit testing. In Chapter 14, Test-First Programming, you stepped through the implementation of basic unit tests in Python. In Chapter 15, Refactoring, you saw how unit testing makes large-scale refactoring easier. This chapter will build on those sample programs, but here diff --git a/dip3.css b/dip3.css index f35772b..7cae60f 100644 --- a/dip3.css +++ b/dip3.css @@ -65,7 +65,7 @@ abbr{font-variant:small-caps;text-transform:lowercase;letter-spacing:0.1em} p,ul,ol{margin:1.75em 0;font-size:medium} /* basics */ -html{background:#fff;color:#333} +html{background:#fff;color:#222} body{margin:1.75em 28px} form div{float:right} .c{text-align:center;margin:2.154em 0} @@ -84,7 +84,8 @@ pre{white-space:pre-wrap;padding-left:2.154em;border-left:1px solid #ddd} .b,ol,p,blockquote,h1,h2,h3{clear:left} pre a,.w a{padding:0.4375em 0} .w a{text-decoration:underline} -kbd{font-weight:bold} +kbd,mark{font-weight:bold} +mark{display:inline-block;width:100%;background:#ff8} .p{color:#667} /* overrides */ diff --git a/examples/roman10.py b/examples/roman10.py new file mode 100644 index 0000000..0efdc45 --- /dev/null +++ b/examples/roman10.py @@ -0,0 +1,87 @@ +"""Convert to and from Roman numerals + +This program is part of "Dive Into Python 3", a free Python book for +experienced programmers. Visit http://diveintopython3.org/ for the +latest version. +""" + +class OutOfRangeError(ValueError): pass +class NotIntegerError(ValueError): pass +class InvalidRomanNumeralError(ValueError): pass + +roman_numeral_map = (('M', 1000), + ('CM', 900), + ('D', 500), + ('CD', 400), + ('C', 100), + ('XC', 90), + ('L', 50), + ('XL', 40), + ('X', 10), + ('IX', 9), + ('V', 5), + ('IV', 4), + ('I', 1)) + +to_roman_table = [ None ] +from_roman_table = {} + +def to_roman(n): + """convert integer to Roman numeral""" + if not (0 < n < 5000): + raise OutOfRangeError("number out of range (must be 1..4999)") + if int(n) != n: + raise NotIntegerError("non-integers can not be converted") + return to_roman_table[n] + +def from_roman(s): + """convert Roman numeral to integer""" + if not isinstance(s, str): + raise InvalidRomanNumeralError("Input must be a string") + if not s: + raise InvalidRomanNumeralError("Input can not be blank") + if s not in from_roman_table: + raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s)) + return from_roman_table[s] + +def build_lookup_tables(): + def to_roman(n): + result = "" + for numeral, integer in roman_numeral_map: + if n >= integer: + result = numeral + n -= integer + break + if n > 0: + result += to_roman_table[n] + return result + + for integer in range(1, 5000): + roman_numeral = to_roman(integer) + to_roman_table.append(roman_numeral) + from_roman_table[roman_numeral] = integer + +build_lookup_tables() + +# Copyright (c) 2009, Mark Pilgrim, All rights reserved. +# +# Redistribution and use in source and binary forms, with or without modification, +# are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. diff --git a/examples/roman7.py b/examples/roman7.py index 6b61cae..8730a5b 100644 --- a/examples/roman7.py +++ b/examples/roman7.py @@ -52,8 +52,8 @@ def to_roman(n): def from_roman(s): """convert Roman numeral to integer""" - if not s: - raise InvalidRomanNumeralError("Input can not be blank") + if not isinstance(s, str): + raise InvalidRomanNumeralError("Input must be a string") if not roman_numeral_pattern.search(s): raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s)) diff --git a/examples/roman9.py b/examples/roman9.py new file mode 100644 index 0000000..85e09c4 --- /dev/null +++ b/examples/roman9.py @@ -0,0 +1,91 @@ +"""Convert to and from Roman numerals + +This program is part of "Dive Into Python 3", a free Python book for +experienced programmers. Visit http://diveintopython3.org/ for the +latest version. +""" +import re + +class OutOfRangeError(ValueError): pass +class NotIntegerError(ValueError): pass +class InvalidRomanNumeralError(ValueError): pass + +roman_numeral_map = (('M', 1000), + ('CM', 900), + ('D', 500), + ('CD', 400), + ('C', 100), + ('XC', 90), + ('L', 50), + ('XL', 40), + ('X', 10), + ('IX', 9), + ('V', 5), + ('IV', 4), + ('I', 1)) + +roman_numeral_pattern = re.compile(""" + ^ # beginning of string + M{0,4} # thousands - 0 to 4 M's + (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's), + # or 500-800 (D, followed by 0 to 3 C's) + (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's), + # or 50-80 (L, followed by 0 to 3 X's) + (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's), + # or 5-8 (V, followed by 0 to 3 I's) + $ # end of string + """, re.VERBOSE) + +def to_roman(n): + """convert integer to Roman numeral""" + if not (0 < n < 5000): + raise OutOfRangeError("number out of range (must be 0..4999)") + if not isinstance(n, int): + raise NotIntegerError("non-integers can not be converted") + + result = "" + for numeral, integer in roman_numeral_map: + while n >= integer: + result += numeral + n -= integer + return result + +def from_roman(s): + """convert Roman numeral to integer""" + if not isinstance(s, str): + raise InvalidRomanNumeralError("Input must be a string") + if not s: + raise InvalidRomanNumeralError("Input can not be blank") + if not roman_numeral_pattern.search(s): + raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s)) + + result = 0 + index = 0 + for numeral, integer in roman_numeral_map: + while s[index : index + len(numeral)] == numeral: + result += integer + index += len(numeral) + return result + +# Copyright (c) 2009, Mark Pilgrim, All rights reserved. +# +# Redistribution and use in source and binary forms, with or without modification, +# are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. diff --git a/examples/romantest10.py b/examples/romantest10.py new file mode 100644 index 0000000..dc48dd1 --- /dev/null +++ b/examples/romantest10.py @@ -0,0 +1,159 @@ +"""Unit test for roman1.py + +This program is part of "Dive Into Python 3", a free Python book for +experienced programmers. Visit http://diveintopython3.org/ for the +latest version. +""" + +import roman10 +import unittest + +class KnownValues(unittest.TestCase): + known_values = ( (1, 'I'), + (2, 'II'), + (3, 'III'), + (4, 'IV'), + (5, 'V'), + (6, 'VI'), + (7, 'VII'), + (8, 'VIII'), + (9, 'IX'), + (10, 'X'), + (50, 'L'), + (100, 'C'), + (500, 'D'), + (1000, 'M'), + (31, 'XXXI'), + (148, 'CXLVIII'), + (294, 'CCXCIV'), + (312, 'CCCXII'), + (421, 'CDXXI'), + (528, 'DXXVIII'), + (621, 'DCXXI'), + (782, 'DCCLXXXII'), + (870, 'DCCCLXX'), + (941, 'CMXLI'), + (1043, 'MXLIII'), + (1110, 'MCX'), + (1226, 'MCCXXVI'), + (1301, 'MCCCI'), + (1485, 'MCDLXXXV'), + (1509, 'MDIX'), + (1607, 'MDCVII'), + (1754, 'MDCCLIV'), + (1832, 'MDCCCXXXII'), + (1993, 'MCMXCIII'), + (2074, 'MMLXXIV'), + (2152, 'MMCLII'), + (2212, 'MMCCXII'), + (2343, 'MMCCCXLIII'), + (2499, 'MMCDXCIX'), + (2574, 'MMDLXXIV'), + (2646, 'MMDCXLVI'), + (2723, 'MMDCCXXIII'), + (2892, 'MMDCCCXCII'), + (2975, 'MMCMLXXV'), + (3051, 'MMMLI'), + (3185, 'MMMCLXXXV'), + (3250, 'MMMCCL'), + (3313, 'MMMCCCXIII'), + (3408, 'MMMCDVIII'), + (3501, 'MMMDI'), + (3610, 'MMMDCX'), + (3743, 'MMMDCCXLIII'), + (3844, 'MMMDCCCXLIV'), + (3888, 'MMMDCCCLXXXVIII'), + (3940, 'MMMCMXL'), + (3999, 'MMMCMXCIX'), + (4000, 'MMMM'), + (4500, 'MMMMD'), + (4888, 'MMMMDCCCLXXXVIII'), + (4999, 'MMMMCMXCIX')) + + def test_to_roman_known_values(self): + """to_roman should give known result with known input""" + for integer, numeral in self.known_values: + result = roman10.to_roman(integer) + self.assertEqual(numeral, result) + + def test_from_roman_known_values(self): + """from_roman should give known result with known input""" + for integer, numeral in self.known_values: + result = roman10.from_roman(numeral) + self.assertEqual(integer, result) + +class ToRomanBadInput(unittest.TestCase): + def test_too_large(self): + """to_roman should fail with large input""" + self.assertRaises(roman10.OutOfRangeError, roman10.to_roman, 5000) + + def test_zero(self): + """to_roman should fail with 0 input""" + self.assertRaises(roman10.OutOfRangeError, roman10.to_roman, 0) + + def test_negative(self): + """to_roman should fail with negative input""" + self.assertRaises(roman10.OutOfRangeError, roman10.to_roman, -1) + + def test_non_integer(self): + """to_roman should fail with non-integer input""" + self.assertRaises(roman10.NotIntegerError, roman10.to_roman, 0.5) + +class FromRomanBadInput(unittest.TestCase): + def test_too_many_repeated_numerals(self): + """from_roman should fail with too many repeated numerals""" + for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): + self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, s) + + def test_repeated_pairs(self): + """from_roman should fail with repeated pairs of numerals""" + for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'): + self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, s) + + def test_malformed_antecedents(self): + """from_roman should fail with malformed antecedents""" + for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV', + 'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'): + self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, s) + + def test_blank(self): + """from_roman should fail with blank string""" + self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, "") + + def test_non_string(self): + """from_roman should fail with non-string input""" + self.assertRaises(roman10.InvalidRomanNumeralError, roman10.from_roman, 1) + +class RoundtripCheck(unittest.TestCase): + def test_roundtrip(self): + """from_roman(to_roman(n))==n for all n""" + for integer in range(1, 5000): + numeral = roman10.to_roman(integer) + result = roman10.from_roman(numeral) + self.assertEqual(integer, result) + +if __name__ == "__main__": + unittest.main() + +# Copyright (c) 2009, Mark Pilgrim, All rights reserved. +# +# Redistribution and use in source and binary forms, with or without modification, +# are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. diff --git a/examples/romantest5.py b/examples/romantest5.py index 7428e21..2a6116c 100644 --- a/examples/romantest5.py +++ b/examples/romantest5.py @@ -95,8 +95,8 @@ class ToRomanBadInput(unittest.TestCase): """to_roman should fail with non-integer input""" self.assertRaises(roman5.NotIntegerError, roman5.to_roman, 0.5) -class SanityCheck(unittest.TestCase): - def testSanity(self): +class RoundtripCheck(unittest.TestCase): + def test_roundtrip(self): """from_roman(to_roman(n))==n for all n""" for integer in range(1, 4000): numeral = roman5.to_roman(integer) diff --git a/examples/romantest6.py b/examples/romantest6.py index ecf5d01..ead625c 100644 --- a/examples/romantest6.py +++ b/examples/romantest6.py @@ -98,7 +98,7 @@ class ToRomanBadInput(unittest.TestCase): class FromRomanBadInput(unittest.TestCase): def test_too_many_repeated_numerals(self): """from_roman should fail with too many repeated numerals""" - for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): + for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s) def test_repeated_pairs(self): @@ -112,8 +112,8 @@ class FromRomanBadInput(unittest.TestCase): 'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'): self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, s) -class SanityCheck(unittest.TestCase): - def testSanity(self): +class RoundtripCheck(unittest.TestCase): + def test_roundtrip(self): """from_roman(to_roman(n))==n for all n""" for integer in range(1, 4000): numeral = roman6.to_roman(integer) diff --git a/examples/romantest7.py b/examples/romantest7.py index f7cc4de..ecf74dc 100644 --- a/examples/romantest7.py +++ b/examples/romantest7.py @@ -98,7 +98,7 @@ class ToRomanBadInput(unittest.TestCase): class FromRomanBadInput(unittest.TestCase): def test_too_many_repeated_numerals(self): """from_roman should fail with too many repeated numerals""" - for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): + for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, s) def test_repeated_pairs(self): @@ -112,12 +112,12 @@ class FromRomanBadInput(unittest.TestCase): 'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'): self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, s) - def test_blank(self): - """from_roman should fail with blank string""" - self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, "") + def test_non_string(self): + """from_roman should fail with non-string input""" + self.assertRaises(roman7.InvalidRomanNumeralError, roman7.from_roman, 1) -class SanityCheck(unittest.TestCase): - def testSanity(self): +class RoundtripCheck(unittest.TestCase): + def test_roundtrip(self): """from_roman(to_roman(n))==n for all n""" for integer in range(1, 4000): numeral = roman7.to_roman(integer) diff --git a/examples/romantest8.py b/examples/romantest8.py index e5c577e..7cfb46b 100644 --- a/examples/romantest8.py +++ b/examples/romantest8.py @@ -98,7 +98,7 @@ class ToRomanBadInput(unittest.TestCase): class FromRomanBadInput(unittest.TestCase): def test_too_many_repeated_numerals(self): """from_roman should fail with too many repeated numerals""" - for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): + for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, s) def test_repeated_pairs(self): @@ -120,8 +120,8 @@ class FromRomanBadInput(unittest.TestCase): """from_roman should fail with non-string input""" self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, 1) -class SanityCheck(unittest.TestCase): - def testSanity(self): +class RoundtripCheck(unittest.TestCase): + def test_roundtrip(self): """from_roman(to_roman(n))==n for all n""" for integer in range(1, 4000): numeral = roman8.to_roman(integer) diff --git a/examples/romantest9.py b/examples/romantest9.py new file mode 100644 index 0000000..2ce118b --- /dev/null +++ b/examples/romantest9.py @@ -0,0 +1,159 @@ +"""Unit test for roman1.py + +This program is part of "Dive Into Python 3", a free Python book for +experienced programmers. Visit http://diveintopython3.org/ for the +latest version. +""" + +import roman9 +import unittest + +class KnownValues(unittest.TestCase): + known_values = ( (1, 'I'), + (2, 'II'), + (3, 'III'), + (4, 'IV'), + (5, 'V'), + (6, 'VI'), + (7, 'VII'), + (8, 'VIII'), + (9, 'IX'), + (10, 'X'), + (50, 'L'), + (100, 'C'), + (500, 'D'), + (1000, 'M'), + (31, 'XXXI'), + (148, 'CXLVIII'), + (294, 'CCXCIV'), + (312, 'CCCXII'), + (421, 'CDXXI'), + (528, 'DXXVIII'), + (621, 'DCXXI'), + (782, 'DCCLXXXII'), + (870, 'DCCCLXX'), + (941, 'CMXLI'), + (1043, 'MXLIII'), + (1110, 'MCX'), + (1226, 'MCCXXVI'), + (1301, 'MCCCI'), + (1485, 'MCDLXXXV'), + (1509, 'MDIX'), + (1607, 'MDCVII'), + (1754, 'MDCCLIV'), + (1832, 'MDCCCXXXII'), + (1993, 'MCMXCIII'), + (2074, 'MMLXXIV'), + (2152, 'MMCLII'), + (2212, 'MMCCXII'), + (2343, 'MMCCCXLIII'), + (2499, 'MMCDXCIX'), + (2574, 'MMDLXXIV'), + (2646, 'MMDCXLVI'), + (2723, 'MMDCCXXIII'), + (2892, 'MMDCCCXCII'), + (2975, 'MMCMLXXV'), + (3051, 'MMMLI'), + (3185, 'MMMCLXXXV'), + (3250, 'MMMCCL'), + (3313, 'MMMCCCXIII'), + (3408, 'MMMCDVIII'), + (3501, 'MMMDI'), + (3610, 'MMMDCX'), + (3743, 'MMMDCCXLIII'), + (3844, 'MMMDCCCXLIV'), + (3888, 'MMMDCCCLXXXVIII'), + (3940, 'MMMCMXL'), + (3999, 'MMMCMXCIX'), + (4000, 'MMMM'), + (4500, 'MMMMD'), + (4888, 'MMMMDCCCLXXXVIII'), + (4999, 'MMMMCMXCIX')) + + def test_to_roman_known_values(self): + """to_roman should give known result with known input""" + for integer, numeral in self.known_values: + result = roman9.to_roman(integer) + self.assertEqual(numeral, result) + + def test_from_roman_known_values(self): + """from_roman should give known result with known input""" + for integer, numeral in self.known_values: + result = roman9.from_roman(numeral) + self.assertEqual(integer, result) + +class ToRomanBadInput(unittest.TestCase): + def test_too_large(self): + """to_roman should fail with large input""" + self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, 5000) + + def test_zero(self): + """to_roman should fail with 0 input""" + self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, 0) + + def test_negative(self): + """to_roman should fail with negative input""" + self.assertRaises(roman9.OutOfRangeError, roman9.to_roman, -1) + + def test_non_integer(self): + """to_roman should fail with non-integer input""" + self.assertRaises(roman9.NotIntegerError, roman9.to_roman, 0.5) + +class FromRomanBadInput(unittest.TestCase): + def test_too_many_repeated_numerals(self): + """from_roman should fail with too many repeated numerals""" + for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'): + self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s) + + def test_repeated_pairs(self): + """from_roman should fail with repeated pairs of numerals""" + for s in ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'): + self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s) + + def test_malformed_antecedents(self): + """from_roman should fail with malformed antecedents""" + for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV', + 'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'): + self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, s) + + def test_blank(self): + """from_roman should fail with blank string""" + self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, "") + + def test_non_string(self): + """from_roman should fail with non-string input""" + self.assertRaises(roman9.InvalidRomanNumeralError, roman9.from_roman, 1) + +class RoundtripCheck(unittest.TestCase): + def test_roundtrip(self): + """from_roman(to_roman(n))==n for all n""" + for integer in range(1, 5000): + numeral = roman9.to_roman(integer) + result = roman9.from_roman(numeral) + self.assertEqual(integer, result) + +if __name__ == "__main__": + unittest.main() + +# Copyright (c) 2009, Mark Pilgrim, All rights reserved. +# +# Redistribution and use in source and binary forms, with or without modification, +# are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +# POSSIBILITY OF SUCH DAMAGE. diff --git a/index.html b/index.html index a8045ec..005ffba 100644 --- a/index.html +++ b/index.html @@ -14,7 +14,7 @@ h1:before{content:""}

                                 
                                -

                                You are here:   +

                                You are here:  

                                Dive Into Python 3

                                @@ -33,7 +33,7 @@ h1:before{content:""}
                              3. Objects and object-orientation
                              4. Unit Testing
                              5. Test-first programming -
                              6. Refactoring your code +
                              7. Refactoring
                              8. Files
                              9. HTML processing
                              10. XML processing diff --git a/refactoring.html b/refactoring.html new file mode 100644 index 0000000..1693443 --- /dev/null +++ b/refactoring.html @@ -0,0 +1,468 @@ + + + +Refactoring - Dive into Python 3 + + + + +
                                  
                                +

                                You are here: Home Dive Into Python 3 +

                                Refactoring

                                +
                                +

                                FIXME
                                — FIXME +

                                +

                                  +

                                Diving In

                                +

                                Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven't written yet. + +

                                >>> import roman6
                                +>>> roman6.from_roman("") 
                                +0
                                +
                                  +
                                1. Remember in the [FIXME-xref] previous section when you kept seeing that an empty string would match the regular expression you were using to check for valid Roman numerals? Well, it turns out that this is still true for the final version of the regular expression. And that's a bug; you want an empty string to raise an InvalidRomanNumeralError exception just like any other sequence of characters that don't represent a valid Roman numeral. +
                                + +

                                After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug. + +

                                class FromRomanBadInput(unittest.TestCase):  
                                +    .
                                +    .
                                +    .
                                +    def testBlank(self):
                                +        """from_roman should fail with blank string"""
                                +        self.assertRaises(roman6.InvalidRomanNumeralError, roman6.from_roman, "") 
                                +
                                  +
                                1. Pretty simple stuff here. Call from_roman() with an empty string and make sure it raises an InvalidRomanNumeralError exception. The hard part was finding the bug; now that you know about it, testing for it is the easy part. +
                                + +

                                Since your code has a bug, and you now have a test case that tests this bug, the test case will fail: +

                                +you@localhost:~$ python3 romantest8.py -v
                                +from_roman should fail with blank string ... FAIL
                                +from_roman should fail with malformed antecedents ... ok
                                +from_roman should fail with repeated pairs of numerals ... ok
                                +from_roman should fail with too many repeated numerals ... ok
                                +from_roman should give known result with known input ... ok
                                +to_roman should give known result with known input ... ok
                                +from_roman(to_roman(n))==n for all n ... ok
                                +to_roman should fail with negative input ... ok
                                +to_roman should fail with non-integer input ... ok
                                +to_roman should fail with large input ... ok
                                +to_roman should fail with 0 input ... ok
                                +
                                +======================================================================
                                +FAIL: from_roman should fail with blank string
                                +----------------------------------------------------------------------
                                +Traceback (most recent call last):
                                +  File "romantest8.py", line 117, in test_blank
                                +    self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, "")
                                +AssertionError: InvalidRomanNumeralError not raised by from_roman
                                +
                                +----------------------------------------------------------------------
                                +Ran 11 tests in 0.171s
                                +
                                +FAILED (failures=1)
                                + +

                                Now you can fix the bug. + +

                                def from_roman(s):
                                +    """convert Roman numeral to integer"""
                                +    if not s:  
                                +        raise InvalidRomanNumeralError, 'Input can not be blank'
                                +    if not re.search(romanNumeralPattern, s):
                                +        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
                                +
                                +    result = 0
                                +    index = 0
                                +    for numeral, integer in romanNumeralMap:
                                +        while s[index:index+len(numeral)] == numeral:
                                +            result += integer
                                +            index += len(numeral)
                                +    return result
                                +
                                  +
                                1. Only two lines of code are required: an explicit check for an empty string, and a raise statement. +
                                + +
                                +you@localhost:~$ python3 romantest8.py -v
                                +from_roman should fail with blank string ... ok  
                                +from_roman should fail with malformed antecedents ... ok
                                +from_roman should fail with repeated pairs of numerals ... ok
                                +from_roman should fail with too many repeated numerals ... ok
                                +from_roman should give known result with known input ... ok
                                +to_roman should give known result with known input ... ok
                                +from_roman(to_roman(n))==n for all n ... ok
                                +to_roman should fail with negative input ... ok
                                +to_roman should fail with non-integer input ... ok
                                +to_roman should fail with large input ... ok
                                +to_roman should fail with 0 input ... ok
                                +
                                +----------------------------------------------------------------------
                                +Ran 11 tests in 0.156s
                                +
                                +OK  
                                +
                                  +
                                1. The blank string test case now passes, so the bug is fixed. +
                                2. All the other test cases still pass, which means that this bug fix didn't break anything else. Stop coding. +
                                + +

                                Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn't pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test is tomorrow's regression test. + +

                                Handling Changing Requirements

                                +

                                Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change. + +

                                Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember [FIXME-xref] the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 M characters in a row to represent 4000. If you make this change, you'll be able to expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to your test cases. + +

                                [download roman8.py] +

                                
                                +class KnownValues(unittest.TestCase):
                                +    known_values = ( (1, 'I'),
                                +                      .
                                +                      .
                                +                      .
                                +                     (3999, 'MMMCMXCIX'),
                                +                     (4000, 'MMMM'),                                      
                                +                     (4500, 'MMMMD'),
                                +                     (4888, 'MMMMDCCCLXXXVIII'),
                                +                     (4999, 'MMMMCMXCIX') )
                                +
                                +class ToRomanBadInput(unittest.TestCase):
                                +    def test_too_large(self):
                                +        """to_roman should fail with large input"""
                                +        self.assertRaises(roman8.OutOfRangeError, roman8.to_roman, 5000)  
                                +
                                +    .
                                +    .
                                +    .
                                +
                                +class FromRomanBadInput(unittest.TestCase):
                                +    def test_too_many_repeated_numerals(self):
                                +        """from_roman should fail with too many repeated numerals"""
                                +        for s in ('MMMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):     
                                +            self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, s)
                                +
                                +    .
                                +    .
                                +    .
                                +
                                +class RoundtripCheck(unittest.TestCase):
                                +    def test_roundtrip(self):
                                +        """from_roman(to_roman(n))==n for all n"""
                                +        for integer in range(1, 5000):                                    
                                +            numeral = roman8.to_roman(integer)
                                +            result = roman8.from_roman(numeral)
                                +            self.assertEqual(integer, result)
                                +
                                  +
                                1. The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the 4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest). +
                                2. The definition of “large input” has changed. This test used to call to_roman() with 4000 and expect an error; now that 4000-4999 are good values, you need to bump this up to 5000. +
                                3. The definition of “too many repeated numerals” has also changed. This test used to call from_roman() with 'MMMM' and expect an error; now that MMMM is considered a valid Roman numeral, you need to bump this up to 'MMMMM'. +
                                4. The sanity check loops through every number in the range, from 1 to 3999. Since the range has now expanded, this for loop need to be updated as well to go up to 4999. +
                                + +

                                Now your test cases are up to date with the new requirements, but your code is not, so you expect several of the test cases to fail. + +

                                +you@localhost:~$ python3 romantest9.py -v
                                +from_roman should fail with blank string ... ok
                                +from_roman should fail with malformed antecedents ... ok
                                +from_roman should fail with non-string input ... ok
                                +from_roman should fail with repeated pairs of numerals ... ok
                                +from_roman should fail with too many repeated numerals ... ok
                                +from_roman should give known result with known input ... ERROR          
                                +to_roman should give known result with known input ... ERROR            
                                +from_roman(to_roman(n))==n for all n ... ERROR                          
                                +to_roman should fail with negative input ... ok
                                +to_roman should fail with non-integer input ... ok
                                +to_roman should fail with large input ... ok
                                +to_roman should fail with 0 input ... ok
                                +
                                +======================================================================
                                +ERROR: from_roman should give known result with known input
                                +----------------------------------------------------------------------
                                +Traceback (most recent call last):
                                +  File "romantest9.py", line 82, in test_from_roman_known_values
                                +    result = roman9.from_roman(numeral)
                                +  File "C:\home\diveintopython3\examples\roman9.py", line 60, in from_roman
                                +    raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
                                +roman9.InvalidRomanNumeralError: Invalid Roman numeral: MMMM
                                +
                                +======================================================================
                                +ERROR: to_roman should give known result with known input
                                +----------------------------------------------------------------------
                                +Traceback (most recent call last):
                                +  File "romantest9.py", line 76, in test_to_roman_known_values
                                +    result = roman9.to_roman(integer)
                                +  File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
                                +    raise OutOfRangeError("number out of range (must be 0..3999)")
                                +roman9.OutOfRangeError: number out of range (must be 0..3999)
                                +
                                +======================================================================
                                +ERROR: from_roman(to_roman(n))==n for all n
                                +----------------------------------------------------------------------
                                +Traceback (most recent call last):
                                +  File "romantest9.py", line 131, in testSanity
                                +    numeral = roman9.to_roman(integer)
                                +  File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
                                +    raise OutOfRangeError("number out of range (must be 0..3999)")
                                +roman9.OutOfRangeError: number out of range (must be 0..3999)
                                +
                                +----------------------------------------------------------------------
                                +Ran 12 tests in 0.171s
                                +
                                +FAILED (errors=3)
                                +
                                  +
                                1. The from_roman() known values test will fail as soon as it hits 'MMMM', because from_roman() still thinks this is an invalid Roman numeral. +
                                2. The to_roman() known values test will fail as soon as it hits 4000, because to_roman() still thinks this is out of range. +
                                3. The roundtrip check will also fail as soon as it hits 4000, because to_roman() still thinks this is out of range. +
                                + +

                                Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being tested is never “ahead” of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding.) + +

                                [download roman9.py] +

                                
                                +roman_numeral_pattern = re.compile("""
                                +    ^                   # beginning of string
                                +    M{0,4}              # thousands - 0 to 4 M's  
                                +    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                                +                        #            or 500-800 (D, followed by 0 to 3 C's)
                                +    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                                +                        #        or 50-80 (L, followed by 0 to 3 X's)
                                +    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                                +                        #        or 5-8 (V, followed by 0 to 3 I's)
                                +    $                   # end of string
                                +    """, re.VERBOSE)
                                +
                                +def to_roman(n):
                                +    """convert integer to Roman numeral"""
                                +    if not (0 < n < 5000):                        
                                +        raise OutOfRangeError("number out of range (must be 0..4999)")
                                +    if not isinstance(n, int):
                                +        raise NotIntegerError("non-integers can not be converted")
                                +
                                +    result = ""
                                +    for numeral, integer in roman_numeral_map:
                                +        while n >= integer:
                                +            result += numeral
                                +            n -= integer
                                +    return result
                                +
                                +def from_roman(s):
                                +    .
                                +    .
                                +    .
                                +
                                  +
                                1. You don't need to make any changes to the from_roman() function at all. The only change is to roman_numeral_pattern. If you look closely, you'll notice that I changed the maximum number of optional M characters from 3 to 4 in the first section of the regular expression. This will allow the Roman numeral equivalents of 4999 instead of 3999. The actual from_roman() function is completely generic; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn't handle 'MMMM' before is that you explicitly stopped it with the regular expression pattern matching. +
                                2. The to_roman() function only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn't do this before is that you explicitly stopped it with the range check.) +
                                + +

                                You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself. + +

                                +you@localhost:~$ python3 romantest9.py -v
                                +from_roman should fail with blank string ... ok
                                +from_roman should fail with malformed antecedents ... ok
                                +from_roman should fail with non-string input ... ok
                                +from_roman should fail with repeated pairs of numerals ... ok
                                +from_roman should fail with too many repeated numerals ... ok
                                +from_roman should give known result with known input ... ok
                                +to_roman should give known result with known input ... ok
                                +from_roman(to_roman(n))==n for all n ... ok
                                +to_roman should fail with negative input ... ok
                                +to_roman should fail with non-integer input ... ok
                                +to_roman should fail with large input ... ok
                                +to_roman should fail with 0 input ... ok
                                +
                                +----------------------------------------------------------------------
                                +Ran 12 tests in 0.203s
                                +
                                +OK  
                                +
                                  +
                                1. All the test cases pass. Stop coding. +
                                + +

                                Comprehensive unit testing means never having to rely on a programmer who says “Trust me.” + +

                                Refactoring

                                + +

                                The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly. + +

                                Refactoring is the process of taking working code and making it work better. Usually, “better” means “faster”, although it can also mean “using less memory”, or “using less disk space”, or simply “more elegantly”. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program. + +

                                Here, “better” means both “faster” and “easier to maintain.” Specifically, the from_roman() function is slower and more complex than I'd like, because of that big nasty regular expression that you use to validate Roman numerals. Now, you might think, "Sure, the regular expression is big and hairy, but how else am I supposed to validate that an arbitrary string is a valid a Roman numeral?" + +

                                Answer: there's only 5000 of them; why don't you just build a lookup table? This idea gets even better when you realize that you don't need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. “Validating” is reduced to a single dictionary lookup. + +

                                And best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove — to yourself and to others — that the new code works just as well as the original. + +

                                [download roman10.py] +

                                class OutOfRangeError(ValueError): pass
                                +class NotIntegerError(ValueError): pass
                                +class InvalidRomanNumeralError(ValueError): pass
                                +
                                +roman_numeral_map = (('M',  1000),
                                +                     ('CM', 900),
                                +                     ('D',  500),
                                +                     ('CD', 400),
                                +                     ('C',  100),
                                +                     ('XC', 90),
                                +                     ('L',  50),
                                +                     ('XL', 40),
                                +                     ('X',  10),
                                +                     ('IX', 9),
                                +                     ('V',  5),
                                +                     ('IV', 4),
                                +                     ('I',  1))
                                +
                                +to_roman_table = [ None ]
                                +from_roman_table = {}
                                +
                                +def to_roman(n):
                                +    """convert integer to Roman numeral"""
                                +    if not (0 < n < 5000):
                                +        raise OutOfRangeError("number out of range (must be 1..4999)")
                                +    if int(n) != n:
                                +        raise NotIntegerError("non-integers can not be converted")
                                +    return to_roman_table[n]
                                +
                                +def from_roman(s):
                                +    """convert Roman numeral to integer"""
                                +    if not isinstance(s, str):
                                +        raise InvalidRomanNumeralError("Input must be a string")
                                +    if not s:
                                +        raise InvalidRomanNumeralError("Input can not be blank")
                                +    if s not in from_roman_table:
                                +        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
                                +    return from_roman_table[s]
                                +
                                +def build_lookup_tables():
                                +    def to_roman(n):
                                +        result = ""
                                +        for numeral, integer in roman_numeral_map:
                                +            if n >= integer:
                                +                result = numeral
                                +                n -= integer
                                +                break
                                +        if n > 0:
                                +            result += to_roman_table[n]
                                +        return result
                                +
                                +    for integer in range(1, 5000):
                                +        roman_numeral = to_roman(integer)
                                +        to_roman_table.append(roman_numeral)
                                +        from_roman_table[roman_numeral] = integer
                                +
                                +build_lookup_tables()
                                + +

                                Let's break that down into digestable pieces. Arguably, the most important line is the last one: + +

                                build_lookup_tables()
                                + +

                                You will note that is a function call, but there's no if statement around it. This is not an if __name__ == '__main__' block; it gets called when the module is imported. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. So this code will only get called the first time you import this module.) + +

                                So what does the build_lookup_tables() function do? I'm glad you asked. + +

                                to_roman_table = [ None ]
                                +from_roman_table = {}
                                +.
                                +.
                                +.
                                +def build_lookup_tables():
                                +    def to_roman(n):                                
                                +        result = ""
                                +        for numeral, integer in roman_numeral_map:
                                +            if n >= integer:
                                +                result = numeral
                                +                n -= integer
                                +                break
                                +        if n > 0:
                                +            result += to_roman_table[n]
                                +        return result
                                +
                                +    for integer in range(1, 5000):
                                +        roman_numeral = to_roman(integer)          
                                +        to_roman_table.append(roman_numeral)       
                                +        from_roman_table[roman_numeral] = integer
                                +
                                  +
                                1. This is a clever bit of programming… perhaps too clever. The to_roman() function is defined above; it looks up values in the lookup table and returns them. But the build_lookup_tables() function redefines the to_roman() function to actually do work (like the previous examples did, before you added a lookup table). Within the build_lookup_tables() function, calling to_roman() will call this redefined version. Once the build_lookup_tables() function exits, the redefined version disappears — it is only defined in the local scope of the build_lookup_tables() function. +
                                2. This line of code will call the redefined to_roman() function, which actually calculates the Roman numeral. +
                                3. Once you have the result (from the redefined to_roman() function), you add the integer and its Roman numeral equivalent to both lookup tables. +
                                + +

                                Once the lookup tables are built, the rest of the code is both easy and fast. + +

                                def to_roman(n):
                                +    """convert integer to Roman numeral"""
                                +    if not (0 < n < 5000):
                                +        raise OutOfRangeError("number out of range (must be 1..4999)")
                                +    if int(n) != n:
                                +        raise NotIntegerError("non-integers can not be converted")
                                +    return to_roman_table[n]                                            
                                +
                                +def from_roman(s):
                                +    """convert Roman numeral to integer"""
                                +    if not isinstance(s, str):
                                +        raise InvalidRomanNumeralError("Input must be a string")
                                +    if not s:
                                +        raise InvalidRomanNumeralError("Input can not be blank")
                                +    if s not in from_roman_table:
                                +        raise InvalidRomanNumeralError("Invalid Roman numeral: {0}".format(s))
                                +    return from_roman_table[s]                                          
                                +
                                  +
                                1. After doing the same bounds checking as before, the to_roman() function simply finds the appropriate value in the lookup table and returns it. +
                                2. Similarly, the from_roman() function is reduced to some bounds checking and one line of code. No more regular expressions. No more looping. O(1) conversion to and from Roman numerals. +
                                + +

                                But does it work? Why yes, yes it does. And I can prove it. + +

                                +you@localhost:~$ python3 romantest10.py -v
                                +from_roman should fail with blank string ... ok
                                +from_roman should fail with malformed antecedents ... ok
                                +from_roman should fail with non-string input ... ok
                                +from_roman should fail with repeated pairs of numerals ... ok
                                +from_roman should fail with too many repeated numerals ... ok
                                +from_roman should give known result with known input ... ok
                                +to_roman should give known result with known input ... ok
                                +from_roman(to_roman(n))==n for all n ... ok
                                +to_roman should fail with negative input ... ok
                                +to_roman should fail with non-integer input ... ok
                                +to_roman should fail with large input ... ok
                                +to_roman should fail with 0 input ... ok
                                +
                                +----------------------------------------------------------------------
                                +Ran 12 tests in 0.031s                                                  
                                +
                                +OK
                                +
                                  +
                                1. Not that you asked, but it's fast, too! Like, almost 10× as fast. Of course, it's not entirely a fair comparison, because this version takes longer to import (when it builds the lookup tables). But since the import is only done once, the startup cost is amortized over all the calls to the to_roman() and from_roman() functions. Since the tests make several thousand function calls (the roundtrip test alone makes 10,000), this savings adds up in a hurry! +
                                + +

                                The moral of the story? + +

                                  +
                                • Simplicity is a virtue. +
                                • Especially when regular expressions are involved. +
                                • Unit tests can give you the confidence to do large-scale refactoring. +
                                + +

                                Summary

                                + +

                                Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever got along without it. + +

                                These few chapters have covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts: + +

                                  +
                                • Designing test cases that are specific, automated, and independent +
                                • Writing test cases before the code they are testing +
                                • Writing tests that test good input and check for proper results +
                                • Writing tests that test bad input and check for proper failure responses +
                                • Writing and updating test cases to reflect new requirements +
                                • Refactoring mercilessly to improve performance, scalability, readability, maintainability, or whatever other -ility you're lacking +
                                + +

                                © 2001–9 Mark Pilgrim + + diff --git a/unit-testing.html b/unit-testing.html index fed8eb6..4bdad52 100644 --- a/unit-testing.html +++ b/unit-testing.html @@ -5,7 +5,6 @@ @@ -544,6 +543,16 @@ For instance, the testFromRomanCase method (“from_roman

                              11. from_roman should only accept uppercase Roman numerals (i.e. it should fail when given lowercase input).
                              --> + + + +

                              © 2001–9 Mark Pilgrim