Knowing this, you can design a test suite for your module within the module itself by putting it in this if statement. When you run the module directly, __name__ is __main__, so the test suite executes. When you import the module, __name__ is something else, so the test suite is ignored. This makes it easier to develop and debug new modules before integrating
them into a larger program.
 | On MacPython, there is an additional step to make the if __name__ trick work. Pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and
make sure Run as __main__ is checked.
Further Reading on Importing Modules
3.4. Declaring variables
Now that you know something about dictionaries, tuples, and lists (oh my!), let's get back to the sample program from Chapter 2, odbchelper.py.
Python has local and global variables like most other languages, but it has no explicit variable declarations. Variables spring
into existence by being assigned a value, and they are automatically destroyed when they go out of scope.
Example 3.17. Defining the myParams Variable
if __name__ == "__main__":
    myParams = {"server":"mpilgrim", \
                "database":"master", \
                "uid":"sa", \
                "pwd":"secret" \
                }
Notice the indentation. An if statement is a code block and needs to be indented just like a function.
Also notice that the variable assignment is one command split over several lines, with a backslash (“\”) serving as a line-continuation marker.
 | When a command is split among several lines with the line-continuation marker (“\”), the continued lines can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python IDE auto-indents the continued line, you should probably accept its default unless you have a burning reason not to.
Strictly speaking, expressions in parentheses, straight brackets, or curly braces (like defining a dictionary) can be split into multiple lines with or without the line continuation character (“\”). I like to include the backslash even when it's not required because I think it makes the code easier to read, but that's
a matter of style.
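The two continuation styles can be compared side by side. This is only a sketch in Python 3 syntax; the variable names with_backslashes, without_backslashes, and total are made up for the comparison.

```python
# Explicit continuation: each physical line ends with a backslash.
with_backslashes = {"server": "mpilgrim", \
                    "database": "master", \
                    "uid": "sa", \
                    "pwd": "secret" \
                    }

# Implicit continuation: the open brace keeps the statement going.
without_backslashes = {
    "server": "mpilgrim",
    "database": "master",
    "uid": "sa",
    "pwd": "secret",
}

# Outside brackets, the backslash is required to continue a statement.
total = 1 + 2 + \
        3 + 4

print(with_backslashes == without_backslashes)
print(total)
```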
[unbound variable exception example was here]
3.4.2. Assigning Multiple Values at Once
One of the cooler programming shortcuts in Python is using sequences to assign multiple values at once.
Example 3.19. Assigning multiple values at once
>>> v = ('a', 'b', 'e')
>>> (x, y, z) = v ①
>>> x
'a'
>>> y
'b'
>>> z
'e'
- v is a tuple of three elements, and (x, y, z) is a tuple of three variables. Assigning one to the other assigns each of the values of v to each of the variables, in order.
This has all sorts of uses. I often want to assign names to a range of values. In C, you would use enum and manually list each constant and its associated value, which seems especially tedious when the values are consecutive.
In Python, you can use the built-in range function with multi-variable assignment to quickly assign consecutive values.
Example 3.20. Assigning Consecutive Values
>>> range(7) ①
[0, 1, 2, 3, 4, 5, 6]
>>> (MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7) ②
>>> MONDAY ③
0
>>> TUESDAY
1
>>> SUNDAY
6
- The built-in range function returns a list of integers. In its simplest form, it takes an upper limit and returns a zero-based list counting up to but not including the upper limit. (If you like, you can pass other parameters to specify a base other than 0 and a step other than 1. You can print range.__doc__ for details.)
- MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, and SUNDAY are the variables you're defining. (This example came from the calendar module, a fun little module that prints calendars, like the UNIX program cal. The calendar module defines integer constants for days of the week.)
- Now each variable has its value: MONDAY is 0, TUESDAY is 1, and so forth.
You can also use multi-variable assignment to build functions that return multiple values, simply by returning a tuple of all the values. The caller can treat it as a tuple, or assign the values to individual variables. Many standard Python libraries do this, including the os module, which is discussed in Chapter 6.
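Here is a short sketch of both ideas in Python 3 syntax. Note one difference from the listing above: in Python 3, range returns a lazy range object rather than a list, but unpacking it works the same way. The function min_and_max is hypothetical, written only to show a tuple return value.

```python
# In Python 3, range() returns a lazy range object, but unpacking still works.
(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)


def min_and_max(values):
    """Return two results at once by packing them into a tuple."""
    return min(values), max(values)


result = min_and_max([3, 1, 4, 1, 5])            # keep the tuple whole
lowest, highest = min_and_max([3, 1, 4, 1, 5])   # or unpack it

print(result)
print(lowest, highest)
print(MONDAY, SUNDAY)
```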
Further Reading on Variables
Example 6.12. Introducing sys.modules
>>> import sys ①
>>> print '\n'.join(sys.modules.keys()) ②
win32api
os.path
os
exceptions
__main__
ntpath
nt
sys
__builtin__
site
signal
UserDict
stat
- The sys module contains system-level information, such as the version of Python you're running (sys.version or sys.version_info), and system-level options such as the maximum allowed recursion depth (sys.getrecursionlimit() and sys.setrecursionlimit()).
- sys.modules is a dictionary containing all the modules that have ever been imported since Python was started; the key is the module name, the value is the module object. Note that this is more than just the modules your program has imported. Python preloads some modules on startup, and if you're using a Python IDE, sys.modules contains all the modules imported by all the programs you've run within the IDE.
This example demonstrates how to use sys.modules.
Example 6.13. Using sys.modules
>>> import fileinfo ①
>>> print '\n'.join(sys.modules.keys())
win32api
os.path
os
fileinfo
exceptions
__main__
ntpath
nt
sys
__builtin__
site
signal
UserDict
stat
>>> fileinfo
<module 'fileinfo' from 'fileinfo.pyc'>
>>> sys.modules["fileinfo"] ②
<module 'fileinfo' from 'fileinfo.pyc'>
- As new modules are imported, they are added to sys.modules. This explains why importing the same module twice is very fast: Python has already loaded and cached the module in sys.modules, so importing the second time is simply a dictionary lookup.
- Given the name (as a string) of any previously-imported module, you can get a reference to the module itself through the sys.modules dictionary.
The next example shows how to use the __module__ class attribute with the sys.modules dictionary to get a reference to the module in which a class is defined.
Example 6.14. The __module__ Class Attribute
>>> from fileinfo import MP3FileInfo
>>> MP3FileInfo.__module__ ①
'fileinfo'
>>> sys.modules[MP3FileInfo.__module__] ②
<module 'fileinfo' from 'fileinfo.pyc'>
- Every Python class has a built-in class attribute __module__, which is the name of the module in which the class is defined.
- Combining this with the sys.modules dictionary, you can get a reference to the module in which a class is defined.
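The same lookups can be tried without the fileinfo sample program. This sketch uses Python 3 syntax and the standard library's fractions module as a stand-in; that choice is an assumption made only so the example runs anywhere.

```python
import fractions
import sys
from fractions import Fraction

# An imported module is cached in sys.modules under its name.
cached = sys.modules["fractions"]
print(cached is fractions)

# A class remembers the name of the module that defines it...
print(Fraction.__module__)

# ...so that name can be used to look the module object up again.
home_module = sys.modules[Fraction.__module__]
print(home_module is fractions)
```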
Now you're ready to see how sys.modules is used in fileinfo.py, the sample program introduced in Chapter 5. This example shows that portion of the code.
Example 6.15. sys.modules in fileinfo.py
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]): ①
    "get file info class from filename extension"
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:] ②
    return hasattr(module, subclass) and getattr(module, subclass) or FileInfo ③
- This is a function with two arguments; filename is required, but module is optional and defaults to the module that contains the FileInfo class. This looks inefficient, because you might expect Python to evaluate the sys.modules expression every time the function is called. In fact, Python evaluates default expressions only once, when the def statement runs, which happens the first time the module is imported. As you'll see later, you never call this function with a module argument, so module serves as a function-level constant.
- You'll plow through this line later, after you dive into the os module. For now, take it on faith that subclass ends up as the name of a class, like MP3FileInfo.
- You already know about getattr, which gets a reference to an object by name. hasattr is a complementary function that checks whether an object has a particular attribute; in this case, whether a module has a particular class (although it works for any object and any attribute, just like getattr). In English, this line of code says, “If this module has the class named by subclass then return it, otherwise return the base class FileInfo.”
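The lookup logic can be sketched in a self-contained form. This is not the code from fileinfo.py: it is written in Python 3 syntax, it uses a types.SimpleNamespace object named handlers as a stand-in for the module, and it swaps the hasattr/and/or idiom for the three-argument form of getattr, which returns a default when the attribute is missing. The classes here are empty placeholders.

```python
import os
import types


class FileInfo:
    """Placeholder base class for this sketch."""


class MP3FileInfo(FileInfo):
    """Placeholder handler for .mp3 files."""


# Stand-in for a module object: anything with attributes works with getattr.
handlers = types.SimpleNamespace(FileInfo=FileInfo, MP3FileInfo=MP3FileInfo)


def get_file_info_class(filename, module=handlers):
    """Pick a handler class from the filename extension."""
    # The default for module was evaluated once, when this def statement ran.
    extension = os.path.splitext(filename)[1].upper()[1:]
    subclass = "%sFileInfo" % extension
    # Fall back to the base class when no matching class exists.
    return getattr(module, subclass, FileInfo)


print(get_file_info_class("mahadeva.mp3").__name__)
print(get_file_info_class("notes.txt").__name__)
```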
Further Reading on Modules
6.5. Working with Directories
The os.path module has several functions for manipulating files and directories. Here, we're looking at handling pathnames and listing
the contents of a directory.
Example 6.16. Constructing Pathnames
>>> import os
>>> os.path.join("c:\\music\\ap\\", "mahadeva.mp3") ① ②
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.join("c:\\music\\ap", "mahadeva.mp3") ③
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.expanduser("~") ④
'c:\\Documents and Settings\\mpilgrim\\My Documents'
>>> os.path.join(os.path.expanduser("~"), "Python") ⑤
'c:\\Documents and Settings\\mpilgrim\\My Documents\\Python'
- os.path is a reference to a module -- which module depends on your platform. Just as getpass encapsulates differences between platforms by setting getpass to a platform-specific function, os encapsulates differences between platforms by setting path to a platform-specific module.
- The join function of os.path constructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings. (Note that dealing with pathnames on Windows is annoying because the backslash character must be escaped.)
- In this slightly less trivial case, join will add an extra backslash to the pathname before joining it to the filename. I was overjoyed when I discovered this, since addSlashIfNecessary is one of the stupid little functions I always need to write when building up my toolbox in a new language. Do not write this stupid little function in Python; smart people have already taken care of it for you.
- expanduser will expand a pathname that uses ~ to represent the current user's home directory. This works on any platform where users have a home directory, like Windows, UNIX, and Mac OS X; it has no effect on Mac OS.
- Combining these techniques, you can easily construct pathnames for directories and files under the user's home directory.
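To see the platform-specific modules directly, the following sketch (Python 3 syntax) calls ntpath and posixpath by name, so the Windows-style results are the same on any machine. Using those two modules explicitly is a choice made for this illustration; ordinary code should go through os.path.

```python
import ntpath      # the Windows implementation of os.path
import os
import posixpath   # the UNIX implementation of os.path

# os.path is simply whichever of these modules fits the current platform.
print(os.path is posixpath or os.path is ntpath)

# join adds a separator only when one is needed.
with_slash = ntpath.join("c:\\music\\ap\\", "mahadeva.mp3")
without_slash = ntpath.join("c:\\music\\ap", "mahadeva.mp3")
print(with_slash == without_slash)

unix_style = posixpath.join("/music/ap", "mahadeva.mp3")
print(unix_style)

# Build a path under the current user's home directory.
python_dir = os.path.join(os.path.expanduser("~"), "Python")
print(python_dir)
```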
Example 6.17. Splitting Pathnames
>>> os.path.split("c:\\music\\ap\\mahadeva.mp3") ①
('c:\\music\\ap', 'mahadeva.mp3')
>>> (filepath, filename) = os.path.split("c:\\music\\ap\\mahadeva.mp3") ②
>>> filepath ③
'c:\\music\\ap'
>>> filename ④
'mahadeva.mp3'
>>> (shortname, extension) = os.path.splitext(filename) ⑤
>>> shortname
'mahadeva'
>>> extension
'.mp3'
- The split function splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use multi-variable assignment to return multiple values from a function? Well, split is such a function.
- You assign the return value of the split function into a tuple of two variables. Each variable receives the value of the corresponding element of the returned tuple.
- The first variable, filepath, receives the value of the first element of the tuple returned from split, the file path.
- The second variable, filename, receives the value of the second element of the tuple returned from split, the filename.
- os.path also contains a function splitext, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique to assign each of them to separate variables.
Example 6.18. Listing Directories
>>> os.listdir("c:\\music\\_singles\\") ①
['a_time_long_forgotten_con.mp3', 'hellraiser.mp3',
'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3',
'spinning.mp3']
>>> dirname = "c:\\"
>>> os.listdir(dirname) ②
['AUTOEXEC.BAT', 'boot.ini', 'CONFIG.SYS', 'cygwin',
'docbook', 'Documents and Settings', 'Incoming', 'Inetpub', 'IO.SYS',
'MSDOS.SYS', 'Music', 'NTDETECT.COM', 'ntldr', 'pagefile.sys',
'Program Files', 'Python20', 'RECYCLER',
'System Volume Information', 'TEMP', 'WINNT']
>>> [f for f in os.listdir(dirname)
... if os.path.isfile(os.path.join(dirname, f))] ③
['AUTOEXEC.BAT', 'boot.ini', 'CONFIG.SYS', 'IO.SYS', 'MSDOS.SYS',
'NTDETECT.COM', 'ntldr', 'pagefile.sys']
>>> [f for f in os.listdir(dirname)
... if os.path.isdir(os.path.join(dirname, f))] ④
['cygwin', 'docbook', 'Documents and Settings', 'Incoming',
'Inetpub', 'Music', 'Program Files', 'Python20', 'RECYCLER',
'System Volume Information', 'TEMP', 'WINNT']
- The listdir function takes a pathname and returns a list of the contents of the directory.
- listdir returns both files and folders, with no indication of which is which.
- You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns 1 if the path represents a file, and 0 otherwise. Here you're using os.path.join to ensure a full pathname, but isfile also works with a partial path, relative to the current working directory. You can use os.getcwd() to get the current working directory.
- os.path also has an isdir function which returns 1 if the path represents a directory, and 0 otherwise. You can use this to get a list of the subdirectories within a directory.
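The same filtering can be tried against a throwaway directory. This sketch uses Python 3 syntax and the tempfile module; the file and folder names are invented for the example. In Python 3, isfile and isdir return True and False rather than 1 and 0.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    # Hypothetical contents: two files and one subdirectory.
    for name in ("a.mp3", "b.txt"):
        with open(os.path.join(dirname, name), "w") as handle:
            handle.write("placeholder")
    os.mkdir(os.path.join(dirname, "albums"))

    # listdir makes no promise about order, so sort for a stable display.
    everything = sorted(os.listdir(dirname))
    files = sorted(f for f in os.listdir(dirname)
                   if os.path.isfile(os.path.join(dirname, f)))
    folders = sorted(f for f in os.listdir(dirname)
                     if os.path.isdir(os.path.join(dirname, f)))

print(everything)
print(files)
print(folders)
```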
Example 6.19. Listing Directories in fileinfo.py
def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f)
                for f in os.listdir(directory)] ① ②
    fileList = [os.path.join(directory, f)
                for f in fileList
                if os.path.splitext(f)[1] in fileExtList] ③ ④ ⑤
- os.listdir(directory) returns a list of all the files and folders in directory.
- Iterating through the list with f, you use os.path.normcase(f) to normalize the case according to operating system defaults. normcase is a useful little function that compensates for case-insensitive operating systems that think that mahadeva.mp3 and mahadeva.MP3 are the same file. For instance, on Windows and Mac OS, normcase will convert the entire filename to lowercase; on UNIX-compatible systems, it will return the filename unchanged.
- Iterating through the normalized list with f again, you use os.path.splitext(f) to split each filename into name and extension.
- For each file, you see if the extension is in the list of file extensions you care about (fileExtList, which was passed to the listDirectory function).
- For each file you care about, you use os.path.join(directory, f) to construct the full pathname of the file, and return a list of the full pathnames.
 | Whenever possible, you should use the functions in os and os.path for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like
os.path.split work on UNIX, Windows, Mac OS, and any other platform supported by Python.
There is one other way to get the contents of a directory. It's very powerful, and it uses the sort of wildcards that you
may already be familiar with from working on the command line.
Example 6.20. Listing Directories with glob
>>> os.listdir("c:\\music\\_singles\\") ①
['a_time_long_forgotten_con.mp3', 'hellraiser.mp3',
'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3',
'spinning.mp3']
>>> import glob
>>> glob.glob('c:\\music\\_singles\\*.mp3') ②
['c:\\music\\_singles\\a_time_long_forgotten_con.mp3',
'c:\\music\\_singles\\hellraiser.mp3',
'c:\\music\\_singles\\kairo.mp3',
'c:\\music\\_singles\\long_way_home1.mp3',
'c:\\music\\_singles\\sidewinder.mp3',
'c:\\music\\_singles\\spinning.mp3']
>>> glob.glob('c:\\music\\_singles\\s*.mp3') ③
['c:\\music\\_singles\\sidewinder.mp3',
'c:\\music\\_singles\\spinning.mp3']
>>> glob.glob('c:\\music\\*\\*.mp3')④
- As you saw earlier, os.listdir simply takes a directory path and lists all files and directories in that directory.
- The glob module, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. Here the wildcard is a directory path plus "*.mp3", which will match all .mp3 files. Note that each element of the returned list already includes the full path of the file.
- If you want to find all the files in a specific directory that start with "s" and end with ".mp3", you can do that too.
- Now consider this scenario: you have a music directory, with several subdirectories within it, with .mp3 files within each subdirectory. You can get a list of all of those with a single call to glob, by using two wildcards at once. One wildcard is the "*.mp3" (to match .mp3 files), and one wildcard is within the directory path itself, to match any subdirectory within c:\music. That's a crazy amount of power packed into one deceptively simple-looking function!
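The three wildcard patterns above can be exercised on a temporary directory tree. This is a sketch in Python 3 syntax; the album and track names are placeholders, and the results are reduced to base names and sorted only so the display is stable.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as music:
    # Hypothetical layout: music/<album>/<track>
    for album, track in (("ap", "mahadeva.mp3"),
                         ("_singles", "spinning.mp3"),
                         ("_singles", "sidewinder.mp3"),
                         ("_singles", "notes.txt")):
        os.makedirs(os.path.join(music, album), exist_ok=True)
        with open(os.path.join(music, album, track), "w") as handle:
            handle.write("placeholder")

    singles = glob.glob(os.path.join(music, "_singles", "*.mp3"))
    s_tracks = glob.glob(os.path.join(music, "_singles", "s*.mp3"))
    all_mp3s = glob.glob(os.path.join(music, "*", "*.mp3"))

    # glob returns full paths; keep only the base names for display.
    singles = sorted(os.path.basename(p) for p in singles)
    s_tracks = sorted(os.path.basename(p) for p in s_tracks)
    all_mp3s = sorted(os.path.basename(p) for p in all_mp3s)

print(singles)
print(s_tracks)
print(all_mp3s)
```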
Further Reading on the os Module
[HTML stuff was here]
8.5. locals and globals
Let's digress from HTML processing for a minute and talk about how Python handles variables. Python has two built-in functions, locals and globals, which provide dictionary-based access to local and global variables.
Remember locals? You first saw it here:
def unknown_starttag(self, tag, attrs):
    strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
    self.pieces.append("<%(tag)s%(strattrs)s>" % locals())
No, wait, you can't learn about locals yet. First, you need to learn about namespaces. This is dry stuff, but it's important, so pay attention.
Python uses what are called namespaces to keep track of variables. A namespace is just like a dictionary where the keys are names
of variables and the dictionary values are the values of those variables. In fact, you can access a namespace as a Python dictionary, as you'll see in a minute.
At any particular point in a Python program, there are several namespaces available. Each function has its own namespace, called the local namespace, which
keeps track of the function's variables, including function arguments and locally defined variables. Each module has its
own namespace, called the global namespace, which keeps track of the module's variables, including functions, classes, any
other imported modules, and module-level variables and constants. And there is the built-in namespace, accessible from any
module, which holds built-in functions and exceptions.
When a line of code asks for the value of a variable x, Python will search for that variable in all the available namespaces, in order:
- local namespace - specific to the current function or class method. If the function defines a local variable x, or has an argument x, Python will use this and stop searching.
- global namespace - specific to the current module. If the module has defined a variable, function, or class called x, Python will use that and stop searching.
- built-in namespace - global to all modules. As a last resort, Python will assume that x is the name of a built-in function or variable.
If Python doesn't find x in any of these namespaces, it gives up and raises a NameError with the message There is no variable named 'x', which you saw back in Example 3.18, “Referencing an Unbound Variable”, but you didn't appreciate how much work Python was doing before giving you that error.
 | Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes. In versions of Python prior to 2.2, when you reference a variable within a nested function or lambda function, Python will search for that variable in the current (nested or lambda) function's namespace, then in the module's namespace. Python 2.2 will search for the variable in the current (nested or lambda) function's namespace, then in the parent function's namespace, then in the module's namespace. Python 2.1 can work either way; by default, it works like Python 2.0, but you can add the following line of code at the top of your module to make your module work like Python 2.2:
from __future__ import nested_scopes
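The search order, including the nested-scope step, can be observed with a few small functions. This sketch uses Python 3 syntax, where nested scopes are always on; every name in it is made up for the demonstration.

```python
name = "global"          # lives in the module (global) namespace


def outer():
    name = "enclosing"   # local to outer, enclosing for the inner functions

    def inner_shadowing():
        name = "local"   # found first, in the local namespace
        return name

    def inner_nested():
        return name      # not local, so the parent function is searched

    return inner_shadowing(), inner_nested()


def uses_global():
    return name          # not local, so the module namespace is searched


def uses_builtin():
    return len("abc")    # len is found last, in the built-in namespace


def missing():
    try:
        return undefined_name_for_demo   # in no namespace at all
    except NameError as error:
        return type(error).__name__


print(outer())
print(uses_global())
print(uses_builtin())
print(missing())
```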
Are you confused yet? Don't despair! This is really cool, I promise. Like many things in Python, namespaces are directly accessible at run-time. How? Well, the local namespace is accessible via the built-in locals function, and the global (module level) namespace is accessible via the built-in globals function.
Example 8.10. Introducing locals
>>> def foo(arg): ①
...     x = 1
...     print locals()
...
>>> foo(7) ②
{'arg': 7, 'x': 1}
>>> foo('bar') ③
{'arg': 'bar', 'x': 1}
- The function foo has two variables in its local namespace: arg, whose value is passed in to the function, and x, which is defined within the function.
- locals returns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values of the dictionary are the actual values of the variables. So calling foo with 7 prints the dictionary containing the function's two local variables: arg (7) and x (1).
- Remember, Python has dynamic typing, so you could just as easily pass a string in for arg; the function (and the call to locals) would still work just as well. locals works with all variables of all datatypes.
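For readers following along in Python 3, here is a sketch of the same idea, together with the dictionary-based string formatting that unknown_starttag relies on. The functions describe and start_tag are hypothetical, standalone versions written for this illustration.

```python
def describe(arg):
    x = 1
    return locals()      # a dictionary of the function's local variables


def start_tag(tag, attrs):
    strattrs = "".join(' %s="%s"' % (key, value) for key, value in attrs)
    # The keys in locals() feed the %(name)s placeholders.
    return "<%(tag)s%(strattrs)s>" % locals()


print(describe(7))
print(describe("bar"))
print(start_tag("a", [("href", "index.html")]))
```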
What locals does for the local (function) namespace, globals does for the global (module) namespace. globals is more exciting, though, because a module's namespace is more exciting.
Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes
defined in the module. Plus, it includes anything that was imported into the module.
Remember the difference between from module import and import module? With import module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access
any of its functions or attributes: module.function. But with from module import, you're actually importing specific functions and attributes from another module into your own namespace, which is why you
access them directly without referencing the original module they came from. With the globals function, you can actually see this happen.
Example 8.11. Introducing globals
Look at the following block of code at the bottom of BaseHTMLProcessor.py:
if __name__ == "__main__":
    for k, v in globals().items(): ①
        print k, "=", v
- Just so you don't get intimidated, remember that you've seen all this before. The globals function returns a dictionary, and you're iterating through the dictionary using the items method and multi-variable assignment. The only thing new here is the globals function.
Now running the script from the command line gives this output (note that your output may be slightly different, depending on your platform and where you installed Python):
c:\docbook\dip\py> python BaseHTMLProcessor.py
SGMLParser = sgmllib.SGMLParser ①
htmlentitydefs = <module 'htmlentitydefs' from 'C:\Python23\lib\htmlentitydefs.py'> ②
BaseHTMLProcessor = __main__.BaseHTMLProcessor ③
__name__ = __main__ ④
... rest of output omitted for brevity...
- SGMLParser was imported from sgmllib, using from module import. That means that it was imported directly into the module's namespace, and here it is.
- Contrast this with htmlentitydefs, which was imported using import. That means that the htmlentitydefs module itself is in the namespace, but the entitydefs variable defined within htmlentitydefs is not.
- This module only defines one class, BaseHTMLProcessor, and here it is. Note that the value here is the class itself, not a specific instance of the class.
- Remember the if __name__ trick? When running a module (as opposed to importing it from another module), the built-in __name__ attribute is a special value, __main__. Since you ran this module as a script from the command line, __name__ is __main__, which is why the little test code to print the globals got executed.
 | Using the locals and globals functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors
the functionality of the getattr function, which allows you to access arbitrary functions dynamically by providing the function name as a string.
There is one other important difference between the locals and globals functions, which you should learn now before it bites you. It will bite you anyway, but at least then you'll remember learning
it.
Example 8.12. locals is read-only, globals is not
def foo(arg):
    x = 1
    print locals() ①
    locals()["x"] = 2 ②
    print "x=",x ③

z = 7
print "z=",z
foo(3)
globals()["z"] = 8 ④
print "z=",z ⑤
- Since foo is called with 3, this will print {'arg': 3, 'x': 1}. This should not be a surprise.
- locals is a function that returns a dictionary, and here you are setting a value in that dictionary. You might think that this would change the value of the local variable x to 2, but it doesn't. locals does not actually return the local namespace, it returns a copy. So changing it does nothing to the value of the variables in the local namespace.
- This prints x= 1, not x= 2.
- After being burned by locals, you might think that this wouldn't change the value of z, but it does. Due to internal differences in how Python is implemented (which I'd rather not go into, since I don't fully understand them myself), globals returns the actual global namespace, not a copy: the exact opposite behavior of locals. So any changes to the dictionary returned by globals directly affect your global variables.
- This prints z= 8, not z= 7.
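The same contrast can be checked in Python 3. This is a sketch that assumes the standard CPython interpreter, where writing to the dictionary returned by locals inside a function leaves the real variable untouched. The helper names are hypothetical.

```python
def try_to_change_local():
    x = 1
    locals()["x"] = 2     # writes to a dictionary, not to the variable x
    return x


def set_global(name, value):
    globals()[name] = value   # writes straight into the module namespace


def read_z():
    return z                  # looked up in the module namespace at call time


set_global("z", 8)
print(try_to_change_local())
print(read_z())
```

Note that z is never assigned with an ordinary statement; it exists only because set_global put it into the dictionary returned by globals.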
[XML stuff was here]
9.2. Packages
Actually parsing an XML document is very simple: one line of code. However, before you get to that line of code, you need to take a short detour
to talk about packages.
Example 9.5. Loading an XML document (a sneak peek)
>>> from xml.dom import minidom ①
>>> xmldoc = minidom.parse('~/diveintopython3/common/py/kgp/binary.xml')
- This is a syntax you haven't seen before. It looks almost like the from module import you know and love, but the "." gives it away as something above and beyond a simple import. In fact, xml is what is known as a package, dom is a nested package within xml, and minidom is a module within xml.dom.
That sounds complicated, but it's really not. Looking at the actual implementation may help. Packages are little more than
directories of modules; nested packages are subdirectories. The modules within a package (or a nested package) are still
just .py files, like always, except that they're in a subdirectory instead of the main lib/ directory of your Python installation.
Example 9.6. File layout of a package
Python21/           root Python installation (home of the executable)
|
+--lib/             library directory (home of the standard library modules)
   |
   +-- xml/         xml package (really just a directory with other stuff in it)
       |
       +--sax/      xml.sax package (again, just a directory)
       |
       +--dom/      xml.dom package (contains minidom.py)
       |
       +--parsers/  xml.parsers package (used internally)
So when you say from xml.dom import minidom, Python figures out that that means “look in the xml directory for a dom directory, and look in that for the minidom module, and import it as minidom”. But Python is even smarter than that; not only can you import entire modules contained within a package, you can selectively import specific classes or functions from a module contained within a package. You can also import the package itself as a module. The syntax is all the same; Python figures out what you mean based on the file layout of the package, and automatically does the right thing.
Example 9.7. Packages are modules, too
>>> from xml.dom import minidom ①
>>> minidom
<module 'xml.dom.minidom' from 'C:\Python21\lib\xml\dom\minidom.pyc'>
>>> minidom.Element
<class xml.dom.minidom.Element at 01095744>
>>> from xml.dom.minidom import Element ②
>>> Element
<class xml.dom.minidom.Element at 01095744>
>>> minidom.Element
<class xml.dom.minidom.Element at 01095744>
>>> from xml import dom ③
>>> dom
<module 'xml.dom' from 'C:\Python21\lib\xml\dom\__init__.pyc'>
>>> import xml ④
>>> xml
<module 'xml' from 'C:\Python21\lib\xml\__init__.pyc'>
- Here you're importing a module (minidom) from a nested package (xml.dom). The result is that minidom is imported into your namespace, and in order to reference classes within the minidom module (like Element), you need to preface them with the module name.
- Here you are importing a class (Element) from a module (minidom) from a nested package (xml.dom). The result is that Element is imported directly into your namespace. Note that this does not interfere with the previous import; the Element class can now be referenced in two ways (but it's all still the same class).
- Here you are importing the dom package (a nested package of xml) as a module in and of itself. Any level of a package can be treated as a module, as you'll see in a moment. It can even have its own attributes and methods, just like the modules you've seen before.
- Here you are importing the root level xml package as a module.
So how can a package (which is just a directory on disk) be imported and treated as a module (which is always a file on disk)?
The answer is the magical __init__.py file. You see, packages are not simply directories; they are directories with a specific file, __init__.py, inside. This file defines the attributes and methods of the package. For instance, xml.dom contains a Node class, which is defined in xml/dom/__init__.py. When you import a package as a module (like dom from xml), you're really importing its __init__.py file.
 | A package is a directory with the special __init__.py file in it. The __init__.py file defines the attributes and methods of the package. It doesn't need to define anything; it can just be an empty file,
but it has to exist. But if __init__.py doesn't exist, the directory is just a directory, not a package, and it can't be imported or contain modules or nested packages.
So why bother with packages? Well, they provide a way to logically group related modules. Instead of having an xml package with sax and dom packages inside, the authors could have chosen to put all the sax functionality in xmlsax.py and all the dom functionality in xmldom.py, or even put all of it in a single module. But that would have been unwieldy (as of this writing, the XML package has over 3000 lines of code) and difficult to manage (separate source files mean multiple people can work on different
areas simultaneously).
If you ever find yourself writing a large subsystem in Python (or, more likely, when you realize that your small subsystem has grown into a large one), invest some time designing a good
package architecture. It's one of the many things Python is good at, so take advantage of it.
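To make the role of __init__.py concrete, the following sketch builds a tiny package in a temporary directory and imports it at each level. It is written in Python 3 syntax; the package name demo_pkg_sketch, its tools module, and their contents are all hypothetical. One caveat for modern readers: Python 3.3 and later can also import a directory without __init__.py as a namespace package, so the strict rule described above applies to the older Python versions this book targets.

```python
import importlib
import os
import sys
import tempfile

# Lay out a package on disk: a directory containing __init__.py and a module.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_pkg_sketch")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("GREETING = 'hello from the package'\n")
with open(os.path.join(package_dir, "tools.py"), "w") as handle:
    handle.write("def shout(text):\n    return text.upper()\n")

# Make the parent directory importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg_sketch                      # the package itself, as a module
from demo_pkg_sketch import tools           # a module inside the package
from demo_pkg_sketch.tools import shout     # a function inside that module

print(demo_pkg_sketch.GREETING)             # defined in __init__.py
print(os.path.basename(demo_pkg_sketch.__file__))
print(tools.shout is shout)
```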
9.3. Parsing XML
As I was saying, actually parsing an XML document is very simple: one line of code. Where you go from there is up to you.
10.6. Handling command-line arguments
Python fully supports creating programs that can be run on the command line, complete with command-line arguments and either short-
or long-style flags to specify various options. None of this is XML-specific, but this script makes good use of command-line processing, so it seemed like a good time to mention it.
It's difficult to talk about command-line processing without understanding how command-line arguments are exposed to your
Python program, so let's write a simple program to see them.
Example 10.20. Introducing sys.argv
If you have not already done so, you can download this and other examples used in this book.
#argecho.py
import sys
for arg in sys.argv: ①
    print arg
- Each command-line argument passed to the program will be in sys.argv, which is just a list. Here you are printing each argument on a separate line.
Example 10.21. The contents of sys.argv
[you@localhost py]$ python argecho.py ①
argecho.py
[you@localhost py]$ python argecho.py abc def ②
argecho.py
abc
def
[you@localhost py]$ python argecho.py --help ③
argecho.py
--help
[you@localhost py]$ python argecho.py -m kant.xml ④
argecho.py
-m
kant.xml
- The first thing to know about sys.argv is that it contains the name of the script you're calling. You will actually use this knowledge to your advantage later,
in Chapter 16, Functional Programming. Don't worry about it for now.
- Command-line arguments are separated by spaces, and each shows up as a separate element in the sys.argv list.
- Command-line flags, like --help, also show up as their own element in the sys.argv list.
- To make things even more interesting, some command-line flags themselves take arguments. For instance, here you have a flag (-m) which takes an argument (kant.xml). Both the flag itself and the flag's argument are simply sequential elements in the sys.argv list. No attempt is made to associate one with the other; all you get is a list.
So as you can see, you certainly have all the information passed on the command line, but then again, it doesn't look like
it's going to be all that easy to actually use it. For simple programs that only take a single argument and have no flags,
you can simply use sys.argv[1] to access the argument. There's no shame in this; I do it all the time. For more complex programs, you need the getopt module.
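The single-argument case might look like the following minimal sketch. It is written in Python 3 syntax (unlike the Python 2 listings in this chapter), and the helper name first_argument and its default parameter are illustrative assumptions, not part of the book's scripts. Explicit lists stand in for sys.argv so the behavior is easy to see.

```python
def first_argument(argv, default=None):
    """Return the first real argument, or default if none was given.

    argv is expected to look like sys.argv: element 0 is the script name.
    """
    if len(argv) > 1:
        return argv[1]
    return default

# Hand-written lists standing in for sys.argv.
with_arg = first_argument(["argecho.py", "kant.xml"])
without_arg = first_argument(["argecho.py"], default="(no argument)")

print(with_arg)
print(without_arg)
```

In a real script you would import sys and call first_argument(sys.argv). Checking the length first avoids an IndexError when the script is run with no arguments.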
Example 10.22. Introducing getopt
def main(argv):
    grammar = "kant.xml" ①
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="]) ②
    except getopt.GetoptError: ③
        usage() ④
        sys.exit(2)
    ...
if __name__ == "__main__":
    main(sys.argv[1:])
- First off, look at the bottom of the example and notice that you're calling the
main function with sys.argv[1:]. Remember, sys.argv[0] is the name of the script that you're running; you don't care about that for command-line processing, so you chop it off
and pass the rest of the list.
- This is where all the interesting processing happens. The
getopt function of the getopt module takes three parameters: the argument list (which you got from sys.argv[1:]), a string containing all the possible single-character command-line flags that this program accepts, and a list of longer
command-line flags that are equivalent to the single-character versions. This is quite confusing at first glance, and is
explained in more detail below.
- If anything goes wrong trying to parse these command-line flags,
getopt will raise an exception, which you catch. You told getopt all the flags you understand, so this probably means that the end user passed some command-line flag that you don't understand.
- As is standard practice in the UNIX world, when the script is passed flags it doesn't understand, you print out a summary of proper usage and exit gracefully.
Note that I haven't shown the
usage function here. You would still need to code that somewhere and have it print out the appropriate summary; it's not automatic.
So what are all those parameters you pass to the getopt function? Well, the first one is simply the raw list of command-line flags and arguments (not including the first element,
the script name, which you already chopped off before calling the main function). The second is the list of short command-line flags that the script accepts.
"hg:d"
-h        print usage summary
-g ...    use specified grammar file or URL
-d        show debugging information while parsing
The first and third flags are simply standalone flags; you specify them or you don't, and they do things (print help) or change
state (turn on debugging). However, the second flag (-g) must be followed by an argument, which is the name of the grammar file to read from. In fact it can be a filename or a web address,
and you don't know which yet (you'll figure it out later), but you know it has to be something. So you tell getopt this by putting a colon after the g in that second parameter to the getopt function.
To further complicate things, the script accepts either short flags (like -h) or long flags (like --help), and you want them to do the same thing. This is what the third parameter to getopt is for, to specify a list of the long flags that correspond to the short flags you specified in the second parameter.
["help", "grammar="]
--help           print usage summary
--grammar ...    use specified grammar file or URL
Three things of note here:
- All long flags are preceded by two dashes on the command line, but you don't include those dashes when calling
getopt. They are understood.
- The
--grammar flag must always be followed by an additional argument, just like the -g flag. This is notated by an equals sign, "grammar=".
- The list of long flags is shorter than the list of short flags, because the -d flag does not have a corresponding long version. This is fine; only -d will turn on debugging. Note that getopt itself does not pair short flags with long flags: each list is checked independently, and the order within either list does not matter. Listing them in matching order is only a convention that makes the correspondence easier for a human reader to see.
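To see what getopt hands back for the three parameters described above, you can call it on a hand-written list instead of sys.argv[1:]. This is a small sketch in Python 3 syntax; the argument list and the file name source.xml are made up for illustration.

```python
import getopt

# A hand-written argument list standing in for sys.argv[1:].
argv = ["-d", "-g", "kant.xml", "--help", "source.xml"]

opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])

print(opts)   # list of (flag, argument) tuples, in command-line order
print(args)   # whatever is left over after the flags

# A long flag that takes an argument accepts it with or without "=".
opts_eq, _ = getopt.getopt(["--grammar=kant.xml"], "hg:d", ["help", "grammar="])
opts_sp, _ = getopt.getopt(["--grammar", "kant.xml"], "hg:d", ["help", "grammar="])
```

Here opts comes back as [('-d', ''), ('-g', 'kant.xml'), ('--help', '')] and args as ['source.xml']. Flags that take no argument are paired with an empty string.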
Confused yet? Let's look at the actual code and see if it makes sense in context.
Example 10.23. Handling command-line arguments in kgp.py
def main(argv): ①
    grammar = "kant.xml"
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts: ②
        if opt in ("-h", "--help"): ③
            usage()
            sys.exit()
        elif opt == '-d': ④
            global _debug
            _debug = 1
        elif opt in ("-g", "--grammar"): ⑤
            grammar = arg
    source = "".join(args) ⑥
    k = KantGenerator(grammar, source)
    print k.output()
- The grammar variable will keep track of the grammar file you're using. You initialize it here in case it's not specified on the command
line (using either the
-g or the --grammar flag).
- The opts variable that you get back from getopt contains a list of tuples: flag and argument. If the flag doesn't take an argument, then arg will simply be an empty string. This makes it easier to loop through the flags.
- getopt validates that the command-line flags are acceptable, but it doesn't do any sort of conversion between short and long flags. If you specify the -h flag, opt will contain "-h"; if you specify the --help flag, opt will contain "--help". So you need to check for both.
- Remember, the
-d flag didn't have a corresponding long flag, so you only need to check for the short form. If you find it, you set a global
variable that you'll refer to later to print out debugging information. (I used this during the development of the script.
What, you thought all these examples worked on the first try?)
- If you find a grammar file, either with a
-g flag or a --grammar flag, you save the argument that followed it (stored in arg) into the grammar variable, overwriting the default that you initialized at the top of the main function.
- That's it. You've looped through and dealt with all the command-line flags. That means that anything left must be command-line
arguments. These come back from the
getopt function in the args variable. In this case, you're treating them as source material for the parser. If there are no command-line arguments
specified, args will be an empty list, and source will end up as the empty string.
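The flag-handling loop can be tried in isolation with a sketch like the one below. It is written in Python 3 syntax, and it is not the book's main function: the name parse_args, the returned tuple, and the file names other.xml and source.xml are assumptions made so that the result can be inspected instead of acted upon.

```python
import getopt

def parse_args(argv):
    """Parse argv (script name already removed) the way main() does.

    Returns a (grammar, debug, show_help, source) tuple rather than
    printing usage or building a generator.
    """
    grammar = "kant.xml"      # default when no -g / --grammar is given
    debug = False
    show_help = False
    opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            show_help = True
        elif opt == "-d":
            debug = True
        elif opt in ("-g", "--grammar"):
            grammar = arg
    source = "".join(args)    # leftover arguments become the source
    return grammar, debug, show_help, source

defaults = parse_args([])
custom = parse_args(["-d", "--grammar", "other.xml", "source.xml"])

try:
    parse_args(["-x"])        # -x is not listed in "hg:d"
    unknown_flag_rejected = False
except getopt.GetoptError:
    unknown_flag_rejected = True

print(defaults, custom, unknown_flag_rejected)
```

Returning the parsed values keeps the parsing step separate from the side effects (printing usage, exiting), which makes it easier to check by hand.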
10.7. Putting it all together
You've covered a lot of ground. Let's step back and see how all the pieces fit together.
To start with, this is a script that takes its arguments on the command line, using the getopt module.
def main(argv):
    ...
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
        ...
    for opt, arg in opts:
        ...
You create a new instance of the KantGenerator class, and pass it the grammar file and source that may or may not have been specified on the command line.
k = KantGenerator(grammar, source)
The KantGenerator instance automatically loads the grammar, which is an XML file. You use your custom openAnything function to open the file (which could be stored in a local file or a remote web server), then use the built-in minidom parsing functions to parse the XML into a tree of Python objects.
def _load(self, source):
    sock = toolbox.openAnything(source)
    xmldoc = minidom.parse(sock).documentElement
    sock.close()
Oh, and along the way, you take advantage of your knowledge of the structure of the XML document to set up a little cache of references, which are just elements in the XML document.
def loadGrammar(self, grammar):
    for ref in self.grammar.getElementsByTagName("ref"):
        self.refs[ref.attributes["id"].value] = ref
If you specified some source material on the command line, you use that; otherwise you rip through the grammar looking for
the "top-level" reference (that isn't referenced by anything else) and use that as a starting point.
def getDefaultSource(self):
    xrefs = {}
    for xref in self.grammar.getElementsByTagName("xref"):
        xrefs[xref.attributes["id"].value] = 1
    xrefs = xrefs.keys()
    standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
    return '<xref id="%s"/>' % random.choice(standaloneXrefs)
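The idea behind getDefaultSource (find the ref that nothing else points to) can be sketched on a tiny grammar. This is Python 3 syntax, and the three-element grammar and the function name standalone_refs are invented for illustration; they are not taken from kant.xml or kgp.py. Sets are used here in place of the dictionary-keys approach in the listing.

```python
import random
from xml.dom import minidom

# A made-up grammar: "sentence" refers to the other two refs,
# and nothing refers to "sentence".
GRAMMAR = """<grammar>
<ref id="sentence"><p><xref id="noun"/> <xref id="verb"/></p></ref>
<ref id="noun"><p>reason</p></ref>
<ref id="verb"><p>exists</p></ref>
</grammar>"""

def standalone_refs(grammar):
    """Return the ids of ref elements that no xref element points to."""
    refs = {r.attributes["id"].value
            for r in grammar.getElementsByTagName("ref")}
    xrefs = {x.attributes["id"].value
             for x in grammar.getElementsByTagName("xref")}
    return sorted(refs - xrefs)

grammar = minidom.parseString(GRAMMAR).documentElement
top_level = standalone_refs(grammar)
default_source = '<xref id="%s"/>' % random.choice(top_level)

print(top_level)
print(default_source)
```

With only one top-level ref, random.choice has a single candidate, so the default source is always an xref to "sentence".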
Now you rip through the source material. The source material is also XML, and you parse it one node at a time. To keep the code separated and more maintainable, you use separate handlers for each node type.
def parse_Element(self, node):
    handlerMethod = getattr(self, "do_%s" % node.tagName)
    handlerMethod(node)
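The getattr dispatch used by parse_Element can be seen end to end in a stripped-down sketch. It is written in Python 3 syntax; the class TinyGenerator, the do_b handler, and the sample XML are hypothetical and much simpler than the real KantGenerator.

```python
from xml.dom import minidom

class TinyGenerator:
    """Walks an XML tree, picking a handler method by name at each node."""

    def __init__(self):
        self.pieces = []

    def parse(self, node):
        # Dispatch on the node's class: parse_Element, parse_Text, ...
        handler = getattr(self, "parse_%s" % node.__class__.__name__)
        handler(node)

    def parse_Element(self, node):
        # Dispatch again on the tag name: do_p, do_b, ...
        handler = getattr(self, "do_%s" % node.tagName)
        handler(node)

    def parse_Text(self, node):
        self.pieces.append(node.data)

    def do_p(self, node):
        for child in node.childNodes:
            self.parse(child)

    def do_b(self, node):
        # Hypothetical handler: wrap the contents in asterisks.
        self.pieces.append("*")
        for child in node.childNodes:
            self.parse(child)
        self.pieces.append("*")

doc = minidom.parseString("<p>pure <b>reason</b> exists</p>")
gen = TinyGenerator()
gen.parse(doc.documentElement)
output = "".join(gen.pieces)
print(output)
```

Supporting a new tag only means adding another do_ method; the dispatch code itself never changes. An unknown tag would raise AttributeError from getattr, which this sketch does not handle.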
You bounce through the grammar, parsing all the children of each p element,
def do_p(self, node):
    ...
    if doit:
        for child in node.childNodes: self.parse(child)
replacing choice elements with a random child,
def do_choice(self, node):
    self.parse(self.randomChildElement(node))
and replacing xref elements with a random child of the corresponding ref element, which you previously cached.
def do_xref(self, node):
    id = node.attributes["id"].value
    self.parse(self.randomChildElement(self.refs[id]))
Eventually, you parse your way down to plain text,
def parse_Text(self, node):
    text = node.data
    ...
    self.pieces.append(text)
which you print out.
def main(argv):
    ...
    k = KantGenerator(grammar, source)
    print k.output()
10.8. Summary
Python comes with powerful libraries for parsing and manipulating XML documents. The minidom module takes an XML file and parses it into Python objects, providing random access to arbitrary elements. Furthermore, this chapter shows how Python can be used to create a "real" standalone command-line script, complete with command-line flags, command-line arguments, error handling, and even the ability to take input from the piped result of a previous program.
Before moving on to the next chapter, you should be comfortable doing all of these things:
[HTTP web services stuff was here]
[unit testing stuff was here]
Chapter 14. Test-First Programming
14.1. roman.py, stage 1
Now that the unit tests are complete, it's time to start writing the code that the test cases are attempting to test. You're
going to do this in stages, so you can see all the unit tests fail, then watch them pass one by one as you fill in the gaps
in roman.py.
Example 14.1. roman1.py
This file is available in py/roman/stage1/ in the examples directory.
"""Convert to and from Roman numerals"""
#Define exceptions
class RomanError(Exception): pass ①
class OutOfRangeError(RomanError): pass ②
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass ③
def to_roman(n):
    """convert integer to Roman numeral"""
    pass ④
def from_roman(s):
    """convert Roman numeral to integer"""
    pass
- This is how you define your own custom exceptions in Python. Exceptions are classes, and you create your own by subclassing existing exceptions. It is strongly recommended (but not
required) that you subclass
Exception, which is the base class that all built-in exceptions inherit from. Here I am defining RomanError (inherited from Exception) to act as the base class for all my other custom exceptions to follow. This is a matter of style; I could just as easily
have inherited each individual exception from the Exception class directly.
- The
OutOfRangeError and NotIntegerError exceptions will eventually be used by to_roman() to flag various forms of invalid input, as specified in ToRomanBadInput.
- The
InvalidRomanNumeralError exception will eventually be used by from_roman() to flag invalid input, as specified in FromRomanBadInput.
- At this stage, you want to define the API of each of your functions, but you don't want to code them yet, so you stub them out using the Python reserved word
pass.
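One practical consequence of giving all the custom exceptions a common base class is that a caller can catch any of them with a single except clause. A minimal sketch in Python 3 syntax; the helper function describe is hypothetical and exists only to show the behavior.

```python
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass

def describe(exc_class):
    """Raise exc_class and report how it was caught."""
    try:
        raise exc_class("demo")
    except RomanError as err:
        # Any subclass of RomanError lands here.
        return "caught %s via RomanError" % type(err).__name__

for cls in (OutOfRangeError, NotIntegerError, InvalidRomanNumeralError):
    print(describe(cls))
```

Had each exception inherited from Exception directly, the caller would need to list all three classes in the except clause to get the same effect.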
Now for the big moment (drum roll please): you're finally going to run the unit test against this stubby little module. At
this point, every test case should fail. In fact, if any test case passes in stage 1, you should go back to romantest.py and re-evaluate why you coded a test so useless that it passes with do-nothing functions.
Run romantest1.py with the -v command-line option, which will give more verbose output so you can see exactly what's going on as each test case runs.
With any luck, your output should look like this:
Example 14.2. Output of romantest1.py against roman1.py
from_roman should only accept uppercase input ... ERROR
to_roman should always return uppercase ... ERROR
from_roman should fail with malformed antecedents ... FAIL
from_roman should fail with repeated pairs of numerals ... FAIL
from_roman should fail with too many repeated numerals ... FAIL
from_roman should give known result with known input ... FAIL
to_roman should give known result with known input ... FAIL
from_roman(to_roman(n))==n for all n ... FAIL
to_roman should fail with non-integer input ... FAIL
to_roman should fail with negative input ... FAIL
to_roman should fail with large input ... FAIL
to_roman should fail with 0 input ... FAIL
======================================================================
ERROR: from_roman should only accept uppercase input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 154, in testFromRomanCase
roman1.from_roman(numeral.upper())
AttributeError: 'None' object has no attribute 'upper'
======================================================================
ERROR: to_roman should always return uppercase
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 148, in testToRomanCase
self.assertEqual(numeral, numeral.upper())
AttributeError: 'None' object has no attribute 'upper'
======================================================================
FAIL: from_roman should fail with malformed antecedents
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 133, in testMalformedAntecedent
self.assertRaises(roman1.InvalidRomanNumeralError, roman1.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with repeated pairs of numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 127, in testRepeatedPairs
self.assertRaises(roman1.InvalidRomanNumeralError, roman1.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with too many repeated numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 122, in testTooManyRepeatedNumerals
self.assertRaises(roman1.InvalidRomanNumeralError, roman1.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 99, in testFromRomanKnownValues
self.assertEqual(integer, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: 1 != None
======================================================================
FAIL: to_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 93, in testToRomanKnownValues
self.assertEqual(numeral, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: I != None
======================================================================
FAIL: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 141, in testSanity
self.assertEqual(integer, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: 1 != None
======================================================================
FAIL: to_roman should fail with non-integer input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 116, in testNonInteger
self.assertRaises(roman1.NotIntegerError, roman1.to_roman, 0.5)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: NotIntegerError
======================================================================
FAIL: to_roman should fail with negative input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 112, in testNegative
self.assertRaises(roman1.OutOfRangeError, roman1.to_roman, -1)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: OutOfRangeError
======================================================================
FAIL: to_roman should fail with large input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 104, in testTooLarge
self.assertRaises(roman1.OutOfRangeError, roman1.to_roman, 4000)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: OutOfRangeError
======================================================================
FAIL: to_roman should fail with 0 input ①
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage1\romantest1.py", line 108, in testZero
self.assertRaises(roman1.OutOfRangeError, roman1.to_roman, 0)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: OutOfRangeError ②
----------------------------------------------------------------------
Ran 12 tests in 0.040s ③
FAILED (failures=10, errors=2) ④
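The summary line counts failures and errors separately. In unittest, a failure is an assertion that did not hold (including an assertRaises call where nothing was raised), while an error is any other exception that escapes the test. The following sketch, in Python 3 syntax with hypothetical test names, reproduces one of each against a stubbed to_roman. The test case is defined inside a helper function only to keep the sketch self-contained.

```python
import unittest

class OutOfRangeError(Exception):
    pass

def to_roman(n):
    """Stub, as in stage 1: does nothing and returns None."""
    pass

def run_stub_tests():
    class StubTests(unittest.TestCase):
        def test_zero(self):
            # Nothing is raised, so assertRaises reports a FAIL.
            self.assertRaises(OutOfRangeError, to_roman, 0)

        def test_case(self):
            # to_roman returns None, and None has no upper() method,
            # so an AttributeError escapes the test: an ERROR.
            numeral = to_roman(1)
            self.assertEqual(numeral, numeral.upper())

    suite = unittest.TestLoader().loadTestsFromTestCase(StubTests)
    result = unittest.TestResult()
    suite.run(result)
    return result

result = run_stub_tests()
print("ran", result.testsRun,
      "failures", len(result.failures),
      "errors", len(result.errors))
```

This mirrors the pattern in the output above: the two case-related tests call upper() on a None return value and are reported as errors, while the remaining tests fail on their assertions.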
14.2. roman.py, stage 2
Now that you have the framework of the roman module laid out, it's time to start writing code and passing test cases.
Example 14.3. roman2.py
This file is available in py/roman/stage2/ in the examples directory.
"""Convert to and from Roman numerals"""
#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass
#Define digit mapping
romanNumeralMap = (('M', 1000), ①
                   ('CM', 900),
                   ('D', 500),
                   ('CD', 400),
                   ('C', 100),
                   ('XC', 90),
                   ('L', 50),
                   ('XL', 40),
                   ('X', 10),
                   ('IX', 9),
                   ('V', 5),
                   ('IV', 4),
                   ('I', 1))
def to_roman(n):
    """convert integer to Roman numeral"""
    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer: ②
            result += numeral
            n -= integer
    return result
def from_roman(s):
    """convert Roman numeral to integer"""
    pass
- romanNumeralMap is a tuple of tuples which defines three things:
  - The character representations of the most basic Roman numerals. Note that this is not just the single-character Roman numerals; you're also defining two-character pairs like CM (“one hundred less than one thousand”); this will make the to_roman() code simpler later.
  - The order of the Roman numerals. They are listed in descending value order, from M all the way down to I.
  - The value of each Roman numeral. Each inner tuple is a pair of (numeral, value).
- Here's where your rich data structure pays off, because you don't need any special logic to handle the subtraction rule.
To convert to Roman numerals, you simply iterate through romanNumeralMap looking for the largest integer value less than or equal to the input. Once found, you add the Roman numeral representation
to the end of the output, subtract the corresponding integer value from the input, lather, rinse, repeat.
Example 14.4. How to_roman() works
If you're not clear how to_roman() works, add a print statement to the end of the while loop:
while n >= integer:
    result += numeral
    n -= integer
    print 'subtracting', integer, 'from input, adding', numeral, 'to output'
>>> import roman2
>>> roman2.to_roman(1424)
subtracting 1000 from input, adding M to output
subtracting 400 from input, adding CD to output
subtracting 10 from input, adding X to output
subtracting 10 from input, adding X to output
subtracting 4 from input, adding IV to output
'MCDXXIV'
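If you would rather inspect the steps than print them, the same loop can collect them in a list. This is a sketch in Python 3 syntax; the function name to_roman_steps and the constant name ROMAN_NUMERAL_MAP are illustrative, not part of roman2.py.

```python
ROMAN_NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                     ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                     ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def to_roman_steps(n):
    """Return (numeral, steps), where steps lists each subtraction made."""
    result = ""
    steps = []
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
            steps.append((integer, numeral))
    return result, steps

numeral, steps = to_roman_steps(1424)
print(numeral)
for integer, piece in steps:
    print("subtracting", integer, "from input, adding", piece, "to output")
```

For 1424 the recorded steps are 1000, 400, 10, 10, and 4, matching the interactive session above.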
So to_roman() appears to work, at least in this manual spot check. But will it pass the unit testing? Well no, not entirely.
Example 14.5. Output of romantest2.py against roman2.py
Remember to run romantest2.py with the -v command-line flag to enable verbose mode.
from_roman should only accept uppercase input ... FAIL
to_roman should always return uppercase ... ok ①
from_roman should fail with malformed antecedents ... FAIL
from_roman should fail with repeated pairs of numerals ... FAIL
from_roman should fail with too many repeated numerals ... FAIL
from_roman should give known result with known input ... FAIL
to_roman should give known result with known input ... ok ②
from_roman(to_roman(n))==n for all n ... FAIL
to_roman should fail with non-integer input ... FAIL ③
to_roman should fail with negative input ... FAIL
to_roman should fail with large input ... FAIL
to_roman should fail with 0 input ... FAIL
- to_roman() does, in fact, always return uppercase, because romanNumeralMap defines the Roman numeral representations as uppercase. So this test passes already.
- Here's the big news: this version of the
to_roman() function passes the known values test. Remember, it's not comprehensive, but it does put the function through its paces with a variety of good inputs, including
inputs that produce every single-character Roman numeral, the largest possible input (3999), and the input that produces the longest possible Roman numeral (3888). At this point, you can be reasonably confident that the function works for any good input value you could throw at it.
- However, the function does not “work” for bad values; it fails every single bad input test. That makes sense, because you didn't include any checks for bad input. Those test cases look for specific exceptions to
be raised (via
assertRaises), and you're never raising them. You'll do that in the next stage.
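To see concretely why the bad input tests fail, you can feed the stage 2 logic some bad values and look at what comes back. This sketch restates the stage 2 loop in Python 3 syntax under the illustrative name ROMAN_NUMERAL_MAP; the choice of bad values mirrors the test names above.

```python
ROMAN_NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                     ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                     ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def to_roman(n):
    """Stage 2 logic: no input checks at all."""
    result = ""
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result

# Zero, negative, non-integer, and too-large inputs.
silent_results = {bad: to_roman(bad) for bad in (0, -1, 0.5, 4000)}
for bad, numeral in silent_results.items():
    print(repr(bad), "->", repr(numeral))
```

None of these calls raises anything: the first three quietly return an empty string, and 4000 returns 'MMMM'. Since assertRaises is waiting for an exception that never arrives, each of those tests is reported as a failure.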
Here's the rest of the output of the unit test, listing the details of all the failures. You're down to 10.
======================================================================
FAIL: from_roman should only accept uppercase input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 156, in testFromRomanCase
roman2.from_roman, numeral.lower())
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with malformed antecedents
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 133, in testMalformedAntecedent
self.assertRaises(roman2.InvalidRomanNumeralError, roman2.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with repeated pairs of numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 127, in testRepeatedPairs
self.assertRaises(roman2.InvalidRomanNumeralError, roman2.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with too many repeated numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 122, in testTooManyRepeatedNumerals
self.assertRaises(roman2.InvalidRomanNumeralError, roman2.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 99, in testFromRomanKnownValues
self.assertEqual(integer, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: 1 != None
======================================================================
FAIL: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 141, in testSanity
self.assertEqual(integer, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: 1 != None
======================================================================
FAIL: to_roman should fail with non-integer input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 116, in testNonInteger
self.assertRaises(roman2.NotIntegerError, roman2.to_roman, 0.5)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: NotIntegerError
======================================================================
FAIL: to_roman should fail with negative input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 112, in testNegative
self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, -1)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: OutOfRangeError
======================================================================
FAIL: to_roman should fail with large input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 104, in testTooLarge
self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 4000)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: OutOfRangeError
======================================================================
FAIL: to_roman should fail with 0 input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage2\romantest2.py", line 108, in testZero
self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 0)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: OutOfRangeError
----------------------------------------------------------------------
Ran 12 tests in 0.320s
FAILED (failures=10)
14.3. roman.py, stage 3
Now that to_roman() behaves correctly with good input (integers from 1 to 3999), it's time to make it behave correctly with bad input (everything else).
Example 14.6. roman3.py
This file is available in py/roman/stage3/ in the examples directory.
"""Convert to and from Roman numerals"""
#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass
#Define digit mapping
romanNumeralMap = (('M', 1000),
                   ('CM', 900),
                   ('D', 500),
                   ('CD', 400),
                   ('C', 100),
                   ('XC', 90),
                   ('L', 50),
                   ('XL', 40),
                   ('X', 10),
                   ('IX', 9),
                   ('V', 5),
                   ('IV', 4),
                   ('I', 1))
def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000): ①
        raise OutOfRangeError, "number out of range (must be 1..3999)" ②
    if int(n) <> n: ③
        raise NotIntegerError, "non-integers can not be converted"
    result = "" ④
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result
def from_roman(s):
    """convert Roman numeral to integer"""
    pass
- This is a nice Pythonic shortcut: multiple comparisons at once. This is equivalent to
if not ((0 < n) and (n < 4000)), but it's much easier to read. This is the range check, and it should catch inputs that are too large, negative, or zero.
- You raise exceptions yourself with the
raise statement. You can raise any of the built-in exceptions, or you can raise any of your custom exceptions that you've defined.
The second parameter, the error message, is optional; if given, it is displayed in the traceback that is printed if the exception
is never handled.
- This is the non-integer check. Non-integers can not be converted to Roman numerals.
- The rest of the function is unchanged.
Example 14.7. Watching to_roman() handle bad input
>>> import roman3
>>> roman3.to_roman(4000)
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "roman3.py", line 27, in to_roman
raise OutOfRangeError, "number out of range (must be 1..3999)"
OutOfRangeError: number out of range (must be 1..3999)
>>> roman3.to_roman(1.5)
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "roman3.py", line 29, in to_roman
raise NotIntegerError, "non-integers can not be converted"
NotIntegerError: non-integers can not be converted
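The listings in this chapter use Python 2 syntax. In Python 3, the form raise SomeError, "message" and the <> operator are no longer accepted, so the same two checks would be spelled as in the sketch below. The helper raised and the constant name ROMAN_NUMERAL_MAP are hypothetical additions for illustration; the logic is otherwise the stage 3 logic.

```python
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass

ROMAN_NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                     ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                     ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):                 # range check
        raise OutOfRangeError("number out of range (must be 1..3999)")
    if int(n) != n:                        # non-integer check
        raise NotIntegerError("non-integers can not be converted")
    result = ""
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def raised(value):
    """Return the name of the exception to_roman raises, or None."""
    try:
        to_roman(value)
    except RomanError as err:
        return type(err).__name__
    return None

for value in (4000, 0, -1, 1.5, 1424):
    print(repr(value), "->", raised(value))
```

Note the order of the checks: 1.5 lies inside the valid range, so it passes the range check and is caught by the non-integer check, just as in the interactive session above.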
Example 14.8. Output of romantest3.py against roman3.py
from_roman should only accept uppercase input ... FAIL
to_roman should always return uppercase ... ok
from_roman should fail with malformed antecedents ... FAIL
from_roman should fail with repeated pairs of numerals ... FAIL
from_roman should fail with too many repeated numerals ... FAIL
from_roman should give known result with known input ... FAIL
to_roman should give known result with known input ... ok ①
from_roman(to_roman(n))==n for all n ... FAIL
to_roman should fail with non-integer input ... ok ②
to_roman should fail with negative input ... ok ③
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
- to_roman() still passes the known values test, which is comforting. All the tests that passed in stage 2 still pass, so the latest code hasn't broken anything.
- More exciting is the fact that all of the bad input tests now pass. This test,
testNonInteger, passes because of the int(n) <> n check. When a non-integer is passed to to_roman(), the int(n) <> n check notices it and raises the NotIntegerError exception, which is what testNonInteger is looking for.
- This test,
testNegative, passes because of the not (0 < n < 4000) check, which raises an OutOfRangeError exception, which is what testNegative is looking for.
======================================================================
FAIL: from_roman should only accept uppercase input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage3\romantest3.py", line 156, in testFromRomanCase
roman3.from_roman, numeral.lower())
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with malformed antecedents
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage3\romantest3.py", line 133, in testMalformedAntecedent
self.assertRaises(roman3.InvalidRomanNumeralError, roman3.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with repeated pairs of numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage3\romantest3.py", line 127, in testRepeatedPairs
self.assertRaises(roman3.InvalidRomanNumeralError, roman3.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with too many repeated numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage3\romantest3.py", line 122, in testTooManyRepeatedNumerals
self.assertRaises(roman3.InvalidRomanNumeralError, roman3.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage3\romantest3.py", line 99, in testFromRomanKnownValues
self.assertEqual(integer, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: 1 != None
======================================================================
FAIL: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage3\romantest3.py", line 141, in testSanity
self.assertEqual(integer, result)
File "c:\python21\lib\unittest.py", line 273, in failUnlessEqual
raise self.failureException, (msg or '%s != %s' % (first, second))
AssertionError: 1 != None
----------------------------------------------------------------------
Ran 12 tests in 0.401s
FAILED (failures=6) ①
- You're down to 6 failures, and all of them involve
from_roman(): the known values test, the three separate bad input tests, the case check, and the sanity check. That means that to_roman() has passed all the tests it can pass by itself. (It's involved in the sanity check, but that also requires that from_roman() be written, which it isn't yet.) Which means that you must stop coding to_roman() now. No tweaking, no twiddling, no extra checks “just in case”. Stop. Now. Back away from the keyboard.
 | The most important thing that comprehensive unit testing can tell you is when to stop coding. When all the unit tests for
a function pass, stop coding the function. When all the unit tests for an entire module pass, stop coding the module.
14.4. roman.py, stage 4
Now that to_roman() is done, it's time to start coding from_roman(). It follows the same pattern as the to_roman() function.
Example 14.9. roman4.py
This file is available in py/roman/stage4/ in the examples directory.
If you have not already done so, you can download this and other examples used in this book.
"""Convert to and from Roman numerals"""
#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass
#Define digit mapping
romanNumeralMap = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
# to_roman function omitted for clarity (it hasn't changed)
def from_roman(s):
    """convert Roman numeral to integer"""
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral: ①
            result += integer
            index += len(numeral)
    return result
- The pattern here is the same as
to_roman(). You iterate through your Roman numeral data structure (a tuple of tuples), and instead of matching the highest integer
values as often as possible, you match the “highest” Roman numeral character strings as often as possible.
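Here is the same loop as a runnable Python 3 sketch, with romanNumeralMap copied from the listing. The sample inputs at the bottom are my own choices. They show that at this stage nothing validates the input, so malformed or lowercase strings quietly produce a number instead of raising an error.

```python
# Python 3 sketch of the stage 4 from_roman(); no input validation yet.

romanNumeralMap = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                   ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                   ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def from_roman(s):
    """convert Roman numeral to integer"""
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        # Consume this numeral as many times as it appears at the
        # current position, then move on to the next smaller one.
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

print(from_roman('MCMLXXII'))  # well-formed input
print(from_roman('IIII'))      # too many repeated numerals, still accepted
print(from_roman('mcm'))       # lowercase never matches anything
```

The last two calls return 4 and 0 respectively, which is why the bad input tests for from_roman() keep failing until some kind of validation is added.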
Example 14.10. How from_roman() works
If you're not clear how from_roman() works, add a print statement to the end of the while loop:
while s[index:index+len(numeral)] == numeral:
    result += integer
    index += len(numeral)
    print 'found', numeral, 'of length', len(numeral), ', adding', integer
>>> import roman4
>>> roman4.from_roman('MCMLXXII')
found M of length 1 , adding 1000
found CM of length 2 , adding 900
found L of length 1 , adding 50
found X of length 1 , adding 10
found X of length 1 , adding 10
found I of length 1 , adding 1
found I of length 1 , adding 1
1972
Example 14.11. Output of romantest4.py against roman4.py
from_roman should only accept uppercase input ... FAIL
to_roman should always return uppercase ... ok
from_roman should fail with malformed antecedents ... FAIL
from_roman should fail with repeated pairs of numerals ... FAIL
from_roman should fail with too many repeated numerals ... FAIL
from_roman should give known result with known input ... ok ①
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok ②
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
- Two pieces of exciting news here. The first is that
from_roman() works for good input, at least for all the known values you test.
- The second is that the sanity check also passed. Combined with the known values tests, you can be reasonably sure that both
to_roman() and from_roman() work properly for all possible good values. (This is not guaranteed; it is theoretically possible that to_roman() has a bug that produces the wrong Roman numeral for some particular set of inputs, and that from_roman() has a reciprocal bug that produces the same wrong integer values for exactly that set of Roman numerals that to_roman() generated incorrectly. Depending on your application and your requirements, this possibility may bother you; if so, write
more comprehensive test cases until it doesn't bother you.)
======================================================================
FAIL: from_roman should only accept uppercase input
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage4\romantest4.py", line 156, in testFromRomanCase
roman4.from_roman, numeral.lower())
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with malformed antecedents
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage4\romantest4.py", line 133, in testMalformedAntecedent
self.assertRaises(roman4.InvalidRomanNumeralError, roman4.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with repeated pairs of numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage4\romantest4.py", line 127, in testRepeatedPairs
self.assertRaises(roman4.InvalidRomanNumeralError, roman4.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
======================================================================
FAIL: from_roman should fail with too many repeated numerals
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\docbook\dip\py\roman\stage4\romantest4.py", line 122, in testTooManyRepeatedNumerals
self.assertRaises(roman4.InvalidRomanNumeralError, roman4.from_roman, s)
File "c:\python21\lib\unittest.py", line 266, in failUnlessRaises
raise self.failureException, excName
AssertionError: InvalidRomanNumeralError
----------------------------------------------------------------------
Ran 12 tests in 1.222s
FAILED (failures=4)
14.5. roman.py, stage 5
Example 14.12. roman5.py
This file is available in py/roman/stage5/ in the examples directory.
If you have not already done so, you can download this and other examples used in this book.
"""Convert to and from Roman numerals"""
import re
#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass
#Define digit mapping
romanNumeralMap = (('M', 1000),
('CM', 900),
('D', 500),
('CD', 400),
('C', 100),
('XC', 90),
('L', 50),
('XL', 40),
('X', 10),
('IX', 9),
('V', 5),
('IV', 4),
('I', 1))
def to_roman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):
        raise OutOfRangeError, "number out of range (must be 1..3999)"
    if int(n) <> n:
        raise NotIntegerError, "non-integers can not be converted"
    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result
#Define pattern to detect valid Roman numerals
romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$' ①
def from_roman(s):
    """convert Roman numeral to integer"""
    if not re.search(romanNumeralPattern, s): ②
        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
- This is just a continuation of the pattern you discussed in Section 7.3, “Case Study: Roman Numerals”. The tens place is either
XC (90), XL (40), or an optional L followed by 0 to 3 optional X characters. The ones place is either IX (9), IV (4), or an optional V followed by 0 to 3 optional I characters.
- Having encoded all that logic into a regular expression, the code to check for invalid Roman numerals becomes trivial. If
re.search returns an object, then the regular expression matched and the input is valid; otherwise, the input is invalid.
At this point, you are allowed to be skeptical that that big ugly regular expression could possibly catch all the types of
invalid Roman numerals. But don't take my word for it, look at the results:
Example 14.13. Output of romantest5.py against roman5.py
from_roman should only accept uppercase input ... ok ①
to_roman should always return uppercase ... ok
from_roman should fail with malformed antecedents ... ok ②
from_roman should fail with repeated pairs of numerals ... ok ③
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
----------------------------------------------------------------------
Ran 12 tests in 2.864s
OK ④
- One thing I didn't mention about regular expressions is that, by default, they are case-sensitive. Since the regular expression
romanNumeralPattern was expressed in uppercase characters, the
re.search check will reject any input that isn't completely uppercase. So the uppercase input test passes.
- More importantly, the bad input tests pass. For instance, the malformed antecedents test checks cases like
MCMC. As you've seen, this does not match the regular expression, so from_roman() raises an InvalidRomanNumeralError exception, which is what the malformed antecedents test case is looking for, so the test passes.
- In fact, all the bad input tests pass. This regular expression catches everything you could think of when you made your test
cases.
 | When all of your tests pass, stop coding.
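If you would rather poke at the pattern directly than trust the test run, the sketch below checks a handful of strings against it in Python 3. The pattern is copied from roman5.py. The helper is_valid_roman and the sample strings are hypothetical, chosen to mirror the kinds of bad input the tests describe.

```python
import re

# Pattern copied from roman5.py; compiled once for reuse.
romanNumeralPattern = re.compile(
    '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')

def is_valid_roman(s):
    """Hypothetical helper: True if s matches the stage 5 pattern."""
    return romanNumeralPattern.search(s) is not None

samples = ('MCMLXXII',   # well-formed
           'mcmlxxii',   # lowercase
           'MCMC',       # malformed antecedent
           'IXIX',       # repeated pair
           'IIII',       # too many repeated numerals
           'MMMM')       # too many thousands
for candidate in samples:
    print(candidate, is_valid_roman(candidate))
```

Only the first sample matches. One thing worth knowing: every part of the pattern is optional, so the empty string matches too. None of the twelve tests above exercises that case.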
[functional programming stuff was here]
The following is a complete Python program that acts as a cheap and simple regression testing framework. It takes unit tests that you've written for individual
modules, collects them all into one big test suite, and runs them all at once. I actually use this script as part of the
build process for this book; I have unit tests for several of the example programs (not just the roman.py module featured in Chapter 13, Unit Testing), and the first thing my automated build script does is run this program to make sure all my examples still work. If this
regression test fails, the build immediately stops. I don't want to release non-working examples any more than you want to
download them and sit around scratching your head and yelling at your monitor and wondering why they don't work.
Example 16.1. regression.py
If you have not already done so, you can download this and other examples used in this book.
"""Regression testing framework
This module will search for scripts in the same directory named
XYZtest.py. Each such script should be a test suite that tests a
module through PyUnit. (As of Python 2.1, PyUnit is included in
the standard library as "unittest".) This script will aggregate all
found test suites into one big test suite and run them all at once.
"""
import sys, os, re, unittest
def regressionTest():
    path = os.path.abspath(os.path.dirname(sys.argv[0]))
    files = os.listdir(path)
    test = re.compile("test\.py$", re.IGNORECASE)
    files = filter(test.search, files)
    filenameToModuleName = lambda f: os.path.splitext(f)[0]
    moduleNames = map(filenameToModuleName, files)
    modules = map(__import__, moduleNames)
    load = unittest.defaultTestLoader.loadTestsFromModule
    return unittest.TestSuite(map(load, modules))
if __name__ == "__main__":
    unittest.main(defaultTest="regressionTest")
Running this script in the same directory as the rest of the example scripts that come with this book will find all the unit
tests, named moduletest.py, run them as a single test, and pass or fail them all at once.
Example 16.2. Sample output of regression.py
[you@localhost py]$ python regression.py -v
help should fail with no object ... ok ①
help should return known result for apihelper ... ok
help should honor collapse argument ... ok
help should honor spacing argument ... ok
buildConnectionString should fail with list input ... ok ②
buildConnectionString should fail with string input ... ok
buildConnectionString should fail with tuple input ... ok
buildConnectionString handles empty dictionary ... ok
buildConnectionString returns known result with known input ... ok
from_roman should only accept uppercase input ... ok ③
to_roman should always return uppercase ... ok
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
kgp a ref test ... ok
kgp b ref test ... ok
kgp c ref test ... ok
kgp d ref test ... ok
kgp e ref test ... ok
kgp f ref test ... ok
kgp g ref test ... ok
----------------------------------------------------------------------
Ran 29 tests in 2.799s
OK
- The first 4 tests are from
apihelpertest.py, which tests the example script from Chapter 4, The Power Of Introspection.
- The next 5 tests are from
odbchelpertest.py, which tests the example script from Chapter 2, Your First Python Program.
- The next 13 tests are from
romantest.py, which you studied in depth in Chapter 13, Unit Testing. The remaining 7 tests, labeled kgp, come from kgptest.py.
16.2. Finding the path
When running Python scripts from the command line, it is sometimes useful to know where the currently running script is located on disk.
This is one of those obscure little tricks that is virtually impossible to figure out on your own, but simple to remember
once you see it. The key to it is sys.argv. As you saw in Chapter 9, XML Processing, this is a list that holds the command-line arguments. However, it also holds the name of the running script, exactly
as it was called from the command line, and this is enough information to determine its location.
Example 16.3. fullpath.py
If you have not already done so, you can download this and other examples used in this book.
import sys, os
print 'sys.argv[0] =', sys.argv[0] ①
pathname = os.path.dirname(sys.argv[0]) ②
print 'path =', pathname
print 'full path =', os.path.abspath(pathname) ③
- Regardless of how you run a script,
sys.argv[0] will always contain the name of the script, exactly as it appears on the command line. This may or may not include any path
information, as you'll see shortly.
- os.path.dirname takes a filename as a string and returns the directory path portion. If the given filename does not include any path information,
os.path.dirname returns an empty string.
- os.path.abspath is the key here. It takes a pathname, which can be partial or even blank, and returns a fully qualified pathname.
os.path.abspath deserves further explanation. It is very flexible; it can take any kind of pathname.
Example 16.4. Further explanation of os.path.abspath
>>> import os
>>> os.getcwd() ①
/home/you
>>> os.path.abspath('') ②
/home/you
>>> os.path.abspath('.ssh') ③
/home/you/.ssh
>>> os.path.abspath('/home/you/.ssh') ④
/home/you/.ssh
>>> os.path.abspath('.ssh/../foo/') ⑤
/home/you/foo
- os.getcwd() returns the current working directory.
- Calling
os.path.abspath with an empty string returns the current working directory, same as os.getcwd().
- Calling
os.path.abspath with a partial pathname constructs a fully qualified pathname out of it, based on the current working directory.
- Calling
os.path.abspath with a full pathname simply returns it.
- os.path.abspath also normalizes the pathname it returns. Note that this example worked even though I don't actually have a 'foo' directory. os.path.abspath never checks your actual disk; this is all just string manipulation.
 | The pathnames and filenames you pass to os.path.abspath do not need to exist.
 | os.path.abspath not only constructs full path names, it also normalizes them. That means that if you are in the /usr/ directory, os.path.abspath('bin/../local/bin') will return /usr/local/bin. It normalizes the path by making it as simple as possible. If you just want to normalize a pathname like this without
turning it into a full pathname, use os.path.normpath instead.
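The difference between the two functions is easy to check for yourself. This short Python 3 sketch compares results against os.getcwd() and os.path.join instead of hard-coded paths, so it should behave the same wherever you run it. The directory names are arbitrary and do not need to exist.

```python
import os

cwd = os.getcwd()

# An empty string resolves to the current working directory.
print(os.path.abspath('') == cwd)

# A partial pathname is anchored at the current working directory.
print(os.path.abspath('.ssh') == os.path.join(cwd, '.ssh'))

# abspath also normalizes: the '..' cancels out the '.ssh'.
messy = os.path.join('.ssh', '..', 'foo')
print(os.path.abspath(messy) == os.path.join(cwd, 'foo'))

# normpath normalizes without making the path absolute.
relative = os.path.join('bin', '..', 'local', 'bin')
print(os.path.normpath(relative) == os.path.join('local', 'bin'))
```

All four comparisons print True, and none of them touches the disk.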
Example 16.5. Sample output from fullpath.py
[you@localhost py]$ python /home/you/diveintopython3/common/py/fullpath.py ①
sys.argv[0] = /home/you/diveintopython3/common/py/fullpath.py
path = /home/you/diveintopython3/common/py
full path = /home/you/diveintopython3/common/py
[you@localhost diveintopython3]$ python common/py/fullpath.py ②
sys.argv[0] = common/py/fullpath.py
path = common/py
full path = /home/you/diveintopython3/common/py
[you@localhost diveintopython3]$ cd common/py
[you@localhost py]$ python fullpath.py ③
sys.argv[0] = fullpath.py
path =
full path = /home/you/diveintopython3/common/py
- In the first case,
sys.argv[0] includes the full path of the script. You can then use the os.path.dirname function to strip off the script name and return the full directory name, and os.path.abspath simply returns what you give it.
- If the script is run by using a partial pathname,
sys.argv[0] will still contain exactly what appears on the command line. os.path.dirname will then give you a partial pathname (relative to the current directory), and os.path.abspath will construct a full pathname from the partial pathname.
- If the script is run from the current directory without giving any path,
os.path.dirname will simply return an empty string. Given an empty string, os.path.abspath returns the current directory, which is what you want, since the script was run from the current directory.
 | Like the other functions in the os and os.path modules, os.path.abspath is cross-platform. Your results will look slightly different than my examples if you're running on Windows (which uses backslash
as a path separator) or Mac OS (which uses colons), but they'll still work. That's the whole point of the os module.
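The three cases above boil down to one expression. As a Python 3 sketch, here it is wrapped in a small function (script_dir is a hypothetical name, not something from fullpath.py) and fed the three styles of script name by hand instead of reading sys.argv[0].

```python
import os

def script_dir(argv0):
    """Hypothetical helper: directory of a script, given the name it
    was invoked with (what sys.argv[0] would hold)."""
    return os.path.abspath(os.path.dirname(argv0))

cwd = os.getcwd()
expected = os.path.join(cwd, 'common', 'py')

# Case 1: invoked with a full pathname.
print(script_dir(os.path.join(cwd, 'common', 'py', 'fullpath.py')) == expected)

# Case 2: invoked with a partial pathname.
print(script_dir(os.path.join('common', 'py', 'fullpath.py')) == expected)

# Case 3: invoked with a bare filename from the current directory.
print(script_dir('fullpath.py') == cwd)
```

In a real script you would call script_dir(sys.argv[0]).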
Addendum. One reader was dissatisfied with this solution, and wanted to be able to run all the unit tests in the current directory,
not the directory where regression.py is located. He suggests this approach instead:
Example 16.6. Running scripts in the current directory
import sys, os, re, unittest
def regressionTest():
    path = os.getcwd() ①
    sys.path.append(path) ②
    files = os.listdir(path) ③
- Instead of setting path to the directory where the currently running script is located, you set it to the current working directory instead. This
will be whatever directory you were in before you ran the script, which is not necessarily the same as the directory the script
is in. (Read that sentence a few times until you get it.)
- Append this directory to the Python library search path, so that when you dynamically import the unit test modules later, Python can find them. You didn't need to do this when path was the directory of the currently running script, because Python always looks in that directory.
- The rest of the function is the same.
This technique will allow you to re-use this regression.py script on multiple projects. Just put the script in a common directory, then change to the project's directory before running
it. All of that project's unit tests will be found and tested, instead of the unit tests in the common directory where regression.py is located.
[more functional programming stuff was here]
16.6. Dynamically importing modules
OK, enough philosophizing. Let's talk about dynamically importing modules.
First, let's look at how you normally import modules. The import module syntax looks in the search path for the named module and imports it by name. You can even import multiple modules at once
this way, with a comma-separated list. You did this on the very first line of this chapter's script.
Example 16.13. Importing multiple modules at once
import sys, os, re, unittest ①
- This imports four modules at once:
sys (for system functions and access to the command line parameters), os (for operating system functions like directory listings), re (for regular expressions), and unittest (for unit testing).
Now let's do the same thing, but with dynamic imports.
Example 16.14. Importing modules dynamically
>>> sys = __import__('sys') ①
>>> os = __import__('os')
>>> re = __import__('re')
>>> unittest = __import__('unittest')
>>> sys ②
<module 'sys' (built-in)>
>>> os
<module 'os' from '/usr/local/lib/python2.2/os.pyc'>
- The built-in
__import__ function accomplishes the same goal as using the import statement, but it's an actual function, and it takes a string as an argument.
- The variable sys is now the
sys module, just as if you had said import sys. The variable os is now the os module, and so forth.
So __import__ imports a module, but takes a string argument to do it. In this case the module you imported was just a hard-coded string,
but it could just as easily be a variable, or the result of a function call. And the variable that you assign the module
to doesn't need to match the module name, either. You could import a series of modules and assign them to a list.
Example 16.15. Importing a list of modules dynamically
>>> moduleNames = ['sys', 'os', 're', 'unittest'] ①
>>> moduleNames
['sys', 'os', 're', 'unittest']
>>> modules = map(__import__, moduleNames) ②
>>> modules ③
[<module 'sys' (built-in)>,
<module 'os' from 'c:\Python22\lib\os.pyc'>,
<module 're' from 'c:\Python22\lib\re.pyc'>,
<module 'unittest' from 'c:\Python22\lib\unittest.pyc'>]
>>> modules[0].version ④
'2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]'
>>> import sys
>>> sys.version
'2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]'
- moduleNames is just a list of strings. Nothing fancy, except that the strings happen to be names of modules that you could import, if
you wanted to.
- Surprise, you wanted to import them, and you did, by mapping the
__import__ function onto the list. Remember, this takes each element of the list (moduleNames) and calls the function (__import__) over and over, once with each element of the list, builds a list of the return values, and returns the result.
- So now from a list of strings, you've created a list of actual modules. (Your paths may be different, depending on your operating
system, where you installed Python, the phase of the moon, etc.)
- To drive home the point that these are real modules, let's look at some module attributes. Remember, modules[0] is the
sys module, so modules[0].version is sys.version. All the other attributes and methods of these modules are also available. There's nothing magic about the import statement, and there's nothing magic about modules. Modules are objects. Everything is an object.
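The same experiment still works in Python 3, with one wrinkle: map() returns a lazy iterator there, so you need list() to get an actual list of modules. The sketch below also shows importlib.import_module, which is the standard library's function for importing by name. Using it here is my substitution, not something the chapter's script does.

```python
import importlib
import sys

moduleNames = ['sys', 'os', 're', 'unittest']

# map() is lazy in Python 3, so wrap it in list() to force the imports.
modules = list(map(__import__, moduleNames))

# The same thing with importlib and a list comprehension.
same = [importlib.import_module(name) for name in moduleNames]

print(modules[0] is sys)                   # the very same module object
print(modules[0].version == sys.version)   # attributes are all there
print(modules == same)                     # both approaches agree
```

For plain top-level names like these, the two functions return the same module objects. They differ for dotted names: __import__('os.path') returns the top-level os module, while importlib.import_module('os.path') returns the os.path submodule itself.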
Now you should be able to put this all together and figure out what most of this chapter's code sample is doing.
16.7. Putting it all together
You've learned enough now to deconstruct the first seven lines of this chapter's code sample: reading a directory and importing
selected modules within it.
Example 16.16. The regressionTest function
def regressionTest():
    path = os.path.abspath(os.path.dirname(sys.argv[0]))
    files = os.listdir(path)
    test = re.compile("test\.py$", re.IGNORECASE)
    files = filter(test.search, files)
    filenameToModuleName = lambda f: os.path.splitext(f)[0]
    moduleNames = map(filenameToModuleName, files)
    modules = map(__import__, moduleNames)
    load = unittest.defaultTestLoader.loadTestsFromModule
    return unittest.TestSuite(map(load, modules))
Let's look at it line by line, interactively. Assume that the current directory is c:\diveintopython3\py, which contains the examples that come with this book, including this chapter's script. As you saw in Section 16.2, “Finding the path”, the script directory will end up in the path variable, so let's start by hard-coding that and go from there.
Example 16.17. Step 1: Get all the files
>>> import sys, os, re, unittest
>>> path = r'c:\diveintopython3\py'
>>> files = os.listdir(path)
>>> files ①
['BaseHTMLProcessor.py', 'LICENSE.txt', 'apihelper.py', 'apihelpertest.py',
'argecho.py', 'autosize.py', 'builddialectexamples.py', 'dialect.py',
'fileinfo.py', 'fullpath.py', 'kgptest.py', 'makerealworddoc.py',
'odbchelper.py', 'odbchelpertest.py', 'parsephone.py', 'piglatin.py',
'plural.py', 'pluraltest.py', 'pyfontify.py', 'regression.py', 'roman.py', 'romantest.py',
'uncurly.py', 'unicode2koi8r.py', 'urllister.py', 'kgp', 'plural', 'roman',
'colorize.py']
- files is a list of all the files and directories in the script's directory. (If you've been running some of the examples already,
you may also see some
.pyc files in there as well.)
Example 16.18. Step 2: Filter to find the files you care about
>>> test = re.compile("test\.py$", re.IGNORECASE) ①
>>> files = filter(test.search, files) ②
>>> files ③
['apihelpertest.py', 'kgptest.py', 'odbchelpertest.py', 'pluraltest.py', 'romantest.py']
- This regular expression will match any string that ends with
test.py. Note that you need to escape the period, since a period in a regular expression usually means “match any single character”, but you actually want to match a literal period instead.
- The compiled regular expression acts like a function, so you can use it to filter the large list of files and directories,
to find the ones that match the regular expression.
- And you're left with the list of unit testing scripts, because they were the only ones named
SOMETHINGtest.py.
Example 16.19. Step 3: Map filenames to module names
>>> filenameToModuleName = lambda f: os.path.splitext(f)[0] ①
>>> filenameToModuleName('romantest.py') ②
'romantest'
>>> filenameToModuleName('odbchelpertest.py')
'odbchelpertest'
>>> moduleNames = map(filenameToModuleName, files) ③
>>> moduleNames ④
['apihelpertest', 'kgptest', 'odbchelpertest', 'pluraltest', 'romantest']
- As you saw in Section 4.7, “Using lambda Functions”,
lambda is a quick-and-dirty way of creating an inline, one-line function. This one takes a filename with an extension and returns
just the filename part, using the standard library function os.path.splitext that you saw in Example 6.17, “Splitting Pathnames”.
- filenameToModuleName is a function. There's nothing magic about
lambda functions as opposed to regular functions that you define with a def statement. You can call the filenameToModuleName function like any other, and it does just what you wanted it to do: strips the file extension off of its argument.
- Now you can apply this function to each file in the list of unit test files, using
map.
- And the result is just what you wanted: a list of module names, as strings.
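Steps 2 and 3 can be run without the book's example directory. This Python 3 sketch uses a shortened, hand-written list of names in place of os.listdir(path), and list comprehensions in place of filter() and map(), since both of those return lazy iterators in Python 3. The variable testFiles is my own name for the filtered list.

```python
import os
import re

# Stand-in for os.listdir(path): a few names from the listing above.
files = ['LICENSE.txt', 'apihelper.py', 'apihelpertest.py', 'kgptest.py',
         'regression.py', 'roman.py', 'romantest.py', 'kgp']

# Step 2: keep only names ending in test.py (the period is escaped).
test = re.compile(r"test\.py$", re.IGNORECASE)
testFiles = [f for f in files if test.search(f)]

# Step 3: strip the extension to get importable module names.
moduleNames = [os.path.splitext(f)[0] for f in testFiles]

print(testFiles)
print(moduleNames)
```

The comprehensions do the same job as filter(test.search, files) and map(filenameToModuleName, files), and they keep the original order of the list.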
Example 16.20. Step 4: Mapping module names to modules
>>> modules = map(__import__, moduleNames)①
>>> modules ②
[<module 'apihelpertest' from 'apihelpertest.py'>,
<module 'kgptest' from 'kgptest.py'>,
<module 'odbchelpertest' from 'odbchelpertest.py'>,
<module 'pluraltest' from 'pluraltest.py'>,
<module 'romantest' from 'romantest.py'>]
>>> modules[-1] ③
<module 'romantest' from 'romantest.py'>
- As you saw in Section 16.6, “Dynamically importing modules”, you can use a combination of
map and __import__ to map a list of module names (as strings) into actual modules (which you can call or access like any other module).
- modules is now a list of modules, fully accessible like any other module.
- The last module in the list is the
romantest module, just as if you had said import romantest.
Example 16.21. Step 5: Loading the modules into a test suite
>>> load = unittest.defaultTestLoader.loadTestsFromModule
>>> map(load, modules) ①
[<unittest.TestSuite tests=[
<unittest.TestSuite tests=[<apihelpertest.BadInput testMethod=testNoObject>]>,
<unittest.TestSuite tests=[<apihelpertest.KnownValues testMethod=testApiHelper>]>,
<unittest.TestSuite tests=[
<apihelpertest.ParamChecks testMethod=testCollapse>,
<apihelpertest.ParamChecks testMethod=testSpacing>]>,
...
]
]
>>> unittest.TestSuite(map(load, modules)) ②
- These are real module objects. Not only can you access them like any other module, instantiate their classes, and call their functions, you can also introspect into the module to figure out which classes and functions it has in the first place. That's what the loadTestsFromModule method does: it introspects into each module and returns a unittest.TestSuite object for each module. Each TestSuite object actually contains a list of TestSuite objects, one for each TestCase class in your module, and each of those TestSuite objects contains a list of tests, one for each test method in that class.
- Finally, you wrap the list of TestSuite objects into one big test suite. The unittest module has no problem traversing this tree of nested test suites within test suites; eventually it gets down to an individual test method and executes it, verifies that it passes or fails, and moves on to the next one.
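To watch the nesting at work, here is a small self-contained sketch. Everything named demotest, KnownValues, and BadInput is hypothetical: the module is built in memory with types.ModuleType so the example does not depend on the chapter's test files. The loading and wrapping calls are the same ones used in the example above.

```python
import types
import unittest

# Hypothetical TestCase classes, standing in for those in a real test module.
class KnownValues(unittest.TestCase):
    def testAddition(self):
        self.assertEqual(1 + 1, 2)

    def testUpper(self):
        self.assertEqual("abc".upper(), "ABC")

class BadInput(unittest.TestCase):
    def testNonNumeric(self):
        self.assertRaises(ValueError, int, "not a number")

# Hypothetical in-memory module that "contains" the two classes.
demotest = types.ModuleType("demotest")
demotest.KnownValues = KnownValues
demotest.BadInput = BadInput

# One TestSuite per module; each holds one nested suite per TestCase class.
load = unittest.defaultTestLoader.loadTestsFromModule
perModuleSuites = [load(m) for m in [demotest]]

# Wrap everything in one big suite and let unittest walk the tree.
suite = unittest.TestSuite(perModuleSuites)
result = unittest.TestResult()
suite.run(result)
print(suite.countTestCases(), result.testsRun, result.wasSuccessful())
```

countTestCases walks the nested suites and counts individual test methods, which is a quick way to confirm that the tree was assembled the way you expected before running it.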
This introspection process is what the unittest module usually does for you. Remember that magic-looking unittest.main() function that your individual test modules called to kick the whole thing off? unittest.main() actually creates an instance of unittest.TestProgram, which in turn uses unittest.defaultTestLoader (a ready-made instance of unittest.TestLoader) and loads it up with the module that called it. (How does it get a reference to the module that called it if you don't give it one? By using the equally-magic __import__('__main__') command, which dynamically imports the currently-running module. I could write a book on all the tricks and techniques used in the unittest module, but then I'd never finish this one.)
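The __import__('__main__') trick is easy to check for yourself. This short sketch only demonstrates the lookup; mainModule is a name chosen here for illustration.

```python
import sys

# Ask the import machinery for the module named "__main__".
# It is already loaded, so this returns the existing module object.
mainModule = __import__("__main__")

print(mainModule.__name__)
print(mainModule is sys.modules["__main__"])
```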
Example 16.22. Step 6: Telling unittest to use your test suite
if __name__ == "__main__":
    unittest.main(defaultTest="regressionTest") ①
- Instead of letting the unittest module do all its magic for you, you've done most of it yourself. You've created a function (regressionTest) that imports the modules, uses unittest.defaultTestLoader to load their tests, and wraps it all up in a test suite. Now all you need to do is tell unittest that, instead of looking for tests and building a test suite in the usual way, it should just call the regressionTest function, which returns a ready-to-use TestSuite.
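The name lookup behind defaultTest can be imitated by hand. The sketch below is an approximation under stated assumptions, not the chapter's regression.py: RomanSanity and fakeMain are hypothetical, and the in-memory module stands in for the running script. It resolves the string "regressionTest" with loadTestsFromName, which is roughly how unittest.main() turns the defaultTest name into a suite: the loader finds the attribute on the module, sees that it is callable, calls it, and accepts the TestSuite it returns.

```python
import types
import unittest

# Hypothetical test case, standing in for the chapter's test modules.
class RomanSanity(unittest.TestCase):
    def testTruth(self):
        self.assertTrue(True)

def regressionTest():
    # Build and return a ready-to-use suite, as the chapter's function does.
    load = unittest.defaultTestLoader.loadTestsFromTestCase
    return unittest.TestSuite([load(RomanSanity)])

# Hypothetical stand-in for the __main__ module that unittest.main() inspects.
fakeMain = types.ModuleType("fakeMain")
fakeMain.regressionTest = regressionTest

# Resolve the name against the module; the loader calls the function
# and uses the TestSuite it returns.
suite = unittest.defaultTestLoader.loadTestsFromName("regressionTest", fakeMain)

result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

In a real script you would skip the stand-in module entirely and let unittest.main(defaultTest="regressionTest") do the lookup against the running module.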
16.8. Summary
The regression.py program and its output should now make perfect sense.
You should now feel comfortable doing all of these things: