Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes. In versions of Python prior to 2.2, when you reference a variable within a nested function or lambda function, Python will search for that variable in the current (nested or lambda) function's namespace, then in the module's namespace. Python 2.2 will search for the variable in the current (nested or lambda) function's namespace, then in the parent function's namespace, then in the module's namespace. Python 2.1 can work either way; by default, it works like Python 2.0, but you can add the following line of code at the top of your module to make your module work like Python 2.2:
from __future__ import nested_scopes
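Since Python 2.2 (and in every Python 3 release) nested scopes are always on, so the __future__ import is no longer needed. Here is a minimal sketch of what they change; the function names are made up for illustration. The inner function finds greeting in its parent function's namespace, a lookup that would have raised a NameError under Python 2.0's local-then-module rule.

```python
def make_greeter(greeting):
    # greeting lives in make_greeter's local namespace.
    def greet(name):
        # greeting is neither local to greet nor a module-level name;
        # with nested scopes, Python finds it in the enclosing function.
        return "%s, %s!" % (greeting, name)
    return greet

hello = make_greeter("Hello")
print(hello("world"))  # Hello, world!
```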
Are you confused yet? Don't despair! This is really cool, I promise. Like many things in Python, namespaces are directly accessible at run-time. How? Well, the local namespace is accessible via the built-in locals function, and the global (module level) namespace is accessible via the built-in globals function.
Example 8.10. Introducing locals
>>> def foo(arg): ①
...     x = 1
...     print locals()
...
>>> foo(7) ②
{'arg': 7, 'x': 1}
>>> foo('bar') ③
{'arg': 'bar', 'x': 1}
- The function foo has two variables in its local namespace: arg, whose value is passed in to the function, and x, which is defined within the function.
- locals returns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values of the dictionary are the actual values of the variables. So calling foo with 7 prints the dictionary containing the function's two local variables: arg (7) and x (1).
- Remember, Python has dynamic typing, so you could just as easily pass a string in for arg; the function (and the call to locals) would still work just as well. locals works with all variables of all datatypes.
What locals does for the local (function) namespace, globals does for the global (module) namespace. globals is more exciting, though, because a module's namespace is more exciting.
Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes defined in the module. Plus, it includes anything that was imported into the module.
Remember the difference between from module import and import module? With import module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access any of its functions or attributes: module.function. But with from module import, you're actually importing specific functions and attributes from another module into your own namespace, which is why you access them directly without referencing the original module they came from. With the globals function, you can actually see this happen.
Example 8.11. Introducing globals
Look at the following block of code at the bottom of BaseHTMLProcessor.py:
if __name__ == "__main__":
    for k, v in globals().items(): ①
        print k, "=", v
- Just so you don't get intimidated, remember that you've seen all this before. The globals function returns a dictionary, and you're iterating through the dictionary using the items method and multi-variable assignment. The only thing new here is the globals function.
Now running the script from the command line gives this output (note that your output may be slightly different, depending on your platform and where you installed Python):
c:\docbook\dip\py> python BaseHTMLProcessor.py
SGMLParser = sgmllib.SGMLParser ①
htmlentitydefs = <module 'htmlentitydefs' from 'C:\Python23\lib\htmlentitydefs.py'> ②
BaseHTMLProcessor = __main__.BaseHTMLProcessor ③
__name__ = __main__ ④
... rest of output omitted for brevity...
- SGMLParser was imported from sgmllib, using from module import. That means that it was imported directly into the module's namespace, and here it is.
- Contrast this with htmlentitydefs, which was imported using import. That means that the htmlentitydefs module itself is in the namespace, but the entitydefs variable defined within htmlentitydefs is not.
- This module only defines one class, BaseHTMLProcessor, and here it is. Note that the value here is the class itself, not a specific instance of the class.
- Remember the if __name__ trick? When running a module (as opposed to importing it from another module), the built-in __name__ attribute is a special value, __main__. Since you ran this module as a script from the command line, __name__ is __main__, which is why the little test code to print the globals got executed.
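You can reproduce the SGMLParser/htmlentitydefs contrast with modules that still exist in Python 3. This is a small sketch, not code from BaseHTMLProcessor.py; the helper imported_names is hypothetical. It checks which names each style of import actually binds in the module's global namespace.

```python
import string               # binds the single name "string" to the module
from os.path import join    # binds "join" directly; "os" and "path" stay unbound

def imported_names():
    """Report which of these names are present in this module's globals()."""
    g = globals()
    return {name: name in g for name in ("string", "ascii_letters", "join", "path")}

print(imported_names())
```

string and join come back True; ascii_letters (which lives inside the string module) and path come back False, just as entitydefs was absent while htmlentitydefs was present.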
Using the locals and globals functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors the functionality of the getattr function, which allows you to access arbitrary functions dynamically by providing the function name as a string.
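As a sketch of that idea (written to run on Python 3; lookup and demo are hypothetical helpers, not part of any module in this book), the function below fetches a variable's value from its name, checking a caller-supplied locals() dictionary first and then the module's globals(), much as getattr(obj, "name") fetches an attribute.

```python
answer = 42

def lookup(name, local_vars=None):
    """Return the value of the variable called `name`.

    Checks the given local namespace first, then this module's global
    namespace; raises KeyError if neither contains the name.
    """
    if local_vars is not None and name in local_vars:
        return local_vars[name]
    return globals()[name]

def demo():
    x = "local x"
    return lookup("x", locals()), lookup("answer", locals())

print(demo())  # ('local x', 42)
```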
There is one other important difference between the locals and globals functions, which you should learn now before it bites you. It will bite you anyway, but at least then you'll remember learning it.
Example 8.12. locals is read-only, globals is not
def foo(arg):
    x = 1
    print locals() ①
    locals()["x"] = 2 ②
    print "x=",x ③

z = 7
print "z=",z
foo(3)
globals()["z"] = 8 ④
print "z=",z ⑤
- Since foo is called with 3, this will print {'arg': 3, 'x': 1}. This should not be a surprise.
- locals is a function that returns a dictionary, and here you are setting a value in that dictionary. You might think that this would change the value of the local variable x to 2, but it doesn't. locals does not actually return the local namespace; it returns a copy. So changing it does nothing to the value of the variables in the local namespace.
- This prints x= 1, not x= 2.
- After being burned by locals, you might think that this wouldn't change the value of z, but it does. Due to internal differences in how Python is implemented (which I'd rather not go into, since I don't fully understand them myself), globals returns the actual global namespace, not a copy: the exact opposite behavior of locals. So any changes to the dictionary returned by globals directly affect your global variables.
- This prints z= 8, not z= 7.
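Here is the same experiment rewritten so it runs on Python 3, with return values in place of print statements. The internals have shifted over the years (since Python 3.13, locals() inside a function returns a fresh snapshot on every call), but writing to that dictionary still leaves the function's real variables alone, while writing through globals() still changes the module variable.

```python
def foo(arg):
    x = 1
    locals()["x"] = 2       # changes a dictionary, not the real local variable
    return x                # still 1

z = 7

def set_z_through_globals():
    globals()["z"] = 8      # writes straight into the module's namespace

print(foo(3))               # 1
set_z_through_globals()
print(z)                    # 8
```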
[XML stuff was here]
9.2. Packages
10.6. Handling command-line arguments
Python fully supports creating programs that can be run on the command line, complete with command-line arguments and either short- or long-style flags to specify various options. None of this is XML-specific, but this script makes good use of command-line processing, so it seemed like a good time to mention it.
It's difficult to talk about command-line processing without understanding how command-line arguments are exposed to your Python program, so let's write a simple program to see them.
Example 10.20. Introducing sys.argv
#argecho.py
import sys

for arg in sys.argv: ①
    print arg
- Each command-line argument passed to the program will be in sys.argv, which is just a list. Here you are printing each argument on a separate line.
Example 10.21. The contents of sys.argv
[you@localhost py]$ python argecho.py ①
argecho.py
[you@localhost py]$ python argecho.py abc def ②
argecho.py
abc
def
[you@localhost py]$ python argecho.py --help ③
argecho.py
--help
[you@localhost py]$ python argecho.py -m kant.xml ④
argecho.py
-m
kant.xml
- The first thing to know about sys.argv is that it contains the name of the script you're calling. You will actually use this knowledge to your advantage later, in Chapter 16, Functional Programming. Don't worry about it for now.
- Command-line arguments are separated by spaces, and each shows up as a separate element in the sys.argv list.
- Command-line flags, like --help, also show up as their own element in the sys.argv list.
- To make things even more interesting, some command-line flags themselves take arguments. For instance, here you have a flag (-m) which takes an argument (kant.xml). Both the flag itself and the flag's argument are simply sequential elements in the sys.argv list. No attempt is made to associate one with the other; all you get is a list.
So as you can see, you certainly have all the information passed on the command line, but then again, it doesn't look like it's going to be all that easy to actually use it. For simple programs that only take a single argument and have no flags, you can simply use sys.argv[1] to access the argument. There's no shame in this; I do it all the time. For more complex programs, you need the getopt module.
Example 10.22. Introducing getopt
def main(argv):
    grammar = "kant.xml"
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="]) ②
    except getopt.GetoptError: ③
        usage() ④
        sys.exit(2)

...

if __name__ == "__main__":
    main(sys.argv[1:]) ①
- First off, look at the bottom of the example and notice that you're calling the main function with sys.argv[1:]. Remember, sys.argv[0] is the name of the script that you're running; you don't care about that for command-line processing, so you chop it off and pass the rest of the list.
- This is where all the interesting processing happens. The getopt function of the getopt module takes three parameters: the argument list (which you got from sys.argv[1:]), a string containing all the possible single-character command-line flags that this program accepts, and a list of longer command-line flags that are equivalent to the single-character versions. This is quite confusing at first glance, and is explained in more detail below.
- If anything goes wrong trying to parse these command-line flags, getopt will raise an exception, which you catch. You told getopt all the flags you understand, so this probably means that the end user passed some command-line flag that you don't understand.
- As is standard practice in the UNIX world, when the script is passed flags it doesn't understand, you print out a summary of proper usage and exit gracefully. Note that I haven't shown the usage function here. You would still need to code that somewhere and have it print out the appropriate summary; it's not automatic.
So what are all those parameters you pass to the getopt function? Well, the first one is simply the raw list of command-line flags and arguments (not including the first element, the script name, which you already chopped off before calling the main function). The second is the list of short command-line flags that the script accepts.
"hg:d"
-h        print usage summary
-g ...    use specified grammar file or URL
-d        show debugging information while parsing
The first and third flags are simply standalone flags; you specify them or you don't, and they do things (print help) or change state (turn on debugging). However, the second flag (-g) must be followed by an argument, which is the name of the grammar file to read from. In fact it can be a filename or a web address, and you don't know which yet (you'll figure it out later), but you know it has to be something. So you tell getopt this by putting a colon after the g in that second parameter to the getopt function.
To further complicate things, the script accepts either short flags (like -h) or long flags (like --help), and you want them to do the same thing. This is what the third parameter to getopt is for, to specify a list of the long flags that correspond to the short flags you specified in the second parameter.
["help", "grammar="]
--help          print usage summary
--grammar ...   use specified grammar file or URL
Three things of note here:
- All long flags are preceded by two dashes on the command line, but you don't include those dashes when calling getopt. They are understood.
- The --grammar flag must always be followed by an additional argument, just like the -g flag. This is notated by an equals sign, "grammar=".
- The list of long flags is shorter than the list of short flags, because the -d flag does not have a corresponding long version. This is fine; only -d will turn on debugging. The order of the two lists doesn't matter either: getopt never pairs a short flag with a long one, so treating -h and --help as the same option is up to your own code, as you'll see below.
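To see what getopt actually hands back, you can feed it a hand-written argument list with the same flag specification. This sketch runs on Python 3; the argument values are made up for illustration. A flag that takes no argument is paired with an empty string, and parsing stops at the first plain argument.

```python
import getopt

# The same flag specification used by kgp.py.
SHORT_FLAGS = "hg:d"
LONG_FLAGS = ["help", "grammar="]

opts, args = getopt.getopt(["-d", "--grammar", "husserl.xml", "extra"],
                           SHORT_FLAGS, LONG_FLAGS)
print(opts)   # [('-d', ''), ('--grammar', 'husserl.xml')]
print(args)   # ['extra']

# A short flag's argument may also be glued onto the flag itself.
print(getopt.getopt(["-gbinary.xml"], SHORT_FLAGS, LONG_FLAGS))  # ([('-g', 'binary.xml')], [])
```

An unrecognized flag such as -x raises getopt.GetoptError, which is exactly the exception the main function above catches.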
Confused yet? Let's look at the actual code and see if it makes sense in context.
Example 10.23. Handling command-line arguments in kgp.py
def main(argv):
    grammar = "kant.xml" ①
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts: ②
        if opt in ("-h", "--help"): ③
            usage()
            sys.exit()
        elif opt == '-d': ④
            global _debug
            _debug = 1
        elif opt in ("-g", "--grammar"): ⑤
            grammar = arg

    source = "".join(args) ⑥

    k = KantGenerator(grammar, source)
    print k.output()
- The grammar variable will keep track of the grammar file you're using. You initialize it here in case it's not specified on the command line (using either the -g or the --grammar flag).
- The opts variable that you get back from getopt contains a list of tuples: flag and argument. If the flag doesn't take an argument, then arg will simply be an empty string. This makes it easier to loop through the flags.
- getopt validates that the command-line flags are acceptable, but it doesn't do any sort of conversion between short and long flags. If you specify the -h flag, opt will contain "-h"; if you specify the --help flag, opt will contain "--help". So you need to check for both.
- Remember, the -d flag didn't have a corresponding long flag, so you only need to check for the short form. If you find it, you set a global variable that you'll refer to later to print out debugging information. (I used this during the development of the script. What, you thought all these examples worked on the first try?)
- If you find a grammar file, either with a -g flag or a --grammar flag, you save the argument that followed it (stored in arg) into the grammar variable, overwriting the default that you initialized at the top of the main function.
- That's it. You've looped through and dealt with all the command-line flags. That means that anything left must be command-line arguments. These come back from the getopt function in the args variable. In this case, you're treating them as source material for the parser. If there are no command-line arguments specified, args will be an empty list, and source will end up as the empty string.
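The same loop can be pulled out into a function that returns its results instead of setting a global and calling sys.exit, which makes it easy to try out on its own. This is a hypothetical restructuring for illustration (parse_command_line is not in kgp.py), written to run on Python 3.

```python
import getopt

def parse_command_line(argv, default_grammar="kant.xml"):
    """Return (grammar, debug, show_help, source) for a kgp-style argv.

    argv should already have the script name removed (sys.argv[1:]).
    Raises getopt.GetoptError for an unrecognized flag; the caller decides
    whether to print usage and exit.
    """
    grammar = default_grammar
    debug = False
    show_help = False
    opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    for opt, arg in opts:
        if opt in ("-h", "--help"):       # both spellings arrive unconverted
            show_help = True
        elif opt == "-d":                 # -d has no long equivalent
            debug = True
        elif opt in ("-g", "--grammar"):  # arg holds the flag's argument
            grammar = arg
    source = "".join(args)                # leftover arguments, joined as in kgp.py
    return grammar, debug, show_help, source

print(parse_command_line(["-d", "--grammar=binary.xml"]))
```

The design choice is simply to keep the parsing free of side effects; a real main function would then decide to call usage() and sys.exit() based on show_help or a caught GetoptError.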
10.7. Putting it all together
You've covered a lot of ground. Let's step back and see how all the pieces fit together.
To start with, this is a script that takes its arguments on the command line, using the getopt module.
def main(argv):
    ...
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
        ...
    for opt, arg in opts:
        ...
You create a new instance of the KantGenerator class, and pass it the grammar file and source that may or may not have been specified on the command line.
k = KantGenerator(grammar, source)
The KantGenerator instance automatically loads the grammar, which is an XML file. You use your custom openAnything function to open the file (which could be a local file or a file on a remote web server), then use the built-in minidom parsing functions to parse the XML into a tree of Python objects.
def _load(self, source):
    sock = toolbox.openAnything(source)
    xmldoc = minidom.parse(sock).documentElement
    sock.close()
    return xmldoc
Oh, and along the way, you take advantage of your knowledge of the structure of the XML document to set up a little cache of references, which are just elements in the XML document.
def loadGrammar(self, grammar):
    for ref in self.grammar.getElementsByTagName("ref"):
        self.refs[ref.attributes["id"].value] = ref
If you specified some source material on the command line, you use that; otherwise you rip through the grammar looking for the "top-level" reference (that isn't referenced by anything else) and use that as a starting point.
def getDefaultSource(self):
    xrefs = {}
    for xref in self.grammar.getElementsByTagName("xref"):
        xrefs[xref.attributes["id"].value] = 1
    xrefs = xrefs.keys()
    standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
    return '<xref id="%s"/>' % random.choice(standaloneXrefs)
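The ref cache and the "top-level reference" search can be tried outside the KantGenerator class with xml.dom.minidom alone. This is a standalone sketch that runs on Python 3: the tiny grammar string is made up and only mimics the ref/xref/p shape described here (it is not the book's kant.xml), and it uses a set where getDefaultSource builds a dictionary and calls keys().

```python
import random
from xml.dom import minidom

# A made-up miniature grammar with the same ref/xref structure.
GRAMMAR = """<grammar>
<ref id="sentence"><p>The <xref id="noun"/> is <xref id="adj"/>.</p></ref>
<ref id="noun"><p>cat</p><p>dog</p></ref>
<ref id="adj"><p>happy</p><p>sleepy</p></ref>
</grammar>"""

def standalone_refs(xml_text):
    """Return the ids of ref elements that no xref points to."""
    grammar = minidom.parseString(xml_text).documentElement
    # Cache every <ref> by its id attribute, as loadGrammar does.
    refs = {}
    for ref in grammar.getElementsByTagName("ref"):
        refs[ref.attributes["id"].value] = ref
    # Collect every id that some <xref> refers to.
    referenced = set()
    for xref in grammar.getElementsByTagName("xref"):
        referenced.add(xref.attributes["id"].value)
    return sorted(e for e in refs if e not in referenced)

def default_source(xml_text):
    """Build an <xref> pointing at a randomly chosen top-level ref."""
    return '<xref id="%s"/>' % random.choice(standalone_refs(xml_text))

print(standalone_refs(GRAMMAR))   # ['sentence']
print(default_source(GRAMMAR))    # <xref id="sentence"/>
```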
Now you rip through the source material. The source material is also XML, and you parse it one node at a time. To keep the code separated and more maintainable, you use separate handlers for each node type.
def parse_Element(self, node):
    handlerMethod = getattr(self, "do_%s" % node.tagName)
    handlerMethod(node)
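Dispatching on the tag name with getattr works for any class. In this small Python 3 sketch, the TagLogger class and its do_unknown fallback are hypothetical. The fallback is a design choice the chapter's version does not make: there, a tag with no do_ method raises AttributeError.

```python
from xml.dom import minidom

class TagLogger:
    """Hypothetical handler class that dispatches on element tag names."""

    def __init__(self):
        self.log = []

    def parse_Element(self, node):
        # Build the method name from the tag, e.g. <p> -> do_p, and fall
        # back to do_unknown when no such method exists.
        handler = getattr(self, "do_%s" % node.tagName, self.do_unknown)
        handler(node)

    def do_p(self, node):
        self.log.append("p: %d child node(s)" % len(node.childNodes))

    def do_unknown(self, node):
        self.log.append("no handler for <%s>" % node.tagName)

doc = minidom.parseString("<root><p>hi</p><choice/></root>")
logger = TagLogger()
for child in doc.documentElement.childNodes:
    logger.parse_Element(child)
print(logger.log)   # ['p: 1 child node(s)', 'no handler for <choice>']
```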
You bounce through the grammar, parsing all the children of each p element,
def do_p(self, node):
    ...
    if doit:
        for child in node.childNodes: self.parse(child)
replacing choice elements with a random child,
def do_choice(self, node):
    self.parse(self.randomChildElement(node))
and replacing xref elements with a random child of the corresponding ref element, which you previously cached.
def do_xref(self, node):
    id = node.attributes["id"].value
    self.parse(self.randomChildElement(self.refs[id]))
Eventually, you parse your way down to plain text,
def parse_Text(self, node):
    text = node.data
    ...
    self.pieces.append(text)
which you print out.
def main(argv):
    ...
    k = KantGenerator(grammar, source)
    print k.output()
10.8. Summary
Python comes with powerful libraries for parsing and manipulating XML documents. The minidom module takes an XML file and parses it into Python objects, providing random access to arbitrary elements. Furthermore, this chapter shows how Python can be used to create a "real" standalone command-line script, complete with command-line flags, command-line arguments, error handling, even the ability to take input from the piped result of a previous program.
Before moving on to the next chapter, you should be comfortable doing all of these things:
The following is a complete Python program that acts as a cheap and simple regression testing framework. It takes unit tests that you've written for individual modules, collects them all into one big test suite, and runs them all at once. I actually use this script as part of the build process for this book; I have unit tests for several of the example programs (not just the roman.py module featured in Chapter 13, Unit Testing), and the first thing my automated build script does is run this program to make sure all my examples still work. If this regression test fails, the build immediately stops. I don't want to release non-working examples any more than you want to download them and sit around scratching your head and yelling at your monitor and wondering why they don't work.
Example 16.1. regression.py
"""Regression testing framework
This module will search for scripts in the same directory named
XYZtest.py. Each such script should be a test suite that tests a
module through PyUnit. (As of Python 2.1, PyUnit is included in
the standard library as "unittest".) This script will aggregate all
found test suites into one big test suite and run them all at once.
"""

import sys, os, re, unittest

def regressionTest():
    path = os.path.abspath(os.path.dirname(sys.argv[0]))
    files = os.listdir(path)
    test = re.compile("test\.py$", re.IGNORECASE)
    files = filter(test.search, files)
    filenameToModuleName = lambda f: os.path.splitext(f)[0]
    moduleNames = map(filenameToModuleName, files)
    modules = map(__import__, moduleNames)
    load = unittest.defaultTestLoader.loadTestsFromModule
    return unittest.TestSuite(map(load, modules))

if __name__ == "__main__":
    unittest.main(defaultTest="regressionTest")
Running this script in the same directory as the rest of the example scripts that come with this book will find all the unit tests (the scripts named XYZtest.py, such as romantest.py), run them as a single test suite, and pass or fail them all at once.
Example 16.2. Sample output of regression.py
[you@localhost py]$ python regression.py -v
help should fail with no object ... ok ①
help should return known result for apihelper ... ok
help should honor collapse argument ... ok
help should honor spacing argument ... ok
buildConnectionString should fail with list input ... ok ②
buildConnectionString should fail with string input ... ok
buildConnectionString should fail with tuple input ... ok
buildConnectionString handles empty dictionary ... ok
buildConnectionString returns known result with known input ... ok
from_roman should only accept uppercase input ... ok ③
to_roman should always return uppercase ... ok
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with negative input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok
kgp a ref test ... ok
kgp b ref test ... ok
kgp c ref test ... ok
kgp d ref test ... ok
kgp e ref test ... ok
kgp f ref test ... ok
kgp g ref test ... ok
----------------------------------------------------------------------
Ran 29 tests in 2.799s
OK
- The first 4 tests are from apihelpertest.py, which tests the example script from Chapter 4, The Power Of Introspection.
- The next 5 tests are from odbchelpertest.py, which tests the example script from Chapter 2, Your First Python Program.
- The next 13 are from romantest.py, which you studied in depth in Chapter 13, Unit Testing. The final 7, labeled kgp, come from kgptest.py.
16.2. Finding the path
When running Python scripts from the command line, it is sometimes useful to know where the currently running script is located on disk.
This is one of those obscure little tricks that is virtually impossible to figure out on your own, but simple to remember once you see it. The key to it is sys.argv. As you saw in Section 10.6, “Handling command-line arguments”, this is a list that holds the command-line arguments. However, it also holds the name of the running script, exactly as it was called from the command line, and this is enough information to determine its location.
Example 16.3. fullpath.py
import sys, os
print 'sys.argv[0] =', sys.argv[0] ①
pathname = os.path.dirname(sys.argv[0]) ②
print 'path =', pathname
print 'full path =', os.path.abspath(pathname) ③
- Regardless of how you run a script, sys.argv[0] will always contain the name of the script, exactly as it appears on the command line. This may or may not include any path information, as you'll see shortly.
- os.path.dirname takes a filename as a string and returns the directory path portion. If the given filename does not include any path information, os.path.dirname returns an empty string.
- os.path.abspath is the key here. It takes a pathname, which can be partial or even blank, and returns a fully qualified pathname.
os.path.abspath deserves further explanation. It is very flexible; it can take any kind of pathname.
Example 16.4. Further explanation of os.path.abspath
>>> import os
>>> os.getcwd() ①
'/home/you'
>>> os.path.abspath('') ②
'/home/you'
>>> os.path.abspath('.ssh') ③
'/home/you/.ssh'
>>> os.path.abspath('/home/you/.ssh') ④
'/home/you/.ssh'
>>> os.path.abspath('.ssh/../foo/') ⑤
'/home/you/foo'
- os.getcwd() returns the current working directory.
- Calling os.path.abspath with an empty string returns the current working directory, same as os.getcwd().
- Calling os.path.abspath with a partial pathname constructs a fully qualified pathname out of it, based on the current working directory.
- Calling os.path.abspath with a full pathname simply returns it.
- os.path.abspath also normalizes the pathname it returns. Note that this example worked even though I don't actually have a 'foo' directory. os.path.abspath never checks your actual disk; this is all just string manipulation.
The pathnames and filenames you pass to os.path.abspath do not need to exist.
os.path.abspath not only constructs full path names, it also normalizes them. That means that if you are in the /usr/ directory, os.path.abspath('bin/../local/bin') will return /usr/local/bin. It normalizes the path by making it as simple as possible. If you just want to normalize a pathname like this without turning it into a full pathname, use os.path.normpath instead.
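A quick check of the difference, runnable on Python 3. It uses posixpath (the POSIX flavor of os.path, importable on any platform) so the normpath results look the same everywhere, and it deliberately uses directory names that do not exist.

```python
import os
import posixpath

# normpath only tidies the string: "..", ".", doubled and trailing slashes.
print(posixpath.normpath("/usr/bin/../local/bin"))   # /usr/local/bin
print(posixpath.normpath("bin/../local/bin"))        # local/bin
print(posixpath.normpath("/usr//local/./bin/"))      # /usr/local/bin

# abspath joins onto the current directory and then normalizes, so a path
# through a directory that doesn't exist is handled just the same.
fake = os.path.join("no_such_dir", "..", "also_missing")
print(os.path.abspath(fake) == os.path.join(os.getcwd(), "also_missing"))  # True
```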
Example 16.5. Sample output from fullpath.py
[you@localhost py]$ python /home/you/diveintopython3/common/py/fullpath.py ①
sys.argv[0] = /home/you/diveintopython3/common/py/fullpath.py
path = /home/you/diveintopython3/common/py
full path = /home/you/diveintopython3/common/py
[you@localhost diveintopython3]$ python common/py/fullpath.py ②
sys.argv[0] = common/py/fullpath.py
path = common/py
full path = /home/you/diveintopython3/common/py
[you@localhost diveintopython3]$ cd common/py
[you@localhost py]$ python fullpath.py ③
sys.argv[0] = fullpath.py
path =
full path = /home/you/diveintopython3/common/py
- In the first case, sys.argv[0] includes the full path of the script. You can then use the os.path.dirname function to strip off the script name and return the full directory name, and os.path.abspath simply returns what you give it.
- If the script is run by using a partial pathname, sys.argv[0] will still contain exactly what appears on the command line. os.path.dirname will then give you a partial pathname (relative to the current directory), and os.path.abspath will construct a full pathname from the partial pathname.
- If the script is run from the current directory without giving any path, os.path.dirname will simply return an empty string. Given an empty string, os.path.abspath returns the current directory, which is what you want, since the script was run from the current directory.
Like the other functions in the os and os.path modules, os.path.abspath is cross-platform. Your results will look slightly different from my examples if you're running on Windows (which uses backslash as a path separator) or classic Mac OS (which used colons), but they'll still work. That's the whole point of the os module.
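The two calls from fullpath.py can be wrapped in a small helper so that the three cases above are reproducible from any directory. This Python 3 sketch uses a hypothetical name, script_dir, and builds its sample paths with os.path.join so that it also works on Windows.

```python
import os

def script_dir(argv0):
    """Directory containing the script named by argv0, as fullpath.py computes it.

    dirname() strips the filename; abspath() turns what is left (full,
    partial, or empty) into a full path based on the current directory.
    """
    return os.path.abspath(os.path.dirname(argv0))

# The three cases from the sample output, relative to whatever
# directory you happen to be in.
for argv0 in [os.path.join(os.getcwd(), "fullpath.py"),
              os.path.join("common", "py", "fullpath.py"),
              "fullpath.py"]:
    print(repr(argv0), "->", script_dir(argv0))
```

In a real script you would call script_dir(sys.argv[0]).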
Addendum. One reader was dissatisfied with this solution, and wanted to be able to run all the unit tests in the current directory, not the directory where regression.py is located. He suggests this approach instead:
Example 16.6. Running scripts in the current directory
import sys, os, re, unittest

def regressionTest():
    path = os.getcwd() ①
    sys.path.append(path) ②
    files = os.listdir(path) ③
- Instead of setting path to the directory where the currently running script is located, you set it to the current working directory. This will be whatever directory you were in before you ran the script, which is not necessarily the same as the directory the script is in. (Read that sentence a few times until you get it.)
- Append this directory to the Python library search path, so that when you dynamically import the unit test modules later, Python can find them. You didn't need to do this when path was the directory of the currently running script, because Python always looks in that directory.
- The rest of the function is the same.
This technique will allow you to re-use this regression.py script on multiple projects. Just put the script in a common directory, then change to the project's directory before running it. All of that project's unit tests will be found and tested, instead of the unit tests in the common directory where regression.py is located.
[more functional programming stuff was here]
16.6. Dynamically importing modules
OK, enough philosophizing. Let's talk about dynamically importing modules.
First, let's look at how you normally import modules. The import module syntax looks in the search path for the named module and imports it by name. You can even import multiple modules at once this way, with a comma-separated list. You did this on the very first line of this chapter's script.
Example 16.13. Importing multiple modules at once
import sys, os, re, unittest ①
- This imports four modules at once: sys (for system functions and access to the command line parameters), os (for operating system functions like directory listings), re (for regular expressions), and unittest (for unit testing).
Now let's do the same thing, but with dynamic imports.
Example 16.14. Importing modules dynamically
>>> sys = __import__('sys') ①
>>> os = __import__('os')
>>> re = __import__('re')
>>> unittest = __import__('unittest')
>>> sys ②
<module 'sys' (built-in)>
>>> os
<module 'os' from '/usr/local/lib/python2.2/os.pyc'>
- The built-in __import__ function accomplishes the same goal as using the import statement, but it's an actual function, and it takes a string as an argument.
- The variable sys is now the sys module, just as if you had said import sys. The variable os is now the os module, and so forth.
So __import__ imports a module, but takes a string argument to do it. In this case the module you imported was just a hard-coded string, but it could just as easily be a variable, or the result of a function call. And the variable that you assign the module to doesn't need to match the module name, either. You could import a series of modules and assign them to a list.
Example 16.15. Importing a list of modules dynamically
>>> moduleNames = ['sys', 'os', 're', 'unittest'] ①
>>> moduleNames
['sys', 'os', 're', 'unittest']
>>> modules = map(__import__, moduleNames) ②
>>> modules ③
[<module 'sys' (built-in)>,
<module 'os' from 'c:\Python22\lib\os.pyc'>,
<module 're' from 'c:\Python22\lib\re.pyc'>,
<module 'unittest' from 'c:\Python22\lib\unittest.pyc'>]
>>> modules[0].version ④
'2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]'
>>> import sys
>>> sys.version
'2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]'
- moduleNames is just a list of strings. Nothing fancy, except that the strings happen to be names of modules that you could import, if you wanted to.
- Surprise, you wanted to import them, and you did, by mapping the __import__ function onto the list. Remember, this takes each element of the list (moduleNames) and calls the function (__import__) over and over, once with each element of the list, builds a list of the return values, and returns the result.
- So now from a list of strings, you've created a list of actual modules. (Your paths may be different, depending on your operating system, where you installed Python, the phase of the moon, etc.)
- To drive home the point that these are real modules, let's look at some module attributes. Remember, modules[0] is the sys module, so modules[0].version is sys.version. All the other attributes and methods of these modules are also available. There's nothing magic about the import statement, and there's nothing magic about modules. Modules are objects. Everything is an object.
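In Python 2.7 and 3.1 onward, the standard library also offers importlib.import_module, which likewise takes a module name as a string. This Python 3 sketch uses a list comprehension because Python 3's map returns a lazy iterator rather than a list. It also shows the one behavioral difference worth knowing: for a dotted name, __import__ returns the top-level package, while import_module returns the submodule you asked for.

```python
import importlib
import os
import sys

module_names = ["sys", "os", "re", "unittest"]
modules = [importlib.import_module(name) for name in module_names]
print(modules[0] is sys)                               # True: the same module object

# Dotted names: __import__ hands back the package, import_module the submodule.
print(__import__("os.path") is os)                     # True
print(importlib.import_module("os.path") is os.path)   # True
```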
Now you should be able to put this all together and figure out what most of this chapter's code sample is doing.
16.7. Putting it all together
You've learned enough now to deconstruct the first seven lines of this chapter's code sample: reading a directory and importing selected modules within it.
Example 16.16. The regressionTest function
def regressionTest():
    path = os.path.abspath(os.path.dirname(sys.argv[0]))
    files = os.listdir(path)
    test = re.compile("test\.py$", re.IGNORECASE)
    files = filter(test.search, files)
    filenameToModuleName = lambda f: os.path.splitext(f)[0]
    moduleNames = map(filenameToModuleName, files)
    modules = map(__import__, moduleNames)
    load = unittest.defaultTestLoader.loadTestsFromModule
    return unittest.TestSuite(map(load, modules))
Let's look at it line by line, interactively. Assume that the current directory is c:\diveintopython3\py, which contains the examples that come with this book, including this chapter's script. As you saw in Section 16.2, “Finding the path”, the script directory will end up in the path variable, so let's start by hard-coding that and go from there.
Example 16.17. Step 1: Get all the files
>>> import sys, os, re, unittest
>>> path = r'c:\diveintopython3\py'
>>> files = os.listdir(path)
>>> files ①
['BaseHTMLProcessor.py', 'LICENSE.txt', 'apihelper.py', 'apihelpertest.py',
'argecho.py', 'autosize.py', 'builddialectexamples.py', 'dialect.py',
'fileinfo.py', 'fullpath.py', 'kgptest.py', 'makerealworddoc.py',
'odbchelper.py', 'odbchelpertest.py', 'parsephone.py', 'piglatin.py',
'plural.py', 'pluraltest.py', 'pyfontify.py', 'regression.py', 'roman.py', 'romantest.py',
'uncurly.py', 'unicode2koi8r.py', 'urllister.py', 'kgp', 'plural', 'roman',
'colorize.py']
- files is a list of all the files and directories in the script's directory. (If you've been running some of the examples already, you may also see some .pyc files in there as well.)
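The listing mixes files and directories and comes back in no particular order. You can see this for yourself by listing a scratch directory. This is a minimal sketch; the temporary directory and the file names in it are made up for the demonstration:

```python
import os
import shutil
import tempfile

scratch = tempfile.mkdtemp()
try:
    # Two files and one subdirectory, all with hypothetical names.
    for name in ('romantest.py', 'roman.py'):
        open(os.path.join(scratch, name), 'w').close()
    os.mkdir(os.path.join(scratch, 'roman'))

    # os.listdir returns bare names (no paths), files and directories alike,
    # in no guaranteed order, so sort before comparing or displaying.
    entries = sorted(os.listdir(scratch))
    dirs = [e for e in entries if os.path.isdir(os.path.join(scratch, e))]
finally:
    shutil.rmtree(scratch)

print(entries)  # ['roman', 'roman.py', 'romantest.py']
print(dirs)     # ['roman']
```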
Example 16.18. Step 2: Filter to find the files you care about
>>> test = re.compile(r"test\.py$", re.IGNORECASE) ①
>>> files = filter(test.search, files) ②
>>> files ③
['apihelpertest.py', 'kgptest.py', 'odbchelpertest.py', 'pluraltest.py', 'romantest.py']
- This regular expression will match any string that ends with test.py. Note that you need to escape the period, since a period in a regular expression usually means “match any single character”, but you actually want to match a literal period instead.
- The search method of the compiled regular expression acts like a function, so you can pass it to filter to sift through the large list of files and directories and keep the ones that match the regular expression.
- And you're left with the list of unit testing scripts, because they were the only ones named SOMETHINGtest.py.
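To see why the backslash matters, compare the escaped and unescaped patterns on a few made-up file names. The sketch uses a raw string (r"...") so the backslash reaches the regular expression engine untouched. It also wraps filter in list() so it behaves the same on Python 3:

```python
import re

names = ['romantest.py', 'RomanTest.PY', 'romantestXpy', 'test.py.bak']

escaped = re.compile(r"test\.py$", re.IGNORECASE)
unescaped = re.compile("test.py$", re.IGNORECASE)  # '.' matches any character

# search is a bound method, so it can be handed to filter like any function.
matchEscaped = list(filter(escaped.search, names))
matchUnescaped = list(filter(unescaped.search, names))

print(matchEscaped)    # ['romantest.py', 'RomanTest.PY']
print(matchUnescaped)  # ['romantest.py', 'RomanTest.PY', 'romantestXpy']
```

The unescaped pattern lets 'romantestXpy' slip through, and the $ anchor keeps 'test.py.bak' out of both lists.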
Example 16.19. Step 3: Map filenames to module names
>>> filenameToModuleName = lambda f: os.path.splitext(f)[0] ①
>>> filenameToModuleName('romantest.py') ②
'romantest'
>>> filenameToModuleName('odbchelpertest.py')
'odbchelpertest'
>>> moduleNames = map(filenameToModuleName, files) ③
>>> moduleNames ④
['apihelpertest', 'kgptest', 'odbchelpertest', 'pluraltest', 'romantest']
- As you saw in Section 4.7, “Using lambda Functions”, lambda is a quick-and-dirty way of creating an inline, one-line function. This one takes a filename with an extension and returns the filename without its extension, using the standard library function os.path.splitext that you saw in Example 6.17, “Splitting Pathnames”.
- filenameToModuleName is a function. There's nothing magic about lambda functions as opposed to regular functions that you define with a def statement. You can call the filenameToModuleName function like any other, and it does just what you wanted it to do: strips the file extension off of its argument.
- Now you can apply this function to each file in the list of unit test files, using map.
- And the result is just what you wanted: a list of module names, as strings.
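The lambda is interchangeable with an ordinary def function, and os.path.splitext removes only the last extension. The following is a small sketch; the def version (filenameToModuleNameDef) and the sample file names are illustrative:

```python
import os.path

filenameToModuleName = lambda f: os.path.splitext(f)[0]

# The equivalent function written with def (hypothetical name).
def filenameToModuleNameDef(f):
    return os.path.splitext(f)[0]

# splitext returns a (root, extension) pair; [0] keeps the root.
print(os.path.splitext('romantest.py'))    # ('romantest', '.py')
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')

files = ['apihelpertest.py', 'romantest.py']
viaLambda = list(map(filenameToModuleName, files))
viaDef = list(map(filenameToModuleNameDef, files))
print(viaLambda)  # ['apihelpertest', 'romantest']
print(viaDef)     # ['apihelpertest', 'romantest']
```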
Example 16.20. Step 4: Mapping module names to modules
>>> modules = map(__import__, moduleNames) ①
>>> modules ②
[<module 'apihelpertest' from 'apihelpertest.py'>,
<module 'kgptest' from 'kgptest.py'>,
<module 'odbchelpertest' from 'odbchelpertest.py'>,
<module 'pluraltest' from 'pluraltest.py'>,
<module 'romantest' from 'romantest.py'>]
>>> modules[-1] ③
<module 'romantest' from 'romantest.py'>
- As you saw in Section 16.6, “Dynamically importing modules”, you can use a combination of map and __import__ to map a list of module names (as strings) into actual modules, which you can use like any other module.
- modules is now a list of modules, fully accessible like any other module.
- The last module in the list is the romantest module, just as if you had said import romantest.
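You can check the "just as if you had said import" claim directly. __import__ registers the module in sys.modules, and a later import statement hands back that same object. Here is a minimal sketch, using the standard json module as a stand-in for romantest:

```python
import sys

# Import by name, the way map(__import__, moduleNames) does.
viaFunction = __import__('json')

# The ordinary statement finds the module already loaded in sys.modules.
import json

print(viaFunction is json)                 # True
print(sys.modules['json'] is viaFunction)  # True

# One caveat: for a dotted name, __import__ returns the top-level package.
print(__import__('os.path').__name__)      # os
```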
Example 16.21. Step 5: Loading the modules into a test suite
>>> load = unittest.defaultTestLoader.loadTestsFromModule
>>> map(load, modules) ①
[<unittest.TestSuite tests=[
  <unittest.TestSuite tests=[<apihelpertest.BadInput testMethod=testNoObject>]>,
  <unittest.TestSuite tests=[<apihelpertest.KnownValues testMethod=testApiHelper>]>,
  <unittest.TestSuite tests=[
    <apihelpertest.ParamChecks testMethod=testCollapse>,
    <apihelpertest.ParamChecks testMethod=testSpacing>]>,
    ...
  ]
]
>>> unittest.TestSuite(map(load, modules)) ②
- These are real module objects. Not only can you use them like any other module, instantiating classes and calling functions, but you can also introspect into each module to figure out which classes and functions it has in the first place. That's what the loadTestsFromModule method does: it introspects into each module and returns a unittest.TestSuite object for each module. Each TestSuite object actually contains a list of TestSuite objects, one for each TestCase class in the module, and each of those TestSuite objects contains a list of tests, one for each test method in that class.
- Finally, you wrap the list of TestSuite objects into one big test suite. The unittest module has no problem traversing this tree of nested test suites within test suites; eventually it gets down to an individual test method, executes it, records whether it passed or failed, and moves on to the next one.
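The nesting described above is easy to inspect on a small scale. The sketch below builds a throwaway module object in memory, then loads it with loadTestsFromModule. The module name, the two TestCase classes, and their methods are all hypothetical. Finally, it runs the resulting tree of suites into a plain TestResult so nothing is printed:

```python
import types
import unittest

# A hypothetical module assembled in memory instead of imported from disk.
fakeModule = types.ModuleType('fakeTests')

class BadInput(unittest.TestCase):
    def testNoObject(self):
        # len() of an int raises TypeError
        self.assertRaises(TypeError, len, 5)

class KnownValues(unittest.TestCase):
    def testOne(self):
        self.assertEqual(1 + 1, 2)

    def testTwo(self):
        self.assertTrue('romantest'.endswith('test'))

# Attach the classes so the loader can find them by introspection.
fakeModule.BadInput = BadInput
fakeModule.KnownValues = KnownValues

load = unittest.defaultTestLoader.loadTestsFromModule
moduleSuite = load(fakeModule)

# One inner TestSuite per TestCase class (found in alphabetical order),
# each holding one test per test method.
counts = [inner.countTestCases() for inner in moduleSuite]
print(counts)  # [1, 2]

# Wrap the per-module suite in one big suite, as regressionTest does,
# and run it into a plain TestResult so nothing is printed.
bigSuite = unittest.TestSuite([moduleSuite])
result = unittest.TestResult()
bigSuite.run(result)
print(result.testsRun)         # 3
print(result.wasSuccessful())  # True
```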
This introspection process is what the unittest module usually does for you. Remember that magic-looking unittest.main() function that your individual test modules called to kick the whole thing off? unittest.main() actually creates an instance of unittest.TestProgram, which in turn uses unittest.defaultTestLoader (a ready-made unittest.TestLoader instance) to load the tests from the module that called it. (How does it get a reference to the module that called it if you don't give it one? By using the equally magic __import__('__main__') call, which dynamically imports the currently running module. I could write a book on all the tricks and techniques used in the unittest module, but then I'd never finish this one.)
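A few one-line checks back up that description. They hold for the unittest module in Python 2.7 and Python 3; treat other versions as untested here:

```python
import sys
import unittest

# unittest.main is simply another name for the TestProgram class.
print(unittest.main is unittest.TestProgram)                        # True

# defaultTestLoader is a ready-made TestLoader instance, not a class.
print(isinstance(unittest.defaultTestLoader, unittest.TestLoader))  # True

# __import__('__main__') returns the module that is currently running.
print(__import__('__main__') is sys.modules['__main__'])            # True
```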
Example 16.22. Step 6: Telling unittest to use your test suite
if __name__ == "__main__":
    unittest.main(defaultTest="regressionTest") ①
- Instead of letting the unittest module do all its magic for you, you've done most of it yourself. You've written a function (regressionTest) that imports the modules, calls unittest.defaultTestLoader.loadTestsFromModule on each one, and wraps the results in a single test suite. Now all you need to do is tell unittest that, instead of looking for tests and building a test suite in the usual way, it should just call the regressionTest function, which returns a ready-to-use TestSuite.
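Here is a self-contained sketch of Step 6. It does not rely on the script's own __main__ module. Instead, it hands unittest.main a hypothetical in-memory module (fakeRegression) whose regressionTest function returns a TestSuite.

- The argv, exit, and testRunner arguments keep the demo from reading the real command line, exiting the interpreter, or printing a report.
- exit requires Python 2.7 or later.
- QuietRunner is a made-up minimal runner, not part of unittest.

```python
import types
import unittest

class Smoke(unittest.TestCase):
    def testTruth(self):
        self.assertTrue(True)

    def testArithmetic(self):
        self.assertEqual(2 * 3, 6)

def regressionTest():
    # Stand-in for Example 16.16: return a ready-to-use TestSuite.
    load = unittest.defaultTestLoader.loadTestsFromTestCase
    return unittest.TestSuite([load(Smoke)])

class QuietRunner(object):
    # Hypothetical runner: collects results without printing anything.
    def run(self, test):
        result = unittest.TestResult()
        test.run(result)
        return result

# A hypothetical module exposing regressionTest, standing in for regression.py.
fakeRegression = types.ModuleType('fakeRegression')
fakeRegression.regressionTest = regressionTest

# defaultTest names a callable; unittest calls it and runs the suite it returns.
program = unittest.main(module=fakeRegression,
                        defaultTest='regressionTest',
                        argv=['regression.py'],
                        testRunner=QuietRunner(),
                        exit=False)

print(program.result.testsRun)  # 2
```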
16.8. Summary
The regression.py program and its output should now make perfect sense.
You should now feel comfortable doing all of these things: