Example 6.12. Introducing sys.modules

>>> import sys        
>>> print '\n'.join(sys.modules.keys()) 
win32api
os.path
os
exceptions
__main__
ntpath
nt
sys
__builtin__
site
signal
UserDict
stat
1. The sys module contains system-level information, such as the version of Python you're running (sys.version or sys.version_info), and system-level options such as the maximum allowed recursion depth (sys.getrecursionlimit() and sys.setrecursionlimit()).
2. sys.modules is a dictionary containing all the modules that have ever been imported since Python was started; the key is the module name, the value is the module object. Note that this is more than just the modules your program has imported. Python preloads some modules on startup, and if you're using a Python IDE, sys.modules contains all the modules imported by all the programs you've run within the IDE.
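Here is a small sketch that reads the sys attributes mentioned above outside the interactive session. It is written for Python 3, and it only reads values; it never changes the recursion limit.

```python
import sys

# A human-readable version string, plus a tuple-like object for comparisons.
version_text = sys.version
major, minor = sys.version_info[0], sys.version_info[1]

# The current maximum recursion depth (setrecursionlimit would change it).
limit = sys.getrecursionlimit()

# sys.modules maps module names (strings) to module objects;
# sys itself is always there, since it was just imported.
assert sys.modules["sys"] is sys

print("Python %d.%d, recursion limit %d" % (major, minor, limit))
print("%d modules currently loaded" % len(sys.modules))
```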

This example demonstrates how to use sys.modules.

Example 6.13. Using sys.modules

>>> import fileinfo
>>> print '\n'.join(sys.modules.keys())
win32api
os.path
os
fileinfo
exceptions
__main__
ntpath
nt
sys
__builtin__
site
signal
UserDict
stat
>>> fileinfo
<module 'fileinfo' from 'fileinfo.pyc'>
>>> sys.modules["fileinfo"]
<module 'fileinfo' from 'fileinfo.pyc'>
1. As new modules are imported, they are added to sys.modules. This explains why importing the same module twice is very fast: Python has already loaded and cached the module in sys.modules, so importing it the second time is simply a dictionary lookup.
2. Given the name (as a string) of any previously imported module, you can get a reference to the module itself through the sys.modules dictionary.
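Both points can be shown with the standard library's json module, which stands in for fileinfo here. This is a Python 3 sketch, and module_by_name is a hypothetical helper name.

```python
import sys
import json             # first import: the module is loaded and cached
import json as json2    # second import: only a lookup in sys.modules

assert json is json2                  # both names refer to one module object
assert sys.modules["json"] is json    # ...and it is the cached entry

def module_by_name(name):
    """Return an already-imported module given its name as a string,
    or None if no module by that name has been imported."""
    return sys.modules.get(name)

print(module_by_name("json"))
```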

The next example shows how to use the __module__ class attribute with the sys.modules dictionary to get a reference to the module in which a class is defined.

Example 6.14. The __module__ Class Attribute

>>> from fileinfo import MP3FileInfo
>>> MP3FileInfo.__module__
'fileinfo'
>>> sys.modules[MP3FileInfo.__module__]
<module 'fileinfo' from 'fileinfo.pyc'>
1. Every Python class has a built-in class attribute __module__, which is the name of the module in which the class is defined.
2. Combining this with the sys.modules dictionary, you can get a reference to the module in which a class is defined.
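One detail is worth seeing in action: __module__ names the module where the class was *defined*, which is not always the module you imported it from. In this Python 3 sketch, JSONDecoder is imported from the json package, but it is defined in the json.decoder submodule.

```python
import sys
import json
import json.decoder
from json import JSONDecoder

# Reachable through json, but defined in json/decoder.py.
print(JSONDecoder.__module__)          # json.decoder

# sys.modules turns that name back into the defining module object.
defining_module = sys.modules[JSONDecoder.__module__]
assert defining_module is json.decoder
assert defining_module is not json
```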

Now you're ready to see how sys.modules is used in fileinfo.py, the sample program introduced in Chapter 5. This example shows that portion of the code.

Example 6.15. sys.modules in fileinfo.py

    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
        "get file info class from filename extension"
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
1. This is a function with two arguments; filename is required, but module is optional and defaults to the module that contains the FileInfo class. This looks inefficient, because you might expect Python to evaluate the sys.modules expression every time the function is called. In fact, Python evaluates default expressions only once, when the def statement itself is executed (which, for a module-level definition, happens when the module is first imported). As you'll see later, you never call this function with a module argument, so module serves as a function-level constant.
2. You'll plow through this line later, after you dive into the os module. For now, take it on faith that subclass ends up as the name of a class, like MP3FileInfo.
3. You already know about getattr, which gets a reference to an object by name. hasattr is a complementary function that checks whether an object has a particular attribute; in this case, whether a module has a particular class (although it works for any object and any attribute, just like getattr). In English, this line of code says, “If this module has the class named by subclass, then return it; otherwise return the base class FileInfo.”
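The hasattr(...) and getattr(...) or FileInfo expression is the classic and-or trick. getattr also accepts an optional third argument, a default that it returns when the attribute is missing, and that expresses the same idea in a single call. The sketch below uses that form. FileInfo and MP3FileInfo are minimal stand-ins for the real classes. A throwaway module object built with types.ModuleType stands in for sys.modules[FileInfo.__module__].

```python
import os
import types

class FileInfo(dict):
    """Stand-in for the generic base class in fileinfo.py."""

class MP3FileInfo(FileInfo):
    """Stand-in for the MP3-specific subclass."""

# A throwaway module object holding the classes; in fileinfo.py the default
# is sys.modules[FileInfo.__module__], the module that defines FileInfo.
handlers = types.ModuleType("handlers")
handlers.FileInfo = FileInfo
handlers.MP3FileInfo = MP3FileInfo

def getFileInfoClass(filename, module=handlers):
    "get file info class from filename extension"
    # "song.mp3" -> ".mp3" -> ".MP3" -> "MP3" -> "MP3FileInfo"
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
    # getattr returns its third argument when the attribute doesn't exist
    return getattr(module, subclass, FileInfo)

print(getFileInfoClass("song.mp3").__name__)    # MP3FileInfo
print(getFileInfoClass("notes.txt").__name__)   # FileInfo
```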


8.5. locals and globals

Let's digress from HTML processing for a minute and talk about how Python handles variables. Python has two built-in functions, locals and globals, which provide dictionary-based access to local and global variables.

Remember locals? You first saw it here:

    def unknown_starttag(self, tag, attrs):
        strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
        self.pieces.append("<%(tag)s%(strattrs)s>" % locals())


No, wait, you can't learn about locals yet. First, you need to learn about namespaces. This is dry stuff, but it's important, so pay attention.

Python uses what are called namespaces to keep track of variables. A namespace is just like a dictionary where the keys are names of variables and the dictionary values are the values of those variables. In fact, you can access a namespace as a Python dictionary, as you'll see in a minute.

At any particular point in a Python program, there are several namespaces available. Each function has its own namespace, called the local namespace, which keeps track of the function's variables, including function arguments and locally defined variables. Each module has its own namespace, called the global namespace, which keeps track of the module's variables, including functions, classes, any other imported modules, and module-level variables and constants. And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions.

When a line of code asks for the value of a variable x, Python will search for that variable in all the available namespaces, in order:

1. local namespace - specific to the current function or class method. If the function defines a local variable x, or has an argument x, Python will use this and stop searching.
2. global namespace - specific to the current module. If the module has defined a variable, function, or class called x, Python will use that and stop searching.
3. built-in namespace - global to all modules. As a last resort, Python will assume that x is the name of a built-in function or variable.

If Python doesn't find x in any of these namespaces, it gives up and raises a NameError with the message “There is no variable named 'x'”, which you saw back in Example 3.18, “Referencing an Unbound Variable”. Back then, though, you didn't appreciate how much work Python was doing before giving you that error.
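You can watch the search order at work in this Python 3 sketch, where all the function names are hypothetical. Current versions of Python word the NameError message differently from the one quoted above, so the sketch just reports whatever message it receives.

```python
x = "global x"

def uses_local():
    x = "local x"          # found in the local namespace first
    return x

def uses_global():
    return x               # no local x, so the module's global x is used

def uses_builtin():
    return len("abc")      # len is in neither namespace; found in built-ins

def uses_nothing():
    try:
        return undefined_name   # in none of the three namespaces
    except NameError as e:
        return str(e)

print(uses_local())     # local x
print(uses_global())    # global x
print(uses_builtin())   # 3
print(uses_nothing())   # a message naming 'undefined_name'
```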
Important: Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes. In versions of Python prior to 2.2, when you reference a variable within a nested function or lambda function, Python searches for that variable in the current (nested or lambda) function's namespace, then in the module's namespace. Python 2.2 searches the current (nested or lambda) function's namespace, then the parent function's namespace, then the module's namespace. Python 2.1 can work either way; by default it works like Python 2.0, but you can add the following line of code at the top of your module to make it work like Python 2.2:

from __future__ import nested_scopes
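Here is a minimal sketch of what nested scopes make possible, for Python 2.2 or later. The inner function (or lambda) finds a variable that belongs to the enclosing function's namespace. The names are hypothetical.

```python
def make_multiplier(factor):
    # factor lives in make_multiplier's local namespace...
    def multiply(n):
        # ...and, with nested scopes, the inner function can see it.
        return n * factor
    return multiply

triple = make_multiplier(3)
print(triple(5))      # 15

# The same rule applies to lambda functions.
add = lambda a: (lambda b: a + b)
print(add(2)(40))     # 42
```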

Are you confused yet? Don't despair! This is really cool, I promise. Like many things in Python, namespaces are directly accessible at run-time. How? Well, the local namespace is accessible via the built-in locals function, and the global (module-level) namespace is accessible via the built-in globals function.

Example 8.10. Introducing locals

>>> def foo(arg):
...    x = 1
...    print locals()
...
>>> foo(7)
{'arg': 7, 'x': 1}
>>> foo('bar')
{'arg': 'bar', 'x': 1}
1. The function foo has two variables in its local namespace: arg, whose value is passed in to the function, and x, which is defined within the function.
2. locals returns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values of the dictionary are the actual values of the variables. So calling foo with 7 prints the dictionary containing the function's two local variables: arg (7) and x (1).
3. Remember, Python has dynamic typing, so you could just as easily pass a string in for arg; the function (and the call to locals) would still work just as well. locals works with all variables of all datatypes.
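Because locals returns an ordinary dictionary, you can hand it to the dictionary form of the % string-formatting operator. That is the trick unknown_starttag uses. Below is a standalone sketch of it. start_tag is a hypothetical function that returns the string instead of appending it to self.pieces.

```python
def start_tag(tag, attrs):
    # attrs is a list of (name, value) pairs, as SGMLParser passes them.
    strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
    # %(tag)s and %(strattrs)s are looked up by name in the locals() dict.
    return "<%(tag)s%(strattrs)s>" % locals()

print(start_tag("a", [("href", "index.html"), ("title", "Home")]))
# <a href="index.html" title="Home">
```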

What locals does for the local (function) namespace, globals does for the global (module) namespace. globals is more exciting, though, because a module's namespace is more exciting. [3] Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes defined in the module. Plus, it includes anything that was imported into the module.

Remember the difference between from module import and import module? With import module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access any of its functions or attributes: module.function. But with from module import, you're actually importing specific functions and attributes from another module into your own namespace, which is why you access them directly without referencing the original module they came from. With the globals function, you can actually see this happen.

Example 8.11. Introducing globals

Look at the following block of code at the bottom of BaseHTMLProcessor.py:

if __name__ == "__main__":
    for k, v in globals().items():
        print k, "=", v
1. Just so you don't get intimidated, remember that you've seen all this before. The globals function returns a dictionary, and you're iterating through the dictionary using the items method and multi-variable assignment. The only thing new here is the globals function.

Now running the script from the command line gives this output (note that your output may be slightly different, depending on your platform and where you installed Python):

c:\docbook\dip\py> python BaseHTMLProcessor.py
SGMLParser = sgmllib.SGMLParser
htmlentitydefs = <module 'htmlentitydefs' from 'C:\Python23\lib\htmlentitydefs.py'>
BaseHTMLProcessor = __main__.BaseHTMLProcessor
__name__ = __main__
... rest of output omitted for brevity...
1. SGMLParser was imported from sgmllib, using from module import. That means that it was imported directly into the module's namespace, and here it is.
2. Contrast this with htmlentitydefs, which was imported using import. That means that the htmlentitydefs module itself is in the namespace, but the entitydefs variable defined within htmlentitydefs is not.
3. This module only defines one class, BaseHTMLProcessor, and here it is. Note that the value here is the class itself, not a specific instance of the class.
4. Remember the if __name__ trick? When running a module (as opposed to importing it from another module), the built-in __name__ attribute is a special value, __main__. Since you ran this module as a script from the command line, __name__ is __main__, which is why the little test code to print the globals got executed.
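The same difference is visible in any module, without BaseHTMLProcessor. In this Python 3 sketch, import os.path binds only the top-level name os. By contrast, from os.path import splitext copies the function itself into the module's namespace.

```python
import os.path                 # binds the name "os" (the package) in globals
from os.path import splitext   # binds "splitext" directly in globals

g = globals()
print("os" in g)             # True: the module itself is in the namespace
print("splitext" in g)       # True: the function was copied into the namespace
print("commonprefix" in g)   # False: never imported by name, only via os.path
```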
Note: Using the locals and globals functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors the functionality of the getattr function, which allows you to access arbitrary functions (or any other attributes) dynamically by providing the name as a string.
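Here is a side-by-side sketch of both kinds of lookup by name, for Python 3. value_of and call_math are hypothetical helpers. The first reads a module-level variable through globals(). The second fetches a function from the math module through getattr.

```python
import math

rate = 0.25

def value_of(name):
    """Look up a module-level variable by name, given as a string."""
    return globals()[name]

def call_math(function_name, arg):
    """Look up a function in the math module by name, then call it."""
    return getattr(math, function_name)(arg)

print(value_of("rate"))          # 0.25
print(call_math("sqrt", 16.0))   # 4.0
```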

There is one other important difference between the locals and globals functions, which you should learn now before it bites you. It will bite you anyway, but at least then you'll remember learning it.

Example 8.12. locals is read-only, globals is not

def foo(arg):
    x = 1
    print locals()
    locals()["x"] = 2
    print "x=",x

z = 7
print "z=",z
foo(3)
globals()["z"] = 8
print "z=",z

1. Since foo is called with 3, this will print {'arg': 3, 'x': 1}. This should not be a surprise.
2. locals is a function that returns a dictionary, and here you are setting a value in that dictionary. You might think that this would change the value of the local variable x to 2, but it doesn't. locals does not actually return the local namespace; it returns a copy. So changing it does nothing to the value of the variables in the local namespace.
3. This prints x= 1, not x= 2.
4. After being burned by locals, you might think that this wouldn't change the value of z, but it does. Due to internal differences in how Python is implemented (which I'd rather not go into, since I don't fully understand them myself), globals returns the actual global namespace, not a copy: the exact opposite behavior of locals. So any changes to the dictionary returned by globals directly affect your global variables.
5. This prints z= 8, not z= 7.
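Here is the same experiment written for Python 3, as a sketch that returns values instead of printing them so the outcome can be checked. Python 3.13 changed some details of what locals returns inside a function. Writing to that dictionary still leaves the real local variable alone, though.

```python
def foo(arg):
    x = 1
    namespace = locals()     # a dictionary snapshot of the local variables
    namespace["x"] = 2       # changes the dictionary only...
    return x                 # ...so the real local x is still 1

z = 7
globals()["z"] = 8           # globals() is the live module namespace
print(foo(3))                # 1
print(z)                     # 8
```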


10.6. Handling command-line arguments

Python fully supports creating programs that can be run on the command line, complete with command-line arguments and either short- or long-style flags to specify various options. None of this is XML-specific, but this script makes good use of command-line processing, so it seemed like a good time to mention it.

It's difficult to talk about command-line processing without understanding how command-line arguments are exposed to your Python program, so let's write a simple program to see them.

Example 10.20. Introducing sys.argv

If you have not already done so, you can download this and other examples used in this book.

#argecho.py
import sys

for arg in sys.argv:
    print arg
1. Each command-line argument passed to the program will be in sys.argv, which is just a list. Here you are printing each argument on a separate line.

Example 10.21. The contents of sys.argv

[you@localhost py]$ python argecho.py
argecho.py
[you@localhost py]$ python argecho.py abc def
argecho.py
abc
def
[you@localhost py]$ python argecho.py --help
argecho.py
--help
[you@localhost py]$ python argecho.py -m kant.xml
argecho.py
-m
kant.xml
1. The first thing to know about sys.argv is that it contains the name of the script you're calling. You will actually use this knowledge to your advantage later, in Chapter 16, Functional Programming. Don't worry about it for now.
2. Command-line arguments are separated by spaces, and each shows up as a separate element in the sys.argv list.
3. Command-line flags, like --help, also show up as their own element in the sys.argv list.
4. To make things even more interesting, some command-line flags themselves take arguments. For instance, here you have a flag (-m) which takes an argument (kant.xml). Both the flag itself and the flag's argument are simply sequential elements in the sys.argv list. No attempt is made to associate one with the other; all you get is a list.

So as you can see, you certainly have all the information passed on the command line, but then again, it doesn't look like it's going to be all that easy to actually use it. For simple programs that only take a single argument and have no flags, you can simply use sys.argv[1] to access the argument. There's no shame in this; I do it all the time. For more complex programs, you need the getopt module.
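For the simple single-argument case, a program might look like this sketch (Python 3). Taking the argument list as a parameter of main, rather than reading sys.argv inside it, is a design choice. It lets you call main with a made-up list when testing. The usage message and the "processing" output are hypothetical.

```python
import sys

def main(argv):
    # argv[0] is the script name; argv[1] is the one argument expected.
    if len(argv) != 2:
        return "usage: %s FILENAME" % argv[0]
    return "processing %s" % argv[1]

if __name__ == "__main__":
    print(main(sys.argv))
```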

Example 10.22. Introducing getopt

def main(argv):
    grammar = "kant.xml"
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)

...

if __name__ == "__main__":
    main(sys.argv[1:])
1. First off, look at the bottom of the example and notice that you're calling the main function with sys.argv[1:]. Remember, sys.argv[0] is the name of the script that you're running; you don't care about that for command-line processing, so you chop it off and pass the rest of the list.
2. This is where all the interesting processing happens. The getopt function of the getopt module takes three parameters: the argument list (which you got from sys.argv[1:]), a string containing all the possible single-character command-line flags that this program accepts, and a list of longer command-line flags that are equivalent to the single-character versions. This is quite confusing at first glance, and is explained in more detail below.
3. If anything goes wrong trying to parse these command-line flags, getopt will raise an exception, which you catch. You told getopt all the flags you understand, so this probably means that the end user passed some command-line flag that you don't understand.
4. As is standard practice in the UNIX world, when the script is passed flags it doesn't understand, you print out a summary of proper usage and exit gracefully. Note that I haven't shown the usage function here. You would still need to code that somewhere and have it print out the appropriate summary; it's not automatic.
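Because getopt only needs a list, you can try it without a real command line by passing a hand-made list. This Python 3 sketch uses the same short and long flag specifications as main. The argument values are made up.

```python
import getopt

argv = ["-d", "-g", "kant.xml", "--help", "source.xml"]
opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
print(opts)   # [('-d', ''), ('-g', 'kant.xml'), ('--help', '')]
print(args)   # ['source.xml']

# A flag that appears in neither specification raises getopt.GetoptError.
try:
    getopt.getopt(["-x"], "hg:d", ["help", "grammar="])
except getopt.GetoptError as err:
    print("rejected: %s" % err)
```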

So what are all those parameters you pass to the getopt function? Well, the first one is simply the raw list of command-line flags and arguments (not including the first element, the script name, which you already chopped off before calling the main function). The second is the list of short command-line flags that the script accepts.

"hg:d"

-h        print usage summary
-g ...    use specified grammar file or URL
-d        show debugging information while parsing

The first and third flags are simply standalone flags; you specify them or you don't, and they do things (print help) or change state (turn on debugging). However, the second flag (-g) must be followed by an argument, which is the name of the grammar file to read from. In fact it can be a filename or a web address, and you don't know which yet (you'll figure it out later), but you know it has to be something. So you tell getopt this by putting a colon after the g in that second parameter to the getopt function.
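Here is a short sketch (Python 3) of what the colon in "hg:d" means in practice. Only the short-flag string is passed, since getopt's long-flag list is optional. The file name is made up.

```python
import getopt

SHORT = "hg:d"   # the colon after g means "-g takes an argument"

# The argument may be a separate element or attached to the flag.
print(getopt.getopt(["-g", "kant.xml"], SHORT)[0])   # [('-g', 'kant.xml')]
print(getopt.getopt(["-gkant.xml"], SHORT)[0])       # [('-g', 'kant.xml')]

# Standalone flags can be bundled together.
print(getopt.getopt(["-hd"], SHORT)[0])              # [('-h', ''), ('-d', '')]

# Leaving out the required argument is an error.
try:
    getopt.getopt(["-g"], SHORT)
except getopt.GetoptError as err:
    print("rejected: %s" % err)
```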

To further complicate things, the script accepts either short flags (like -h) or long flags (like --help), and you want them to do the same thing. This is what the third parameter to getopt is for: to specify a list of the long flags that correspond to the short flags you specified in the second parameter.

["help", "grammar="]

--help          print usage summary
--grammar ...   use specified grammar file or URL

Three things of note here:

1. All long flags are preceded by two dashes on the command line, but you don't include those dashes when calling getopt. They are understood.
2. The --grammar flag must always be followed by an additional argument, just like the -g flag. This is notated by an equals sign, "grammar=".
3. The list of long flags is shorter than the list of short flags, because the -d flag does not have a corresponding long version. This is fine; only -d will turn on debugging. Note that getopt never pairs a short flag with a long one, so the order of the two lists doesn't have to match; it's your own code that treats -h and --help (or -g and --grammar) as the same option.
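A Python 3 sketch of the long-flag rules follows. The trailing "=" makes --grammar take an argument, given either after a space or after an equals sign. The last part checks the point about ordering: listing the long flags in a different order gives the same result.

```python
import getopt

LONG = ["help", "grammar="]   # trailing "=" means --grammar takes an argument

# The argument can follow after a space or after an equals sign.
print(getopt.getopt(["--grammar", "kant.xml"], "hg:d", LONG)[0])
# [('--grammar', 'kant.xml')]
print(getopt.getopt(["--grammar=kant.xml"], "hg:d", LONG)[0])
# [('--grammar', 'kant.xml')]

# getopt does not pair short and long flags, so the order of the
# long-flag list makes no difference to the result.
a = getopt.getopt(["-d", "--help"], "hg:d", ["help", "grammar="])
b = getopt.getopt(["-d", "--help"], "hg:d", ["grammar=", "help"])
assert a == b
```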

Confused yet? Let's look at the actual code and see if it makes sense in context.

Example 10.23. Handling command-line arguments in kgp.py

def main(argv):
    grammar = "kant.xml"
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt == '-d':
            global _debug
            _debug = 1
        elif opt in ("-g", "--grammar"):
            grammar = arg

    source = "".join(args)

    k = KantGenerator(grammar, source)
    print k.output()
1. The grammar variable will keep track of the grammar file you're using. You initialize it here in case it's not specified on the command line (using either the -g or the --grammar flag).
2. The opts variable that you get back from getopt contains a list of tuples: flag and argument. If the flag doesn't take an argument, then arg will simply be an empty string. This makes it easy to loop through the flags.
3. getopt validates that the command-line flags are acceptable, but it doesn't do any sort of conversion between short and long flags. If you specify the -h flag, opt will contain "-h"; if you specify the --help flag, opt will contain "--help". So you need to check for both.
4. Remember, the -d flag didn't have a corresponding long flag, so you only need to check for the short form. If you find it, you set a global variable that you'll refer to later to print out debugging information. (I used this during the development of the script. What, you thought all these examples worked on the first try?)
5. If you find a grammar file, either with a -g flag or a --grammar flag, you save the argument that followed it (stored in arg) into the grammar variable, overwriting the default that you initialized at the top of the main function.
6. That's it. You've looped through and dealt with all the command-line flags. That means that anything left must be command-line arguments. These come back from the getopt function in the args variable. In this case, you're treating them as source material for the parser. If there are no command-line arguments specified, args will be an empty list, and source will end up as the empty string.
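Below is a version of the same loop that can be run and tested on its own, as a Python 3 sketch. It is an adaptation, not the kgp.py code. It does not call usage(), set the global _debug, or build a KantGenerator. Instead, the hypothetical parse_args function returns the settings it found and lets getopt.GetoptError propagate to the caller. The grammar file name in the usage line is made up.

```python
import getopt

def parse_args(argv):
    """Return (grammar, debug, show_help, source) parsed from argv."""
    grammar = "kant.xml"        # default when neither -g nor --grammar is given
    debug = False
    show_help = False
    opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    for opt, arg in opts:
        if opt in ("-h", "--help"):       # getopt reports whichever spelling was used
            show_help = True
        elif opt == "-d":                 # short form only
            debug = True
        elif opt in ("-g", "--grammar"):
            grammar = arg                 # overrides the default
    source = "".join(args)                # "" when no arguments remain
    return grammar, debug, show_help, source

print(parse_args(["-d", "--grammar=binary.xml"]))
# ('binary.xml', True, False, '')
```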

10.7. Putting it all together

You've covered a lot of ground. Let's step back and see how all the pieces fit together.

To start with, this is a script that takes its arguments on the command line, using the getopt module.

def main(argv):
...
    try:
        opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="])
    except getopt.GetoptError:
...
    for opt, arg in opts:
...

You create a new instance of the KantGenerator class, and pass it the grammar file and source that may or may not have been specified on the command line.

    k = KantGenerator(grammar, source)

The KantGenerator instance automatically loads the grammar, which is an XML file. You use your custom openAnything function to open the file (which could be stored in a local file or on a remote web server), then use the standard library's minidom parsing functions to parse the XML into a tree of Python objects.

    def _load(self, source):
        sock = toolbox.openAnything(source)
        xmldoc = minidom.parse(sock).documentElement
        sock.close()

Oh, and along the way, you take advantage of your knowledge of the structure of the XML document to set up a little cache of references, which are just elements in the XML document.

    def loadGrammar(self, grammar):
...
        for ref in self.grammar.getElementsByTagName("ref"):
            self.refs[ref.attributes["id"].value] = ref
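Here is a self-contained sketch (Python 3) of the load-and-cache steps. Because openAnything is the book's own helper, the sketch parses a string with minidom.parseString. The tiny grammar is hypothetical, shaped like kant.xml: ref elements with id attributes, plus xref elements that point at them.

```python
from xml.dom import minidom

# A tiny hypothetical grammar in the same shape as kant.xml.
GRAMMAR = """<grammar>
<ref id="bit"><p>0</p><p>1</p></ref>
<ref id="byte"><p><xref id="bit"/><xref id="bit"/></p></ref>
</grammar>"""

grammar = minidom.parseString(GRAMMAR).documentElement

# Cache every ref element by its id, as loadGrammar does.
refs = {}
for ref in grammar.getElementsByTagName("ref"):
    refs[ref.attributes["id"].value] = ref

print(sorted(refs.keys()))   # ['bit', 'byte']
```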

If you specified some source material on the command line, you use that; otherwise you rip through the grammar looking for the "top-level" reference (one that isn't referenced by anything else) and use that as a starting point.

    def getDefaultSource(self):
        xrefs = {}
        for xref in self.grammar.getElementsByTagName("xref"):
            xrefs[xref.attributes["id"].value] = 1
        xrefs = xrefs.keys()
        standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
        return '<xref id="%s"/>' % random.choice(standaloneXrefs)
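The same "top-level reference" search can be sketched in Python 3 with the hypothetical two-ref grammar. This version uses a set where getDefaultSource uses a dictionary with dummy values; both simply record which ids are referenced. Here only byte is unreferenced, so random.choice has just one candidate and the result is predictable.

```python
import random
from xml.dom import minidom

GRAMMAR = """<grammar>
<ref id="bit"><p>0</p><p>1</p></ref>
<ref id="byte"><p><xref id="bit"/><xref id="bit"/></p></ref>
</grammar>"""
grammar = minidom.parseString(GRAMMAR).documentElement
refs = dict((r.attributes["id"].value, r)
            for r in grammar.getElementsByTagName("ref"))

def default_source():
    # ids that some xref points to...
    xrefs = set(x.attributes["id"].value
                for x in grammar.getElementsByTagName("xref"))
    # ...so the refs not in that set are the top-level ones.
    standalone = [ref_id for ref_id in refs if ref_id not in xrefs]
    return '<xref id="%s"/>' % random.choice(standalone)

print(default_source())   # <xref id="byte"/>
```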

Now you rip through the source material. The source material is also XML, and you parse it one node at a time. To keep the code separated and more maintainable, you use separate handlers for each node type.

    def parse_Element(self, node):
        handlerMethod = getattr(self, "do_%s" % node.tagName)
        handlerMethod(node)
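Here is the getattr-based dispatch in isolation, as a Python 3 sketch. Handler is a hypothetical stand-in for KantGenerator, and its do_ methods only record which handler ran. A tag with no matching do_ method would make getattr raise AttributeError.

```python
from xml.dom import minidom

class Handler(object):
    """Hypothetical stand-in for KantGenerator's dispatch logic."""
    def __init__(self):
        self.seen = []

    def parse_Element(self, node):
        # Build the method name from the tag, e.g. "do_p", then look it up.
        handlerMethod = getattr(self, "do_%s" % node.tagName)
        handlerMethod(node)

    def do_p(self, node):
        self.seen.append("p")

    def do_choice(self, node):
        self.seen.append("choice")

doc = minidom.parseString("<root><p/><choice/></root>").documentElement
h = Handler()
for child in doc.childNodes:
    h.parse_Element(child)
print(h.seen)   # ['p', 'choice']
```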

You bounce through the grammar, parsing all the children of each p element,

    def do_p(self, node):
...
        if doit:
            for child in node.childNodes: self.parse(child)

replacing choice elements with a random child,

    def do_choice(self, node):
        self.parse(self.randomChildElement(node))

and replacing xref elements with a random child of the corresponding ref element, which you previously cached.

    def do_xref(self, node):
        id = node.attributes["id"].value
        self.parse(self.randomChildElement(self.refs[id]))
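The randomChildElement method itself isn't shown in this section. Here is a plausible sketch of it (Python 3), under the assumption that it should pick only among child *elements* and skip the whitespace text nodes between them. random_child_element is a hypothetical name.

```python
import random
from xml.dom import minidom

def random_child_element(node):
    """Pick one child that is an element, skipping text (such as whitespace)."""
    choices = [e for e in node.childNodes if e.nodeType == e.ELEMENT_NODE]
    return random.choice(choices)

choice = minidom.parseString(
    "<choice>\n  <p>yes</p>\n  <p>no</p>\n</choice>").documentElement
picked = random_child_element(choice)
print(picked.toxml())   # either <p>yes</p> or <p>no</p>
```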

Eventually, you parse your way down to plain text,

    def parse_Text(self, node):
        text = node.data
...
            self.pieces.append(text)

which you print out.

def main(argv):
...
    k = KantGenerator(grammar, source)
    print k.output()

10.8. Summary

Python comes with powerful libraries for parsing and manipulating XML documents. The minidom module takes an XML file and parses it into Python objects, providing random access to arbitrary elements. Furthermore, this chapter shows how Python can be used to create a "real" standalone command-line script, complete with command-line flags, command-line arguments, error handling, even the ability to take input from the piped result of a previous program.

Before moving on to the next chapter, you should be comfortable doing all of these things:

The following is a complete Python program that acts as a cheap and simple regression testing framework. It takes unit tests that you've written for individual modules, collects them all into one big test suite, and runs them all at once. I actually use this script as part of the build process for this book; I have unit tests for several of the example programs (not just the roman.py module featured in Chapter 13, Unit Testing), and the first thing my automated build script does is run this program to make sure all my examples still work. If this regression test fails, the build immediately stops. I don't want to release non-working examples any more than you want to download them and sit around scratching your head and yelling at your monitor and wondering why they don't work.

Example 16.1. regression.py



"""Regression testing framework

This module will search for scripts in the same directory named
XYZtest.py. Each such script should be a test suite that tests a
module through PyUnit. (As of Python 2.1, PyUnit is included in
the standard library as "unittest".)  This script will aggregate all
found test suites into one big test suite and run them all at once.
"""

import sys, os, re, unittest

def regressionTest():
    # find the directory this script lives in, and list its files
    path = os.path.abspath(os.path.dirname(sys.argv[0]))
    files = os.listdir(path)
    # keep only files whose names end in "test.py" (any case)
    test = re.compile(r"test\.py$", re.IGNORECASE)
    files = filter(test.search, files)
    # "XYZtest.py" -> "XYZtest"
    filenameToModuleName = lambda f: os.path.splitext(f)[0]
    moduleNames = map(filenameToModuleName, files)
    # import each test module by name
    modules = map(__import__, moduleNames)
    # collect each module's tests and combine them into one big suite
    load = unittest.defaultTestLoader.loadTestsFromModule
    return unittest.TestSuite(map(load, modules))

if __name__ == "__main__":
    unittest.main(defaultTest="regressionTest")
                          

                          Running this script in the same directory as the rest of the example scripts that come with this book will find all the unit tests (each named moduletest.py, such as romantest.py), run them as a single test suite, and pass or fail them all at once.

                          Example 16.2. Sample output of regression.py

                          [you@localhost py]$ python regression.py -v
                          help should fail with no object ... ok           
                          help should return known result for apihelper ... ok
                          help should honor collapse argument ... ok
                          help should honor spacing argument ... ok
                          buildConnectionString should fail with list input ... ok           
                          buildConnectionString should fail with string input ... ok
                          buildConnectionString should fail with tuple input ... ok
                          buildConnectionString handles empty dictionary ... ok
                          buildConnectionString returns known result with known input ... ok
                          from_roman should only accept uppercase input ... ok                
                          to_roman should always return uppercase ... ok
                          from_roman should fail with blank string ... ok
                          from_roman should fail with malformed antecedents ... ok
                          from_roman should fail with repeated pairs of numerals ... ok
                          from_roman should fail with too many repeated numerals ... ok
                          from_roman should give known result with known input ... ok
                          to_roman should give known result with known input ... ok
                          from_roman(to_roman(n))==n for all n ... ok
                          to_roman should fail with non-integer input ... ok
                          to_roman should fail with negative input ... ok
                          to_roman should fail with large input ... ok
                          to_roman should fail with 0 input ... ok
                          kgp a ref test ... ok
                          kgp b ref test ... ok
                          kgp c ref test ... ok
                          kgp d ref test ... ok
                          kgp e ref test ... ok
                          kgp f ref test ... ok
                          kgp g ref test ... ok
                          
                          ----------------------------------------------------------------------
                          Ran 29 tests in 2.799s
                          
                          OK
                          1. The first 4 tests are from apihelpertest.py, which tests the example script from Chapter 4, The Power Of Introspection.
                          2. The next 5 tests are from odbchelpertest.py, which tests the example script from Chapter 2, Your First Python Program.
                          3. The next 13 tests are from romantest.py, which you studied in depth in Chapter 13, Unit Testing, and the final 7 kgp tests are from kgptest.py.

                            16.2. Finding the path

                            When running Python scripts from the command line, it is sometimes useful to know where the currently running script is located on disk.

                            This is one of those obscure little tricks that is virtually impossible to figure out on your own, but simple to remember once you see it. The key to it is sys.argv. As you saw in Chapter 9, XML Processing, this is a list of the command-line arguments. Its first item, sys.argv[0], is the name of the running script, exactly as it was called from the command line, and this is enough information to determine its location.

                            Example 16.3. fullpath.py

                            If you have not already done so, you can download this and other examples used in this book.

                            
                            import sys, os
                            
                            print 'sys.argv[0] =', sys.argv[0]             
                            pathname = os.path.dirname(sys.argv[0])        
                            print 'path =', pathname
                            print 'full path =', os.path.abspath(pathname) 
                            1. When you run a script from the command line, sys.argv[0] contains the name of the script exactly as it appears on the command line. This may or may not include any path information, as you'll see shortly.
                            2. os.path.dirname takes a filename as a string and returns the directory path portion. If the given filename does not include any path information, os.path.dirname returns an empty string.
                            3. os.path.abspath is the key here. It takes a pathname, which can be partial or even blank, and returns a fully qualified pathname.
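You can wrap the two steps in fullpath.py in a small function and try each kind of sys.argv[0] without launching a script. This is a sketch. The name script_directory is made up, and its argv0 parameter stands in for sys.argv[0].

```python
import os


def script_directory(argv0):
    """Return the absolute directory of the script named by argv0.

    argv0 plays the role of sys.argv[0]: a full path, a relative
    path, or a bare filename, exactly as typed on the command line.
    """
    # dirname drops the filename; for a bare filename it returns ''.
    pathname = os.path.dirname(argv0)
    # abspath turns '' or a relative path into a full path, using the
    # current working directory as the starting point.
    return os.path.abspath(pathname)


cwd = os.getcwd()
print(script_directory('fullpath.py') == cwd)                      # True
print(script_directory(os.path.join('common', 'py', 'fullpath.py'))
      == os.path.join(cwd, 'common', 'py'))                        # True
```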

                              os.path.abspath deserves further explanation. It is very flexible; it can take any kind of pathname.

                              Example 16.4. Further explanation of os.path.abspath

                              >>> import os
                              >>> os.getcwd()
                              '/home/you'
                              >>> os.path.abspath('')
                              '/home/you'
                              >>> os.path.abspath('.ssh')
                              '/home/you/.ssh'
                              >>> os.path.abspath('/home/you/.ssh')
                              '/home/you/.ssh'
                              >>> os.path.abspath('.ssh/../foo/')
                              '/home/you/foo'
                              1. os.getcwd() returns the current working directory.
                              2. Calling os.path.abspath with an empty string returns the current working directory, same as os.getcwd().
                              3. Calling os.path.abspath with a partial pathname constructs a fully qualified pathname out of it, based on the current working directory.
                              4. Calling os.path.abspath with a full pathname simply returns it.
                              5. os.path.abspath also normalizes the pathname it returns. Note that this example worked even though I don't actually have a 'foo' directory. os.path.abspath never checks your actual disk; this is all just string manipulation.
                                Note: The pathnames and filenames you pass to os.path.abspath do not need to exist.
                                Note: os.path.abspath not only constructs full path names, it also normalizes them. That means that if you are in the /usr/ directory, os.path.abspath('bin/../local/bin') will return /usr/local/bin. It normalizes the path by making it as simple as possible. If you just want to normalize a pathname like this without turning it into a full pathname, use os.path.normpath instead.
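The /usr example from the note also works with relative paths, so you can try it in any directory. The sketch below contrasts the two functions. os.path.normpath only simplifies the string, while os.path.abspath simplifies it and also anchors it at the current working directory. The input is built with os.path.join so the example works with either path separator, and neither call looks at the disk, so 'bin' and 'local' need not exist.

```python
import os

messy = os.path.join('bin', '..', 'local', 'bin')   # bin/../local/bin on POSIX

# normpath collapses the '..' but leaves the path relative.
print(os.path.normpath(messy))    # local/bin on POSIX, local\bin on Windows

# abspath collapses it too, then prefixes the current directory.
print(os.path.abspath(messy) == os.path.join(os.getcwd(), 'local', 'bin'))   # True
```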

                                Example 16.5. Sample output from fullpath.py

                                [you@localhost py]$ python /home/you/diveintopython3/common/py/fullpath.py 
                                sys.argv[0] = /home/you/diveintopython3/common/py/fullpath.py
                                path = /home/you/diveintopython3/common/py
                                full path = /home/you/diveintopython3/common/py
                                [you@localhost diveintopython3]$ python common/py/fullpath.py               
                                sys.argv[0] = common/py/fullpath.py
                                path = common/py
                                full path = /home/you/diveintopython3/common/py
                                [you@localhost diveintopython3]$ cd common/py
                                [you@localhost py]$ python fullpath.py 
                                sys.argv[0] = fullpath.py
                                path = 
                                full path = /home/you/diveintopython3/common/py
                                1. In the first case, sys.argv[0] includes the full path of the script. You can then use the os.path.dirname function to strip off the script name and return the full directory name, and os.path.abspath simply returns what you give it.
                                2. If the script is run by using a partial pathname, sys.argv[0] will still contain exactly what appears on the command line. os.path.dirname will then give you a partial pathname (relative to the current directory), and os.path.abspath will construct a full pathname from the partial pathname.
                                3. If the script is run from the current directory without giving any path, os.path.dirname will simply return an empty string. Given an empty string, os.path.abspath returns the current directory, which is what you want, since the script was run from the current directory.
                                  Note: Like the other functions in the os and os.path modules, os.path.abspath is cross-platform. Your results will look slightly different from my examples if you're running on Windows (which uses backslash as a path separator) or classic Mac OS (which uses colons), but they'll still work. That's the whole point of the os module.
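In CPython, os.path is an alias for a platform-specific module: posixpath on Linux and Mac OS X, and ntpath on Windows. That is why ntpath shows up in sys.modules on Windows. Both modules can be imported on any platform, so a short sketch can run the same calls under each set of rules.

```python
import ntpath      # Windows path rules
import os
import posixpath   # POSIX path rules (Linux, Mac OS X)

# os.path is whichever of the two matches the running platform.
print(os.path.__name__)   # posixpath on Linux and Mac OS X, ntpath on Windows

# Same functions, different separators.
print(posixpath.dirname('/home/you/py/fullpath.py'))           # /home/you/py
print(ntpath.dirname('c:\\diveintopython3\\py\\fullpath.py'))  # c:\diveintopython3\py
print(ntpath.normpath('common\\py\\..\\py\\fullpath.py'))      # common\py\fullpath.py
```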

                                  Addendum. One reader was dissatisfied with this solution, and wanted to be able to run all the unit tests in the current directory, not the directory where regression.py is located. He suggested this approach instead:

                                  Example 16.6. Running scripts in the current directory

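                                  import sys, os, re, unittest
                                  
                                  def regressionTest():
                                      path = os.getcwd()
                                      sys.path.append(path)
                                      files = os.listdir(path)
                                      test = re.compile("test\.py$", re.IGNORECASE)
                                      files = filter(test.search, files)
                                      filenameToModuleName = lambda f: os.path.splitext(f)[0]
                                      moduleNames = map(filenameToModuleName, files)
                                      modules = map(__import__, moduleNames)
                                      load = unittest.defaultTestLoader.loadTestsFromModule
                                      return unittest.TestSuite(map(load, modules))
                                  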
                                  1. Instead of setting path to the directory where the currently running script is located, you set it to the current working directory instead. This will be whatever directory you were in before you ran the script, which is not necessarily the same as the directory the script is in. (Read that sentence a few times until you get it.)
                                  2. Append this directory to the Python library search path (sys.path), so that when you dynamically import the unit test modules later, Python can find them. You didn't need to do this when path was the directory of the currently running script, because Python automatically puts that directory at the start of sys.path.
                                  3. The rest of the function is the same.
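The point of callout 2 is easy to check. The sketch below creates a throwaway directory containing one module (zzdemobetatest is a made-up name) and shows that __import__ cannot find it. It then appends the directory to sys.path, the way the reader's version of regressionTest does with os.getcwd(), and imports it again.

```python
import os
import sys
import tempfile

# A throwaway directory holding a single module.
directory = tempfile.mkdtemp()
with open(os.path.join(directory, 'zzdemobetatest.py'), 'w') as f:
    f.write('ANSWER = 42\n')

# The directory is not on sys.path yet, so the dynamic import fails.
try:
    __import__('zzdemobetatest')
    importable_before = True
except ImportError:
    importable_before = False

# Appending the directory makes the module visible to __import__.
sys.path.append(directory)
module = __import__('zzdemobetatest')

print(importable_before)   # False
print(module.ANSWER)       # 42
```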

                                    This technique will allow you to re-use this regression.py script on multiple projects. Just put the script in a common directory, then change to the project's directory before running it. All of that project's unit tests will be found and tested, instead of the unit tests in the common directory where regression.py is located.

                                    16.6. Dynamically importing modules

                                    OK, enough philosophizing. Let's talk about dynamically importing modules.

                                    First, let's look at how you normally import modules. The import module syntax looks in the search path for the named module and imports it by name. You can even import multiple modules at once this way, with a comma-separated list. You did this on the very first line of this chapter's script.

                                    Example 16.13. Importing multiple modules at once

                                    
                                    import sys, os, re, unittest 
                                    
                                    1. This imports four modules at once: sys (for system functions and access to the command line parameters), os (for operating system functions like directory listings), re (for regular expressions), and unittest (for unit testing).

                                      Now let's do the same thing, but with dynamic imports.

                                      Example 16.14. Importing modules dynamically

                                      >>> sys = __import__('sys')           
                                      >>> os = __import__('os')
                                      >>> re = __import__('re')
                                      >>> unittest = __import__('unittest')
                                      >>> sys
                                      <module 'sys' (built-in)>
                                      >>> os
                                      <module 'os' from '/usr/local/lib/python2.2/os.pyc'>
                                      
                                      1. The built-in __import__ function accomplishes the same goal as using the import statement, but it's an actual function, and it takes a string as an argument.
                                      2. The variable sys is now the sys module, just as if you had said import sys. The variable os is now the os module, and so forth.
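To convince yourself that __import__ and the import statement are interchangeable, compare the objects they return. The sketch also shows a caveat that this chapter's examples never hit. With a dotted name, __import__ returns the top-level package, so importlib.import_module (available since Python 2.7) is usually the more convenient way to get the submodule itself.

```python
import importlib
import os
import sys

# The import statement and __import__ hand back the very same object,
# the one cached in sys.modules.
os_again = __import__('os')
print(os_again is os)                  # True
print(os_again is sys.modules['os'])   # True

# For a dotted name, __import__ returns the top-level package...
top = __import__('os.path')
print(top is os)                       # True

# ...whereas importlib.import_module returns the submodule itself.
sub = importlib.import_module('os.path')
print(sub is os.path)                  # True
```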

                                        So __import__ imports a module, but takes a string argument to do it. In this case the module you imported was just a hard-coded string, but it could just as easily be a variable, or the result of a function call. And the variable that you assign the module to doesn't need to match the module name, either. You could import a series of modules and assign them to a list.

                                        Example 16.15. Importing a list of modules dynamically

                                        >>> moduleNames = ['sys', 'os', 're', 'unittest'] 
                                        >>> moduleNames
                                        ['sys', 'os', 're', 'unittest']
                                        >>> modules = map(__import__, moduleNames)        
                                        >>> modules   
                                        [<module 'sys' (built-in)>,
                                        <module 'os' from 'c:\Python22\lib\os.pyc'>,
                                        <module 're' from 'c:\Python22\lib\re.pyc'>,
                                        <module 'unittest' from 'c:\Python22\lib\unittest.pyc'>]
                                        >>> modules[0].version          
                                        '2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]'
                                        >>> import sys
                                        >>> sys.version
                                        '2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]'
                                        
                                        1. moduleNames is just a list of strings. Nothing fancy, except that the strings happen to be names of modules that you could import, if you wanted to.
                                        2. Surprise, you wanted to import them, and you did, by mapping the __import__ function onto the list. Remember, this takes each element of the list (moduleNames) and calls the function (__import__) over and over, once with each element of the list, builds a list of the return values, and returns the result.
                                        3. So now from a list of strings, you've created a list of actual modules. (Your paths may be different, depending on your operating system, where you installed Python, the phase of the moon, etc.)
                                        4. To drive home the point that these are real modules, let's look at some module attributes. Remember, modules[0] is the sys module, so modules[0].version is sys.version. All the other attributes and methods of these modules are also available. There's nothing magic about the import statement, and there's nothing magic about modules. Modules are objects. Everything is an object.
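Here is the same experiment as a script you can run on either Python version. On Python 3, map returns an iterator instead of a list, so the sketch wraps it in list(). It also shows that the container is up to you. A dict keyed by module name works as well as a list.

```python
import sys

module_names = ['sys', 'os', 're', 'unittest']

# list() keeps this working on Python 3, where map returns an iterator.
modules = list(map(__import__, module_names))

# Each element is a real module object, so its attributes are available.
print(modules[0] is sys)                   # True
print(modules[0].version == sys.version)   # True

# The variable names don't have to match the module names; a dict
# keyed by name is just as good a container as a list.
by_name = dict(zip(module_names, modules))
print(by_name['re'].escape('test.py'))     # test\.py
```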

                                          Now you should be able to put this all together and figure out what most of this chapter's code sample is doing.

                                          16.7. Putting it all together

                                          You've learned enough now to deconstruct the first seven lines of this chapter's code sample: reading a directory and importing selected modules within it.

                                          Example 16.16. The regressionTest function

                                          
                                          def regressionTest():
                                              path = os.path.abspath(os.path.dirname(sys.argv[0]))
                                              files = os.listdir(path)
                                              test = re.compile("test\.py$", re.IGNORECASE)
                                              files = filter(test.search, files)
                                              filenameToModuleName = lambda f: os.path.splitext(f)[0]
                                              moduleNames = map(filenameToModuleName, files)
                                              modules = map(__import__, moduleNames)
                                              load = unittest.defaultTestLoader.loadTestsFromModule
                                              return unittest.TestSuite(map(load, modules))
                                          

                                          Let's look at it line by line, interactively. Assume that the current directory is c:\diveintopython3\py, which contains the examples that come with this book, including this chapter's script. As you saw in Section 16.2, “Finding the path”, the script directory will end up in the path variable, so let's start by hard-coding that and go from there.

                                          Example 16.17. Step 1: Get all the files

                                          >>> import sys, os, re, unittest
                                          >>> path = r'c:\diveintopython3\py'
                                          >>> files = os.listdir(path)             
                                          >>> files 
                                          ['BaseHTMLProcessor.py', 'LICENSE.txt', 'apihelper.py', 'apihelpertest.py',
                                          'argecho.py', 'autosize.py', 'builddialectexamples.py', 'dialect.py',
                                          'fileinfo.py', 'fullpath.py', 'kgptest.py', 'makerealworddoc.py',
                                          'odbchelper.py', 'odbchelpertest.py', 'parsephone.py', 'piglatin.py',
                                          'plural.py', 'pluraltest.py', 'pyfontify.py', 'regression.py', 'roman.py', 'romantest.py',
                                          'uncurly.py', 'unicode2koi8r.py', 'urllister.py', 'kgp', 'plural', 'roman',
                                          'colorize.py']
                                          
                                          1. files is a list of all the files and directories in the script's directory. (If you've been running some of the examples already, you may also see some .pyc files in there as well.)
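Two details of os.listdir matter here. It returns bare names rather than full paths, and it mixes files and directories. Entries such as kgp, plural and roman, which have no extension, are presumably subdirectories. regression.py doesn't need to tell them apart, because the next step keeps only names ending in test.py. If you ever do need plain files, you can join each name back onto the directory and test it. The sketch builds a tiny stand-in directory (all names are made up) so it runs anywhere.

```python
import os
import tempfile

# A small stand-in for the examples directory: two files, one subdirectory.
path = tempfile.mkdtemp()
for name in ('roman.py', 'romantest.py'):
    open(os.path.join(path, name), 'w').close()
os.mkdir(os.path.join(path, 'roman'))

# os.listdir returns bare names, files and directories alike, in no
# guaranteed order, so sort before printing.
entries = sorted(os.listdir(path))
print(entries)      # ['roman', 'roman.py', 'romantest.py']

# To keep only regular files, join each name back onto the directory.
files_only = [e for e in entries if os.path.isfile(os.path.join(path, e))]
print(files_only)   # ['roman.py', 'romantest.py']
```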

                                            Example 16.18. Step 2: Filter to find the files you care about

                                            >>> test = re.compile("test\.py$", re.IGNORECASE)           
                                            >>> files = filter(test.search, files)    
                                            >>> files               
                                            ['apihelpertest.py', 'kgptest.py', 'odbchelpertest.py', 'pluraltest.py', 'romantest.py']
                                            
                                            1. This regular expression will match any string that ends with test.py, regardless of case (that's what re.IGNORECASE is for). Note that you need to escape the period, since a period in a regular expression usually means “match any single character”, but you actually want to match a literal period instead.
                                            2. The search method of the compiled regular expression acts like a function: it returns a match object (which is true) when a string matches, and None (which is false) when it doesn't. So you can pass test.search to filter to pick out, from the large list of files and directories, the ones that match the regular expression.
                                            3. And you're left with the list of unit testing scripts, because they were the only ones named SOMETHINGtest.py.
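The sketch below tries the pattern on a few borderline names. It writes the pattern as a raw string (r"...") so the backslash reaches the regular expression engine untouched. The book's plain string also works, because \. is not a recognized string escape. The last lines show why the escape matters. Without it, the period matches any character, including an underscore.

```python
import re

test = re.compile(r"test\.py$", re.IGNORECASE)

# search returns a match object (true) or None (false), which is
# exactly the kind of yes/no answer filter expects from its function.
print(test.search('romantest.py') is not None)   # True
print(test.search('roman.py'))                   # None

names = ['apihelper.py', 'apihelpertest.py', 'ROMANTEST.PY',
         'romantest.pyc', 'testsuite.py', 'test_py']
# list() is only needed on Python 3, where filter returns an iterator.
matches = list(filter(test.search, names))
print(matches)   # ['apihelpertest.py', 'ROMANTEST.PY']

# Without the backslash, '.' matches any character, including '_'.
sloppy = re.compile("test.py$")
print(sloppy.search('test_py') is not None)      # True
```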

                                              Example 16.19. Step 3: Map filenames to module names

                                              >>> filenameToModuleName = lambda f: os.path.splitext(f)[0] 
                                              >>> filenameToModuleName('romantest.py')  
                                              'romantest'
                                              >>> filenameToModuleName('odbchelpertest.py')
                                              'odbchelpertest'
                                              >>> moduleNames = map(filenameToModuleName, files)          
                                              >>> moduleNames         
                                              ['apihelpertest', 'kgptest', 'odbchelpertest', 'pluraltest', 'romantest']
                                              
                                              1. As you saw in Section 4.7, “Using lambda Functions”, lambda is a quick-and-dirty way of creating an inline, one-line function. This one takes a filename with an extension and returns just the filename part, using the standard library function os.path.splitext that you saw in Example 6.17, “Splitting Pathnames”.
                                              2. filenameToModuleName is a function. There's nothing magic about lambda functions as opposed to regular functions that you define with a def statement. You can call the filenameToModuleName function like any other, and it does just what you wanted it to do: strips the file extension off of its argument.
                                              3. Now you can apply this function to each file in the list of unit test files, using map.
                                              4. And the result is just what you wanted: a list of module names, as strings.
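Two points from these callouts are worth checking directly. The lambda behaves exactly like a function defined with def, and os.path.splitext splits off only the last extension. That distinction only matters for names with more than one dot, which the test scripts here don't have.

```python
import os

# The lambda from the example...
filenameToModuleName = lambda f: os.path.splitext(f)[0]


# ...does the same job as this ordinary function.
def filename_to_module_name(f):
    """Strip the last extension: 'romantest.py' -> 'romantest'."""
    return os.path.splitext(f)[0]


# splitext splits off only the final extension.
print(os.path.splitext('romantest.py'))     # ('romantest', '.py')
print(os.path.splitext('archive.tar.gz'))   # ('archive.tar', '.gz')

files = ['apihelpertest.py', 'kgptest.py', 'romantest.py']
module_names = list(map(filenameToModuleName, files))
print(module_names)   # ['apihelpertest', 'kgptest', 'romantest']
```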

                                                Example 16.20. Step 4: Mapping module names to modules

                                                >>> modules = map(__import__, moduleNames)
                                                >>> modules             
                                                [<module 'apihelpertest' from 'apihelpertest.py'>,
                                                <module 'kgptest' from 'kgptest.py'>,
                                                <module 'odbchelpertest' from 'odbchelpertest.py'>,
                                                <module 'pluraltest' from 'pluraltest.py'>,
                                                <module 'romantest' from 'romantest.py'>]
                                                >>> modules[-1]         
                                                <module 'romantest' from 'romantest.py'>
                                                
                                                1. As you saw in Section 16.6, “Dynamically importing modules”, you can use a combination of map and __import__ to map a list of module names (as strings) into actual modules (which you can use like any other module).
                                                2. modules is now a list of modules, fully accessible like any other module.
                                                3. The last module in the list is the romantest module, just as if you had said import romantest.

                                                  Example 16.21. Step 5: Loading the modules into a test suite

                                                  >>> load = unittest.defaultTestLoader.loadTestsFromModule  
                                                  >>> map(load, modules)   
                                                  [<unittest.TestSuite tests=[
                                                    <unittest.TestSuite tests=[<apihelpertest.BadInput testMethod=testNoObject>]>,
                                                    <unittest.TestSuite tests=[<apihelpertest.KnownValues testMethod=testApiHelper>]>,
                                                    <unittest.TestSuite tests=[
                                                      <apihelpertest.ParamChecks testMethod=testCollapse>, 
                                                      <apihelpertest.ParamChecks testMethod=testSpacing>]>, 
                                                      ...
                                                    ]
                                                  ]
                                                  >>> unittest.TestSuite(map(load, modules)) 
                                                  
                                                  1. These are real module objects. Not only can you access them like any other module, instantiate classes and call functions, you can also introspect into the module to figure out which classes and functions it has in the first place. That's what the loadTestsFromModule method does: it introspects into each module and returns a unittest.TestSuite object for each module. Each TestSuite object actually contains a list of TestSuite objects, one for each TestCase class in your module, and each of those TestSuite objects contains a list of tests, one for each test method in that class.
                                                  2. Finally, you wrap the list of TestSuite objects into one big test suite. The unittest module has no problem traversing this tree of nested test suites within test suites; eventually it gets down to an individual test method and executes it, verifies that it passes or fails, and moves on to the next one.
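You can watch loadTestsFromModule introspect a module without creating any files. The sketch builds a module object in memory with types.ModuleType and attaches two TestCase classes to it. The module name demotest and the class and method names are made up and only mimic romantest.py. The result has the nested shape described above: one inner suite per TestCase class, and one test per test method.

```python
import types
import unittest


class KnownValues(unittest.TestCase):
    def test_upper(self):
        self.assertEqual('roman'.upper(), 'ROMAN')

    def test_lower(self):
        self.assertEqual('ROMAN'.lower(), 'roman')


class BadInput(unittest.TestCase):
    def test_not_an_integer(self):
        self.assertRaises(ValueError, int, 'MMXIV')


# An in-memory module object standing in for a file like romantest.py.
demo = types.ModuleType('demotest')
demo.KnownValues = KnownValues
demo.BadInput = BadInput

load = unittest.defaultTestLoader.loadTestsFromModule
module_suite = load(demo)

# One inner suite per TestCase class, one test per test method.
inner_suites = len(list(module_suite))
total_tests = module_suite.countTestCases()
print(inner_suites)   # 2
print(total_tests)    # 3

# Wrap several module suites in one big suite and run it.
big = unittest.TestSuite([module_suite, load(demo)])
result = unittest.TestResult()
big.run(result)
print(result.testsRun)         # 6
print(result.wasSuccessful())  # True
```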

                                                    This introspection process is what the unittest module usually does for you. Remember that magic-looking unittest.main() function that the individual test modules called to kick the whole thing off? unittest.main() actually creates an instance of unittest.TestProgram, which in turn uses unittest.defaultTestLoader (a ready-made instance of unittest.TestLoader) to load the tests from the module that called it. (How does it get a reference to the module that called it if you don't give it one? By using the equally-magic __import__('__main__') call, which dynamically imports the currently-running module. I could write a book on all the tricks and techniques used in the unittest module, but then I'd never finish this one.)

                                                    Example 16.22. Step 6: Telling unittest to use your test suite

                                                    
                                                    if __name__ == "__main__": 
                                                        unittest.main(defaultTest="regressionTest") 
                                                    
                                                    1. Instead of letting the unittest module do all its magic for you, you've done most of it yourself. You've written a function (regressionTest) that imports the modules, calls unittest.defaultTestLoader.loadTestsFromModule on each of them, and wraps the results in one big test suite. Now all you need to do is tell unittest that, instead of looking for tests and building a test suite in the usual way, it should just call the regressionTest function, which returns a ready-to-use TestSuite.
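defaultTest="regressionTest" works because of how the loader treats names. As far as I can tell from the unittest source, unittest.main passes that name, together with the calling module, to the loader's loadTestsFromNames method. When the name refers to something callable that is neither a module nor a TestCase class, the loader calls it and uses the TestSuite it returns. The sketch below makes that step visible by calling loadTestsFromName directly on an in-memory module. Smoke and fakescript are made-up names, and this regressionTest is only a stand-in for the chapter's function.

```python
import types
import unittest


class Smoke(unittest.TestCase):
    def test_truth(self):
        self.assertTrue(True)


def regressionTest():
    """Stand-in for the chapter's function: return a ready-made TestSuite."""
    return unittest.defaultTestLoader.loadTestsFromTestCase(Smoke)


# A throwaway module object playing the role of the running script.
script = types.ModuleType('fakescript')
script.regressionTest = regressionTest

# The loader looks the name up in the module; because the attribute is a
# plain callable, it calls it and returns the TestSuite it gets back.
suite = unittest.defaultTestLoader.loadTestsFromName('regressionTest', script)
print('%s %d' % (type(suite).__name__, suite.countTestCases()))   # TestSuite 1
```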

                                                      16.8. Summary

                                                      The regression.py program and its output should now make perfect sense.

                                                      You should now feel comfortable doing all of these things:



                                                      [7] Technically, the second argument to filter can be any sequence, including lists, tuples, and custom classes that act like lists by defining the __getitem__ special method. If possible, filter will return the same datatype as you give it, so filtering a list returns a list, but filtering a tuple returns a tuple.

                                                      [8] Again, I should point out that map can take a list, a tuple, or any object that acts like a sequence. See previous footnote about filter.