You are here: Home Dive Into Python 3

Difficulty level: ♦♦♢♢♢

Comprehensions

FIXME
— FIXME

 

Diving In

This chapter will teach you about list comprehensions, dictionary comprehensions, and set comprehensions: three related concepts centered around one very powerful technique. But first, I want to take a little detour into two modules that will help you navigate your local file system.

The os module

Python 3 comes with a module called os, which stands for “operating system.” The os module contains a plethora of functions to get information on — and in some cases, to manipulate — local directories, files, processes, and environment variables. Python does its best to offer a unified API across all supported operating systems so your programs can run on any computer with as little platform-specific code as possible.

The Current Working Directory

When you’re just getting started with Python, you’re going to spend a lot of time in the Python Shell. Throughout this book, you will see examples that go like this:

  1. Import one of the modules in the examples folder
  2. Call a function in that module
  3. Explain the result

If you don’t know about the current working directory, step 1 will probably fail with an ImportError. Why? Because Python will look for the example module in the import search path, but it won’t find it because the examples folder isn’t one of the directories in the search path. To get past this, you can do one of two things:

  1. Add the examples folder to the import search path
  2. Change the current working directory to the examples folder

The current working directory is an invisible property that Python holds in memory at all times. There is always a current working directory, whether you’re in the Python Shell, running your own Python script from the command line, or running a Python CGI script on a web server somewhere.

The os module contains two functions to deal with the current working directory.

>>> import os                                            
>>> print(os.getcwd())                                   
C:\Python31
>>> os.chdir('/Users/pilgrim/diveintopython3/examples')  
>>> print(os.getcwd())                                   
C:\Users\pilgrim\diveintopython3\examples
  1. When you run the graphical Python Shell, the current working directory starts as the directory where the Python Shell executable is. On Windows, this depends on where you installed Python; the default directory is c:\Python31. If you run the Python Shell from the command line, the current working directory starts as the directory you were in when you ran python3.
  2. FIXME
  3. FIXME
  4. FIXME

The os.path module

FIXME The os.path module has several functions for manipulating files and directories. Here, we're looking at handling pathnames and listing the contents of a directory.

>>> import os
>>> os.path.join("c:\\music\\ap\\", "mahadeva.mp3")  
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.join("c:\\music\\ap", "mahadeva.mp3")   
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.expanduser("~")       
'c:\\Documents and Settings\\mpilgrim\\My Documents'
>>> os.path.join(os.path.expanduser("~"), "Python") 
'c:\\Documents and Settings\\mpilgrim\\My Documents\\Python'
  1. os.path is a reference to a module -- which module depends on your platform. Just as getpass encapsulates differences between platforms by setting getpass to a platform-specific function, os encapsulates differences between platforms by setting path to a platform-specific module.
  2. The join function of os.path constructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings. (Note that dealing with pathnames on Windows is annoying because the backslash character must be escaped.)
  3. In this slightly less trivial case, join will add an extra backslash to the pathname before joining it to the filename. I was overjoyed when I discovered this, since addSlashIfNecessary is one of the stupid little functions I always need to write when building up my toolbox in a new language. Do not write this stupid little function in Python; smart people have already taken care of it for you.
  4. expanduser will expand a pathname that uses ~ to represent the current user's home directory. This works on any platform where users have a home directory, like Windows, UNIX, and Mac OS X; it has no effect on Mac OS.
  5. Combining these techniques, you can easily construct pathnames for directories and files under the user's home directory.

FIXME

>>> os.path.split("c:\\music\\ap\\mahadeva.mp3")      
('c:\\music\\ap', 'mahadeva.mp3')
>>> (filepath, filename) = os.path.split("c:\\music\\ap\\mahadeva.mp3") 
>>> filepath      
'c:\\music\\ap'
>>> filename      
'mahadeva.mp3'
>>> (shortname, extension) = os.path.splitext(filename)                 
>>> shortname
'mahadeva'
>>> extension
'.mp3'
  1. The split function splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use multi-variable assignment to return multiple values from a function? Well, split is such a function.
  2. You assign the return value of the split function into a tuple of two variables. Each variable receives the value of the corresponding element of the returned tuple.
  3. The first variable, filepath, receives the value of the first element of the tuple returned from split, the file path.
  4. The second variable, filename, receives the value of the second element of the tuple returned from split, the filename.
  5. os.path also contains a function splitext, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique to assign each of them to separate variables.

FIXME

>>> os.listdir("c:\\music\\_singles\\")              
['a_time_long_forgotten_con.mp3', 'hellraiser.mp3',
'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3', 
'spinning.mp3']
>>> dirname = "c:\\"
>>> os.listdir(dirname)            
['AUTOEXEC.BAT', 'boot.ini', 'CONFIG.SYS', 'cygwin',
'docbook', 'Documents and Settings', 'Incoming', 'Inetpub', 'IO.SYS',
'MSDOS.SYS', 'Music', 'NTDETECT.COM', 'ntldr', 'pagefile.sys',
'Program Files', 'Python20', 'RECYCLER',
'System Volume Information', 'TEMP', 'WINNT']
>>> [f for f in os.listdir(dirname)
...    if os.path.isfile(os.path.join(dirname, f))] 
['AUTOEXEC.BAT', 'boot.ini', 'CONFIG.SYS', 'IO.SYS', 'MSDOS.SYS',
'NTDETECT.COM', 'ntldr', 'pagefile.sys']
>>> [f for f in os.listdir(dirname)
...    if os.path.isdir(os.path.join(dirname, f))]  
['cygwin', 'docbook', 'Documents and Settings', 'Incoming',
'Inetpub', 'Music', 'Program Files', 'Python20', 'RECYCLER',
'System Volume Information', 'TEMP', 'WINNT']
  1. The listdir function takes a pathname and returns a list of the contents of the directory.
  2. listdir returns both files and folders, with no indication of which is which.
  3. You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns 1 if the path represents a file, and 0 otherwise. Here you're using os.path.join to ensure a full pathname, but isfile also works with a partial path, relative to the current working directory. You can use os.getcwd() to get the current working directory.
  4. os.path also has a isdir function which returns 1 if the path represents a directory, and 0 otherwise. You can use this to get a list of the subdirectories within a directory.

The glob module

FIXME

def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f)
                for f in os.listdir(directory)]             
    fileList = [os.path.join(directory, f) 
               for f in fileList
                if os.path.splitext(f)[1] in fileExtList]    
  1. os.listdir(directory) returns a list of all the files and folders in directory.
  2. Iterating through the list with f, you use os.path.normcase(f) to normalize the case according to operating system defaults. normcase is a useful little function that compensates for case-insensitive operating systems that think that mahadeva.mp3 and mahadeva.MP3 are the same file. For instance, on Windows and Mac OS, normcase will convert the entire filename to lowercase; on UNIX-compatible systems, it will return the filename unchanged.
  3. Iterating through the normalized list with f again, you use os.path.splitext(f) to split each filename into name and extension.
  4. For each file, you see if the extension is in the list of file extensions you care about (fileExtList, which was passed to the listDirectory function).
  5. For each file you care about, you use os.path.join(directory, f) to construct the full pathname of the file, and return a list of the full pathnames.

Whenever possible, you should use the functions in os and os.path for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like os.path.split() work on UNIX, Windows, Mac OS X, and any other platform supported by Python.

There is one other way to get the contents of a directory. It's very powerful, and it uses the sort of wildcards that you may already be familiar with from working on the command line.

>>> os.listdir("c:\\music\\_singles\\")               
['a_time_long_forgotten_con.mp3', 'hellraiser.mp3',
'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3',
'spinning.mp3']
>>> import glob
>>> glob.glob('c:\\music\\_singles\\*.mp3')           
['c:\\music\\_singles\\a_time_long_forgotten_con.mp3',
'c:\\music\\_singles\\hellraiser.mp3',
'c:\\music\\_singles\\kairo.mp3',
'c:\\music\\_singles\\long_way_home1.mp3',
'c:\\music\\_singles\\sidewinder.mp3',
'c:\\music\\_singles\\spinning.mp3']
>>> glob.glob('c:\\music\\_singles\\s*.mp3')          
['c:\\music\\_singles\\sidewinder.mp3',
'c:\\music\\_singles\\spinning.mp3']
>>> glob.glob('c:\\music\\*\\*.mp3')
  1. As you saw earlier, os.listdir simply takes a directory path and lists all files and directories in that directory.
  2. The glob module, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. Here the wildcard is a directory path plus "*.mp3", which will match all .mp3 files. Note that each element of the returned list already includes the full path of the file.
  3. If you want to find all the files in a specific directory that start with "s" and end with ".mp3", you can do that too.
  4. Now consider this scenario: you have a music directory, with several subdirectories within it, with .mp3 files within each subdirectory. You can get a list of all of those with a single call to glob, by using two wildcards at once. One wildcard is the "*.mp3" (to match .mp3 files), and one wildcard is within the directory path itself, to match any subdirectory within c:\music. That's a crazy amount of power packed into one deceptively simple-looking function!

List Comprehensions

One of the most powerful features of Python is the list comprehension, which provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.

>>> li = [1, 9, 8, 4]
>>> [elem * 2 for elem in li]      
[2, 18, 16, 8]
>>> li         
[1, 9, 8, 4]
>>> li = [elem * 2 for elem in li] 
>>> li
[2, 18, 16, 8]
  1. To make sense of this, look at it from right to left. li is the list you're mapping. Python loops through li one element at a time, temporarily assigning the value of each element to the variable elem. Python then applies the function elem*2 and appends that result to the returned list.
  2. Note that list comprehensions do not change the original list.
  3. It is safe to assign the result of a list comprehension to the variable that you're mapping. Python constructs the new list in memory, and when the list comprehension is complete, it assigns the result to the variable.

FIXME Here are the list comprehensions in the buildConnectionString function that you declared in Chapter 2:

["%s=%s" % (k, v) for k, v in params.items()]

First, notice that you're calling the items function of the params dictionary. This function returns a list of tuples of all the data in the dictionary.

>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> params.keys()   
['server', 'uid', 'database', 'pwd']
>>> params.values() 
['mpilgrim', 'sa', 'master', 'secret']
>>> params.items()  
[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
  1. The keys method of a dictionary returns a list of all the keys. The list is not in the order in which the dictionary was defined (remember that elements in a dictionary are unordered), but it is a list.
  2. The values method returns a list of all the values. The list is in the same order as the list returned by keys, so params.values()[n] == params[params.keys()[n]] for all values of n.
  3. The items method returns a list of tuples of the form (key, value). The list contains all the data in the dictionary.

Now let's see what buildConnectionString does. It takes a list, params.items(), and maps it to a new list by applying string formatting to each element. The new list will have the same number of elements as params.items(), but each element in the new list will be a string that contains both a key and its associated value from the params dictionary.

>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> params.items()
[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
>>> [k for k, v in params.items()]                
['server', 'uid', 'database', 'pwd']
>>> [v for k, v in params.items()]                
['mpilgrim', 'sa', 'master', 'secret']
>>> ["%s=%s" % (k, v) for k, v in params.items()] 
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
  1. Note that you're using two variables to iterate through the params.items() list. This is another use of multi-variable assignment. The first element of params.items() is ('server', 'mpilgrim'), so in the first iteration of the list comprehension, k will get 'server' and v will get 'mpilgrim'. In this case, you're ignoring the value of v and only including the value of k in the returned list, so this list comprehension ends up being equivalent to params.keys().
  2. Here you're doing the same thing, but ignoring the value of k, so this list comprehension ends up being equivalent to params.values().
  3. Combining the previous two examples with some simple string formatting, you get a list of strings that include both the key and value of each element of the dictionary. This looks suspiciously like the output of the program. All that remains is to join the elements in this list into a single string.

Set Comprehensions

FIXME

Dictionary Comprehensions

FIXME

Further Reading

© 2001–9 Mark Pilgrim