20 May 2004
Copyright © 2000, 2001, 2002, 2003, 2004 Mark Pilgrim
This book lives at http://diveintopython3.org/. If you're reading it somewhere else, you may not have the latest version.
Table of Contents
Welcome to Python. Let's dive in. In this chapter, you'll install the version of Python that's right for you.
The first thing you need to do with Python is install it. Or do you?
If you're using an account on a hosted server, your ISP may have already installed Python. Most popular Linux distributions come with Python in the default installation. Mac OS X 10.2 and later includes a command-line version of Python, although you'll probably want to install a version that includes a more Mac-like graphical interface.
Windows does not come with any version of Python, but don't despair! There are several ways to point-and-click your way to Python on Windows.
As you can see already, Python runs on a great many operating systems. The full list includes Windows, Mac OS, Mac OS X, and all varieties of free UNIX-compatible systems like Linux. There are also versions that run on Sun Solaris, AS/400, Amiga, OS/2, BeOS, and a plethora of other platforms you've probably never even heard of.
What's more, Python programs written on one platform can, with a little care, run on any supported platform. For instance, I regularly develop Python programs on Windows and later deploy them on Linux.
So back to the question that started this section, “Which Python is right for you?” The answer is whichever one runs on the computer you already have.
On Windows, you have a couple choices for installing Python.
ActiveState makes a Windows installer for Python called ActivePython, which includes a complete version of Python, an IDE with a Python-aware code editor, plus some Windows extensions for Python that allow complete access to Windows-specific services, APIs, and the Windows Registry.
ActivePython is freely downloadable, although it is not open source. It is the IDE I used to learn Python, and I recommend you try it unless you have a specific reason not to. One such reason might be that ActiveState is generally several months behind in updating their ActivePython installer when new version of Python are released. If you absolutely need the latest version of Python and ActivePython is still a version behind as you read this, you'll want to use the second option for installing Python on Windows.
The second option is the “official” Python installer, distributed by the people who develop Python itself. It is freely downloadable and open source, and it is always current with the latest version of Python.
Here is the procedure for installing ActivePython:
Download ActivePython from http://www.activestate.com/Products/ActivePython/.
If you are using Windows 95, Windows 98, or Windows ME, you will also need to download and install Windows Installer 2.0 before installing ActivePython.
Double-click the installer, ActivePython-2.2.2-224-win32-ix86.msi.
Step through the installer program.
If space is tight, you can do a custom installation and deselect the documentation, but I don't recommend this unless you absolutely can't spare the 14MB.
After the installation is complete, close the installer and choose Start->Programs->ActiveState ActivePython 2.2->PythonWin IDE. You'll see something like the following:
PythonWin 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)] on win32. Portions Copyright 1994-2001 Mark Hammond (mhammond@skippinet.com.au) - see 'Help/About PythonWin' for further copyright information. >>>
Download the latest Python Windows installer by going to http://www.python.org/ftp/python/ and selecting the highest version number listed, then downloading the .exe installer.
Double-click the installer, Python-2.xxx.yyy.exe. The name will depend on the version of Python available when you read this.
Step through the installer program.
If disk space is tight, you can deselect the HTMLHelp file, the utility scripts (Tools/), and/or the test suite (Lib/test/).
If you do not have administrative rights on your machine, you can select Advanced Options, then choose Non-Admin Install. This just affects where Registry entries and Start menu shortcuts are created.
After the installation is complete, close the installer and select Start->Programs->Python 2.3->IDLE (Python GUI). You'll see something like the following:
Python 2.3.2 (#49, Oct 2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
****************************************************************
Personal firewall software may warn about the connection IDLE
makes to its subprocess using this computer's internal loopback
interface. This connection is not visible on any external
interface and no data is sent to or received from the Internet.
****************************************************************
IDLE 1.0
>>>
On Mac OS X, you have two choices for installing Python: install it, or don't install it. You probably want to install it.
Mac OS X 10.2 and later comes with a command-line version of Python preinstalled. If you are comfortable with the command line, you can use this version for the first third of the book. However, the preinstalled version does not come with an XML parser, so when you get to the XML chapter, you'll need to install the full version.
Rather than using the preinstalled version, you'll probably want to install the latest version, which also comes with a graphical interactive shell.
To use the preinstalled version of Python, follow these steps:
Open the /Applications folder.
Open the Utilities folder.
Double-click Terminal to open a terminal window and get to a command line.
Type python at the command prompt.
Try it out:
Welcome to Darwin! [localhost:~] you% python Python 2.2 (#1, 07/14/02, 23:25:09) [GCC Apple cpp-precomp 6.14] on darwin Type "help", "copyright", "credits", or "license" for more information. >>> [press Ctrl+D to get back to the command prompt] [localhost:~] you%
Follow these steps to download and install the latest version of Python:
Download the MacPython-OSX disk image from http://homepages.cwi.nl/~jack/macpython/download.html.
If your browser has not already done so, double-click MacPython-OSX-2.3-1.dmg to mount the disk image on your desktop.
Double-click the installer, MacPython-OSX.pkg.
The installer will prompt you for your administrative username and password.
Step through the installer program.
After installation is complete, close the installer and open the /Applications folder.
Open the MacPython-2.3 folder
Double-click PythonIDE to launch Python.
The MacPython IDE should display a splash screen, then take you to the interactive shell. If the interactive shell does not appear, select Window->Python Interactive (Cmd-0). The opening window will look something like this:
Python 2.3 (#2, Jul 30 2003, 11:45:28) [GCC 3.1 20020420 (prerelease)] Type "copyright", "credits" or "license" for more information. MacPython IDE 1.0.1 >>>
Note that once you install the latest version, the pre-installed version is still present. If you are running scripts from the command line, you need to be aware which version of Python you are using.
[localhost:~] you% python Python 2.2 (#1, 07/14/02, 23:25:09) [GCC Apple cpp-precomp 6.14] on darwin Type "help", "copyright", "credits", or "license" for more information. >>> [press Ctrl+D to get back to the command prompt] [localhost:~] you% /usr/local/bin/python Python 2.3 (#2, Jul 30 2003, 11:45:28) [GCC 3.1 20020420 (prerelease)] on darwin Type "help", "copyright", "credits", or "license" for more information. >>> [press Ctrl+D to get back to the command prompt] [localhost:~] you%
Mac OS 9 does not come with any version of Python, but installation is very simple, and there is only one choice.
Follow these steps to install Python on Mac OS 9:
Download the MacPython23full.bin file from http://homepages.cwi.nl/~jack/macpython/download.html.
If your browser does not decompress the file automatically, double-click MacPython23full.bin to decompress the file with Stuffit Expander.
Double-click the installer, MacPython23full.
Step through the installer program.
AFter installation is complete, close the installer and open the /Applications folder.
Open the MacPython-OS9 2.3 folder.
Double-click Python IDE to launch Python.
The MacPython IDE should display a splash screen, and then take you to the interactive shell. If the interactive shell does not appear, select Window->Python Interactive (Cmd-0). You'll see a screen like this:
Python 2.3 (#2, Jul 30 2003, 11:45:28) [GCC 3.1 20020420 (prerelease)] Type "copyright", "credits" or "license" for more information. MacPython IDE 1.0.1 >>>
Installing under UNIX-compatible operating systems such as Linux is easy if you're willing to install a binary package. Pre-built binary packages are available for most popular Linux distributions. Or you can always compile from source.
Download the latest Python RPM by going to http://www.python.org/ftp/python/ and selecting the highest version number listed, then selecting the rpms/ directory within that. Then download the RPM with the highest version number. You can install it with the rpm command, as shown here:
localhost:~$ su - Password: [enter your root password] [root@localhost root]# wget http://python.org/ftp/python/2.3/rpms/redhat-9/python2.3-2.3-5pydotorg.i386.rpm Resolving python.org... done. Connecting to python.org[194.109.137.226]:80... connected. HTTP request sent, awaiting response... 200 OK Length: 7,495,111 [application/octet-stream] ... [root@localhost root]# rpm -Uvh python2.3-2.3-5pydotorg.i386.rpm Preparing... ########################################### [100%] 1:python2.3 ########################################### [100%] [root@localhost root]# pythonPython 2.2.2 (#1, Feb 24 2003, 19:13:11) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-4)] on linux2 Type "help", "copyright", "credits", or "license" for more information. >>> [press Ctrl+D to exit] [root@localhost root]# python2.3
Python 2.3 (#1, Sep 12 2003, 10:53:56) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits", or "license" for more information. >>> [press Ctrl+D to exit] [root@localhost root]# which python2.3
/usr/bin/python2.3
If you are lucky enough to be running Debian GNU/Linux, you install Python through the apt command.
localhost:~$ su - Password: [enter your root password] localhost:~# apt-get install python Reading Package Lists... Done Building Dependency Tree... Done The following extra packages will be installed: python2.3 Suggested packages: python-tk python2.3-doc The following NEW packages will be installed: python python2.3 0 upgraded, 2 newly installed, 0 to remove and 3 not upgraded. Need to get 0B/2880kB of archives. After unpacking 9351kB of additional disk space will be used. Do you want to continue? [Y/n] Y Selecting previously deselected package python2.3. (Reading database ... 22848 files and directories currently installed.) Unpacking python2.3 (from .../python2.3_2.3.1-1_i386.deb) ... Selecting previously deselected package python. Unpacking python (from .../python_2.3.1-1_all.deb) ... Setting up python (2.3.1-1) ... Setting up python2.3 (2.3.1-1) ... Compiling python modules in /usr/lib/python2.3 ... Compiling optimized python modules in /usr/lib/python2.3 ... localhost:~# exit logout localhost:~$ python Python 2.3.1 (#2, Sep 24 2003, 11:39:14) [GCC 3.3.2 20030908 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> [press Ctrl+D to exit]
If you prefer to build from source, you can download the Python source code from http://www.python.org/ftp/python/. Select the highest version number listed, download the .tgz file), and then do the usual configure, make, make install dance.
localhost:~$ su - Password: [enter your root password] localhost:~# wget http://www.python.org/ftp/python/2.3/Python-2.3.tgz Resolving www.python.org... done. Connecting to www.python.org[194.109.137.226]:80... connected. HTTP request sent, awaiting response... 200 OK Length: 8,436,880 [application/x-tar] ... localhost:~# tar xfz Python-2.3.tgz localhost:~# cd Python-2.3 localhost:~/Python-2.3# ./configure checking MACHDEP... linux2 checking EXTRAPLATDIR... checking for --without-gcc... no ... localhost:~/Python-2.3# make gcc -pthread -c -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Modules/python.o Modules/python.c gcc -pthread -c -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Parser/acceler.o Parser/acceler.c gcc -pthread -c -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Parser/grammar1.o Parser/grammar1.c ... localhost:~/Python-2.3# make install /usr/bin/install -c python /usr/local/bin/python2.3 ... localhost:~/Python-2.3# exit logout localhost:~$ which python /usr/local/bin/python localhost:~$ python Python 2.3.1 (#2, Sep 24 2003, 11:39:14) [GCC 3.3.2 20030908 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> [press Ctrl+D to get back to the command prompt] localhost:~$
Now that you have Python installed, what's this interactive shell thing you're running?
It's like this: Python leads a double life. It's an interpreter for scripts that you can run from the command line or run like applications, by double-clicking the scripts. But it's also an interactive shell that can evaluate arbitrary statements and expressions. This is extremely useful for debugging, quick hacking, and testing. I even know some people who use the Python interactive shell in lieu of a calculator!
Launch the Python interactive shell in whatever way works on your platform, and let's dive in with the steps shown here:
>>> 1 + 12 >>> print 'hello world'
hello world >>> x = 1
>>> y = 2 >>> x + y 3
You should now have a version of Python installed that works for you.
Depending on your platform, you may have more than one version of Python intsalled. If so, you need to be aware of your paths. If simply typing python on the command line doesn't run the version of Python that you want to use, you may need to enter the full pathname of your preferred version.
Congratulations, and welcome to Python.
You know how other books go on and on about programming fundamentals and finally work up to building a complete, working program? Let's skip all that.
Here is a complete, working Python program.
It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
odbchelper.pyIf you have not already done so, you can download this and other examples used in this book.
def buildConnectionString(params):
"""Build a connection string from a dictionary of parameters.
Returns string."""
return ";".join(["%s=%s" % (k, v) for k, v in params.items()])
if __name__ == "__main__":
myParams = {"server":"mpilgrim", \
"database":"master", \
"uid":"sa", \
"pwd":"secret" \
}
print buildConnectionString(myParams)Now run this program and see what happens.
| In the ActivePython IDE on Windows, you can run the Python program you're editing by choosing File->Run... (Ctrl-R). Output is displayed in the interactive window. | |
In the Python IDE on Mac OS, you can run a Python program with
Python->Run window... (Cmd-R), but there is an important option you must set first. Open the .py file in the IDE, pop up the options menu by clicking the black triangle in the upper-right corner of the window, and make sure the Run as __main__ option is checked. This is a per-file setting, but you'll only need to do it once per file.
|
|
On UNIX-compatible systems (including Mac OS X), you can run a Python program from the command line: python odbchelper.py |
|
The id="odbchelper.output" output of odbchelper.py will look like this:
server=mpilgrim;uid=sa;database=master;pwd=secret
Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it, like this:
def buildConnectionString(params):
Note that the keyword def starts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments
(not shown here) are separated with commas.
Also note that the function doesn't define a return datatype. Python functions do not specify the datatype of their return value; they don't even specify whether or not they return a value.
In fact, every Python function returns a value; if the function ever executes a return statement, it will return that value, otherwise it will return None, the Python null value.
In Visual Basic, functions (that return a value) start with function, and subroutines (that do not return a value) start with sub. There are no subroutines in Python. Everything is a function, all functions return a value (even if it's None), and all functions start with def.
|
|
The argument, params, doesn't specify a datatype. In Python, variables are never explicitly typed. Python figures out what type a variable is and keeps track of it internally.
| In Java, C++, and other statically-typed languages, you must specify the datatype of the function return value and each function argument. In Python, you never explicitly specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally. | |
An erudite reader sent me this explanation of how Python compares to other programming languages:
'12' and the integer 3 to get the string '123', then treat that as the integer 123, all without any explicit conversion.
So Python is both dynamically typed (because it doesn't use explicit datatype declarations) and strongly typed (because once a variable has a datatype, it actually matters).
You can document a Python function by giving it a docstring.
buildConnectionString Function's docstring
def buildConnectionString(params):
"""Build a connection string from a dictionary of parameters.
Returns string."""Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including
carriage returns and other quote characters. You can use them anywhere, but you'll see them most often used when defining
a docstring.
Triple quotes are also an easy way to define a string with both single and double quotes, like qq/.../ in Perl.
|
|
Everything between the triple quotes is the function's docstring, which documents what the function does. A docstring, if it exists, must be the first thing defined in a function (that is, the first thing after the colon). You don't technically
need to give your function a docstring, but you always should. I know you've heard this in every programming class you've ever taken, but Python gives you an added incentive: the docstring is available at runtime as an attribute of the function.
Many Python IDEs use the docstring to provide context-sensitive documentation, so that when you type a function name, its docstring appears as a tooltip. This can be incredibly helpful, but it's only as good as the docstrings you write.
|
|
Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them.
Here's an example that uses the if __name__ trick.
if __name__ == "__main__":
Some quick observations before you get to the good stuff. First, parentheses are not required around the if expression. Second, the if statement ends with a colon, and is followed by indented code.
Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not support in-line assignment, so there's no chance of accidentally assigning the value you thought you were comparing.
|
|
So why is this particular if statement a trick? Modules are objects, and all modules have a built-in attribute __name__. A module's __name__ depends on how you're using the module. If you import the module, then __name__ is the module's filename, without a directory path or file extension. But you can also run the module directly as a standalone
program, in which case __name__ will be a special default value, __main__.
>>> import odbchelper
>>> odbchelper.__name__
'odbchelper'Knowing this, you can design a test suite for your module within the module itself by putting it in this if statement. When you run the module directly, __name__ is __main__, so the test suite executes. When you import the module, __name__ is something else, so the test suite is ignored. This makes it easier to develop and debug new modules before integrating
them into a larger program.
On MacPython, there is an additional step to make the if __name__ trick work. Pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and
make sure Run as __main__ is checked.
|
|
Lists are Python's workhorse datatype. If your only experience with lists is arrays in Visual Basic or (God forbid) the datastore in Powerbuilder, brace yourself for Python lists.
A list in Python is like an array in Perl. In Perl, variables that store arrays always start with the @ character; in Python, variables can be named anything, and Python keeps track of the datatype internally.
|
|
A list in Python is much more than an array in Java (although it can be used as one if that's really all you want out of life). A better analogy would be to the ArrayList class, which can hold arbitrary objects and can expand dynamically as new items are added.
|
|
>>> li = ["a", "b", "mpilgrim", "z", "example"]>>> li ['a', 'b', 'mpilgrim', 'z', 'example'] >>> li[0]
'a' >>> li[4]
'example'
>>> li ['a', 'b', 'mpilgrim', 'z', 'example'] >>> li[-1]'example' >>> li[-3]
'mpilgrim'
>>> li ['a', 'b', 'mpilgrim', 'z', 'example'] >>> li[1:3]['b', 'mpilgrim'] >>> li[1:-1]
['b', 'mpilgrim', 'z'] >>> li[0:3]
['a', 'b', 'mpilgrim']
>>> li ['a', 'b', 'mpilgrim', 'z', 'example'] >>> li[:3]['a', 'b', 'mpilgrim'] >>> li[3:]
![]()
['z', 'example'] >>> li[:]
['a', 'b', 'mpilgrim', 'z', 'example']
If the left slice index is 0, you can leave it out, and 0 is implied. So li[:3] is the same as li[0:3] from Example 3.8, “Slicing a List”.
|
|
Similarly, if the right slice index is the length of the list, you can leave it out. So li[3:] is the same as li[3:5], because this list has five elements.
|
|
Note the symmetry here. In this five-element list, li[:3] returns the first 3 elements, and li[3:] returns the last two elements. In fact, li[:n] will always return the first n elements, and li[n:] will return the rest, regardless of the length of the list.
|
|
If both slice indices are left out, all elements of the list are included. But this is not the same as the original li list; it is a new list that happens to have all the same elements. li[:] is shorthand for making a complete copy of a list.
|
>>> li
['a', 'b', 'mpilgrim', 'z', 'example']
>>> li.append("new")
>>> li
['a', 'b', 'mpilgrim', 'z', 'example', 'new']
>>> li.insert(2, "new")
>>> li
['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new']
>>> li.extend(["two", "elements"])
>>> li
['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']extend and append>>> li = ['a', 'b', 'c'] >>> li.extend(['d', 'e', 'f'])>>> li ['a', 'b', 'c', 'd', 'e', 'f'] >>> len(li)
6 >>> li[-1] 'f' >>> li = ['a', 'b', 'c'] >>> li.append(['d', 'e', 'f'])
>>> li ['a', 'b', 'c', ['d', 'e', 'f']] >>> len(li)
4 >>> li[-1] ['d', 'e', 'f']
>>> li
['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
>>> li.index("example")
5
>>> li.index("new")
2
>>> li.index("c")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
ValueError: list.index(x): x not in list
>>> "c" in li
FalseBefore version 2.2.1, Python had no separate boolean datatype. To compensate for this, Python accepted almost anything in a boolean context (like an if statement), according to the following rules:
True or False. Note the capitalization; these values, like everything else in Python, are case-sensitive.
|
|
>>> li
['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
>>> li.remove("z")
>>> li
['a', 'b', 'new', 'mpilgrim', 'example', 'new', 'two', 'elements']
>>> li.remove("new")
>>> li
['a', 'b', 'mpilgrim', 'example', 'new', 'two', 'elements']
>>> li.remove("c")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
ValueError: list.remove(x): x not in list
>>> li.pop()
'elements'
>>> li
['a', 'b', 'mpilgrim', 'example', 'new', 'two']>>> li = ['a', 'b', 'mpilgrim'] >>> li = li + ['example', 'new']>>> li ['a', 'b', 'mpilgrim', 'example', 'new'] >>> li += ['two']
>>> li ['a', 'b', 'mpilgrim', 'example', 'new', 'two'] >>> li = [1, 2] * 3
>>> li [1, 2, 1, 2, 1, 2]
Lists can also be concatenated with the + operator. list = list + otherlist has the same result as list.extend(otherlist). But the + operator returns a new (concatenated) list as a value, whereas extend only alters an existing list. This means that extend is faster, especially for large lists.
|
|
Python supports the += operator. li += ['two'] is equivalent to li.extend(['two']). The += operator works for lists, strings, and integers, and it can be overloaded to work for user-defined classes as well. (More
on classes in Chapter 5.)
|
|
The * operator works on lists as a repeater. li = [1, 2] * 3 is equivalent to li = [1, 2] + [1, 2] + [1, 2], which concatenates the three lists into one.
|
A tuple is an immutable list. A tuple can not be changed in any way once it is created.
>>> t = ("a", "b", "mpilgrim", "z", "example")
>>> t
('a', 'b', 'mpilgrim', 'z', 'example')
>>> t[0]
'a'
>>> t[-1]
'example'
>>> t[1:3]
('b', 'mpilgrim')>>> t
('a', 'b', 'mpilgrim', 'z', 'example')
>>> t.append("new")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
AttributeError: 'tuple' object has no attribute 'append'
>>> t.remove("z")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
AttributeError: 'tuple' object has no attribute 'remove'
>>> t.index("example")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
AttributeError: 'tuple' object has no attribute 'index'
>>> "z" in t
TrueSo what are tuples good for?
assert statement that shows this data is constant, and that special thought (and a specific function) is required to override that.
Tuples can be converted into lists, and vice-versa. The built-in tuple function takes a list and returns a tuple with the same elements, and the list function takes a tuple and returns a list. In effect, tuple freezes a list, and list thaws a tuple.
|
|
Now that you know something about dictionaries, tuples, and lists (oh my!), let's get back to the sample program from Chapter 2, odbchelper.py.
Python has local and global variables like most other languages, but it has no explicit variable declarations. Variables spring into existence by being assigned a value, and they are automatically destroyed when they go out of scope.
if __name__ == "__main__":
myParams = {"server":"mpilgrim", \
"database":"master", \
"uid":"sa", \
"pwd":"secret" \
}Notice the indentation. An if statement is a code block and needs to be indented just like a function.
Also notice that the variable assignment is one command split over several lines, with a backslash (“\”) serving as a line-continuation marker.
When a command is split among several lines with the line-continuation marker (“\”), the continued lines can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python IDE auto-indents the continued line, you should probably accept its default unless you have a burning reason not to.
|
|
Strictly speaking, expressions in parentheses, straight brackets, or curly braces (like defining a dictionary) can be split into multiple lines with or without the line continuation character (“\”). I like to include the backslash even when it's not required because I think it makes the code easier to read, but that's
a matter of style.
Third, you never declared the variable myParams, you just assigned a value to it. This is like VBScript without the option explicit option. Luckily, unlike VBScript, Python will not allow you to reference a variable that has never been assigned a value; trying to do so will raise an exception.
>>> x Traceback (innermost last): File "<interactive input>", line 1, in ? NameError: There is no variable named 'x' >>> x = 1 >>> x 1
You will thank Python for this one day.
One of the cooler programming shortcuts in Python is using sequences to assign multiple values at once.
>>> v = ('a', 'b', 'e')
>>> (x, y, z) = v
>>> x
'a'
>>> y
'b'
>>> z
'e'v is a tuple of three elements, and (x, y, z) is a tuple of three variables. Assigning one to the other assigns each of the values of v to each of the variables, in order.
|
This has all sorts of uses. I often want to assign names to a range of values. In C, you would use enum and manually list each constant and its associated value, which seems especially tedious when the values are consecutive.
In Python, you can use the built-in range function with multi-variable assignment to quickly assign consecutive values.
>>> range(7)[0, 1, 2, 3, 4, 5, 6] >>> (MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)
>>> MONDAY
0 >>> TUESDAY 1 >>> SUNDAY 6
You can also use multi-variable assignment to build functions that return multiple values, simply by returning a tuple of
all the values. The caller can treat it as a tuple, or assign the values to individual variables. Many standard Python libraries do this, including the os module, which you'll discuss in Chapter 6.
Python supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is
to insert values into a string with the %s placeholder.
String formatting in Python uses the same syntax as the sprintf function in C.
|
|
>>> k = "uid" >>> v = "sa" >>> "%s=%s" % (k, v)'uid=sa'
Note that (k, v) is a tuple. I told you they were good for something.
You might be thinking that this is a lot of work just to do simple string concatentation, and you would be right, except that string formatting isn't just concatenation. It's not even just formatting. It's also type coercion.
>>> uid = "sa" >>> pwd = "secret" >>> print pwd + " is not a good password for " + uidsecret is not a good password for sa >>> print "%s is not a good password for %s" % (pwd, uid)
secret is not a good password for sa >>> userCount = 6 >>> print "Users connected: %d" % (userCount, )
![]()
Users connected: 6 >>> print "Users connected: " + userCount
Traceback (innermost last): File "<interactive input>", line 1, in ? TypeError: cannot concatenate 'str' and 'int' objects
As with printf in C, string formatting in Python is like a Swiss Army knife. There are options galore, and modifier strings to specially format many different types of values.
>>> print "Today's stock price: %f" % 50.462550.462500 >>> print "Today's stock price: %.2f" % 50.4625
50.46 >>> print "Change since yesterday: %+.2f" % 1.5
+1.50
One of the most powerful features of Python is the list comprehension, which provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.
>>> li = [1, 9, 8, 4] >>> [elem*2 for elem in li][2, 18, 16, 8] >>> li
[1, 9, 8, 4] >>> li = [elem*2 for elem in li]
>>> li [2, 18, 16, 8]
Here are the list comprehensions in the buildConnectionString function that you declared in Chapter 2:
["%s=%s" % (k, v) for k, v in params.items()]
First, notice that you're calling the items function of the params dictionary. This function returns a list of tuples of all the data in the dictionary.
keys, values, and items Functions>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> params.keys()
['server', 'uid', 'database', 'pwd']
>>> params.values()
['mpilgrim', 'sa', 'master', 'secret']
>>> params.items()
[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]Now let's see what buildConnectionString does. It takes a list, params., and maps it to a new list by applying string formatting to each element. The new list will have the same number of elements
as items()params., but each element in the new list will be a string that contains both a key and its associated value from the params dictionary.
items()
buildConnectionString, Step by Step>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> params.items()
[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
>>> [k for k, v in params.items()]
['server', 'uid', 'database', 'pwd']
>>> [v for k, v in params.items()]
['mpilgrim', 'sa', 'master', 'secret']
>>> ["%s=%s" % (k, v) for k, v in params.items()]
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']Note that you're using two variables to iterate through the params.items() list. This is another use of multi-variable assignment. The first element of params.items() is ('server', 'mpilgrim'), so in the first iteration of the list comprehension, k will get 'server' and v will get 'mpilgrim'. In this case, you're ignoring the value of v and only including the value of k in the returned list, so this list comprehension ends up being equivalent to params..
|
|
Here you're doing the same thing, but ignoring the value of k, so this list comprehension ends up being equivalent to params..
|
|
| Combining the previous two examples with some simple string formatting, you get a list of strings that include both the key and value of each element of the dictionary. This looks suspiciously like the output of the program. All that remains is to join the elements in this list into a single string. |
map function.
You have a list of key-value pairs in the form key=value, and you want to join them into a single string. To join any list of strings into a single string, use the join method of a string object.
Here is an example of joining a list from the buildConnectionString function:
return ";".join(["%s=%s" % (k, v) for k, v in params.items()])One interesting note before you continue. I keep repeating that functions are objects, strings are objects... everything
is an object. You might have thought I meant that string variables are objects. But no, look closely at this example and you'll see that the string ";" itself is an object, and you are calling its join method.
The join method joins the elements of the list into a single string, with each element separated by a semi-colon. The delimiter doesn't
need to be a semi-colon; it doesn't even need to be a single character. It can be any string.
join works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements
will raise an exception.
|
|
odbchelper.py>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> ["%s=%s" % (k, v) for k, v in params.items()]
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
>>> ";".join(["%s=%s" % (k, v) for k, v in params.items()])
'server=mpilgrim;uid=sa;database=master;pwd=secret'This string is then returned from the odbchelper function and printed by the calling block, which gives you the output that you marveled at when you started reading this
chapter.
You're probably wondering if there's an analogous method to split a string into a list. And of course there is, and it's
called split.
>>> li = ['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
>>> s = ";".join(li)
>>> s
'server=mpilgrim;uid=sa;database=master;pwd=secret'
>>> s.split(";")
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
>>> s.split(";", 1)
['server=mpilgrim', 'uid=sa;database=master;pwd=secret']anystring. is a useful technique when you want to search a string for a substring and then work with everything before the substring
(which ends up in the first element of the returned list) and everything after it (which ends up in the second element).
|
|
string module.
join is a string method instead of a list method.
When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Many people feel the same way, and there's a story
behind the join method. Prior to Python 1.6, strings didn't have all these useful methods. There was a separate string module that contained all the string functions; each function took a string as its first argument. The functions were deemed
important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn't move at all but simply stay a part of
the old string module (which still has a lot of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.
The odbchelper.py program and its output should now make perfect sense.
def buildConnectionString(params):
"""Build a connection string from a dictionary of parameters.
Returns string."""
return ";".join(["%s=%s" % (k, v) for k, v in params.items()])
if __name__ == "__main__":
myParams = {"server":"mpilgrim", \
"database":"master", \
"uid":"sa", \
"pwd":"secret" \
}
print buildConnectionString(myParams)Here is the output of odbchelper.py:
server=mpilgrim;uid=sa;database=master;pwd=secret
Before diving into the next chapter, make sure you're comfortable doing all of these things:
docstrings, local variables, and proper indentation
This chapter covers one of Python's strengths: introspection. As you know, everything in Python is an object, and introspection is code looking at other modules and functions in memory as objects, getting information about them, and manipulating them. Along the way, you'll define functions with no name, call functions with arguments out of order, and reference functions whose names you don't even know ahead of time.
Here is a complete, working Python program. You should understand a good deal about it just by looking at it. The numbered lines illustrate concepts covered in Chapter 2, Your First Python Program. Don't worry if the rest of the code looks intimidating; you'll learn all about it throughout this chapter.
apihelper.pyIf you have not already done so, you can download this and other examples used in this book.
def info(object, spacing=10, collapse=1):![]()
![]()
"""Print methods and docstrings. Takes module, class, list, dictionary, or string.""" methodList = [method for method in dir(object) if callable(getattr(object, method))] processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s) print "\n".join(["%s %s" % (method.ljust(spacing), processFunc(str(getattr(object, method).__doc__))) for method in methodList]) if __name__ == "__main__":
![]()
print info.__doc__
This module has one function, info. According to its function declaration, it takes three parameters: object, spacing, and collapse. The last two are actually optional parameters, as you'll see shortly.
|
|
The info function has a multi-line docstring that succinctly describes the function's purpose. Note that no return value is mentioned; this function will be used solely
for its effects, rather than its value.
|
|
| Code within the function is indented. | |
The if __name__ trick allows this program do something useful when run by itself, without interfering with its use as a module for other programs.
In this case, the program simply prints out the docstring of the info function.
|
|
if statements use == for comparison, and parentheses are not required.
|
The info function is designed to be used by you, the programmer, while working in the Python IDE. It takes any object that has functions or methods (like a module, which has functions, or a list, which has methods) and
prints out the functions and their docstrings.
apihelper.py>>> from apihelper import info >>> li = [] >>> info(li) append L.append(object) -- append object to end count L.count(value) -> integer -- return number of occurrences of value extend L.extend(list) -- extend list by appending list elements index L.index(value) -> integer -- return index of first occurrence of value insert L.insert(index, object) -- insert object before index pop L.pop([index]) -> item -- remove and return item at index (default last) remove L.remove(value) -- remove first occurrence of value reverse L.reverse() -- reverse *IN PLACE* sort L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1, 0, 1
By default the output is formatted to be easy to read. Multi-line docstrings are collapsed into a single long line, but this option can be changed by specifying 0 for the collapse argument. If the function names are longer than 10 characters, you can specify a larger value for the spacing argument to make the output easier to read.
apihelper.py>>> import odbchelper
>>> info(odbchelper)
buildConnectionString Build a connection string from a dictionary Returns string.
>>> info(odbchelper, 30)
buildConnectionString Build a connection string from a dictionary Returns string.
>>> info(odbchelper, 30, 0)
buildConnectionString Build a connection string from a dictionary
Returns string.
Python allows function arguments to have default values; if the function is called without the argument, the argument gets its default value. Futhermore, arguments can be specified in any order by using named arguments. Stored procedures in SQL Server Transact/SQL can do this, so if you're a SQL Server scripting guru, you can skim this part.
Here is an example of info, a function with two optional arguments:
def info(object, spacing=10, collapse=1):
spacing and collapse are optional, because they have default values defined. object is required, because it has no default value. If info is called with only one argument, spacing defaults to 10 and collapse defaults to 1. If info is called with two arguments, collapse still defaults to 1.
Say you want to specify a value for collapse but want to accept the default value for spacing. In most languages, you would be out of luck, because you would need to call the function with three arguments. But in Python, arguments can be specified by name, in any order.
infoinfo(odbchelper)info(odbchelper, 12)
info(odbchelper, collapse=0)
info(spacing=15, object=odbchelper)
This looks totally whacked until you realize that arguments are simply a dictionary. The “normal” method of calling functions without argument names is actually just a shorthand where Python matches up the values with the argument names in the order they're specified in the function declaration. And most of the time, you'll call functions the “normal” way, but you always have the additional flexibility if you need it.
| The only thing you need to do to call a function is specify a value (somehow) for each required argument; the manner and order in which you do that is up to you. | |
type, str, dir, and Other Built-In FunctionsPython has a small set of extremely useful built-in functions. All other functions are partitioned off into modules. This was actually a conscious design decision, to keep the core language from getting bloated like other scripting languages (cough cough, Visual Basic).
type FunctionThe type function returns the datatype of any arbitrary object. The possible types are listed in the types module. This is useful for helper functions that can handle several types of data.
type>>> type(1)<type 'int'> >>> li = [] >>> type(li)
<type 'list'> >>> import odbchelper >>> type(odbchelper)
<type 'module'> >>> import types
>>> type(odbchelper) == types.ModuleType True
str FunctionThe str coerces data into a string. Every datatype can be coerced into a string.
str>>> str(1)'1' >>> horsemen = ['war', 'pestilence', 'famine'] >>> horsemen ['war', 'pestilence', 'famine'] >>> horsemen.append('Powerbuilder') >>> str(horsemen)
"['war', 'pestilence', 'famine', 'Powerbuilder']" >>> str(odbchelper)
"<module 'odbchelper' from 'c:\\docbook\\dip\\py\\odbchelper.py'>" >>> str(None)
'None'
At the heart of the info function is the powerful dir function. dir returns a list of the attributes and methods of any object: modules, functions, strings, lists, dictionaries... pretty much
anything.
dir>>> li = [] >>> dir(li)['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] >>> d = {} >>> dir(d)
['clear', 'copy', 'get', 'has_key', 'items', 'keys', 'setdefault', 'update', 'values'] >>> import odbchelper >>> dir(odbchelper)
['__builtins__', '__doc__', '__file__', '__name__', 'buildConnectionString']
li is a list, so returns a list of all the methods of a list. Note that the returned list contains the names of the methods as strings, not
the methods themselves.
|
|
d is a dictionary, so returns a list of the names of dictionary methods. At least one of these, keys, should look familiar.
|
|
This is where it really gets interesting. odbchelper is a module, so returns a list of all kinds of stuff defined in the module, including built-in attributes, like __name__, __doc__, and whatever other attributes and methods you define. In this case, odbchelper has only one user-defined method, the buildConnectionString function described in Chapter 2.
|
Finally, the callable function takes any object and returns True if the object can be called, or False otherwise. Callable objects include functions, class methods, even classes themselves. (More on classes in the next chapter.)
callable>>> import string >>> string.punctuation'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~' >>> string.join
<function join at 00C55A7C> >>> callable(string.punctuation)
False >>> callable(string.join)
True >>> print string.join.__doc__
join(list [,sep]) -> string Return a string composed of the words in list, with intervening occurrences of sep. The default separator is a single space. (joinfields and join are synonymous)
The functions in the string module are deprecated (although many people still use the join function), but the module contains a lot of useful constants like this string.punctuation, which contains all the standard punctuation characters.
|
|
string.join is a function that joins a list of strings.
|
|
| string.punctuation is not callable; it is a string. (A string does have callable methods, but the string itself is not callable.) | |
string.join is callable; it's a function that takes two arguments.
|
|
Any callable object may have a docstring. By using the callable function on each of an object's attributes, you can determine which attributes you care about (methods, functions, classes)
and which you want to ignore (constants and so on) without knowing anything about the object ahead of time.
|
type, str, dir, and all the rest of Python's built-in functions are grouped into a special module called __builtin__. (That's two underscores before and after.) If it helps, you can think of Python automatically executing from __builtin__ import * on startup, which imports all the “built-in” functions into the namespace so you can use them directly.
The advantage of thinking like this is that you can access all the built-in functions and attributes as a group by getting
information about the __builtin__ module. And guess what, Python has a function called info. Try it yourself and skim through the list now. We'll dive into some of the more important functions later. (Some of the
built-in error classes, like AttributeError, should already look familiar.)
>>> from apihelper import info >>> import __builtin__ >>> info(__builtin__, 20) ArithmeticError Base class for arithmetic errors. AssertionError Assertion failed. AttributeError Attribute not found. EOFError Read beyond end of file. EnvironmentError Base class for I/O related errors. Exception Common base class for all exceptions. FloatingPointError Floating point operation failed. IOError I/O operation failed. [...snip...]
| Python comes with excellent reference manuals, which you should peruse thoroughly to learn all the modules Python has to offer. But unlike most languages, where you would find yourself referring back to the manuals or man pages to remind yourself how to use these modules, Python is largely self-documenting. | |
getattrYou already know that Python functions are objects. What you don't know is that you can get a reference to a function without knowing its name until run-time, by using the
getattr function.
getattr>>> li = ["Larry", "Curly"] >>> li.pop<built-in method pop of list object at 010DF884> >>> getattr(li, "pop")
<built-in method pop of list object at 010DF884> >>> getattr(li, "append")("Moe")
>>> li ["Larry", "Curly", "Moe"] >>> getattr({}, "clear")
<built-in method clear of dictionary object at 00F113D4> >>> getattr((), "pop")
Traceback (innermost last): File "<interactive input>", line 1, in ? AttributeError: 'tuple' object has no attribute 'pop'
This gets a reference to the pop method of the list. Note that this is not calling the pop method; that would be li.pop(). This is the method itself.
|
|
This also returns a reference to the pop method, but this time, the method name is specified as a string argument to the getattr function. getattr is an incredibly useful built-in function that returns any attribute of any object. In this case, the object is a list,
and the attribute is the pop method.
|
|
In case it hasn't sunk in just how incredibly useful this is, try this: the return value of getattr is the method, which you can then call just as if you had said li.append("Moe") directly. But you didn't call the function directly; you specified the function name as a string instead.
|
|
getattr also works on dictionaries.
|
|
In theory, getattr would work on tuples, except that tuples have no methods, so getattr will raise an exception no matter what attribute name you give.
|
getattr with Modulesgetattr isn't just for built-in datatypes. It also works on modules.
getattr Function in apihelper.py>>> import odbchelper >>> odbchelper.buildConnectionString<function buildConnectionString at 00D18DD4> >>> getattr(odbchelper, "buildConnectionString")
<function buildConnectionString at 00D18DD4> >>> object = odbchelper >>> method = "buildConnectionString" >>> getattr(object, method)
<function buildConnectionString at 00D18DD4> >>> type(getattr(object, method))
<type 'function'> >>> import types >>> type(getattr(object, method)) == types.FunctionType True >>> callable(getattr(object, method))
True
This returns a reference to the buildConnectionString function in the odbchelper module, which you studied in Chapter 2, Your First Python Program. (The hex address you see is specific to my machine; your output will be different.)
|
|
Using getattr, you can get the same reference to the same function. In general, is equivalent to object.attribute. If object is a module, then attribute can be anything defined in the module: a function, class, or global variable.
|
|
And this is what you actually use in the info function. object is passed into the function as an argument; method is a string which is the name of a method or function.
|
|
In this case, method is the name of a function, which you can prove by getting its type.
|
|
| Since method is a function, it is callable. |
getattr As a DispatcherA common usage pattern of getattr is as a dispatcher. For example, if you had a program that could output data in a variety of different formats, you could
define separate functions for each output format and use a single dispatch function to call the right one.
For example, let's imagine a program that prints site statistics in HTML, XML, and plain text formats. The choice of output format could be specified on the command line, or stored in a configuration
file. A statsout module defines three functions, output_html, output_xml, and output_text. Then the main program defines a single output function, like this:
getattrimport statsout def output(data, format="text"):output_function = getattr(statsout, "output_%s" % format)
return output_function(data)
![]()
Did you see the bug in the previous example? This is a very loose coupling of strings and functions, and there is no error
checking. What happens if the user passes in a format that doesn't have a corresponding function defined in statsout? Well, getattr will return None, which will be assigned to output_function instead of a valid function, and the next line that attempts to call that function will crash and raise an exception. That's
bad.
Luckily, getattr takes an optional third argument, a default value.
getattr Default Values
import statsout
def output(data, format="text"):
output_function = getattr(statsout, "output_%s" % format, statsout.output_text)
return output_function(data)
As you can see, getattr is quite powerful. It is the heart of introspection, and you'll see even more powerful examples of it in later chapters.
As you know, Python has powerful capabilities for mapping lists into other lists, via list comprehensions (Section 3.6, “Mapping Lists”). This can be combined with a filtering mechanism, where some elements in the list are mapped while others are skipped entirely.
Here is the list filtering syntax:
[mapping-expressionforelementinsource-listiffilter-expression]
This is an extension of the list comprehensions that you know and love. The first two thirds are the same; the last part, starting with the if, is the filter expression. A filter expression can be any expression that evaluates true or false (which in Python can be almost anything). Any element for which the filter expression evaluates true will be included in the mapping. All other elements are ignored,
so they are never put through the mapping expression and are not included in the output list.
>>> li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"] >>> [elem for elem in li if len(elem) > 1]['mpilgrim', 'foo'] >>> [elem for elem in li if elem != "b"]
['a', 'mpilgrim', 'foo', 'c', 'd', 'd'] >>> [elem for elem in li if li.count(elem) == 1]
['a', 'mpilgrim', 'foo', 'c']
Let's id="apihelper.filter.care" get back to this line from apihelper.py:
methodList = [method for method in dir(object) if callable(getattr(object, method))]This looks complicated, and it is complicated, but the basic structure is the same. The whole filter expression returns a
list, which is assigned to the methodList variable. The first half of the expression is the list mapping part. The mapping expression is an identity expression,
which it returns the value of each element. returns a list of object's attributes and methods -- that's the list you're mapping. So the only new part is the filter expression after the dir(object)if.
The filter expression looks scary, but it's not. You already know about callable, getattr, and in. As you saw in the previous section, the expression getattr(object, method) returns a function object if object is a module and method is the name of a function in that module.
So this expression takes an object (named object). Then it gets a list of the names of the object's attributes, methods, functions, and a few other things. Then it filters
that list to weed out all the stuff that you don't care about. You do the weeding out by taking the name of each attribute/method/function
and getting a reference to the real thing, via the getattr function. Then you check to see if that object is callable, which will be any methods and functions, both built-in (like
the pop method of a list) and user-defined (like the buildConnectionString function of the odbchelper module). You don't care about other attributes, like the __name__ attribute that's built in to every module.
filter function.
and and orIn Python, and and or perform boolean logic as you would expect, but they do not return boolean values; instead, they return one of the actual
values they are comparing.
and>>> 'a' and 'b''b' >>> '' and 'b'
'' >>> 'a' and 'b' and 'c'
'c'
When using and, values are evaluated in a boolean context from left to right. 0, '', [], (), {}, and None are false in a boolean context; everything else is true. Well, almost everything. By default, instances of classes are
true in a boolean context, but you can define special methods in your class to make an instance evaluate to false. You'll
learn all about classes and special methods in Chapter 5. If all values are true in a boolean context, and returns the last value. In this case, and evaluates 'a', which is true, then 'b', which is true, and returns 'b'.
|
|
If any value is false in a boolean context, and returns the first false value. In this case, '' is the first false value.
|
|
All values are true, so and returns the last value, 'c'.
|
or>>> 'a' or 'b''a' >>> '' or 'b'
'b' >>> '' or [] or {}
{} >>> def sidefx(): ... print "in sidefx()" ... return 1 >>> 'a' or sidefx()
'a'
If you're a C hacker, you are certainly familiar with the bool ? a : b expression, which evaluates to a if bool is true, and b otherwise. Because of the way and and or work in Python, you can accomplish the same thing.
and-or Trickand-or Trick>>> a = "first" >>> b = "second" >>> 1 and a or b'first' >>> 0 and a or b
'second'
However, since this Python expression is simply boolean logic, and not a special construct of the language, there is one extremely important difference
between this and-or trick in Python and the bool ? a : b syntax in C. If the value of a is false, the expression will not work as you would expect it to. (Can you tell I was bitten by this? More than once?)
and-or Trick Fails>>> a = "" >>> b = "second" >>> 1 and a or b'second'
Since a is an empty string, which Python considers false in a boolean context, 1 and '' evalutes to '', and then '' or 'second' evalutes to 'second'. Oops! That's not what you wanted.
|
The and-or trick, bool and a or b, will not work like the C expression bool ? a : b when a is false in a boolean context.
The real trick behind the and-or trick, then, is to make sure that the value of a is never false. One common way of doing this is to turn a into [a] and b into [b], then taking the first element of the returned list, which will be either a or b.
and-or Trick Safely>>> a = "" >>> b = "second" >>> (1 and [a] or [b])[0]''
Since [a] is a non-empty list, it is never false. Even if a is 0 or '' or some other false value, the list [a] is true because it has one element.
|
By now, this trick may seem like more trouble than it's worth. You could, after all, accomplish the same thing with an if statement, so why go through all this fuss? Well, in many cases, you are choosing between two constant values, so you can
use the simpler syntax and not worry, because you know that the a value will always be true. And even if you need to use the more complicated safe form, there are good reasons to do so.
For example, there are some cases in Python where if statements are not allowed, such as in lambda functions.
and-or Trickand-or trick.
lambda FunctionsPython supports an interesting syntax that lets you define one-line mini-functions on the fly. Borrowed from Lisp, these so-called lambda functions can be used anywhere a function is required.
lambda Functions>>> def f(x): ... return x*2 ... >>> f(3) 6 >>> g = lambda x: x*2>>> g(3) 6 >>> (lambda x: x*2)(3)
6
To generalize, a lambda function is a function that takes any number of arguments (including optional arguments) and returns the value of a single expression. lambda functions can not contain commands, and they can not contain more than one expression. Don't try to squeeze too much into
a lambda function; if you need something more complex, define a normal function instead and make it as long as you want.
lambda functions are a matter of style. Using them is never required; anywhere you could use them, you could define a separate
normal function and use that instead. I use them in places where I want to encapsulate specific, non-reusable code without
littering my code with a lot of little one-line functions.
|
|
lambda FunctionsHere are the lambda functions in apihelper.py:
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)Notice that this uses the simple form of the and-or trick, which is okay, because a lambda function is always true in a boolean context. (That doesn't mean that a lambda function can't return a false value. The function is always true; its return value could be anything.)
Also notice that you're using the split function with no arguments. You've already seen it used with one or two arguments, but without any arguments it splits on whitespace.
split With No Arguments>>> s = "this is\na\ttest">>> print s this is a test >>> print s.split()
['this', 'is', 'a', 'test'] >>> print " ".join(s.split())
'this is a test'
This is a multiline string, defined by escape characters instead of triple quotes. \n is a carriage return, and \t is a tab character.
|
|
split without any arguments splits on whitespace. So three spaces, a carriage return, and a tab character are all the same.
|
|
You can normalize whitespace by splitting a string with split and then rejoining it with join, using a single space as a delimiter. This is what the info function does to collapse multi-line docstrings into a single line.
|
So what is the info function actually doing with these lambda functions, splits, and and-or tricks?
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)processFunc is now a function, but which function it is depends on the value of the collapse variable. If collapse is true, processFunc(string) will collapse whitespace; otherwise, processFunc(string) will return its argument unchanged.
To do this in a less robust language, like Visual Basic, you would probably create a function that took a string and a collapse argument and used an if statement to decide whether to collapse the whitespace or not, then returned the appropriate value. This would be inefficient,
because the function would need to handle every possible case. Every time you called it, it would need to decide whether
to collapse whitespace before it could give you what you wanted. In Python, you can take that decision logic out of the function and define a lambda function that is custom-tailored to give you exactly (and only) what you want. This is more efficient, more elegant, and
less prone to those nasty oh-I-thought-those-arguments-were-reversed kinds of errors.
lambda Functionslambda to call functions indirectly.
lambda function. (PEP 227 explains how this will change in future versions of Python.)
lambda.
The last line of code, the only one you haven't deconstructed yet, is the one that does all the work. But by now the work is easy, because everything you need is already set up just the way you need it. All the dominoes are in place; it's time to knock them down.
This is the meat of apihelper.py:
print "\n".join(["%s %s" %
(method.ljust(spacing),
processFunc(str(getattr(object, method).__doc__)))
for method in methodList])Note that this is one command, split over multiple lines, but it doesn't use the line continuation character (\). Remember when I said that some expressions can be split into multiple lines without using a backslash? A list comprehension is one of those expressions, since the entire expression is contained in
square brackets.
Now, let's take it from the end and work backwards. The
for method in methodList
shows that this is a list comprehension. As you know, methodList is a list of all the methods you care about in object. So you're looping through that list with method.
docstring Dynamically>>> import odbchelper >>> object = odbchelper>>> method = 'buildConnectionString'
>>> getattr(object, method)
<function buildConnectionString at 010D6D74> >>> print getattr(object, method).__doc__
Build a connection string from a dictionary of parameters. Returns string.
In the info function, object is the object you're getting help on, passed in as an argument.
|
|
| As you're looping through methodList, method is the name of the current method. | |
Using the getattr function, you're getting a reference to the method function in the object module.
|
|
Now, printing the actual docstring of the method is easy.
|
The next piece of the puzzle is the use of str around the docstring. As you may recall, str is a built-in function that coerces data into a string. But a docstring is always a string, so why bother with the str function? The answer is that not every function has a docstring, and if it doesn't, its __doc__ attribute is None.
str on a docstring?>>> >>> def foo(): print 2 >>> >>> foo() 2 >>> >>> foo.__doc__>>> foo.__doc__ == None
True >>> str(foo.__doc__)
'None'
In SQL, you must use IS NULL instead of = NULL to compare a null value. In Python, you can use either == None or is None, but is None is faster.
|
|
Now that you are guaranteed to have a string, you can pass the string to processFunc, which you have already defined as a function that either does or doesn't collapse whitespace. Now you see why it was important to use str to convert a None value into a string representation. processFunc is assuming a string argument and calling its split method, which would crash if you passed it None because None doesn't have a split method.
Stepping back even further, you see that you're using string formatting again to concatenate the return value of processFunc with the return value of method's ljust method. This is a new string method that you haven't seen before.
ljust>>> s = 'buildConnectionString' >>> s.ljust(30)'buildConnectionString ' >>> s.ljust(20)
'buildConnectionString'
You're almost finished. Given the padded method name from the ljust method and the (possibly collapsed) docstring from the call to processFunc, you concatenate the two and get a single string. Since you're mapping methodList, you end up with a list of strings. Using the join method of the string "\n", you join this list into a single string, with each element of the list on a separate line, and print the result.
>>> li = ['a', 'b', 'c'] >>> print "\n".join(li)a b c
| This is also a useful debugging trick when you're working with lists. And in Python, you're always working with lists. |
That's the last piece of the puzzle. You should now understand this code.
print "\n".join(["%s %s" %
(method.ljust(spacing),
processFunc(str(getattr(object, method).__doc__)))
for method in methodList])The apihelper.py program and its output should now make perfect sense.
def info(object, spacing=10, collapse=1):
"""Print methods and docstrings.
Takes module, class, list, dictionary, or string."""
methodList = [method for method in dir(object) if callable(getattr(object, method))]
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
print "\n".join(["%s %s" %
(method.ljust(spacing),
processFunc(str(getattr(object, method).__doc__)))
for method in methodList])
if __name__ == "__main__":
print info.__doc__Here is the output of apihelper.py:
>>> from apihelper import info >>> li = [] >>> info(li) append L.append(object) -- append object to end count L.count(value) -> integer -- return number of occurrences of value extend L.extend(list) -- extend list by appending list elements index L.index(value) -> integer -- return index of first occurrence of value insert L.insert(index, object) -- insert object before index pop L.pop([index]) -> item -- remove and return item at index (default last) remove L.remove(value) -- remove first occurrence of value reverse L.reverse() -- reverse *IN PLACE* sort L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1, 0, 1
Before diving into the next chapter, make sure you're comfortable doing all of these things:
str to coerce any arbitrary value into a string representation
getattr to get references to functions and other attributes dynamically
and-or trick and using it safely
lambda functions
This chapter, and pretty much every chapter after this, deals with object-oriented Python programming.
Here is a complete, working Python program. Read the docstrings of the module, the classes, and the functions to get an overview of what this program does and how it works. As usual, don't
worry about the stuff you don't understand; that's what the rest of the chapter is for.
fileinfo.pyIf you have not already done so, you can download this and other examples used in this book.
"""Framework for getting filetype-specific metadata.
Instantiate appropriate class with filename. Returned object acts like a
dictionary, with key-value pairs for each piece of metadata.
import fileinfo
info = fileinfo.MP3FileInfo("/music/ap/mahadeva.mp3")
print "\\n".join(["%s=%s" % (k, v) for k, v in info.items()])
Or use listDirectory function to get info on all files in a directory.
for info in fileinfo.listDirectory("/music/ap/", [".mp3"]):
...
Framework can be extended by adding classes for particular file types, e.g.
HTMLFileInfo, MPGFileInfo, DOCFileInfo. Each class is completely responsible for
parsing its files appropriately; see MP3FileInfo for example.
"""
import os
import sys
from UserDict import UserDict
def stripnulls(data):
"strip whitespace and nulls"
return data.replace("\00", "").strip()
class FileInfo(UserDict):
"store file metadata"
def __init__(self, filename=None):
UserDict.__init__(self)
self["name"] = filename
class MP3FileInfo(FileInfo):
"store ID3v1.0 MP3 tags"
tagDataMap = {"title" : ( 3, 33, stripnulls),
"artist" : ( 33, 63, stripnulls),
"album" : ( 63, 93, stripnulls),
"year" : ( 93, 97, stripnulls),
"comment" : ( 97, 126, stripnulls),
"genre" : (127, 128, ord)}
def __parse(self, filename):
"parse ID3v1.0 tags from MP3 file"
self.clear()
try:
fsock = open(filename, "rb", 0)
try:
fsock.seek(-128, 2)
tagdata = fsock.read(128)
finally:
fsock.close()
if tagdata[:3] == "TAG":
for tag, (start, end, parseFunc) in self.tagDataMap.items():
self[tag] = parseFunc(tagdata[start:end])
except IOError:
pass
def __setitem__(self, key, item):
if key == "name" and item:
self.__parse(item)
FileInfo.__setitem__(self, key, item)
def listDirectory(directory, fileExtList):
"get list of file info objects for files of particular extensions"
fileList = [os.path.normcase(f)
for f in os.listdir(directory)]
fileList = [os.path.join(directory, f)
for f in fileList
if os.path.splitext(f)[1] in fileExtList]
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
"get file info class from filename extension"
subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
return [getFileInfoClass(f)(f) for f in fileList]
if __name__ == "__main__":
for info in listDirectory("/music/_singles/", [".mp3"]):
print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
print| This program's output depends on the files on your hard drive. To get meaningful output, you'll need to change the directory path to point to a directory of MP3 files on your own machine. |
This is the output I got on my machine. Your output will be different, unless, by some startling coincidence, you share my exact taste in music.
album= artist=Ghost in the Machine title=A Time Long Forgotten (Concept genre=31 name=/music/_singles/a_time_long_forgotten_con.mp3 year=1999 comment=http://mp3.com/ghostmachine album=Rave Mix artist=***DJ MARY-JANE*** title=HELLRAISER****Trance from Hell genre=31 name=/music/_singles/hellraiser.mp3 year=2000 comment=http://mp3.com/DJMARYJANE album=Rave Mix artist=***DJ MARY-JANE*** title=KAIRO****THE BEST GOA genre=31 name=/music/_singles/kairo.mp3 year=2000 comment=http://mp3.com/DJMARYJANE album=Journeys artist=Masters of Balance title=Long Way Home genre=31 name=/music/_singles/long_way_home1.mp3 year=2000 comment=http://mp3.com/MastersofBalan album= artist=The Cynic Project title=Sidewinder genre=18 name=/music/_singles/sidewinder.mp3 year=2000 comment=http://mp3.com/cynicproject album=Digitosis@128k artist=VXpanded title=Spinning genre=255 name=/music/_singles/spinning.mp3 year=2000 comment=http://mp3.com/artists/95/vxp
from module importPython has two ways of importing modules. Both are useful, and you should know when to use each. One way, import module, you've already seen in Section 2.4, “Everything Is an Object”. The other way accomplishes the same thing, but it has subtle and important differences.
Here is the basic from module import syntax:
from UserDict import UserDict
This is similar to the import module syntax that you know and love, but with an important difference: the attributes and methods of the imported module types are imported directly into the local namespace, so they are available directly, without qualification by module name. You
can import individual items or use from module import * to import everything.
from module import * in Python is like use module in Perl; import module in Python is like require module in Perl.
|
|
from module import * in Python is like import module.* in Java; import module in Python is like import module in Java.
|
|
import module vs. from module import>>> import types >>> types.FunctionType<type 'function'> >>> FunctionType
Traceback (innermost last): File "<interactive input>", line 1, in ? NameError: There is no variable named 'FunctionType' >>> from types import FunctionType
>>> FunctionType
<type 'function'>
When should you use from module import?
from module import.
from module import.
import module to avoid name conflicts.
Other than that, it's just a matter of style, and you will see Python code written both ways.
Use from module import * sparingly, because it makes it difficult to determine where a particular function or attribute came from, and that makes
debugging and refactoring more difficult.
|
|
import module vs. from module import.
from module import *.
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you've defined.
Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word class, followed by the class name. Technically, that's all that's required, since a class doesn't need to inherit from any other
class.
class Loaf:pass
![]()
The pass statement in Python is like an empty set of braces ({}) in Java or C.
|
|
Of course, realistically, most classes will be inherited from other classes, and they will define their own class methods
and attributes. But as you've just seen, there is nothing that a class absolutely must have, other than a name. In particular,
C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Python classes do have something similar to a constructor: the __init__ method.
FileInfo Classfrom UserDict import UserDict class FileInfo(UserDict):
In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. So the FileInfo class is inherited from the UserDict class (which was imported from the UserDict module). UserDict is a class that acts like a dictionary, allowing you to essentially subclass the dictionary datatype and add your own behavior.
(There are similar classes UserList and UserString which allow you to subclass lists and strings.) There is a bit of black magic behind this, which you will demystify later
in this chapter when you explore the UserDict class in more depth.
|
In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. There is no special keyword like
extends in Java.
|
|
Python supports multiple inheritance. In the parentheses following the class name, you can list as many ancestor classes as you like, separated by commas.
This example shows the initialization of the FileInfo class using the __init__ method.
FileInfo Class
class FileInfo(UserDict):
"store file metadata"
def __init__(self, filename=None):

Classes can (and should) have docstrings too, just like modules and functions.
|
|
__init__ is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor
of the class. It's tempting, because it looks like a constructor (by convention, __init__ is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance
of the class), and even sounds like one (“init” certainly suggests a constructor-ish nature). Incorrect, because the object has already been constructed by the time __init__ is called, and you already have a valid reference to the new instance of the class. But __init__ is the closest thing you're going to get to a constructor in Python, and it fills much the same role.
|
|
The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always named self. In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although
you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.
|
|
__init__ methods can take any number of arguments, and just like functions, the arguments can be defined with default values, making
them optional to the caller. In this case, filename has a default value of None, which is the Python null value.
|
By convention, the first argument of any Python class method (the reference to the current instance) is called self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.
|
|
FileInfo Class
class FileInfo(UserDict):
"store file metadata"
def __init__(self, filename=None):
UserDict.__init__(self)
self["name"] = filename

self and __init__When defining your class methods, you must explicitly list self as the first argument for each method, including __init__. When you call a method of an ancestor class from within your class, you must include the self argument. But when you call your class method from outside, you do not specify anything for the self argument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent,
but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don't know
about yet.
Whew. I realize that's a lot to absorb, but you'll get the hang of it. All Python classes work the same way, so once you learn one, you've learned them all. If you forget everything else, remember this one thing, because I promise it will trip you up:
__init__ methods are optional, but when you define one, you must remember to explicitly call the ancestor's __init__ method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor,
the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.
|
|
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the
__init__ method defines. The return value will be the newly created object.
FileInfo Instance>>> import fileinfo
>>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3")
>>> f.__class__
<class fileinfo.FileInfo at 010EC204>
>>> f.__doc__
'store file metadata'
>>> f
{'name': '/music/_singles/kairo.mp3'}You are creating an instance of the FileInfo class (defined in the fileinfo module) and assigning the newly created instance to the variable f. You are passing one parameter, /music/_singles/kairo.mp3, which will end up as the filename argument in FileInfo's __init__ method.
|
|
Every class instance has a built-in attribute, __class__, which is the object's class. (Note that the representation of this includes the physical address of the instance on my
machine; your representation will be different.) Java programmers may be familiar with the Class class, which contains methods like getName and getSuperclass to get metadata information about an object. In Python, this kind of metadata is available directly on the object itself through attributes like __class__, __name__, and __bases__.
|
|
You can access the instance's docstring just as with a function or a module. All instances of a class share the same docstring.
|
|
Remember when the __init__ method assigned its filename argument to self["name"]? Well, here's the result. The arguments you pass when you create the class instance get sent right along to the __init__ method (along with the object reference, self, which Python adds for free).
|
In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit new operator like C++ or Java.
|
|
If creating new instances is easy, destroying them is even easier. In general, there is no need to explicitly free instances, because they are freed automatically when the variables assigned to them go out of scope. Memory leaks are rare in Python.
>>> def leakmem():
... f = fileinfo.FileInfo('/music/_singles/kairo.mp3')
...
>>> for i in range(100):
... leakmem() 
The technical term for this form of garbage collection is “reference counting”. Python keeps a list of references to every instance created. In the above example, there was only one reference to the FileInfo instance: the local variable f. When the function ends, the variable f goes out of scope, so the reference count drops to 0, and Python destroys the instance automatically.
In previous versions of Python, there were situations where reference counting failed, and Python couldn't clean up after you. If you created two instances that referenced each other (for instance, a doubly-linked list, where each node has a pointer to the previous and next node in the list), neither instance would ever be destroyed automatically because Python (correctly) believed that there is always a reference to each instance. Python 2.0 has an additional form of garbage collection called “mark-and-sweep” which is smart enough to notice this virtual gridlock and clean up circular references correctly.
As a former philosophy major, it disturbs me to think that things disappear when no one is looking at them, but that's exactly what happens in Python. In general, you can simply forget about memory management and let Python clean up after you.
__class__.
gc module, which gives you low-level control over Python's garbage collection.
UserDict: A Wrapper ClassAs you've seen, FileInfo is a class that acts like a dictionary. To explore this further, let's look at the UserDict class in the UserDict module, which is the ancestor of the FileInfo class. This is nothing special; the class is written in Python and stored in a .py file, just like any other Python code. In particular, it's stored in the lib directory in your Python installation.
| In the ActivePython IDE on Windows, you can quickly open any module in your library path by selecting File->Locate... (Ctrl-L). | |
UserDict Classclass UserDict:def __init__(self, dict=None):
self.data = {}
if dict is not None: self.update(dict)
![]()
![]()
Note that UserDict is a base class, not inherited from any other class.
|
|
This is the __init__ method that you overrode in the FileInfo class. Note that the argument list in this ancestor class is different than the descendant. That's okay; each subclass can have
its own set of arguments, as long as it calls the ancestor with the correct arguments. Here the ancestor class has a way
to define initial values (by passing a dictionary in the dict argument) which the FileInfo does not use.
|
|
Python supports data attributes (called “instance variables” in Java and Powerbuilder, and “member variables” in C++). Data attributes are pieces of data held by a specific instance of a class. In this case, each instance of UserDict will have a data attribute data. To reference this attribute from code outside the class, you qualify it with the instance name, instance.data, in the same way that you qualify a function with its module name. To reference a data attribute from within the class,
you use self as the qualifier. By convention, all data attributes are initialized to reasonable values in the __init__ method. However, this is not required, since data attributes, like local variables, spring into existence when they are first assigned a value.
|
|
The update method is a dictionary duplicator: it copies all the keys and values from one dictionary to another. This does not clear the target dictionary first; if the target dictionary already has some keys, the ones from the source dictionary will
be overwritten, but others will be left untouched. Think of update as a merge function, not a copy function.
|
|
This is a syntax you may not have seen before (I haven't used it in the examples in this book). It's an if statement, but instead of having an indented block starting on the next line, there is just a single statement on the same
line, after the colon. This is perfectly legal syntax, which is just a shortcut you can use when you have only one statement
in a block. (It's like specifying a single statement without braces in C++.) You can use this syntax, or you can have indented code on subsequent lines, but you can't do both for the same block.
|
Java and Powerbuilder support function overloading by argument list, i.e. one class can have multiple methods with the same name but a different number of arguments, or arguments of different types.
Other languages (most notably PL/SQL) even support function overloading by argument name; i.e. one class can have multiple methods with the same name and the same number of arguments of the same type but different argument
names. Python supports neither of these; it has no form of function overloading whatsoever. Methods are defined solely by their name,
and there can be only one method per class with a given name. So if a descendant class has an __init__ method, it always overrides the ancestor __init__ method, even if the descendant defines it with a different argument list. And the same rule applies to any other method.
|
|
| Guido, the original author of Python, explains method overriding this way: "Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class, may in fact end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)" If that doesn't make sense to you (it confuses the hell out of me), feel free to ignore it. I just thought I'd pass it along. | |
Always assign an initial value to all of an instance's data attributes in the __init__ method. It will save you hours of debugging later, tracking down AttributeError exceptions because you're referencing uninitialized (and therefore non-existent) attributes.
|
|
UserDict Normal Methods
def clear(self): self.data.clear()
def copy(self):
if self.__class__ is UserDict:
return UserDict(self.data)
import copy
return copy.copy(self)
def keys(self): return self.data.keys()
def items(self): return self.data.items()
def values(self): return self.data.values()
clear is a normal class method; it is publicly available to be called by anyone at any time. Notice that clear, like all class methods, has self as its first argument. (Remember that you don't include self when you call the method; it's something that Python adds for you.) Also note the basic technique of this wrapper class: store a real dictionary (data) as a data attribute, define all the methods that a real dictionary has, and have each class method redirect to the corresponding
method on the real dictionary. (In case you'd forgotten, a dictionary's clear method deletes all of its keys and their associated values.)
|
|
The copy method of a real dictionary returns a new dictionary that is an exact duplicate of the original (all the same key-value pairs).
But UserDict can't simply redirect to self.data.copy, because that method returns a real dictionary, and what you want is to return a new instance that is the same class as self.
|
|
You use the __class__ attribute to see if self is a UserDict; if so, you're golden, because you know how to copy a UserDict: just create a new UserDict and give it the real dictionary that you've squirreled away in self.data. Then you immediately return the new UserDict you don't even get to the import copy on the next line.
|
|
If self.__class__ is not UserDict, then self must be some subclass of UserDict (like maybe FileInfo), in which case life gets trickier. UserDict doesn't know how to make an exact copy of one of its descendants; there could, for instance, be other data attributes defined
in the subclass, so you would need to iterate through them and make sure to copy all of them. Luckily, Python comes with a module to do exactly this, and it's called copy. I won't go into the details here (though it's a wicked cool module, if you're ever inclined to dive into it on your own).
Suffice it to say that copy can copy arbitrary Python objects, and that's how you're using it here.
|
|
| The rest of the methods are straightforward, redirecting the calls to the built-in methods on self.data. |
In versions of Python prior to 2.2, you could not directly subclass built-in datatypes like strings, lists, and dictionaries. To compensate for
this, Python comes with wrapper classes that mimic the behavior of these built-in datatypes: UserString, UserList, and UserDict. Using a combination of normal and special methods, the UserDict class does an excellent imitation of a dictionary. In Python 2.2 and later, you can inherit classes directly from built-in datatypes like dict. An example of this is given in the examples that come with this book, in fileinfo_fromdict.py.
|
|
In Python, you can inherit directly from the dict built-in datatype, as shown in this example. There are three differences here compared to the UserDict version.
dictclass FileInfo(dict):"store file metadata" def __init__(self, filename=None):
self["name"] = filename
UserDictUserDict module and the copy module.
In addition to normal class methods, there are a number of special methods that Python classes can define. Instead of being called directly by your code (like normal methods), special methods are called for you by Python in particular circumstances or when specific syntax is used.
As you saw in the previous section, normal methods go a long way towards wrapping a dictionary in a class. But normal methods alone are not enough, because there are a lot of things you can do with dictionaries besides call methods on them. For starters, you can get and set items with a syntax that doesn't include explicitly invoking methods. This is where special class methods come in: they provide a way to map non-method-calling syntax into method calls.
__getitem__ Special Method
def __getitem__(self, key): return self.data[key]>>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3")
>>> f
{'name':'/music/_singles/kairo.mp3'}
>>> f.__getitem__("name")
'/music/_singles/kairo.mp3'
>>> f["name"]
'/music/_singles/kairo.mp3'The __getitem__ special method looks simple enough. Like the normal methods clear, keys, and values, it just redirects to the dictionary to return its value. But how does it get called? Well, you can call __getitem__ directly, but in practice you wouldn't actually do that; I'm just doing it here to show you how it works. The right way
to use __getitem__ is to get Python to call it for you.
|
|
This looks just like the syntax you would use to get a dictionary value, and in fact it returns the value you would expect. But here's the missing link: under the covers, Python has converted this syntax to the method call f.__getitem__("name"). That's why __getitem__ is a special class method; not only can you call it yourself, you can get Python to call it for you by using the right syntax.
|
Of course, Python has a __setitem__ special method to go along with __getitem__, as shown in the next example.
__setitem__ Special Method
def __setitem__(self, key, item): self.data[key] = item>>> f
{'name':'/music/_singles/kairo.mp3'}
>>> f.__setitem__("genre", 31)
>>> f
{'name':'/music/_singles/kairo.mp3', 'genre':31}
>>> f["genre"] = 32
>>> f
{'name':'/music/_singles/kairo.mp3', 'genre':32}__setitem__ is a special class method because it gets called for you, but it's still a class method. Just as easily as the __setitem__ method was defined in UserDict, you can redefine it in the descendant class to override the ancestor method. This allows you to define classes that act
like dictionaries in some ways but define their own behavior above and beyond the built-in dictionary.
This concept is the basis of the entire framework you're studying in this chapter. Each file type can have a handler class
that knows how to get metadata from a particular type of file. Once some attributes (like the file's name and location) are
known, the handler class knows how to derive other attributes automatically. This is done by overriding the __setitem__ method, checking for particular keys, and adding additional processing when they are found.
For example, MP3FileInfo is a descendant of FileInfo. When an MP3FileInfo's name is set, it doesn't just set the name key (like the ancestor FileInfo does); it also looks in the file itself for MP3 tags and populates a whole set of keys. The next example shows how this works.
__setitem__ in MP3FileInfo
def __setitem__(self, key, item):
if key == "name" and item:
self.__parse(item)
FileInfo.__setitem__(self, key, item) 
Notice that this __setitem__ method is defined exactly the same way as the ancestor method. This is important, since Python will be calling the method for you, and it expects it to be defined with a certain number of arguments. (Technically speaking,
the names of the arguments don't matter; only the number of arguments is important.)
|
|
Here's the crux of the entire MP3FileInfo class: if you're assigning a value to the name key, you want to do something extra.
|
|
The extra processing you do for names is encapsulated in the __parse method. This is another class method defined in MP3FileInfo, and when you call it, you qualify it with self. Just calling __parse would look for a normal function defined outside the class, which is not what you want. Calling self.__parse will look for a class method defined within the class. This isn't anything new; you reference data attributes the same way.
|
|
After doing this extra processing, you want to call the ancestor method. Remember that this is never done for you in Python; you must do it manually. Note that you're calling the immediate ancestor, FileInfo, even though it doesn't have a __setitem__ method. That's okay, because Python will walk up the ancestor tree until it finds a class with the method you're calling, so this line of code will eventually
find and call the __setitem__ defined in UserDict.
|
When accessing data attributes within a class, you need to qualify the attribute name: self.attribute. When calling other methods within a class, you need to qualify the method name: self.method.
|
|
MP3FileInfo's name>>> import fileinfo >>> mp3file = fileinfo.MP3FileInfo()>>> mp3file {'name':None} >>> mp3file["name"] = "/music/_singles/kairo.mp3"
>>> mp3file {'album': 'Rave Mix', 'artist': '***DJ MARY-JANE***', 'genre': 31, 'title': 'KAIRO****THE BEST GOA', 'name': '/music/_singles/kairo.mp3', 'year': '2000', 'comment': 'http://mp3.com/DJMARYJANE'} >>> mp3file["name"] = "/music/_singles/sidewinder.mp3"
>>> mp3file {'album': '', 'artist': 'The Cynic Project', 'genre': 18, 'title': 'Sidewinder', 'name': '/music/_singles/sidewinder.mp3', 'year': '2000', 'comment': 'http://mp3.com/cynicproject'}
First, you create an instance of MP3FileInfo, without passing it a filename. (You can get away with this because the filename argument of the __init__ method is optional.) Since MP3FileInfo has no __init__ method of its own, Python walks up the ancestor tree and finds the __init__ method of FileInfo. This __init__ method manually calls the __init__ method of UserDict and then sets the name key to filename, which is None, since you didn't pass a filename. Thus, mp3file initially looks like a dictionary with one key, name, whose value is None.
|
|
Now the real fun begins. Setting the name key of mp3file triggers the __setitem__ method on MP3FileInfo (not UserDict), which notices that you're setting the name key with a real value and calls self.__parse. Although you haven't traced through the __parse method yet, you can see from the output that it sets several other keys: album, artist, genre, title, year, and comment.
|
|
Modifying the name key will go through the same process again: Python calls __setitem__, which calls self.__parse, which sets all the other keys.
|
Python has more special methods than just __getitem__ and __setitem__. Some of them let you emulate functionality that you may not even know about.
This example shows some of the other special methods in UserDict.
UserDict
def __repr__(self): return repr(self.data)
def __cmp__(self, dict):
if isinstance(dict, UserDict):
return cmp(self.data, dict.data)
else:
return cmp(self.data, dict)
def __len__(self): return len(self.data)
def __delitem__(self, key): del self.data[key] 
__repr__ is a special method that is called when you call repr(instance). The repr function is a built-in function that returns a string representation of an object. It works on any object, not just class
instances. You're already intimately familiar with repr and you don't even know it. In the interactive window, when you type just a variable name and press the ENTER key, Python uses repr to display the variable's value. Go create a dictionary d with some data and then print repr(d) to see for yourself.
|
|
__cmp__ is called when you compare class instances. In general, you can compare any two Python objects, not just class instances, by using ==. There are rules that define when built-in datatypes are considered equal; for instance, dictionaries are equal when they
have all the same keys and values, and strings are equal when they are the same length and contain the same sequence of characters.
For class instances, you can define the __cmp__ method and code the comparison logic yourself, and then you can use == to compare instances of your class and Python will call your __cmp__ special method for you.
|
|
__len__ is called when you call len(instance). The len function is a built-in function that returns the length of an object. It works on any object that could reasonably be thought
of as having a length. The len of a string is its number of characters; the len of a dictionary is its number of keys; the len of a list or tuple is its number of elements. For class instances, define the __len__ method and code the length calculation yourself, and then call len(instance) and Python will call your __len__ special method for you.
|
|
__delitem__ is called when you call del instance[key], which you may remember as the way to delete individual items from a dictionary. When you use del on a class instance, Python calls the __delitem__ special method for you.
|
In Java, you determine whether two string variables reference the same physical memory location by using str1 == str2. This is called object identity, and it is written in Python as str1 is str2. To compare string values in Java, you would use str1.equals(str2); in Python, you would use str1 == str2. Java programmers who have been taught to believe that the world is a better place because == in Java compares by identity instead of by value may have a difficult time adjusting to Python's lack of such “gotchas”.
|
|
At this point, you may be thinking, “All this work just to do something in a class that I can do with a built-in datatype.” And it's true that life would be easier (and the entire UserDict class would be unnecessary) if you could inherit from built-in datatypes like a dictionary. But even if you could, special
methods would still be useful, because they can be used in any class, not just wrapper classes like UserDict.
Special methods mean that any class can store key/value pairs like a dictionary, just by defining the __setitem__ method. Any class can act like a sequence, just by defining the __getitem__ method. Any class that defines the __cmp__ method can be compared with ==. And if your class represents something that has a length, don't define a GetLength method; define the __len__ method and use len(instance).
While other object-oriented languages only let you define the physical model of an object (“this object has a GetLength method”), Python's special class methods like __len__ allow you to define the logical model of an object (“this object has a length”).
|
|
Python has a lot of other special methods. There's a whole set of them that let classes act like numbers, allowing you to add,
subtract, and do other arithmetic operations on class instances. (The canonical example of this is a class that represents
complex numbers, numbers with both real and imaginary components.) The __call__ method lets a class act like a function, allowing you to call a class instance directly. And there are other special methods
that allow classes to have read-only and write-only data attributes; you'll talk more about those in later chapters.
You already know about data attributes, which are variables owned by a specific instance of a class. Python also supports class attributes, which are variables owned by the class itself.
class MP3FileInfo(FileInfo):
"store ID3v1.0 MP3 tags"
tagDataMap = {"title" : ( 3, 33, stripnulls),
"artist" : ( 33, 63, stripnulls),
"album" : ( 63, 93, stripnulls),
"year" : ( 93, 97, stripnulls),
"comment" : ( 97, 126, stripnulls),
"genre" : (127, 128, ord)}>>> import fileinfo >>> fileinfo.MP3FileInfo<class fileinfo.MP3FileInfo at 01257FDC> >>> fileinfo.MP3FileInfo.tagDataMap
{'title': (3, 33, <function stripnulls at 0260C8D4>), 'genre': (127, 128, <built-in function ord>), 'artist': (33, 63, <function stripnulls at 0260C8D4>), 'year': (93, 97, <function stripnulls at 0260C8D4>), 'comment': (97, 126, <function stripnulls at 0260C8D4>), 'album': (63, 93, <function stripnulls at 0260C8D4>)} >>> m = fileinfo.MP3FileInfo()
>>> m.tagDataMap {'title': (3, 33, <function stripnulls at 0260C8D4>), 'genre': (127, 128, <built-in function ord>), 'artist': (33, 63, <function stripnulls at 0260C8D4>), 'year': (93, 97, <function stripnulls at 0260C8D4>), 'comment': (97, 126, <function stripnulls at 0260C8D4>), 'album': (63, 93, <function stripnulls at 0260C8D4>)}
In Java, both static variables (called class attributes in Python) and instance variables (called data attributes in Python) are defined immediately after the class definition (one with the static keyword, one without). In Python, only class attributes can be defined here; data attributes are defined in the __init__ method.
|
|
Class attributes can be used as class-level constants (which is how you use them in MP3FileInfo), but they are not really constants. You can also change them.
There are no constants in Python. Everything can be changed if you try hard enough. This fits with one of the core principles of Python: bad behavior should be discouraged but not banned. If you really want to change the value of None, you can do it, but don't come running to me when your code is impossible to debug.
|
|
>>> class counter: ... count = 0... def __init__(self): ... self.__class__.count += 1
... >>> counter <class __main__.counter at 010EAECC> >>> counter.count
0 >>> c = counter() >>> c.count
1 >>> counter.count 1 >>> d = counter()
>>> d.count 2 >>> c.count 2 >>> counter.count 2
Like most languages, Python has the concept of private elements:
Unlike in most languages, whether a Python function, method, or attribute is private or public is determined entirely by its name.
If the name of a Python function, class method, or attribute starts with (but doesn't end with) two underscores, it's private; everything else is public. Python has no concept of protected class methods (accessible only in their own class and descendant classes). Class methods are either private (accessible only in their own class) or public (accessible from anywhere).
In MP3FileInfo, there are two methods: __parse and __setitem__. As you have already discussed, __setitem__ is a special method; normally, you would call it indirectly by using the dictionary syntax on a class instance, but it is public, and you could
call it directly (even from outside the fileinfo module) if you had a really good reason. However, __parse is private, because it has two underscores at the beginning of its name.
In Python, all special methods (like __setitem__) and built-in attributes (like __doc__) follow a standard naming convention: they both start with and end with two underscores. Don't name your own methods and
attributes this way, because it will only confuse you (and others) later.
|
|
>>> import fileinfo
>>> m = fileinfo.MP3FileInfo()
>>> m.__parse("/music/_singles/kairo.mp3")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
AttributeError: 'MP3FileInfo' instance has no attribute '__parse'That's it for the hard-core object trickery. You'll see a real-world application of special class methods in Chapter 12, which uses getattr to create a proxy to a remote web service.
The next chapter will continue using this code sample to explore other Python concepts, such as exceptions, file objects, and for loops.
Before diving into the next chapter, make sure you're comfortable doing all of these things:
import module or from module import
__init__ methods and other special class methods, and understanding when they are called
UserDict to define classes that act like dictionaries
In this chapter, you will dive into exceptions, file objects, for loops, and the os and sys modules. If you've used exceptions in another programming language, you can skim the first section to get a sense of Python's syntax. Be sure to tune in again for file handling.
Like many other programming languages, Python has exception handling via try...except blocks.
Python uses try...except to handle exceptions and raise to generate them. Java and C++ use try...catch to handle exceptions, and throw to generate them.
|
|
Exceptions are everywhere in Python. Virtually every module in the standard Python library uses them, and Python itself will raise them in a lot of different circumstances. You've already seen them repeatedly throughout this book.
KeyError exception.
ValueError exception.
AttributeError exception.
NameError exception.
TypeError exception.
In each of these cases, you were simply playing around in the Python IDE: an error occurred, the exception was printed (depending on your IDE, perhaps in an intentionally jarring shade of red), and that was that. This is called an unhandled exception. When the exception was raised, there was no code to explicitly notice it and deal with it, so it bubbled its way back to the default behavior built in to Python, which is to spit out some debugging information and give up. In the IDE, that's no big deal, but if that happened while your actual Python program was running, the entire program would come to a screeching halt.
An exception doesn't need result in a complete program crash, though. Exceptions, when raised, can be handled. Sometimes an exception is really because you have a bug in your code (like accessing a variable that doesn't exist), but
many times, an exception is something you can anticipate. If you're opening a file, it might not exist. If you're connecting
to a database, it might be unavailable, or you might not have the correct security credentials to access it. If you know
a line of code may raise an exception, you should handle the exception using a try...except block.
>>> fsock = open("/notthere", "r")
Traceback (innermost last):
File "<interactive input>", line 1, in ?
IOError: [Errno 2] No such file or directory: '/notthere'
>>> try:
... fsock = open("/notthere")
... except IOError:
... print "The file does not exist, exiting gracefully"
... print "This line will always print"
The file does not exist, exiting gracefully
This line will always printExceptions may seem unfriendly (after all, if you don't catch the exception, your entire program will crash), but consider the alternative. Would you rather get back an unusable file object to a non-existent file? You'd need to check its validity somehow anyway, and if you forgot, somewhere down the line, your program would give you strange errors somewhere down the line that you would need to trace back to the source. I'm sure you've experienced this, and you know it's not fun. With exceptions, errors occur immediately, and you can handle them in a standard way at the source of the problem.
There are a lot of other uses for exceptions besides handling actual error conditions. A common use in the standard Python library is to try to import a module, and then check whether it worked. Importing a module that does not exist will raise
an ImportError exception. You can use this to define multiple levels of functionality based on which modules are available at run-time,
or to support multiple platforms (where platform-specific code is separated into different modules).
You can also define your own exceptions by creating a class that inherits from the built-in Exception class, and then raise your exceptions with the raise command. See the further reading section if you're interested in doing this.
The next example demonstrates how to use an exception to support platform-specific functionality. This code comes from the
getpass module, a wrapper module for getting a password from the user. Getting a password is accomplished differently on UNIX, Windows, and Mac OS platforms, but this code encapsulates all of those differences.
# Bind the name getpass to the appropriate function
try:
import termios, TERMIOS
except ImportError:
try:
import msvcrt
except ImportError:
try:
from EasyDialogs import AskPassword
except ImportError:
getpass = default_getpass
else:
getpass = AskPassword
else:
getpass = win_getpass
else:
getpass = unix_getpasstraceback module, which provides low-level access to exception attributes after an exception is raised.
try...except block.
Python has a built-in function, open, for opening a file on disk. open returns a file object, which has methods and attributes for getting information about and manipulating the opened file.
>>> f = open("/music/_singles/kairo.mp3", "rb")
>>> f
<open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
>>> f.mode
'rb'
>>> f.name
'/music/_singles/kairo.mp3'The open method can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename,
is required; the other two are optional. If not specified, the file is opened for reading in text mode. Here you are opening the file for reading in binary mode.
(print open.__doc__ displays a great explanation of all the possible modes.)
|
|
The open function returns an object (by now, this should not surprise you). A file object has several useful attributes.
|
|
| The mode attribute of a file object tells you in which mode the file was opened. | |
| The name attribute of a file object tells you the name of the file that the file object has open. |
After you open a file, the first thing you'll want to do is read from it, as shown in the next example.
>>> f <open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988> >>> f.tell()0 >>> f.seek(-128, 2)
>>> f.tell()
7542909 >>> tagData = f.read(128)
>>> tagData 'TAGKAIRO****THE BEST GOA ***DJ MARY-JANE*** Rave Mix 2000http://mp3.com/DJMARYJANE \037' >>> f.tell()
7543037
Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It's important to close files as soon as you're finished with them.
>>> f <open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988> >>> f.closedFalse >>> f.close()
>>> f <closed file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988> >>> f.closed
True >>> f.seek(0)
Traceback (innermost last): File "<interactive input>", line 1, in ? ValueError: I/O operation on closed file >>> f.tell() Traceback (innermost last): File "<interactive input>", line 1, in ? ValueError: I/O operation on closed file >>> f.read() Traceback (innermost last): File "<interactive input>", line 1, in ? ValueError: I/O operation on closed file >>> f.close()
The closed attribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (closed is False).
|
|
To close a file, call the close method of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any)
that the system hadn't gotten around to actually writing yet, and releases the system resources.
|
|
| The closed attribute confirms that the file is closed. | |
| Just because a file is closed doesn't mean that the file object ceases to exist. The variable f will continue to exist until it goes out of scope or gets manually deleted. However, none of the methods that manipulate an open file will work once the file has been closed; they all raise an exception. | |
Calling close on a file object whose file is already closed does not raise an exception; it fails silently.
|
Now you've seen enough to understand the file handling code in the fileinfo.py sample code from teh previous chapter. This example shows how to safely open and read from a file and gracefully handle
errors.
MP3FileInfo
try:
fsock = open(filename, "rb", 0)
try:
fsock.seek(-128, 2)
tagdata = fsock.read(128)
finally:
fsock.close()
.
.
.
except IOError:
pass Because opening and reading files is risky and may raise an exception, all of this code is wrapped in a try...except block. (Hey, isn't standardized indentation great? This is where you start to appreciate it.)
|
|
The open function may raise an IOError. (Maybe the file doesn't exist.)
|
|
The seek method may raise an IOError. (Maybe the file is smaller than 128 bytes.)
|
|
The read method may raise an IOError. (Maybe the disk has a bad sector, or it's on a network drive and the network just went down.)
|
|
This is new: a try...finally block. Once the file has been opened successfully by the open function, you want to make absolutely sure that you close it, even if an exception is raised by the seek or read methods. That's what a try...finally block is for: code in the finally block will always be executed, even if something in the try block raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.
|
|
At last, you handle your IOError exception. This could be the IOError exception raised by the call to open, seek, or read. Here, you really don't care, because all you're going to do is ignore it silently and continue. (Remember, pass is a Python statement that does nothing.) That's perfectly legal; “handling” an exception can mean explicitly doing nothing. It still counts as handled, and processing will continue normally on the
next line of code after the try...except block.
|
As you would expect, you can also write to files in much the same way that you read from them. There are two basic file modes:
Either mode will create the file automatically if it doesn't already exist, so there's never a need for any sort of fiddly "if the log file doesn't exist yet, create a new empty file just so you can open it for the first time" logic. Just open it and start writing.
>>> logfile = open('test.log', 'w')
>>> logfile.write('test succeeded')
>>> logfile.close()
>>> print file('test.log').read()
test succeeded
>>> logfile = open('test.log', 'a')
>>> logfile.write('line 2')
>>> logfile.close()
>>> print file('test.log').read()
test succeededline 2
for LoopsLike most other languages, Python has for loops. The only reason you haven't seen them until now is that Python is good at so many other things that you don't need them as often.
Most other languages don't have a powerful list datatype like Python, so you end up doing a lot of manual work, specifying a start, end, and step to define a range of integers or characters
or other iteratable entities. But in Python, a for loop simply iterates over a list, the same way list comprehensions work.
for Loop>>> li = ['a', 'b', 'e'] >>> for s in li:... print s
a b e >>> print "\n".join(li)
a b e
The syntax for a for loop is similar to list comprehensions. li is a list, and s will take the value of each element in turn, starting from the first element.
|
|
Like an if statement or any other indented block, a for loop can have any number of lines of code in it.
|
|
This is the reason you haven't seen the for loop yet: you haven't needed it yet. It's amazing how often you use for loops in other languages when all you really want is a join or a list comprehension.
|
Doing a “normal” (by Visual Basic standards) counter for loop is also simple.
>>> for i in range(5):... print i 0 1 2 3 4 >>> li = ['a', 'b', 'c', 'd', 'e'] >>> for i in range(len(li)):
... print li[i] a b c d e
As you saw in Example 3.20, “Assigning Consecutive Values”, range produces a list of integers, which you then loop through. I know it looks a bit odd, but it is occasionally (and I stress
occasionally) useful to have a counter loop.
|
|
| Don't ever do this. This is Visual Basic-style thinking. Break out of it. Just iterate through the list, as shown in the previous example. |
for loops are not just for simple counters. They can iterate through all kinds of things. Here is an example of using a for loop to iterate through a dictionary.
>>> import os >>> for k, v in os.environ.items():![]()
... print "%s=%s" % (k, v) USERPROFILE=C:\Documents and Settings\mpilgrim OS=Windows_NT COMPUTERNAME=MPILGRIM USERNAME=mpilgrim [...snip...] >>> print "\n".join(["%s=%s" % (k, v) ... for k, v in os.environ.items()])
USERPROFILE=C:\Documents and Settings\mpilgrim OS=Windows_NT COMPUTERNAME=MPILGRIM USERNAME=mpilgrim [...snip...]
| os.environ is a dictionary of the environment variables defined on your system. In Windows, these are your user and system variables accessible from MS-DOS. In UNIX, they are the variables exported in your shell's startup scripts. In Mac OS, there is no concept of environment variables, so this dictionary is empty. | |
os.environ.items() returns a list of tuples: [(key1, value1), (key2, value2), ...]. The for loop iterates through this list. The first round, it assigns key1 to k and value1 to v, so k = USERPROFILE and v = C:\Documents and Settings\mpilgrim. In the second round, k gets the second key, OS, and v gets the corresponding value, Windows_NT.
|
|
With multi-variable assignment and list comprehensions, you can replace the entire for loop with a single statement. Whether you actually do this in real code is a matter of personal coding style. I like it
because it makes it clear that what I'm doing is mapping a dictionary into a list, then joining the list into a single string.
Other programmers prefer to write this out as a for loop. The output is the same in either case, although this version is slightly faster, because there is only one print statement instead of many.
|
Now we can look at the for loop in MP3FileInfo, from the sample fileinfo.py program introduced in Chapter 5.
for Loop in MP3FileInfo
tagDataMap = {"title" : ( 3, 33, stripnulls),
"artist" : ( 33, 63, stripnulls),
"album" : ( 63, 93, stripnulls),
"year" : ( 93, 97, stripnulls),
"comment" : ( 97, 126, stripnulls),
"genre" : (127, 128, ord)}
.
.
.
if tagdata[:3] == "TAG":
for tag, (start, end, parseFunc) in self.tagDataMap.items():
self[tag] = parseFunc(tagdata[start:end]) 
| tagDataMap is a class attribute that defines the tags you're looking for in an MP3 file. Tags are stored in fixed-length fields. Once you read the last 128 bytes of the file, bytes 3 through 32 of those are always the song title, 33 through 62 are always the artist name, 63 through 92 are the album name, and so forth. Note that tagDataMap is a dictionary of tuples, and each tuple contains two integers and a function reference. | |
This looks complicated, but it's not. The structure of the for variables matches the structure of the elements of the list returned by items. Remember that items returns a list of tuples of the form (key, value). The first element of that list is ("title", (3, 33, <function stripnulls>)), so the first time around the loop, tag gets "title", start gets 3, end gets 33, and parseFunc gets the function stripnulls.
|
|
| Now that you've extracted all the parameters for a single MP3 tag, saving the tag data is easy. You slice tagdata from start to end to get the actual data for this tag, call parseFunc to post-process the data, and assign this as the value for the key tag in the pseudo-dictionary self. After iterating through all the elements in tagDataMap, self has the values for all the tags, and you know what that looks like. |
sys.modulesModules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary .
sys.modules
sys.modules>>> import sys>>> print '\n'.join(sys.modules.keys())
win32api os.path os exceptions __main__ ntpath nt sys __builtin__ site signal UserDict stat
This example demonstrates how to use .
sys.modules
sys.modules>>> import fileinfo>>> print '\n'.join(sys.modules.keys()) win32api os.path os fileinfo exceptions __main__ ntpath nt sys __builtin__ site signal UserDict stat >>> fileinfo <module 'fileinfo' from 'fileinfo.pyc'> >>> sys.modules["fileinfo"]
<module 'fileinfo' from 'fileinfo.pyc'>
The next example shows how to use the __module__ class attribute with the dictionary to get a reference to the module in which a class is defined.
sys.modules
__module__ Class Attribute>>> from fileinfo import MP3FileInfo >>> MP3FileInfo.__module__'fileinfo' >>> sys.modules[MP3FileInfo.__module__]
<module 'fileinfo' from 'fileinfo.pyc'>
Every Python class has a built-in class attribute __module__, which is the name of the module in which the class is defined.
|
|
Combining this with the dictionary, you can get a reference to the module in which a class is defined.
|
Now you're ready to see how is used in sys.modulesfileinfo.py, the sample program introduced in Chapter 5. This example shows that portion of the code.
sys.modules in fileinfo.py
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
"get file info class from filename extension"
subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
return hasattr(module, subclass) and getattr(module, subclass) or FileInfo 
This is a function with two arguments; filename is required, but module is optional and defaults to the module that contains the FileInfo class. This looks inefficient, because you might expect Python to evaluate the expression every time the function is called. In fact, Python evaluates default expressions only once, the first time the module is imported. As you'll see later, you never call this
function with a module argument, so module serves as a function-level constant.
|
|
You'll plow through this line later, after you dive into the os module. For now, take it on faith that subclass ends up as the name of a class, like MP3FileInfo.
|
|
You already know about getattr, which gets a reference to an object by name. hasattr is a complementary function that checks whether an object has a particular attribute; in this case, whether a module has
a particular class (although it works for any object and any attribute, just like getattr). In English, this line of code says, “If this module has the class named by subclass then return it, otherwise return the base class FileInfo.”
|
sys module.
The os.path module has several functions for manipulating files and directories. Here, we're looking at handling pathnames and listing
the contents of a directory.
>>> import os
>>> os.path.join("c:\\music\\ap\\", "mahadeva.mp3")
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.join("c:\\music\\ap", "mahadeva.mp3")
'c:\\music\\ap\\mahadeva.mp3'
>>> os.path.expanduser("~")
'c:\\Documents and Settings\\mpilgrim\\My Documents'
>>> os.path.join(os.path.expanduser("~"), "Python")
'c:\\Documents and Settings\\mpilgrim\\My Documents\\Python'os.path is a reference to a module -- which module depends on your platform. Just as getpass encapsulates differences between platforms by setting getpass to a platform-specific function, os encapsulates differences between platforms by setting path to a platform-specific module.
|
|
The join function of os.path constructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings. (Note that dealing
with pathnames on Windows is annoying because the backslash character must be escaped.)
|
|
In this slightly less trivial case, join will add an extra backslash to the pathname before joining it to the filename. I was overjoyed when I discovered this, since
addSlashIfNecessary is one of the stupid little functions I always need to write when building up my toolbox in a new language. Do not write this stupid little function in Python; smart people have already taken care of it for you.
|
|
expanduser will expand a pathname that uses ~ to represent the current user's home directory. This works on any platform where users have a home directory, like Windows,
UNIX, and Mac OS X; it has no effect on Mac OS.
|
|
| Combining these techniques, you can easily construct pathnames for directories and files under the user's home directory. |
>>> os.path.split("c:\\music\\ap\\mahadeva.mp3")
('c:\\music\\ap', 'mahadeva.mp3')
>>> (filepath, filename) = os.path.split("c:\\music\\ap\\mahadeva.mp3")
>>> filepath
'c:\\music\\ap'
>>> filename
'mahadeva.mp3'
>>> (shortname, extension) = os.path.splitext(filename)
>>> shortname
'mahadeva'
>>> extension
'.mp3'The split function splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use
multi-variable assignment to return multiple values from a function? Well, split is such a function.
|
|
You assign the return value of the split function into a tuple of two variables. Each variable receives the value of the corresponding element of the returned tuple.
|
|
The first variable, filepath, receives the value of the first element of the tuple returned from split, the file path.
|
|
The second variable, filename, receives the value of the second element of the tuple returned from split, the filename.
|
|
os.path also contains a function splitext, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique
to assign each of them to separate variables.
|
>>> os.listdir("c:\\music\\_singles\\")
['a_time_long_forgotten_con.mp3', 'hellraiser.mp3',
'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3',
'spinning.mp3']
>>> dirname = "c:\\"
>>> os.listdir(dirname)
['AUTOEXEC.BAT', 'boot.ini', 'CONFIG.SYS', 'cygwin',
'docbook', 'Documents and Settings', 'Incoming', 'Inetpub', 'IO.SYS',
'MSDOS.SYS', 'Music', 'NTDETECT.COM', 'ntldr', 'pagefile.sys',
'Program Files', 'Python20', 'RECYCLER',
'System Volume Information', 'TEMP', 'WINNT']
>>> [f for f in os.listdir(dirname)
... if os.path.isfile(os.path.join(dirname, f))]
['AUTOEXEC.BAT', 'boot.ini', 'CONFIG.SYS', 'IO.SYS', 'MSDOS.SYS',
'NTDETECT.COM', 'ntldr', 'pagefile.sys']
>>> [f for f in os.listdir(dirname)
... if os.path.isdir(os.path.join(dirname, f))]
['cygwin', 'docbook', 'Documents and Settings', 'Incoming',
'Inetpub', 'Music', 'Program Files', 'Python20', 'RECYCLER',
'System Volume Information', 'TEMP', 'WINNT']The listdir function takes a pathname and returns a list of the contents of the directory.
|
|
listdir returns both files and folders, with no indication of which is which.
|
|
You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns 1 if the path represents a file, and 0 otherwise. Here you're using to ensure a full pathname, but isfile also works with a partial path, relative to the current working directory. You can use os.getcwd() to get the current working directory.
|
|
os.path also has a isdir function which returns 1 if the path represents a directory, and 0 otherwise. You can use this to get a list of the subdirectories
within a directory.
|
fileinfo.py
def listDirectory(directory, fileExtList):
"get list of file info objects for files of particular extensions"
fileList = [os.path.normcase(f)
for f in os.listdir(directory)]
fileList = [os.path.join(directory, f)
for f in fileList
if os.path.splitext(f)[1] in fileExtList]

Whenever possible, you should use the functions in os and os.path for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like
os.path.split work on UNIX, Windows, Mac OS, and any other platform supported by Python.
|
|
There is one other way to get the contents of a directory. It's very powerful, and it uses the sort of wildcards that you may already be familiar with from working on the command line.
glob
>>> os.listdir("c:\\music\\_singles\\")
['a_time_long_forgotten_con.mp3', 'hellraiser.mp3',
'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3',
'spinning.mp3']
>>> import glob
>>> glob.glob('c:\\music\\_singles\\*.mp3')
['c:\\music\\_singles\\a_time_long_forgotten_con.mp3',
'c:\\music\\_singles\\hellraiser.mp3',
'c:\\music\\_singles\\kairo.mp3',
'c:\\music\\_singles\\long_way_home1.mp3',
'c:\\music\\_singles\\sidewinder.mp3',
'c:\\music\\_singles\\spinning.mp3']
>>> glob.glob('c:\\music\\_singles\\s*.mp3')
['c:\\music\\_singles\\sidewinder.mp3',
'c:\\music\\_singles\\spinning.mp3']
>>> glob.glob('c:\\music\\*\\*.mp3')
os Moduleos module.
os module and the os.path module.
Once again, all the dominoes are in place. You've seen how each line of code works. Now let's step back and see how it all fits together.
listDirectorydef listDirectory(directory, fileExtList):"get list of file info objects for files of particular extensions" fileList = [os.path.normcase(f) for f in os.listdir(directory)] fileList = [os.path.join(directory, f) for f in fileList if os.path.splitext(f)[1] in fileExtList]
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
"get file info class from filename extension" subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
return [getFileInfoClass(f)(f) for f in fileList]
listDirectory is the main attraction of this entire module. It takes a directory (like c:\music\_singles\ in my case) and a list of interesting file extensions (like ['.mp3']), and it returns a list of class instances that act like dictionaries that contain metadata about each interesting file in
that directory. And it does it in just a few straightforward lines of code.
|
|
| As you saw in the previous section, this line of code gets a list of the full pathnames of all the files in directory that have an interesting file extension (as specified by fileExtList). | |
Old-school Pascal programmers may be familiar with them, but most people give me a blank stare when I tell them that Python supports nested functions -- literally, a function within a function. The nested function getFileInfoClass can be called only from the function in which it is defined, listDirectory. As with any other function, you don't need an interface declaration or anything fancy; just define the function and code
it.
|
|
Now that you've seen the os module, this line should make more sense. It gets the extension of the file (os.path.splitext(filename)[1]), forces it to uppercase (.upper()), slices off the dot ([1:]), and constructs a class name out of it with string formatting. So c:\music\ap\mahadeva.mp3 becomes .mp3 becomes .MP3 becomes MP3 becomes MP3FileInfo.
|
|
Having constructed the name of the handler class that would handle this file, you check to see if that handler class actually
exists in this module. If it does, you return the class, otherwise you return the base class FileInfo. This is a very important point: this function returns a class. Not an instance of a class, but the class itself.
|
|
For each file in the “interesting files” list (fileList), you call getFileInfoClass with the filename (f). Calling getFileInfoClass(f) returns a class; you don't know exactly which class, but you don't care. You then create an instance of this class (whatever
it is) and pass the filename (f again), to the __init__ method. As you saw earlier in this chapter, the __init__ method of FileInfo sets self["name"], which triggers __setitem__, which is overridden in the descendant (MP3FileInfo) to parse the file appropriately to pull out the file's metadata. You do all that for each interesting file and return a
list of the resulting instances.
|
Note that listDirectory is completely generic. It doesn't know ahead of time which types of files it will be getting, or which classes are defined
that could potentially handle those files. It inspects the directory for the files to process, and then introspects its own
module to see what special handler classes (like MP3FileInfo) are defined. You can extend this program to handle other types of files simply by defining an appropriately-named class:
HTMLFileInfo for HTML files, DOCFileInfo for Word .doc files, and so forth. listDirectory will handle them all, without modification, by handing off the real work to the appropriate classes and collating the results.
The fileinfo.py program introduced in Chapter 5 should now make perfect sense.
"""Framework for getting filetype-specific metadata.
Instantiate appropriate class with filename. Returned object acts like a
dictionary, with key-value pairs for each piece of metadata.
import fileinfo
info = fileinfo.MP3FileInfo("/music/ap/mahadeva.mp3")
print "\\n".join(["%s=%s" % (k, v) for k, v in info.items()])
Or use listDirectory function to get info on all files in a directory.
for info in fileinfo.listDirectory("/music/ap/", [".mp3"]):
...
Framework can be extended by adding classes for particular file types, e.g.
HTMLFileInfo, MPGFileInfo, DOCFileInfo. Each class is completely responsible for
parsing its files appropriately; see MP3FileInfo for example.
"""
import os
import sys
from UserDict import UserDict
def stripnulls(data):
"strip whitespace and nulls"
return data.replace("\00", "").strip()
class FileInfo(UserDict):
"store file metadata"
def __init__(self, filename=None):
UserDict.__init__(self)
self["name"] = filename
class MP3FileInfo(FileInfo):
"store ID3v1.0 MP3 tags"
tagDataMap = {"title" : ( 3, 33, stripnulls),
"artist" : ( 33, 63, stripnulls),
"album" : ( 63, 93, stripnulls),
"year" : ( 93, 97, stripnulls),
"comment" : ( 97, 126, stripnulls),
"genre" : (127, 128, ord)}
def __parse(self, filename):
"parse ID3v1.0 tags from MP3 file"
self.clear()
try:
fsock = open(filename, "rb", 0)
try:
fsock.seek(-128, 2)
tagdata = fsock.read(128)
finally:
fsock.close()
if tagdata[:3] == "TAG":
for tag, (start, end, parseFunc) in self.tagDataMap.items():
self[tag] = parseFunc(tagdata[start:end])
except IOError:
pass
def __setitem__(self, key, item):
if key == "name" and item:
self.__parse(item)
FileInfo.__setitem__(self, key, item)
def listDirectory(directory, fileExtList):
"get list of file info objects for files of particular extensions"
fileList = [os.path.normcase(f)
for f in os.listdir(directory)]
fileList = [os.path.join(directory, f)
for f in fileList
if os.path.splitext(f)[1] in fileExtList]
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
"get file info class from filename extension"
subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
return [getFileInfoClass(f)(f) for f in fileList]
if __name__ == "__main__":
for info in listDirectory("/music/_singles/", [".mp3"]):
print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
printBefore diving into the next chapter, make sure you're comfortable doing the following things:
try...except
try...finally
for loop
os module for all your cross-platform file manipulation needs
Regular expressions are a powerful and standardized way of searching, replacing, and parsing text with complex patterns of
characters. If you've used regular expressions in other languages (like Perl), the syntax will be very familiar, and you get by just reading the summary of the re module to get an overview of the available functions and their arguments.
Strings have methods for searching (index, find, and count), replacing (replace), and parsing (split), but they are limited to the simplest of cases. The search methods look for a single, hard-coded substring, and they are
always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace and split methods have the same limitations.
If what you're trying to do can be accomplished with string functions, you should use them. They're fast and simple and easy
to read, and there's a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different
string functions with if statements to handle special cases, or if you're combining them with split and join and list comprehensions in weird unreadable ways, you may need to move up to regular expressions.
Although the regular expression syntax is tight and unlike normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions to make them practically self-documenting.
This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don't just make this stuff up; it's actually useful.) This example shows how I approached the problem.
>>> s = '100 NORTH MAIN ROAD'
>>> s.replace('ROAD', 'RD.')
'100 NORTH MAIN RD.'
>>> s = '100 NORTH BROAD ROAD'
>>> s.replace('ROAD', 'RD.')
'100 NORTH BRD. RD.'
>>> s[:-4] + s[-4:].replace('ROAD', 'RD.')
'100 NORTH BROAD RD.'
>>> import re
>>> re.sub('ROAD$', 'RD.', s)
'100 NORTH BROAD RD.'Continuing with my story of scrubbing addresses, I soon discovered that the previous example, matching 'ROAD' at the end of the address, was not good enough, because not all addresses included a street designation at all; some just
ended with the street name. Most of the time, I got away with it, but if the street name was 'BROAD', then the regular expression would match 'ROAD' at the end of the string as part of the word 'BROAD', which is not what I wanted.
>>> s = '100 BROAD'
>>> re.sub('ROAD$', 'RD.', s)
'100 BRD.'
>>> re.sub('\\bROAD$', 'RD.', s)
'100 BROAD'
>>> re.sub(r'\bROAD$', 'RD.', s)
'100 BROAD'
>>> s = '100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD$', 'RD.', s)
'100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD\b', 'RD.', s)
'100 BROAD RD. APT 3'You've most likely seen Roman numerals, even if you didn't recognize them. You may have seen them in copyrights of old movies
and television shows (“Copyright MCMXLVI” instead of “Copyright 1946”), or on the dedication walls of libraries or universities (“established MDCCCLXXXVIII” instead of “established 1888”). You may also have seen them in outlines and bibliographical references. It's a system of representing numbers that really
does date back to the ancient Roman empire (hence the name).
In Roman numerals, there are seven characters that are repeated and combined in various ways to represent numbers.
I = 1
V = 5
X = 10
L = 50
C = 100
D = 500
M = 1000
The following are some general rules for constructing Roman numerals:
I is 1, II is 2, and III is 3. VI is 6 (literally, “5 and 1”), VII is 7, and VIII is 8.
I, X, C, and M) can be repeated up to three times. At 4, you need to subtract from the next highest fives character. You can't represent 4 as IIII; instead, it is represented as IV (“1 less than 5”). The number 40 is written as XL (10 less than 50), 41 as XLI, 42 as XLII, 43 as XLIII, and then 44 as XLIV (10 less than 50, then 1 less than 5).
9, you need to subtract from the next highest tens character: 8 is VIII, but 9 is IX (1 less than 10), not VIIII (since the I character can not be repeated four times). The number 90 is XC, 900 is CM.
10 is always represented as X, never as VV. The number 100 is always C, never LL.
DC is 600; CD is a completely different number (400, 100 less than 500). CI is 101; IC is not even a valid Roman numeral (because you can't subtract 1 directly from 100; you would need to write it as XCIX, for 10 less than 100, then 1 less than 10).
What would it take to validate that an arbitrary string is a valid Roman numeral? Let's take it one digit at a time. Since
Roman numerals are always written highest to lowest, let's start with the highest: the thousands place. For numbers 1000
and higher, the thousands are represented by a series of M characters.
>>> import re >>> pattern = '^M?M?M?$'>>> re.search(pattern, 'M')
<SRE_Match object at 0106FB58> >>> re.search(pattern, 'MM')
<SRE_Match object at 0106C290> >>> re.search(pattern, 'MMM')
<SRE_Match object at 0106AA38> >>> re.search(pattern, 'MMMM')
>>> re.search(pattern, '')
<SRE_Match object at 0106F4A8>
The hundreds place is more difficult than the thousands, because there are several mutually exclusive ways it could be expressed, depending on its value.
100 = C
200 = CC
300 = CCC
400 = CD
500 = D
600 = DC
700 = DCC
800 = DCCC
900 = CM
So there are four possible patterns:
CM
CD
C characters (zero if the hundreds place is 0)
D, followed by zero to three C characters
The last two patterns can be combined:
D, followed by zero to three C characters
This example shows how to validate the hundreds place of a Roman numeral.
>>> import re >>> pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'>>> re.search(pattern, 'MCM')
<SRE_Match object at 01070390> >>> re.search(pattern, 'MD')
<SRE_Match object at 01073A50> >>> re.search(pattern, 'MMMCCC')
<SRE_Match object at 010748A8> >>> re.search(pattern, 'MCMC')
>>> re.search(pattern, '')
<SRE_Match object at 01071D98>
Whew! See how quickly regular expressions can get nasty? And you've only covered the thousands and hundreds places of Roman numerals. But if you followed all that, the tens and ones places are easy, because they're exactly the same pattern. But let's look at another way to express the pattern.
{n,m} SyntaxIn the previous section, you were dealing with a pattern where the same character could be repeated up to three times. There is another way to express this in regular expressions, which some people find more readable. First look at the method we already used in the previous example.
>>> import re >>> pattern = '^M?M?M?$' >>> re.search(pattern, 'M')<_sre.SRE_Match object at 0x008EE090> >>> pattern = '^M?M?M?$' >>> re.search(pattern, 'MM')
<_sre.SRE_Match object at 0x008EEB48> >>> pattern = '^M?M?M?$' >>> re.search(pattern, 'MMM')
<_sre.SRE_Match object at 0x008EE090> >>> re.search(pattern, 'MMMM')
>>>