 | On UNIX-compatible systems (including Mac OS X), you can run a Python program from the command line: python odbchelper.py
The output of odbchelper.py will look like this:
server=mpilgrim;uid=sa;database=master;pwd=secret
2.2. Declaring Functions
Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it, like this:
def buildConnectionString(params):
Note that the keyword def starts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments
(not shown here) are separated with commas.
Also note that the function doesn't define a return datatype. Python functions do not specify the datatype of their return value; they don't even specify whether or not they return a value.
In fact, every Python function returns a value; if the function ever executes a return statement, it will return that value, otherwise it will return None, the Python null value.
 | In Visual Basic, functions (that return a value) start with function, and subroutines (that do not return a value) start with sub. There are no subroutines in Python. Everything is a function, all functions return a value (even if it's None), and all functions start with def.
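The point that every function returns a value can be checked directly. The following is a minimal sketch; the function names with_return and without_return are made up for illustration, and print is written as a function call so the snippet also runs on current Python releases (the examples in this book use the older print statement).

```python
def with_return(x):
    """Return twice the argument."""
    return x * 2

def without_return(x):
    """Compute a value but never execute a return statement."""
    doubled = x * 2  # the result is simply discarded

explicit = with_return(21)
implicit = without_return(21)

print(explicit)            # 42
print(implicit is None)    # True: falling off the end returns None
```

Both definitions start with def; the only difference is whether a return statement is ever executed.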
The argument, params, doesn't specify a datatype. In Python, variables are never explicitly typed. Python figures out what type a variable is and keeps track of it internally.
 | In Java, C++, and other statically-typed languages, you must specify the datatype of the function return value and each function argument.
In Python, you never explicitly specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally.
2.2.1. How Python's Datatypes Compare to Other Programming Languages
An erudite reader sent me this explanation of how Python compares to other programming languages:
- statically typed language
- A language in which types are fixed at compile time. Most statically typed languages enforce this by requiring you to declare
all variables with their datatypes before using them. Java and C are statically typed languages.
- dynamically typed language
- A language in which types are discovered at execution time; the opposite of statically typed. VBScript and Python are dynamically typed, because they figure out what type a variable is when you first assign it a value.
- strongly typed language
- A language in which types are always enforced. Java and Python are strongly typed. If you have an integer, you can't treat it like a string without explicitly converting it.
- weakly typed language
- A language in which types may be ignored; the opposite of strongly typed. VBScript is weakly typed. In VBScript, you can concatenate the string
'12' and the integer 3 to get the string '123', then treat that as the integer 123, all without any explicit conversion.
So Python is both dynamically typed (because it doesn't use explicit datatype declarations) and strongly typed (because once a variable has a datatype, it actually matters).
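A short sketch of both properties, assuming nothing beyond the built-in int and str functions. The variable names are illustrative only.

```python
# Dynamic typing: the same name can be bound to values of different types.
value = 12        # value currently refers to an integer
value = "12"      # now it refers to a string; no declaration was needed

# Strong typing: a string is not silently treated as a number.
try:
    result = value + 3      # str + int is an error, not '123' or 15
except TypeError:
    result = None

# Explicit conversion is required, and you choose the direction.
converted = int(value) + 3   # 15
joined = value + str(3)      # '123'
```

This is the opposite of the VBScript behavior described above, where the conversion would happen without being asked for.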
2.3. Documenting Functions
You can document a Python function by giving it a docstring.
Example 2.2. Defining the buildConnectionString Function's docstring
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.
    Returns string."""
Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including
carriage returns and other quote characters. You can use them anywhere, but you'll see them most often used when defining
a docstring.
 | Triple quotes are also an easy way to define a string with both single and double quotes, like qq/.../ in Perl.
Everything between the triple quotes is the function's docstring, which documents what the function does. A docstring, if it exists, must be the first thing defined in a function (that is, the first thing after the colon). You don't technically
need to give your function a docstring, but you always should. I know you've heard this in every programming class you've ever taken, but Python gives you an added incentive: the docstring is available at runtime as an attribute of the function.
 | Many Python IDEs use the docstring to provide context-sensitive documentation, so that when you type a function name, its docstring appears as a tooltip. This can be incredibly helpful, but it's only as good as the docstrings you write.
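To see the docstring as a runtime attribute, you can read the function's __doc__ attribute. This sketch reuses the function from this chapter; the variable names doc and first_line are illustrative, and print is called as a function so the snippet also runs on current Python releases.

```python
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns string."""
    return ";".join(["%s=%s" % (k, v) for k, v in params.items()])

# The docstring is an ordinary string stored on the function object.
doc = buildConnectionString.__doc__
first_line = doc.splitlines()[0]

print(first_line)
```

This is the same attribute an IDE reads when it shows a tooltip for the function.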
2.4. Everything Is an Object
2.6. Testing Modules
Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them.
Here's an example that uses the if __name__ trick.
if __name__ == "__main__":
Some quick observations before you get to the good stuff. First, parentheses are not required around the if expression. Second, the if statement ends with a colon, and is followed by indented code.
 | Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not support in-line assignment, so there's no chance of accidentally assigning the value you thought you were comparing.
So why is this particular if statement a trick? Modules are objects, and all modules have a built-in attribute __name__. A module's __name__ depends on how you're using the module. If you import the module, then __name__ is the module's filename, without a directory path or file extension. But you can also run the module directly as a standalone
program, in which case __name__ will be a special default value, __main__.
>>> import odbchelper
>>> odbchelper.__name__
'odbchelper'
Knowing this, you can design a test suite for your module within the module itself by putting it in this if statement. When you run the module directly, __name__ is __main__, so the test suite executes. When you import the module, __name__ is something else, so the test suite is ignored. This makes it easier to develop and debug new modules before integrating them into a larger program.
 | On MacPython, there is an additional step to make the if __name__ trick work. Pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and
make sure Run as __main__ is checked.
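As a rough sketch of what such an in-module test suite might look like, assume a module with one function under test. The names build_pair and run_self_test are hypothetical, and print is called as a function so the snippet also runs on current Python releases.

```python
def build_pair(key, value):
    """Hypothetical function under test: format one key=value pair."""
    return "%s=%s" % (key, value)

def run_self_test():
    """Tiny test suite kept inside the module itself."""
    assert build_pair("uid", "sa") == "uid=sa"
    assert build_pair("port", 1433) == "port=1433"
    return "all tests passed"

if __name__ == "__main__":
    # Reached only when the file is run directly, not when it is imported.
    print(run_self_test())
```

Importing this module defines both functions but prints nothing; running it directly executes the tests.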
Chapter 3. Native Datatypes
3.2. Introducing Lists
3.4. Declaring variables
Now that you know something about dictionaries, tuples, and lists (oh my!), let's get back to the sample program from Chapter 2, odbchelper.py.
Python has local and global variables like most other languages, but it has no explicit variable declarations. Variables spring
into existence by being assigned a value, and they are automatically destroyed when they go out of scope.
Example 3.17. Defining the myParams Variable
if __name__ == "__main__":
    myParams = {"server":"mpilgrim", \
                "database":"master", \
                "uid":"sa", \
                "pwd":"secret" \
                }
Notice the indentation. An if statement is a code block and needs to be indented just like a function.
Also notice that the variable assignment is one command split over several lines, with a backslash (“\”) serving as a line-continuation marker.
 | When a command is split among several lines with the line-continuation marker (“\”), the continued lines can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python IDE auto-indents the continued line, you should probably accept its default unless you have a burning reason not to.
Strictly speaking, expressions in parentheses, straight brackets, or curly braces (like defining a dictionary) can be split into multiple lines with or without the line continuation character (“\”). I like to include the backslash even when it's not required because I think it makes the code easier to read, but that's
a matter of style.
Third, you never declared the variable myParams, you just assigned a value to it. This is like VBScript without the option explicit option. Luckily, unlike VBScript, Python will not allow you to reference a variable that has never been assigned a value; trying to do so will raise an exception.
3.4.1. Referencing Variables
Example 3.18. Referencing an Unbound Variable
>>> x
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
NameError: There is no variable named 'x'
>>> x = 1
>>> x
1
You will thank Python for this one day.
3.4.2. Assigning Multiple Values at Once
One of the cooler programming shortcuts in Python is using sequences to assign multiple values at once.
Example 3.19. Assigning multiple values at once
>>> v = ('a', 'b', 'e')
>>> (x, y, z) = v ①
>>> x
'a'
>>> y
'b'
>>> z
'e'
- v is a tuple of three elements, and (x, y, z) is a tuple of three variables. Assigning one to the other assigns each of the values of v to each of the variables, in order.
This has all sorts of uses. I often want to assign names to a range of values. In C, you would use enum and manually list each constant and its associated value, which seems especially tedious when the values are consecutive.
In Python, you can use the built-in range function with multi-variable assignment to quickly assign consecutive values.
Example 3.20. Assigning Consecutive Values
>>> range(7) ①
[0, 1, 2, 3, 4, 5, 6]
>>> (MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7) ②
>>> MONDAY ③
0
>>> TUESDAY
1
>>> SUNDAY
6
- The built-in range function returns a list of integers. In its simplest form, it takes an upper limit and returns a zero-based list counting up to but not including the upper limit. (If you like, you can pass other parameters to specify a base other than 0 and a step other than 1. You can print range.__doc__ for details.)
- MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, and SUNDAY are the variables you're defining. (This example came from the calendar module, a fun little module that prints calendars, like the UNIX program cal. The calendar module defines integer constants for days of the week.)
- Now each variable has its value: MONDAY is 0, TUESDAY is 1, and so forth.
You can also use multi-variable assignment to build functions that return multiple values, simply by returning a tuple of
all the values. The caller can treat it as a tuple, or assign the values to individual variables. Many standard Python libraries do this, including the os module, which you'll discuss in Chapter 6.
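Here is a minimal sketch of a function that returns multiple values. The helper min_max is hypothetical; os.path.split is a real standard library function that returns a two-element tuple.

```python
import os

def min_max(values):
    """Hypothetical helper: return the smallest and largest value as a tuple."""
    return min(values), max(values)

# The caller can keep the result as a single tuple...
result = min_max([3, 1, 4, 1, 5])       # (1, 5)

# ...or unpack it into individual variables with multi-variable assignment.
low, high = min_max([3, 1, 4, 1, 5])    # low is 1, high is 5

# os.path.split does the same thing: it returns (directory, filename).
head, tail = os.path.split("/music/ap/mahadeva.mp3")
```

After the last line, head is '/music/ap' and tail is 'mahadeva.mp3'.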
Python supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is
to insert values into a string with the %s placeholder.
 | String formatting in Python uses the same syntax as the sprintf function in C.
Example 3.21. Introducing String Formatting
>>> k = "uid"
>>> v = "sa"
>>> "%s=%s" % (k, v) ①
'uid=sa'
- The whole expression evaluates to a string. The first %s is replaced by the value of k; the second %s is replaced by the value of v. All other characters in the string (in this case, the equal sign) stay as they are.
Note that (k, v) is a tuple. I told you they were good for something.
You might be thinking that this is a lot of work just to do simple string concatenation, and you would be right, except that string formatting isn't just concatenation. It's not even just formatting. It's also type coercion.
>>> uid = "sa"
>>> pwd = "secret"
>>> print pwd + " is not a good password for " + uid ①
secret is not a good password for sa
>>> print "%s is not a good password for %s" % (pwd, uid) ②
secret is not a good password for sa
>>> userCount = 6
>>> print "Users connected: %d" % (userCount, ) ③ ④
Users connected: 6
>>> print "Users connected: " + userCount ⑤
Traceback (innermost last):
File "<interactive input>", line 1, in ?
TypeError: cannot concatenate 'str' and 'int' objects
- + is the string concatenation operator.
- In this trivial case, string formatting accomplishes the same result as concatenation.
- (userCount, ) is a tuple with one element. Yes, the syntax is a little strange, but there's a good reason for it: it's unambiguously a tuple. In fact, you can always include a comma after the last element when defining a list, tuple, or dictionary, but the comma is required when defining a tuple with one element. If the comma weren't required, Python wouldn't know whether (userCount) was a tuple with one element or just the value of userCount.
- String formatting works with integers by specifying %d instead of %s.
- Trying to concatenate a string with a non-string raises an exception. Unlike string formatting, string concatenation works only when everything is already a string.
As with printf in C, string formatting in Python is like a Swiss Army knife. There are options galore, and modifier strings to specially format many different types of values.
>>> print "Today's stock price: %f" % 50.4625 ①
Today's stock price: 50.462500
>>> print "Today's stock price: %.2f" % 50.4625 ②
Today's stock price: 50.46
>>> print "Change since yesterday: %+.2f" % 1.5 ③
Change since yesterday: +1.50
- The %f string formatting option treats the value as a decimal, and prints it to six decimal places.
- The ".2" modifier of the %f option rounds the value to two decimal places.
- You can even combine modifiers. Adding the + modifier displays a plus or minus sign before the value. Note that the ".2" modifier is still in place, and is padding the value to exactly two decimal places.
3.6. Mapping Lists
One of the most powerful features of Python is the list comprehension, which provides a compact way of mapping a list into another list by applying a function to each
of the elements of the list.
Example 3.24. Introducing List Comprehensions
>>> li = [1, 9, 8, 4]
>>> [elem*2 for elem in li] ①
[2, 18, 16, 8]
>>> li ②
[1, 9, 8, 4]
>>> li = [elem*2 for elem in li] ③
>>> li
[2, 18, 16, 8]
- To make sense of this, look at it from right to left. li is the list you're mapping. Python loops through li one element at a time, temporarily assigning the value of each element to the variable elem. Python then applies the function
elem*2 and appends that result to the returned list.
- Note that list comprehensions do not change the original list.
- It is safe to assign the result of a list comprehension to the variable that you're mapping. Python constructs the new list in memory, and when the list comprehension is complete, it assigns the result to the variable.
Here are the list comprehensions in the buildConnectionString function that you declared in Chapter 2:
["%s=%s" % (k, v) for k, v in params.items()]
First, notice that you're calling the items function of the params dictionary. This function returns a list of tuples of all the data in the dictionary.
Example 3.25. The keys, values, and items Functions
>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> params.keys() ①
['server', 'uid', 'database', 'pwd']
>>> params.values() ②
['mpilgrim', 'sa', 'master', 'secret']
>>> params.items() ③
[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
- The keys method of a dictionary returns a list of all the keys. The list is not in the order in which the dictionary was defined (remember that elements in a dictionary are unordered), but it is a list.
- The values method returns a list of all the values. The list is in the same order as the list returned by keys, so params.values()[n] == params[params.keys()[n]] for all values of n.
- The items method returns a list of tuples of the form (key, value). The list contains all the data in the dictionary.
Now let's see what buildConnectionString does. It takes a list, params.items(), and maps it to a new list by applying string formatting to each element. The new list will have the same number of elements
as params.items(), but each element in the new list will be a string that contains both a key and its associated value from the params dictionary.
Example 3.26. List Comprehensions in buildConnectionString, Step by Step
>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> params.items()
[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
>>> [k for k, v in params.items()] ①
['server', 'uid', 'database', 'pwd']
>>> [v for k, v in params.items()] ②
['mpilgrim', 'sa', 'master', 'secret']
>>> ["%s=%s" % (k, v) for k, v in params.items()] ③
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
- Note that you're using two variables to iterate through the params.items() list. This is another use of multi-variable assignment. The first element of params.items() is ('server', 'mpilgrim'), so in the first iteration of the list comprehension, k will get 'server' and v will get 'mpilgrim'. In this case, you're ignoring the value of v and only including the value of k in the returned list, so this list comprehension ends up being equivalent to params.keys().
- Here you're doing the same thing, but ignoring the value of k, so this list comprehension ends up being equivalent to params.values().
- Combining the previous two examples with some simple string formatting, you get a list of strings that include both the key and value of each element of the dictionary. This looks suspiciously like the output of the program. All that remains is to join the elements in this list into a single string.
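If the comprehension syntax still feels dense, it may help to see the same mapping written as an explicit loop. This is only a sketch for comparison; the names pairs_loop and pairs_comp are illustrative.

```python
params = {"server": "mpilgrim", "database": "master", "uid": "sa", "pwd": "secret"}

# The long way: start with an empty list and append one formatted string per item.
pairs_loop = []
for k, v in params.items():
    pairs_loop.append("%s=%s" % (k, v))

# The list comprehension builds the same list in a single expression.
pairs_comp = ["%s=%s" % (k, v) for k, v in params.items()]

same = (pairs_loop == pairs_comp)   # True: both walk params.items() in the same order
```

Both versions leave params itself unchanged; each produces a new list.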
3.7. Joining Lists and Splitting Strings
You have a list of key-value pairs in the form key=value, and you want to join them into a single string. To join any list of strings into a single string, use the join method of a string object.
Here is an example of joining a list from the buildConnectionString function:
return ";".join(["%s=%s" % (k, v) for k, v in params.items()])
One interesting note before you continue. I keep repeating that functions are objects, strings are objects... everything
is an object. You might have thought I meant that string variables are objects. But no, look closely at this example and you'll see that the string ";" itself is an object, and you are calling its join method.
The join method joins the elements of the list into a single string, with each element separated by a semi-colon. The delimiter doesn't
need to be a semi-colon; it doesn't even need to be a single character. It can be any string.
 | join works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements
will raise an exception.
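A quick sketch of that restriction and the usual way around it: convert each element with str before joining. The list contents here are made up for illustration.

```python
mixed = ["port", 1433]   # one string and one integer

# join does no type coercion, so the integer causes a TypeError.
try:
    ";".join(mixed)
    failed = False
except TypeError:
    failed = True

# Mapping the list through str first makes every element a string.
joined = ";".join([str(item) for item in mixed])   # 'port;1433'
```

String formatting with %s would perform the conversion for you; join will not.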
Example 3.27. Output of odbchelper.py
>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}
>>> ["%s=%s" % (k, v) for k, v in params.items()]
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
>>> ";".join(["%s=%s" % (k, v) for k, v in params.items()])
'server=mpilgrim;uid=sa;database=master;pwd=secret'
This string is then returned from the buildConnectionString function and printed by the calling block, which gives you the output that you marveled at when you started reading this chapter.
You're probably wondering if there's an analogous method to split a string into a list. And of course there is, and it's
called split.
Example 3.28. Splitting a String
>>> li = ['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
>>> s = ";".join(li)
>>> s
'server=mpilgrim;uid=sa;database=master;pwd=secret'
>>> s.split(";") ①
['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
>>> s.split(";", 1) ②
['server=mpilgrim', 'uid=sa;database=master;pwd=secret']
- split reverses join by splitting a string into a multi-element list. Note that the delimiter (“;”) is stripped out completely; it does not appear in any of the elements of the returned list.
- split takes an optional second argument, which is the number of times to split. (“Oooooh, optional arguments...” You'll learn how to do this in your own functions in the next chapter.)
 | anystring.split(delimiter, 1) is a useful technique when you want to search a string for a substring and then work with everything before the substring
(which ends up in the first element of the returned list) and everything after it (which ends up in the second element).
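For instance, splitting a key=value pair once keeps any later equal signs inside the value. The sample string below is made up for illustration.

```python
entry = "pwd=se=cret"   # hypothetical pair whose value contains the delimiter

# Split only once: everything after the first "=" stays together.
key, value = entry.split("=", 1)    # key is 'pwd', value is 'se=cret'

# Without the second argument, the value is broken apart as well.
all_parts = entry.split("=")        # ['pwd', 'se', 'cret']
```

The two-variable assignment works here because split with a limit of 1 returns exactly two elements whenever the delimiter is present.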
3.7.1. Historical Note on String Methods
When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Many people feel the same way, and there's a story
behind the join method. Prior to Python 1.6, strings didn't have all these useful methods. There was a separate string module that contained all the string functions; each function took a string as its first argument. The functions were deemed
important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn't move at all but simply stay a part of
the old string module (which still has a lot of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.
3.8. Summary
The odbchelper.py program and its output should now make perfect sense.
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.
    Returns string."""
    return ";".join(["%s=%s" % (k, v) for k, v in params.items()])
if __name__ == "__main__":
    myParams = {"server":"mpilgrim", \
                "database":"master", \
                "uid":"sa", \
                "pwd":"secret" \
                }
    print buildConnectionString(myParams)
Here is the output of odbchelper.py:
server=mpilgrim;uid=sa;database=master;pwd=secret
Before diving into the next chapter, make sure you're comfortable doing all of these things:
- Using the Python IDE to test expressions interactively
- Writing Python programs and running them from within your IDE, or from the command line
- Importing modules and calling their functions
- Declaring functions and using docstrings, local variables, and proper indentation
- Defining dictionaries, tuples, and lists
- Accessing attributes and methods of any object, including strings, lists, dictionaries, functions, and modules
- Concatenating values through string formatting
- Mapping lists into other lists using list comprehensions
- Splitting strings into lists and joining lists into strings
Chapter 4. The Power Of Introspection
This chapter covers one of Python's strengths: introspection. As you know, everything in Python is an object, and introspection is code looking at other modules and functions in memory as objects, getting information about them, and
manipulating them. Along the way, you'll define functions with no name, call functions with arguments out of order, and reference
functions whose names you don't even know ahead of time.
4.1. Diving In
Here is a complete, working Python program. You should understand a good deal about it just by looking at it. The numbered lines illustrate concepts covered
in Chapter 2, Your First Python Program. Don't worry if the rest of the code looks intimidating; you'll learn all about it throughout this chapter.
Example 4.1. apihelper.py
If you have not already done so, you can download this and other examples used in this book.
def info(object, spacing=10, collapse=1): ① ② ③
    """Print methods and docstrings.
    Takes module, class, list, dictionary, or string."""
    methodList = [method for method in dir(object) if callable(getattr(object, method))]
    processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    print "\n".join(["%s %s" %
                     (method.ljust(spacing),
                      processFunc(str(getattr(object, method).__doc__)))
                     for method in methodList])
if __name__ == "__main__": ④ ⑤
    print info.__doc__
- This module has one function, info. According to its function declaration, it takes three parameters: object, spacing, and collapse. The last two are actually optional parameters, as you'll see shortly.
- The info function has a multi-line docstring that succinctly describes the function's purpose. Note that no return value is mentioned; this function will be used solely for its effects, rather than its value.
- Code within the function is indented.
- The if __name__ trick allows this program to do something useful when run by itself, without interfering with its use as a module for other programs. In this case, the program simply prints out the docstring of the info function.
- if statements use == for comparison, and parentheses are not required.
The info function is designed to be used by you, the programmer, while working in the Python IDE. It takes any object that has functions or methods (like a module, which has functions, or a list, which has methods) and
prints out the functions and their docstrings.
Example 4.2. Sample Usage of apihelper.py
>>> from apihelper import info
>>> li = []
>>> info(li)
append     L.append(object) -- append object to end
count      L.count(value) -> integer -- return number of occurrences of value
extend     L.extend(list) -- extend list by appending list elements
index      L.index(value) -> integer -- return index of first occurrence of value
insert     L.insert(index, object) -- insert object before index
pop        L.pop([index]) -> item -- remove and return item at index (default last)
remove     L.remove(value) -- remove first occurrence of value
reverse    L.reverse() -- reverse *IN PLACE*
sort       L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1, 0, 1
By default the output is formatted to be easy to read. Multi-line docstrings are collapsed into a single long line, but this option can be changed by specifying 0 for the collapse argument. If the function names are longer than 10 characters, you can specify a larger value for the spacing argument to make the output easier to read.
Example 4.3. Advanced Usage of apihelper.py
>>> import odbchelper
>>> info(odbchelper)
buildConnectionString Build a connection string from a dictionary Returns string.
>>> info(odbchelper, 30)
buildConnectionString          Build a connection string from a dictionary Returns string.
>>> info(odbchelper, 30, 0)
buildConnectionString          Build a connection string from a dictionary
    Returns string.
4.2. Using Optional and Named Arguments
Python allows function arguments to have default values; if the function is called without the argument, the argument gets its default
value. Furthermore, arguments can be specified in any order by using named arguments. Stored procedures in SQL Server Transact/SQL can do this, so if you're a SQL Server scripting guru, you can skim this part.
Here is an example of info, a function with two optional arguments:
def info(object, spacing=10, collapse=1):
spacing and collapse are optional, because they have default values defined. object is required, because it has no default value. If info is called with only one argument, spacing defaults to 10 and collapse defaults to 1. If info is called with two arguments, collapse still defaults to 1.
Say you want to specify a value for collapse but want to accept the default value for spacing. In most languages, you would be out of luck, because you would need to call the function with three arguments. But in
Python, arguments can be specified by name, in any order.
Example 4.4. Valid Calls of info
info(odbchelper) ①
info(odbchelper, 12) ②
info(odbchelper, collapse=0) ③
info(spacing=15, object=odbchelper) ④
- With only one argument, spacing gets its default value of 10 and collapse gets its default value of 1.
- With two arguments, collapse gets its default value of 1.
- Here you are naming the collapse argument explicitly and specifying its value. spacing still gets its default value of 10.
- Even required arguments (like object, which has no default value) can be named, and named arguments can appear in any order.
This looks totally whacked until you realize that arguments are simply a dictionary. The “normal” method of calling functions without argument names is actually just a shorthand where Python matches up the values with the argument names in the order they're specified in the function declaration. And most of the
time, you'll call functions the “normal” way, but you always have the additional flexibility if you need it.
 | The only thing you need to do to call a function is specify a value (somehow) for each required argument; the manner and order
in which you do that is up to you.
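To watch the matching of values to argument names, here is a sketch using a stand-in function with the same signature shape as info. The function info_args is hypothetical and just hands back what it received (its first parameter is named obj here rather than object).

```python
def info_args(obj, spacing=10, collapse=1):
    """Hypothetical stand-in for info: return the arguments it received."""
    return (obj, spacing, collapse)

a = info_args("x")                     # ('x', 10, 1): both defaults used
b = info_args("x", 12)                 # ('x', 12, 1): collapse defaults
c = info_args("x", collapse=0)         # ('x', 10, 0): spacing defaults
d = info_args(spacing=15, obj="x")     # ('x', 15, 1): named, out of order

# Leaving out the required argument is the one thing you cannot do.
try:
    info_args(spacing=15)
    missing_allowed = True
except TypeError:
    missing_allowed = False
```

Every call that supplies a value for obj succeeds, regardless of how or in what order the values are given.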
4.3. Using type, str, dir, and Other Built-In Functions
Python has a small set of extremely useful built-in functions. All other functions are partitioned off into modules. This was
actually a conscious design decision, to keep the core language from getting bloated like other scripting languages (cough
cough, Visual Basic).
4.3.1. The type Function
The type function returns the datatype of any arbitrary object. The possible types are listed in the types module. This is useful for helper functions that can handle several types of data.
Example 4.5. Introducing type
>>> type(1) ①
<type 'int'>
>>> li = []
>>> type(li) ②
<type 'list'>
>>> import odbchelper
>>> type(odbchelper) ③
<type 'module'>
>>> import types ④
>>> type(odbchelper) == types.ModuleType
True
- type takes anything -- and I mean anything -- and returns its datatype. Integers, strings, lists, dictionaries, tuples, functions, classes, modules, even types are acceptable.
- type can take a variable and return its datatype.
- type also works on modules.
- You can use the constants in the types module to compare types of objects. This is what the info function does, as you'll see shortly.
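As a sketch of a helper that handles several types of data, the function below compares type(obj) against constants from the types module. The function name describe and its labels are made up for illustration; types.ModuleType and types.FunctionType are real constants.

```python
import types
import os  # any imported module will do as a sample object

def describe(obj):
    """Hypothetical helper: return a short label for a few common types."""
    if type(obj) == types.ModuleType:
        return "module"
    if type(obj) == types.FunctionType:
        return "function"
    if type(obj) == type([]):       # compare against the type of a known list
        return "list"
    return "something else"

labels = [describe(os), describe(describe), describe([1, 2]), describe(1)]
```

After this runs, labels is ['module', 'function', 'list', 'something else'].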
4.3.2. The str Function
The str function coerces data into a string. Every datatype can be coerced into a string.
Example 4.6. Introducing str
>>> str(1) ①
'1'
>>> horsemen = ['war', 'pestilence', 'famine']
>>> horsemen
['war', 'pestilence', 'famine']
>>> horsemen.append('Powerbuilder')
>>> str(horsemen) ②
"['war', 'pestilence', 'famine', 'Powerbuilder']"
>>> str(odbchelper) ③
"<module 'odbchelper' from 'c:\\docbook\\dip\\py\\odbchelper.py'>"
>>> str(None) ④
'None'
- For simple datatypes like integers, you would expect str to work, because almost every language has a function to convert an integer to a string.
- However, str works on any object of any type. Here it works on a list which you've constructed in bits and pieces.
- str also works on modules. Note that the string representation of the module includes the pathname of the module on disk, so yours will be different.
- A subtle but important behavior of str is that it works on None, the Python null value. It returns the string 'None'. You'll use this to your advantage in the info function, as you'll see shortly.
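The reason this matters: a function without a docstring has None as its __doc__, and None has no split method. Wrapping the attribute in str avoids the error. The sketch below uses made-up functions; collapsed_doc mirrors the str and split/join steps that appear in apihelper.py.

```python
def documented():
    """Say   hello
    politely."""

def undocumented():
    pass

def collapsed_doc(func):
    """Hypothetical helper: collapse a docstring's whitespace into single spaces."""
    # str() turns a missing docstring (None) into the string 'None',
    # so the split call always has a string to work on.
    return " ".join(str(func.__doc__).split())

with_doc = collapsed_doc(documented)        # 'Say hello politely.'
without_doc = collapsed_doc(undocumented)   # 'None'
```

Without the str call, the second case would raise an AttributeError instead of producing a printable value.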
At the heart of the info function is the powerful dir function. dir returns a list of the attributes and methods of any object: modules, functions, strings, lists, dictionaries... pretty much
anything.
Example 4.7. Introducing dir
>>> li = []
>>> dir(li) ①
['append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
>>> d = {}
>>> dir(d) ②
['clear', 'copy', 'get', 'has_key', 'items', 'keys', 'setdefault', 'update', 'values']
>>> import odbchelper
>>> dir(odbchelper) ③
['__builtins__', '__doc__', '__file__', '__name__', 'buildConnectionString']
- li is a list, so dir(li) returns a list of all the methods of a list. Note that the returned list contains the names of the methods as strings, not the methods themselves.
- d is a dictionary, so dir(d) returns a list of the names of dictionary methods. At least one of these, keys, should look familiar.
- This is where it really gets interesting. odbchelper is a module, so dir(odbchelper) returns a list of all kinds of stuff defined in the module, including built-in attributes, like __name__, __doc__, and whatever other attributes and methods you define. In this case, odbchelper has only one user-defined method, the buildConnectionString function described in Chapter 2.
Finally, the callable function takes any object and returns True if the object can be called, or False otherwise. Callable objects include functions, class methods, even classes themselves. (More on classes in the next chapter.)
Example 4.8. Introducing callable
>>> import string
>>> string.punctuation ①
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> string.join ②
<function join at 00C55A7C>
>>> callable(string.punctuation) ③
False
>>> callable(string.join) ④
True
>>> print string.join.__doc__ ⑤
join(list [,sep]) -> string
Return a string composed of the words in list, with
intervening occurrences of sep. The default separator is a
single space.
(joinfields and join are synonymous)
- The functions in the string module are deprecated (although many people still use the join function), but the module contains a lot of useful constants like this string.punctuation, which contains all the standard punctuation characters.
- string.join is a function that joins a list of strings.
- string.punctuation is not callable; it is a string. (A string does have callable methods, but the string itself is not callable.)
- string.join is callable; it's a function that takes two arguments.
- Any callable object may have a docstring. By using the callable function on each of an object's attributes, you can determine which attributes you care about (methods, functions, classes) and which you want to ignore (constants and so on) without knowing anything about the object ahead of time.
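The following sketch tries callable on a few different kinds of objects. It assumes Python 3.2 or later (callable was absent from Python 3.0 and 3.1, and string.join no longer exists in Python 3); shout and Greeter are hypothetical names invented for the example.

```python
import string

def shout(text):
    """Return text in upper case."""
    return text.upper()

class Greeter:
    """A class is callable: calling it creates an instance."""

candidates = {
    "string.punctuation": string.punctuation,  # a plain string constant
    "shout": shout,                            # a function
    "Greeter": Greeter,                        # a class
    "'abc'.upper": "abc".upper,                # a bound method of a string
}

# Map each label to the result of callable() on the object.
results = {label: callable(obj) for label, obj in candidates.items()}
for label, is_callable in results.items():
    print(label, is_callable)
```

Only the string constant comes back as not callable; the function, the class, and the bound method can all be called.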
4.3.3. Built-In Functions
type, str, dir, and all the rest of Python's built-in functions are grouped into a special module called __builtin__. (That's two underscores before and after.) If it helps, you can think of Python automatically executing from __builtin__ import * on startup, which imports all the “built-in” functions into the namespace so you can use them directly.
The advantage of thinking like this is that you can access all the built-in functions and attributes as a group by getting information about the __builtin__ module. And guess what, you already have a function for that: the info function from apihelper. Try it yourself and skim through the list now. We'll dive into some of the more important functions later. (Some of the built-in error classes, like AttributeError, should already look familiar.)
Example 4.9. Built-in Attributes and Functions
>>> from apihelper import info
>>> import __builtin__
>>> info(__builtin__, 20)
ArithmeticError Base class for arithmetic errors.
AssertionError Assertion failed.
AttributeError Attribute not found.
EOFError Read beyond end of file.
EnvironmentError Base class for I/O related errors.
Exception Common base class for all exceptions.
FloatingPointError Floating point operation failed.
IOError I/O operation failed.
[...snip...]
 | Python comes with excellent reference manuals, which you should peruse thoroughly to learn all the modules Python has to offer. But unlike most languages, where you would find yourself referring back to the manuals or man pages to remind
yourself how to use these modules, Python is largely self-documenting.
4.4. Getting Object References With getattr
You already know that Python functions are objects. What you don't know is that you can get a reference to a function without knowing its name until run-time, by using the
getattr function.
Example 4.10. Introducing getattr
>>> li = ["Larry", "Curly"]
>>> li.pop ①
<built-in method pop of list object at 010DF884>
>>> getattr(li, "pop") ②
<built-in method pop of list object at 010DF884>
>>> getattr(li, "append")("Moe") ③
>>> li
['Larry', 'Curly', 'Moe']
>>> getattr({}, "clear") ④
<built-in method clear of dictionary object at 00F113D4>
>>> getattr((), "pop") ⑤
Traceback (innermost last):
File "<interactive input>", line 1, in ?
AttributeError: 'tuple' object has no attribute 'pop'
- This gets a reference to the pop method of the list. Note that this is not calling the pop method; that would be li.pop(). This is the method itself.
- This also returns a reference to the pop method, but this time, the method name is specified as a string argument to the getattr function. getattr is an incredibly useful built-in function that returns any attribute of any object. In this case, the object is a list, and the attribute is the pop method.
- In case it hasn't sunk in just how incredibly useful this is, try this: the return value of getattr is the method, which you can then call just as if you had said li.append("Moe") directly. But you didn't call the function directly; you specified the function name as a string instead.
- getattr also works on dictionaries.
- In theory, getattr would work on tuples, except that tuples have no methods, so getattr will raise an exception no matter what attribute name you give.
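As a quick check of these callouts, here is a small sketch that you can run as a script (it assumes Python 3, although nothing in it is version-specific apart from the comments). The name no_such_method is deliberately made up so that the lookup fails.

```python
li = ["Larry", "Curly"]

# Looking up an attribute by name gives the same bound method as dot notation.
same_method = getattr(li, "pop") == li.pop

# The method name is just a string, so it could come from user input
# or a configuration file instead of being written in the source.
method_name = "append"
getattr(li, method_name)("Moe")

# Asking for an attribute that does not exist raises AttributeError.
try:
    getattr(li, "no_such_method")
    missing_raises = False
except AttributeError:
    missing_raises = True

print(li, same_method, missing_raises)
```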
4.4.1. getattr with Modules
getattr isn't just for built-in datatypes. It also works on modules.
Example 4.11. The getattr Function in apihelper.py
>>> import odbchelper
>>> odbchelper.buildConnectionString ①
<function buildConnectionString at 00D18DD4>
>>> getattr(odbchelper, "buildConnectionString") ②
<function buildConnectionString at 00D18DD4>
>>> object = odbchelper
>>> method = "buildConnectionString"
>>> getattr(object, method) ③
<function buildConnectionString at 00D18DD4>
>>> type(getattr(object, method)) ④
<type 'function'>
>>> import types
>>> type(getattr(object, method)) == types.FunctionType
True
>>> callable(getattr(object, method)) ⑤
True
- This returns a reference to the
buildConnectionString function in the odbchelper module, which you studied in Chapter 2, Your First Python Program. (The hex address you see is specific to my machine; your output will be different.)
- Using
getattr, you can get the same reference to the same function. In general, getattr(object, "attribute") is equivalent to object.attribute. If object is a module, then attribute can be anything defined in the module: a function, class, or global variable.
- And this is what you actually use in the
info function. object is passed into the function as an argument; method is a string which is the name of a method or function.
- In this case, method is the name of a function, which you can prove by getting its
type.
- Since method is a function, it is callable.
4.4.2. getattr As a Dispatcher
A common usage pattern of getattr is as a dispatcher. For example, if you had a program that could output data in a variety of different formats, you could
define separate functions for each output format and use a single dispatch function to call the right one.
For example, let's imagine a program that prints site statistics in HTML, XML, and plain text formats. The choice of output format could be specified on the command line, or stored in a configuration
file. A statsout module defines three functions, output_html, output_xml, and output_text. Then the main program defines a single output function, like this:
Example 4.12. Creating a Dispatcher with getattr
import statsout

def output(data, format="text"): ①
    output_function = getattr(statsout, "output_%s" % format) ②
    return output_function(data) ③
- The
output function takes one required argument, data, and one optional argument, format. If format is not specified, it defaults to text, and you will end up calling the plain text output function.
- You concatenate the format argument with "output_" to produce a function name, and then go get that function from the
statsout module. This allows you to easily extend the program later to support other output formats, without changing this dispatch
function. Just add another function to statsout named, for instance, output_pdf, and pass "pdf" as the format into the output function.
- Now you can simply call the output function in the same way as any other function. The output_function variable is a reference to the appropriate function from the
statsout module.
Did you see the bug in the previous example? This is a very loose coupling of strings and functions, and there is no error checking. What happens if the user passes in a format that doesn't have a corresponding function defined in statsout? Well, getattr will raise an AttributeError, because statsout has no attribute by that name, and the output function will fail before it ever gets a chance to call anything. That's bad.
Luckily, getattr takes an optional third argument, a default value.
Example 4.13. getattr Default Values
import statsout

def output(data, format="text"):
    output_function = getattr(statsout, "output_%s" % format, statsout.output_text)
    return output_function(data) ①
- This function call is guaranteed to work, because you added a third argument to the call to
getattr. The third argument is a default value that is returned if the attribute or method specified by the second argument wasn't
found.
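Since statsout is an imaginary module, the sketch below fakes it with a types.SimpleNamespace holding two made-up output functions, so that both versions of the dispatcher can actually be run. It assumes Python 3; output_strict is a hypothetical name for the version without a default value.

```python
from types import SimpleNamespace

def output_text(data):
    return "text: %s" % data

def output_html(data):
    return "<p>%s</p>" % data

# Hypothetical stand-in for the statsout module described above.
statsout = SimpleNamespace(output_text=output_text, output_html=output_html)

def output_strict(data, format="text"):
    # No default value: an unknown format raises AttributeError on this line.
    output_function = getattr(statsout, "output_%s" % format)
    return output_function(data)

def output(data, format="text"):
    # Unknown formats fall back to the plain text function.
    output_function = getattr(statsout, "output_%s" % format, statsout.output_text)
    return output_function(data)

print(output("hits=3", "html"))
print(output("hits=3", "pdf"))    # no output_pdf defined, so text is used
try:
    output_strict("hits=3", "pdf")
except AttributeError as err:
    print("strict version failed:", err)
```

Whether silently falling back to plain text is the right behavior depends on the program; in some cases you might prefer to catch the AttributeError and report the unknown format instead.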
As you can see, getattr is quite powerful. It is the heart of introspection, and you'll see even more powerful examples of it in later chapters.
4.5. Filtering Lists
As you know, Python has powerful capabilities for mapping lists into other lists, via list comprehensions (Section 3.6, “Mapping Lists”). This can be combined with a filtering mechanism, where some elements in the list are mapped while others are skipped entirely.
Here is the list filtering syntax:
[mapping-expression for element in source-list if filter-expression]
This is an extension of the list comprehensions that you know and love. The first two thirds are the same; the last part, starting with the if, is the filter expression. A filter expression can be any expression that evaluates true or false (which in Python can be almost anything). Any element for which the filter expression evaluates true will be included in the mapping. All other elements are ignored,
so they are never put through the mapping expression and are not included in the output list.
Example 4.14. Introducing List Filtering
>>> li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"]
>>> [elem for elem in li if len(elem) > 1] ①
['mpilgrim', 'foo']
>>> [elem for elem in li if elem != "b"] ②
['a', 'mpilgrim', 'foo', 'c', 'd', 'd']
>>> [elem for elem in li if li.count(elem) == 1] ③
['a', 'mpilgrim', 'foo', 'c']
- The mapping expression here is simple (it just returns the value of each element), so concentrate on the filter expression. As Python loops through the list, it runs each element through the filter expression. If the filter expression is true, the element is mapped and the result of the mapping expression is included in the returned list. Here, you are filtering out all the one-character strings, so you're left with a list of all the longer strings.
- Here, you are filtering out a specific value, b. Note that this filters all occurrences of b, since each time it comes up, the filter expression will be false.
- count is a list method that returns the number of times a value occurs in a list. You might think that this filter would eliminate duplicates from a list, returning a list containing only one copy of each value in the original list. But it doesn't, because values that appear twice in the original list (in this case, b and d) are excluded completely. There are ways of eliminating duplicates from a list, but filtering is not the solution.
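For the curious, one possible way of eliminating duplicates while keeping the original order is sketched below. This is only an illustration, not something apihelper.py uses; it relies on the built-in set type and assumes Python 3.

```python
li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"]

# The count-based filter drops every value that occurs more than once.
only_singles = [elem for elem in li if li.count(elem) == 1]

# Remembering what has already been seen keeps one copy of each value.
seen = set()
unique = []
for elem in li:
    if elem not in seen:
        seen.add(elem)
        unique.append(elem)

print(only_singles)
print(unique)
```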
Let's get back to this line from apihelper.py:
methodList = [method for method in dir(object) if callable(getattr(object, method))]
This looks complicated, and it is complicated, but the basic structure is the same. The whole filter expression returns a list, which is assigned to the methodList variable. The first half of the expression is the list mapping part. The mapping expression is an identity expression, which means it returns the value of each element unchanged. dir(object) returns a list of object's attributes and methods -- that's the list you're mapping. So the only new part is the filter expression after the if.
The filter expression looks scary, but it's not. You already know about callable, getattr, and in. As you saw in the previous section, the expression getattr(object, method) returns a function object if object is a module and method is the name of a function in that module.
So this expression takes an object (named object). Then it gets a list of the names of the object's attributes, methods, functions, and a few other things. Then it filters
that list to weed out all the stuff that you don't care about. You do the weeding out by taking the name of each attribute/method/function
and getting a reference to the real thing, via the getattr function. Then you check to see if that object is callable, which will be any methods and functions, both built-in (like
the pop method of a list) and user-defined (like the buildConnectionString function of the odbchelper module). You don't care about other attributes, like the __name__ attribute that's built in to every module.
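Here is the same filter applied to a tiny object, so you can see what survives. The Sample class is hypothetical, and the sketch assumes Python 3, where every object also carries a number of callable special methods; the second comprehension, which hides those, is an addition for readability and is not part of apihelper.py.

```python
class Sample:
    """Hypothetical class with one data attribute and one method."""
    version = "1.0"

    def greet(self):
        return "hello"

obj = Sample()

# The filter from apihelper.py: keep only the names of callable attributes.
callable_names = [name for name in dir(obj) if callable(getattr(obj, name))]

# Extra step for this sketch only: hide special methods such as __init__.
public_methods = [name for name in callable_names if not name.startswith("__")]

print("version" in callable_names)
print(public_methods)
```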
4.6. The Peculiar Nature of and and or
In Python, and and or perform boolean logic as you would expect, but they do not return boolean values; instead, they return one of the actual
values they are comparing.
Example 4.15. Introducing and
>>> 'a' and 'b' ①
'b'
>>> '' and 'b' ②
''
>>> 'a' and 'b' and 'c' ③
'c'
- When using
and, values are evaluated in a boolean context from left to right. 0, '', [], (), {}, and None are false in a boolean context; everything else is true. Well, almost everything. By default, instances of classes are
true in a boolean context, but you can define special methods in your class to make an instance evaluate to false. You'll
learn all about classes and special methods in Chapter 5. If all values are true in a boolean context, and returns the last value. In this case, and evaluates 'a', which is true, then 'b', which is true, and returns 'b'.
- If any value is false in a boolean context,
and returns the first false value. In this case, '' is the first false value.
- All values are true, so
and returns the last value, 'c'.
Example 4.16. Introducing or
>>> 'a' or 'b' ①
'a'
>>> '' or 'b' ②
'b'
>>> '' or [] or {} ③
{}
>>> def sidefx():
...     print "in sidefx()"
...     return 1
>>> 'a' or sidefx() ④
'a'
- When using or, values are evaluated in a boolean context from left to right, just like and. If any value is true, or returns that value immediately. In this case, 'a' is the first true value.
- or evaluates '', which is false, then 'b', which is true, and returns 'b'.
- If all values are false, or returns the last value. or evaluates '', which is false, then [], which is false, then {}, which is false, and returns {}.
- Note that or evaluates values only until it finds one that is true in a boolean context, and then it ignores the rest. This distinction is important if some values can have side effects. Here, the function sidefx is never called, because or evaluates 'a', which is true, and returns 'a' immediately.
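The short-circuit behavior is easy to verify for yourself. The sketch below (Python 3 assumed) uses a variant of sidefx that records each call in a list named calls, which is made up for this example, so you can see afterwards exactly which calls happened.

```python
calls = []

def sidefx(label):
    # Record the call so we can tell later whether it ran.
    calls.append(label)
    return 1

r1 = 'a' and 'b'               # both true: and returns the last value
r2 = '' and sidefx("and")      # first value is false: sidefx is skipped
r3 = 'a' or sidefx("or")       # first value is true: sidefx is skipped
r4 = '' or [] or {}            # all false: or returns the last value
r5 = '' or sidefx("needed")    # first value is false: sidefx must run

print(r1, repr(r2), r3, r4, r5)
print(calls)
```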
If you're a C hacker, you are certainly familiar with the bool ? a : b expression, which evaluates to a if bool is true, and b otherwise. Because of the way and and or work in Python, you can accomplish the same thing.
4.6.1. Using the and-or Trick
Example 4.17. Introducing the and-or Trick
>>> a = "first"
>>> b = "second"
>>> 1 and a or b ①
'first'
>>> 0 and a or b ②
'second'
- This syntax looks similar to the bool ? a : b expression in C. The entire expression is evaluated from left to right, so the and is evaluated first. 1 and 'first' evaluates to 'first', then 'first' or 'second' evaluates to 'first'.
- 0 and 'first' evaluates to 0, and then 0 or 'second' evaluates to 'second'.
However, since this Python expression is simply boolean logic, and not a special construct of the language, there is one extremely important difference
between this and-or trick in Python and the bool ? a : b syntax in C. If the value of a is false, the expression will not work as you would expect it to. (Can you tell I was bitten by this? More than once?)
Example 4.18. When the and-or Trick Fails
>>> a = ""
>>> b = "second"
>>> 1 and a or b ①
'second'
- Since a is an empty string, which Python considers false in a boolean context, 1 and '' evaluates to '', and then '' or 'second' evaluates to 'second'. Oops! That's not what you wanted.
The and-or trick, bool and a or b, will not work like the C expression bool ? a : b when a is false in a boolean context.
The real trick behind the and-or trick, then, is to make sure that the value of a is never false. One common way of doing this is to turn a into [a] and b into [b], then take the first element of the returned list, which will be either a or b.
Example 4.19. Using the and-or Trick Safely
>>> a = ""
>>> b = "second"
>>> (1 and [a] or [b])[0] ①
''
- Since
[a] is a non-empty list, it is never false. Even if a is 0 or '' or some other false value, the list [a] is true because it has one element.
By now, this trick may seem like more trouble than it's worth. You could, after all, accomplish the same thing with an if statement, so why go through all this fuss? Well, in many cases, you are choosing between two constant values, so you can
use the simpler syntax and not worry, because you know that the a value will always be true. And even if you need to use the more complicated safe form, there are good reasons to do so.
For example, there are some cases in Python where if statements are not allowed, such as in lambda functions.
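The three forms can be compared side by side. One caveat about versions: the last function below uses the conditional expression (a if cond else b) that was added to the language in Python 2.5, so it is only an option if your interpreter is at least that new. The function names are invented for this sketch, which assumes Python 3.

```python
def pick_and_or(cond, a, b):
    # Simple form: wrong whenever a is false in a boolean context.
    return cond and a or b

def pick_safe(cond, a, b):
    # Safe form: [a] is a one-element list, so it is always true.
    return (cond and [a] or [b])[0]

def pick_conditional(cond, a, b):
    # Conditional expression, available in Python 2.5 and later.
    return a if cond else b

for pick in (pick_and_or, pick_safe, pick_conditional):
    print(pick.__name__, repr(pick(1, "", "second")), repr(pick(0, "", "second")))
```

Only the simple form gets the first case wrong, returning 'second' even though the condition is true.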
4.7. Using lambda Functions
Python supports an interesting syntax that lets you define one-line mini-functions on the fly. Borrowed from Lisp, these so-called lambda functions can be used anywhere a function is required.
Example 4.20. Introducing lambda Functions
>>> def f(x):
...     return x*2
...
>>> f(3)
6
>>> g = lambda x: x*2 ①
>>> g(3)
6
>>> (lambda x: x*2)(3) ②
6
- This is a
lambda function that accomplishes the same thing as the normal function above it. Note the abbreviated syntax here: there are no
parentheses around the argument list, and the return keyword is missing (it is implied, since the entire function can only be one expression). Also, the function has no name,
but it can be called through the variable it is assigned to.
- You can use a
lambda function without even assigning it to a variable. This may not be the most useful thing in the world, but it just goes to
show that a lambda is just an in-line function.
To generalize, a lambda function is a function that takes any number of arguments (including optional arguments) and returns the value of a single expression. lambda functions can not contain commands, and they can not contain more than one expression. Don't try to squeeze too much into
a lambda function; if you need something more complex, define a normal function instead and make it as long as you want.
 | lambda functions are a matter of style. Using them is never required; anywhere you could use them, you could define a separate
normal function and use that instead. I use them in places where I want to encapsulate specific, non-reusable code without
littering my code with a lot of little one-line functions.
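A few more lambda functions, as a sketch of the points above: one with an optional argument, and one passed directly to another function without ever being given a name. The names double, scale, and words are made up for the example, and the use of sorted with a key function assumes a reasonably modern Python (this was written with Python 3 in mind).

```python
# A lambda bound to a name behaves like any other function.
double = lambda x: x * 2

# Optional arguments work the same way as in a def statement.
scale = lambda x, factor=3: x * factor

# A lambda used in place, without a name, as a throwaway sort key.
words = ["mpilgrim", "a", "foo"]
by_length = sorted(words, key=lambda w: len(w))

print(double(3), scale(2), scale(2, 5))
print(by_length)
```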
4.7.1. Real-World lambda Functions
Here are the lambda functions in apihelper.py:
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
Notice that this uses the simple form of the and-or trick, which is okay, because a lambda function is always true in a boolean context. (That doesn't mean that a lambda function can't return a false value. The function is always true; its return value could be anything.)
Also notice that you're using the split function with no arguments. You've already seen it used with one or two arguments, but without any arguments it splits on whitespace.
Example 4.21. split With No Arguments
>>> s = "this   is\na\ttest" ①
>>> print s
this   is
a       test
>>> print s.split() ②
['this', 'is', 'a', 'test']
>>> print " ".join(s.split()) ③
this is a test
- This is a multiline string, defined by escape characters instead of triple quotes. \n is a newline, and \t is a tab character.
- split without any arguments splits on whitespace. So three spaces, a newline, and a tab character are all the same.
- You can normalize whitespace by splitting a string with split and then rejoining it with join, using a single space as a delimiter. This is what the info function does to collapse multi-line docstrings into a single line.
So what is the info function actually doing with these lambda functions, splits, and and-or tricks?
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
processFunc is now a function, but which function it is depends on the value of the collapse variable. If collapse is true, processFunc(string) will collapse whitespace; otherwise, processFunc(string) will return its argument unchanged.
To do this in a less robust language, like Visual Basic, you would probably create a function that took a string and a collapse argument and used an if statement to decide whether to collapse the whitespace or not, then returned the appropriate value. This would be inefficient,
because the function would need to handle every possible case. Every time you called it, it would need to decide whether
to collapse whitespace before it could give you what you wanted. In Python, you can take that decision logic out of the function and define a lambda function that is custom-tailored to give you exactly (and only) what you want. This is more efficient, more elegant, and
less prone to those nasty oh-I-thought-those-arguments-were-reversed kinds of errors.
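To see the decision being made once, up front, here is a minimal sketch that wraps the same expression in a helper. The name make_process_func is hypothetical (apihelper.py simply assigns the result to processFunc inline), and the sketch assumes Python 3.

```python
def make_process_func(collapse):
    # Decide once which function to hand back; callers never re-check collapse.
    return collapse and (lambda s: " ".join(s.split())) or (lambda s: s)

text = "this   is\na\ttest"

collapsing = make_process_func(1)
passthrough = make_process_func(0)

collapsed = collapsing(text)
unchanged = passthrough(text)

print(repr(collapsed))
print(repr(unchanged))
```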
4.8. Putting It All Together
The last line of code, the only one you haven't deconstructed yet, is the one that does all the work. But by now the work
is easy, because everything you need is already set up just the way you need it. All the dominoes are in place; it's time
to knock them down.
This is the meat of apihelper.py:
print "\n".join(["%s %s" %
(method.ljust(spacing),
processFunc(str(getattr(object, method).__doc__)))
for method in methodList])
Note that this is one command, split over multiple lines, but it doesn't use the line continuation character (\). Remember when I said that some expressions can be split into multiple lines without using a backslash? A list comprehension is one of those expressions, since the entire expression is contained in
square brackets.
Now, let's take it from the end and work backwards. The
for method in methodList
shows that this is a list comprehension. As you know, methodList is a list of all the methods you care about in object. So you're looping through that list with method.
Example 4.22. Getting a docstring Dynamically
>>> import odbchelper
>>> object = odbchelper ①
>>> method = 'buildConnectionString' ②
>>> getattr(object, method) ③
<function buildConnectionString at 010D6D74>
>>> print getattr(object, method).__doc__ ④
Build a connection string from a dictionary of parameters.
Returns string.
- In the
info function, object is the object you're getting help on, passed in as an argument.
- As you're looping through methodList, method is the name of the current method.
- Using the
getattr function, you're getting a reference to the method function in the object module.
- Now, printing the actual
docstring of the method is easy.
The next piece of the puzzle is the use of str around the docstring. As you may recall, str is a built-in function that coerces data into a string. But a docstring is always a string, so why bother with the str function? The answer is that not every function has a docstring, and if it doesn't, its __doc__ attribute is None.
Example 4.23. Why Use str on a docstring?
>>> def foo(): print 2
>>> foo()
2
>>> foo.__doc__ ①
>>> foo.__doc__ == None ②
True
>>> str(foo.__doc__) ③
'None'
- You can easily define a function that has no
docstring, so its __doc__ attribute is None. Confusingly, if you evaluate the __doc__ attribute directly, the Python IDE prints nothing at all, which makes sense if you think about it, but is still unhelpful.
- You can verify that the value of the
__doc__ attribute is actually None by comparing it directly.
- The
str function takes the null value and returns a string representation of it, 'None'.
 | In SQL, you must use IS NULL instead of = NULL to compare a null value. In Python, you can use either == None or is None, but is None is faster.
Now that you are guaranteed to have a string, you can pass the string to processFunc, which you have already defined as a function that either does or doesn't collapse whitespace. Now you see why it was important to use str to convert a None value into a string representation. processFunc is assuming a string argument and calling its split method, which would crash if you passed it None because None doesn't have a split method.
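The following sketch shows the failure that str prevents. The functions documented and undocumented are invented for the example, collapse_whitespace stands in for the whitespace-collapsing version of processFunc, and Python 3 is assumed.

```python
def documented():
    """Say   hello,
    politely."""

def undocumented():
    pass

# Stand-in for the collapsing version of processFunc.
collapse_whitespace = lambda s: " ".join(s.split())

with_doc = collapse_whitespace(str(documented.__doc__))
without_doc = collapse_whitespace(str(undocumented.__doc__))

# Without str, None is passed straight through and has no split method.
try:
    collapse_whitespace(undocumented.__doc__)
    failed = False
except AttributeError:
    failed = True

print(with_doc)
print(without_doc)
print(failed)
```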
Stepping back even further, you see that you're using string formatting again to concatenate the return value of processFunc with the return value of method's ljust method. This is a new string method that you haven't seen before.
Example 4.24. Introducing ljust
>>> s = 'buildConnectionString'
>>> s.ljust(30) ①
'buildConnectionString         '
>>> s.ljust(20) ②
'buildConnectionString'
- ljust pads the string with spaces to the given length. This is what the info function uses to make two columns of output and line up all the docstrings in the second column.
- If the given length is smaller than the length of the string, ljust will simply return the string unchanged. It never truncates the string.
You're almost finished. Given the padded method name from the ljust method and the (possibly collapsed) docstring from the call to processFunc, you concatenate the two and get a single string. Since you're mapping methodList, you end up with a list of strings. Using the join method of the string "\n", you join this list into a single string, with each element of the list on a separate line, and print the result.
Example 4.25. Printing a List
>>> li = ['a', 'b', 'c']
>>> print "\n".join(li) ①
a
b
c
- This is also a useful debugging trick when you're working with lists. And in Python, you're always working with lists.
That's the last piece of the puzzle. You should now understand this code.
print "\n".join(["%s %s" %
(method.ljust(spacing),
processFunc(str(getattr(object, method).__doc__)))
for method in methodList])
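If you are following along on Python 3, where print is a function, the statement above will not run as written. The sketch below is one possible adaptation, not the book's code: it returns the text instead of printing it, and the names info_text and Sample are made up for the illustration.

```python
def info_text(obj, spacing=10, collapse=1):
    """Return the text that info would print (Python 3 sketch)."""
    method_list = [name for name in dir(obj) if callable(getattr(obj, name))]
    process_func = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    return "\n".join(["%s %s" %
                      (name.ljust(spacing),
                       process_func(str(getattr(obj, name).__doc__)))
                      for name in method_list])

class Sample:
    def greet(self):
        """Say hello.

        Politely."""

    def wave(self):
        pass

lines = info_text(Sample()).splitlines()
# Show only the methods defined in Sample, skipping the special methods.
for line in lines:
    if not line.startswith("__"):
        print(line)
```

Returning a string rather than printing it is a design choice made here only so that the result is easy to inspect; call print(info_text(obj)) to get the original behavior.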
4.9. Summary
The apihelper.py program and its output should now make perfect sense.
def info(object, spacing=10, collapse=1):
    """Print methods and docstrings.
    Takes module, class, list, dictionary, or string."""
    methodList = [method for method in dir(object) if callable(getattr(object, method))]
    processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    print "\n".join(["%s %s" %
                     (method.ljust(spacing),
                      processFunc(str(getattr(object, method).__doc__)))
                     for method in methodList])

if __name__ == "__main__":
    print info.__doc__
Here is the output of apihelper.py:
>>> from apihelper import info
>>> li = []
>>> info(li)
append     L.append(object) -- append object to end
count      L.count(value) -> integer -- return number of occurrences of value
extend     L.extend(list) -- extend list by appending list elements
index      L.index(value) -> integer -- return index of first occurrence of value
insert     L.insert(index, object) -- insert object before index
pop        L.pop([index]) -> item -- remove and return item at index (default last)
remove     L.remove(value) -- remove first occurrence of value
reverse    L.reverse() -- reverse *IN PLACE*
sort       L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1, 0, 1
Before diving into the next chapter, make sure you're comfortable doing all of these things:
- Defining and calling functions with optional and named arguments
- Using
str to coerce any arbitrary value into a string representation
- Using
getattr to get references to functions and other attributes dynamically
- Extending the list comprehension syntax to do list filtering
- Recognizing the
and-or trick and using it safely
- Defining
lambda functions
- Assigning functions to variables and calling the function by referencing the variable. I can't emphasize this enough, because this mode of thought is vital
to advancing your understanding of Python. You'll see more complex applications of this concept throughout this book.
Chapter 5. Objects and Object-Orientation
This chapter, and pretty much every chapter after this, deals with object-oriented Python programming.
5.1. Diving In
Here is a complete, working Python program. Read the docstrings of the module, the classes, and the functions to get an overview of what this program does and how it works. As usual, don't
worry about the stuff you don't understand; that's what the rest of the chapter is for.
Example 5.1. fileinfo.py
If you have not already done so, you can download this and other examples used in this book.
"""Framework for getting filetype-specific metadata.

Instantiate appropriate class with filename. Returned object acts like a
dictionary, with key-value pairs for each piece of metadata.
    import fileinfo
    info = fileinfo.MP3FileInfo("/music/ap/mahadeva.mp3")
    print "\\n".join(["%s=%s" % (k, v) for k, v in info.items()])

Or use listDirectory function to get info on all files in a directory.
    for info in fileinfo.listDirectory("/music/ap/", [".mp3"]):
        ...

Framework can be extended by adding classes for particular file types, e.g.
HTMLFileInfo, MPGFileInfo, DOCFileInfo. Each class is completely responsible for
parsing its files appropriately; see MP3FileInfo for example.
"""
import os
import sys
from UserDict import UserDict

def stripnulls(data):
    "strip whitespace and nulls"
    return data.replace("\00", "").strip()

class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename

class MP3FileInfo(FileInfo):
    "store ID3v1.0 MP3 tags"
    tagDataMap = {"title"   : (  3,  33, stripnulls),
                  "artist"  : ( 33,  63, stripnulls),
                  "album"   : ( 63,  93, stripnulls),
                  "year"    : ( 93,  97, stripnulls),
                  "comment" : ( 97, 126, stripnulls),
                  "genre"   : (127, 128, ord)}

    def __parse(self, filename):
        "parse ID3v1.0 tags from MP3 file"
        self.clear()
        try:
            fsock = open(filename, "rb", 0)
            try:
                fsock.seek(-128, 2)
                tagdata = fsock.read(128)
            finally:
                fsock.close()
            if tagdata[:3] == "TAG":
                for tag, (start, end, parseFunc) in self.tagDataMap.items():
                    self[tag] = parseFunc(tagdata[start:end])
        except IOError:
            pass

    def __setitem__(self, key, item):
        if key == "name" and item:
            self.__parse(item)
        FileInfo.__setitem__(self, key, item)

def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f)
                for f in os.listdir(directory)]
    fileList = [os.path.join(directory, f)
                for f in fileList
                if os.path.splitext(f)[1] in fileExtList]
    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
        "get file info class from filename extension"
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
    return [getFileInfoClass(f)(f) for f in fileList]

if __name__ == "__main__":
    for info in listDirectory("/music/_singles/", [".mp3"]): ①
        print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
        print
- This program's output depends on the files on your hard drive. To get meaningful output, you'll need to change the directory
path to point to a directory of MP3 files on your own machine.
This is the output I got on my machine. Your output will be different, unless, by some startling coincidence, you share my
exact taste in music.
album=
artist=Ghost in the Machine
title=A Time Long Forgotten (Concept
genre=31
name=/music/_singles/a_time_long_forgotten_con.mp3
year=1999
comment=http://mp3.com/ghostmachine
album=Rave Mix
artist=***DJ MARY-JANE***
title=HELLRAISER****Trance from Hell
genre=31
name=/music/_singles/hellraiser.mp3
year=2000
comment=http://mp3.com/DJMARYJANE
album=Rave Mix
artist=***DJ MARY-JANE***
title=KAIRO****THE BEST GOA
genre=31
name=/music/_singles/kairo.mp3
year=2000
comment=http://mp3.com/DJMARYJANE
album=Journeys
artist=Masters of Balance
title=Long Way Home
genre=31
name=/music/_singles/long_way_home1.mp3
year=2000
comment=http://mp3.com/MastersofBalan
album=
artist=The Cynic Project
title=Sidewinder
genre=18
name=/music/_singles/sidewinder.mp3
year=2000
comment=http://mp3.com/cynicproject
album=Digitosis@128k
artist=VXpanded
title=Spinning
genre=255
name=/music/_singles/spinning.mp3
year=2000
comment=http://mp3.com/artists/95/vxp
5.2. Importing Modules Using from module import
Python has two ways of importing modules. Both are useful, and you should know when to use each. One way, import module, you've already seen in Section 2.4, “Everything Is an Object”. The other way accomplishes the same thing, but it has subtle and important differences.
Here is the basic from module import syntax:
from UserDict import UserDict
This is similar to the import module syntax that you know and love, but with an important difference: the attributes and methods of the imported module are imported directly into the local namespace, so they are available directly, without qualification by module name. You can import individual items or use from module import * to import everything.
 | from module import * in Python is like use module in Perl; import module in Python is like require module in Perl.
 | from module import * in Python is like import module.* in Java; import module in Python is like import module in Java.
Example 5.2. import module vs. from module import
>>> import types
>>> types.FunctionType ①
<type 'function'>
>>> FunctionType ②
Traceback (innermost last):
File "<interactive input>", line 1, in ?
NameError: There is no variable named 'FunctionType'
>>> from types import FunctionType ③
>>> FunctionType ④
<type 'function'>
- The types module contains no methods; it just has attributes for each Python object type. Note that the attribute, FunctionType, must be qualified by the module name, types.
- FunctionType by itself has not been defined in this namespace; it exists only in the context of types.
- This syntax imports the attribute FunctionType from the types module directly into the local namespace.
- Now FunctionType can be accessed directly, without reference to types.
When should you use from module import?
- If you will be accessing attributes and methods often and don't want to type the module name over and over, use from module import.
- If you want to selectively import some attributes and methods but not others, use from module import.
- If the module contains attributes or functions with the same name as ones in your module, you must use import module to avoid name conflicts.
Other than that, it's just a matter of style, and you will see Python code written both ways.
 | Use from module import * sparingly, because it makes it difficult to determine where a particular function or attribute came from, and that makes
debugging and refactoring more difficult.
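To make the name-conflict rule concrete, here is a minimal sketch. The local function join is a hypothetical helper invented for this illustration; os.path.join is the real standard library function.

```python
import os.path

def join(words):
    "hypothetical local helper that happens to share a name with os.path.join"
    return " ".join(words)

# With import module, the two names live in separate namespaces.
local_result = join(["spam", "eggs"])              # calls the local helper
path_result = os.path.join("music", "kairo.mp3")   # calls the module's function

# from module import rebinds the bare name, silently replacing the local helper.
from os.path import join
shadowed_result = join("music", "kairo.mp3")       # now calls os.path.join
```

After the from os.path import join line, the original helper is no longer reachable by the name join, which is exactly the kind of conflict that import module avoids.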
5.3. Defining Classes
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the
classes you've defined.
Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word class, followed by the class name. Technically, that's all that's required, since a class doesn't need to inherit from any other
class.
Example 5.3. The Simplest Python Class
class Loaf: ①
    pass ② ③
- The name of this class is Loaf, and it doesn't inherit from any other class. Class names are usually capitalized, EachWordLikeThis, but this is only a convention, not a requirement.
- This class doesn't define any methods or attributes, but syntactically, there needs to be something in the definition, so you use pass. This is a Python reserved word that just means “move along, nothing to see here”. It's a statement that does nothing, and it's a good placeholder when you're stubbing out functions or classes.
- You probably guessed this, but everything in a class is indented, just like the code within a function, if statement, for loop, and so forth. The first thing not indented is not in the class.
 | The pass statement in Python is like an empty set of braces ({}) in Java or C.
Of course, realistically, most classes will be inherited from other classes, and they will define their own class methods
and attributes. But as you've just seen, there is nothing that a class absolutely must have, other than a name. In particular,
C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Python classes do have something similar to a constructor: the __init__ method.
Example 5.4. Defining the FileInfo Class
from UserDict import UserDict
class FileInfo(UserDict): ①
- In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. So the FileInfo class is inherited from the UserDict class (which was imported from the UserDict module). UserDict is a class that acts like a dictionary, allowing you to essentially subclass the dictionary datatype and add your own behavior. (There are similar classes UserList and UserString which allow you to subclass lists and strings.) There is a bit of black magic behind this, which you will demystify later in this chapter when you explore the UserDict class in more depth.
 | In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. There is no special keyword like
extends in Java.
Python supports multiple inheritance. In the parentheses following the class name, you can list as many ancestor classes as you
like, separated by commas.
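As a small illustration of multiple inheritance, the sketch below lists two ancestors in the parentheses. All three class names are hypothetical and exist only for this example.

```python
class Named:
    "hypothetical ancestor that supplies a describe method"
    def describe(self):
        return "I have a name"

class Sized:
    "hypothetical ancestor that supplies a size method"
    def size(self):
        return 0

class NamedSizedThing(Named, Sized):
    "inherits from both ancestors, listed in parentheses and separated by commas"
    pass

thing = NamedSizedThing()
description = thing.describe()   # inherited from Named
size = thing.size()              # inherited from Sized
```

The descendant defines nothing of its own, yet its instances respond to the methods of both ancestors.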
5.3.1. Initializing and Coding Classes
This example shows the initialization of the FileInfo class using the __init__ method.
Example 5.5. Initializing the FileInfo Class
class FileInfo(UserDict):
    "store file metadata" ①
    def __init__(self, filename=None): ② ③ ④
- Classes can (and should) have docstrings too, just like modules and functions.
- __init__ is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor of the class. It's tempting, because it looks like a constructor (by convention, __init__ is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance of the class), and even sounds like one (“init” certainly suggests a constructor-ish nature). Incorrect, because the object has already been constructed by the time __init__ is called, and you already have a valid reference to the new instance of the class. But __init__ is the closest thing you're going to get to a constructor in Python, and it fills much the same role.
- The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always named self. In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.
- __init__ methods can take any number of arguments, and just like functions, the arguments can be defined with default values, making them optional to the caller. In this case, filename has a default value of None, which is the Python null value.
 | By convention, the first argument of any Python class method (the reference to the current instance) is called self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.
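The following sketch shows these points in isolation. The Track class is hypothetical and is not part of fileinfo.py; it only illustrates an __init__ method with an optional argument.

```python
class Track:
    "hypothetical class used only to illustrate __init__ and self"
    def __init__(self, filename=None):
        # self already refers to a valid, newly created instance here
        self.filename = filename

unnamed = Track()             # filename falls back to its default, None
named = Track("kairo.mp3")    # Python supplies self; you pass only filename
```

In both calls the caller never mentions self, and the default value makes filename optional.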
Example 5.6. Coding the FileInfo Class
class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self) ①
        self["name"] = filename ②
        ③
- Some pseudo-object-oriented languages like Powerbuilder have a concept of “extending” constructors and other events, where the ancestor's method is called automatically before the descendant's method is executed.
Python does not do this; you must always explicitly call the appropriate method in the ancestor class.
- I told you that this class acts like a dictionary, and here is the first sign of it. You're assigning the argument filename as the value of this object's name key.
- Note that the __init__ method never returns a value.
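For readers who want to run the class on its own, here is a self-contained sketch of it. The try/except around the import is an addition for this illustration: it assumes that on Python 3 the same class is found in the collections module, while the chapter itself uses the Python 2 UserDict module.

```python
try:
    from UserDict import UserDict        # Python 2, as used in this chapter
except ImportError:
    from collections import UserDict     # assumption: Python 3 location of the class

class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)          # explicit call to the ancestor's __init__
        self["name"] = filename          # dictionary-style assignment on the instance

f = FileInfo("/music/_singles/kairo.mp3")
name = f["name"]
keys = list(f.keys())
```

The instance can be read with the same square-bracket syntax as a dictionary, and name is its only key at this point.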
5.3.2. Knowing When to Use self and __init__
When defining your class methods, you must explicitly list self as the first argument for each method, including __init__. When you call a method of an ancestor class from within your class, you must include the self argument. But when you call your class method from outside, you do not specify anything for the self argument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent,
but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don't know
about yet.
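A short sketch may make the two calling styles easier to compare. The Greeter class is hypothetical and exists only for this illustration.

```python
class Greeter:
    "hypothetical class used only to illustrate how self is passed"
    def __init__(self, name):
        self.name = name
    def greet(self, greeting):
        return "%s, %s" % (greeting, self.name)

g = Greeter("world")
via_instance = g.greet("hello")          # Python fills in self with g
via_class = Greeter.greet(g, "hello")    # you pass the instance explicitly as self
```

Both calls run the same method on the same instance. The second form is the one you write when calling an ancestor's method from within a descendant, as in UserDict.__init__(self).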
Whew. I realize that's a lot to absorb, but you'll get the hang of it. All Python classes work the same way, so once you learn one, you've learned them all. If you forget everything else, remember this
one thing, because I promise it will trip you up:
 | __init__ methods are optional, but when you define one, you must remember to explicitly call the ancestor's __init__ method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor,
the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.
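The sketch below shows what can go wrong when the ancestor's __init__ is skipped. Forgetful and Careful are hypothetical classes, and the import fallback assumes that Python 3 provides UserDict in the collections module. The exact exception raised is a detail of how UserDict stores its data, so treat it as illustrative rather than guaranteed.

```python
try:
    from UserDict import UserDict        # Python 2, as used in this chapter
except ImportError:
    from collections import UserDict     # assumption: Python 3 location of the class

class Forgetful(UserDict):
    "hypothetical descendant that forgets to call the ancestor's __init__"
    def __init__(self, filename=None):
        self["name"] = filename          # UserDict has not set up its storage yet

class Careful(UserDict):
    "hypothetical descendant that calls the ancestor's __init__ first"
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename

try:
    Forgetful("kairo.mp3")
    forgetful_failed = False
except AttributeError:
    forgetful_failed = True              # the ancestor's setup never ran

careful = Careful("kairo.mp3")
```

The only difference between the two classes is the explicit UserDict.__init__(self) call, and that one line decides whether the instance is usable.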
5.4. Instantiating Classes
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the
__init__ method defines. The return value will be the newly created object.
Example 5.7. Creating a FileInfo Instance
>>> import fileinfo
>>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3") ①
>>> f.__class__ ②
<class fileinfo.FileInfo at 010EC204>
>>> f.__doc__ ③
'store file metadata'
>>> f ④
{'name': '/music/_singles/kairo.mp3'}
- You are creating an instance of the FileInfo class (defined in the fileinfo module) and assigning the newly created instance to the variable f. You are passing one parameter, /music/_singles/kairo.mp3, which will end up as the filename argument in FileInfo's __init__ method.
- Every class instance has a built-in attribute, __class__, which is the object's class. (Note that the representation of this includes the physical address of the class object on my machine; your representation will be different.) Java programmers may be familiar with the Class class, which contains methods like getName and getSuperclass to get metadata information about an object. In Python, this kind of metadata is available directly through attributes: __class__ on the object itself, and __name__ and __bases__ on its class.
- You can access the instance's docstring just as with a function or a module. All instances of a class share the same docstring.
- Remember when the __init__ method assigned its filename argument to self["name"]? Well, here's the result. The arguments you pass when you create the class instance get sent right along to the __init__ method (along with the object reference, self, which Python adds for free).
 | In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit new operator like C++ or Java.
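The metadata attributes mentioned above can be inspected in a standalone script. This sketch redefines a cut-down FileInfo so that it runs without the fileinfo module; the import fallback assumes that Python 3 provides UserDict in the collections module.

```python
try:
    from UserDict import UserDict        # Python 2, as used in this chapter
except ImportError:
    from collections import UserDict     # assumption: Python 3 location of the class

class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename

f = FileInfo("/music/_singles/kairo.mp3")   # call the class like a function
cls = f.__class__                 # the class the instance was created from
class_name = cls.__name__         # the class name as a string
ancestors = cls.__bases__         # tuple of ancestor classes
doc = f.__doc__                   # docstring, shared by all instances
```

Note that __name__ and __bases__ are looked up on the class, which you reach from any instance through __class__.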
5.4.1. Garbage Collection
If creating new instances is easy, destroying them is even easier. In general, there is no need to explicitly free instances,
because they are freed automatically when the variables assigned to them go out of scope. Memory leaks are rare in Python.
Example 5.8. Trying to Implement a Memory Leak
>>> def leakmem():
...     f = fileinfo.FileInfo('/music/_singles/kairo.mp3') ①
...
>>> for i in range(100):
...     leakmem() ②
- Every time the leakmem function is called, you are creating an instance of FileInfo and assigning it to the variable f, which is a local variable within the function. Then the function ends without ever freeing f, so you would expect a memory leak, but you would be wrong. When the function ends, the local variable f goes out of scope. At this point, there are no longer any references to the newly created instance of FileInfo (since you never assigned it to anything other than f), so Python destroys the instance for you.
- No matter how many times you call the leakmem function, it will never leak memory, because every time, Python will destroy the newly created FileInfo instance before returning from leakmem.
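One way to watch this happen is to give a class a __del__ method, which Python calls when an instance is about to be destroyed. The Tracked class below is hypothetical, and the sketch assumes the standard CPython interpreter, which destroys an instance as soon as its last reference disappears; other Python implementations may delay the cleanup.

```python
destroyed = []   # labels of instances that have been destroyed so far

class Tracked:
    "hypothetical class that records when its instances are destroyed"
    def __init__(self, label):
        self.label = label
    def __del__(self):
        destroyed.append(self.label)

def leakmem(label):
    f = Tracked(label)   # f is the only reference to the new instance

for i in range(3):
    leakmem(i)           # on CPython, the instance is destroyed as leakmem returns
```

On CPython, destroyed holds one label per call once the loop finishes, showing that no instance outlived the function that created it.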
The technical term for this form of garbage collection is “reference counting”. Python keeps count of the references to every instance created. In the above example, there was only one reference to the