diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html index 965a80c..04a5530 100644 --- a/case-study-porting-chardet-to-python-3.html +++ b/case-study-porting-chardet-to-python-3.html @@ -8,10 +8,11 @@ +
+chardet to Python 3❝ Words, words. They’re all we have to go on. ❞
— Rosencrantz and Guildenstern are Dead diff --git a/dip2 b/dip2 index f18bbf2..a16be0c 100644 --- a/dip2 +++ b/dip2 @@ -316,7 +316,7 @@ several months behind in updating their ActivePython installer when new versionIf you are using Windows 95, Windows 98, or Windows ME, you will also need to download and install Windows Installer 2.0 before installing ActivePython.
- Double-click the installer,
ActivePython-2.2.2-224-win32-ix86.msi. +Double-click the installer,
ActivePython-2.2.2-224-win32-ix86.msi.Step through the installer program. @@ -341,13 +341,13 @@ see 'Help/About PythonWin' for further copyright information.
Download the latest Python Windows installer by going to http://www.python.org/ftp/python/ and selecting the highest version number listed, then downloading the
.exe installer.- Double-click the installer,
Python-2.xxx.yyy.exe. The name will depend on the version of Python available when you read this. +Double-click the installer,
Python-2.xxx.yyy.exe. The name will depend on the version of Python available when you read this. Step through the installer program.
- If disk space is tight, you can deselect the HTMLHelp file, the utility scripts (
Tools/), and/or the test suite (Lib/test/). +If disk space is tight, you can deselect the HTMLHelp file, the utility scripts (
Tools/), and/or the test suite (Lib/test/).If you do not have administrative rights on your machine, you can select Advanced Options, then choose Non-Admin Install. This just affects where Registry entries and Start menu shortcuts are created. @@ -380,13 +380,13 @@ interactive shell.
To use the preinstalled version of Python, follow these steps:
- -
Open the
/Applications folder. +Open the
/Applications folder.- -
Open the
Utilities folder. +Open the
Utilities folder.- -
Double-click
Terminal to open a terminal window and get to a command line. +Double-click
Terminal to open a terminal window and get to a command line. Type python at the command prompt. @@ -406,13 +406,13 @@ Type "help", "copyright", "credits", or "license" for more information.
Follow these steps to download and install the latest version of Python:
- -
Download the
MacPython-OSX disk image from http://homepages.cwi.nl/~jack/macpython/download.html. +Download the
MacPython-OSX disk image from http://homepages.cwi.nl/~jack/macpython/download.html.- -
If your browser has not already done so, double-click
MacPython-OSX-2.3-1.dmg to mount the disk image on your desktop. +If your browser has not already done so, double-click
MacPython-OSX-2.3-1.dmg to mount the disk image on your desktop.- -
Double-click the installer,
MacPython-OSX.pkg. +Double-click the installer,
MacPython-OSX.pkg. The installer will prompt you for your administrative username and password. @@ -421,13 +421,13 @@ Type "help", "copyright", "credits", or "license" for more information.
Step through the installer program.
- -
After installation is complete, close the installer and open the
/Applications folder. +After installation is complete, close the installer and open the
/Applications folder.- -
Open the
MacPython-2.3 folder +Open the
MacPython-2.3 folder- -
Double-click
PythonIDE to launch Python. +Double-click
PythonIDE to launch Python. The MacPython IDE should display a splash screen, then take you to the interactive shell. If the interactive shell does not appear, select @@ -458,25 +458,25 @@ Type "help", "copyright", "credits", or "license" for more information.
Follow these steps to install Python on Mac OS 9:
- -
Download the
MacPython23full.bin file from http://homepages.cwi.nl/~jack/macpython/download.html. +Download the
MacPython23full.bin file from http://homepages.cwi.nl/~jack/macpython/download.html.- -
If your browser does not decompress the file automatically, double-click
MacPython23full.bin to decompress the file with Stuffit Expander. +If your browser does not decompress the file automatically, double-click
MacPython23full.bin to decompress the file with Stuffit Expander.- -
Double-click the installer,
MacPython23full. +Double-click the installer,
MacPython23full. Step through the installer program.
- -
AFter installation is complete, close the installer and open the
/Applications folder. +AFter installation is complete, close the installer and open the
/Applications folder.- -
Open the
MacPython-OS9 2.3 folder. +Open the
MacPython-OS9 2.3 folder.- -
Double-click
Python IDE to launch Python. +Double-click
Python IDE to launch Python. The MacPython IDE should display a splash screen, and then take you to the interactive shell. If the interactive shell does not appear, select @@ -490,7 +490,7 @@ MacPython IDE 1.0.1
1.5. Python on RedHat Linux
Installing under UNIX-compatible operating systems such as Linux is easy if you're willing to install a binary package. Pre-built binary packages are available for most popular Linux distributions. Or you can always compile from source. -
Download the latest Python RPM by going to http://www.python.org/ftp/python/ and selecting the highest version number listed, then selecting the
rpms/ directory within that. Then download the RPM with the highest version number. You can install it with the rpm command, as shown here: +Download the latest Python RPM by going to http://www.python.org/ftp/python/ and selecting the highest version number listed, then selecting the
rpms/ directory within that. Then download the RPM with the highest version number. You can install it with the rpm command, as shown here: Example 1.2. Installing on RedHat Linux 9
localhost:~$ su - Password: [enter your root password] @@ -571,7 +571,7 @@ logout Type "help", "copyright", "credits" or "license" for more information. >>> [press Ctrl+D to exit]1.7. Python Installation from Source
-If you prefer to build from source, you can download the Python source code from http://www.python.org/ftp/python/. Select the highest version number listed, download the
.tgz file), and then do the usual configure, make, make install dance. +If you prefer to build from source, you can download the Python source code from http://www.python.org/ftp/python/. Select the highest version number listed, download the
.tgz file), and then do the usual configure, make, make install dance. Example 1.4. Installing from source
localhost:~$ su - Password: [enter your root password] @@ -658,7 +658,7 @@ Let's skip all that.Here is a complete, working Python program.
It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it. -
Example 2.1.
+odbchelper.pyExample 2.1.
odbchelper.pyIf you have not already done so, you can download this and other examples used in this book.
def buildConnectionString(params): """Build a connection string from a dictionary of parameters. @@ -687,7 +687,7 @@ File->Run... (Ctrl-R). Output is displayed in the iIn the Python IDE on Mac OS, you can run a Python program with -Python->Run window... (Cmd-R), but there is an important option you must set first. Open the .pyfile in the IDE, pop up the options menu by clicking the black triangle in the upper-right corner of the window, and make sure the Run as __main__ option is checked. This is a per-file setting, but you'll only need to do it once per file. +Python->Run window... (Cmd-R), but there is an important option you must set first. Open the.pyfile in the IDE, pop up the options menu by clicking the black triangle in the upper-right corner of the window, and make sure the Run as __main__ option is checked. This is a per-file setting, but you'll only need to do it once per file.@@ -695,10 +695,10 @@ Python->Run window... (Cmd-R), but there is an impor
-- On UNIX-compatible systems (including Mac OS X), you can run a Python program from the command line: python +odbchelper.pyOn UNIX-compatible systems (including Mac OS X), you can run a Python program from the command line: python odbchelper.pyThe id="odbchelper.output" output of
odbchelper.pywill look like this:server=mpilgrim;uid=sa;database=master;pwd=secret2.2. Declaring Functions
+The id="odbchelper.output" output of
odbchelper.pywill look like this:server=mpilgrim;uid=sa;database=master;pwd=secret2.2. Declaring Functions
Python has functions like most other languages, but it does not have separate header files like C++ or
interface/implementationsections like Pascal. When you need a function, just declare it, like this:def buildConnectionString(params):Note that the keyword
defstarts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments @@ -744,7 +744,7 @@ In fact, every Python function returns a value; if the function ever executes aSo Python is both dynamically typed (because it doesn't use explicit datatype declarations) and strongly typed (because once a variable has a datatype, it actually matters).
2.3. Documenting Functions
You can document a Python function by giving it a
docstring. -Example 2.2. Defining the
buildConnectionStringFunction'sdocstring+Example 2.2. Defining the
buildConnectionStringFunction'sdocstringdef buildConnectionString(params): """Build a connection string from a dictionary of parameters. @@ -776,166 +776,6 @@ need to give your function adocstring, but you always should. I k2.4. Everything Is an Object
-In case you missed it, I just said that Python functions have attributes, and that those attributes are available at runtime. -
A function, like everything else in Python, is an object. -
Open your favorite Python IDE and follow along: -
Example 2.3. Accessing the
buildConnectionStringFunction'sdocstring>>> import odbchelper->>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} ->>> print odbchelper.buildConnectionString(params)
-server=mpilgrim;uid=sa;database=master;pwd=secret ->>> print odbchelper.buildConnectionString.__doc__
-Build a connection string from a dictionary - -Returns string.
--
-- -- -
The first line imports the -odbchelperprogram as a module -- a chunk of code that you can use interactively, or from a larger Python program. (You'll see examples of multi-module Python programs in Chapter 4.) Once you import a module, you can reference any of its public functions, classes, or attributes. Modules can do this - to access functionality in other modules, and you can do it in the IDE too. This is an important concept, and you'll talk more about it later. -- -- -
When you want to use functions defined in imported modules, you need to include the module name. So you can't just say -buildConnectionString; it must beodbchelper.buildConnectionString. If you've used classes in Java, this should feel vaguely familiar. -- -- -
Instead of calling the function as you would expect to, you asked for one of the function's attributes, -__doc__. --
-- -- - -- importin Python is likerequirein Perl. Once youimporta Python module, you access its functions withmodule.function; once yourequirea Perl module, you access its functions withmodule::function. -2.4.1. The Import Search Path
-Before you go any further, I want to briefly mention the library search path. Python looks in several places when you try to import a module. Specifically, it looks in all the directories defined in
sys.path. This is just a list, and you can easily view it or modify it with standard list methods. (You'll learn more about lists - later in this chapter.) -Example 2.4. Import Search Path
->>> import sys->>> sys.path
-['', '/usr/local/lib/python2.2', '/usr/local/lib/python2.2/plat-linux2', -'/usr/local/lib/python2.2/lib-dynload', '/usr/local/lib/python2.2/site-packages', -'/usr/local/lib/python2.2/site-packages/PIL', '/usr/local/lib/python2.2/site-packages/piddle'] ->>> sys
-<module 'sys' (built-in)> ->>> sys.path.append('/my/new/path')
--
-- -- -
Importing the -sysmodule makes all of its functions and attributes available. -- -- -
- sys.pathis a list of directory names that constitute the current search path. (Yours will look different, depending on your operating - system, what version of Python you're running, and where it was originally installed.) Python will look through these directories (in this order) for a.pyfile matching the module name you're trying to import. -- -- -
Actually, I lied; the truth is more complicated than that, because not all modules are stored as -.pyfiles. Some, like thesysmodule, are "built-in modules"; they are actually baked right into Python itself. Built-in modules behave just like regular modules, but their Python source code is not available, because they are not written in Python! (Thesysmodule is written in C.) -- -- -
You can add a new directory to Python's search path at runtime by appending the directory name to -sys.path, and then Python will look in that directory as well, whenever you try to import a module. The effect lasts as long as Python is running. (You'll talk more aboutappendand other list methods in Chapter 3.) -2.4.2. What's an Object?
-Everything in Python is an object, and almost everything has attributes and methods. All functions have a built-in attribute
__doc__, which returns thedocstringdefined in the function's source code. Thesysmodule is an object which has (among other things) an attribute calledpath. And so forth. -Still, this begs the question. What is an object? Different programming languages define “object” in different ways. In some, it means that all objects must have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser; some objects have neither attributes nor methods (more on this in Chapter 3), and not all objects are subclassable (more on this in Chapter 5). But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function - (more in this in Chapter 4). -
This is so important that I'm going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects. -
-Further Reading on Objects
--
-- Python Reference Manual explains exactly what it means to say that everything in Python is an object, because some people are pedantic and like to discuss this sort of thing at great length. - -
- eff-bot summarizes Python objects. - -
2.5. Indenting Code
-Python functions have no explicit
beginorend, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (:) and the indentation of the code itself. -Example 2.5. Indenting the
buildConnectionStringFunction-def buildConnectionString(params): - """Build a connection string from a dictionary of parameters. - - Returns string.""" - return ";".join(["%s=%s" % (k, v) for k, v in params.items()])Code blocks are defined by their indentation. By "code block", I mean functions,
ifstatements,forloops,whileloops, and so forth. Indenting starts a block and unindenting ends it. There are no explicit braces, brackets, or keywords. -This means that whitespace is significant, and must be consistent. In this example, the function code (including thedocstring) is indented four spaces. It doesn't need to be four spaces, it just needs to be consistent. The first line that is not -indented is outside the function. -Example 2.6, “if Statements” shows an example of code indentation with
ifstatements. -Example 2.6.
ifStatements-def fib(n):- print 'n =', n
- if n > 1:
- return n * fib(n - 1) - else:
- print 'end of the line' - return 1 -
-After some initial protests and several snide analogies to Fortran, you will make peace with this and start seeing its benefits. One major benefit is that all Python programs look similar, since indentation is a language requirement and not a matter of style. This makes it easier to read -and understand other people's Python code.
-
-- -- - -Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks. - --Further Reading on Code Indentation
--
- Python Reference Manual discusses cross-platform indentation issues and shows various indentation errors. - -
- Python Style Guide discusses good indentation style. - -
2.6. Testing Modules
Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them. Here's an example that uses the
if__name__trick. @@ -988,7 +828,7 @@ them into a larger program.
- A dictionary in Python is like an instance of the Hashtableclass in Java. +A dictionary in Python is like an instance of the Hashtableclass in Java.- @@ -1289,7 +1129,7 @@ KeyError: mpilgrimA list in Python is much more than an array in Java (although it can be used as one if that's really all you want out of life). A better analogy would be to the ArrayListclass, which can hold arbitrary objects and can expand dynamically as new items are added. +A list in Python is much more than an array in Java (although it can be used as one if that's really all you want out of life). A better analogy would be to the ArrayListclass, which can hold arbitrary objects and can expand dynamically as new items are added.@@ -1309,24 +1149,24 @@ KeyError: mpilgrim - ![]()
If both slice indices are left out, all elements of the list are included. But this is not the same as the original lilist; it is a new list that happens to have all the same elements.li[:]is shorthand for making a complete copy of a list. +If both slice indices are left out, all elements of the list are included. But this is not the same as the original li list; it is a new list that happens to have all the same elements. li[:]is shorthand for making a complete copy of a list.- ![]()
appendadds a single element to the end of the list. +appendadds a single element to the end of the list.- ![]()
insertinserts a single element into a list. The numeric argument is the index of the first element that gets bumped out of position. +insertinserts a single element into a list. The numeric argument is the index of the first element that gets bumped out of position. Note that list elements do not need to be unique; there are now two separate elements with the value'new',li[2]andli[6].- - ![]()
extendconcatenates lists. Note that you do not callextendwith multiple arguments; you call it with one argument, a list. In this case, that list has two elements. +extendconcatenates lists. Note that you do not callextendwith multiple arguments; you call it with one argument, a list. In this case, that list has two elements.Example 3.11. The Difference between
extendandappend+Example 3.11. The Difference between
extendandappend>>> li = ['a', 'b', 'c'] >>> li.extend(['d', 'e', 'f'])>>> li @@ -1348,7 +1188,7 @@ KeyError: mpilgrim
- ![]()
Lists have two methods, extendandappend, that look like they do the same thing, but are in fact completely different.extendtakes a single argument, which is always a list, and adds each of the elements of that list to the original list. +Lists have two methods, extendandappend, that look like they do the same thing, but are in fact completely different.extendtakes a single argument, which is always a list, and adds each of the elements of that list to the original list.@@ -1360,14 +1200,14 @@ KeyError: mpilgrim - ![]()
On the other hand, appendtakes one argument, which can be any data type, and simply adds it to the end of the list. Here, you're calling theappendmethod with a single argument, which is a list of three elements. +On the other hand, appendtakes one argument, which can be any data type, and simply adds it to the end of the list. Here, you're calling theappendmethod with a single argument, which is a list of three elements.@@ -1388,13 +1228,13 @@ False ![]()
Now the original list, which started as a list of three elements, contains four elements. Why four? Because the last element - that you just appended is itself a list. Lists can contain any type of data, including other lists. That may be what you want, or maybe not. Don't use appendif you meanextend. + that you just appended is itself a list. Lists can contain any type of data, including other lists. That may be what you want, or maybe not. Don't useappendif you meanextend.- ![]()
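If you want to check the difference between extend and append for yourself, here is a minimal sketch. It is not part of the diff, and the variable names are made up for illustration; the list methods themselves behave the same way on current Python as on the 2.2-era interpreter used in the examples.

```python
# Illustrative names only; none of these appear in the book's examples.
flat = ['a', 'b', 'c']
flat.extend(['d', 'e', 'f'])    # adds each of the three elements
flat_len = len(flat)            # 6

nested = ['a', 'b', 'c']
nested.append(['d', 'e', 'f'])  # adds the list itself as a single element
nested_len = len(nested)        # 4
last_item = nested[-1]          # the appended list, ['d', 'e', 'f']
```

The length is the quickest tell: six elements after extend, four after append, with the last of the four being a list.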
indexfinds the first occurrence of a value in the list and returns the index. +indexfinds the first occurrence of a value in the list and returns the index.- ![]()
indexfinds the first occurrence of a value in the list. In this case,'new'occurs twice in the list, inli[2]andli[6], butindexwill return only the first index,2. +indexfinds the first occurrence of a value in the list. In this case,'new'occurs twice in the list, inli[2]andli[6], butindexwill return only the first index,2.@@ -1408,7 +1248,7 @@ False @@ -1420,7 +1260,7 @@ False - ![]()
To test whether a value is in the list, use in, which returnsTrueif the value is found orFalseif it is not. +To test whether a value is in the list, use in, which returnsTrueif the value is found orFalseif it is not.Before version 2.2.1, Python had no separate boolean datatype. To compensate for this, Python accepted almost anything in a boolean context (like an ifstatement), according to the following rules:-
0is false; all other numbers are true. +0is false; all other numbers are true.- An empty string (
"") is false, all other strings are true. @@ -1456,26 +1296,26 @@ ValueError: list.remove(x): x not in list- ![]()
removeremoves the first occurrence of a value from a list. +removeremoves the first occurrence of a value from a list.- ![]()
removeremoves only the first occurrence of a value. In this case,'new'appeared twice in the list, butli.remove("new")removed only the first occurrence. +removeremoves only the first occurrence of a value. In this case,'new'appeared twice in the list, butli.remove("new")removed only the first occurrence.- ![]()
If the value is not found in the list, Python raises an exception. This mirrors the behavior of the indexmethod. +If the value is not found in the list, Python raises an exception. This mirrors the behavior of the indexmethod.@@ -1494,7 +1334,7 @@ ValueError: list.remove(x): x not in list - ![]()
popis an interesting beast. It does two things: it removes the last element of the list, and it returns the value that it removed. - Note that this is different fromli[-1], which returns a value but does not change the list, and different fromli.remove(value), which changes the list but does not return a value. +popis an interesting beast. It does two things: it removes the last element of the list, and it returns the value that it removed. + Note that this is different fromli[-1], which returns a value but does not change the list, and different fromli.remove(value), which changes the list but does not return a value.- ![]()
Lists can also be concatenated with the +operator.list = list + otherlisthas the same result aslist.extend(otherlist). But the+operator returns a new (concatenated) list as a value, whereasextendonly alters an existing list. This means thatextendis faster, especially for large lists. +Lists can also be concatenated with the +operator.list = list + otherlisthas the same result aslist.extend(otherlist). But the+operator returns a new (concatenated) list as a value, whereasextendonly alters an existing list. This means thatextendis faster, especially for large lists.@@ -1582,25 +1422,25 @@ True - ![]()
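The point that + returns a new list while extend alters an existing one is easiest to see with a second name bound to the same list. The sketch below is not part of the diff and uses made-up names; it deliberately avoids calling a variable list, since that would shadow the built-in list function.

```python
# Illustrative names only.
li = ['a', 'b']
alias = li                    # a second name for the same list object
li.extend(['c'])              # alters the existing list in place
extend_shared = alias is li   # True: alias sees the new element too

li2 = ['a', 'b']
alias2 = li2
li2 = li2 + ['c']             # builds a new list and rebinds li2 to it
plus_shared = alias2 is li2   # False: alias2 is still ['a', 'b']
```

Both li and li2 end up holding the same three elements, but only extend changed a list that other code might still be referring to.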
You can't add elements to a tuple. Tuples have no appendorextendmethod. +You can't add elements to a tuple. Tuples have no appendorextendmethod.- ![]()
You can't remove elements from a tuple. Tuples have no removeorpopmethod. +You can't remove elements from a tuple. Tuples have no removeorpopmethod.- ![]()
You can't find elements in a tuple. Tuples have no indexmethod. +You can't find elements in a tuple. Tuples have no indexmethod.@@ -1623,7 +1463,7 @@ True - ![]()
You can, however, use into see if an element exists in the tuple. +You can, however, use into see if an element exists in the tuple.- @@ -1638,10 +1478,10 @@ TrueTuples can be converted into lists, and vice-versa. The built-in tuplefunction takes a list and returns a tuple with the same elements, and thelistfunction takes a tuple and returns a list. In effect,tuplefreezes a list, andlistthaws a tuple. +Tuples can be converted into lists, and vice-versa. The built-in tuplefunction takes a list and returns a tuple with the same elements, and thelistfunction takes a tuple and returns a list. In effect,tuplefreezes a list, andlistthaws a tuple.3.4. Declaring variables
-Now that you know something about dictionaries, tuples, and lists (oh my!), let's get back to the sample program from Chapter 2,
odbchelper.py. +Now that you know something about dictionaries, tuples, and lists (oh my!), let's get back to the sample program from Chapter 2,
odbchelper.py.Python has local and global variables like most other languages, but it has no explicit variable declarations. Variables spring into existence by being assigned a value, and they are automatically destroyed when they go out of scope. -
Example 3.17. Defining the
myParamsVariable+Example 3.17. Defining the myParams Variable
if __name__ == "__main__": myParams = {"server":"mpilgrim", \ "database":"master", \ @@ -1659,7 +1499,7 @@ if __name__ == "__main__":Strictly speaking, expressions in parentheses, straight brackets, or curly braces (like defining a dictionary) can be split into multiple lines with or without the line continuation character (“
\”). I like to include the backslash even when it's not required because I think it makes the code easier to read, but that's a matter of style. -Third, you never declared the variable
myParams, you just assigned a value to it. This is like VBScript without theoption explicitoption. Luckily, unlike VBScript, Python will not allow you to reference a variable that has never been assigned a value; trying to do so will raise an exception. +Third, you never declared the variable myParams, you just assigned a value to it. This is like VBScript without the
option explicitoption. Luckily, unlike VBScript, Python will not allow you to reference a variable that has never been assigned a value; trying to do so will raise an exception.3.4.1. Referencing Variables
Example 3.18. Referencing an Unbound Variable
>>> x Traceback (innermost last): @@ -1682,12 +1522,12 @@ NameError: There is no variable named 'x'- ![]()
vis a tuple of three elements, and(x, y, z)is a tuple of three variables. Assigning one to the other assigns each of the values ofvto each of the variables, in order. +v is a tuple of three elements, and (x, y, z)is a tuple of three variables. Assigning one to the other assigns each of the values of v to each of the variables, in order.This has all sorts of uses. I often want to assign names to a range of values. In C, you would use
enumand manually list each constant and its associated value, which seems especially tedious when the values are consecutive. - In Python, you can use the built-inrangefunction with multi-variable assignment to quickly assign consecutive values. + In Python, you can use the built-inrangefunction with multi-variable assignment to quickly assign consecutive values.Example 3.20. Assigning Consecutive Values
>>> range(7)[0, 1, 2, 3, 4, 5, 6] >>> (MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)
@@ -1701,25 +1541,25 @@ NameError: There is no variable named 'x'
- ![]()
The built-in rangefunction returns a list of integers. In its simplest form, it takes an upper limit and returns a zero-based list counting - up to but not including the upper limit. (If you like, you can pass other parameters to specify a base other than0and a step other than1. You canprint range.__doc__for details.) +The built-in rangefunction returns a list of integers. In its simplest form, it takes an upper limit and returns a zero-based list counting + up to but not including the upper limit. (If you like, you can pass other parameters to specify a base other than0and a step other than1. You canprint range.__doc__for details.)- ![]()
MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY,SATURDAY, andSUNDAYare the variables you're defining. (This example came from thecalendarmodule, a fun little module that prints calendars, like the UNIX programcal. Thecalendarmodule defines integer constants for days of the week.) +MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, and SUNDAY are the variables you're defining. (This example came from the calendarmodule, a fun little module that prints calendars, like the UNIX programcal. Thecalendarmodule defines integer constants for days of the week.)- ![]()
Now each variable has its value: MONDAYis0,TUESDAYis1, and so forth. +Now each variable has its value: MONDAY is 0, TUESDAY is1, and so forth.You can also use multi-variable assignment to build functions that return multiple values, simply by returning a tuple of - all the values. The caller can treat it as a tuple, or assign the values to individual variables. Many standard Python libraries do this, including the
osmodule, which you'll discuss in Chapter 6. + all the values. The caller can treat it as a tuple, or assign the values to individual variables. Many standard Python libraries do this, including theosmodule, which you'll discuss in Chapter 6.Further Reading on Variables
@@ -1736,7 +1576,7 @@ NameError: There is no variable named 'x'
- @@ -1748,7 +1588,7 @@ NameError: There is no variable named 'x'String formatting in Python uses the same syntax as the sprintffunction in C. +String formatting in Python uses the same syntax as the sprintffunction in C.@@ -1785,7 +1625,7 @@ TypeError: cannot concatenate 'str' and 'int' objects - ![]()
The whole expression evaluates to a string. The first %sis replaced by the value ofk; the second%sis replaced by the value ofv. All other characters in the string (in this case, the equal sign) stay as they are. +The whole expression evaluates to a string. The first %sis replaced by the value of k; the second%sis replaced by the value of v. All other characters in the string (in this case, the equal sign) stay as they are.(userCount, )is a tuple with one element. Yes, the syntax is a little strange, but there's a good reason for it: it's unambiguously a tuple. In fact, you can always include a comma after the last element when defining a list, tuple, or dictionary, but the - comma is required when defining a tuple with one element. If the comma weren't required, Python wouldn't know whether(userCount)was a tuple with one element or just the value ofuserCount. + comma is required when defining a tuple with one element. If the comma weren't required, Python wouldn't know whether(userCount)was a tuple with one element or just the value of userCount.@@ -1802,7 +1642,7 @@ TypeError: cannot concatenate 'str' and 'int' objects printf in C, string formatting in Python is like a Swiss Army knife. There are options galore, and modifier strings to specially format many different types of values. +As with
printfin C, string formatting in Python is like a Swiss Army knife. There are options galore, and modifier strings to specially format many different types of values.Example 3.23. Formatting Numbers
>>> print "Today's stock price: %f" % 50.462550.462500 @@ -1855,7 +1695,7 @@ TypeError: cannot concatenate 'str' and 'int' objects
-
To make sense of this, look at it from right to left. liis the list you're mapping. Python loops throughlione element at a time, temporarily assigning the value of each element to the variableelem. Python then applies the functionand appends that result to the returned list. +elem*2To make sense of this, look at it from right to left. li is the list you're mapping. Python loops through li one element at a time, temporarily assigning the value of each element to the variable elem. Python then applies the function elem*2and appends that result to the returned list.@@ -1871,9 +1711,9 @@ TypeError: cannot concatenate 'str' and 'int' objects -Here are the list comprehensions in the
buildConnectionStringfunction that you declared in Chapter 2:-["%s=%s" % (k, v) for k, v in params.items()]First, notice that you're calling the
itemsfunction of theparamsdictionary. This function returns a list of tuples of all the data in the dictionary. -Example 3.25. The
keys,values, anditemsFunctions>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} +Here are the list comprehensions in the
buildConnectionStringfunction that you declared in Chapter 2:+["%s=%s" % (k, v) for k, v in params.items()]First, notice that you're calling the
itemsfunction of the params dictionary. This function returns a list of tuples of all the data in the dictionary. +Example 3.25. The
keys,values, anditemsFunctions>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} >>> params.keys()['server', 'uid', 'database', 'pwd'] >>> params.values()
@@ -1884,26 +1724,26 @@ TypeError: cannot concatenate 'str' and 'int' objects
-
The keysmethod of a dictionary returns a list of all the keys. The list is not in the order in which the dictionary was defined +The keysmethod of a dictionary returns a list of all the keys. The list is not in the order in which the dictionary was defined (remember that elements in a dictionary are unordered), but it is a list.- ![]()
The valuesmethod returns a list of all the values. The list is in the same order as the list returned bykeys, soparams.values()[n] == params[params.keys()[n]]for all values ofn. +The valuesmethod returns a list of all the values. The list is in the same order as the list returned bykeys, soparams.values()[n] == params[params.keys()[n]]for all values of n.- - ![]()
The itemsmethod returns a list of tuples of the form(key, value). The list contains all the data in the dictionary. +The itemsmethod returns a list of tuples of the form(key, value). The list contains all the data in the dictionary.Now let's see what
buildConnectionStringdoes. It takes a list,, and maps it to a new list by applying string formatting to each element. The new list will have the same number of elements -asparams.items(), but each element in the new list will be a string that contains both a key and its associated value from theparams.items()paramsdictionary. -Example 3.26. List Comprehensions in
buildConnectionString, Step by Step>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} +Now let's see what
buildConnectionStringdoes. It takes a list,params., and maps it to a new list by applying string formatting to each element. The new list will have the same number of elements +asitems()params., but each element in the new list will be a string that contains both a key and its associated value from the params dictionary. +items()Example 3.26. List Comprehensions in
buildConnectionString, Step by Step>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} >>> params.items() [('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')] >>> [k for k, v in params.items()]@@ -1916,13 +1756,13 @@ as
params.items- ![]()
Note that you're using two variables to iterate through the params.items()list. This is another use of multi-variable assignment. The first element ofparams.items()is('server', 'mpilgrim'), so in the first iteration of the list comprehension,kwill get'server'andvwill get'mpilgrim'. In this case, you're ignoring the value ofvand only including the value ofkin the returned list, so this list comprehension ends up being equivalent to. +params.keys()Note that you're using two variables to iterate through the params.items()list. This is another use of multi-variable assignment. The first element ofparams.items()is('server', 'mpilgrim'), so in the first iteration of the list comprehension, k will get'server'and v will get'mpilgrim'. In this case, you're ignoring the value of v and only including the value of k in the returned list, so this list comprehension ends up being equivalent toparams..keys()- ![]()
Here you're doing the same thing, but ignoring the value of k, so this list comprehension ends up being equivalent to. +params.values()Here you're doing the same thing, but ignoring the value of k, so this list comprehension ends up being equivalent to params..values()@@ -1936,36 +1776,36 @@ as params.itemsFurther Reading on List Comprehensions
-
- Python Tutorial discusses another way to map lists using the built-in
mapfunction. +- Python Tutorial discusses another way to map lists using the built-in
mapfunction.- Python Tutorial shows how to do nested list comprehensions.
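The mapping described in this section can be sketched in a few lines. This is a minimal illustration, not part of `odbchelper.py`: it is written in Python 3 syntax (the sessions in the text are Python 2), where `items()` returns a view rather than a list and dictionaries keep insertion order from Python 3.7 onward. The variable names `pairs`, `only_keys`, and `only_values` are made up for the example.

```python
# Same illustrative data as the params dictionary used in the text.
params = {"server": "mpilgrim", "database": "master", "uid": "sa", "pwd": "secret"}

# items() yields (key, value) pairs; the comprehension unpacks each pair into k and v
# and applies string formatting to produce one "key=value" string per pair.
pairs = ["%s=%s" % (k, v) for k, v in params.items()]

# Ignoring one of the two variables reproduces keys() or values().
only_keys = [k for k, v in params.items()]
only_values = [v for k, v in params.items()]

print(pairs)
print(only_keys == list(params.keys()))
print(only_values == list(params.values()))
```

The new list always has the same number of elements as `params.items()`, whichever of the two variables you choose to keep.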
3.7. Joining Lists and Splitting Strings
-You have a list of key-value pairs in the form
key=value, and you want to join them into a single string. To join any list of strings into a single string, use thejoinmethod of a string object. +You have a list of key-value pairs in the form
key=value, and you want to join them into a single string. To join any list of strings into a single string, use thejoinmethod of a string object.-Here is an example of joining a list from the
buildConnectionStringfunction:+Here is an example of joining a list from the
buildConnectionStringfunction:return ";".join(["%s=%s" % (k, v) for k, v in params.items()])One interesting note before you continue. I keep repeating that functions are objects, strings are objects... everything -is an object. You might have thought I meant that string variables are objects. But no, look closely at this example and you'll see that the string
";"itself is an object, and you are calling itsjoinmethod. -The
joinmethod joins the elements of the list into a single string, with each element separated by a semi-colon. The delimiter doesn't +is an object. You might have thought I meant that string variables are objects. But no, look closely at this example and you'll see that the string";"itself is an object, and you are calling itsjoinmethod. +The
joinmethod joins the elements of the list into a single string, with each element separated by a semi-colon. The delimiter doesn't need to be a semi-colon; it doesn't even need to be a single character. It can be any string.-
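Here is a small sketch of `join` on its own, assuming a hand-written list of `key=value` strings rather than the output of `buildConnectionString`. It uses Python 3 syntax, and the names `pairs`, `joined`, `spaced`, and `coercion_failed` are invented for the example. It also shows what happens when one element is not a string.

```python
# Hypothetical list of key=value strings, shaped like the one built in the text.
pairs = ["server=mpilgrim", "uid=sa", "database=master", "pwd=secret"]

# The delimiter string is the object whose join method is called.
joined = ";".join(pairs)

# The delimiter can be any string, including one longer than a single character.
spaced = " | ".join(pairs)

# join does no type coercion: a non-string element raises TypeError.
try:
    ";".join(["uid", 42])
    coercion_failed = False
except TypeError:
    coercion_failed = True

print(joined)
print(spaced)
print(coercion_failed)
```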
- joinworks only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements +joinworks only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception.Example 3.27. Output of
odbchelper.py>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} +Example 3.27. Output of
odbchelper.py>>> params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"} >>> ["%s=%s" % (k, v) for k, v in params.items()] ['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret'] >>> ";".join(["%s=%s" % (k, v) for k, v in params.items()]) -'server=mpilgrim;uid=sa;database=master;pwd=secret'This string is then returned from the
odbchelperfunction and printed by the calling block, which gives you the output that you marveled at when you started reading this +'server=mpilgrim;uid=sa;database=master;pwd=secret'This string is then returned from the
odbchelperfunction and printed by the calling block, which gives you the output that you marveled at when you started reading this chapter.You're probably wondering if there's an analogous method to split a string into a list. And of course there is, and it's -called
split. +calledsplit.Example 3.28. Splitting a String
>>> li = ['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret'] >>> s = ";".join(li) >>> s @@ -1978,13 +1818,13 @@ calledsplit.- ![]()
splitreversesjoinby splitting a string into a multi-element list. Note that the delimiter (“;”) is stripped out completely; it does not appear in any of the elements of the returned list. +splitreversesjoinby splitting a string into a multi-element list. Note that the delimiter (“;”) is stripped out completely; it does not appear in any of the elements of the returned list.@@ -1993,7 +1833,7 @@ called - ![]()
splittakes an optional second argument, which is the number of times to split. (“Oooooh, optional arguments...” You'll learn how to do this in your own functions in the next chapter.) +splittakes an optional second argument, which is the number of times to split. (“Oooooh, optional arguments...” You'll learn how to do this in your own functions in the next chapter.)split.- @@ -2005,18 +1845,18 @@ calledanystring.is a useful technique when you want to search a string for a substring and then work with everything before the substring +split(delimiter, 1)anystring.is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).split(delimiter, 1)split.- Python Library Reference summarizes all the string methods. -
- Python Library Reference documents the
stringmodule. +- Python Library Reference documents the
stringmodule. -- The Whole Python FAQ explains why
joinis a string method instead of a list method. +- The Whole Python FAQ explains why
joinis a string method instead of a list method.3.7.1. Historical Note on String Methods
-When I first learned Python, I expected
jointo be a method of a list, which would take the delimiter as an argument. Many people feel the same way, and there's a story - behind thejoinmethod. Prior to Python 1.6, strings didn't have all these useful methods. There was a separatestringmodule that contained all the string functions; each function took a string as its first argument. The functions were deemed - important enough to put onto the strings themselves, which made sense for functions likelower,upper, andsplit. But many hard-core Python programmers objected to the newjoinmethod, arguing that it should be a method of the list instead, or that it shouldn't move at all but simply stay a part of - the oldstringmodule (which still has a lot of useful stuff in it). I use the newjoinmethod exclusively, but you will see code written either way, and if it really bothers you, you can use the oldstring.joinfunction instead. +When I first learned Python, I expected
jointo be a method of a list, which would take the delimiter as an argument. Many people feel the same way, and there's a story + behind thejoinmethod. Prior to Python 1.6, strings didn't have all these useful methods. There was a separatestringmodule that contained all the string functions; each function took a string as its first argument. The functions were deemed + important enough to put onto the strings themselves, which made sense for functions likelower,upper, andsplit. But many hard-core Python programmers objected to the newjoinmethod, arguing that it should be a method of the list instead, or that it shouldn't move at all but simply stay a part of + the oldstringmodule (which still has a lot of useful stuff in it). I use the newjoinmethod exclusively, but you will see code written either way, and if it really bothers you, you can use the oldstring.joinfunction instead.3.8. Summary
-The
odbchelper.pyprogram and its output should now make perfect sense. +The
odbchelper.pyprogram and its output should now make perfect sense.def buildConnectionString(params): """Build a connection string from a dictionary of parameters. @@ -2031,7 +1871,7 @@ if __name__ == "__main__": "pwd":"secret" \ } print buildConnectionString(myParams)-Here is the output of
odbchelper.py:server=mpilgrim;uid=sa;database=master;pwd=secret+Here is the output of
odbchelper.py:server=mpilgrim;uid=sa;database=master;pwd=secretBefore diving into the next chapter, make sure you're comfortable doing all of these things:
@@ -2059,7 +1899,7 @@ functions whose names you don't even know ahead of time.
4.1. Diving In
Here is a complete, working Python program. You should understand a good deal about it just by looking at it. The numbered lines illustrate concepts covered in Chapter 2, Your First Python Program. Don't worry if the rest of the code looks intimidating; you'll learn all about it throughout this chapter. -
Example 4.1.
+apihelper.pyExample 4.1.
apihelper.pyIf you have not already done so, you can download this and other examples used in this book.
def info(object, spacing=10, collapse=1):![]()
![]()
"""Print methods and docstrings. @@ -2078,13 +1918,13 @@ if __name__ == "__main__":
-
This module has one function, info. According to its function declaration, it takes three parameters:object,spacing, andcollapse. The last two are actually optional parameters, as you'll see shortly. +This module has one function, info. According to its function declaration, it takes three parameters: object, spacing, and collapse. The last two are actually optional parameters, as you'll see shortly.@@ -2098,7 +1938,7 @@ if __name__ == "__main__": - ![]()
The infofunction has a multi-linedocstringthat succinctly describes the function's purpose. Note that no return value is mentioned; this function will be used solely +The infofunction has a multi-linedocstringthat succinctly describes the function's purpose. Note that no return value is mentioned; this function will be used solely for its effects, rather than its value.![]()
The if __name__trick allows this program to do something useful when run by itself, without interfering with its use as a module for other programs. - In this case, the program simply prints out thedocstringof theinfofunction. + In this case, the program simply prints out thedocstringof theinfofunction.@@ -2108,9 +1948,9 @@ if __name__ == "__main__": info function is designed to be used by you, the programmer, while working in the Python IDE. It takes any object that has functions or methods (like a module, which has functions, or a list, which has methods) and +
The
infofunction is designed to be used by you, the programmer, while working in the Python IDE. It takes any object that has functions or methods (like a module, which has functions, or a list, which has methods) and prints out the functions and theirdocstrings. -Example 4.2. Sample Usage of
apihelper.py>>> from apihelper import info +Example 4.2. Sample Usage of
apihelper.py>>> from apihelper import info >>> li = [] >>> info(li) append L.append(object) -- append object to end @@ -2121,8 +1961,8 @@ insert L.insert(index, object) -- insert object before index pop L.pop([index]) -> item -- remove and return item at index (default last) remove L.remove(value) -- remove first occurrence of value reverse L.reverse() -- reverse *IN PLACE* -sort L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1, 0, 1By default the output is formatted to be easy to read. Multi-line
docstrings are collapsed into a single long line, but this option can be changed by specifying0for thecollapseargument. If the function names are longer than 10 characters, you can specify a larger value for thespacingargument to make the output easier to read. -Example 4.3. Advanced Usage of
apihelper.py>>> import odbchelper +sort L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1, 0, 1By default the output is formatted to be easy to read. Multi-line
docstrings are collapsed into a single long line, but this option can be changed by specifying0for thecollapseargument. If the function names are longer than 10 characters, you can specify a larger value for thespacingargument to make the output easier to read. +Example 4.3. Advanced Usage of
apihelper.py>>> import odbchelper >>> info(odbchelper) buildConnectionString Build a connection string from a dictionary Returns string. >>> info(odbchelper, 30) @@ -2135,11 +1975,11 @@ buildConnectionString Build a connection string from a dictionary ReturPython allows function arguments to have default values; if the function is called without the argument, the argument gets its default value. Furthermore, arguments can be specified in any order by using named arguments. Stored procedures in SQL Server Transact/SQL can do this, so if you're a SQL Server scripting guru, you can skim this part.
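The two rules just stated (defaults fill in missing arguments, and named arguments may appear in any order) can be tried out on a throwaway function. The function `describe` and its parameters `name`, `width`, and `shout` are hypothetical, invented only for this sketch; they are not part of `apihelper.py`.

```python
# describe is a hypothetical function, invented here only to show the calling rules.
def describe(name, width=10, shout=False):
    """Return name padded with spaces to width, optionally upper-cased."""
    text = name.upper() if shout else name
    return text.ljust(width)

a = describe("ab")                            # width and shout take their defaults
b = describe("ab", 4)                         # shout still takes its default
c = describe("ab", shout=True)                # skip width by naming shout
d = describe(shout=True, name="ab", width=4)  # named arguments in any order

print(repr(a))
print(repr(b))
print(repr(c))
print(repr(d))
```

Note that `name` has no default value, so every call must supply it, either by position or by name.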
-Here is an example of
info, a function with two optional arguments:-def info(object, spacing=10, collapse=1):
spacingandcollapseare optional, because they have default values defined.objectis required, because it has no default value. Ifinfois called with only one argument,spacingdefaults to10andcollapsedefaults to1. Ifinfois called with two arguments,collapsestill defaults to1. -Say you want to specify a value for
collapsebut want to accept the default value forspacing. In most languages, you would be out of luck, because you would need to call the function with three arguments. But in +Here is an example of
info, a function with two optional arguments:+def info(object, spacing=10, collapse=1):spacing and collapse are optional, because they have default values defined. object is required, because it has no default value. If
infois called with only one argument, spacing defaults to10and collapse defaults to1. Ifinfois called with two arguments, collapse still defaults to1. +Say you want to specify a value for collapse but want to accept the default value for spacing. In most languages, you would be out of luck, because you would need to call the function with three arguments. But in Python, arguments can be specified by name, in any order. -
Example 4.4. Valid Calls of
info+Example 4.4. Valid Calls of
infoinfo(odbchelper)info(odbchelper, 12)
info(odbchelper, collapse=0)
@@ -2148,25 +1988,25 @@ info(spacing=15, object=odbchelper)
-
With only one argument, spacinggets its default value of10andcollapsegets its default value of1. +With only one argument, spacing gets its default value of 10and collapse gets its default value of1.- ![]()
With two arguments, collapsegets its default value of1. +With two arguments, collapse gets its default value of 1.- ![]()
Here you are naming the collapseargument explicitly and specifying its value.spacingstill gets its default value of10. +Here you are naming the collapse argument explicitly and specifying its value. spacing still gets its default value of 10.@@ -2187,13 +2027,13 @@ time, you'll call functions the “normal” way, but you always have th - ![]()
Even required arguments (like object, which has no default value) can be named, and named arguments can appear in any order. +Even required arguments (like object, which has no default value) can be named, and named arguments can appear in any order. - Python Tutorial discusses exactly when and how default arguments are evaluated, which matters when the default value is a list or an expression with side effects. -
4.3. Using
+type,str,dir, and Other Built-In Functions4.3. Using
type,str,dir, and Other Built-In FunctionsPython has a small set of extremely useful built-in functions. All other functions are partitioned off into modules. This was actually a conscious design decision, to keep the core language from getting bloated like other scripting languages (cough cough, Visual Basic). -
4.3.1. The
-typeFunctionThe
typefunction returns the datatype of any arbitrary object. The possible types are listed in thetypesmodule. This is useful for helper functions that can handle several types of data. -Example 4.5. Introducing
type>>> type(1)+
4.3.1. The
+typeFunctionThe
typefunction returns the datatype of any arbitrary object. The possible types are listed in thetypesmodule. This is useful for helper functions that can handle several types of data. +Example 4.5. Introducing
type>>> type(1)<type 'int'> >>> li = [] >>> type(li)
@@ -2208,32 +2048,32 @@ True
- ![]()
typetakes anything -- and I mean anything -- and returns its datatype. Integers, strings, lists, dictionaries, tuples, functions, +typetakes anything -- and I mean anything -- and returns its datatype. Integers, strings, lists, dictionaries, tuples, functions, classes, modules, even types are acceptable.- ![]()
typecan take a variable and return its datatype. +typecan take a variable and return its datatype.- ![]()
typealso works on modules. +typealso works on modules.- - ![]()
You can use the constants in the typesmodule to compare types of objects. This is what theinfofunction does, as you'll see shortly. +You can use the constants in the typesmodule to compare types of objects. This is what theinfofunction does, as you'll see shortly.4.3.2. The
-strFunctionThe
str function coerces data into a string. Every datatype can be coerced into a string. -Example 4.6. Introducing
str+4.3.2. The
+strFunctionThe
str function coerces data into a string. Every datatype can be coerced into a string. +Example 4.6. Introducing
str>>> str(1)'1' >>> horsemen = ['war', 'pestilence', 'famine'] @@ -2250,32 +2090,32 @@ True
- ![]()
For simple datatypes like integers, you would expect strto work, because almost every language has a function to convert an integer to a string. +For simple datatypes like integers, you would expect strto work, because almost every language has a function to convert an integer to a string.- ![]()
However, strworks on any object of any type. Here it works on a list which you've constructed in bits and pieces. +However, strworks on any object of any type. Here it works on a list which you've constructed in bits and pieces.- ![]()
stralso works on modules. Note that the string representation of the module includes the pathname of the module on disk, so +stralso works on modules. Note that the string representation of the module includes the pathname of the module on disk, so yours will be different.- - ![]()
A subtle but important behavior of stris that it works onNone, the Python null value. It returns the string'None'. You'll use this to your advantage in theinfofunction, as you'll see shortly. +A subtle but important behavior of stris that it works onNone, the Python null value. It returns the string'None'. You'll use this to your advantage in theinfofunction, as you'll see shortly.At the heart of the
infofunction is the powerfuldirfunction.dirreturns a list of the attributes and methods of any object: modules, functions, strings, lists, dictionaries... pretty much +At the heart of the
infofunction is the powerfuldirfunction.dirreturns a list of the attributes and methods of any object: modules, functions, strings, lists, dictionaries... pretty much anything. -Example 4.7. Introducing
dir>>> li = [] +Example 4.7. Introducing
dir>>> li = [] >>> dir(li)['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] @@ -2289,25 +2129,25 @@ True
- ![]()
liis a list, soreturns a list of all the methods of a list. Note that the returned list contains the names of the methods as strings, not +dir(li)li is a list, so returns a list of all the methods of a list. Note that the returned list contains the names of the methods as strings, not the methods themselves.dir(li)- ![]()
dis a dictionary, soreturns a list of the names of dictionary methods. At least one of these,dir(d)keys, should look familiar. +d is a dictionary, so returns a list of the names of dictionary methods. At least one of these,dir(d)keys, should look familiar.- - ![]()
This is where it really gets interesting. odbchelperis a module, soreturns a list of all kinds of stuff defined in the module, including built-in attributes, likedir(odbchelper)__name__,__doc__, and whatever other attributes and methods you define. In this case,odbchelperhas only one user-defined method, thebuildConnectionStringfunction described in Chapter 2. +This is where it really gets interesting. odbchelperis a module, soreturns a list of all kinds of stuff defined in the module, including built-in attributes, likedir(odbchelper)__name__,__doc__, and whatever other attributes and methods you define. In this case,odbchelperhas only one user-defined method, thebuildConnectionStringfunction described in Chapter 2.Finally, the
callablefunction takes any object and returnsTrueif the object can be called, orFalseotherwise. Callable objects include functions, class methods, even classes themselves. (More on classes in the next chapter.) -Example 4.8. Introducing
callable+Finally, the
callablefunction takes any object and returnsTrueif the object can be called, orFalseotherwise. Callable objects include functions, class methods, even classes themselves. (More on classes in the next chapter.) +Example 4.8. Introducing
callable>>> import string >>> string.punctuation'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~' @@ -2329,40 +2169,40 @@ True
- ![]()
The functions in the stringmodule are deprecated (although many people still use thejoinfunction), but the module contains a lot of useful constants like thisstring.punctuation, which contains all the standard punctuation characters. +The functions in the stringmodule are deprecated (although many people still use thejoinfunction), but the module contains a lot of useful constants like this string.punctuation, which contains all the standard punctuation characters.- ![]()
string.joinis a function that joins a list of strings. +string.joinis a function that joins a list of strings.- ![]()
string.punctuationis not callable; it is a string. (A string does have callable methods, but the string itself is not callable.) +string.punctuation is not callable; it is a string. (A string does have callable methods, but the string itself is not callable.) - ![]()
string.joinis callable; it's a function that takes two arguments. +string.joinis callable; it's a function that takes two arguments.- ![]()
Any callable object may have a docstring. By using thecallablefunction on each of an object's attributes, you can determine which attributes you care about (methods, functions, classes) +Any callable object may have a docstring. By using thecallablefunction on each of an object's attributes, you can determine which attributes you care about (methods, functions, classes) and which you want to ignore (constants and so on) without knowing anything about the object ahead of time.4.3.3. Built-In Functions
-
type,str,dir, and all the rest of Python's built-in functions are grouped into a special module called__builtin__. (That's two underscores before and after.) If it helps, you can think of Python automatically executingfrom __builtin__ import *on startup, which imports all the “built-in” functions into the namespace so you can use them directly. +
type,str,dir, and all the rest of Python's built-in functions are grouped into a special module called__builtin__. (That's two underscores before and after.) If it helps, you can think of Python automatically executingfrom __builtin__ import *on startup, which imports all the “built-in” functions into the namespace so you can use them directly.The advantage of thinking like this is that you can access all the built-in functions and attributes as a group by getting - information about the
__builtin__module. And guess what, Python has a function calledinfo. Try it yourself and skim through the list now. We'll dive into some of the more important functions later. (Some of the - built-in error classes, likeAttributeError, should already look familiar.) + information about the__builtin__module. And guess what, Python has a function calledinfo. Try it yourself and skim through the list now. We'll dive into some of the more important functions later. (Some of the + built-in error classes, likeAttributeError, should already look familiar.)Example 4.9. Built-in Attributes and Functions
>>> from apihelper import info >>> import __builtin__ >>> info(__builtin__, 20) @@ -2391,10 +2231,10 @@ IOError I/O operation failed.- Python Library Reference documents all the built-in functions and all the built-in exceptions. -
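The built-in functions from this section combine naturally. The following is a rough sketch in Python 3 syntax (the sessions in the text are Python 2, and newer versions of Python list more names in `dir` than the output shown earlier). It leans on `getattr`, a built-in that looks up an attribute by its name; the variable names `names`, `methods`, and `none_text` are made up for the example.

```python
li = []

# dir returns attribute names as strings, not the attributes themselves.
names = dir(li)

# Keep only the names whose attributes can be called (methods, in this case),
# skipping the double-underscore names that newer Python versions also list.
methods = [n for n in names if callable(getattr(li, n)) and not n.startswith("__")]

# str works on any object, including None.
none_text = str(None)

print(type(li) is list)
print("append" in methods)
print(none_text)
```

Nothing here depends on knowing ahead of time what `li` is; swapping in a dictionary or a module would exercise the same calls.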
4.4. Getting Object References With
+getattr4.4. Getting Object References With
getattrYou already know that Python functions are objects. What you don't know is that you can get a reference to a function without knowing its name until run-time, by using the -
getattrfunction. -Example 4.10. Introducing
getattr>>> li = ["Larry", "Curly"] +getattrfunction. +Example 4.10. Introducing
getattr>>> li = ["Larry", "Curly"] >>> li.pop<built-in method pop of list object at 010DF884> >>> getattr(li, "pop")
@@ -2412,38 +2252,38 @@ AttributeError: 'tuple' object has no attribute 'pop'
-
This gets a reference to the popmethod of the list. Note that this is not calling thepopmethod; that would beli.pop(). This is the method itself. +This gets a reference to the popmethod of the list. Note that this is not calling thepopmethod; that would beli.pop(). This is the method itself.- ![]()
This also returns a reference to the popmethod, but this time, the method name is specified as a string argument to thegetattrfunction.getattris an incredibly useful built-in function that returns any attribute of any object. In this case, the object is a list, - and the attribute is thepopmethod. +This also returns a reference to the popmethod, but this time, the method name is specified as a string argument to thegetattrfunction.getattris an incredibly useful built-in function that returns any attribute of any object. In this case, the object is a list, + and the attribute is thepopmethod.- ![]()
In case it hasn't sunk in just how incredibly useful this is, try this: the return value of getattris the method, which you can then call just as if you had saidli.append("Moe")directly. But you didn't call the function directly; you specified the function name as a string instead. +In case it hasn't sunk in just how incredibly useful this is, try this: the return value of getattris the method, which you can then call just as if you had saidli.append("Moe")directly. But you didn't call the function directly; you specified the function name as a string instead.- ![]()
getattralso works on dictionaries. +getattralso works on dictionaries.- - ![]()
In theory, getattrwould work on tuples, except that tuples have no methods, sogetattrwill raise an exception no matter what attribute name you give. +In theory, getattrwould work on tuples, except that tuples have no methods, sogetattrwill raise an exception no matter what attribute name you give.4.4.1.
-getattrwith Modules
getattrisn't just for built-in datatypes. It also works on modules. -Example 4.11. The
getattrFunction inapihelper.py>>> import odbchelper +4.4.1.
+getattrwith Modules
getattrisn't just for built-in datatypes. It also works on modules. +Example 4.11. The
getattrFunction inapihelper.py>>> import odbchelper >>> odbchelper.buildConnectionString<function buildConnectionString at 00D18DD4> >>> getattr(odbchelper, "buildConnectionString")
@@ -2463,40 +2303,40 @@ True
- ![]()
This returns a reference to the buildConnectionStringfunction in theodbchelpermodule, which you studied in Chapter 2, Your First Python Program. (The hex address you see is specific to my machine; your output will be different.) +This returns a reference to the buildConnectionStringfunction in theodbchelpermodule, which you studied in Chapter 2, Your First Python Program. (The hex address you see is specific to my machine; your output will be different.)- ![]()
Using getattr, you can get the same reference to the same function. In general,is equivalent togetattr(object, "attribute")object.attribute. Ifobjectis a module, thenattributecan be anything defined in the module: a function, class, or global variable. +Using getattr, you can get the same reference to the same function. In general,is equivalent togetattr(object, "attribute")object.attribute. Ifobjectis a module, thenattributecan be anything defined in the module: a function, class, or global variable.- ![]()
And this is what you actually use in the infofunction.objectis passed into the function as an argument;methodis a string which is the name of a method or function. +And this is what you actually use in the infofunction. object is passed into the function as an argument; method is a string which is the name of a method or function.- ![]()
In this case, methodis the name of a function, which you can prove by getting itstype. +In this case, method is the name of a function, which you can prove by getting its type.- - ![]()
Since methodis a function, it is callable. +Since method is a function, it is callable. 4.4.2.
-getattrAs a DispatcherA common usage pattern of
getattris as a dispatcher. For example, if you had a program that could output data in a variety of different formats, you could +4.4.2.
+getattrAs a DispatcherA common usage pattern of
getattris as a dispatcher. For example, if you had a program that could output data in a variety of different formats, you could define separate functions for each output format and use a single dispatch function to call the right one.For example, let's imagine a program that prints site statistics in HTML, XML, and plain text formats. The choice of output format could be specified on the command line, or stored in a configuration - file. A
statsoutmodule defines three functions,output_html,output_xml, andoutput_text. Then the main program defines a single output function, like this: -Example 4.12. Creating a Dispatcher with
getattr+ file. Astatsoutmodule defines three functions,output_html,output_xml, andoutput_text. Then the main program defines a single output function, like this: +Example 4.12. Creating a Dispatcher with
getattrimport statsout def output(data, format="text"):@@ -2507,28 +2347,28 @@ def output(data, format="text"):
![]()
- ![]()
The outputfunction takes one required argument,data, and one optional argument,format. Ifformatis not specified, it defaults totext, and you will end up calling the plain text output function. +The outputfunction takes one required argument, data, and one optional argument, format. If format is not specified, it defaults totext, and you will end up calling the plain text output function.- ![]()
You concatenate the formatargument with "output_" to produce a function name, and then go get that function from thestatsoutmodule. This allows you to easily extend the program later to support other output formats, without changing this dispatch - function. Just add another function tostatsoutnamed, for instance,output_pdf, and pass "pdf" as theformatinto theoutputfunction. +You concatenate the format argument with "output_" to produce a function name, and then go get that function from the statsoutmodule. This allows you to easily extend the program later to support other output formats, without changing this dispatch + function. Just add another function tostatsoutnamed, for instance,output_pdf, and pass "pdf" as the format into theoutputfunction.- ![]()
Now you can simply call the output function in the same way as any other function. The output_functionvariable is a reference to the appropriate function from thestatsoutmodule. +Now you can simply call the output function in the same way as any other function. The output_function variable is a reference to the appropriate function from the statsoutmodule.Did you see the bug in the previous example? This is a very loose coupling of strings and functions, and there is no error - checking. What happens if the user passes in a format that doesn't have a corresponding function defined in
statsout? Well, getattr will raise an AttributeError exception, so nothing is ever assigned to output_function and the function crashes before it reaches the line that calls it. That's + checking. What happens if the user passes in a format that doesn't have a corresponding function defined in statsout? Well, getattr will raise an AttributeError exception, so nothing is ever assigned to output_function and the function crashes before it reaches the line that calls it. That's
getattrtakes an optional third argument, a default value. -Example 4.13.
getattrDefault Values+Luckily,
getattrtakes an optional third argument, a default value. +Example 4.13.
getattrDefault Valuesimport statsout def output(data, format="text"): @@ -2539,17 +2379,17 @@ def output(data, format="text"):- - ![]()
This function call is guaranteed to work, because you added a third argument to the call to getattr. The third argument is a default value that is returned if the attribute or method specified by the second argument wasn't +This function call is guaranteed to work, because you added a third argument to the call to getattr. The third argument is a default value that is returned if the attribute or method specified by the second argument wasn't found.As you can see,
getattris quite powerful. It is the heart of introspection, and you'll see even more powerful examples of it in later chapters. +As you can see,
getattris quite powerful. It is the heart of introspection, and you'll see even more powerful examples of it in later chapters.4.5. Filtering Lists
As you know, Python has powerful capabilities for mapping lists into other lists, via list comprehensions (Section 3.6, “Mapping Lists”). This can be combined with a filtering mechanism, where some elements in the list are mapped while others are skipped entirely.
Here is the list filtering syntax:
-[mapping-expressionforelementinsource-listiffilter-expression]This is an extension of the list comprehensions that you know and love. The first two thirds are the same; the last part, starting with the
if, is the filter expression. A filter expression can be any expression that evaluates true or false (which in Python can be almost anything). Any element for which the filter expression evaluates true will be included in the mapping. All other elements are ignored, +[mapping-expressionforelementinsource-listiffilter-expression]This is an extension of the list comprehensions that you know and love. The first two thirds are the same; the last part, starting with the
if, is the filter expression. A filter expression can be any expression that evaluates true or false (which in Python can be almost anything). Any element for which the filter expression evaluates true will be included in the mapping. All other elements are ignored, so they are never put through the mapping expression and are not included in the output list.Example 4.14. Introducing List Filtering
>>> li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"] >>> [elem for elem in li if len(elem) > 1]@@ -2577,25 +2417,25 @@ so they are never put through the mapping expression and are not included in the
- - ![]()
count is a list method that returns the number of times a value occurs in a list. You might think that this filter would eliminate +count is a list method that returns the number of times a value occurs in a list. You might think that this filter would eliminate duplicates from a list, returning a list containing only one copy of each value in the original list. But it doesn't, because values that appear twice in the original list (in this case, b and d) are excluded completely. There are ways of eliminating duplicates from a list, but filtering is not the solution. Let's get back to this line from
apihelper.py:+Let's get back to this line from
apihelper.py:methodList = [method for method in dir(object) if callable(getattr(object, method))]This looks complicated, and it is complicated, but the basic structure is the same. The whole filter expression returns a -list, which is assigned to the
methodList variable. The first half of the expression is the list mapping part. The mapping expression is an identity expression, -which returns the value of each element. dir(object) returns a list of object's attributes and methods -- that's the list you're mapping. So the only new part is the filter expression after the if. -The filter expression looks scary, but it's not. You already know about
callable,getattr, andin. As you saw in the previous section, the expressiongetattr(object, method)returns a function object ifobjectis a module andmethodis the name of a function in that module. -So this expression takes an object (named
object). Then it gets a list of the names of the object's attributes, methods, functions, and a few other things. Then it filters +list, which is assigned to the methodList variable. The first half of the expression is the list mapping part. The mapping expression is an identity expression, +which returns the value of each element. dir(object) returns a list of object's attributes and methods -- that's the list you're mapping. So the only new part is the filter expression after the if. +The filter expression looks scary, but it's not. You already know about
callable,getattr, andin. As you saw in the previous section, the expressiongetattr(object, method)returns a function object if object is a module and method is the name of a function in that module. +So this expression takes an object (named object). Then it gets a list of the names of the object's attributes, methods, functions, and a few other things. Then it filters that list to weed out all the stuff that you don't care about. You do the weeding out by taking the name of each attribute/method/function -and getting a reference to the real thing, via the
getattrfunction. Then you check to see if that object is callable, which will be any methods and functions, both built-in (like -thepopmethod of a list) and user-defined (like thebuildConnectionStringfunction of theodbchelpermodule). You don't care about other attributes, like the__name__attribute that's built in to every module. +and getting a reference to the real thing, via thegetattrfunction. Then you check to see if that object is callable, which will be any methods and functions, both built-in (like +thepopmethod of a list) and user-defined (like thebuildConnectionStringfunction of theodbchelpermodule). You don't care about other attributes, like the__name__attribute that's built in to every module.Further Reading on Filtering Lists
-
- Python Tutorial discusses another way to filter lists using the built-in
filterfunction. +- Python Tutorial discusses another way to filter lists using the built-in
filterfunction.4.6. The Peculiar Nature of
@@ -2611,7 +2451,7 @@ theandandorpopmethod of a list) and user-defined (like t- - ![]()
When using and, values are evaluated in a boolean context from left to right.0,'',[],(),{}, andNoneare false in a boolean context; everything else is true. Well, almost everything. By default, instances of classes are +When using @@ -2663,11 +2503,11 @@ theand, values are evaluated in a boolean context from left to right.0,'',[],(),{}, andNoneare false in a boolean context; everything else is true. Well, almost everything. By default, instances of classes are true in a boolean context, but you can define special methods in your class to make an instance evaluate to false. You'll learn all about classes and special methods in Chapter 5. If all values are true in a boolean context,andreturns the last value. In this case,andevaluates'a', which is true, then'b', which is true, and returns'b'.popmethod of a list) and user-defined (like t![]()
Note that orevaluates values only until it finds one that is true in a boolean context, and then it ignores the rest. This distinction - is important if some values can have side effects. Here, the functionsidefxis never called, becauseorevaluates'a', which is true, and returns'a'immediately. + is important if some values can have side effects. Here, the functionsidefxis never called, becauseorevaluates'a', which is true, and returns'a'immediately.If you're a C hacker, you are certainly familiar with the
bool ?expression, which evaluates toa:baifboolis true, andbotherwise. Because of the wayandandorwork in Python, you can accomplish the same thing. +If you're a C hacker, you are certainly familiar with the
bool ? a : bexpression, which evaluates to a ifboolis true, and b otherwise. Because of the wayandandorwork in Python, you can accomplish the same thing.4.6.1. Using the
and-orTrickExample 4.17. Introducing the
and-orTrick>>> a = "first" >>> b = "second" @@ -2680,18 +2520,18 @@ thepopmethod of a list) and user-defined (like t- ![]()
This syntax looks similar to the bool ? a : b expression in C. The entire expression is evaluated from left to right, so the and is evaluated first. 1 and 'first' evaluates to 'first', then 'first' or 'second' evaluates to 'first'. +This syntax looks similar to the bool ? a : b expression in C. The entire expression is evaluated from left to right, so the and is evaluated first. 1 and 'first' evaluates to 'first', then 'first' or 'second' evaluates to 'first'.
0 and 'first' evaluates to 0, and then 0 or 'second' evaluates to 'second'. +0 and 'first' evaluates to 0, and then 0 or 'second' evaluates to 'second'. However, since this Python expression is simply boolean logic, and not a special construct of the language, there is one extremely important difference - between this
and-ortrick in Python and thebool ?syntax in C. If the value ofa:bais false, the expression will not work as you would expect it to. (Can you tell I was bitten by this? More than once?) + between thisand-ortrick in Python and thebool ? a : bsyntax in C. If the value of a is false, the expression will not work as you would expect it to. (Can you tell I was bitten by this? More than once?)Example 4.18. When the
and-orTrick Fails>>> a = "" >>> b = "second" >>> 1 and a or b@@ -2700,12 +2540,12 @@ the
popmethod of a list) and user-defined (like t- - ![]()
Since a is an empty string, which Python considers false in a boolean context, 1 and '' evaluates to '', and then '' or 'second' evaluates to 'second'. Oops! That's not what you wanted. +Since a is an empty string, which Python considers false in a boolean context, 1 and '' evaluates to '', and then '' or 'second' evaluates to 'second'. Oops! That's not what you wanted. The
and-ortrick,bool and, will not work like the C expressionaorbbool ?whena:bais false in a boolean context. -The real trick behind the
and-or trick, then, is to make sure that the value of a is never false. One common way of doing this is to turn a into [a] and b into [b], then taking the first element of the returned list, which will be either a or b. +The
and-ortrick,bool and a or b, will not work like the C expressionbool ? a : bwhen a is false in a boolean context. +The real trick behind the
and-ortrick, then, is to make sure that the value of a is never false. One common way of doing this is to turn a into[a]and b into[b], then taking the first element of the returned list, which will be either a or b.Example 4.19. Using the
and-orTrick Safely>>> a = "" >>> b = "second" >>> (1 and [a] or [b])[0]@@ -2714,12 +2554,12 @@ the
popmethod of a list) and user-defined (like t- ![]()
Since [a] is a non-empty list, it is never false. Even if a is 0 or '' or some other false value, the list [a] is true because it has one element. +Since [a] is a non-empty list, it is never false. Even if a is 0 or '' or some other false value, the list [a] is true because it has one element. By now, this trick may seem like more trouble than it's worth. You could, after all, accomplish the same thing with an
ifstatement, so why go through all this fuss? Well, in many cases, you are choosing between two constant values, so you can - use the simpler syntax and not worry, because you know that theavalue will always be true. And even if you need to use the more complicated safe form, there are good reasons to do so. + use the simpler syntax and not worry, because you know that the a value will always be true. And even if you need to use the more complicated safe form, there are good reasons to do so. For example, there are some cases in Python whereifstatements are not allowed, such as inlambdafunctions.Further Reading on the
@@ -2770,10 +2610,10 @@ aand-orTricklambdafunction; if you need something more complex, define a nor4.7.1. Real-World
lambdaFunctions-Here are the
lambdafunctions inapihelper.py:+Here are the
lambdafunctions inapihelper.py:processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)Notice that this uses the simple form of the
and-ortrick, which is okay, because alambdafunction is always true in a boolean context. (That doesn't mean that alambdafunction can't return a false value. The function is always true; its return value could be anything.) -Also notice that you're using the
splitfunction with no arguments. You've already seen it used with one or two arguments, but without any arguments it splits on whitespace. -Example 4.21.
splitWith No Arguments>>> s = "this is\na\ttest"+
Also notice that you're using the
splitfunction with no arguments. You've already seen it used with one or two arguments, but without any arguments it splits on whitespace. +Example 4.21.
splitWith No Arguments>>> s = "this is\na\ttest">>> print s this is a test @@ -2791,19 +2631,19 @@ a test
- ![]()
splitwithout any arguments splits on whitespace. So three spaces, a carriage return, and a tab character are all the same. +splitwithout any arguments splits on whitespace. So three spaces, a carriage return, and a tab character are all the same.- - ![]()
You can normalize whitespace by splitting a string with splitand then rejoining it withjoin, using a single space as a delimiter. This is what theinfofunction does to collapse multi-linedocstrings into a single line. +You can normalize whitespace by splitting a string with splitand then rejoining it withjoin, using a single space as a delimiter. This is what theinfofunction does to collapse multi-linedocstrings into a single line.So what is the
infofunction actually doing with theselambdafunctions,splits, andand-ortricks? +So what is the
infofunction actually doing with theselambdafunctions,splits, andand-ortricks?- processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
processFuncis now a function, but which function it is depends on the value of thecollapsevariable. Ifcollapseis true,will collapse whitespace; otherwise,processFunc(string)will return its argument unchanged. + processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)processFunc(string)processFunc is now a function, but which function it is depends on the value of the collapse variable. If collapse is true,
processFunc(string)will collapse whitespace; otherwise,processFunc(string)will return its argument unchanged.To do this in a less robust language, like Visual Basic, you would probably create a function that took a string and a
collapseargument and used anifstatement to decide whether to collapse the whitespace or not, then returned the appropriate value. This would be inefficient, because the function would need to handle every possible case. Every time you called it, it would need to decide whether to collapse whitespace before it could give you what you wanted. In Python, you can take that decision logic out of the function and define alambdafunction that is custom-tailored to give you exactly (and only) what you want. This is more efficient, more elegant, and @@ -2823,14 +2663,14 @@ a test is easy, because everything you need is already set up just the way you need it. All the dominoes are in place; it's time to knock them down.-This is the meat of
apihelper.py:+This is the meat of
apihelper.py:print "\n".join(["%s %s" % (method.ljust(spacing), processFunc(str(getattr(object, method).__doc__))) for method in methodList])Note that this is one command, split over multiple lines, but it doesn't use the line continuation character (
\). Remember when I said that some expressions can be split into multiple lines without using a backslash? A list comprehension is one of those expressions, since the entire expression is contained in square brackets.Now, let's take it from the end and work backwards. The
-for method in methodListshows that this is a list comprehension. As you know,
methodListis a list of all the methods you care about inobject. So you're looping through that list withmethod. +for method in methodListshows that this is a list comprehension. As you know, methodList is a list of all the methods you care about in object. So you're looping through that list with method.
Example 4.22. Getting a
docstringDynamically>>> import odbchelper >>> object = odbchelper>>> method = 'buildConnectionString'
@@ -2844,19 +2684,19 @@ for method in methodList
- ![]()
In the infofunction,objectis the object you're getting help on, passed in as an argument. +In the infofunction, object is the object you're getting help on, passed in as an argument.- ![]()
As you're looping through methodList,methodis the name of the current method. +As you're looping through methodList, method is the name of the current method. - ![]()
Using the getattrfunction, you're getting a reference to themethodfunction in theobjectmodule. +Using the getattrfunction, you're getting a reference to themethodfunction in theobjectmodule.@@ -2866,8 +2706,8 @@ for method in methodList -The next piece of the puzzle is the use of
straround thedocstring. As you may recall,stris a built-in function that coerces data into a string. But adocstringis always a string, so why bother with thestrfunction? The answer is that not every function has adocstring, and if it doesn't, its__doc__attribute isNone. -Example 4.23. Why Use
stron adocstring?>>> >>> def foo(): print 2 +The next piece of the puzzle is the use of
straround thedocstring. As you may recall,stris a built-in function that coerces data into a string. But adocstringis always a string, so why bother with thestrfunction? The answer is that not every function has adocstring, and if it doesn't, its__doc__attribute isNone. +Example 4.23. Why Use
stron adocstring?>>> >>> def foo(): print 2 >>> >>> foo() 2 >>> >>> foo.__doc__@@ -2892,7 +2732,7 @@ True
@@ -2905,9 +2745,9 @@ True - - ![]()
The strfunction takes the null value and returns a string representation of it,'None'. +The strfunction takes the null value and returns a string representation of it,'None'.Now that you are guaranteed to have a string, you can pass the string to
processFunc, which you have already defined as a function that either does or doesn't collapse whitespace. Now you see why it was important to usestrto convert aNonevalue into a string representation.processFuncis assuming a string argument and calling itssplitmethod, which would crash if you passed itNonebecauseNonedoesn't have asplitmethod. -Stepping back even further, you see that you're using string formatting again to concatenate the return value of
processFuncwith the return value ofmethod'sljustmethod. This is a new string method that you haven't seen before. -Example 4.24. Introducing
ljust>>> s = 'buildConnectionString' +Now that you are guaranteed to have a string, you can pass the string to processFunc, which you have already defined as a function that either does or doesn't collapse whitespace. Now you see why it was important to use
strto convert aNonevalue into a string representation. processFunc is assuming a string argument and calling itssplitmethod, which would crash if you passed itNonebecauseNonedoesn't have asplitmethod. +Stepping back even further, you see that you're using string formatting again to concatenate the return value of processFunc with the return value of method's
ljustmethod. This is a new string method that you haven't seen before. +Example 4.24. Introducing
ljust>>> s = 'buildConnectionString' >>> s.ljust(30)'buildConnectionString ' >>> s.ljust(20)
@@ -2916,17 +2756,17 @@ True
- ![]()
ljustpads the string with spaces to the given length. This is what theinfofunction uses to make two columns of output and line up all thedocstrings in the second column. +ljustpads the string with spaces to the given length. This is what theinfofunction uses to make two columns of output and line up all thedocstrings in the second column.- - ![]()
If the given length is smaller than the length of the string, ljustwill simply return the string unchanged. It never truncates the string. +If the given length is smaller than the length of the string, ljustwill simply return the string unchanged. It never truncates the string.You're almost finished. Given the padded method name from the
ljustmethod and the (possibly collapsed)docstringfrom the call toprocessFunc, you concatenate the two and get a single string. Since you're mappingmethodList, you end up with a list of strings. Using thejoinmethod of the string"\n", you join this list into a single string, with each element of the list on a separate line, and print the result. +You're almost finished. Given the padded method name from the
ljustmethod and the (possibly collapsed)docstringfrom the call to processFunc, you concatenate the two and get a single string. Since you're mapping methodList, you end up with a list of strings. Using thejoinmethod of the string"\n", you join this list into a single string, with each element of the list on a separate line, and print the result.Example 4.25. Printing a List
>>> li = ['a', 'b', 'c'] >>> print "\n".join(li)a @@ -2946,7 +2786,7 @@ c
(method.ljust(spacing), processFunc(str(getattr(object, method).__doc__))) for method in methodList])4.9. Summary
-The
apihelper.pyprogram and its output should now make perfect sense. +The
apihelper.pyprogram and its output should now make perfect sense.def info(object, spacing=10, collapse=1): """Print methods and docstrings. @@ -2961,7 +2801,7 @@ def info(object, spacing=10, collapse=1): if __name__ == "__main__": print info.__doc__-Here is the output of
apihelper.py:>>> from apihelper import info +Here is the output of
apihelper.py:>>> from apihelper import info >>> li = [] >>> info(li) append L.append(object) -- append object to end @@ -2977,9 +2817,9 @@ sort L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1,
- Defining and calling functions with optional and named arguments -
- Using
strto coerce any arbitrary value into a string representation +- Using
strto coerce any arbitrary value into a string representation -- Using
getattrto get references to functions and other attributes dynamically +- Using
getattrto get references to functions and other attributes dynamically- Extending the list comprehension syntax to do list filtering
- Recognizing the
and-ortrick and using it safely @@ -2995,7 +2835,7 @@ sort L.sort([cmpfunc]) -- sort *IN PLACE*; if given, cmpfunc(x, y) -> -1,5.1. Diving In
Here is a complete, working Python program. Read the
docstrings of the module, the classes, and the functions to get an overview of what this program does and how it works. As usual, don't worry about the stuff you don't understand; that's what the rest of the chapter is for. -Example 5.1.
+fileinfo.pyExample 5.1.
fileinfo.pyIf you have not already done so, you can download this and other examples used in this book.
"""Framework for getting filetype-specific metadata. @@ -3131,18 +2971,18 @@ title=Spinning genre=255 name=/music/_singles/spinning.mp3 year=2000 -comment=http://mp3.com/artists/95/vxp5.2. Importing Modules Using
-from module importPython has two ways of importing modules. Both are useful, and you should know when to use each. One way,
import module, you've already seen in Section 2.4, “Everything Is an Object”. The other way accomplishes the same thing, but it has subtle and important differences. +comment=http://mp3.com/artists/95/vxp5.2. Importing Modules Using
+from module importPython has two ways of importing modules. Both are useful, and you should know when to use each. One way,
import module, you've already seen in Section 2.4, “Everything Is an Object”. The other way accomplishes the same thing, but it has subtle and important differences.-Here is the basic
from module importsyntax:+Here is the basic
from module importsyntax:from UserDict import UserDict -This is similar to the
import modulesyntax that you know and love, but with an important difference: the attributes and methods of the imported moduletypesare imported directly into the local namespace, so they are available directly, without qualification by module name. You -can import individual items or usefrom module import *to import everything.+
-This is similar to the
import modulesyntax that you know and love, but with an important difference: the attributes and methods of the imported moduletypesare imported directly into the local namespace, so they are available directly, without qualification by module name. You +can import individual items or usefrom module import *to import everything.
- from module import *in Python is likeuse modulein Perl;import modulein Python is likerequire modulein Perl. +from module import *in Python is likeuse modulein Perl;import modulein Python is likerequire modulein Perl.@@ -3150,11 +2990,11 @@ can import individual items or use
-from module- from module import *in Python is likeimport module.*in Java;import modulein Python is likeimport modulein Java. +from module import *in Python is likeimport module.*in Java;import modulein Python is likeimport modulein Java.Example 5.2.
import modulevs.from module import>>> import types +Example 5.2.
import modulevs.from module import>>> import types >>> types.FunctionType<type 'function'> >>> FunctionType
@@ -3168,36 +3008,36 @@ NameError: There is no variable named 'FunctionType'
- ![]()
The typesmodule contains no methods; it just has attributes for each Python object type. Note that the attribute,FunctionType, must be qualified by the module name,types. +The typesmodule contains no methods; it just has attributes for each Python object type. Note that the attribute,FunctionType, must be qualified by the module name,types.- ![]()
FunctionTypeby itself has not been defined in this namespace; it exists only in the context oftypes. +FunctionTypeby itself has not been defined in this namespace; it exists only in the context oftypes.- ![]()
This syntax imports the attribute FunctionTypefrom thetypesmodule directly into the local namespace. +This syntax imports the attribute FunctionTypefrom thetypesmodule directly into the local namespace.- ![]()
Now FunctionTypecan be accessed directly, without reference totypes. +Now FunctionTypecan be accessed directly, without reference totypes.When should you use
from module import? +When should you use
from module import?-
- If you will be accessing attributes and methods often and don't want to type the module name over and over, use
from module import. +- If you will be accessing attributes and methods often and don't want to type the module name over and over, use
from module import. -- If you want to selectively import some attributes and methods but not others, use
from module import. +- If you want to selectively import some attributes and methods but not others, use
from module import. -- If the module contains attributes or functions with the same name as ones in your module, you must use
import moduleto avoid name conflicts. +- If the module contains attributes or functions with the same name as ones in your module, you must use
import moduleto avoid name conflicts.Other than that, it's just a matter of style, and you will see Python code written both ways.
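The three guidelines above can be seen side by side in a short sketch. The greet and join functions are hypothetical stand-ins for code in your own module; types and os.path are standard library modules.

```python
import os.path                   # keep os.path.join behind its module name
import types                     # names must be qualified: types.FunctionType
from types import FunctionType   # this one name is available directly


def greet():
    """A throwaway function to inspect (hypothetical)."""
    return "hello"


def join(parts):
    """Our own join; it would clash with a from-imported os.path.join."""
    return "-".join(parts)


# Both spellings refer to the same type object.
qualified = isinstance(greet, types.FunctionType)
direct = isinstance(greet, FunctionType)

# Because os.path was imported as a module, both joins stay usable.
ours = join(["a", "b"])
theirs = os.path.join("a", "b")
```

Here the local join and os.path.join coexist only because the module was imported by name; writing from os.path import join after the def would silently replace the local function.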
@@ -3213,9 +3053,9 @@ NameError: There is no variable named 'FunctionType'
@@ -3286,8 +3126,8 @@ class FileInfo(UserDict):Further Reading on Module Importing Techniques
-
- eff-bot has more to say on
import modulevs.from module import. +- eff-bot has more to say on
import modulevs.from module import. -- Python Tutorial discusses advanced import techniques, including
from module import *. +- Python Tutorial discusses advanced import techniques, including
from module import *.5.3. Defining Classes
@@ -3230,7 +3070,7 @@ class Loaf:![]()
- ![]()
The name of this class is Loaf, and it doesn't inherit from any other class. Class names are usually capitalized,EachWordLikeThis, but this is only a convention, not a requirement. +The name of this class is Loaf, and it doesn't inherit from any other class. Class names are usually capitalized,EachWordLikeThis, but this is only a convention, not a requirement.@@ -3258,8 +3098,8 @@ class Loaf: ![]()
Of course, realistically, most classes will be inherited from other classes, and they will define their own class methods and attributes. But as you've just seen, there is nothing that a class absolutely must have, other than a name. In particular, -C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Python classes do have something similar to a constructor: the
__init__method. -Example 5.4. Defining the
FileInfoClass+C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Python classes do have something similar to a constructor: the__init__method. +Example 5.4. Defining the
FileInfoClassfrom UserDict import UserDict class FileInfo(UserDict):@@ -3267,9 +3107,9 @@ class FileInfo(UserDict):-
In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. So the FileInfo class is inherited from the UserDict class (which was imported from the UserDict module). UserDict is a class that acts like a dictionary, allowing you to essentially subclass the dictionary datatype and add your own behavior. - (There are similar classes UserList and UserString which allow you to subclass lists and strings.) There is a bit of black magic behind this, which you will demystify later - in this chapter when you explore the UserDict class in more depth. +In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. So the FileInfo class is inherited from the UserDict class (which was imported from the UserDict module). UserDict is a class that acts like a dictionary, allowing you to essentially subclass the dictionary datatype and add your own behavior. + (There are similar classes UserList and UserString which allow you to subclass lists and strings.) There is a bit of black magic behind this, which you will demystify later + in this chapter when you explore the UserDict class in more depth. -This example shows the initialization of the FileInfo class using the
__init__ method. -Example 5.5. Initializing the
FileInfo Class +This example shows the initialization of the
FileInfo class using the __init__ method. +Example 5.5. Initializing the
FileInfo Class
class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
@@ -3301,23 +3141,23 @@ class FileInfo(UserDict):- ![]()
__init__is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor - of the class. It's tempting, because it looks like a constructor (by convention,__init__is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance - of the class), and even sounds like one (“init” certainly suggests a constructor-ish nature). Incorrect, because the object has already been constructed by the time__init__is called, and you already have a valid reference to the new instance of the class. But__init__is the closest thing you're going to get to a constructor in Python, and it fills much the same role. +__init__is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor + of the class. It's tempting, because it looks like a constructor (by convention,__init__is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance + of the class), and even sounds like one (“init” certainly suggests a constructor-ish nature). Incorrect, because the object has already been constructed by the time__init__is called, and you already have a valid reference to the new instance of the class. But__init__is the closest thing you're going to get to a constructor in Python, and it fills much the same role.- ![]()
The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always namedself. In the__init__method,selfrefers to the newly created object; in other class methods, it refers to the instance whose method was called. Although +The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always namedself. In the__init__method,selfrefers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specifyselfexplicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.@@ -3330,7 +3170,7 @@ class FileInfo(UserDict): - - ![]()
__init__methods can take any number of arguments, and just like functions, the arguments can be defined with default values, making - them optional to the caller. In this case,filenamehas a default value ofNone, which is the Python null value. +__init__methods can take any number of arguments, and just like functions, the arguments can be defined with default values, making + them optional to the caller. In this case, filename has a default value ofNone, which is the Python null value.Example 5.6. Coding the
FileInfoClass+Example 5.6. Coding the
FileInfoClassclass FileInfo(UserDict): "store file metadata" def __init__(self, filename=None): @@ -3348,18 +3188,18 @@ class FileInfo(UserDict):- ![]()
I told you that this class acts like a dictionary, and here is the first sign of it. You're assigning the argument filename as the value of this object's name key. +I told you that this class acts like a dictionary, and here is the first sign of it. You're assigning the argument filename as the value of this object's name key. - - ![]()
Note that the __init__ method never returns a value. +Note that the __init__ method never returns a value. 5.3.2. Knowing When to Use
-selfand__init__When defining your class methods, you must explicitly list
selfas the first argument for each method, including__init__. When you call a method of an ancestor class from within your class, you must include theselfargument. But when you call your class method from outside, you do not specify anything for theselfargument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent, +5.3.2. Knowing When to Use
+selfand__init__When defining your class methods, you must explicitly list
selfas the first argument for each method, including__init__. When you call a method of an ancestor class from within your class, you must include theselfargument. But when you call your class method from outside, you do not specify anything for theselfargument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent, but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don't know about yet.Whew. I realize that's a lot to absorb, but you'll get the hang of it. All Python classes work the same way, so once you learn one, you've learned them all. If you forget everything else, remember this @@ -3368,7 +3208,7 @@ class FileInfo(UserDict):
- @@ -3387,8 +3227,8 @@ class FileInfo(UserDict):__init__methods are optional, but when you define one, you must remember to explicitly call the ancestor's__init__method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, +__init__methods are optional, but when you define one, you must remember to explicitly call the ancestor's__init__method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.5.4. Instantiating Classes
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the -
__init__method defines. The return value will be the newly created object. -Example 5.7. Creating a
FileInfoInstance>>> import fileinfo +__init__method defines. The return value will be the newly created object. +Example 5.7. Creating a
FileInfoInstance>>> import fileinfo >>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3")>>> f.__class__
<class fileinfo.FileInfo at 010EC204> @@ -3400,14 +3240,14 @@ class FileInfo(UserDict):
- ![]()
You are creating an instance of the FileInfoclass (defined in thefileinfomodule) and assigning the newly created instance to the variablef. You are passing one parameter,/music/_singles/kairo.mp3, which will end up as thefilenameargument inFileInfo's__init__method. +You are creating an instance of the FileInfoclass (defined in thefileinfomodule) and assigning the newly created instance to the variable f. You are passing one parameter,/music/_singles/kairo.mp3, which will end up as the filename argument inFileInfo's__init__method.![]()
Every class instance has a built-in attribute, __class__, which is the object's class. (Note that the representation of this includes the physical address of the instance on my - machine; your representation will be different.) Java programmers may be familiar with theClassclass, which contains methods likegetNameandgetSuperclassto get metadata information about an object. In Python, this kind of metadata is available directly on the object itself through attributes like__class__,__name__, and__bases__. + machine; your representation will be different.) Java programmers may be familiar with theClassclass, which contains methods likegetNameandgetSuperclassto get metadata information about an object. In Python, this kind of metadata is available directly on the object itself through attributes like__class__,__name__, and__bases__.@@ -3419,7 +3259,7 @@ class FileInfo(UserDict): @@ -3444,17 +3284,17 @@ class FileInfo(UserDict): - ![]()
Remember when the __init__method assigned itsfilenameargument toself["name"]? Well, here's the result. The arguments you pass when you create the class instance get sent right along to the__init__method (along with the object reference,self, which Python adds for free). +Remember when the __init__method assigned its filename argument toself["name"]? Well, here's the result. The arguments you pass when you create the class instance get sent right along to the__init__method (along with the object reference,self, which Python adds for free).- ![]()
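The instantiation steps above can be tried out with a small self-contained sketch. It makes one loud assumption: on a current Python 3 interpreter, UserDict is imported from the collections module, whereas the Python 2 code in this text imports it from the UserDict module. The FileInfo class here is a cut-down stand-in, not the full class from fileinfo.py.

```python
from collections import UserDict   # assumption: Python 3 location of UserDict


class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)     # explicitly initialize the ancestor
        self["name"] = filename     # the instance already behaves like a dictionary


# Call the class like a function; the argument is passed on to __init__.
f = FileInfo("/music/_singles/kairo.mp3")

klass = f.__class__                 # the object's class
class_name = f.__class__.__name__   # the class's name, as a string
bases = FileInfo.__bases__          # tuple of direct ancestors
doc = f.__doc__                     # instances expose the class's doc string
name = f["name"]                    # the value that __init__ stored
```

Note that self never appears in the call FileInfo("/music/_singles/kairo.mp3"); Python supplies it.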
Every time the leakmemfunction is called, you are creating an instance ofFileInfoand assigning it to the variablef, which is a local variable within the function. Then the function ends without ever freeingf, so you would expect a memory leak, but you would be wrong. When the function ends, the local variablefgoes out of scope. At this point, there are no longer any references to the newly created instance ofFileInfo(since you never assigned it to anything other thanf), so Python destroys the instance for us. +Every time the leakmemfunction is called, you are creating an instance ofFileInfoand assigning it to the variable f, which is a local variable within the function. Then the function ends without ever freeing f, so you would expect a memory leak, but you would be wrong. When the function ends, the local variable f goes out of scope. At this point, there are no longer any references to the newly created instance ofFileInfo(since you never assigned it to anything other than f), so Python destroys the instance for us.- - ![]()
No matter how many times you call the leakmem function, it will never leak memory, because every time, Python will destroy the newly created FileInfo instance before returning from leakmem. +No matter how many times you call the leakmem function, it will never leak memory, because every time, Python will destroy the newly created FileInfo instance before returning from leakmem. The technical term for this form of garbage collection is “reference counting”. Python keeps a count of references to every instance created. In the above example, there was only one reference to the
FileInfo instance: the local variable f. When the function ends, the variable f goes out of scope, so the reference count drops to 0, and Python destroys the instance automatically. +The technical term for this form of garbage collection is “reference counting”. Python keeps a count of references to every instance created. In the above example, there was only one reference to the
FileInfo instance: the local variable f. When the function ends, the variable f goes out of scope, so the reference count drops to 0, and Python destroys the instance automatically. In versions of Python before 2.0, there were situations where reference counting failed, and Python couldn't clean up after you. If you created two instances that referenced each other (for instance, a doubly-linked list, where each node has a pointer to the previous and next node in the list), neither instance would ever be destroyed automatically because Python (correctly) believed that there was always a reference to each instance. Python 2.0 has an additional form of garbage collection called “mark-and-sweep” which is smart enough to notice this virtual gridlock and clean up circular references correctly. @@ -3465,11 +3305,11 @@ class FileInfo(UserDict):
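Both behaviors can be observed with a short experiment. This is a sketch under stated assumptions: it relies on CPython's reference counting (other Python implementations may free objects later), the Node class and both helper functions are hypothetical, and weak references are used only as a way to check whether an object still exists without keeping it alive.

```python
import gc
import weakref


class Node:
    "Hypothetical stand-in for any class instance."
    def __init__(self):
        self.other = None


def make_and_drop():
    n = Node()               # the only strong reference is the local variable n
    return weakref.ref(n)    # a weak reference does not keep n alive


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # each instance now references the other
    return weakref.ref(a)


gc.disable()                 # keep the cycle collector from running on its own

lone = make_and_drop()
lone_freed_at_once = lone() is None              # reference count hit zero on return

cycle = make_cycle()
cycle_outlives_refcount = cycle() is not None    # the two counts never reach zero

gc.collect()                 # ask the cycle collector to run now
cycle_freed_by_collector = cycle() is None

gc.enable()
```

The first instance disappears as soon as its function returns; the pair that reference each other survive until the collector in the gc module is run.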
-
- Python Library Reference summarizes built-in attributes like
__class__. -- Python Library Reference documents the
gcmodule, which gives you low-level control over Python's garbage collection. +- Python Library Reference documents the
gcmodule, which gives you low-level control over Python's garbage collection.5.5. Exploring
-UserDict: A Wrapper ClassAs you've seen,
FileInfois a class that acts like a dictionary. To explore this further, let's look at theUserDictclass in theUserDictmodule, which is the ancestor of theFileInfoclass. This is nothing special; the class is written in Python and stored in a.pyfile, just like any other Python code. In particular, it's stored in thelibdirectory in your Python installation.+
5.5. Exploring
+UserDict: A Wrapper ClassAs you've seen,
FileInfois a class that acts like a dictionary. To explore this further, let's look at theUserDictclass in theUserDictmodule, which is the ancestor of theFileInfoclass. This is nothing special; the class is written in Python and stored in a.pyfile, just like any other Python code. In particular, it's stored in thelibdirectory in your Python installation.-
@@ -3479,7 +3319,7 @@ File->Locate... (Ctrl-L). Example 5.9. Defining the
UserDictClass+Example 5.9. Defining the
UserDictClassclass UserDict:def __init__(self, dict=None):
self.data = {}
@@ -3489,29 +3329,29 @@ class UserDict:
-
Note that UserDictis a base class, not inherited from any other class. +Note that UserDictis a base class, not inherited from any other class.- ![]()
This is the __init__method that you overrode in theFileInfoclass. Note that the argument list in this ancestor class is different than the descendant. That's okay; each subclass can have +This is the __init__method that you overrode in theFileInfoclass. Note that the argument list in this ancestor class is different than the descendant. That's okay; each subclass can have its own set of arguments, as long as it calls the ancestor with the correct arguments. Here the ancestor class has a way - to define initial values (by passing a dictionary in thedictargument) which theFileInfodoes not use. + to define initial values (by passing a dictionary in the dict argument) which theFileInfodoes not use.- ![]()
Python supports data attributes (called “instance variables” in Java and Powerbuilder, and “member variables” in C++). Data attributes are pieces of data held by a specific instance of a class. In this case, each instance of UserDictwill have a data attributedata. To reference this attribute from code outside the class, you qualify it with the instance name,instance.data, in the same way that you qualify a function with its module name. To reference a data attribute from within the class, - you useselfas the qualifier. By convention, all data attributes are initialized to reasonable values in the__init__method. However, this is not required, since data attributes, like local variables, spring into existence when they are first assigned a value. +Python supports data attributes (called “instance variables” in Java and Powerbuilder, and “member variables” in C++). Data attributes are pieces of data held by a specific instance of a class. In this case, each instance of UserDictwill have a data attribute data. To reference this attribute from code outside the class, you qualify it with the instance name,instance.data, in the same way that you qualify a function with its module name. To reference a data attribute from within the class, + you useselfas the qualifier. By convention, all data attributes are initialized to reasonable values in the__init__method. However, this is not required, since data attributes, like local variables, spring into existence when they are first assigned a value.- ![]()
The updatemethod is a dictionary duplicator: it copies all the keys and values from one dictionary to another. This does not clear the target dictionary first; if the target dictionary already has some keys, the ones from the source dictionary will - be overwritten, but others will be left untouched. Think ofupdateas a merge function, not a copy function. +The updatemethod is a dictionary duplicator: it copies all the keys and values from one dictionary to another. This does not clear the target dictionary first; if the target dictionary already has some keys, the ones from the source dictionary will + be overwritten, but others will be left untouched. Think ofupdateas a merge function, not a copy function.@@ -3531,7 +3371,7 @@ class UserDict: Java and Powerbuilder support function overloading by argument list, i.e. one class can have multiple methods with the same name but a different number of arguments, or arguments of different types. Other languages (most notably PL/SQL) even support function overloading by argument name; i.e. one class can have multiple methods with the same name and the same number of arguments of the same type but different argument names. Python supports neither of these; it has no form of function overloading whatsoever. Methods are defined solely by their name, - and there can be only one method per class with a given name. So if a descendant class has an
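A minimal sketch of the two rules just described: a descendant's __init__ replaces the ancestor's even when the argument lists differ, and a second method with the same name in one class silently replaces the first. Base, Child, and Twice are hypothetical names chosen for illustration; Base only loosely imitates UserDict.

```python
class Base:
    def __init__(self, initial=None):
        self.data = {}
        if initial is not None:
            self.data.update(initial)   # merge the initial values in


class Child(Base):
    def __init__(self, filename=None):  # different argument list: still an override
        Base.__init__(self)             # the ancestor is never called for you
        self.data["name"] = filename


class Twice:
    def greet(self):
        return "no arguments"

    def greet(self, name="world"):      # same name: this definition replaces the one above
        return "hello, " + name


child = Child("kairo.mp3")
base = Base({"a": 1})
greeting = Twice().greet()              # only the second greet exists
```

If Child.__init__ skipped the call to Base.__init__, self.data would never be created and the assignment on the next line would raise an AttributeError.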
__init__method, it always overrides the ancestor__init__method, even if the descendant defines it with a different argument list. And the same rule applies to any other method. + and there can be only one method per class with a given name. So if a descendant class has an__init__method, it always overrides the ancestor__init__method, even if the descendant defines it with a different argument list. And the same rule applies to any other method.@@ -3550,11 +3390,11 @@ class UserDict:
-![]()
- Always assign an initial value to all of an instance's data attributes in the __init__method. It will save you hours of debugging later, tracking downAttributeErrorexceptions because you're referencing uninitialized (and therefore non-existent) attributes. +Always assign an initial value to all of an instance's data attributes in the __init__method. It will save you hours of debugging later, tracking downAttributeErrorexceptions because you're referencing uninitialized (and therefore non-existent) attributes.Example 5.10.
UserDictNormal Methods+Example 5.10.
UserDictNormal Methodsdef clear(self): self.data.clear()def copy(self):
if self.__class__ is UserDict:
@@ -3569,35 +3409,35 @@ class UserDict:
-
clearis a normal class method; it is publicly available to be called by anyone at any time. Notice thatclear, like all class methods, hasselfas its first argument. (Remember that you don't includeselfwhen you call the method; it's something that Python adds for you.) Also note the basic technique of this wrapper class: store a real dictionary (data) as a data attribute, define all the methods that a real dictionary has, and have each class method redirect to the corresponding - method on the real dictionary. (In case you'd forgotten, a dictionary'sclearmethod deletes all of its keys and their associated values.) +clearis a normal class method; it is publicly available to be called by anyone at any time. Notice thatclear, like all class methods, hasselfas its first argument. (Remember that you don't includeselfwhen you call the method; it's something that Python adds for you.) Also note the basic technique of this wrapper class: store a real dictionary (data) as a data attribute, define all the methods that a real dictionary has, and have each class method redirect to the corresponding + method on the real dictionary. (In case you'd forgotten, a dictionary'sclearmethod deletes all of its keys and their associated values.)- ![]()
The copymethod of a real dictionary returns a new dictionary that is an exact duplicate of the original (all the same key-value pairs). - ButUserDictcan't simply redirect toself.data.copy, because that method returns a real dictionary, and what you want is to return a new instance that is the same class asself. +The copymethod of a real dictionary returns a new dictionary that is an exact duplicate of the original (all the same key-value pairs). + ButUserDictcan't simply redirect toself.data.copy, because that method returns a real dictionary, and what you want is to return a new instance that is the same class asself.- ![]()
You use the __class__ attribute to see if self is a UserDict; if so, you're golden, because you know how to copy a UserDict: just create a new UserDict and give it the real dictionary that you've squirreled away in self.data. Then you immediately return the new UserDict; you don't even get to the import copy on the next line. +You use the __class__ attribute to see if self is a UserDict; if so, you're golden, because you know how to copy a UserDict: just create a new UserDict and give it the real dictionary that you've squirreled away in self.data. Then you immediately return the new UserDict; you don't even get to the import copy on the next line. - ![]()
If self.__class__is notUserDict, thenselfmust be some subclass ofUserDict(like maybeFileInfo), in which case life gets trickier.UserDictdoesn't know how to make an exact copy of one of its descendants; there could, for instance, be other data attributes defined - in the subclass, so you would need to iterate through them and make sure to copy all of them. Luckily, Python comes with a module to do exactly this, and it's calledcopy. I won't go into the details here (though it's a wicked cool module, if you're ever inclined to dive into it on your own). - Suffice it to say thatcopycan copy arbitrary Python objects, and that's how you're using it here. +If self.__class__is notUserDict, thenselfmust be some subclass ofUserDict(like maybeFileInfo), in which case life gets trickier.UserDictdoesn't know how to make an exact copy of one of its descendants; there could, for instance, be other data attributes defined + in the subclass, so you would need to iterate through them and make sure to copy all of them. Luckily, Python comes with a module to do exactly this, and it's calledcopy. I won't go into the details here (though it's a wicked cool module, if you're ever inclined to dive into it on your own). + Suffice it to say thatcopycan copy arbitrary Python objects, and that's how you're using it here.@@ -3607,12 +3447,12 @@ class UserDict: - ![]()
The rest of the methods are straightforward, redirecting the calls to the built-in methods on self.data. +The rest of the methods are straightforward, redirecting the calls to the built-in methods on self.data. In versions of Python prior to 2.2, you could not directly subclass built-in datatypes like strings, lists, and dictionaries. To compensate for - this, Python comes with wrapper classes that mimic the behavior of these built-in datatypes:
UserString,UserList, andUserDict. Using a combination of normal and special methods, theUserDictclass does an excellent imitation of a dictionary. In Python 2.2 and later, you can inherit classes directly from built-in datatypes likedict. An example of this is given in the examples that come with this book, infileinfo_fromdict.py. + this, Python comes with wrapper classes that mimic the behavior of these built-in datatypes:UserString,UserList, andUserDict. Using a combination of normal and special methods, theUserDictclass does an excellent imitation of a dictionary. In Python 2.2 and later, you can inherit classes directly from built-in datatypes likedict. An example of this is given in the examples that come with this book, infileinfo_fromdict.py. -In Python, you can inherit directly from the
dictbuilt-in datatype, as shown in this example. There are three differences here compared to theUserDictversion. -Example 5.11. Inheriting Directly from Built-In Datatype
dict+In Python, you can inherit directly from the
dictbuilt-in datatype, as shown in this example. There are three differences here compared to theUserDictversion. +Example 5.11. Inheriting Directly from Built-In Datatype
dictclass FileInfo(dict):"store file metadata" def __init__(self, filename=None):
@@ -3622,20 +3462,20 @@ class FileInfo(dict):
- ![]()
The first difference is that you don't need to import the UserDictmodule, sincedictis a built-in datatype and is always available. The second is that you are inheriting fromdictdirectly, instead of fromUserDict.UserDict. +The first difference is that you don't need to import the UserDictmodule, sincedictis a built-in datatype and is always available. The second is that you are inheriting fromdictdirectly, instead of fromUserDict.UserDict.- ![]()
The third difference is subtle but important. Because of the way UserDictworks internally, it requires you to manually call its__init__method to properly initialize its internal data structures.dictdoes not work like this; it is not a wrapper, and it requires no explicit initialization. +The third difference is subtle but important. Because of the way UserDictworks internally, it requires you to manually call its__init__method to properly initialize its internal data structures.dictdoes not work like this; it is not a wrapper, and it requires no explicit initialization.-Further Reading on
+UserDictFurther Reading on
UserDict-
- Python Library Reference documents the
UserDictmodule and thecopymodule. +- Python Library Reference documents the
UserDictmodule and thecopymodule.5.6. Special Class Methods
@@ -3645,7 +3485,7 @@ class FileInfo(dict):get and set items with a syntax that doesn't include explicitly invoking methods. This is where special class methods come in: they provide a way to map non-method-calling syntax into method calls.
5.6.1. Getting and Setting Items
-Example 5.12. The
__getitem__Special Method+Example 5.12. The
__getitem__Special Methoddef __getitem__(self, key): return self.data[key]>>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3") >>> f {'name':'/music/_singles/kairo.mp3'} @@ -3657,19 +3497,19 @@ provide a way to map non-method-calling syntax into method calls.- ![]()
The __getitem__special method looks simple enough. Like the normal methodsclear,keys, andvalues, it just redirects to the dictionary to return its value. But how does it get called? Well, you can call__getitem__directly, but in practice you wouldn't actually do that; I'm just doing it here to show you how it works. The right way - to use__getitem__is to get Python to call it for you. +The __getitem__special method looks simple enough. Like the normal methodsclear,keys, andvalues, it just redirects to the dictionary to return its value. But how does it get called? Well, you can call__getitem__directly, but in practice you wouldn't actually do that; I'm just doing it here to show you how it works. The right way + to use__getitem__is to get Python to call it for you.- - ![]()
This looks just like the syntax you would use to get a dictionary value, and in fact it returns the value you would expect. But here's the missing link: under the covers, Python has converted this syntax to the method call f.__getitem__("name"). That's why__getitem__is a special class method; not only can you call it yourself, you can get Python to call it for you by using the right syntax. +This looks just like the syntax you would use to get a dictionary value, and in fact it returns the value you would expect. But here's the missing link: under the covers, Python has converted this syntax to the method call f.__getitem__("name"). That's why__getitem__is a special class method; not only can you call it yourself, you can get Python to call it for you by using the right syntax.Of course, Python has a
__setitem__special method to go along with__getitem__, as shown in the next example. -Example 5.13. The
__setitem__Special Method+Of course, Python has a
__setitem__special method to go along with__getitem__, as shown in the next example. +Example 5.13. The
__setitem__Special Methoddef __setitem__(self, key, item): self.data[key] = item>>> f {'name':'/music/_singles/kairo.mp3'} >>> f.__setitem__("genre", 31)@@ -3682,23 +3522,23 @@ provide a way to map non-method-calling syntax into method calls.
- ![]()
Like the __getitem__method,__setitem__simply redirects to the real dictionaryself.datato do its work. And like__getitem__, you wouldn't ordinarily call it directly like this; Python calls__setitem__for you when you use the right syntax. +Like the __getitem__method,__setitem__simply redirects to the real dictionary self.data to do its work. And like__getitem__, you wouldn't ordinarily call it directly like this; Python calls__setitem__for you when you use the right syntax.- - ![]()
This looks like regular dictionary syntax, except of course that fis really a class that's trying very hard to masquerade as a dictionary, and__setitem__is an essential part of that masquerade. This line of code actually callsf.__setitem__("genre", 32)under the covers. +This looks like regular dictionary syntax, except of course that f is really a class that's trying very hard to masquerade as a dictionary, and __setitem__is an essential part of that masquerade. This line of code actually callsf.__setitem__("genre", 32)under the covers.
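To watch Python make these calls for you, here is a small sketch. DictWrapper is a hypothetical, cut-down wrapper in the spirit of UserDict (it is not the real class), and the calls list is added purely so you can see which special methods were invoked.

```python
class DictWrapper:
    "Hypothetical wrapper that stores a real dictionary in self.data."
    def __init__(self):
        self.data = {}
        self.calls = []     # record of the special methods that were invoked

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        return self.data[key]

    def __setitem__(self, key, item):
        self.calls.append(("__setitem__", key, item))
        self.data[key] = item


f = DictWrapper()
f["genre"] = 32                    # Python calls f.__setitem__("genre", 32)
value = f["genre"]                 # Python calls f.__getitem__("genre")
direct = f.__getitem__("genre")    # calling the method yourself also works
```

After these three lines, f.calls holds one __setitem__ entry followed by two __getitem__ entries, which shows that the bracket syntax and the explicit method call take the same path.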
__setitem__is a special class method because it gets called for you, but it's still a class method. Just as easily as the__setitem__method was defined inUserDict, you can redefine it in the descendant class to override the ancestor method. This allows you to define classes that act +
__setitem__is a special class method because it gets called for you, but it's still a class method. Just as easily as the__setitem__method was defined inUserDict, you can redefine it in the descendant class to override the ancestor method. This allows you to define classes that act like dictionaries in some ways but define their own behavior above and beyond the built-in dictionary.This concept is the basis of the entire framework you're studying in this chapter. Each file type can have a handler class that knows how to get metadata from a particular type of file. Once some attributes (like the file's name and location) are - known, the handler class knows how to derive other attributes automatically. This is done by overriding the
__setitem__method, checking for particular keys, and adding additional processing when they are found. -For example,
MP3FileInfois a descendant ofFileInfo. When anMP3FileInfo'snameis set, it doesn't just set thenamekey (like the ancestorFileInfodoes); it also looks in the file itself for MP3 tags and populates a whole set of keys. The next example shows how this works. -Example 5.14. Overriding
__setitem__inMP3FileInfo+ known, the handler class knows how to derive other attributes automatically. This is done by overriding the__setitem__method, checking for particular keys, and adding additional processing when they are found. +For example,
MP3FileInfois a descendant ofFileInfo. When anMP3FileInfo'snameis set, it doesn't just set thenamekey (like the ancestorFileInfodoes); it also looks in the file itself for MP3 tags and populates a whole set of keys. The next example shows how this works. +Example 5.14. Overriding
__setitem__inMP3FileInfodef __setitem__(self, key, item):if key == "name" and item:
self.__parse(item)
@@ -3707,27 +3547,27 @@ provide a way to map non-method-calling syntax into method calls.
- ![]()
Notice that this __setitem__ method is defined exactly the same way as the ancestor method. This is important, since Python will be calling the method for you, and it expects it to be defined with a certain number of arguments. (Technically speaking, +Notice that this __setitem__ method is defined exactly the same way as the ancestor method. This is important, since Python will be calling the method for you, and it expects it to be defined with a certain number of arguments. (Technically speaking, the names of the arguments don't matter; only the number of arguments is important.) - ![]()
Here's the crux of the entire MP3FileInfo class: if you're assigning a value to the name key, you want to do something extra. +Here's the crux of the entire MP3FileInfo class: if you're assigning a value to the name key, you want to do something extra. - ![]()
The extra processing you do for names is encapsulated in the __parse method. This is another class method defined in MP3FileInfo, and when you call it, you qualify it with self. Just calling __parse would look for a normal function defined outside the class, which is not what you want. Calling self.__parse will look for a class method defined within the class. This isn't anything new; you reference data attributes the same way. +The extra processing you do for names is encapsulated in the __parse method. This is another class method defined in MP3FileInfo, and when you call it, you qualify it with self. Just calling __parse would look for a normal function defined outside the class, which is not what you want. Calling self.__parse will look for a class method defined within the class. This isn't anything new; you reference data attributes the same way. @@ -3736,11 +3576,11 @@ provide a way to map non-method-calling syntax into method calls. - ![]()
After doing this extra processing, you want to call the ancestor method. Remember that this is never done for you in Python; you must do it manually. Note that you're calling the immediate ancestor, FileInfo, even though it doesn't have a __setitem__ method. That's okay, because Python will walk up the ancestor tree until it finds a class with the method you're calling, so this line of code will eventually - find and call the __setitem__ defined in UserDict. +After doing this extra processing, you want to call the ancestor method. Remember that this is never done for you in Python; you must do it manually. Note that you're calling the immediate ancestor, FileInfo, even though it doesn't have a __setitem__ method. That's okay, because Python will walk up the ancestor tree until it finds a class with the method you're calling, so this line of code will eventually + find and call the __setitem__ defined in UserDict. - -When accessing data attributes within a class, you need to qualify the attribute name: self.attribute. When calling other methods within a class, you need to qualify the method name: self.method. +When accessing data attributes within a class, you need to qualify the attribute name: self.attribute. When calling other methods within a class, you need to qualify the method name: self.method. Example 5.15. Setting an
MP3FileInfo's name
>>> import fileinfo
+Example 5.15. Setting an
MP3FileInfo's name
>>> import fileinfo
>>> mp3file = fileinfo.MP3FileInfo()
>>> mp3file
{'name':None}
@@ -3758,29 +3598,29 @@ provide a way to map non-method-calling syntax into method calls.
- ![]()
First, you create an instance of MP3FileInfo, without passing it a filename. (You can get away with this because the filename argument of the __init__ method is optional.) Since MP3FileInfo has no __init__ method of its own, Python walks up the ancestor tree and finds the __init__ method of FileInfo. This __init__ method manually calls the __init__ method of UserDict and then sets the name key to filename, which is None, since you didn't pass a filename. Thus, mp3file initially looks like a dictionary with one key, name, whose value is None. +First, you create an instance of MP3FileInfo, without passing it a filename. (You can get away with this because the filename argument of the __init__ method is optional.) Since MP3FileInfo has no __init__ method of its own, Python walks up the ancestor tree and finds the __init__ method of FileInfo. This __init__ method manually calls the __init__ method of UserDict and then sets the name key to filename, which is None, since you didn't pass a filename. Thus, mp3file initially looks like a dictionary with one key, name, whose value is None. - ![]()
Now the real fun begins. Setting the name key of mp3file triggers the __setitem__ method on MP3FileInfo (not UserDict), which notices that you're setting the name key with a real value and calls self.__parse. Although you haven't traced through the __parse method yet, you can see from the output that it sets several other keys: album, artist, genre, title, year, and comment. +Now the real fun begins. Setting the name key of mp3file triggers the __setitem__ method on MP3FileInfo (not UserDict), which notices that you're setting the name key with a real value and calls self.__parse. Although you haven't traced through the __parse method yet, you can see from the output that it sets several other keys: album, artist, genre, title, year, and comment. - ![]()
Modifying the name key will go through the same process again: Python calls __setitem__, which calls self.__parse, which sets all the other keys. +Modifying the name key will go through the same process again: Python calls __setitem__, which calls self.__parse, which sets all the other keys. 5.7. Advanced Special Class Methods
-Python has more special methods than just
__getitem__ and __setitem__. Some of them let you emulate functionality that you may not even know about. -This example shows some of the other special methods in
UserDict. -Example 5.16. More Special Methods in
UserDict +Python has more special methods than just
__getitem__ and __setitem__. Some of them let you emulate functionality that you may not even know about. +This example shows some of the other special methods in
UserDict. +Example 5.16. More Special Methods in
UserDict
    def __repr__(self): return repr(self.data)
    def __cmp__(self, dict):
        if isinstance(dict, UserDict):
@@ -3793,29 +3633,29 @@ provide a way to map non-method-calling syntax into method calls.
- ![]()
__repr__ is a special method that is called when you call repr(instance). The repr function is a built-in function that returns a string representation of an object. It works on any object, not just class - instances. You're already intimately familiar with repr and you don't even know it. In the interactive window, when you type just a variable name and press the ENTER key, Python uses repr to display the variable's value. Go create a dictionary d with some data and then print repr(d) to see for yourself. +__repr__ is a special method that is called when you call repr(instance). The repr function is a built-in function that returns a string representation of an object. It works on any object, not just class + instances. You're already intimately familiar with repr and you don't even know it. In the interactive window, when you type just a variable name and press the ENTER key, Python uses repr to display the variable's value. Go create a dictionary d with some data and then print repr(d) to see for yourself. - ![]()
__cmp__ is called when you compare class instances. In general, you can compare any two Python objects, not just class instances, by using ==. There are rules that define when built-in datatypes are considered equal; for instance, dictionaries are equal when they +__cmp__ is called when you compare class instances. In general, you can compare any two Python objects, not just class instances, by using ==. There are rules that define when built-in datatypes are considered equal; for instance, dictionaries are equal when they have all the same keys and values, and strings are equal when they are the same length and contain the same sequence of characters. - For class instances, you can define the __cmp__ method and code the comparison logic yourself, and then you can use == to compare instances of your class and Python will call your __cmp__ special method for you. + For class instances, you can define the __cmp__ method and code the comparison logic yourself, and then you can use == to compare instances of your class and Python will call your __cmp__ special method for you. - ![]()
__len__ is called when you call len(instance). The len function is a built-in function that returns the length of an object. It works on any object that could reasonably be thought - of as having a length. The len of a string is its number of characters; the len of a dictionary is its number of keys; the len of a list or tuple is its number of elements. For class instances, define the __len__ method and code the length calculation yourself, and then call len(instance) and Python will call your __len__ special method for you. +__len__ is called when you call len(instance). The len function is a built-in function that returns the length of an object. It works on any object that could reasonably be thought + of as having a length. The len of a string is its number of characters; the len of a dictionary is its number of keys; the len of a list or tuple is its number of elements. For class instances, define the __len__ method and code the length calculation yourself, and then call len(instance) and Python will call your __len__ special method for you. @@ -3828,20 +3668,20 @@ provide a way to map non-method-calling syntax into method calls. - - ![]()
__delitem__ is called when you call del instance[key], which you may remember as the way to delete individual items from a dictionary. When you use del on a class instance, Python calls the __delitem__ special method for you. +__delitem__ is called when you call del instance[key], which you may remember as the way to delete individual items from a dictionary. When you use del on a class instance, Python calls the __delitem__ special method for you. At this point, you may be thinking, “All this work just to do something in a class that I can do with a built-in datatype.” And it's true that life would be easier (and the entire
UserDict class would be unnecessary) if you could inherit from built-in datatypes like a dictionary. But even if you could, special -methods would still be useful, because they can be used in any class, not just wrapper classes like UserDict. -Special methods mean that any class can store key/value pairs like a dictionary, just by defining the
__setitem__ method. Any class can act like a sequence, just by defining the __getitem__ method. Any class that defines the __cmp__ method can be compared with ==. And if your class represents something that has a length, don't define a GetLength method; define the __len__ method and use len(instance). +
-At this point, you may be thinking, “All this work just to do something in a class that I can do with a built-in datatype.” And it's true that life would be easier (and the entire
UserDict class would be unnecessary) if you could inherit from built-in datatypes like a dictionary. But even if you could, special +methods would still be useful, because they can be used in any class, not just wrapper classes like UserDict. +Special methods mean that any class can store key/value pairs like a dictionary, just by defining the
__setitem__ method. Any class can act like a sequence, just by defining the __getitem__ method. Any class that defines the __cmp__ method can be compared with ==. And if your class represents something that has a length, don't define a GetLength method; define the __len__ method and use len(instance).
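The special methods described above can be sketched in one small class. This is only an illustration under stated assumptions: it targets Python 3, where equality with == is customized through __eq__ rather than the __cmp__ method this chapter describes, and the Playlist class, its tracks attribute, and the sample titles are hypothetical names that do not come from the fileinfo module.

```python
class Playlist:
    """Hypothetical container class, used only to illustrate special methods."""

    def __init__(self):
        self.tracks = {}  # maps a track title to its length in seconds

    def __setitem__(self, title, seconds):
        # Called for you by: playlist[title] = seconds
        self.tracks[title] = seconds

    def __getitem__(self, title):
        # Called for you by: playlist[title]
        return self.tracks[title]

    def __delitem__(self, title):
        # Called for you by: del playlist[title]
        del self.tracks[title]

    def __len__(self):
        # Called for you by: len(playlist)
        return len(self.tracks)

    def __eq__(self, other):
        # Called for you by: playlist == other
        # (assumption: Python 3, which uses __eq__ here instead of __cmp__)
        if isinstance(other, Playlist):
            return self.tracks == other.tracks
        return NotImplemented

    def __repr__(self):
        # Called for you by: repr(playlist)
        return repr(self.tracks)


playlist = Playlist()
playlist["Kairo"] = 240
playlist["Long Way Home"] = 185
del playlist["Long Way Home"]
```

None of these methods is ever called by name in the usage lines; Python maps the dictionary syntax, del, len, ==, and repr onto them.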
- While other object-oriented languages only let you define the physical model of an object (“this object has a GetLength method”), Python's special class methods like __len__ allow you to define the logical model of an object (“this object has a length”). +While other object-oriented languages only let you define the physical model of an object (“this object has a GetLength method”), Python's special class methods like __len__ allow you to define the logical model of an object (“this object has a length”). Python has a lot of other special methods. There's a whole set of them that let classes act like numbers, allowing you to add, subtract, and do other arithmetic operations on class instances. (The canonical example of this is a class that represents -complex numbers, numbers with both real and imaginary components.) The
__call__ method lets a class act like a function, allowing you to call a class instance directly. And there are other special methods +complex numbers, numbers with both real and imaginary components.) The __call__ method lets a class act like a function, allowing you to call a class instance directly. And there are other special methods that allow classes to have read-only and write-only data attributes; you'll learn more about those in later chapters. Further Reading on Special Class Methods
@@ -3881,13 +3721,13 @@ class MP3FileInfo(FileInfo):- ![]()
MP3FileInfo is the class itself, not any particular instance of the class. +MP3FileInfo is the class itself, not any particular instance of the class. - ![]()
tagDataMap is a class attribute: literally, an attribute of the class. It is available before creating any instances of the class. +tagDataMap is a class attribute: literally, an attribute of the class. It is available before creating any instances of the class. @@ -3901,11 +3741,11 @@ class MP3FileInfo(FileInfo): - In Java, both static variables (called class attributes in Python) and instance variables (called data attributes in Python) are defined immediately after the class definition (one with the static keyword, one without). In Python, only class attributes can be defined here; data attributes are defined in the __init__ method. +In Java, both static variables (called class attributes in Python) and instance variables (called data attributes in Python) are defined immediately after the class definition (one with the static keyword, one without). In Python, only class attributes can be defined here; data attributes are defined in the __init__ method. Class attributes can be used as class-level constants (which is how you use them in
MP3FileInfo), but they are not really constants. You can also change them. +
Class attributes can be used as class-level constants (which is how you use them in
MP3FileInfo), but they are not really constants. You can also change them. @@ -3980,13 +3820,13 @@ class MP3FileInfo(FileInfo):
@@ -3939,32 +3779,32 @@ class MP3FileInfo(FileInfo): - ![]()
count is a class attribute of the counter class. +count is a class attribute of the counter class. - ![]()
__class__ is a built-in attribute of every class instance (of every class). It is a reference to the class that self is an instance of (in this case, the counter class). +__class__ is a built-in attribute of every class instance (of every class). It is a reference to the class that self is an instance of (in this case, the counter class). - ![]()
Because count is a class attribute, it is available through direct reference to the class, before you have created any instances of the +Because count is a class attribute, it is available through direct reference to the class, before you have created any instances of the class. - ![]()
Creating an instance of the class calls the __init__ method, which increments the class attribute count by 1. This affects the class itself, not just the newly created instance. +Creating an instance of the class calls the __init__ method, which increments the class attribute count by 1. This affects the class itself, not just the newly created instance. - ![]()
Creating a second instance will increment the class attribute count again. Notice how the class attribute is shared by the class and all instances of the class. +Creating a second instance will increment the class attribute count again. Notice how the class attribute is shared by the class and all instances of the class. If the name of a Python function, class method, or attribute starts with (but doesn't end with) two underscores, it's private; everything else is public. Python has no concept of protected class methods (accessible only in their own class and descendant classes). Class methods are either private (accessible only in their own class) or public (accessible from anywhere). -
In
MP3FileInfo, there are two methods: __parse and __setitem__. As already discussed, __setitem__ is a special method; normally, you would call it indirectly by using the dictionary syntax on a class instance, but it is public, and you could -call it directly (even from outside the fileinfo module) if you had a really good reason. However, __parse is private, because it has two underscores at the beginning of its name. +
@@ -4238,35 +4078,35 @@ Rave Mix 2000http://mp3.com/DJMARYJANE \037'In
MP3FileInfo, there are two methods: __parse and __setitem__. As already discussed, __setitem__ is a special method; normally, you would call it indirectly by using the dictionary syntax on a class instance, but it is public, and you could +call it directly (even from outside the fileinfo module) if you had a really good reason. However, __parse is private, because it has two underscores at the beginning of its name. @@ -4176,13 +4016,13 @@ exceptions, errors occur immediately, and you can handle them in a standard way
- @@ -4003,7 +3843,7 @@ AttributeError: 'MP3FileInfo' instance has no attribute '__parse' In Python, all special methods (like __setitem__) and built-in attributes (like __doc__) follow a standard naming convention: they both start with and end with two underscores. Don't name your own methods and +In Python, all special methods (like __setitem__) and built-in attributes (like __doc__) follow a standard naming convention: they both start with and end with two underscores. Don't name your own methods and attributes this way, because it will only confuse you (and others) later. If you try to call a private method, Python will raise a slightly misleading exception, saying that the method does not exist. Of course it does exist, but it's private, so it's not accessible outside the class. Strictly speaking, private methods are accessible outside their class, just not easily accessible. Nothing in Python is truly private; internally, the names of private methods and attributes are mangled and unmangled on the fly to make them - seem inaccessible by their given names. You can access the @@ -4015,17 +3855,17 @@ AttributeError: 'MP3FileInfo' instance has no attribute '__parse' __parse method of the MP3FileInfo class by the name _MP3FileInfo__parse. Acknowledge that this is interesting, but promise to never, ever do it in real code. Private methods are private for a + seem inaccessible by their given names. You can access the __parse method of the MP3FileInfo class by the name _MP3FileInfo__parse. Acknowledge that this is interesting, but promise to never, ever do it in real code. Private methods are private for a reason, but like many other things in Python, their privateness is ultimately a matter of convention, not force. 5.10. Summary
-That's it for the hard-core object trickery. You'll see a real-world application of special class methods in Chapter 12, which uses
getattr to create a proxy to a remote web service. +That's it for the hard-core object trickery. You'll see a real-world application of special class methods in Chapter 12, which uses
getattr to create a proxy to a remote web service. The next chapter will continue using this code sample to explore other Python concepts, such as exceptions, file objects, and
for loops. Before diving into the next chapter, make sure you're comfortable doing all of these things:
-
- Importing modules using either
import module or from module import +- Importing modules using either
import module or from module import - Defining and instantiating classes -
- Defining
__init__ methods and other special class methods, and understanding when they are called +- Defining
__init__ methods and other special class methods, and understanding when they are called -- Subclassing
UserDict to define classes that act like dictionaries +- Subclassing
UserDict to define classes that act like dictionaries - Defining data attributes and class attributes, and understanding the differences between them @@ -4033,7 +3873,7 @@ AttributeError: 'MP3FileInfo' instance has no attribute '__parse'
Chapter 6. Exceptions and File Handling
-In this chapter, you will dive into exceptions, file objects,
for loops, and the os and sys modules. If you've used exceptions in another programming language, you can skim the first section to get a sense of Python's syntax. Be sure to tune in again for file handling. +In this chapter, you will dive into exceptions, file objects,
for loops, and the os and sys modules. If you've used exceptions in another programming language, you can skim the first section to get a sense of Python's syntax. Be sure to tune in again for file handling. 6.1. Handling Exceptions
Like many other programming languages, Python has exception handling via
try...except blocks. @@ -4109,11 +3949,11 @@ line that you would need to trace back to the source. I'm sure you've experienc exceptions, errors occur immediately, and you can handle them in a standard way at the source of the problem.
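A minimal sketch of handling an error at its source with a try...except block, including the optional else clause. It assumes Python 3, where a missing file raises FileNotFoundError, a subclass of OSError, and IOError is kept as an alias of OSError; the describe_file helper and its return strings are hypothetical and exist only for illustration.

```python
def describe_file(path):
    """Try to open path and report what happened instead of crashing."""
    try:
        f = open(path, "rb")
    except OSError:
        # Assumption: Python 3. IOError is an alias of OSError here, and a
        # missing file raises FileNotFoundError, one of its subclasses.
        return "could not open file"
    else:
        # Runs only when the try block raised no exception.
        f.close()
        return "file opened"


# Processing continues normally after the exception has been handled.
print(describe_file("/notthere"))
```

Because the exception is caught where it occurs, the caller gets an ordinary return value and never sees a traceback.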
@@ -4047,15 +3887,15 @@ AttributeError: 'MP3FileInfo' instance has no attribute '__parse' Exceptions are everywhere in Python. Virtually every module in the standard Python library uses them, and Python itself will raise them in a lot of different circumstances. You've already seen them repeatedly throughout this book. -
- Accessing a non-existent dictionary key will raise a
KeyError exception. +- Accessing a non-existent dictionary key will raise a
KeyError exception. -- Searching a list for a non-existent value will raise a
ValueError exception. +- Searching a list for a non-existent value will raise a
ValueError exception. -- Calling a non-existent method will raise an
AttributeError exception. +- Calling a non-existent method will raise an
AttributeError exception. -- Referencing a non-existent variable will raise a
NameError exception. +- Referencing a non-existent variable will raise a
NameError exception. -- Mixing datatypes without coercion will raise a
TypeError exception. +- Mixing datatypes without coercion will raise a
TypeError exception. In each of these cases, you were simply playing around in the Python IDE: an error occurred, the exception was printed (depending on your IDE, perhaps in an intentionally jarring shade of red), and that was that. This is called an unhandled exception. When the exception was raised, there was no code to explicitly notice it and deal with it, so it bubbled its @@ -4079,7 +3919,7 @@ This line will always print
- ![]()
Using the built-in open function, you can try to open a file for reading (more on open in the next section). But the file doesn't exist, so this raises the IOError exception. Since you haven't provided any explicit check for an IOError exception, Python just prints out some debugging information about what happened and then gives up. +Using the built-in open function, you can try to open a file for reading (more on open in the next section). But the file doesn't exist, so this raises the IOError exception. Since you haven't provided any explicit check for an IOError exception, Python just prints out some debugging information about what happened and then gives up. @@ -4091,14 +3931,14 @@ This line will always print - ![]()
When the open function raises an IOError exception, you're ready for it. The except IOError: line catches the exception and executes your own block of code, which in this case just prints a more pleasant error message. +When the open function raises an IOError exception, you're ready for it. The except IOError: line catches the exception and executes your own block of code, which in this case just prints a more pleasant error message. ![]()
Once an exception has been handled, processing continues normally on the first line after the try...except block. Note that this line will always print, whether or not an exception occurs. If you really did have a file called -notthere in your root directory, the call to open would succeed, the except clause would be ignored, and this line would still be executed. +notthere in your root directory, the call to open would succeed, the except clause would be ignored, and this line would still be executed. 6.1.1. Using Exceptions For Other Purposes
There are a lot of other uses for exceptions besides handling actual error conditions. A common use in the standard Python library is to try to import a module, and then check whether it worked. Importing a module that does not exist will raise - an
ImportError exception. You can use this to define multiple levels of functionality based on which modules are available at run-time, + an ImportError exception. You can use this to define multiple levels of functionality based on which modules are available at run-time, or to support multiple platforms (where platform-specific code is separated into different modules). -You can also define your own exceptions by creating a class that inherits from the built-in
Exception class, and then raise your exceptions with the raise command. See the further reading section if you're interested in doing this. +You can also define your own exceptions by creating a class that inherits from the built-in
Exception class, and then raise your exceptions with the raise command. See the further reading section if you're interested in doing this. The next example demonstrates how to use an exception to support platform-specific functionality. This code comes from the -
getpass module, a wrapper module for getting a password from the user. Getting a password is accomplished differently on UNIX, Windows, and Mac OS platforms, but this code encapsulates all of those differences. +getpass module, a wrapper module for getting a password from the user. Getting a password is accomplished differently on UNIX, Windows, and Mac OS platforms, but this code encapsulates all of those differences. Example 6.2. Supporting Platform-Specific Functionality
# Bind the name getpass to the appropriate function try: @@ -4136,34 +3976,34 @@ exceptions, errors occur immediately, and you can handle them in a standard way- ![]()
termios is a UNIX-specific module that provides low-level control over the input terminal. If this module is not available (because it's not - on your system, or your system doesn't support it), the import fails and Python raises an ImportError, which you catch. +termios is a UNIX-specific module that provides low-level control over the input terminal. If this module is not available (because it's not + on your system, or your system doesn't support it), the import fails and Python raises an ImportError, which you catch. - ![]()
OK, you didn't have termios, so let's try msvcrt, which is a Windows-specific module that provides an API to many useful functions in the Microsoft Visual C++ runtime services. If this import fails, Python will raise an ImportError, which you catch. +OK, you didn't have termios, so let's try msvcrt, which is a Windows-specific module that provides an API to many useful functions in the Microsoft Visual C++ runtime services. If this import fails, Python will raise an ImportError, which you catch. - ![]()
If the first two didn't work, you try to import a function from EasyDialogs, which is a Mac OS-specific module that provides functions to pop up dialog boxes of various types. Once again, if this import fails, Python will raise an ImportError, which you catch. +If the first two didn't work, you try to import a function from EasyDialogs, which is a Mac OS-specific module that provides functions to pop up dialog boxes of various types. Once again, if this import fails, Python will raise an ImportError, which you catch. ![]()
None of these platform-specific modules is available (which is possible, since Python has been ported to a lot of different platforms), so you need to fall back on a default password input function (which is - defined elsewhere in the getpass module). Notice what you're doing here: assigning the function default_getpass to the variable getpass. If you read the official getpass documentation, it tells you that the getpass module defines a getpass function. It does this by binding getpass to the correct function for your platform. Then when you call the getpass function, you're really calling a platform-specific function that this code has set up for you. You don't need to know or - care which platform your code is running on -- just call getpass, and it will always do the right thing. + defined elsewhere in the getpass module). Notice what you're doing here: assigning the function default_getpass to the variable getpass. If you read the official getpass documentation, it tells you that the getpass module defines a getpass function. It does this by binding getpass to the correct function for your platform. Then when you call the getpass function, you're really calling a platform-specific function that this code has set up for you. You don't need to know or + care which platform your code is running on -- just call getpass, and it will always do the right thing. - ![]()
A try...except block can have an else clause, like an if statement. If no exception is raised during the try block, the else clause is executed afterwards. In this case, that means that the from EasyDialogs import AskPassword import worked, so you should bind getpass to the AskPassword function. Each of the other try...except blocks has similar else clauses to bind getpass to the appropriate function when you find an import that works. +A try...except block can have an else clause, like an if statement. If no exception is raised during the try block, the else clause is executed afterwards. In this case, that means that the from EasyDialogs import AskPassword import worked, so you should bind getpass to the AskPassword function. Each of the other try...except blocks has similar else clauses to bind getpass to the appropriate function when you find an import that works. - Python Library Reference documents the getpass module. -
- Python Library Reference documents the
traceback module, which provides low-level access to exception attributes after an exception is raised. +- Python Library Reference documents the
traceback module, which provides low-level access to exception attributes after an exception is raised. - Python Reference Manual discusses the inner workings of the
try...except block. 6.2. Working with File Objects
-Python has a built-in function,
open, for opening a file on disk. open returns a file object, which has methods and attributes for getting information about and manipulating the opened file. +Python has a built-in function,
open, for opening a file on disk. open returns a file object, which has methods and attributes for getting information about and manipulating the opened file. Example 6.3. Opening a File
>>> f = open("/music/_singles/kairo.mp3", "rb")>>> f
<open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988> @@ -4194,7 +4034,7 @@ exceptions, errors occur immediately, and you can handle them in a standard way
- ![]()
The open function can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename, +The @@ -4202,19 +4042,19 @@ exceptions, errors occur immediately, and you can handle them in a standard way open function can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename, is required; the other two are optional. If not specified, the file is opened for reading in text mode. Here you are opening the file for reading in binary mode. (print open.__doc__ displays a great explanation of all the possible modes.) - ![]()
The open function returns an object (by now, this should not surprise you). A file object has several useful attributes. +The open function returns an object (by now, this should not surprise you). A file object has several useful attributes. - ![]()
The mode attribute of a file object tells you in which mode the file was opened. +The mode attribute of a file object tells you in which mode the file was opened. - ![]()
The name attribute of a file object tells you the name of the file that the file object has open. +The name attribute of a file object tells you the name of the file that the file object has open. - ![]()
A file object maintains state about the file it has open. The tell method of a file object tells you your current position in the open file. Since you haven't done anything with this file - yet, the current position is 0, which is the beginning of the file. +A file object maintains state about the file it has open. The tell method of a file object tells you your current position in the open file. Since you haven't done anything with this file + yet, the current position is 0, which is the beginning of the file. - ![]()
The seek method of a file object moves to another position in the open file. The second parameter specifies what the first one means; -0 means move to an absolute position (counting from the start of the file), 1 means move to a relative position (counting from the current position), and 2 means move to a position relative to the end of the file. Since the MP3 tags you're looking for are stored at the end of the file, you use 2 and tell the file object to move to a position 128 bytes from the end of the file. +The seek method of a file object moves to another position in the open file. The second parameter specifies what the first one means; +0 means move to an absolute position (counting from the start of the file), 1 means move to a relative position (counting from the current position), and 2 means move to a position relative to the end of the file. Since the MP3 tags you're looking for are stored at the end of the file, you use 2 and tell the file object to move to a position 128 bytes from the end of the file. - ![]()
The tell method confirms that the current file position has moved. +The tell method confirms that the current file position has moved. - ![]()
The read method reads a specified number of bytes from the open file and returns a string with the data that was read. The optional - parameter specifies the maximum number of bytes to read. If no parameter is specified, read will read until the end of the file. (You could have simply said read() here, since you know exactly where you are in the file and you are, in fact, reading the last 128 bytes.) The read data - is assigned to the tagData variable, and the current position is updated based on how many bytes were read. +The read method reads a specified number of bytes from the open file and returns a string with the data that was read. The optional + parameter specifies the maximum number of bytes to read. If no parameter is specified, read will read until the end of the file. (You could have simply said read() here, since you know exactly where you are in the file and you are, in fact, reading the last 128 bytes.) The read data + is assigned to the tagData variable, and the current position is updated based on how many bytes were read. @@ -4301,40 +4141,40 @@ ValueError: I/O operation on closed file - ![]()
The tellmethod confirms that the current position has moved. If you do the math, you'll see that after reading 128 bytes, the position +The tellmethod confirms that the current position has moved. If you do the math, you'll see that after reading 128 bytes, the position has been incremented by 128.- ![]()
The closedattribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (closedisFalse). +The closed attribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (closed is False).- ![]()
To close a file, call the closemethod of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any) +To close a file, call the closemethod of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any) that the system hadn't gotten around to actually writing yet, and releases the system resources.- ![]()
The closedattribute confirms that the file is closed. +The closed attribute confirms that the file is closed. - ![]()
Just because a file is closed doesn't mean that the file object ceases to exist. The variable fwill continue to exist until it goes out of scope or gets manually deleted. However, none of the methods that manipulate an open file will work once the file has been closed; +Just because a file is closed doesn't mean that the file object ceases to exist. The variable f will continue to exist until it goes out of scope or gets manually deleted. However, none of the methods that manipulate an open file will work once the file has been closed; they all raise an exception. - ![]()
Calling closeon a file object whose file is already closed does not raise an exception; it fails silently. +Calling closeon a file object whose file is already closed does not raise an exception; it fails silently.6.2.3. Handling I/O Errors
-Now you've seen enough to understand the file handling code in the
fileinfo.py sample code from the previous chapter. This example shows how to safely open and read from a file and gracefully handle +Now you've seen enough to understand the file handling code in the
fileinfo.py sample code from the previous chapter. This example shows how to safely open and read from a file and gracefully handle errors. -Example 6.6. File Objects in
MP3FileInfo+Example 6.6. File Objects in
MP3FileInfotry:fsock = open(filename, "rb", 0)
try: @@ -4357,31 +4197,31 @@ ValueError: I/O operation on closed file
- ![]()
The openfunction may raise anIOError. (Maybe the file doesn't exist.) +The openfunction may raise anIOError. (Maybe the file doesn't exist.)- ![]()
The seekmethod may raise anIOError. (Maybe the file is smaller than 128 bytes.) +The seekmethod may raise anIOError. (Maybe the file is smaller than 128 bytes.)- ![]()
The readmethod may raise anIOError. (Maybe the disk has a bad sector, or it's on a network drive and the network just went down.) +The readmethod may raise anIOError. (Maybe the disk has a bad sector, or it's on a network drive and the network just went down.)- ![]()
This is new: a try...finallyblock. Once the file has been opened successfully by theopenfunction, you want to make absolutely sure that you close it, even if an exception is raised by theseekorreadmethods. That's what atry...finallyblock is for: code in thefinallyblock will always be executed, even if something in thetryblock raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before. +This is new: a try...finallyblock. Once the file has been opened successfully by theopenfunction, you want to make absolutely sure that you close it, even if an exception is raised by theseekorreadmethods. That's what atry...finallyblock is for: code in thefinallyblock will always be executed, even if something in thetryblock raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.@@ -4412,33 +4252,33 @@ test succeededline 2 - ![]()
At last, you handle your IOErrorexception. This could be theIOErrorexception raised by the call toopen,seek, orread. Here, you really don't care, because all you're going to do is ignore it silently and continue. (Remember,passis a Python statement that does nothing.) That's perfectly legal; “handling” an exception can mean explicitly doing nothing. It still counts as handled, and processing will continue normally on the +At last, you handle your IOErrorexception. This could be theIOErrorexception raised by the call toopen,seek, orread. Here, you really don't care, because all you're going to do is ignore it silently and continue. (Remember,passis a Python statement that does nothing.) That's perfectly legal; “handling” an exception can mean explicitly doing nothing. It still counts as handled, and processing will continue normally on the next line of code after thetry...exceptblock.- ![]()
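The nesting described in these callouts can be tried without a real MP3 file. What follows is a minimal sketch, not the code from fileinfo.py: read_tail is a hypothetical helper, the 200-byte scratch file exists only for the demonstration, and the syntax is chosen so that it also runs on current Python releases.

```python
import os
import tempfile

def read_tail(filename, size=128):
    """Return the last `size` bytes of a file, or None on an I/O error.

    Hypothetical helper for illustration; not part of fileinfo.py.
    """
    try:
        fsock = open(filename, "rb")      # may raise IOError (file missing)
        try:
            fsock.seek(-size, 2)          # may raise IOError (file too small)
            return fsock.read(size)       # may raise IOError (bad disk, network)
        finally:
            fsock.close()                 # runs whether or not seek/read failed
    except IOError:
        return None                       # "handling" by deliberately ignoring

# Build a 200-byte scratch file so the sketch is self-contained.
handle, path = tempfile.mkstemp()
os.write(handle, b"x" * 72 + b"TAG" + b"y" * 125)
os.close(handle)

tail = read_tail(path)                        # the last 128 bytes
missing = read_tail(path + ".does-not-exist") # open fails, so None
os.remove(path)
```

The inner try...finally only guards code that runs after open has succeeded, which is why the close call can assume fsock exists. The outer try...except catches an IOError from any of the three calls.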
You start boldly by either creating the new file test.log or overwriting the existing file, and opening it for writing. (The second parameter "w" means open the file for writing.) Yes, that's all as dangerous as it sounds. I hope you didn't care about the previous +You start boldly by either creating the new file test.log or overwriting the existing file, and opening it for writing. (The second parameter "w" means open the file for writing.) Yes, that's all as dangerous as it sounds. I hope you didn't care about the previous contents of that file, because they're gone now.- ![]()
You can add data to the newly opened file with the writemethod of the file object returned byopen. +You can add data to the newly opened file with the writemethod of the file object returned byopen.- ![]()
fileis a synonym foropen. This one-liner opens the file, reads its contents, and prints them. +fileis a synonym foropen. This one-liner opens the file, reads its contents, and prints them.- ![]()
You happen to know that test.logexists (since you just finished writing to it), so you can open it and append to it. (The"a"parameter means open the file for appending.) Actually you could do this even if the file didn't exist, because opening +You happen to know that test.logexists (since you just finished writing to it), so you can open it and append to it. (The"a"parameter means open the file for appending.) Actually you could do this even if the file didn't exist, because opening the file for appending will create the file if necessary. But appending will never harm the existing contents of the file.@@ -4473,7 +4313,7 @@ e - ![]()
As you can see, both the original line you wrote and the second line you appended are now in test.log. Also note that newlines are not included. Since you didn't write them explicitly to the file either time, the +As you can see, both the original line you wrote and the second line you appended are now in test.log. Also note that newlines are not included. Since you didn't write them explicitly to the file either time, the file doesn't include them. You can write a newline with the "\n" character. Since you didn't do this, everything you wrote to the file ended up smooshed together on the same line.
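To see the difference an explicit "\n" makes, here is a small sketch of the same write-then-append sequence. It is an illustration under assumptions: the file lives in a temporary directory (log_path is a made-up name) so that no real test.log is overwritten, and each write ends its line explicitly.

```python
import os
import tempfile

# Scratch location so the sketch does not clobber a real test.log.
log_path = os.path.join(tempfile.mkdtemp(), "test.log")

logfile = open(log_path, "w")       # "w" creates the file or truncates it
logfile.write("test succeeded\n")   # "\n" ends the line explicitly
logfile.close()

logfile = open(log_path, "a")       # "a" appends; earlier contents survive
logfile.write("line 2\n")
logfile.close()

logfile = open(log_path)            # default mode: reading, text
lines = logfile.readlines()         # one list element per line in the file
logfile.close()
```

With the line endings written out, readlines gives back two separate lines instead of one run-together string.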
The syntax for a forloop is similar to list comprehensions.liis a list, andswill take the value of each element in turn, starting from the first element. +The syntax for a forloop is similar to list comprehensions. li is a list, and s will take the value of each element in turn, starting from the first element.@@ -4485,7 +4325,7 @@ e @@ -4511,7 +4351,7 @@ e - ![]()
This is the reason you haven't seen the forloop yet: you haven't needed it yet. It's amazing how often you useforloops in other languages when all you really want is ajoinor a list comprehension. +This is the reason you haven't seen the forloop yet: you haven't needed it yet. It's amazing how often you useforloops in other languages when all you really want is ajoinor a list comprehension.@@ -4545,14 +4385,14 @@ USERNAME=mpilgrim - ![]()
As you saw in Example 3.20, “Assigning Consecutive Values”, rangeproduces a list of integers, which you then loop through. I know it looks a bit odd, but it is occasionally (and I stress +As you saw in Example 3.20, “Assigning Consecutive Values”, rangeproduces a list of integers, which you then loop through. I know it looks a bit odd, but it is occasionally (and I stress occasionally) useful to have a counter loop.- ![]()
os.environis a dictionary of the environment variables defined on your system. In Windows, these are your user and system variables +os.environ is a dictionary of the environment variables defined on your system. In Windows, these are your user and system variables accessible from MS-DOS. In UNIX, they are the variables exported in your shell's startup scripts. In Mac OS, there is no concept of environment variables, so this dictionary is empty. - ![]()
os.environ.items()returns a list of tuples:[(key1, value1), (key2, value2), ...]. Theforloop iterates through this list. The first round, it assignskey1tokandvalue1tov, sok=USERPROFILEandv=C:\Documents and Settings\mpilgrim. In the second round,kgets the second key,OS, andvgets the corresponding value,Windows_NT. +os.environ.items()returns a list of tuples:[(key1, value1), (key2, value2), ...]. Theforloop iterates through this list. The first round, it assignskey1to k andvalue1to v, so k =USERPROFILEand v =C:\Documents and Settings\mpilgrim. In the second round, k gets the second key,OS, and v gets the corresponding value,Windows_NT.@@ -4560,12 +4400,12 @@ USERNAME=mpilgrim -With multi-variable assignment and list comprehensions, you can replace the entire forloop with a single statement. Whether you actually do this in real code is a matter of personal coding style. I like it because it makes it clear that what I'm doing is mapping a dictionary into a list, then joining the list into a single string. - Other programmers prefer to write this out as aforloop. The output is the same in either case, although this version is slightly faster, because there is only oneforloop. The output is the same in either case, although this version is slightly faster, because there is only oneNow we can look at the
forloop inMP3FileInfo, from the samplefileinfo.pyprogram introduced in Chapter 5. -Example 6.11.
forLoop inMP3FileInfo+Now we can look at the
forloop inMP3FileInfo, from the samplefileinfo.pyprogram introduced in Chapter 5. +Example 6.11.
forLoop inMP3FileInfotagDataMap = {"title" : ( 3, 33, stripnulls), "artist" : ( 33, 63, stripnulls), "album" : ( 63, 93, stripnulls), @@ -4582,27 +4422,27 @@ USERNAME=mpilgrim- ![]()
tagDataMapis a class attribute that defines the tags you're looking for in an MP3 file. Tags are stored in fixed-length fields. Once you read the last 128 bytes of the file, bytes 3 through 32 of those +tagDataMap is a class attribute that defines the tags you're looking for in an MP3 file. Tags are stored in fixed-length fields. Once you read the last 128 bytes of the file, bytes 3 through 32 of those are always the song title, 33 through 62 are always the artist name, 63 through 92 are the album name, and so forth. Note - that tagDataMapis a dictionary of tuples, and each tuple contains two integers and a function reference. + that tagDataMap is a dictionary of tuples, and each tuple contains two integers and a function reference.- ![]()
This looks complicated, but it's not. The structure of the forvariables matches the structure of the elements of the list returned byitems. Remember thatitemsreturns a list of tuples of the form(key, value). The first element of that list is("title", (3, 33, <function stripnulls>)), so the first time around the loop,taggets"title",startgets3,endgets33, andparseFuncgets the functionstripnulls. +This looks complicated, but it's not. The structure of the forvariables matches the structure of the elements of the list returned byitems. Remember thatitemsreturns a list of tuples of the form(key, value). The first element of that list is("title", (3, 33, <function stripnulls>)), so the first time around the loop, tag gets"title", start gets3, end gets33, and parseFunc gets the functionstripnulls.- - ![]()
Now that you've extracted all the parameters for a single MP3 tag, saving the tag data is easy. You slice tagdatafromstarttoendto get the actual data for this tag, callparseFuncto post-process the data, and assign this as the value for the keytagin the pseudo-dictionaryself. After iterating through all the elements intagDataMap,selfhas the values for all the tags, and you know what that looks like. +Now that you've extracted all the parameters for a single MP3 tag, saving the tag data is easy. You slice tagdata from start to end to get the actual data for this tag, call parseFunc to post-process the data, and assign this as the value for the key tag in the pseudo-dictionary self. After iterating through all the elements in tagDataMap, self has the values for all the tags, and you know what that looks like. 6.4. Using
sys.modules
-Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary sys.modules.
-Example 6.12. Introducing sys.modules
>>> import sys
+6.4. Using sys.modules
+Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary sys.modules.
+Example 6.12. Introducing sys.modules
>>> import sys
>>> print '\n'.join(sys.modules.keys())
win32api os.path @@ -4621,18 +4461,18 @@ stat
- ![]()
The sys module contains system-level information, such as the version of Python you're running (sys.version or sys.version_info), and system-level options such as the maximum allowed recursion depth (sys.getrecursionlimit() and sys.setrecursionlimit()). +The sys module contains system-level information, such as the version of Python you're running (sys.version or sys.version_info), and system-level options such as the maximum allowed recursion depth (sys.getrecursionlimit() and sys.setrecursionlimit()).- - ![]()
sys.modules is a dictionary containing all the modules that have ever been imported since Python was started; the key is the module name, the value is the module object. Note that this is more than just the modules your program has imported. Python preloads some modules on startup, and if you're using a Python IDE, sys.modules contains all the modules imported by all the programs you've run within the IDE. +sys.modules is a dictionary containing all the modules that have ever been imported since Python was started; the key is the module name, the value is the module object. Note that this is more than just the modules your program has imported. Python preloads some modules on startup, and if you're using a Python IDE, sys.modules contains all the modules imported by all the programs you've run within the IDE.
This example demonstrates how to use sys.modules.
-Example 6.13. Using sys.modules
>>> import fileinfo
+This example demonstrates how to use sys.modules.
+Example 6.13. Using sys.modules
>>> import fileinfo
>>> print '\n'.join(sys.modules.keys()) win32api os.path @@ -4656,17 +4496,17 @@ stat
- ![]()
As new modules are imported, they are added to sys.modules. This explains why importing the same module twice is very fast: Python has already loaded and cached the module in sys.modules, so importing the second time is simply a dictionary lookup. +As new modules are imported, they are added to sys.modules. This explains why importing the same module twice is very fast: Python has already loaded and cached the module in sys.modules, so importing the second time is simply a dictionary lookup.- - ![]()
Given the name (as a string) of any previously-imported module, you can get a reference to the module itself through the sys.modules dictionary. +Given the name (as a string) of any previously-imported module, you can get a reference to the module itself through the sys.modules dictionary.The next example shows how to use the
__module__ class attribute with the sys.modules dictionary to get a reference to the module in which a class is defined. +The next example shows how to use the
__module__ class attribute with the sys.modules dictionary to get a reference to the module in which a class is defined.Example 6.14. The
__module__Class Attribute>>> from fileinfo import MP3FileInfo >>> MP3FileInfo.__module__'fileinfo' @@ -4682,12 +4522,12 @@ stat
- - ![]()
Combining this with the sys.modules dictionary, you can get a reference to the module in which a class is defined. +Combining this with the sys.modules dictionary, you can get a reference to the module in which a class is defined.Now you're ready to see how
sys.modules is used in fileinfo.py, the sample program introduced in Chapter 5. This example shows that portion of the code. -Example 6.15.
sys.modules in fileinfo.py+Now you're ready to see how
sys.modules is used in fileinfo.py, the sample program introduced in Chapter 5. This example shows that portion of the code. +Example 6.15.
sys.modules in fileinfo.py
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
    "get file info class from filename extension"
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
@@ -4696,21 +4536,21 @@ stat
- ![]()
This is a function with two arguments; filename is required, but module is optional and defaults to the module that contains the FileInfo class. This looks inefficient, because you might expect Python to evaluate the sys.modules expression every time the function is called. In fact, Python evaluates default expressions only once, the first time the module is imported. As you'll see later, you never call this - function with a module argument, so module serves as a function-level constant. +This is a function with two arguments; filename is required, but module is optional and defaults to the module that contains the FileInfo class. This looks inefficient, because you might expect Python to evaluate the sys.modules expression every time the function is called. In fact, Python evaluates default expressions only once, the first time the module is imported. As you'll see later, you never call this + function with a module argument, so module serves as a function-level constant.- ![]()
You'll plow through this line later, after you dive into the osmodule. For now, take it on faith thatsubclassends up as the name of a class, likeMP3FileInfo. +You'll plow through this line later, after you dive into the osmodule. For now, take it on faith that subclass ends up as the name of a class, likeMP3FileInfo.@@ -4719,11 +4559,11 @@ stat - ![]()
You already know about getattr, which gets a reference to an object by name.hasattris a complementary function that checks whether an object has a particular attribute; in this case, whether a module has - a particular class (although it works for any object and any attribute, just likegetattr). In English, this line of code says, “If this module has the class named bysubclassthen return it, otherwise return the base classFileInfo.” +You already know about getattr, which gets a reference to an object by name.hasattris a complementary function that checks whether an object has a particular attribute; in this case, whether a module has + a particular class (although it works for any object and any attribute, just likegetattr). In English, this line of code says, “If this module has the class named by subclass then return it, otherwise return the base classFileInfo.”
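The same three ideas (a default argument computed once, a module fetched from sys.modules through a class's __module__ attribute, and a hasattr check before getattr) can be exercised on a standard library module. This is a sketch under assumptions rather than the fileinfo.py code: get_mapping_class is a hypothetical function, and collections.OrderedDict stands in for FileInfo so that the example needs no other files.

```python
import sys
from collections import OrderedDict

def get_mapping_class(name, module=sys.modules[OrderedDict.__module__]):
    """Return the class called `name` from `module`, or dict as a fallback.

    Hypothetical helper for illustration. The default for `module` is
    looked up once, when this def statement runs, not on every call.
    """
    if hasattr(module, name):           # does the module define this name?
        return getattr(module, name)    # yes: hand back the object itself
    return dict                         # no: fall back to a base class

found = get_mapping_class("OrderedDict")       # defined in the module
fallback = get_mapping_class("NoSuchMapping")  # not defined, so dict
```

Spelling the test out with if and return does the same job as the one-line and/or expression in getFileInfoClass; which form to prefer is a matter of style.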
- Python Tutorial discusses exactly when and how default arguments are evaluated. -
- Python Library Reference documents the
sysmodule. +- Python Library Reference documents the
sysmodule.6.5. Working with Directories
-The
os.pathmodule has several functions for manipulating files and directories. Here, we're looking at handling pathnames and listing +The
os.pathmodule has several functions for manipulating files and directories. Here, we're looking at handling pathnames and listing the contents of a directory.Example 6.16. Constructing Pathnames
>>> import os @@ -4739,27 +4579,27 @@ stat- ![]()
os.pathis a reference to a module -- which module depends on your platform. Just asgetpassencapsulates differences between platforms by settinggetpassto a platform-specific function,osencapsulates differences between platforms by settingpathto a platform-specific module. +os.pathis a reference to a module -- which module depends on your platform. Just asgetpassencapsulates differences between platforms by setting getpass to a platform-specific function,osencapsulates differences between platforms by setting path to a platform-specific module.- ![]()
The joinfunction ofos.pathconstructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings. (Note that dealing +The joinfunction ofos.pathconstructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings. (Note that dealing with pathnames on Windows is annoying because the backslash character must be escaped.)- ![]()
In this slightly less trivial case, joinwill add an extra backslash to the pathname before joining it to the filename. I was overjoyed when I discovered this, since -addSlashIfNecessaryis one of the stupid little functions I always need to write when building up my toolbox in a new language. Do not write this stupid little function in Python; smart people have already taken care of it for you. +In this slightly less trivial case, joinwill add an extra backslash to the pathname before joining it to the filename. I was overjoyed when I discovered this, since +addSlashIfNecessaryis one of the stupid little functions I always need to write when building up my toolbox in a new language. Do not write this stupid little function in Python; smart people have already taken care of it for you.@@ -4785,32 +4625,32 @@ stat - ![]()
expanduserwill expand a pathname that uses~to represent the current user's home directory. This works on any platform where users have a home directory, like Windows, +expanduserwill expand a pathname that uses~to represent the current user's home directory. This works on any platform where users have a home directory, like Windows, UNIX, and Mac OS X; it has no effect on Mac OS.- ![]()
The splitfunction splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use -multi-variable assignment to return multiple values from a function? Well,splitis such a function. +The splitfunction splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use +multi-variable assignment to return multiple values from a function? Well,splitis such a function.- ![]()
You assign the return value of the splitfunction into a tuple of two variables. Each variable receives the value of the corresponding element of the returned tuple. +You assign the return value of the splitfunction into a tuple of two variables. Each variable receives the value of the corresponding element of the returned tuple.- ![]()
The first variable, filepath, receives the value of the first element of the tuple returned fromsplit, the file path. +The first variable, filepath, receives the value of the first element of the tuple returned from split, the file path.- ![]()
The second variable, filename, receives the value of the second element of the tuple returned fromsplit, the filename. +The second variable, filename, receives the value of the second element of the tuple returned from split, the filename.@@ -4839,30 +4679,30 @@ stat - ![]()
os.pathalso contains a functionsplitext, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique +os.pathalso contains a functionsplitext, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique to assign each of them to separate variables.- ![]()
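Taken together, join, split, and splitext round-trip a pathname. The short sketch below uses made-up relative path components ("music", "ap", "mahadeva.mp3") rather than the Windows paths from the examples, so that it behaves the same way on any platform.

```python
import os

# join inserts the platform's separator between the pieces.
full = os.path.join("music", "ap", "mahadeva.mp3")

# split returns (directory part, last component) as a tuple,
# which multi-variable assignment unpacks into two names.
(filepath, filename) = os.path.split(full)

# splitext returns (name without extension, extension including the dot).
(shortname, extension) = os.path.splitext(filename)
```

Because the separator comes from os.path itself, the value of full differs between Windows and UNIX, but filename, shortname, and extension do not.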
The listdirfunction takes a pathname and returns a list of the contents of the directory. +The listdirfunction takes a pathname and returns a list of the contents of the directory.- ![]()
listdirreturns both files and folders, with no indication of which is which. +listdirreturns both files and folders, with no indication of which is which.- ![]()
You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns 1 if the path represents a file, and 0 otherwise. Here you're using os.path.join to ensure a full pathname, but isfile also works with a partial path, relative to the current working directory. You can use os.getcwd() to get the current working directory. +You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns 1 if the path represents a file, and 0 otherwise. Here you're using os.path.join to ensure a full pathname, but isfile also works with a partial path, relative to the current working directory. You can use os.getcwd() to get the current working directory.- - ![]()
os.pathalso has aisdirfunction which returns 1 if the path represents a directory, and 0 otherwise. You can use this to get a list of the subdirectories +os.pathalso has aisdirfunction which returns 1 if the path represents a directory, and 0 otherwise. You can use this to get a list of the subdirectories within a directory.Example 6.19. Listing Directories in
fileinfo.py+Example 6.19. Listing Directories in
fileinfo.pydef listDirectory(directory, fileExtList): "get list of file info objects for files of particular extensions" fileList = [os.path.normcase(f) @@ -4874,25 +4714,25 @@ def listDirectory(directory, fileExtList):- ![]()
os.listdir(directory)returns a list of all the files and folders indirectory. +os.listdir(directory)returns a list of all the files and folders in directory.- ![]()
Iterating through the list with f, you useos.path.normcase(f)to normalize the case according to operating system defaults.normcaseis a useful little function that compensates for case-insensitive operating systems that think thatmahadeva.mp3andmahadeva.MP3are the same file. For instance, on Windows and Mac OS,normcasewill convert the entire filename to lowercase; on UNIX-compatible systems, it will return the filename unchanged. +Iterating through the list with f, you use os.path.normcase(f)to normalize the case according to operating system defaults.normcaseis a useful little function that compensates for case-insensitive operating systems that think thatmahadeva.mp3andmahadeva.MP3are the same file. For instance, on Windows and Mac OS,normcasewill convert the entire filename to lowercase; on UNIX-compatible systems, it will return the filename unchanged.- ![]()
Iterating through the normalized list with fagain, you useos.path.splitext(f)to split each filename into name and extension. +Iterating through the normalized list with f again, you use os.path.splitext(f)to split each filename into name and extension.- ![]()
For each file, you see if the extension is in the list of file extensions you care about ( fileExtList, which was passed to thelistDirectoryfunction). +For each file, you see if the extension is in the list of file extensions you care about (fileExtList, which was passed to the listDirectoryfunction).@@ -4907,14 +4747,14 @@ def listDirectory(directory, fileExtList): - Whenever possible, you should use the functions in osandos.pathfor file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like -os.path.splitwork on UNIX, Windows, Mac OS, and any other platform supported by Python. +Whenever possible, you should use the functions in osandos.pathfor file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like +os.path.splitwork on UNIX, Windows, Mac OS, and any other platform supported by Python.There is one other way to get the contents of a directory. It's very powerful, and it uses the sort of wildcards that you may already be familiar with from working on the command line. -
Example 6.20. Listing Directories with
glob+Example 6.20. Listing Directories with
glob>>> os.listdir("c:\\music\\_singles\\")['a_time_long_forgotten_con.mp3', 'hellraiser.mp3', 'kairo.mp3', 'long_way_home1.mp3', 'sidewinder.mp3', @@ -4936,14 +4776,14 @@ may already be familiar with from working on the command line.
- ![]()
As you saw earlier, os.listdirsimply takes a directory path and lists all files and directories in that directory. +As you saw earlier, os.listdirsimply takes a directory path and lists all files and directories in that directory.- ![]()
The globmodule, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. - Here the wildcard is a directory path plus "*.mp3", which will match all.mp3files. Note that each element of the returned list already includes the full path of the file. +The globmodule, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. + Here the wildcard is a directory path plus "*.mp3", which will match all.mp3files. Note that each element of the returned list already includes the full path of the file.@@ -4954,22 +4794,22 @@ may already be familiar with from working on the command line. - ![]()
Now consider this scenario: you have a musicdirectory, with several subdirectories within it, with.mp3files within each subdirectory. You can get a list of all of those with a single call toglob, by using two wildcards at once. One wildcard is the"*.mp3"(to match.mp3files), and one wildcard is within the directory path itself, to match any subdirectory withinc:\music. That's a crazy amount of power packed into one deceptively simple-looking function! +Now consider this scenario: you have a musicdirectory, with several subdirectories within it, with.mp3files within each subdirectory. You can get a list of all of those with a single call toglob, by using two wildcards at once. One wildcard is the"*.mp3"(to match.mp3files), and one wildcard is within the directory path itself, to match any subdirectory withinc:\music. That's a crazy amount of power packed into one deceptively simple-looking function!-Further Reading on the
+osModuleFurther Reading on the
osModule-
- Python Knowledge Base answers questions about the
osmodule. +- Python Knowledge Base answers questions about the
osmodule. -- Python Library Reference documents the
osmodule and theos.pathmodule. +- Python Library Reference documents the
osmodule and theos.pathmodule.6.6. Putting It All Together
Once again, all the dominoes are in place. You've seen how each line of code works. Now let's step back and see how it all fits together. -
Example 6.21.
listDirectory+Example 6.21.
listDirectorydef listDirectory(directory, fileExtList):"get list of file info objects for files of particular extensions" fileList = [os.path.normcase(f) @@ -4986,51 +4826,51 @@ def listDirectory(directory, fileExtList):
-
listDirectoryis the main attraction of this entire module. It takes a directory (likec:\music\_singles\in my case) and a list of interesting file extensions (like['.mp3']), and it returns a list of class instances that act like dictionaries that contain metadata about each interesting file in +listDirectoryis the main attraction of this entire module. It takes a directory (likec:\music\_singles\in my case) and a list of interesting file extensions (like['.mp3']), and it returns a list of class instances that act like dictionaries that contain metadata about each interesting file in that directory. And it does it in just a few straightforward lines of code.- ![]()
As you saw in the previous section, this line of code gets a list of the full pathnames of all the files in directorythat have an interesting file extension (as specified byfileExtList). +As you saw in the previous section, this line of code gets a list of the full pathnames of all the files in directory that have an interesting file extension (as specified by fileExtList). - ![]()
Old-school Pascal programmers may be familiar with them, but most people give me a blank stare when I tell them that Python supports nested functions -- literally, a function within a function. The nested function getFileInfoClasscan be called only from the function in which it is defined,listDirectory. As with any other function, you don't need an interface declaration or anything fancy; just define the function and code +Old-school Pascal programmers may be familiar with them, but most people give me a blank stare when I tell them that Python supports nested functions -- literally, a function within a function. The nested function getFileInfoClasscan be called only from the function in which it is defined,listDirectory. As with any other function, you don't need an interface declaration or anything fancy; just define the function and code it.- ![]()
Now that you've seen the osmodule, this line should make more sense. It gets the extension of the file (os.path.splitext(filename)[1]), forces it to uppercase (.upper()), slices off the dot ([1:]), and constructs a class name out of it with string formatting. Soc:\music\ap\mahadeva.mp3becomes.mp3becomes.MP3becomesMP3becomesMP3FileInfo. +Now that you've seen the osmodule, this line should make more sense. It gets the extension of the file (os.path.splitext(filename)[1]), forces it to uppercase (.upper()), slices off the dot ([1:]), and constructs a class name out of it with string formatting. Soc:\music\ap\mahadeva.mp3becomes.mp3becomes.MP3becomesMP3becomesMP3FileInfo.![]()
Having constructed the name of the handler class that would handle this file, you check to see if that handler class actually - exists in this module. If it does, you return the class, otherwise you return the base class FileInfo. This is a very important point: this function returns a class. Not an instance of a class, but the class itself. + exists in this module. If it does, you return the class, otherwise you return the base classFileInfo. This is a very important point: this function returns a class. Not an instance of a class, but the class itself.- - ![]()
For each file in the “interesting files” list ( fileList), you callgetFileInfoClasswith the filename (f). CallinggetFileInfoClass(f)returns a class; you don't know exactly which class, but you don't care. You then create an instance of this class (whatever - it is) and pass the filename (fagain), to the__init__method. As you saw earlier in this chapter, the__init__method ofFileInfosetsself["name"], which triggers__setitem__, which is overridden in the descendant (MP3FileInfo) to parse the file appropriately to pull out the file's metadata. You do all that for each interesting file and return a +For each file in the “interesting files” list (fileList), you call getFileInfoClasswith the filename (f). CallinggetFileInfoClass(f)returns a class; you don't know exactly which class, but you don't care. You then create an instance of this class (whatever + it is) and pass the filename (f again), to the__init__method. As you saw earlier in this chapter, the__init__method ofFileInfosetsself["name"], which triggers__setitem__, which is overridden in the descendant (MP3FileInfo) to parse the file appropriately to pull out the file's metadata. You do all that for each interesting file and return a list of the resulting instances.Note that
listDirectoryis completely generic. It doesn't know ahead of time which types of files it will be getting, or which classes are defined +Note that
listDirectoryis completely generic. It doesn't know ahead of time which types of files it will be getting, or which classes are defined that could potentially handle those files. It inspects the directory for the files to process, and then introspects its own -module to see what special handler classes (likeMP3FileInfo) are defined. You can extend this program to handle other types of files simply by defining an appropriately-named class: -HTMLFileInfofor HTML files,DOCFileInfofor Word.docfiles, and so forth.listDirectorywill handle them all, without modification, by handing off the real work to the appropriate classes and collating the results. +module to see what special handler classes (likeMP3FileInfo) are defined. You can extend this program to handle other types of files simply by defining an appropriately-named class: +HTMLFileInfofor HTML files,DOCFileInfofor Word.docfiles, and so forth.listDirectorywill handle them all, without modification, by handing off the real work to the appropriate classes and collating the results.6.7. Summary
-The
fileinfo.pyprogram introduced in Chapter 5 should now make perfect sense. +The
fileinfo.pyprogram introduced in Chapter 5 should now make perfect sense."""Framework for getting filetype-specific metadata. @@ -5116,7 +4956,7 @@ if __name__ == "__main__":- Protecting external resources with
try...finally- Reading from files
- Assigning multiple values at once in a
forloop -- Using the
osmodule for all your cross-platform file manipulation needs +- Using the
osmodule for all your cross-platform file manipulation needs- Dynamically instantiating classes of unknown type by treating classes as objects and passing them around @@ -5124,13 +4964,13 @@ if __name__ == "__main__":
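As a compact recap of the last summary point, here is a minimal sketch of picking a handler class by name at run time and then instantiating it. It is written for Python 3 and only approximates the chapter's approach. The class bodies are stubs, the name `get_file_info_class` is hypothetical, and the lookup goes through `globals()` rather than the module object so that the snippet stays self-contained.

```python
import os


class FileInfo(dict):
    """Base handler: stores only the file name (stub for illustration)."""

    def __init__(self, filename=None):
        super().__init__()
        self["name"] = filename


class MP3FileInfo(FileInfo):
    """Stand-in for a real MP3 handler; a real one would parse the tags."""


def get_file_info_class(filename, namespace=None):
    """Return the handler class for filename, falling back to FileInfo."""
    if namespace is None:
        namespace = globals()
    # ".mp3" -> ".MP3" -> "MP3" -> "MP3FileInfo"
    ext = os.path.splitext(filename)[1].upper()[1:]
    return namespace.get("%sFileInfo" % ext, FileInfo)


# The function returns a class, not an instance; calling the class creates one.
handler = get_file_info_class("mahadeva.mp3")
info = handler("mahadeva.mp3")
fallback = get_file_info_class("notes.txt")
```

Adding support for another file type in this sketch would only require defining another class whose name follows the same `<EXT>FileInfo` pattern; the lookup function itself stays unchanged.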
Chapter 7. Regular Expressions
Regular expressions are a powerful and standardized way of searching, replacing, and parsing text with complex patterns of -characters. If you've used regular expressions in other languages (like Perl), the syntax will be very familiar, and you get by just reading the summary of the
remodule to get an overview of the available functions and their arguments. +characters. If you've used regular expressions in other languages (like Perl), the syntax will be very familiar, and you can get by just reading the summary of theremodule to get an overview of the available functions and their arguments.7.1. Diving In
-Strings have methods for searching (
index,find, andcount), replacing (replace), and parsing (split), but they are limited to the simplest of cases. The search methods look for a single, hard-coded substring, and they are -always case-sensitive. To do case-insensitive searches of a strings, you must calls.lower()ors.upper()and make sure your search strings are the appropriate case to match. Thereplaceandsplitmethods have the same limitations. +Strings have methods for searching (
index, find, and count), replacing (replace), and parsing (split), but they are limited to the simplest of cases. The search methods look for a single, hard-coded substring, and they are +always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace and split methods have the same limitations. If what you're trying to do can be accomplished with string functions, you should use them. They're fast and simple and easy to read, and there's a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different - string functions with
ifstatements to handle special cases, or if you're combining them withsplitandjoinand list comprehensions in weird unreadable ways, you may need to move up to regular expressions. + string functions withifstatements to handle special cases, or if you're combining them withsplitandjoinand list comprehensions in weird unreadable ways, you may need to move up to regular expressions.Although the regular expression syntax is tight and unlike normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions to make them practically self-documenting.
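To make the trade-off concrete, the following sketch (Python 3, with an address string invented for illustration) contrasts a case-sensitive string search with a regular expression that ignores case and carries its own comments. The flags `re.IGNORECASE` and `re.VERBOSE` are part of the standard `re` module; everything else here is an assumption chosen for the example.

```python
import re

s = "100 North Broad Road"

# String methods are case-sensitive, so the case must be normalized first.
found_plain = s.find("ROAD") != -1            # False: the case differs
found_lowered = s.lower().find("road") != -1  # True after lowercasing

# A regular expression can ignore case without touching the string,
# and re.VERBOSE lets the pattern carry its own comments.
pattern = re.compile(r"""
    road    # the literal word we are looking for
    $       # ...but only at the end of the string
    """, re.IGNORECASE | re.VERBOSE)
match = pattern.search(s)
```

Note that the lowercased `find` also succeeds on the `road` inside `Broad`, which is exactly the kind of special case that tends to push code toward regular expressions.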
7.2. Case Study: Street Addresses
@@ -5153,13 +4993,13 @@ within regular expressions to make them practically self-documenting.- ![]()
My goal is to standardize a street address so that 'ROAD'is always abbreviated as'RD.'. At first glance, I thought this was simple enough that I could just use the string methodreplace. After all, all the data was already uppercase, so case mismatches would not be a problem. And the search string,'ROAD', was a constant. And in this deceptively simple example,s.replacedoes indeed work. +My goal is to standardize a street address so that 'ROAD'is always abbreviated as'RD.'. At first glance, I thought this was simple enough that I could just use the string methodreplace. After all, all the data was already uppercase, so case mismatches would not be a problem. And the search string,'ROAD', was a constant. And in this deceptively simple example,s.replacedoes indeed work.- ![]()
Life, unfortunately, is full of counterexamples, and I quickly discovered this one. The problem here is that 'ROAD'appears twice in the address, once as part of the street name'BROAD'and once as its own word. Thereplacemethod sees these two occurrences and blindly replaces both of them; meanwhile, I see my addresses getting destroyed. +Life, unfortunately, is full of counterexamples, and I quickly discovered this one. The problem here is that 'ROAD'appears twice in the address, once as part of the street name'BROAD'and once as its own word. Thereplacemethod sees these two occurrences and blindly replaces both of them; meanwhile, I see my addresses getting destroyed.@@ -5172,7 +5012,7 @@ within regular expressions to make them practically self-documenting. - ![]()
It's time to move up to regular expressions. In Python, all functionality related to regular expressions is contained in the remodule. +It's time to move up to regular expressions. In Python, all functionality related to regular expressions is contained in the remodule.@@ -5184,7 +5024,7 @@ within regular expressions to make them practically self-documenting. @@ -5224,7 +5064,7 @@ ended with the street name. Most of the time, I got away with it, but if the st - ![]()
Using the re.subfunction, you search the stringsfor the regular expression'ROAD$'and replace it with'RD.'. This matches theROADat the end of the strings, but does not match theROADthat's part of the wordBROAD, because that's in the middle ofs. +Using the re.subfunction, you search the string s for the regular expression'ROAD$'and replace it with'RD.'. This matches theROADat the end of the string s, but does not match theROADthat's part of the wordBROAD, because that's in the middle of s.*sigh* Unfortunately, I soon found more cases that contradicted my logic. In this case, the street address contained the word 'ROAD'as a whole word by itself, but it wasn't at the end, because the address had an apartment number after the street designation. - Because'ROAD'isn't at the very end of the string, it doesn't match, so the entire call tore.subends up replacing nothing at all, and you get the original string back, which is not what you want. + Because'ROAD'isn't at the very end of the string, it doesn't match, so the entire call tore.subends up replacing nothing at all, and you get the original string back, which is not what you want.@@ -5252,7 +5092,7 @@ ended with the street name. Most of the time, I got away with it, but if the st The following are some general rules for constructing Roman numerals:
-
- Characters are additive.
I is 1, II is 2, and III is 3. VI is 6 (literally, “5 and 1”), VII is 7, and VIII is 8. +- Characters are additive.
I is 1, II is 2, and III is 3. VI is 6 (literally, “5 and 1”), VII is 7, and VIII is 8.- The tens characters (
I, X, C, and M) can be repeated up to three times. At 4, you need to subtract from the next highest fives character. You can't represent 4 as IIII; instead, it is represented as IV (“1 less than 5”). The number 40 is written as XL (10 less than 50), 41 as XLI, 42 as XLII, 43 as XLIII, and then 44 as XLIV (10 less than 50, then 1 less than 5). @@ -5301,8 +5141,8 @@ ended with the street name. Most of the time, I got away with it, but if the st- ![]()
The essence of the remodule is thesearchfunction, that takes a regular expression (pattern) and a string ('M') to try to match against the regular expression. If a match is found,searchreturns an object which has various methods to describe the match; if no match is found,searchreturnsNone, the Python null value. All you care about at the moment is whether the pattern matches, which you can tell by just looking at the return - value ofsearch.'M'matches this regular expression, because the first optionalMmatches and the second and third optionalMcharacters are ignored. +The essence of the remodule is thesearchfunction, that takes a regular expression (pattern) and a string ('M') to try to match against the regular expression. If a match is found,searchreturns an object which has various methods to describe the match; if no match is found,searchreturnsNone, the Python null value. All you care about at the moment is whether the pattern matches, which you can tell by just looking at the return + value ofsearch.'M'matches this regular expression, because the first optionalMmatches and the second and third optionalMcharacters are ignored.@@ -5320,7 +5160,7 @@ ended with the street name. Most of the time, I got away with it, but if the st - ![]()
'MMMM'does not match. All threeMcharacters match, but then the regular expression insists on the string ending (because of the$character), and the string doesn't end yet (because of the fourthM). SosearchreturnsNone. +'MMMM'does not match. All threeMcharacters match, but then the regular expression insists on the string ending (because of the$character), and the string doesn't end yet (because of the fourthM). SosearchreturnsNone.@@ -5649,7 +5489,7 @@ it a verbose regular expression. This example shows how. ![]()
The most important thing to remember when using verbose regular expressions is that you need to pass an extra argument when - working with them: @@ -5669,7 +5509,7 @@ it a verbose regular expression. This example shows how.re.VERBOSEis a constant defined in theremodule that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has + working with them:re.VERBOSEis a constant defined in theremodule that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has quite a bit of whitespace (all of which is ignored), and several comments (all of which are ignored). Once you ignore the whitespace and the comments, this is exactly the same regular expression as you saw in the previous section, but it's a lot more readable.@@ -5713,7 +5553,7 @@ examples of regular expressions that purported to do this, but none of them were - ![]()
This does not match. Why? Because it doesn't have the re.VERBOSEflag, so there.searchfunction is treating the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can't auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose. +This does not match. Why? Because it doesn't have the re.VERBOSEflag, so there.searchfunction is treating the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can't auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose.@@ -5747,7 +5587,7 @@ examples of regular expressions that purported to do this, but none of them were - ![]()
To get access to the groups that the regular expression parser remembered along the way, use the groups()method on the object that thesearchfunction returns. It will return a tuple of however many groups were defined in the regular expression. In this case, you +To get access to the groups that the regular expression parser remembered along the way, use the groups()method on the object that thesearchfunction returns. It will return a tuple of however many groups were defined in the regular expression. In this case, you defined three groups, one with three digits, one with three digits, and one with four digits.- ![]()
The groups()method now returns a tuple of four elements, since the regular expression now defines four groups to remember. +The groups()method now returns a tuple of four elements, since the regular expression now defines four groups to remember.@@ -5848,7 +5688,7 @@ examples of regular expressions that purported to do this, but none of them were - ![]()
Finally, you've solved the other long-standing problem: extensions are optional again. If no extension is found, the groups()method still returns a tuple of four elements, but the fourth element is just an empty string. +Finally, you've solved the other long-standing problem: extensions are optional again. If no extension is found, the groups()method still returns a tuple of four elements, but the fourth element is just an empty string.@@ -5983,7 +5823,7 @@ you made.
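The behavior of `groups()` with an optional trailing group can be sketched as follows. The pattern below is an assumption written for this illustration (three digits, three digits, four digits, then an optional run of digits for the extension) and is not necessarily the exact expression the chapter arrives at; the phone numbers are made up.

```python
import re

# Hypothetical pattern: area code, trunk, rest of the number, optional extension.
# \D* skips any non-digit separators; (\d*) may match zero digits.
phone_pattern = re.compile(r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$")

with_ext = phone_pattern.search("800-555-1212 ext. 1234").groups()
without_ext = phone_pattern.search("800-555-1212").groups()
```

In both cases the tuple has four elements because the pattern defines four groups; when no extension is present, the last group matches zero digits and shows up as an empty string.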
- Regular Expression HOWTO teaches about regular expressions and how to use them in Python. -
- Python Library Reference summarizes the
remodule. +- Python Library Reference summarizes the
remodule.7.7. Summary
@@ -6012,7 +5852,7 @@ you made.(a|b|c)matches eitheraorborc. -(x)in general is a remembered group. You can get the value of what matched by using thegroups()method of the object returned byre.search. +(x)in general is a remembered group. You can get the value of what matched by using thegroups()method of the object returned byre.search.Regular expressions are extremely powerful, but they are not the correct solution for every problem. You should learn enough @@ -6036,9 +5876,9 @@ they solve.
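As a short replay of the chapter's two case studies, the sketch below shows how the end-of-string anchor `$` changes what matches. It targets Python 3; the addresses are illustrative values, and the thousands pattern `^M?M?M?$` is an assumed reconstruction from the description of three optional `M` characters followed by the end of the string.

```python
import re

# Illustrative addresses, not taken from real data.
plain = "100 NORTH MAIN ROAD"
tricky = "100 NORTH BROAD ROAD"
with_apt = "100 BROAD ROAD APT. 3"

replaced_plain = plain.replace("ROAD", "RD.")    # fine for the simple case
replaced_tricky = tricky.replace("ROAD", "RD.")  # also mangles BROAD
anchored = re.sub("ROAD$", "RD.", tricky)        # only the final ROAD changes
unchanged = re.sub("ROAD$", "RD.", with_apt)     # ROAD is not at the end

# Up to three optional M characters, then the string must end.
thousands = "^M?M?M?$"
one_m = re.search(thousands, "M")
four_m = re.search(thousands, "MMMM")
```

The last `re.sub` call returns the original string untouched, which is the failure mode that motivates matching on whole words rather than on the end of the string.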
Chapter 8. HTML Processing
8.1. Diving in
I often see questions on comp.lang.python like “How can I list all the [headers|images|links] in my HTML document?” “How do I parse/translate/munge the text of my HTML document but leave the tags alone?” “How can I add/remove/quote attributes of all my HTML tags at once?” This chapter will answer all of these questions. -
Here is a complete, working Python program in two parts. The first part,
BaseHTMLProcessor.py, is a generic tool to help you process HTML files by walking through the tags and text blocks. The second part,dialect.py, is an example of how to useBaseHTMLProcessor.pyto translate the text of an HTML document but leave the tags alone. Read thedocstrings and comments to get an overview of what's going on. Most of it will seem like black magic, because it's not obvious how +Here is a complete, working Python program in two parts. The first part,
BaseHTMLProcessor.py, is a generic tool to help you process HTML files by walking through the tags and text blocks. The second part,dialect.py, is an example of how to useBaseHTMLProcessor.pyto translate the text of an HTML document but leave the tags alone. Read thedocstrings and comments to get an overview of what's going on. Most of it will seem like black magic, because it's not obvious how any of these class methods ever get called. Don't worry, all will be revealed in due time. -Example 8.1.
+BaseHTMLProcessor.pyExample 8.1.
BaseHTMLProcessor.pyIf you have not already done so, you can download this and other examples used in this book.
from sgmllib import SGMLParser import htmlentitydefs @@ -6110,7 +5950,7 @@ class BaseHTMLProcessor(SGMLParser): def output(self): """Return processed HTML as a single string""" - return "".join(self.pieces)Example 8.2.
dialect.py+ return "".join(self.pieces)Example 8.2.
dialect.pyimport re from BaseHTMLProcessor import BaseHTMLProcessor @@ -6263,7 +6103,7 @@ def test(url): webbrowser.open_new(outfile) if __name__ == "__main__": - test("http://diveintopython3.org/odbchelper_list.html")Example 8.3. Output of
+ test("http://diveintopython3.org/odbchelper_list.html")dialect.pyExample 8.3. Output of
dialect.pyRunning this script will translate Section 3.2, “Introducing Lists” into mock Swedish Chef-speak (from The Muppets), mock Elmer Fudd-speak (from Bugs Bunny cartoons), and mock Middle English (loosely based on Chaucer's The Canterbury Tales). If you look at the HTML source of the output pages, you'll see that all the HTML tags and attributes are untouched, but the text between the tags has been “translated” into the mock language. If you look closer, you'll see that, in fact, only the titles and paragraphs were translated; the code listings and screen examples were left untouched.
<div class="abstract"> @@ -6273,38 +6113,38 @@ If youw onwy expewience wif wists is awways in in <span class="application">Powewbuiwdew</span>, bwace youwsewf fow <span class="application">Pydon</span> wists.</p> </div> -8.2. Introducing
-sgmllib.pyHTML processing is broken into three steps: breaking down the HTML into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into HTML again. The first step is done by
sgmllib.py, a part of the standard Python library. +8.2. Introducing
+sgmllib.pyHTML processing is broken into three steps: breaking down the HTML into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into HTML again. The first step is done by
sgmllib.py, a part of the standard Python library.The key to understanding this chapter is to realize that HTML is not just text, it is structured text. The structure is derived from the more-or-less-hierarchical sequence of start tags -and end tags. Usually you don't work with HTML this way; you work with it textually in a text editor, or visually in a web browser or web authoring tool.
sgmllib.pypresents HTML structurally. -
sgmllib.pycontains one important class:SGMLParser.SGMLParserparses HTML into useful pieces, like start tags and end tags. As soon as it succeeds in breaking down some data into a useful piece, -it calls a method on itself based on what it found. In order to use the parser, you subclass theSGMLParserclass and override these methods. This is what I meant when I said that it presents HTML structurally: the structure of the HTML determines the sequence of method calls and the arguments passed to each method. -
SGMLParserparses HTML into 8 kinds of data, and calls a separate method for each of them: +and end tags. Usually you don't work with HTML this way; you work with it textually in a text editor, or visually in a web browser or web authoring tool.sgmllib.pypresents HTML structurally. +
sgmllib.pycontains one important class:SGMLParser.SGMLParserparses HTML into useful pieces, like start tags and end tags. As soon as it succeeds in breaking down some data into a useful piece, +it calls a method on itself based on what it found. In order to use the parser, you subclass theSGMLParserclass and override these methods. This is what I meant when I said that it presents HTML structurally: the structure of the HTML determines the sequence of method calls and the arguments passed to each method. +
SGMLParserparses HTML into 8 kinds of data, and calls a separate method for each of them:
- Start tag
-- An HTML tag that starts a block, like
<html>,<head>,<body>, or<pre>, or a standalone tag like<br>or<img>. When it finds a start tagtagname,SGMLParserwill look for a method calledstart_ortagnamedo_. For instance, when it finds atagname<pre>tag, it will look for astart_preordo_premethod. If found,SGMLParsercalls this method with a list of the tag's attributes; otherwise, it callsunknown_starttagwith the tag name and list of attributes. +- An HTML tag that starts a block, like
<html>,<head>,<body>, or<pre>, or a standalone tag like<br>or<img>. When it finds a start tagtagname,SGMLParserwill look for a method calledstart_ortagnamedo_. For instance, when it finds atagname<pre>tag, it will look for astart_preordo_premethod. If found,SGMLParsercalls this method with a list of the tag's attributes; otherwise, it callsunknown_starttagwith the tag name and list of attributes.- End tag
-- An HTML tag that ends a block, like
</html>,</head>,</body>, or</pre>. When it finds an end tag,SGMLParserwill look for a method calledend_. If found,tagnameSGMLParsercalls this method, otherwise it callsunknown_endtagwith the tag name. +- An HTML tag that ends a block, like
</html>,</head>,</body>, or</pre>. When it finds an end tag,SGMLParserwill look for a method calledend_. If found,tagnameSGMLParsercalls this method, otherwise it callsunknown_endtagwith the tag name.- Character reference
-- An escaped character referenced by its decimal or hexadecimal equivalent, like
 . When found,SGMLParsercallshandle_charrefwith the text of the decimal or hexadecimal character equivalent. +- An escaped character referenced by its decimal or hexadecimal equivalent, like
 . When found,SGMLParsercallshandle_charrefwith the text of the decimal or hexadecimal character equivalent.- Entity reference
-- An HTML entity, like
©. When found,SGMLParsercallshandle_entityrefwith the name of the HTML entity. +- An HTML entity, like
©. When found,SGMLParsercallshandle_entityrefwith the name of the HTML entity.- Comment
-- An HTML comment, enclosed in
<!-- ... -->. When found,SGMLParsercallshandle_commentwith the body of the comment. +- An HTML comment, enclosed in
<!-- ... -->. When found,SGMLParsercallshandle_commentwith the body of the comment.- Processing instruction
-- An HTML processing instruction, enclosed in
<? ... >. When found,SGMLParsercallshandle_piwith the body of the processing instruction. +- An HTML processing instruction, enclosed in
<? ... >. When found,SGMLParsercallshandle_piwith the body of the processing instruction.- Declaration
-- An HTML declaration, such as a
DOCTYPE, enclosed in<! ... >. When found,SGMLParsercallshandle_declwith the body of the declaration. +- An HTML declaration, such as a
DOCTYPE, enclosed in<! ... >. When found,SGMLParsercallshandle_declwith the body of the declaration.- Text data
-- A block of text. Anything that doesn't fit into the other 7 categories. When found,
SGMLParsercallshandle_datawith the text. +- A block of text. Anything that doesn't fit into the other 7 categories. When found,
SGMLParsercallshandle_datawith the text.@@ -6312,12 +6152,12 @@ it calls a method on itself based on what it found. In order to use the parser,
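The eight kinds of data listed above can be observed with a small event logger. Because `sgmllib` is not available in Python 3, this sketch substitutes the standard library's `html.parser.HTMLParser`, which exposes similarly named handler methods but dispatches all tags through `handle_starttag` and `handle_endtag` rather than per-tag `start_`/`do_`/`end_` methods. The class name `EventLogger` and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record each piece of markup the parser reports (illustrative name)."""

    def __init__(self):
        # convert_charrefs=False keeps character and entity references
        # as separate events instead of folding them into the text data.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

    def handle_charref(self, name):
        self.events.append(("character reference", name))

    def handle_entityref(self, name):
        self.events.append(("entity reference", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_pi(self, data):
        self.events.append(("processing instruction", data))

    def handle_decl(self, decl):
        self.events.append(("declaration", decl))

    def handle_data(self, data):
        self.events.append(("text data", data))


logger = EventLogger()
logger.feed('<!DOCTYPE html><html><!-- note --><p class="x">'
            'caf&eacute; &#160;</p><?demo instruction></html>')
logger.close()
kinds = {event[0] for event in logger.events}
```

Each handler simply appends what it received, so inspecting `logger.events` shows the order in which the structure of the document drives the method calls.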
-- Python 2.0 had a bug where SGMLParserwould not recognize declarations at all (handle_declwould never be called), which meant thatDOCTYPEs were silently ignored. This is fixed in Python 2.1. +Python 2.0 had a bug where SGMLParserwould not recognize declarations at all (handle_declwould never be called), which meant thatDOCTYPEs were silently ignored. This is fixed in Python 2.1.
sgmllib.pycomes with a test suite to illustrate this. You can runsgmllib.py, passing the name of an HTML file on the command line, and it will print out the tags and other elements as it parses them. It does this by subclassing -theSGMLParserclass and definingunknown_starttag,unknown_endtag,handle_dataand other methods which simply print their arguments.+
-
sgmllib.pycomes with a test suite to illustrate this. You can runsgmllib.py, passing the name of an HTML file on the command line, and it will print out the tags and other elements as it parses them. It does this by subclassing +theSGMLParserclass and definingunknown_starttag,unknown_endtag,handle_dataand other methods which simply print their arguments.-
@@ -6326,7 +6166,7 @@ the SGMLParserclass and definingExample 8.4. Sample test of
+sgmllib.pyExample 8.4. Sample test of
sgmllib.pyHere is a snippet from the table of contents of the HTML version of this book. Of course your paths may vary. (If you haven't downloaded the HTML version of the book, you can do so at http://diveintopython3.org/.)
c:\python23\lib> type "c:\downloads\diveintopython3\html\toc\index.html"@@ -6340,7 +6180,7 @@ theSGMLParserclass and defining... rest of file omitted for brevity ... -Running this through the test suite of
sgmllib.pyyields this output:+Running this through the test suite of
sgmllib.pyyields this output:c:\python23\lib> python sgmllib.py "c:\downloads\diveintopython3\html\toc\index.html" data: '\n\n' start tag: <html > @@ -6360,22 +6200,22 @@ data: '\n 'Here's the roadmap for the rest of the chapter:
-
-- Subclass
SGMLParserto create classes that extract interesting data out of HTML documents. +- Subclass
SGMLParserto create classes that extract interesting data out of HTML documents. -- Subclass
SGMLParserto createBaseHTMLProcessor, which overrides all 8 handler methods and uses them to reconstruct the original HTML from the pieces. +- Subclass
SGMLParserto createBaseHTMLProcessor, which overrides all 8 handler methods and uses them to reconstruct the original HTML from the pieces. -- Subclass
BaseHTMLProcessorto createDialectizer, which adds some methods to process specific HTML tags specially, and overrides thehandle_datamethod to provide a framework for processing the text blocks between the HTML tags. +- Subclass
BaseHTMLProcessorto createDialectizer, which adds some methods to process specific HTML tags specially, and overrides thehandle_datamethod to provide a framework for processing the text blocks between the HTML tags. -- Subclass
Dialectizerto create classes that define text processing rules used byDialectizer.handle_data. +- Subclass
Dialectizerto create classes that define text processing rules used byDialectizer.handle_data. -- Write a test suite that grabs a real web page from
http://diveintopython3.org/and processes it. +- Write a test suite that grabs a real web page from
http://diveintopython3.org/and processes it.Along the way, you'll also learn about
locals,globals, and dictionary-based string formatting. +Along the way, you'll also learn about
locals,globals, and dictionary-based string formatting.8.3. Extracting data from HTML documents
-To extract data from HTML documents, subclass the
SGMLParserclass and define methods for each tag or entity you want to capture. +To extract data from HTML documents, subclass the
SGMLParserclass and define methods for each tag or entity you want to capture.The first step to extracting data from an HTML document is getting some HTML. If you have some HTML lying around on your hard drive, you can use file functions to read it, but the real fun begins when you get HTML from live web pages. -
Example 8.5. Introducing
urllib+Example 8.5. Introducing
urllib>>> import urllib>>> sock = urllib.urlopen("http://diveintopython3.org/")
>>> htmlSource = sock.read()
@@ -6400,35 +6240,35 @@ data: '\n '
- ![]()
The urllibmodule is part of the standard Python library. It contains functions for getting information about and actually retrieving data from Internet-based URLs (mainly web pages). +The urllibmodule is part of the standard Python library. It contains functions for getting information about and actually retrieving data from Internet-based URLs (mainly web pages).- ![]()
The simplest use of urllibis to retrieve the entire text of a web page using theurlopenfunction. Opening a URL is similar to opening a file. The return value ofurlopenis a file-like object, which has some of the same methods as a file object. +The simplest use of urllibis to retrieve the entire text of a web page using theurlopenfunction. Opening a URL is similar to opening a file. The return value ofurlopenis a file-like object, which has some of the same methods as a file object.- ![]()
The simplest thing to do with the file-like object returned by urlopenisread, which reads the entire HTML of the web page into a single string. The object also supportsreadlines, which reads the text line by line into a list. +The simplest thing to do with the file-like object returned by urlopenisread, which reads the entire HTML of the web page into a single string. The object also supportsreadlines, which reads the text line by line into a list.- ![]()
When you're done with the object, make sure to closeit, just like a normal file object. +When you're done with the object, make sure to closeit, just like a normal file object.- ![]()
You now have the complete HTML of the home page of http://diveintopython3.org/in a string, and you're ready to parse it. +You now have the complete HTML of the home page of http://diveintopython3.org/in a string, and you're ready to parse it.Example 8.6. Introducing
+urllister.pyExample 8.6. Introducing
urllister.pyIf you have not already done so, you can download this and other examples used in this book.
from sgmllib import SGMLParser @@ -6445,30 +6285,30 @@ class URLLister(SGMLParser):- ![]()
resetis called by the__init__method ofSGMLParser, and it can also be called manually once an instance of the parser has been created. So if you need to do any initialization, - do it inreset, not in__init__, so that it will be re-initialized properly when someone re-uses a parser instance. +resetis called by the__init__method ofSGMLParser, and it can also be called manually once an instance of the parser has been created. So if you need to do any initialization, + do it inreset, not in__init__, so that it will be re-initialized properly when someone re-uses a parser instance.- ![]()
start_ais called bySGMLParserwhenever it finds an<a>tag. The tag may contain anhrefattribute, and/or other attributes, likenameortitle. Theattrsparameter is a list of tuples,[(attribute, value), (attribute, value), ...]. Or it may be just an<a>, a valid (if useless) HTML tag, in which caseattrswould be an empty list. +start_ais called bySGMLParserwhenever it finds an<a>tag. The tag may contain anhrefattribute, and/or other attributes, likenameortitle. The attrs parameter is a list of tuples,[(attribute, value), (attribute, value), ...]. Or it may be just an<a>, a valid (if useless) HTML tag, in which case attrs would be an empty list.- ![]()
You can find out whether this <a>tag has anhrefattribute with a simple multi-variable list comprehension. +You can find out whether this <a>tag has anhrefattribute with a simple multi-variable list comprehension.- ![]()
String comparisons like k=='href'are always case-sensitive, but that's safe in this case, becauseSGMLParserconverts attribute names to lowercase while buildingattrs. +String comparisons like k=='href'are always case-sensitive, but that's safe in this case, becauseSGMLParserconverts attribute names to lowercase while building attrs.Example 8.7. Using
urllister.py+Example 8.7. Using
urllister.py>>> import urllib, urllister >>> usock = urllib.urlopen("http://diveintopython3.org/") >>> parser = urllister.URLLister() @@ -6495,34 +6335,34 @@ download/diveintopython3-common-5.0.zip- ![]()
Call the feedmethod, defined inSGMLParser, to get HTML into the parser.[1] It takes a string, which is whatusock.read()returns. +Call the feedmethod, defined inSGMLParser, to get HTML into the parser.[1] It takes a string, which is whatusock.read()returns.- ![]()
Like files, you should closeyour URL objects as soon as you're done with them. +Like files, you should closeyour URL objects as soon as you're done with them.- ![]()
You should closeyour parser object, too, but for a different reason. You've read all the data and fed it to the parser, but thefeedmethod isn't guaranteed to have actually processed all the HTML you give it; it may buffer it, waiting for more. Be sure to callcloseto flush the buffer and force everything to be fully parsed. +You should closeyour parser object, too, but for a different reason. You've read all the data and fed it to the parser, but thefeedmethod isn't guaranteed to have actually processed all the HTML you give it; it may buffer it, waiting for more. Be sure to callcloseto flush the buffer and force everything to be fully parsed.- - ![]()
Once the parser is closed, the parsing is complete, andparser.urlscontains a list of all the linked URLs in the HTML document. (Your output may look different, if the download links have been updated by the time you read this.) +Once the parser is closed, the parsing is complete, and parser.urls contains a list of all the linked URLs in the HTML document. (Your output may look different, if the download links have been updated by the time you read this.)8.4. Introducing
-BaseHTMLProcessor.py
SGMLParserdoesn't produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it - finds, but the methods don't do anything.SGMLParseris an HTML consumer: it takes HTML and breaks it down into small, structured pieces. As you saw in the previous section, you can subclassSGMLParserto define classes that catch specific tags and produce useful things, like a list of all the links on a web page. Now you'll - take this one step further by defining a class that catches everythingSGMLParserthrows at it and reconstructs the complete HTML document. In technical terms, this class will be an HTML producer. -
BaseHTMLProcessorsubclassesSGMLParserand provides all 8 essential handler methods:unknown_starttag,unknown_endtag,handle_charref,handle_entityref,handle_comment,handle_pi,handle_decl, andhandle_data. -Example 8.8. Introducing
BaseHTMLProcessor+8.4. Introducing
+BaseHTMLProcessor.py
SGMLParserdoesn't produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it + finds, but the methods don't do anything.SGMLParseris an HTML consumer: it takes HTML and breaks it down into small, structured pieces. As you saw in the previous section, you can subclassSGMLParserto define classes that catch specific tags and produce useful things, like a list of all the links on a web page. Now you'll + take this one step further by defining a class that catches everythingSGMLParserthrows at it and reconstructs the complete HTML document. In technical terms, this class will be an HTML producer. +
BaseHTMLProcessorsubclassesSGMLParserand provides all 8 essential handler methods:unknown_starttag,unknown_endtag,handle_charref,handle_entityref,handle_comment,handle_pi,handle_decl, andhandle_data. +Example 8.8. Introducing
BaseHTMLProcessorclass BaseHTMLProcessor(SGMLParser): def reset(self):self.pieces = [] @@ -6558,13 +6398,13 @@ class BaseHTMLProcessor(SGMLParser):
- ![]()
reset, called bySGMLParser.__init__, initializesself.piecesas an empty list before calling the ancestor method.self.piecesis a data attribute which will hold the pieces of the HTML document you're constructing. Each handler method will reconstruct the HTML thatSGMLParserparsed, and each method will append that string toself.pieces. Note thatself.piecesis a list. You might be tempted to define it as a string and just keep appending each piece to it. That would work, but +reset, called bySGMLParser.__init__, initializes self.pieces as an empty list before calling the ancestor method. self.pieces is a data attribute which will hold the pieces of the HTML document you're constructing. Each handler method will reconstruct the HTML thatSGMLParserparsed, and each method will append that string to self.pieces. Note that self.pieces is a list. You might be tempted to define it as a string and just keep appending each piece to it. That would work, but Python is much more efficient at dealing with lists.[2]- ![]()
Since BaseHTMLProcessordoes not define any methods for specific tags (like thestart_amethod inURLLister),SGMLParserwill callunknown_starttagfor every start tag. This method takes the tag (tag) and the list of attribute name/value pairs (attrs), reconstructs the original HTML, and appends it toself.pieces. The string formatting here is a little strange; you'll untangle that (and also the odd-lookinglocalsfunction) later in this chapter. +Since BaseHTMLProcessordoes not define any methods for specific tags (like thestart_amethod inURLLister),SGMLParserwill callunknown_starttagfor every start tag. This method takes the tag (tag) and the list of attribute name/value pairs (attrs), reconstructs the original HTML, and appends it to self.pieces. The string formatting here is a little strange; you'll untangle that (and also the odd-lookinglocalsfunction) later in this chapter.@@ -6576,21 +6416,21 @@ Python is much more efficient at dealing with lists.[ - ![]()
When SGMLParserfinds a character reference, it callshandle_charrefwith the bare reference. If the HTML document contains the reference ,refwill be160. Reconstructing the original complete character reference just involves wrappingrefin&#...;characters. +When SGMLParserfinds a character reference, it callshandle_charrefwith the bare reference. If the HTML document contains the reference , ref will be160. Reconstructing the original complete character reference just involves wrapping ref in&#...;characters.![]()
Entity references are similar to character references, but without the hash mark. Reconstructing the original entity reference - requires wrapping refin&...;characters. (Actually, as an erudite reader pointed out to me, it's slightly more complicated than this. Only certain standard -HTML entities end in a semicolon; other similar-looking entities do not. Luckily for us, the set of standard HTML entities is defined in a dictionary in a Python module calledhtmlentitydefs. Hence the extraifstatement.) + requires wrapping ref in&...;characters. (Actually, as an erudite reader pointed out to me, it's slightly more complicated than this. Only certain standard +HTML entities end in a semicolon; other similar-looking entities do not. Luckily for us, the set of standard HTML entities is defined in a dictionary in a Python module calledhtmlentitydefs. Hence the extraifstatement.)
Blocks of text are simply appended to self.piecesunaltered. +Blocks of text are simply appended to self.pieces unaltered. @@ -6611,12 +6451,12 @@ Python is much more efficient at dealing with lists.[ ![]()
- -The HTML specification requires that all non-HTML (like client-side JavaScript) must be enclosed in HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). BaseHTMLProcessoris not forgiving; if script is improperly embedded, it will be parsed as if it were HTML. For instance, if the script contains less-than and equals signs,SGMLParsermay incorrectly think that it has found tags and attributes.SGMLParseralways converts tags and attribute names to lowercase, which may break the script, andBaseHTMLProcessoralways encloses attribute values in double quotes (even if the original HTML document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script +The HTML specification requires that all non-HTML (like client-side JavaScript) must be enclosed in HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). BaseHTMLProcessoris not forgiving; if script is improperly embedded, it will be parsed as if it were HTML. For instance, if the script contains less-than and equals signs,SGMLParsermay incorrectly think that it has found tags and attributes.SGMLParseralways converts tags and attribute names to lowercase, which may break the script, andBaseHTMLProcessoralways encloses attribute values in double quotes (even if the original HTML document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script within HTML comments.Example 8.9.
BaseHTMLProcessoroutput+Example 8.9.
BaseHTMLProcessoroutputdef output(self):"""Return processed HTML as a single string""" return "".join(self.pieces)
@@ -6624,13 +6464,13 @@ Python is much more efficient at dealing with lists.[- ![]()
This is the one method in BaseHTMLProcessorthat is never called by the ancestorSGMLParser. Since the other handler methods store their reconstructed HTML inself.pieces, this function is needed to join all those pieces into one string. As noted before, Python is great at lists and mediocre at strings, so you only create the complete string when somebody explicitly asks for it. +This is the one method in BaseHTMLProcessorthat is never called by the ancestorSGMLParser. Since the other handler methods store their reconstructed HTML in self.pieces, this function is needed to join all those pieces into one string. As noted before, Python is great at lists and mediocre at strings, so you only create the complete string when somebody explicitly asks for it.- ![]()
If you prefer, you could use the +joinmethod of thestringmodule instead:string.join(self.pieces, "")If you prefer, you could use the joinmethod of thestringmodule instead:string.join(self.pieces, "")@@ -6638,17 +6478,17 @@ Python is much more efficient at dealing with lists.[- W3C discusses character and entity references. -
- Python Library Reference confirms your suspicions that the
htmlentitydefsmodule is exactly what it sounds like. +- Python Library Reference confirms your suspicions that the
htmlentitydefsmodule is exactly what it sounds like. -8.5.
-localsandglobalsLet's digress from HTML processing for a minute and talk about how Python handles variables. Python has two built-in functions,
localsandglobals, which provide dictionary-based access to local and global variables. -Remember
locals? You first saw it here: +8.5.
+localsandglobalsLet's digress from HTML processing for a minute and talk about how Python handles variables. Python has two built-in functions,
localsandglobals, which provide dictionary-based access to local and global variables. +Remember
locals? You first saw it here:def unknown_starttag(self, tag, attrs): strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs]) self.pieces.append("<%(tag)s%(strattrs)s>" % locals()) -No, wait, you can't learn about
localsyet. First, you need to learn about namespaces. This is dry stuff, but it's important, so pay attention. +No, wait, you can't learn about
localsyet. First, you need to learn about namespaces. This is dry stuff, but it's important, so pay attention.Python uses what are called namespaces to keep track of variables. A namespace is just like a dictionary where the keys are names of variables and the dictionary values are the values of those variables. In fact, you can access a namespace as a Python dictionary, as you'll see in a minute.
At any particular point in a Python program, there are several namespaces available. Each function has its own namespace, called the local namespace, which @@ -6656,17 +6496,17 @@ keeps track of the function's variables, including function arguments and locall own namespace, called the global namespace, which keeps track of the module's variables, including functions, classes, any other imported modules, and module-level variables and constants. And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions. -
When a line of code asks for the value of a variable
x, Python will search for that variable in all the available namespaces, in order: +When a line of code asks for the value of a variable x, Python will search for that variable in all the available namespaces, in order:
-
-- local namespace - specific to the current function or class method. If the function defines a local variable
x, or has an argumentx, Python will use this and stop searching. +- local namespace - specific to the current function or class method. If the function defines a local variable x, or has an argument x, Python will use this and stop searching. -
- global namespace - specific to the current module. If the module has defined a variable, function, or class called
x, Python will use that and stop searching. +- global namespace - specific to the current module. If the module has defined a variable, function, or class called x, Python will use that and stop searching. -
- built-in namespace - global to all modules. As a last resort, Python will assume that
xis the name of a built-in function or variable. +- built-in namespace - global to all modules. As a last resort, Python will assume that x is the name of a built-in function or variable.
If Python doesn't find
xin any of these namespaces, it gives up and raises aNameErrorwith the messageThere is no variable named 'x', which you saw back in Example 3.18, “Referencing an Unbound Variable”, but you didn't appreciate how much work Python was doing before giving you that error.+
-If Python doesn't find x in any of these namespaces, it gives up and raises a
NameErrorwith the message There is no variable named 'x', which you saw back in Example 3.18, “Referencing an Unbound Variable”, but you didn't appreciate how much work Python was doing before giving you that error.-
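The search order described above can be sketched in a few lines. This is only an illustration, not code from the book: the function names and the name `no_such_name` are made up for this example, and the exact wording of the error message varies between Python versions.

```python
x = "global x"  # lives in the module's global namespace


def show_local():
    x = "local x"  # a local variable hides the global of the same name
    return x


def show_global():
    # no local x here, so Python finds the module-level one
    return x


def show_builtin():
    # len is neither local nor global; it is found in the built-in namespace
    return len


def show_missing():
    # the (hypothetical) name is in none of the three namespaces
    try:
        return no_such_name
    except NameError as err:
        return str(err)


print(show_local())
print(show_global())
print(show_missing())
```

Each function stops at the first namespace that knows the name, which is why `show_local` never sees the module-level `x`.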
@@ -6675,8 +6515,8 @@ module, which holds built-in functions and exceptions. from __future__ import nested_scopes Are you confused yet? Don't despair! This is really cool, I promise. Like many things in Python, namespaces are directly accessible at run-time. How? Well, the local namespace is accessible via the built-in
localsfunction, and the global (module level) namespace is accessible via the built-inglobalsfunction. -Example 8.10. Introducing
locals>>> def foo(arg):+
Are you confused yet? Don't despair! This is really cool, I promise. Like many things in Python, namespaces are directly accessible at run-time. How? Well, the local namespace is accessible via the built-in
localsfunction, and the global (module level) namespace is accessible via the built-inglobalsfunction. +Example 8.10. Introducing
locals>>> def foo(arg):... x = 1 ... print locals() ... @@ -6688,30 +6528,30 @@ from __future__ import nested_scopes
- ![]()
The function foohas two variables in its local namespace:arg, whose value is passed in to the function, andx, which is defined within the function. +The function foohas two variables in its local namespace: arg, whose value is passed in to the function, and x, which is defined within the function.- ![]()
localsreturns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values - of the dictionary are the actual values of the variables. So callingfoowith7prints the dictionary containing the function's two local variables:arg(7) andx(1). +localsreturns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values + of the dictionary are the actual values of the variables. So callingfoowith7prints the dictionary containing the function's two local variables: arg (7) and x (1).- ![]()
Remember, Python has dynamic typing, so you could just as easily pass a string in for arg; the function (and the call tolocals) would still work just as well.localsworks with all variables of all datatypes. +Remember, Python has dynamic typing, so you could just as easily pass a string in for arg; the function (and the call to locals) would still work just as well.localsworks with all variables of all datatypes.What
localsdoes for the local (function) namespace,globalsdoes for the global (module) namespace.globalsis more exciting, though, because a module's namespace is more exciting.[3] Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes +What
localsdoes for the local (function) namespace,globalsdoes for the global (module) namespace.globalsis more exciting, though, because a module's namespace is more exciting.[3] Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes defined in the module. Plus, it includes anything that was imported into the module. -Remember the difference between
from module importandimport module? Withimport module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access -any of its functions or attributes:module.function. But withfrom module import, you're actually importing specific functions and attributes from another module into your own namespace, which is why you -access them directly without referencing the original module they came from. With theglobalsfunction, you can actually see this happen. -Example 8.11. Introducing
-globalsLook at the following block of code at the bottom of
BaseHTMLProcessor.py:+Remember the difference between
from module importandimport module? Withimport module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access +any of its functions or attributes:module.function. But withfrom module import, you're actually importing specific functions and attributes from another module into your own namespace, which is why you +access them directly without referencing the original module they came from. With theglobalsfunction, you can actually see this happen. +Example 8.11. Introducing
+globalsLook at the following block of code at the bottom of
BaseHTMLProcessor.py:if __name__ == "__main__": for k, v in globals().items():print k, "=", v
@@ -6719,7 +6559,7 @@ if __name__ == "__main__":@@ -6734,25 +6574,25 @@ __name__ = __main__ - ![]()
Just so you don't get intimidated, remember that you've seen all this before. The globalsfunction returns a dictionary, and you're iterating through the dictionary using theitemsmethod and multi-variable assignment. The only thing new here is theglobalsfunction. +Just so you don't get intimidated, remember that you've seen all this before. The globalsfunction returns a dictionary, and you're iterating through the dictionary using theitemsmethod and multi-variable assignment. The only thing new here is theglobalsfunction.-
SGMLParserwas imported fromsgmllib, usingfrom module import. That means that it was imported directly into the module's namespace, and here it is. +SGMLParserwas imported fromsgmllib, usingfrom module import. That means that it was imported directly into the module's namespace, and here it is.- ![]()
Contrast this with htmlentitydefs, which was imported usingimport. That means that thehtmlentitydefsmodule itself is in the namespace, but theentitydefsvariable defined withinhtmlentitydefsis not. +Contrast this with htmlentitydefs, which was imported usingimport. That means that thehtmlentitydefsmodule itself is in the namespace, but the entitydefs variable defined withinhtmlentitydefsis not.- ![]()
This module only defines one class, BaseHTMLProcessor, and here it is. Note that the value here is the class itself, not a specific instance of the class. +This module only defines one class, BaseHTMLProcessor, and here it is. Note that the value here is the class itself, not a specific instance of the class.@@ -6761,14 +6601,14 @@ __name__ = __main__ - ![]()
Remember the if __name__trick? When running a module (as opposed to importing it from another module), the built-in__name__attribute is a special value,__main__. Since you ran this module as a script from the command line,__name__is__main__, which is why the little test code to print theglobalsgot executed. +Remember the if __name__trick? When running a module (as opposed to importing it from another module), the built-in__name__attribute is a special value,__main__. Since you ran this module as a script from the command line,__name__is__main__, which is why the little test code to print theglobalsgot executed.![]()
- -Using the localsandglobalsfunctions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors - the functionality of thegetattrfunction, which allows you to access arbitrary functions dynamically by providing the function name as a string. +Using the localsandglobalsfunctions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors + the functionality of thegetattrfunction, which allows you to access arbitrary functions dynamically by providing the function name as a string.There is one other important difference between the
localsandglobalsfunctions, which you should learn now before it bites you. It will bite you anyway, but at least then you'll remember learning +There is one other important difference between the
localsandglobalsfunctions, which you should learn now before it bites you. It will bite you anyway, but at least then you'll remember learning it. -Example 8.12.
localsis read-only,globalsis not+Example 8.12.
localsis read-only,globalsis notdef foo(arg): x = 1 print locals()@@ -6785,14 +6625,14 @@ print "z=",z
![]()
- ![]()
Since foois called with3, this will print{'arg': 3, 'x': 1}. This should not be a surprise. +Since foois called with3, this will print{'arg': 3, 'x': 1}. This should not be a surprise.@@ -6805,7 +6645,7 @@ print "z=",z - ![]()
localsis a function that returns a dictionary, and here you are setting a value in that dictionary. You might think that this - would change the value of the local variablexto2, but it doesn't.localsdoes not actually return the local namespace, it returns a copy. So changing it does nothing to the value of the variables +localsis a function that returns a dictionary, and here you are setting a value in that dictionary. You might think that this + would change the value of the local variable x to2, but it doesn't.localsdoes not actually return the local namespace, it returns a copy. So changing it does nothing to the value of the variables in the local namespace.![]()
- ![]()
After being burned by locals, you might think that this wouldn't change the value ofz, but it does. Due to internal differences in how Python is implemented (which I'd rather not go into, since I don't fully understand them myself),globalsreturns the actual global namespace, not a copy: the exact opposite behavior oflocals. So any changes to the dictionary returned byglobalsdirectly affect your global variables. +After being burned by locals, you might think that this wouldn't change the value of z, but it does. Due to internal differences in how Python is implemented (which I'd rather not go into, since I don't fully understand them myself),globalsreturns the actual global namespace, not a copy: the exact opposite behavior oflocals. So any changes to the dictionary returned byglobalsdirectly affect your global variables.@@ -6816,7 +6656,7 @@ print "z=",z ![]()
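If you want to see the difference for yourself, here is a minimal sketch of the same experiment, reshaped so that each function returns the value it ends up with. The function names are invented for this illustration; the behavior shown assumes CPython, where writing to the dictionary returned by `locals` inside a function does not rebind the local variable.

```python
z = 7  # module-level (global) variable


def try_to_change_local():
    x = 1
    locals()["x"] = 2  # changes a dictionary, not the variable x
    return x


def change_global():
    globals()["z"] = 8  # writes straight into the module's namespace
    return z


print(try_to_change_local())
print(change_global())
```

The first function still sees its original `x`, while the second one really has rebound the module-level `z`.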
8.6. Dictionary-based string formatting
-Why did you learn about
localsandglobals? So you can learn about dictionary-based string formatting. As you recall, regular string formatting provides an easy way to insert values into strings. Values are listed in a tuple and inserted in order into the string in +Why did you learn about
localsandglobals? So you can learn about dictionary-based string formatting. As you recall, regular string formatting provides an easy way to insert values into strings. Values are listed in a tuple and inserted in order into the string in place of each formatting marker. While this is efficient, it is not always the easiest code to read, especially when multiple values are being inserted. You can't simply scan through the string in one pass and understand what the result will be; you're constantly switching between reading the string and reading the tuple of values. @@ -6833,14 +6673,14 @@ constantly switching between reading the string and reading the tuple of values.- ![]()
Instead of a tuple of explicit values, this form of string formatting uses a dictionary, params. And instead of a simple%smarker in the string, the marker contains a name in parentheses. This name is used as a key in theparamsdictionary and subsitutes the corresponding value,secret, in place of the%(pwd)smarker. +Instead of a tuple of explicit values, this form of string formatting uses a dictionary, params. And instead of a simple %smarker in the string, the marker contains a name in parentheses. This name is used as a key in the params dictionary and subsitutes the corresponding value,secret, in place of the%(pwd)smarker.![]()
Dictionary-based string formatting works with any number of named keys. Each key must exist in the given dictionary, or the - formatting will fail with a KeyError. + formatting will fail with aKeyError.@@ -6851,8 +6691,8 @@ constantly switching between reading the string and reading the tuple of values. @@ -6874,20 +6714,20 @@ meaningful keys and values already. LikeSo why would you use dictionary-based string formatting? Well, it does seem like overkill to set up a dictionary of keys and values simply to do string formatting in the next line; it's really most useful when you happen to have a dictionary of -meaningful keys and values already. Like
locals. -Example 8.14. Dictionary-based string formatting in
BaseHTMLProcessor.py+meaningful keys and values already. Likelocals. +Example 8.14. Dictionary-based string formatting in
BaseHTMLProcessor.pydef handle_comment(self, text): self.pieces.append("<!--%(text)s-->" % locals())
@@ -6860,8 +6700,8 @@ meaningful keys and values already. Like-
Using the built-in localsfunction is the most common use of dictionary-based string formatting. It means that you can use the names of local variables - within your string (in this case,text, which was passed to the class method as an argument) and each named variable will be replaced by its value. Iftextis'Begin page footer', the string formatting"<!--%(text)s-->" % locals()will resolve to the string'<!--Begin page footer-->'. +Using the built-in localsfunction is the most common use of dictionary-based string formatting. It means that you can use the names of local variables + within your string (in this case, text, which was passed to the class method as an argument) and each named variable will be replaced by its value. If text is'Begin page footer', the string formatting"<!--%(text)s-->" % locals()will resolve to the string'<!--Begin page footer-->'.-
When this method is called, attrsis a list of key/value tuples, just like theitemsof a dictionary, which means you can use multi-variable assignment to iterate through it. This should be a familiar pattern by now, but there's a lot going on here, so let's break it down: +When this method is called, attrs is a list of key/value tuples, just like the @@ -6895,7 +6735,7 @@ meaningful keys and values already. Likeitemsof a dictionary, which means you can use multi-variable assignment to iterate through it. This should be a familiar pattern by now, but there's a lot going on here, so let's break it down:-
- Suppose
attrsis[('href', 'index.html'), ('title', 'Go to home page')]. +- Suppose attrs is
[('href', 'index.html'), ('title', 'Go to home page')]. -- In the first round of the list comprehension,
keywill get'href', andvaluewill get'index.html'. +- In the first round of the list comprehension, key will get
'href', and value will get'index.html'.- The string formatting
' %s="%s"' % (key, value)will resolve to' href="index.html"'. This string becomes the first element of the list comprehension's return value. -- In the second round,
keywill get'title', andvaluewill get'Go to home page'. +- In the second round, key will get
'title', and value will get'Go to home page'.- The string formatting will resolve to
' title="Go to home page"'. -- The list comprehension returns a list of these two resolved strings, and
strattrswill join both elements of this list together to form' href="index.html" title="Go to home page"'. +- The list comprehension returns a list of these two resolved strings, and strattrs will join both elements of this list together to form
' href="index.html" title="Go to home page"'.-
Now, using dictionary-based string formatting, you insert the value of tagandstrattrsinto a string. So iftagis'a', the final result would be'<a href="index.html" title="Go to home page">', and that is what gets appended toself.pieces. +Now, using dictionary-based string formatting, you insert the value of tag and strattrs into a string. So if tag is @@ -6904,14 +6744,14 @@ meaningful keys and values already. Like'a', the final result would be'<a href="index.html" title="Go to home page">', and that is what gets appended to self.pieces.![]()
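The whole walk-through fits in a few lines if you want to try it yourself. This is a sketch only: build_start_tag is a hypothetical standalone function, whereas the chapter does the same work inside a method and appends the result to self.pieces.

```python
def build_start_tag(tag, attrs):
    # attrs is a list of (key, value) tuples; each one becomes ' key="value"'
    # and the pieces are joined into a single string.
    strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
    # locals() now contains 'tag' and 'strattrs', so both markers resolve.
    return "<%(tag)s%(strattrs)s>" % locals()


attrs = [("href", "index.html"), ("title", "Go to home page")]
start_tag = build_start_tag("a", attrs)
empty_tag = build_start_tag("br", [])
```

Note that a tag with no attributes falls out naturally: the list comprehension returns an empty list, the join produces an empty string, and you get a bare tag.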
- Using dictionary-based string formatting with localsis a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a - slight performance hit in making the call tolocals, sincelocalsbuilds a copy of the local namespace. +Using dictionary-based string formatting with localsis a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a + slight performance hit in making the call tolocals, sincelocalsbuilds a copy of the local namespace.8.7. Quoting attribute values
-A common question on comp.lang.python is “I have a bunch of HTML documents with unquoted attribute values, and I want to properly quote them all. How can I do this?”[4] (This is generally precipitated by a project manager who has found the HTML-is-a-standard religion joining a large project and proclaiming that all pages must validate against an HTML validator. Unquoted attribute values are a common violation of the HTML standard.) Whatever the reason, unquoted attribute values are easy to fix by feeding HTML through
BaseHTMLProcessor. -
BaseHTMLProcessorconsumes HTML (since it's descended fromSGMLParser) and produces equivalent HTML, but the HTML output is not identical to the input. Tags and attribute names will end up in lowercase, even if they started in uppercase +A common question on comp.lang.python is “I have a bunch of HTML documents with unquoted attribute values, and I want to properly quote them all. How can I do this?”[4] (This is generally precipitated by a project manager who has found the HTML-is-a-standard religion joining a large project and proclaiming that all pages must validate against an HTML validator. Unquoted attribute values are a common violation of the HTML standard.) Whatever the reason, unquoted attribute values are easy to fix by feeding HTML through
BaseHTMLProcessor. +
BaseHTMLProcessorconsumes HTML (since it's descended fromSGMLParser) and produces equivalent HTML, but the HTML output is not identical to the input. Tags and attribute names will end up in lowercase, even if they started in uppercase or mixed case, and attribute values will be enclosed in double quotes, even if they started in single quotes or with no quotes at all. It is this last side effect that you can take advantage of.Example 8.16. Quoting attribute values
@@ -6947,7 +6787,7 @@ at all. It is this last side effect that you can take advantage of.- ![]()
Note that the attribute values of the hrefattributes in the<a>tags are not properly quoted. (Also note that you're using triple quotes for something other than adocstring. And directly in the IDE, no less. They're very useful.) +Note that the attribute values of the hrefattributes in the<a>tags are not properly quoted. (Also note that you're using triple quotes for something other than adocstring. And directly in the IDE, no less. They're very useful.)@@ -6958,14 +6798,14 @@ at all. It is this last side effect that you can take advantage of. - - ![]()
Using the outputfunction defined inBaseHTMLProcessor, you get the output as a single string, complete with quoted attribute values. While this may seem anti-climactic, think - about how much has actually happened here:SGMLParserparsed the entire HTML document, breaking it down into tags, refs, data, and so forth;BaseHTMLProcessorused those elements to reconstruct pieces of HTML (which are still stored inparser.pieces, if you want to see them); finally, you calledparser.output, which joined all the pieces of HTML into one string. +Using the outputfunction defined inBaseHTMLProcessor, you get the output as a single string, complete with quoted attribute values. While this may seem anti-climactic, think + about how much has actually happened here:SGMLParserparsed the entire HTML document, breaking it down into tags, refs, data, and so forth;BaseHTMLProcessorused those elements to reconstruct pieces of HTML (which are still stored in parser.pieces, if you want to see them); finally, you calledparser.output, which joined all the pieces of HTML into one string.8.8. Introducing
-dialect.py
Dialectizeris a simple (and silly) descendant ofBaseHTMLProcessor. It runs blocks of text through a series of substitutions, but it makes sure that anything within ablock passes through unaltered. -<pre>...</pre>To handle the
<pre>blocks, you define two methods inDialectizer:start_preandend_pre. +8.8. Introducing
+dialect.py
Dialectizer is a simple (and silly) descendant of BaseHTMLProcessor. It runs blocks of text through a series of substitutions, but it makes sure that anything within a <pre>...</pre> block passes through unaltered. +To handle the
<pre> blocks, you define two methods in Dialectizer: start_pre and end_pre. Example 8.17. Handling specific tags
def start_pre(self, attrs):self.verbatim += 1
@@ -6978,25 +6818,25 @@ at all. It is this last side effect that you can take advantage of.
- ![]()
start_preis called every timeSGMLParserfinds a<pre>tag in the HTML source. (In a minute, you'll see exactly how this happens.) The method takes a single parameter,attrs, which contains the attributes of the tag (if any).attrsis a list of key/value tuples, just likeunknown_starttagtakes. +start_preis called every timeSGMLParserfinds a<pre>tag in the HTML source. (In a minute, you'll see exactly how this happens.) The method takes a single parameter, attrs, which contains the attributes of the tag (if any). attrs is a list of key/value tuples, just likeunknown_starttagtakes.- ![]()
In the resetmethod, you initialize a data attribute that serves as a counter for<pre>tags. Every time you hit a<pre>tag, you increment the counter; every time you hit a</pre>tag, you'll decrement the counter. (You could just use this as a flag and set it to1and reset it to0, but it's just as easy to do it this way, and this handles the odd (but possible) case of nested<pre>tags.) In a minute, you'll see how this counter is put to good use. +In the resetmethod, you initialize a data attribute that serves as a counter for<pre>tags. Every time you hit a<pre>tag, you increment the counter; every time you hit a</pre>tag, you'll decrement the counter. (You could just use this as a flag and set it to1and reset it to0, but it's just as easy to do it this way, and this handles the odd (but possible) case of nested<pre>tags.) In a minute, you'll see how this counter is put to good use.- ![]()
That's it, that's the only special processing you do for <pre>tags. Now you pass the list of attributes along tounknown_starttagso it can do the default processing. +That's it, that's the only special processing you do for <pre>tags. Now you pass the list of attributes along tounknown_starttagso it can do the default processing.- ![]()
end_preis called every timeSGMLParserfinds a</pre>tag. Since end tags can not contain attributes, the method takes no parameters. +end_preis called every timeSGMLParserfinds a</pre>tag. Since end tags can not contain attributes, the method takes no parameters.@@ -7007,12 +6847,12 @@ at all. It is this last side effect that you can take advantage of. - - ![]()
Second, you decrement your counter to signal that this <pre>block has been closed. +Second, you decrement your counter to signal that this <pre>block has been closed.At this point, it's worth digging a little further into
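If you're wondering why a counter beats a simple flag, trace a nested block by hand. The sketch below uses a hypothetical PreCounter class that keeps only the bookkeeping; it leaves out the calls to unknown_starttag and unknown_endtag that the real handlers make, and it initializes the counter in __init__ where the chapter uses reset.

```python
class PreCounter(object):
    """Hypothetical stand-in for the <pre> bookkeeping in Dialectizer."""

    def __init__(self):
        self.verbatim = 0      # the chapter sets this up in reset()

    def start_pre(self, attrs):
        self.verbatim += 1     # one level deeper into <pre>

    def end_pre(self):
        self.verbatim -= 1     # one level back out


counter = PreCounter()
counter.start_pre([])          # <pre>
counter.start_pre([])          # nested <pre>
counter.end_pre()              # inner </pre>
still_verbatim = counter.verbatim > 0    # the outer block is still open
counter.end_pre()              # outer </pre>
back_to_normal = counter.verbatim == 0
```

With a flag that flips to 0 on every end tag, the inner </pre> would have switched verbatim mode off while the outer block was still open.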
SGMLParser. I've claimed repeatedly (and you've taken it on faith so far) thatSGMLParserlooks for and calls specific methods for each tag, if they exist. For instance, you just saw the definition ofstart_preandend_preto handle<pre>and</pre>. But how does this happen? Well, it's not magic, it's just good Python coding. -Example 8.18.
SGMLParser+At this point, it's worth digging a little further into
SGMLParser. I've claimed repeatedly (and you've taken it on faith so far) thatSGMLParserlooks for and calls specific methods for each tag, if they exist. For instance, you just saw the definition ofstart_preandend_preto handle<pre>and</pre>. But how does this happen? Well, it's not magic, it's just good Python coding. +Example 8.18.
SGMLParserdef finish_starttag(self, tag, attrs):try: method = getattr(self, 'start_' + tag)
@@ -7036,46 +6876,46 @@ at all. It is this last side effect that you can take advantage of.
- ![]()
At this point, SGMLParserhas already found a start tag and parsed the attribute list. The only thing left to do is figure out whether there is a - specific handler method for this tag, or whether you should fall back on the default method (unknown_starttag). +At this point, SGMLParserhas already found a start tag and parsed the attribute list. The only thing left to do is figure out whether there is a + specific handler method for this tag, or whether you should fall back on the default method (unknown_starttag).- ![]()
The “magic” of SGMLParseris nothing more than your old friend,getattr. What you may not have realized before is thatgetattrwill find methods defined in descendants of an object as well as the object itself. Here the object isself, the current instance. So iftagis'pre', this call togetattrwill look for astart_premethod on the current instance, which is an instance of theDialectizerclass. +The “magic” of SGMLParseris nothing more than your old friend,getattr. What you may not have realized before is thatgetattrwill find methods defined in descendants of an object as well as the object itself. Here the object isself, the current instance. So if tag is'pre', this call togetattrwill look for astart_premethod on the current instance, which is an instance of theDialectizerclass.- ![]()
getattrraises anAttributeErrorif the method it's looking for doesn't exist in the object (or any of its descendants), but that's okay, because you wrapped - the call togetattrinside atry...exceptblock and explicitly caught theAttributeError. +getattrraises anAttributeErrorif the method it's looking for doesn't exist in the object (or any of its descendants), but that's okay, because you wrapped + the call togetattrinside atry...exceptblock and explicitly caught theAttributeError.- ![]()
Since you didn't find a start_xxxmethod, you'll also look for ado_xxxmethod before giving up. This alternate naming scheme is generally used for standalone tags, like<br>, which have no corresponding end tag. But you can use either naming scheme; as you can see,SGMLParsertries both for every tag. (You shouldn't define both astart_xxxanddo_xxxhandler method for the same tag, though; only thestart_xxxmethod will get called.) +Since you didn't find a start_xxxmethod, you'll also look for ado_xxxmethod before giving up. This alternate naming scheme is generally used for standalone tags, like<br>, which have no corresponding end tag. But you can use either naming scheme; as you can see,SGMLParsertries both for every tag. (You shouldn't define both astart_xxxanddo_xxxhandler method for the same tag, though; only thestart_xxxmethod will get called.)- ![]()
Another AttributeError, which means that the call togetattrfailed withdo_xxx. Since you found neither astart_xxxnor ado_xxxmethod for this tag, you catch the exception and fall back on the default method,unknown_starttag. +Another AttributeError, which means that the call togetattrfailed withdo_xxx. Since you found neither astart_xxxnor ado_xxxmethod for this tag, you catch the exception and fall back on the default method,unknown_starttag.- ![]()
Remember, try...exceptblocks can have anelseclause, which is called if no exception is raised during thetry...exceptblock. Logically, that means that you did find ado_xxxmethod for this tag, so you're going to call it. +Remember, try...exceptblocks can have anelseclause, which is called if no exception is raised during thetry...exceptblock. Logically, that means that you did find ado_xxxmethod for this tag, so you're going to call it.![]()
By the way, don't worry about these different return values; in theory they mean something, but they're never actually used. - Don't worry about the @@ -7083,40 +6923,40 @@ at all. It is this last side effect that you can take advantage of.self.stack.append(tag)either;SGMLParserkeeps track internally of whether your start tags are balanced by appropriate end tags, but it doesn't do anything with this + Don't worry about theself.stack.append(tag)either;SGMLParserkeeps track internally of whether your start tags are balanced by appropriate end tags, but it doesn't do anything with this information either. In theory, you could use this module to validate that your tags were fully balanced, but it's probably not worth it, and it's beyond the scope of this chapter. You have better things to worry about right now.- - ![]()
start_xxxanddo_xxxmethods are not called directly; the tag, method, and attributes are passed to this function,handle_starttag, so that descendants can override it and change the way all start tags are dispatched. You don't need that level of control, so you just let this method do its thing, which is to call - the method (start_xxxordo_xxx) with the list of attributes. Remember,methodis a function, returned fromgetattr, and functions are objects. (I know you're getting tired of hearing it, and I promise I'll stop saying it as soon as I run +start_xxxanddo_xxxmethods are not called directly; the tag, method, and attributes are passed to this function,handle_starttag, so that descendants can override it and change the way all start tags are dispatched. You don't need that level of control, so you just let this method do its thing, which is to call + the method (start_xxxordo_xxx) with the list of attributes. Remember, method is a function, returned fromgetattr, and functions are objects. (I know you're getting tired of hearing it, and I promise I'll stop saying it as soon as I run out of ways to use it to my advantage.) Here, the function object is passed into this dispatch method as an argument, and this method turns around and calls the function. At this point, you don't need to know what the function is, what it's named, - or where it's defined; the only thing you need to know about the function is that it is called with one argument,attrs. + or where it's defined; the only thing you need to know about the function is that it is called with one argument, attrs.Now back to our regularly scheduled program:
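Here is the dispatch logic boiled down to something you can run on its own. It is a simplified sketch, not the library's source: TinyDispatcher and PreAware are hypothetical classes, the tag stack and the numeric return values are dropped, and the handlers return strings purely so you can see which one was chosen.

```python
class TinyDispatcher(object):
    def finish_starttag(self, tag, attrs):
        try:
            # First choice: a start_xxx handler.
            method = getattr(self, "start_" + tag)
        except AttributeError:
            try:
                # Second choice: a do_xxx handler.
                method = getattr(self, "do_" + tag)
            except AttributeError:
                # Neither exists, so fall back on the default.
                return self.unknown_starttag(tag, attrs)
        return self.handle_starttag(tag, method, attrs)

    def handle_starttag(self, tag, method, attrs):
        # method is a bound method object; calling it runs the handler.
        return method(attrs)

    def unknown_starttag(self, tag, attrs):
        return "unknown:" + tag


class PreAware(TinyDispatcher):
    def start_pre(self, attrs):
        return "start_pre"

    def do_br(self, attrs):
        return "do_br"


parser = PreAware()
results = [parser.finish_starttag(tag, []) for tag in ("pre", "br", "p")]
```

The base class never mentions pre or br; the subclass only has to define methods with the right names.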
Dialectizer. When you left, you were in the process of defining specific handler methods for<pre>and</pre>tags. There's only one thing left to do, and that is to process text blocks with the pre-defined substitutions. For that, -you need to override thehandle_datamethod. -Example 8.19. Overriding the
handle_datamethod+Now back to our regularly scheduled program:
Dialectizer. When you left, you were in the process of defining specific handler methods for<pre>and</pre>tags. There's only one thing left to do, and that is to process text blocks with the pre-defined substitutions. For that, +you need to override thehandle_datamethod. +Example 8.19. Overriding the
handle_datamethoddef handle_data(self, text):self.pieces.append(self.verbatim and text or self.process(text))
-
- ![]()
handle_datais called with only one argument, the text to process. +handle_datais called with only one argument, the text to process.- ![]()
In the ancestor BaseHTMLProcessor, thehandle_datamethod simply appended the text to the output buffer,self.pieces. Here the logic is only slightly more complicated. If you're in the middle of ablock,<pre>...</pre>self.verbatimwill be some value greater than0, and you want to put the text in the output buffer unaltered. Otherwise, you will call a separate method to process the +In the ancestor BaseHTMLProcessor, the handle_data method simply appended the text to the output buffer, self.pieces. Here the logic is only slightly more complicated. If you're in the middle of a <pre>...</pre> block, self.verbatim will be some value greater than 0, and you want to put the text in the output buffer unaltered. Otherwise, you will call a separate method to process the substitutions, then put the result of that into the output buffer. In Python, this is a one-liner, using the and-or trick.
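The and-or trick is worth a quick experiment, because it has one sharp edge. In the sketch below, choose_and_or and choose_conditional are hypothetical helpers and str.upper stands in for the real process method. The conditional expression in the second helper is an alternative, not what the chapter's code uses, and it assumes Python 2.5 or later.

```python
def choose_and_or(verbatim, text, process):
    # Same shape as the one-liner in handle_data.
    return verbatim and text or process(text)


def choose_conditional(verbatim, text, process):
    # Alternative spelling (an assumption, requires Python 2.5+).
    return text if verbatim else process(text)


inside_pre = choose_and_or(1, "hello", str.upper)
outside_pre = choose_and_or(0, "hello", str.upper)

# The sharp edge: if text is an empty string it counts as false, so the
# and-or form falls through and calls process even though verbatim is set.
calls = []
choose_and_or(1, "", lambda t: calls.append(t) or t)
and_or_calls = len(calls)
choose_conditional(1, "", lambda t: calls.append(t) or t)
conditional_calls = len(calls) - and_or_calls
```

For handle_data the edge is arguably harmless, since the only text that triggers it is empty anyway, but it is the kind of thing to keep in mind before reusing the trick elsewhere.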
Dialectizer. The only missing link is the nature of the text substitutions themselves. If you know any Perl, you know that when complex text substitutions are required, the only real solution is regular expressions. The classes -later indialect.pydefine a series of regular expressions that operate on the text between the HTML tags. But you just had a whole chapter on regular expressions. You don't really want to slog through regular expressions again, do you? God knows I don't. I think you've learned enough +You're close to completely understanding
Dialectizer. The only missing link is the nature of the text substitutions themselves. If you know any Perl, you know that when complex text substitutions are required, the only real solution is regular expressions. The classes +later indialect.pydefine a series of regular expressions that operate on the text between the HTML tags. But you just had a whole chapter on regular expressions. You don't really want to slog through regular expressions again, do you? God knows I don't. I think you've learned enough for one chapter.8.9. Putting it all together
It's time to put everything you've learned so far to good use. I hope you were paying attention. -
Example 8.20. The
translatefunction, part 1+Example 8.20. The
translatefunction, part 1def translate(url, dialectName="chef"):import urllib
sock = urllib.urlopen(url)
@@ -7127,7 +6967,7 @@ def translate(url, dialectName="chef"):
-
The translatefunction has an optional argumentdialectName, which is a string that specifies the dialect you'll be using. You'll see how this is used in a minute. +The translatefunction has an optional argument dialectName, which is a string that specifies the dialect you'll be using. You'll see how this is used in a minute.@@ -7147,7 +6987,7 @@ def translate(url, dialectName="chef"): Example 8.21. The
translatefunction, part 2: curiouser and curiouser+Example 8.21. The
translatefunction, part 2: curiouser and curiouserparserName = "%sDialectizer" % dialectName.capitalize()parserClass = globals()[parserName]
parser = parserClass()
@@ -7156,32 +6996,32 @@ def translate(url, dialectName="chef"):
-
capitalizeis a string method you haven't seen before; it simply capitalizes the first letter of a string and forces everything else - to lowercase. Combined with some string formatting, you've taken the name of a dialect and transformed it into the name of the corresponding Dialectizer class. IfdialectNameis the string'chef',parserNamewill be the string'ChefDialectizer'. +capitalizeis a string method you haven't seen before; it simply capitalizes the first letter of a string and forces everything else + to lowercase. Combined with some string formatting, you've taken the name of a dialect and transformed it into the name of the corresponding Dialectizer class. If dialectName is the string'chef', parserName will be the string'ChefDialectizer'.- ![]()
You have the name of a class as a string ( parserName), and you have the global namespace as a dictionary (globals()). Combined, you can get a reference to the class which the string names. (Remember, classes are objects, and they can be assigned to variables just like any other object.) IfparserNameis the string'ChefDialectizer',parserClasswill be the classChefDialectizer. +You have the name of a class as a string (parserName), and you have the global namespace as a dictionary ( globals()). Combined, you can get a reference to the class which the string names. (Remember, classes are objects, and they can be assigned to variables just like any other object.) If parserName is the string'ChefDialectizer', parserClass will be the classChefDialectizer.- - ![]()
Finally, you have a class object ( parserClass), and you want an instance of the class. Well, you already know how to do that: call the class like a function. The fact that the class is being stored in a local variable makes absolutely no difference; you just call the local variable - like a function, and out pops an instance of the class. IfparserClassis the classChefDialectizer,parserwill be an instance of the classChefDialectizer. +Finally, you have a class object (parserClass), and you want an instance of the class. Well, you already know how to do that: call the class like a function. The fact that the class is being stored in a local variable makes absolutely no difference; you just call the local variable + like a function, and out pops an instance of the class. If parserClass is the class ChefDialectizer, parser will be an instance of the classChefDialectizer.Why bother? After all, there are only 3
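You can try the three steps without any HTML at all. The sketch below uses an empty placeholder class and a hypothetical make_parser helper; the real ChefDialectizer descends from Dialectizer and does actual work.

```python
class ChefDialectizer(object):
    """Empty placeholder so the lookup below has something to find."""


def make_parser(dialectName="chef"):
    # Step 1: build the class name as a string.
    parserName = "%sDialectizer" % dialectName.capitalize()
    # Step 2: look the name up in the module's global namespace.
    parserClass = globals()[parserName]
    # Step 3: call the class object to get an instance.
    return parserClass()


parser = make_parser("chef")
shouted = make_parser("CHEF")   # capitalize() lowercases the rest

try:
    make_parser("klingon")      # no KlingonDialectizer is defined here
    unknown_failed = False
except KeyError:
    unknown_failed = True
```

An unknown dialect name surfaces as a KeyError from the dictionary lookup, which is worth catching if the name comes from outside your program.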
Dialectizerclasses; why not just use acasestatement? (Well, there's nocasestatement in Python, but why not just use a series ofifstatements?) One reason: extensibility. Thetranslatefunction has absolutely no idea how many Dialectizer classes you've defined. Imagine if you defined a newFooDialectizertomorrow;translatewould work by passing'foo'as thedialectName. -Even better, imagine putting
FooDialectizerin a separate module, and importing it withfrom module import. You've already seen that this includes it inglobals(), sotranslatewould still work without modification, even thoughFooDialectizerwas in a separate file. +Why bother? After all, there are only 3
Dialectizerclasses; why not just use acasestatement? (Well, there's nocasestatement in Python, but why not just use a series ofifstatements?) One reason: extensibility. Thetranslatefunction has absolutely no idea how many Dialectizer classes you've defined. Imagine if you defined a newFooDialectizertomorrow;translatewould work by passing'foo'as the dialectName. +Even better, imagine putting
FooDialectizerin a separate module, and importing it withfrom module import. You've already seen that this includes it inglobals(), sotranslatewould still work without modification, even thoughFooDialectizerwas in a separate file.Now imagine that the name of the dialect is coming from somewhere outside the program, maybe from a database or from a user-inputted value on a form. You can use any number of server-side Python scripting architectures to dynamically generate web pages; this function could take a URL and a dialect name (both strings) in the query string of a web page request, and output the “translated” web page. -
Finally, imagine a
Dialectizerframework with a plug-in architecture. You could put eachDialectizerclass in a separate file, leaving only thetranslatefunction indialect.py. Assuming a consistent naming scheme, thetranslatefunction could dynamic import the appropiate class from the appropriate file, given nothing but the dialect name. (You haven't +Finally, imagine a
Dialectizer framework with a plug-in architecture. You could put each Dialectizer class in a separate file, leaving only the translate function in dialect.py. Assuming a consistent naming scheme, the translate function could dynamically import the appropriate class from the appropriate file, given nothing but the dialect name. (You haven't seen dynamic importing yet, but I promise to cover it in a later chapter.) To add a new dialect, you would simply add an -appropriately-named file in the plug-ins directory (likefoodialect.pywhich contains theFooDialectizerclass). Calling thetranslatefunction with the dialect name'foo'would find the modulefoodialect.py, import the classFooDialectizer, and away you go. -Example 8.22. The
translatefunction, part 3+appropriately-named file in the plug-ins directory (like foodialect.py, which contains the FooDialectizer class). Calling the translate function with the dialect name 'foo' would find the module foodialect.py, import the class FooDialectizer, and away you go. +Example 8.22. The
translatefunction, part 3parser.feed(htmlSource)parser.close()
return parser.output()
@@ -7190,21 +7030,21 @@ appropriately-named file in the plug-ins directory (like
- ![]()
After all that imagining, this is going to seem pretty boring, but the feedfunction is what does the entire transformation. You had the entire HTML source in a single string, so you only had to callfeedonce. However, you can callfeedas often as you want, and the parser will just keep parsing. So if you were worried about memory usage (or you knew you +After all that imagining, this is going to seem pretty boring, but the feedfunction is what does the entire transformation. You had the entire HTML source in a single string, so you only had to callfeedonce. However, you can callfeedas often as you want, and the parser will just keep parsing. So if you were worried about memory usage (or you knew you were going to be dealing with very large HTML pages), you could set this up in a loop, where you read a few bytes of HTML and fed it to the parser. The result would be the same.- ![]()
Because feedmaintains an internal buffer, you should always call the parser'sclosemethod when you're done (even if you fed it all at once, like you did). Otherwise you may find that your output is missing +Because feedmaintains an internal buffer, you should always call the parser'sclosemethod when you're done (even if you fed it all at once, like you did). Otherwise you may find that your output is missing the last few bytes.@@ -7216,26 +7056,26 @@ appropriately-named file in the plug-ins directory (like - ![]()
Remember, outputis the function you defined onBaseHTMLProcessorthat joins all the pieces of output you've buffered and returns them in a single string. +Remember, outputis the function you defined onBaseHTMLProcessorthat joins all the pieces of output you've buffered and returns them in a single string.8.10. Summary
-Python provides you with a powerful tool,
sgmllib.py, to manipulate HTML by turning its structure into an object model. You can use this tool in many different ways. +Python provides you with a powerful tool,
sgmllib.py, to manipulate HTML by turning its structure into an object model. You can use this tool in many different ways.
- parsing the HTML looking for something specific
- aggregating the results, like the URL lister
- altering the structure along the way, like the attribute quoter -
- transforming the HTML into something else by manipulating the text while leaving the tags alone, like the
Dialectizer+- transforming the HTML into something else by manipulating the text while leaving the tags alone, like the
DialectizerAlong with these examples, you should be comfortable doing all of the following things:
-
- Using
locals() andglobals() to access namespaces +- Using
locals() andglobals() to access namespaces- Formatting strings using dictionary-based substitutions
-[1] The technical term for a parser like
SGMLParseris a consumer: it consumes HTML and breaks it down. Presumably, the namefeedwas chosen to fit into the whole “consumer” motif. Personally, it makes me think of an exhibit in the zoo where there's just a dark cage with no trees or plants or +[1] The technical term for a parser like
SGMLParseris a consumer: it consumes HTML and breaks it down. Presumably, the namefeedwas chosen to fit into the whole “consumer” motif. Personally, it makes me think of an exhibit in the zoo where there's just a dark cage with no trees or plants or evidence of life of any kind, but if you stand perfectly still and look really closely you can make out two beady eyes staring back at you from the far left corner, but you convince yourself that that's just your mind playing tricks on you, and the only way you can tell that the whole thing isn't just an empty cage is a small innocuous sign on the railing that reads, “Do not feed the parser.” But maybe that's just me. In any event, it's an interesting mental image. @@ -7243,7 +7083,7 @@ appropriately-named file in the plug-ins directory (like[2] The reason Python is better at lists than strings is that lists are mutable but strings are immutable. This means that appending to a list just adds the element and updates the index. Since strings can not be changed after they are created, code like
s = s + newpiece will create an entirely new string out of the concatenation of the original and the new piece, then throw away the original string. This involves a lot of expensive memory management, and the amount of effort involved increases as the string gets - longer, so doings = s + newpiecein a loop is deadly. In technical terms, appendingnitems to a list isO(n), while appendingnitems to a string isO(n²). + longer, so doing s = s + newpiece in a loop is deadly. In technical terms, appending n items to a list is O(n), while appending n items to a string is O(n²).
[3] I don't get out much.
@@ -7253,14 +7093,14 @@ appropriately-named file in the plug-ins directory (like9.1. Diving in
These next two chapters are about XML processing in Python. It would be helpful if you already knew what an XML document looks like, that it's made up of structured tags to form a hierarchy of elements, and so on. If this doesn't make sense to you, there are many XML tutorials that can explain the basics. -
If you're not particularly interested in XML, you should still read these chapters, which cover important topics like Python packages, Unicode, command line arguments, and how to use
getattrfor method dispatching. +If you're not particularly interested in XML, you should still read these chapters, which cover important topics like Python packages, Unicode, command line arguments, and how to use
getattrfor method dispatching.Being a philosophy major is not required, although if you have ever had the misfortune of being subjected to the writings of Immanuel Kant, you will appreciate the example program a lot more than if you majored in something useful, like computer science. -
There are two basic ways to work with XML. One is called SAX (“Simple API for XML”), and it works by reading the XML a little bit at a time and calling a method for each element it finds. (If you read Chapter 8, HTML Processing, this should sound familiar, because that's how the
sgmllib module works.) The other is called DOM (“Document Object Model”), and it works by reading in the entire XML document at once and creating an internal representation of it using native Python classes linked in a tree structure. Python has standard modules for both kinds of parsing, but this chapter will only deal with using the DOM. +There are two basic ways to work with XML. One is called SAX (“Simple API for XML”), and it works by reading the XML a little bit at a time and calling a method for each element it finds. (If you read Chapter 8, HTML Processing, this should sound familiar, because that's how the
sgmllib module works.) The other is called DOM (“Document Object Model”), and it works by reading in the entire XML document at once and creating an internal representation of it using native Python classes linked in a tree structure. Python has standard modules for both kinds of parsing, but this chapter will only deal with using the DOM. The following is a complete Python program which generates pseudo-random output based on a context-free grammar defined in an XML format. Don't worry yet if you don't understand what that means; you'll examine both the program's input and its output in more depth throughout these next two chapters. -
Example 9.1.
+kgp.pyExample 9.1.
kgp.pyIf you have not already done so, you can download this and other examples used in this book.
"""Kant Generator for Python @@ -7502,7 +7342,7 @@ def main(argv): if __name__ == "__main__": main(sys.argv[1:]) -Example 9.2.
toolbox.py+Example 9.2.
toolbox.py"""Miscellaneous utility functions""" def openAnything(source): @@ -7549,8 +7389,8 @@ def openAnything(source): # treat source as string import StringIO return StringIO.StringIO(str(source)) -Run the program
kgp.pyby itself, and it will parse the default XML-based grammar, inkant.xml, and print several paragraphs worth of philosophy in the style of Immanuel Kant. -Example 9.3. Sample output of
kgp.py[you@localhost kgp]$ python kgp.py +Run the program
kgp.pyby itself, and it will parse the default XML-based grammar, inkant.xml, and print several paragraphs worth of philosophy in the style of Immanuel Kant. +Example 9.3. Sample output of
kgp.py[you@localhost kgp]$ python kgp.py As is shown in the writings of Hume, our a priori concepts, in reference to ends, abstract from all content of knowledge; in the study of space, the discipline of human reason, in accordance with the @@ -7589,13 +7429,13 @@ the sort of thing that Kant would have agreed with), some of it is blatantly fal But all of it is in the style of Immanuel Kant.Let me repeat that this is much, much funnier if you are now or have ever been a philosophy major.
The interesting thing about this program is that there is nothing Kant-specific about it. All the content in the previous -example was derived from the grammar file,
kant.xml. If you tell the program to use a different grammar file (which you can specify on the command line), the output will be +example was derived from the grammar file,kant.xml. If you tell the program to use a different grammar file (which you can specify on the command line), the output will be completely different. -Example 9.4. Simpler output from
kgp.py[you@localhost kgp]$ python kgp.py -g binary.xml +Example 9.4. Simpler output from
kgp.py[you@localhost kgp]$ python kgp.py -g binary.xml 00101001 [you@localhost kgp]$ python kgp.py -g binary.xml 10110100You will take a closer look at the structure of the grammar file later in this chapter. For now, all you need to know is -that the grammar file defines the structure of the output, and the
kgp.pyprogram reads through the grammar and makes random decisions about which words to plug in where. +that the grammar file defines the structure of the output, and thekgp.pyprogram reads through the grammar and makes random decisions about which words to plug in where.9.2. Packages
Actually parsing an XML document is very simple: one line of code. However, before you get to that line of code, you need to take a short detour to talk about packages. @@ -7606,13 +7446,13 @@ that the grammar file defines the structure of the output, and the
-
This is a syntax you haven't seen before. It looks almost like the from module importyou know and love, but the"."gives it away as something above and beyond a simple import. In fact,xmlis what is known as a package,domis a nested package withinxml, andminidomis a module withinxml.dom. +This is a syntax you haven't seen before. It looks almost like the from module importyou know and love, but the"."gives it away as something above and beyond a simple import. In fact,xmlis what is known as a package,domis a nested package withinxml, andminidomis a module withinxml.dom.That sounds complicated, but it's really not. Looking at the actual implementation may help. Packages are little more than directories of modules; nested packages are subdirectories. The modules within a package (or a nested package) are still -just
.pyfiles, like always, except that they're in a subdirectory instead of the mainlib/directory of your Python installation. +just.pyfiles, like always, except that they're in a subdirectory instead of the mainlib/directory of your Python installation.Example 9.6. File layout of a package
Python21/ root Python installation (home of the executable) | +--lib/ library directory (home of the standard library modules) @@ -7623,7 +7463,7 @@ just.pyfiles, like always, except that they're i | +--dom/ xml.dom package (contains minidom.py) | - +--parsers/ xml.parsers package (used internally)So when you say
from xml.dom import minidom, Python figures out that that means “look in thexmldirectory for adomdirectory, and look in that for theminidommodule, and import it asminidom”. But Python is even smarter than that; not only can you import entire modules contained within a package, you can selectively import + +--parsers/ xml.parsers package (used internally)So when you say
from xml.dom import minidom, Python figures out that that means “look in thexmldirectory for adomdirectory, and look in that for theminidommodule, and import it asminidom”. But Python is even smarter than that; not only can you import entire modules contained within a package, you can selectively import specific classes or functions from a module contained within a package. You can also import the package itself as a module. The syntax is all the same; Python figures out what you mean based on the file layout of the package, and automatically does the right thing.Example 9.7. Packages are modules, too
>>> from xml.dom import minidom@@ -7646,41 +7486,41 @@ The syntax is all the same; Python figures out what you mean based on the file l
- ![]()
Here you're importing a module ( minidom) from a nested package (xml.dom). The result is thatminidomis imported into your namespace, and in order to reference classes within theminidommodule (likeElement), you need to preface them with the module name. +Here you're importing a module ( minidom) from a nested package (xml.dom). The result is thatminidomis imported into your namespace, and in order to reference classes within theminidommodule (likeElement), you need to preface them with the module name.- ![]()
Here you are importing a class ( Element) from a module (minidom) from a nested package (xml.dom). The result is thatElementis imported directly into your namespace. Note that this does not interfere with the previous import; theElementclass can now be referenced in two ways (but it's all still the same class). +Here you are importing a class ( Element) from a module (minidom) from a nested package (xml.dom). The result is thatElementis imported directly into your namespace. Note that this does not interfere with the previous import; theElementclass can now be referenced in two ways (but it's all still the same class).- ![]()
Here you are importing the dom package (a nested package of xml) as a module in and of itself. Any level of a package can be treated as a module, as you'll see in a moment. It can even +Here you are importing the dom package (a nested package of xml) as a module in and of itself. Any level of a package can be treated as a module, as you'll see in a moment. It can even have its own attributes and methods, just like the modules you've seen before.- ![]()
Here you are importing the root level xmlpackage as a module. +Here you are importing the root level xmlpackage as a module.So how can a package (which is just a directory on disk) be imported and treated as a module (which is always a file on disk)? -The answer is the magical
__init__.py file. You see, packages are not simply directories; they are directories with a specific file, __init__.py, inside. This file defines the attributes and methods of the package. For instance, xml.dom contains a Node class, which is defined in xml/dom/__init__.py. When you import a package as a module (like dom from xml), you're really importing its __init__.py file. +The answer is the magical
@@ -7742,7 +7582,7 @@ package architecture. It's one of the many things Python is good at, so take ad__init__.pyfile. You see, packages are not simply directories; they are directories with a specific file,__init__.py, inside. This file defines the attributes and methods of the package. For instance,xml.domcontains aNodeclass, which is defined inxml/dom/__init__.py. When you import a package as a module (likedomfromxml), you're really importing its__init__.pyfile.-
- A package is a directory with the special __init__.pyfile in it. The__init__.pyfile defines the attributes and methods of the package. It doesn't need to define anything; it can just be an empty file, - but it has to exist. But if__init__.pydoesn't exist, the directory is just a directory, not a package, and it can't be imported or contain modules or nested packages. +A package is a directory with the special __init__.pyfile in it. The__init__.pyfile defines the attributes and methods of the package. It doesn't need to define anything; it can just be an empty file, + but it has to exist. But if__init__.pydoesn't exist, the directory is just a directory, not a package, and it can't be imported or contain modules or nested packages.So why bother with packages? Well, they provide a way to logically group related modules. Instead of having an
xmlpackage withsaxanddompackages inside, the authors could have chosen to put all thesaxfunctionality inxmlsax.pyand all thedomfunctionality inxmldom.py, or even put all of it in a single module. But that would have been unwieldy (as of this writing, the XML package has over 3000 lines of code) and difficult to manage (separate source files mean multiple people can work on different +So why bother with packages? Well, they provide a way to logically group related modules. Instead of having an
xmlpackage withsaxanddompackages inside, the authors could have chosen to put all thesaxfunctionality inxmlsax.pyand all thedomfunctionality inxmldom.py, or even put all of it in a single module. But that would have been unwieldy (as of this writing, the XML package has over 3000 lines of code) and difficult to manage (separate source files mean multiple people can work on different areas simultaneously).If you ever find yourself writing a large subsystem in Python (or, more likely, when you realize that your small subsystem has grown into a large one), invest some time designing a good package architecture. It's one of the many things Python is good at, so take advantage of it. @@ -7707,26 +7547,26 @@ package architecture. It's one of the many things Python is good at, so take ad
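To make the role of __init__.py concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it at each level. The names kgp_demo_pkg, inner, leaf, GREETING, and answer are all hypothetical, invented for this illustration. It assumes a current Python 3 interpreter, which additionally allows "namespace packages" that have no __init__.py; the regular packages described here still behave as the text says.

```python
import importlib
import os
import sys
import tempfile

# Build this layout in a scratch directory (all names are hypothetical):
#   kgp_demo_pkg/__init__.py        defines GREETING
#   kgp_demo_pkg/inner/__init__.py  empty, but marks inner/ as a package
#   kgp_demo_pkg/inner/leaf.py      defines answer()
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "kgp_demo_pkg")
inner_dir = os.path.join(pkg_dir, "inner")
os.makedirs(inner_dir)


def write(path, text):
    """Write text to path, creating or replacing the file."""
    with open(path, "w") as handle:
        handle.write(text)


write(os.path.join(pkg_dir, "__init__.py"), "GREETING = 'hello from the package'\n")
write(os.path.join(inner_dir, "__init__.py"), "")
write(os.path.join(inner_dir, "leaf.py"), "def answer():\n    return 42\n")

# Make the scratch directory importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

import kgp_demo_pkg                           # the package itself, as a module
from kgp_demo_pkg.inner import leaf           # a module from a nested package
from kgp_demo_pkg.inner.leaf import answer    # a function from that module

# Importing the package really loads its __init__.py file.
package_file = os.path.basename(kgp_demo_pkg.__file__)
```

The three import statements mirror the forms shown in Example 9.7: a package as a module, a module from a nested package, and a single name from that module.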
- ![]()
As you saw in the previous section, this imports the minidommodule from thexml.dompackage. +As you saw in the previous section, this imports the minidommodule from thexml.dompackage.- ![]()
Here is the one line of code that does all the work: minidom.parsetakes one argument and returns a parsed representation of the XML document. The argument can be many things; in this case, it's simply a filename of an XML document on my local disk. (To follow along, you'll need to change the path to point to your downloaded examples directory.) +Here is the one line of code that does all the work: minidom.parsetakes one argument and returns a parsed representation of the XML document. The argument can be many things; in this case, it's simply a filename of an XML document on my local disk. (To follow along, you'll need to change the path to point to your downloaded examples directory.) But you can also pass a file object, or even a file-like object. You'll take advantage of this flexibility later in this chapter.- ![]()
The object returned from minidom.parseis aDocumentobject, a descendant of theNodeclass. ThisDocumentobject is the root level of a complex tree-like structure of interlocking Python objects that completely represent the XML document you passed tominidom.parse. +The object returned from minidom.parseis aDocumentobject, a descendant of theNodeclass. ThisDocumentobject is the root level of a complex tree-like structure of interlocking Python objects that completely represent the XML document you passed tominidom.parse.- ![]()
toxml is a method of the Node class (and is therefore available on the Document object you got from minidom.parse). toxml returns, as a string, the XML that this Node represents. For the Document node, that is the entire XML document. +toxml is a method of the Node class (and is therefore available on the Document object you got from minidom.parse). toxml returns, as a string, the XML that this Node represents. For the Document node, that is the entire XML document.- ![]()
Every Node has a childNodes attribute, which is a list of the Node objects. A Document always has only one child node, the root element of the XML document (in this case, the grammar element). +Every Node has a childNodes attribute, which is a list of the Node objects. A Document always has only one child node, the root element of the XML document (in this case, the grammar element).@@ -7755,11 +7595,11 @@ package architecture. It's one of the many things Python is good at, so take ad - - ![]()
Since getting the first child node of a node is a useful and common activity, the Node class has a firstChild attribute, which is synonymous with childNodes[0]. (There is also a lastChild attribute, which is synonymous with childNodes[-1].) +Since getting the first child node of a node is a useful and common activity, the Node class has a firstChild attribute, which is synonymous with childNodes[0]. (There is also a lastChild attribute, which is synonymous with childNodes[-1].)
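The childNodes, firstChild, and lastChild relationships can be checked on a small inline document. This is a sketch under assumptions: the SAMPLE string is a made-up stand-in (not the book's binary.xml), it uses minidom.parseString so that no file on disk is needed, and it is written for a current Python 3 interpreter.

```python
from xml.dom import minidom

# Hypothetical stand-in document; deliberately has no whitespace between
# tags, so every child node below is an element.
SAMPLE = "<grammar><ref id='a'><p>0</p></ref><ref id='b'><p>1</p></ref></grammar>"

xmldoc = minidom.parseString(SAMPLE)

# The Document has a single child here: the root grammar element.
document_children = len(xmldoc.childNodes)
grammar_node = xmldoc.firstChild

# firstChild and lastChild are shortcuts for the two ends of childNodes.
first_ref = grammar_node.firstChild
last_ref = grammar_node.lastChild

# toxml() returns a string and works on any node, not only on the Document.
first_ref_xml = first_ref.toxml()
```

If the sample had line breaks between its tags, those line breaks would show up as extra Text nodes in childNodes.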
toxmlworks on any node+Example 9.10.
toxmlworks on any node>>> grammarNode = xmldoc.firstChild >>> print grammarNode.toxml()<grammar> @@ -7776,7 +7616,7 @@ package architecture. It's one of the many things Python is good at, so take ad
@@ -7806,31 +7646,31 @@ package architecture. It's one of the many things Python is good at, so take ad - ![]()
Since the toxmlmethod is defined in theNodeclass, it is available on any XML node, not just theDocumentelement. +Since the toxmlmethod is defined in theNodeclass, it is available on any XML node, not just theDocumentelement.- ![]()
Looking at the XML in binary.xml, you might think that the grammar has only two child nodes, the two ref elements. But you're missing something: the carriage returns! After the '<grammar>' and before the first '<ref>' is a carriage return, and this text counts as a child node of the grammar element. Similarly, there is a carriage return after each '</ref>'; these also count as child nodes. So grammar.childNodes is actually a list of 5 objects: 3 Text objects and 2 Element objects. +Looking at the XML in binary.xml, you might think that the grammar has only two child nodes, the two ref elements. But you're missing something: the carriage returns! After the '<grammar>' and before the first '<ref>' is a carriage return, and this text counts as a child node of the grammar element. Similarly, there is a carriage return after each '</ref>'; these also count as child nodes. So grammar.childNodes is actually a list of 5 objects: 3 Text objects and 2 Element objects.- ![]()
The first child is a Textobject representing the carriage return after the'<grammar>'tag and before the first'<ref>'tag. +The first child is a Textobject representing the carriage return after the'<grammar>'tag and before the first'<ref>'tag.- ![]()
The second child is an Elementobject representing the firstrefelement. +The second child is an Elementobject representing the firstrefelement.- ![]()
The fourth child is an Elementobject representing the secondrefelement. +The fourth child is an Elementobject representing the secondrefelement.@@ -7857,31 +7697,31 @@ u'0' - ![]()
The last child is a Textobject representing the carriage return after the'</ref>'end tag and before the'</grammar>'end tag. +The last child is a Textobject representing the carriage return after the'</ref>'end tag and before the'</grammar>'end tag.- ![]()
As you saw in the previous example, the first ref element is grammarNode.childNodes[1], since childNodes[0] is a Text node for the carriage return. +As you saw in the previous example, the first ref element is grammarNode.childNodes[1], since childNodes[0] is a Text node for the carriage return.- ![]()
The refelement has its own set of child nodes, one for the carriage return, a separate one for the spaces, one for thepelement, and so forth. +The refelement has its own set of child nodes, one for the carriage return, a separate one for the spaces, one for thepelement, and so forth.- ![]()
You can even use the toxmlmethod here, deeply nested within the document. +You can even use the toxmlmethod here, deeply nested within the document.- ![]()
The pelement has only one child node (you can't tell that from this example, but look atpNode.childNodesif you don't believe me), and it is aTextnode for the single character'0'. +The pelement has only one child node (you can't tell that from this example, but look atpNode.childNodesif you don't believe me), and it is aTextnode for the single character'0'.@@ -7925,7 +7765,7 @@ Dive in - ![]()
The .data attribute of a Text node gives you the actual string that the text node represents. But what is that 'u' in front of the string? The answer to that deserves its own section. +The .data attribute of a Text node gives you the actual string that the text node represents. But what is that 'u' in front of the string? The answer to that deserves its own section.@@ -7947,20 +7787,20 @@ La Peña - ![]()
When printing a string, Python will attempt to convert it to your default encoding, which is usually ASCII. (More on this in a minute.) Since this unicode string is made up of characters that are also ASCII characters, printing it has the same result as printing a normal ASCII string; the conversion is seamless, and if you didn't know that s was a unicode string, you'd never notice the difference. +When printing a string, Python will attempt to convert it to your default encoding, which is usually ASCII. (More on this in a minute.) Since this unicode string is made up of characters that are also ASCII characters, printing it has the same result as printing a normal ASCII string; the conversion is seamless, and if you didn't know that s was a unicode string, you'd never notice the difference. - ![]()
Remember I said that the UnicodeErrorerror. +Remember I said that the - ![]()
Here's where the conversion-from-unicode-to-other-encoding-schemes comes in. sis a unicode string, butencodemethod, available on every unicode string, to convert the unicode string to a regular string in the given encoding scheme, +Here's where the conversion-from-unicode-to-other-encoding-schemes comes in. s is a unicode string, but encodemethod, available on every unicode string, to convert the unicode string to a regular string in the given encoding scheme, which you pass as a parameter. In this case, you're usinglatin-1(also known asiso-8859-1), which includes the tilde-n (whereas the default ASCII encoding scheme did not, since it only includes characters numbered 0 through 127).Remember I said Python usually converted unicode to ASCII whenever it needed to make a regular string out of a unicode string? Well, this default encoding scheme is an option which you can customize. -
Example 9.15.
sitecustomize.py+Example 9.15.
sitecustomize.py# sitecustomize.py# this file can be anywhere in your Python path, # but it usually goes in ${pythondir}/lib/site-packages/ @@ -7971,14 +7811,14 @@ sys.setdefaultencoding('iso-8859-1')
-
sitecustomize.pyis a special script; Python will try to import it on startup, so any code in it will be run automatically. As the comment mentions, it can go anywhere - (as long asimportcan find it), but it usually goes in thesite-packagesdirectory within your Pythonlibdirectory. +sitecustomize.pyis a special script; Python will try to import it on startup, so any code in it will be run automatically. As the comment mentions, it can go anywhere + (as long asimportcan find it), but it usually goes in thesite-packagesdirectory within your Pythonlibdirectory.@@ -7993,8 +7833,8 @@ La Peña - ![]()
setdefaultencodingfunction sets, well, the default encoding. This is the encoding scheme that Python will try to use whenever it needs to auto-coerce a unicode string into a regular string. +setdefaultencodingfunction sets, well, the default encoding. This is the encoding scheme that Python will try to use whenever it needs to auto-coerce a unicode string into a regular string.- ![]()
This example assumes that you have made the changes listed in the previous example to your sitecustomize.pyfile, and restarted Python. If your default encoding still says'ascii', you didn't set up yoursitecustomize.pyproperly, or you didn't restart Python. The default encoding can only be changed during Python startup; you can't change it later. (Due to some wacky programming tricks that I won't get into right now, you can't even - callsys.setdefaultencodingafter Python has started up. Dig intosite.pyand search for “setdefaultencoding” to find out how.) +This example assumes that you have made the changes listed in the previous example to your sitecustomize.pyfile, and restarted Python. If your default encoding still says'ascii', you didn't set up yoursitecustomize.pyproperly, or you didn't restart Python. The default encoding can only be changed during Python startup; you can't change it later. (Due to some wacky programming tricks that I won't get into right now, you can't even + callsys.setdefaultencodingafter Python has started up. Dig intosite.pyand search for “setdefaultencoding” to find out how.)@@ -8004,13 +7844,13 @@ La Peña -Example 9.17. Specifying encoding in
-.pyfilesIf you are going to be storing non-ASCII strings within your Python code, you'll need to specify the encoding of each individual
.pyfile by putting an encoding declaration at the top of each file. This declaration defines the.pyfile to be UTF-8:+Example 9.17. Specifying encoding in
+.pyfilesIf you are going to be storing non-ASCII strings within your Python code, you'll need to specify the encoding of each individual
.pyfile by putting an encoding declaration at the top of each file. This declaration defines the.pyfile to be UTF-8:#!/usr/bin/env python # -*- coding: UTF-8 -*-Now, what about XML? Well, every XML document is in a specific encoding. Again, ISO-8859-1 is a popular encoding for data in Western European languages. KOI8-R is popular for Russian texts. The encoding, if specified, is in the header of the XML document. -
Example 9.18.
russiansample.xml+Example 9.18.
russiansample.xml<?xml version="1.0" encoding="koi8-r"?><preface> <title>Предисловие</title>
@@ -8030,7 +7870,7 @@ is popular for Russian texts. The encoding, if specified, is in the header of t -
Example 9.19. Parsing
russiansample.xml+Example 9.19. Parsing
russiansample.xml>>> from xml.dom import minidom >>> xmldoc = minidom.parse('russiansample.xml')>>> title = xmldoc.getElementsByTagName('title')[0].firstChild.data @@ -8049,15 +7889,15 @@ UnicodeError: ASCII encoding error: ordinal not in range(128)
- ![]()
I'm assuming here that you saved the previous example as russiansample.xmlin the current directory. I am also, for the sake of completeness, assuming that you've changed your default encoding back - to'ascii'by removing yoursitecustomize.pyfile, or at least commenting out thesetdefaultencodingline. +I'm assuming here that you saved the previous example as russiansample.xmlin the current directory. I am also, for the sake of completeness, assuming that you've changed your default encoding back + to'ascii'by removing yoursitecustomize.pyfile, or at least commenting out thesetdefaultencodingline.- ![]()
Note that the text data of the titletag (now in thetitlevariable, thanks to that long concatenation of Python functions which I hastily skipped over and, annoyingly, won't explain until the next section) -- the text data inside the -XML document'stitleelement is stored in unicode. +Note that the text data of the titletag (now in the title variable, thanks to that long concatenation of Python functions which I hastily skipped over and, annoyingly, won't explain until the next section) -- the text data inside the +XML document'stitleelement is stored in unicode.@@ -8089,14 +7929,14 @@ in Python. If your XML documents are all 7-bit ASCI - Unicode Tutorial has some more examples of how to use Python's unicode functions, including how to force Python to coerce unicode into ASCII even when it doesn't really want to. -
- PEP 263 goes into more detail about how and when to define a character encoding in your
.pyfiles. +- PEP 263 goes into more detail about how and when to define a character encoding in your
.pyfiles.9.5. Searching for elements
Traversing XML documents by stepping through each node can be tedious. If you're looking for something in particular, buried deep within - your XML document, there is a shortcut you can use to find it quickly:
getElementsByTagName. -For this section, you'll be using the
binary.xmlgrammar file, which looks like this: -Example 9.20.
binary.xml<?xml version="1.0"?> + your XML document, there is a shortcut you can use to find it quickly:getElementsByTagName. +For this section, you'll be using the
binary.xmlgrammar file, which looks like this: +Example 9.20.