diff --git a/dip2 b/dip2 index 98e8cb6..40f0ffd 100644 --- a/dip2 +++ b/dip2 @@ -432,7 +432,7 @@ Type "help", "copyright", "credits", or "license" for more information. >>> [press Ctrl+D to exit] [root@localhost root]# which python2.3 /usr/bin/python2.3 -
+
  1. Whoops! Just typing python gives you the older version of Python -- the one that was installed by default. That's not the one you want.
  2. At the time of this writing, the newest version is called python2.3. You'll probably want to change the path on the first line of the sample scripts to point to the newer version. @@ -525,7 +525,7 @@ hello world >>> y = 2 >>> x + y 3 -
    +
    1. The Python interactive shell can evaluate arbitrary Python expressions, including any basic arithmetic expression.
    2. The interactive shell can execute arbitrary Python statements, including the print statement. @@ -632,7 +632,7 @@ NameError: There is no variable named 'x' >>> y 'b' >>> z -'e'
      +'e'
      1. v is a tuple of three elements, and (x, y, z) is a tuple of three variables. Assigning one to the other assigns each of the values of v to each of the variables, in order.

        This has all sorts of uses. I often want to assign names to a range of values. In C, you would use enum and manually list each constant and its associated value, which seems especially tedious when the values are consecutive. @@ -645,7 +645,7 @@ NameError: There is no variable named 'x' >>> TUESDAY 1 >>> SUNDAY -6

        +6
        1. The built-in range function returns a list of integers. In its simplest form, it takes an upper limit and returns a zero-based list counting up to but not including the upper limit. (If you like, you can pass other parameters to specify a base other than 0 and a step other than 1. You can print range.__doc__ for details.) @@ -671,7 +671,7 @@ NameError: There is no variable named 'x' [1, 9, 8, 4] >>> li = [elem*2 for elem in li] >>> li -[2, 18, 16, 8]
          +[2, 18, 16, 8]
          1. To make sense of this, look at it from right to left. li is the list you're mapping. Python loops through li one element at a time, temporarily assigning the value of each element to the variable elem. Python then applies the function elem*2 and appends that result to the returned list.
          2. Note that list comprehensions do not change the original list. @@ -685,7 +685,7 @@ NameError: There is no variable named 'x' >>> params.values() ['mpilgrim', 'sa', 'master', 'secret'] >>> params.items() -[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
            +[('server', 'mpilgrim'), ('uid', 'sa'), ('database', 'master'), ('pwd', 'secret')]
            1. The keys method of a dictionary returns a list of all the keys. The list is not in the order in which the dictionary was defined (remember that elements in a dictionary are unordered), but it is a list. @@ -701,7 +701,7 @@ as params.items(), but each element in the >>> [v for k, v in params.items()] ['mpilgrim', 'sa', 'master', 'secret'] >>> ["%s=%s" % (k, v) for k, v in params.items()] -['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
              +['server=mpilgrim', 'uid=sa', 'database=master', 'pwd=secret']
              1. Note that you're using two variables to iterate through the params.items() list. This is another use of multi-variable assignment. The first element of params.items() is ('server', 'mpilgrim'), so in the first iteration of the list comprehension, k will get 'server' and v will get 'mpilgrim'. In this case, you're ignoring the value of v and only including the value of k in the returned list, so this list comprehension ends up being equivalent to params.keys().
              2. Here you're doing the same thing, but ignoring the value of k, so this list comprehension ends up being equivalent to params.values(). @@ -774,7 +774,7 @@ def info(object, spacing=10, collapse=1): - print info.__doc__
                + print info.__doc__
                1. This module has one function, info. According to its function declaration, it takes three parameters: object, spacing, and collapse. The last two are actually optional parameters, as you'll see shortly.
                2. The info function has a multi-line docstring that succinctly describes the function's purpose. Note that no return value is mentioned; this function will be used solely @@ -818,7 +818,7 @@ Python, arguments can be specified by name, in any order. info(odbchelper) info(odbchelper, 12) info(odbchelper, collapse=0) -info(spacing=15, object=odbchelper)
                  +info(spacing=15, object=odbchelper)
                  1. With only one argument, spacing gets its default value of 10 and collapse gets its default value of 1.
                  2. With two arguments, collapse gets its default value of 1. @@ -852,7 +852,7 @@ time, you'll call functions the “normal” way, but you always have th <type 'module'> >>> import types >>> type(odbchelper) == types.ModuleType -True
                    +True
                    1. type takes anything -- and I mean anything -- and returns its datatype. Integers, strings, lists, dictionaries, tuples, functions, classes, modules, even types are acceptable. @@ -873,7 +873,7 @@ True
                      >>> str(odbchelper) "<module 'odbchelper' from 'c:\\docbook\\dip\\py\\odbchelper.py'>" >>> str(None) -'None'
                      +'None'
                      1. For simple datatypes like integers, you would expect str to work, because almost every language has a function to convert an integer to a string.
                      2. However, str works on any object of any type. Here it works on a list which you've constructed in bits and pieces. @@ -891,7 +891,7 @@ True
                        ['clear', 'copy', 'get', 'has_key', 'items', 'keys', 'setdefault', 'update', 'values'] >>> import odbchelper >>> dir(odbchelper) -['__builtins__', '__doc__', '__file__', '__name__', 'buildConnectionString']
                        +['__builtins__', '__doc__', '__file__', '__name__', 'buildConnectionString']
                        1. li is a list, so dir(li) returns a list of all the methods of a list. Note that the returned list contains the names of the methods as strings, not the methods themselves. @@ -915,7 +915,7 @@ True intervening occurrences of sep. The default separator is a single space. - (joinfields and join are synonymous)
                          + (joinfields and join are synonymous)
                          1. The functions in the string module are deprecated (although many people still use the join function), but the module contains a lot of useful constants like this string.punctuation, which contains all the standard punctuation characters.
                          2. string.join is a function that joins a list of strings. @@ -966,7 +966,7 @@ IOError I/O operation failed. >>> getattr((), "pop") Traceback (innermost last): File "<interactive input>", line 1, in ? -AttributeError: 'tuple' object has no attribute 'pop'
                            +AttributeError: 'tuple' object has no attribute 'pop'
                            1. This gets a reference to the pop method of the list. Note that this is not calling the pop method; that would be li.pop(). This is the method itself.
                            2. This also returns a reference to the pop method, but this time, the method name is specified as a string argument to the getattr function. getattr is an incredibly useful built-in function that returns any attribute of any object. In this case, the object is a list, @@ -991,7 +991,7 @@ AttributeError: 'tuple' object has no attribute 'pop'
                              >>> type(getattr(object, method)) == types.FunctionType True >>> callable(getattr(object, method)) -True
                              +True
                              1. This returns a reference to the buildConnectionString function in the odbchelper module, which you studied in Chapter 2, Your First Python Program. (The hex address you see is specific to my machine; your output will be different.)
                              2. Using getattr, you can get the same reference to the same function. In general, getattr(object, "attribute") is equivalent to object.attribute. If object is a module, then attribute can be anything defined in the module: a function, class, or global variable. @@ -1009,7 +1009,7 @@ import statsout def output(data, format="text"): output_function = getattr(statsout, "output_%s" % format) return output_function(data) -
                                +
                                1. The output function takes one required argument, data, and one optional argument, format. If format is not specified, it defaults to text, and you will end up calling the plain text output function.
                                2. You concatenate the format argument with "output_" to produce a function name, and then go get that function from the statsout module. This allows you to easily extend the program later to support other output formats, without changing this dispatch @@ -1025,7 +1025,7 @@ import statsout def output(data, format="text"): output_function = getattr(statsout, "output_%s" % format, statsout.output_text) return output_function(data) -
                                  +
                                  1. This function call is guaranteed to work, because you added a third argument to the call to getattr. The third argument is a default value that is returned if the attribute or method specified by the second argument wasn't found. @@ -1042,7 +1042,7 @@ so they are never put through the mapping expression and are not included in the >>> [elem for elem in li if elem != "b"] ['a', 'mpilgrim', 'foo', 'c', 'd', 'd'] >>> [elem for elem in li if li.count(elem) == 1] -['a', 'mpilgrim', 'foo', 'c']
                                    +['a', 'mpilgrim', 'foo', 'c']
                                    1. The mapping expression here is simple (it just returns the value of each element), so concentrate on the filter expression. As Python loops through the list, it runs each element through the filter expression. If the filter expression is true, the element @@ -1075,7 +1075,7 @@ the pop method of a list) and user-defined (like the buildCon >>> '' and 'b' '' >>> 'a' and 'b' and 'c' -'c'
                                      +'c'
                                      1. When using and, values are evaluated in a boolean context from left to right. 0, '', [], (), {}, and None are false in a boolean context; everything else is true. Well, almost everything. By default, instances of classes are true in a boolean context, but you can define special methods in your class to make an instance evaluate to false. You'll @@ -1092,7 +1092,7 @@ the pop method of a list) and user-defined (like the buildCon ... print "in sidefx()" ... return 1 >>> 'a' or sidefx() -'a'
                                        +'a'
                                        1. When using or, values are evaluated in a boolean context from left to right, just like and. If any value is true, or returns that value immediately. In this case, 'a' is the first true value.
                                        2. or evaluates '', which is false, then 'b', which is true, and returns 'b'. @@ -1107,7 +1107,7 @@ the pop method of a list) and user-defined (like the buildCon 'first' >>> 0 and a or b 'second' -
                                          +
                                          1. This syntax looks similar to the bool ? a : b expression in C. The entire expression is evaluated from left to right, so the and is evaluated first. 1 and 'first' evaluates to 'first', then 'first' or 'second' evaluates to 'first'.
                                          2. 0 and 'first' evaluates to 0, and then 0 or 'second' evaluates to 'second'. @@ -1116,7 +1116,7 @@ the pop method of a list) and user-defined (like the buildCon

                                            Example 4.18. When the and-or Trick Fails

                                            >>> a = ""
                                             >>> b = "second"
                                             >>> 1 and a or b         
                                            -'second'
                                            +'second'
                                            1. Since a is an empty string, which Python considers false in a boolean context, 1 and '' evaluates to '', and then '' or 'second' evaluates to 'second'. Oops! That's not what you wanted.

                                              The and-or trick, bool and a or b, will not work like the C expression bool ? a : b when a is false in a boolean context. @@ -1124,7 +1124,7 @@ the pop method of a list) and user-defined (like the buildCon

                                              Example 4.19. Using the and-or Trick Safely

                                              >>> a = ""
                                               >>> b = "second"
                                               >>> (1 and [a] or [b])[0] 
                                              -''
                                              +''
                                              1. Since [a] is a non-empty list, it is never false. Even if a is 0 or '' or some other false value, the list [a] is true because it has one element.
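To see the unsafe and safe forms side by side, here is a minimal sketch. The helper name choose is made up for illustration and does not appear in the book's examples; the rest is the same expression shown above.

```python
def choose(condition, a, b):
    """Hypothetical helper: return a if condition is true, otherwise b."""
    # Wrapping a and b in one-element lists keeps the middle operand true,
    # even when a itself is '' or 0 or None.
    return (condition and [a] or [b])[0]

naive = 1 and "" or "second"      # unsafe form: a is false, so b wins
safe = choose(1, "", "second")    # safe form: a is returned as intended

print(repr(naive))
print(repr(safe))
```

The unsafe form hands back b whenever a happens to be false, while the list-wrapped form returns a regardless of its truth value.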

                                                By now, this trick may seem like more trouble than it's worth. You could, after all, accomplish the same thing with an if statement, so why go through all this fuss? Well, in many cases, you are choosing between two constant values, so you can @@ -1147,7 +1147,7 @@ the pop method of a list) and user-defined (like the buildCon >>> g(3) 6 >>> (lambda x: x*2)(3) -6

                                                +6
                                                1. This is a lambda function that accomplishes the same thing as the normal function above it. Note the abbreviated syntax here: there are no parentheses around the argument list, and the return keyword is missing (it is implied, since the entire function can only be one expression). Also, the function has no name, @@ -1173,7 +1173,7 @@ a test >>> print s.split() ['this', 'is', 'a', 'test'] >>> print " ".join(s.split()) -'this is a test'
                                                  +'this is a test'
                                                  1. This is a multiline string, defined by escape characters instead of triple quotes. \n is a newline, and \t is a tab character.
                                                  2. split without any arguments splits on whitespace. So three spaces, a newline, and a tab character are all the same. @@ -1216,7 +1216,7 @@ for method in methodList

                                                    shows that this is a >>> print getattr(object, method).__doc__ Build a connection string from a dictionary of parameters. - Returns string.

                                                    + Returns string.
                                                    1. In the info function, object is the object you're getting help on, passed in as an argument.
                                                    2. As you're looping through methodList, method is the name of the current method. @@ -1231,7 +1231,7 @@ for method in methodList

                                                      shows that this is a >>> str(foo.__doc__) 'None' -

                                                      +
                                                      1. You can easily define a function that has no docstring, so its __doc__ attribute is None. Confusingly, if you evaluate the __doc__ attribute directly, the Python IDE prints nothing at all, which makes sense if you think about it, but is still unhelpful.
                                                      2. You can verify that the value of the __doc__ attribute is actually None by comparing it directly. @@ -1245,7 +1245,7 @@ True >>> s.ljust(30) 'buildConnectionString ' >>> s.ljust(20) -'buildConnectionString'
                                                        +'buildConnectionString'
                                                        1. ljust pads the string with spaces to the given length. This is what the info function uses to make two columns of output and line up all the docstrings in the second column.
                                                        2. If the given length is smaller than the length of the string, ljust will simply return the string unchanged. It never truncates the string. @@ -1254,7 +1254,7 @@ True >>> print "\n".join(li) a b -c
                                                          +c
                                                          1. This is also a useful debugging trick when you're working with lists. And in Python, you're always working with lists.
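As a small sketch of that debugging trick, the snippet below prints one list element per line. The numbers list is an assumed example; the point it illustrates is that join only accepts strings, so other values need to be converted first.

```python
li = ["a", "b", "c"]
joined = "\n".join(li)      # one element per line
print(joined)

# join works only on strings, so convert other values with str first
numbers = [1, 9, 8, 4]
numbered = "\n".join([str(n) for n in numbers])
print(numbered)
```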

                                                            That's the last piece of the puzzle. You should now understand this code. @@ -1391,7 +1391,7 @@ def listDirectory(directory, fileExtList): if __name__ == "__main__": for info in listDirectory("/music/_singles/", [".mp3"]): print "\n".join(["%s=%s" % (k, v) for k, v in info.items()]) - print

                                                            + print
                                                            1. This program's output depends on the files on your hard drive. To get meaningful output, you'll need to change the directory path to point to a directory of MP3 files on your own machine. @@ -1466,7 +1466,7 @@ can import individual items or use from module import * NameError: There is no variable named 'FunctionType' >>> from types import FunctionType >>> FunctionType -<type 'function'>
                                                              +<type 'function'>
                                                              1. The types module contains no methods; it just has attributes for each Python object type. Note that the attribute, FunctionType, must be qualified by the module name, types.
                                                              2. FunctionType by itself has not been defined in this namespace; it exists only in the context of types. @@ -1505,7 +1505,7 @@ NameError: There is no variable named 'FunctionType'

                                                                Example 5.4. Defining the FileInfo Class

                                                                
                                                                 from UserDict import UserDict
                                                                 
                                                                -class FileInfo(UserDict): 
                                                                +class FileInfo(UserDict):
                                                                1. In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. So the FileInfo class is inherited from the UserDict class (which was imported from the UserDict module). UserDict is a class that acts like a dictionary, allowing you to essentially subclass the dictionary datatype and add your own behavior. (There are similar classes UserList and UserString which allow you to subclass lists and strings.) There is a bit of black magic behind this, which you will demystify later @@ -1517,84 +1517,14 @@ class FileInfo(UserDict):

                                                                  Python supports multiple inheritance. In the parentheses following the class name, you can list as many ancestor classes as you like, separated by commas.
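A minimal sketch of what that looks like in practice follows. The class names Named, Sized, and Track are made up for illustration and are not part of the fileinfo module.

```python
class Named:
    def describe(self):
        return "name=%s" % self.name

class Sized:
    def size_label(self):
        return "size=%d" % self.size

# Both ancestors are listed in the parentheses, separated by a comma
class Track(Named, Sized):
    def __init__(self, name, size):
        self.name = name
        self.size = size

t = Track("kairo.mp3", 1024)
print(t.describe())
print(t.size_label())
```

The Track instance picks up describe from Named and size_label from Sized without defining either method itself.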

                                                                  5.3.1. Initializing and Coding Classes

                                                                  -

                                                                  This example shows the initialization of the FileInfo class using the __init__ method. -

                                                                  Example 5.5. Initializing the FileInfo Class

                                                                  
                                                                  -class FileInfo(UserDict):
                                                                  -    "store file metadata"              
                                                                  -    def __init__(self, filename=None):   
                                                                  -
                                                                    -
                                                                  1. Classes can (and should) have docstrings too, just like modules and functions. -
                                                                  2. __init__ is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor - of the class. It's tempting, because it looks like a constructor (by convention, __init__ is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance - of the class), and even sounds like one (“init” certainly suggests a constructor-ish nature). Incorrect, because the object has already been constructed by the time __init__ is called, and you already have a valid reference to the new instance of the class. But __init__ is the closest thing you're going to get to a constructor in Python, and it fills much the same role. -
                                                                  3. The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always named self. In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although - you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically. -
                                                                  4. __init__ methods can take any number of arguments, and just like functions, the arguments can be defined with default values, making - them optional to the caller. In this case, filename has a default value of None, which is the Python null value. - -
                                                                    NoteBy convention, the first argument of any Python class method (the reference to the current instance) is called self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention. -

                                                                    Example 5.6. Coding the FileInfo Class

                                                                    
                                                                    -class FileInfo(UserDict):
                                                                    -    "store file metadata"
                                                                    -    def __init__(self, filename=None):
                                                                    -        UserDict.__init__(self)        
                                                                    -        self["name"] = filename        
                                                                    -
                                                                    -
                                                                      -
                                                                    1. Some pseudo-object-oriented languages like Powerbuilder have a concept of “extending” constructors and other events, where the ancestor's method is called automatically before the descendant's method is executed. - Python does not do this; you must always explicitly call the appropriate method in the ancestor class. -
                                                                    2. I told you that this class acts like a dictionary, and here is the first sign of it. You're assigning the argument filename as the value of this object's name key. -
                                                                    3. Note that the __init__ method never returns a value. -

                                                                      5.3.2. Knowing When to Use self and __init__

                                                                      -

                                                                      When defining your class methods, you must explicitly list self as the first argument for each method, including __init__. When you call a method of an ancestor class from within your class, you must include the self argument. But when you call your class method from outside, you do not specify anything for the self argument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent, - but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don't know - about yet. -

                                                                      Whew. I realize that's a lot to absorb, but you'll get the hang of it. All Python classes work the same way, so once you learn one, you've learned them all. If you forget everything else, remember this - one thing, because I promise it will trip you up: - -
                                                                      Note__init__ methods are optional, but when you define one, you must remember to explicitly call the ancestor's __init__ method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, - the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments. -
                                                                      -

                                                                      Further Reading on Python Classes

                                                                      - -

                                                                      5.4. Instantiating Classes

                                                                      -

                                                                      Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the -__init__ method defines. The return value will be the newly created object. -

                                                                      Example 5.7. Creating a FileInfo Instance

                                                                      >>> import fileinfo
                                                                      ->>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3") 
                                                                      ->>> f.__class__    
                                                                      -<class fileinfo.FileInfo at 010EC204>
                                                                      ->>> f.__doc__      
                                                                      -'store file metadata'
                                                                      ->>> f              
                                                                      -{'name': '/music/_singles/kairo.mp3'}
                                                                      -
                                                                        -
                                                                      1. You are creating an instance of the FileInfo class (defined in the fileinfo module) and assigning the newly created instance to the variable f. You are passing one parameter, /music/_singles/kairo.mp3, which will end up as the filename argument in FileInfo's __init__ method. -
                                                                      2. Every class instance has a built-in attribute, __class__, which is the object's class. (Note that the representation of this includes the physical address of the instance on my - machine; your representation will be different.) Java programmers may be familiar with the Class class, which contains methods like getName and getSuperclass to get metadata information about an object. In Python, this kind of metadata is available directly on the object itself through attributes like __class__, __name__, and __bases__. -
                                                                      3. You can access the instance's docstring just as with a function or a module. All instances of a class share the same docstring. -
                                                                      4. Remember when the __init__ method assigned its filename argument to self["name"]? Well, here's the result. The arguments you pass when you create the class instance get sent right along to the __init__ method (along with the object reference, self, which Python adds for free). - - -
                                                                        NoteIn Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit new operator like C++ or Java. -

                                                                        5.4.1. Garbage Collection

                                                                        If creating new instances is easy, destroying them is even easier. In general, there is no need to explicitly free instances, because they are freed automatically when the variables assigned to them go out of scope. Memory leaks are rare in Python.
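One way to watch this happen is with the standard weakref module, which lets you observe an object without keeping it alive. This is only a sketch: Resource is a hypothetical stand-in for FileInfo, and the result assumes CPython, whose reference counting frees an object as soon as its last reference disappears. Other Python implementations may wait longer.

```python
import weakref

class Resource(object):
    """Hypothetical stand-in for FileInfo."""
    pass

def make_and_drop():
    r = Resource()            # r is the only strong reference
    return weakref.ref(r)     # a weak reference does not keep r alive

ref = make_and_drop()
# Once the function returns, r is out of scope; on CPython the
# instance is already gone and the weak reference returns None.
alive = ref() is not None
print(alive)
```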

                                                                        Example 5.8. Trying to Implement a Memory Leak

                                                                        >>> def leakmem():
                                                                         ...    f = fileinfo.FileInfo('/music/_singles/kairo.mp3') 
                                                                         ...    
                                                                         >>> for i in range(100):
                                                                        -...    leakmem()      
                                                                        +... leakmem()
                                                                        1. Every time the leakmem function is called, you are creating an instance of FileInfo and assigning it to the variable f, which is a local variable within the function. Then the function ends without ever freeing f, so you would expect a memory leak, but you would be wrong. When the function ends, the local variable f goes out of scope. At this point, there are no longer any references to the newly created instance of FileInfo (since you never assigned it to anything other than f), so Python destroys the instance for you.
                                                                        2. No matter how many times you call the leakmem function, it will never leak memory, because every time, Python will destroy the newly created FileInfo instance before returning from leakmem. @@ -1623,14 +1553,14 @@ class UserDict: def __init__(self, dict=None): self.data = {} if dict is not None: self.update(dict) -
                                                                          +
                                                                          1. Note that UserDict is a base class, not inherited from any other class.
                                                                          2. This is the __init__ method that you overrode in the FileInfo class. Note that the argument list in this ancestor class is different from that of the descendant. That's okay; each subclass can have its own set of arguments, as long as it calls the ancestor with the correct arguments. Here the ancestor class has a way to define initial values (by passing a dictionary in the dict argument), which FileInfo does not use.
                                                                          3. Python supports data attributes (called “instance variables” in Java and Powerbuilder, and “member variables” in C++). Data attributes are pieces of data held by a specific instance of a class. In this case, each instance of UserDict will have a data attribute data. To reference this attribute from code outside the class, you qualify it with the instance name, instance.data, in the same way that you qualify a function with its module name. To reference a data attribute from within the class, - you use self as the qualifier. By convention, all data attributes are initialized to reasonable values in the __init__ method. However, this is not required, since data attributes, like local variables, spring into existence when they are first assigned a value. + you use self as the qualifier. By convention, all data attributes are initialized to reasonable values in the __init__ method. However, this is not required, since data attributes, like local variables, spring into existence when they are first assigned a value.
                                                                          4. The update method is a dictionary duplicator: it copies all the keys and values from one dictionary to another. This does not clear the target dictionary first; if the target dictionary already has some keys, the ones from the source dictionary will be overwritten, but others will be left untouched. Think of update as a merge function, not a copy function.
                                                                          5. This is a syntax you may not have seen before (I haven't used it in the examples in this book). It's an if statement, but instead of having an indented block starting on the next line, there is just a single statement on the same @@ -1661,14 +1591,14 @@ class UserDict: def keys(self): return self.data.keys() def items(self): return self.data.items() def values(self): return self.data.values() -
                                                                            +
                                                                              -
                                                                            1. clear is a normal class method; it is publicly available to be called by anyone at any time. Notice that clear, like all class methods, has self as its first argument. (Remember that you don't include self when you call the method; it's something that Python adds for you.) Also note the basic technique of this wrapper class: store a real dictionary (data) as a data attribute, define all the methods that a real dictionary has, and have each class method redirect to the corresponding +
                                                                            2. clear is a normal class method; it is publicly available to be called by anyone at any time. Notice that clear, like all class methods, has self as its first argument. (Remember that you don't include self when you call the method; it's something that Python adds for you.) Also note the basic technique of this wrapper class: store a real dictionary (data) as a data attribute, define all the methods that a real dictionary has, and have each class method redirect to the corresponding method on the real dictionary. (In case you'd forgotten, a dictionary's clear method deletes all of its keys and their associated values.)
                                                                            3. The copy method of a real dictionary returns a new dictionary that is an exact duplicate of the original (all the same key-value pairs). - But UserDict can't simply redirect to self.data.copy, because that method returns a real dictionary, and what you want is to return a new instance that is the same class as self. -
                                                                            4. You use the __class__ attribute to see if self is a UserDict; if so, you're golden, because you know how to copy a UserDict: just create a new UserDict and give it the real dictionary that you've squirreled away in self.data. Then you immediately return the new UserDict; you don't even get to the import copy on the next line. -
                                                                            5. If self.__class__ is not UserDict, then self must be some subclass of UserDict (like maybe FileInfo), in which case life gets trickier. UserDict doesn't know how to make an exact copy of one of its descendants; there could, for instance, be other data attributes defined + But UserDict can't simply redirect to self.data.copy, because that method returns a real dictionary, and what you want is to return a new instance that is the same class as self. +
                                                                            6. You use the __class__ attribute to see if self is a UserDict; if so, you're golden, because you know how to copy a UserDict: just create a new UserDict and give it the real dictionary that you've squirreled away in self.data. Then you immediately return the new UserDict; you don't even get to the import copy on the next line. +
                                                                            7. If self.__class__ is not UserDict, then self must be some subclass of UserDict (like maybe FileInfo), in which case life gets trickier. UserDict doesn't know how to make an exact copy of one of its descendants; there could, for instance, be other data attributes defined in the subclass, so you would need to iterate through them and make sure to copy all of them. Luckily, Python comes with a module to do exactly this, and it's called copy. I won't go into the details here (though it's a wicked cool module, if you're ever inclined to dive into it on your own). Suffice it to say that copy can copy arbitrary Python objects, and that's how you're using it here.
                                                                            8. The rest of the methods are straightforward, redirecting the calls to the built-in methods on self.data. @@ -1682,7 +1612,7 @@ class FileInfo(dict): "store file metadata" def __init__(self, filename=None): self["name"] = filename -
                                                                              +
                                                                              1. The first difference is that you don't need to import the UserDict module, since dict is a built-in datatype and is always available. The second is that you are inheriting from dict directly, instead of from UserDict.UserDict.
                                                                              2. The third difference is subtle but important. Because of the way UserDict works internally, it requires you to manually call its __init__ method to properly initialize its internal data structures. dict does not work like this; it is not a wrapper, and it requires no explicit initialization. @@ -1706,7 +1636,7 @@ provide a way to map non-method-calling syntax into method calls. >>> f.__getitem__("name") '/music/_singles/kairo.mp3' >>> f["name"] -'/music/_singles/kairo.mp3'
                                                                                +'/music/_singles/kairo.mp3'
                                                                                1. The __getitem__ special method looks simple enough. Like the normal methods clear, keys, and values, it just redirects to the dictionary to return its value. But how does it get called? Well, you can call __getitem__ directly, but in practice you wouldn't actually do that; I'm just doing it here to show you how it works. The right way to use __getitem__ is to get Python to call it for you. @@ -1720,7 +1650,7 @@ provide a way to map non-method-calling syntax into method calls. {'name':'/music/_singles/kairo.mp3', 'genre':31} >>> f["genre"] = 32 >>> f -{'name':'/music/_singles/kairo.mp3', 'genre':32}
                                                                                  +{'name':'/music/_singles/kairo.mp3', 'genre':32}
                                                                                  1. Like the __getitem__ method, __setitem__ simply redirects to the real dictionary self.data to do its work. And like __getitem__, you wouldn't ordinarily call it directly like this; Python calls __setitem__ for you when you use the right syntax.
                                                                                  2. This looks like regular dictionary syntax, except of course that f is really a class that's trying very hard to masquerade as a dictionary, and __setitem__ is an essential part of that masquerade. This line of code actually calls f.__setitem__("genre", 32) under the covers. @@ -1734,7 +1664,7 @@ provide a way to map non-method-calling syntax into method calls. def __setitem__(self, key, item): if key == "name" and item: self.__parse(item) - FileInfo.__setitem__(self, key, item)
                                                                                    + FileInfo.__setitem__(self, key, item)
                                                                                    1. Notice that this __setitem__ method is defined exactly the same way as the ancestor method. This is important, since Python will be calling the method for you, and it expects it to be defined with a certain number of arguments. (Technically speaking, the names of the arguments don't matter; only the number of arguments is important.) @@ -1758,7 +1688,7 @@ provide a way to map non-method-calling syntax into method calls. >>> mp3file {'album': '', 'artist': 'The Cynic Project', 'genre': 18, 'title': 'Sidewinder', 'name': '/music/_singles/sidewinder.mp3', 'year': '2000', -'comment': 'http://mp3.com/cynicproject'}
                                                                                      +'comment': 'http://mp3.com/cynicproject'}
                                                                                      1. First, you create an instance of MP3FileInfo, without passing it a filename. (You can get away with this because the filename argument of the __init__ method is optional.) Since MP3FileInfo has no __init__ method of its own, Python walks up the ancestor tree and finds the __init__ method of FileInfo. This __init__ method manually calls the __init__ method of UserDict and then sets the name key to filename, which is None, since you didn't pass a filename. Thus, mp3file initially looks like a dictionary with one key, name, whose value is None. @@ -1777,7 +1707,7 @@ provide a way to map non-method-calling syntax into method calls. else: return cmp(self.data, dict) def __len__(self): return len(self.data) - def __delitem__(self, key): del self.data[key]
                                                                                        + def __delitem__(self, key): del self.data[key]
                                                                                        1. __repr__ is a special method that is called when you call repr(instance). The repr function is a built-in function that returns a string representation of an object. It works on any object, not just class instances. You're already intimately familiar with repr and you don't even know it. In the interactive window, when you type just a variable name and press the ENTER key, Python uses repr to display the variable's value. Go create a dictionary d with some data and then print repr(d) to see for yourself. @@ -1833,7 +1763,7 @@ class MP3FileInfo(FileInfo): 'artist': (33, 63, <function stripnulls at 0260C8D4>), 'year': (93, 97, <function stripnulls at 0260C8D4>), 'comment': (97, 126, <function stripnulls at 0260C8D4>), -'album': (63, 93, <function stripnulls at 0260C8D4>)}
                                                                                          +'album': (63, 93, <function stripnulls at 0260C8D4>)}
                                                                                          1. MP3FileInfo is the class itself, not any particular instance of the class.
                                                                                          2. tagDataMap is a class attribute: literally, an attribute of the class. It is available before creating any instances of the class. @@ -1865,7 +1795,7 @@ class MP3FileInfo(FileInfo): >>> c.count 2 >>> counter.count -2
                                                                                            +2
                                                                                            1. count is a class attribute of the counter class.
                                                                                            2. __class__ is a built-in attribute of every class instance (of every class). It is a reference to the class that self is an instance of (in this case, the counter class). @@ -1896,7 +1826,7 @@ call it directly (even from outside the fileinfo module) if you had >>> m.__parse("/music/_singles/kairo.mp3") Traceback (innermost last): File "<interactive input>", line 1, in ? -AttributeError: 'MP3FileInfo' instance has no attribute '__parse'
                                                                                              +AttributeError: 'MP3FileInfo' instance has no attribute '__parse'
                                                                                              1. If you try to call a private method, Python will raise a slightly misleading exception, saying that the method does not exist. Of course it does exist, but it's private, so it's not accessible outside the class. Strictly speaking, private methods are accessible outside their class, just not easily accessible. Nothing in Python is truly private; internally, the names of private methods and attributes are mangled and unmangled on the fly to make them @@ -1963,7 +1893,7 @@ IOError: [Errno 2] No such file or directory: '/notthere' ... print "The file does not exist, exiting gracefully" ... print "This line will always print" The file does not exist, exiting gracefully -This line will always print
                                                                                                +This line will always print
                                                                                                1. Using the built-in open function, you can try to open a file for reading (more on open in the next section). But the file doesn't exist, so this raises the IOError exception. Since you haven't provided any explicit check for an IOError exception, Python just prints out some debugging information about what happened and then gives up.
                                                                                                2. You're trying to open the same non-existent file, but this time you're doing it within a try...except block. @@ -1999,7 +1929,7 @@ exceptions, errors occur immediately, and you can handle them in a standard way else: getpass = win_getpass else: - getpass = unix_getpass
                                                                                                  + getpass = unix_getpass
                                                                                                  1. termios is a UNIX-specific module that provides low-level control over the input terminal. If this module is not available (because it's not on your system, or your system doesn't support it), the import fails and Python raises an ImportError, which you catch. @@ -2031,7 +1961,7 @@ exceptions, errors occur immediately, and you can handle them in a standard way >>> f.mode 'rb' >>> f.name -'/music/_singles/kairo.mp3'
                                                                                                    +'/music/_singles/kairo.mp3'
                                                                                                    1. The open function can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename, is required; the other two are optional. If the mode is not specified, the file is opened for reading in text mode. Here you are opening the file for reading in binary mode. @@ -2054,7 +1984,7 @@ exceptions, errors occur immediately, and you can handle them in a standard way 'TAGKAIRO****THE BEST GOA ***DJ MARY-JANE*** Rave Mix 2000http://mp3.com/DJMARYJANE \037' >>> f.tell() -7543037
                                                                                                      +7543037
                                                                                                      1. A file object maintains state about the file it has open. The tell method of a file object tells you your current position in the open file. Since you haven't done anything with this file yet, the current position is 0, which is the beginning of the file. @@ -2091,7 +2021,7 @@ ValueError: I/O operation on closed file Traceback (innermost last): File "<interactive input>", line 1, in ? ValueError: I/O operation on closed file ->>> f.close()
                                                                                                        +>>> f.close()
                                                                                                        1. The closed attribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (closed is False).
                                                                                                        2. To close a file, call the close method of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any) @@ -2115,7 +2045,7 @@ ValueError: I/O operation on closed file . . except IOError: - pass
                                                                                                          + pass
                                                                                                          1. Because opening and reading files is risky and may raise an exception, all of this code is wrapped in a try...except block. (Hey, isn't standardized indentation great? This is where you start to appreciate it.)
                                                                                                          2. The open function may raise an IOError. (Maybe the file doesn't exist.) @@ -2145,7 +2075,7 @@ test succeeded >>> logfile.close() >>> print file('test.log').read() test succeededline 2 -
                                                                                                            +
                                                                                                            1. You start boldly by either creating the new file test.log or overwriting the existing file, and opening the file for writing. (The second parameter "w" means open the file for writing.) Yes, that's all as dangerous as it sounds. I hope you didn't care about the previous contents of that file, because it's gone now. @@ -2180,7 +2110,7 @@ e >>> print "\n".join(li) a b -e
                                                                                                              +e
                                                                                                              1. The syntax for a for loop is similar to list comprehensions. li is a list, and s will take the value of each element in turn, starting from the first element.
                                                                                                              2. Like an if statement or any other indented block, a for loop can have any number of lines of code in it. @@ -2202,7 +2132,7 @@ b c d e -
                                                                                                                +
                                                                                                                1. As you saw in Example 3.20, “Assigning Consecutive Values”, range produces a list of integers, which you then loop through. I know it looks a bit odd, but it is occasionally (and I stress occasionally) useful to have a counter loop. @@ -2225,7 +2155,7 @@ OS=Windows_NT COMPUTERNAME=MPILGRIM USERNAME=mpilgrim -[...snip...]
                                                                                                                  +[...snip...]
                                                                                                                  1. os.environ is a dictionary of the environment variables defined on your system. In Windows, these are your user and system variables accessible from MS-DOS. In UNIX, they are the variables exported in your shell's startup scripts. In Mac OS, there is no concept of environment variables, so this dictionary is empty. @@ -2246,7 +2176,7 @@ USERNAME=mpilgrim . if tagdata[:3] == "TAG": for tag, (start, end, parseFunc) in self.tagDataMap.items(): - self[tag] = parseFunc(tagdata[start:end])
                                                                                                                    + self[tag] = parseFunc(tagdata[start:end])
                                                                                                                    1. tagDataMap is a class attribute that defines the tags you're looking for in an MP3 file. Tags are stored in fixed-length fields. Once you read the last 128 bytes of the file, bytes 3 through 32 of those are always the song title, 33 through 62 are always the artist name, 63 through 92 are the album name, and so forth. Note @@ -2269,7 +2199,7 @@ __builtin__ site signal UserDict -stat
                                                                                                                      +stat
                                                                                                                      1. The sys module contains system-level information, such as the version of Python you're running (sys.version or sys.version_info), and system-level options such as the maximum allowed recursion depth (sys.getrecursionlimit() and sys.setrecursionlimit()).
                                                                                                                      2. sys.modules is a dictionary containing all the modules that have ever been imported since Python was started; the key is the module name, the value is the module object. Note that this is more than just the modules your program has imported. Python preloads some modules on startup, and if you're using a Python IDE, sys.modules contains all the modules imported by all the programs you've run within the IDE. @@ -2293,7 +2223,7 @@ stat >>> fileinfo <module 'fileinfo' from 'fileinfo.pyc'> >>> sys.modules["fileinfo"] -<module 'fileinfo' from 'fileinfo.pyc'>
                                                                                                                        +<module 'fileinfo' from 'fileinfo.pyc'>
                                                                                                                        1. As new modules are imported, they are added to sys.modules. This explains why importing the same module twice is very fast: Python has already loaded and cached the module in sys.modules, so importing the second time is simply a dictionary lookup.
                                                                                                                        2. Given the name (as a string) of any previously-imported module, you can get a reference to the module itself through the sys.modules dictionary. @@ -2302,7 +2232,7 @@ stat >>> MP3FileInfo.__module__ 'fileinfo' >>> sys.modules[MP3FileInfo.__module__] -<module 'fileinfo' from 'fileinfo.pyc'>
                                                                                                                          +<module 'fileinfo' from 'fileinfo.pyc'>
                                                                                                                          1. Every Python class has a built-in class attribute __module__, which is the name of the module in which the class is defined.
                                                                                                                          2. Combining this with the sys.modules dictionary, you can get a reference to the module in which a class is defined. @@ -2311,7 +2241,7 @@ stat def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]): "get file info class from filename extension" subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:] - return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
                                                                                                                            + return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
                                                                                                                            1. This is a function with two arguments; filename is required, but module is optional and defaults to the module that contains the FileInfo class. This looks inefficient, because you might expect Python to evaluate the sys.modules expression every time the function is called. In fact, Python evaluates default expressions only once, when the def statement itself is executed, which happens the first time the module is imported. As you'll see later, you never call this function with a module argument, so module serves as a function-level constant. @@ -2338,7 +2268,7 @@ stat >>> os.path.expanduser("~") 'c:\\Documents and Settings\\mpilgrim\\My Documents' >>> os.path.join(os.path.expanduser("~"), "Python") -'c:\\Documents and Settings\\mpilgrim\\My Documents\\Python'
                                                                                                                              +'c:\\Documents and Settings\\mpilgrim\\My Documents\\Python'
                                                                                                                              1. os.path is a reference to a module -- which module depends on your platform. Just as getpass encapsulates differences between platforms by setting getpass to a platform-specific function, os encapsulates differences between platforms by setting path to a platform-specific module.
                                                                                                                              2. The join function of os.path constructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings. (Note that dealing @@ -2359,7 +2289,7 @@ stat >>> shortname 'mahadeva' >>> extension -'.mp3'
                                                                                                                                +'.mp3'
                                                                                                                                1. The split function splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use multi-variable assignment to return multiple values from a function? Well, split is such a function. @@ -2387,7 +2317,7 @@ stat ... if os.path.isdir(os.path.join(dirname, f))] ['cygwin', 'docbook', 'Documents and Settings', 'Incoming', 'Inetpub', 'Music', 'Program Files', 'Python20', 'RECYCLER', -'System Volume Information', 'TEMP', 'WINNT']
                                                                                                                                  +'System Volume Information', 'TEMP', 'WINNT']
                                                                                                                                  1. The listdir function takes a pathname and returns a list of the contents of the directory.
                                                                                                                                  2. listdir returns both files and folders, with no indication of which is which. @@ -2401,7 +2331,7 @@ def listDirectory(directory, fileExtList): for f in os.listdir(directory)] fileList = [os.path.join(directory, f) for f in fileList - if os.path.splitext(f)[1] in fileExtList]
                                                                                                                                    + if os.path.splitext(f)[1] in fileExtList]
                                                                                                                                    1. os.listdir(directory) returns a list of all the files and folders in directory.
                                                                                                                                    2. Iterating through the list with f, you use os.path.normcase(f) to normalize the case according to operating system defaults. normcase is a useful little function that compensates for case-insensitive operating systems that think that mahadeva.mp3 and mahadeva.MP3 are the same file. For instance, on Windows and Mac OS, normcase will convert the entire filename to lowercase; on UNIX-compatible systems, it will return the filename unchanged. @@ -2431,7 +2361,7 @@ may already be familiar with from working on the command line. ['c:\\music\\_singles\\sidewinder.mp3', 'c:\\music\\_singles\\spinning.mp3'] >>> glob.glob('c:\\music\\*\\*.mp3') -
                                                                                                                                      +
                                                                                                                                      1. As you saw earlier, os.listdir simply takes a directory path and lists all files and directories in that directory.
                                                                                                                                      2. The glob module, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. @@ -2461,7 +2391,7 @@ def listDirectory(directory, fileExtList): "get file info class from filename extension" subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:] return hasattr(module, subclass) and getattr(module, subclass) or FileInfo - return [getFileInfoClass(f)(f) for f in fileList]
                                                                                                                                        + return [getFileInfoClass(f)(f) for f in fileList]
                                                                                                                                        1. listDirectory is the main attraction of this entire module. It takes a directory (like c:\music\_singles\ in my case) and a list of interesting file extensions (like ['.mp3']), and it returns a list of class instances that act like dictionaries that contain metadata about each interesting file in that directory. And it does it in just a few straightforward lines of code. @@ -2925,7 +2855,7 @@ data: '\n ' <td width='99%' align='right'><hr size='1' noshade></td></tr> <tr><td class='tagline' colspan='2'>Python&nbsp;for&nbsp;experienced&nbsp;programmers</td></tr> -[...snip...]
                                                                                                                                          +[...snip...]
                                                                                                                                          1. The urllib module is part of the standard Python library. It contains functions for getting information about and actually retrieving data from Internet-based URLs (mainly web pages).
                                                                                                                                          2. The simplest use of urllib is to retrieve the entire text of a web page using the urlopen function. Opening a URL is similar to opening a file. The return value of urlopen is a file-like object, which has some of the same methods as a file object. @@ -2945,7 +2875,7 @@ class URLLister(SGMLParser): def start_a(self, attrs): href = [v for k, v in attrs if k=='href'] if href: - self.urls.extend(href)
                                                                                                                                            + self.urls.extend(href)
                                                                                                                                            1. reset is called by the __init__ method of SGMLParser, and it can also be called manually once an instance of the parser has been created. So if you need to do any initialization, do it in reset, not in __init__, so that it will be re-initialized properly when someone re-uses a parser instance. @@ -2974,7 +2904,7 @@ download/diveintopython3-xml-5.0.zip download/diveintopython3-common-5.0.zip -... rest of output omitted for brevity ...
                                                                                                                                              +... rest of output omitted for brevity ...
                                                                                                                                              1. Call the feed method, defined in SGMLParser, to get HTML into the parser. [1] It takes a string, which is what usock.read() returns. @@ -3017,7 +2947,7 @@ class BaseHTMLProcessor(SGMLParser): self.pieces.append("<?%(text)s>" % locals()) def handle_decl(self, text): - self.pieces.append("<!%(text)s>" % locals())
                                                                                                                                                + self.pieces.append("<!%(text)s>" % locals())
                                                                                                                                                1. reset, called by SGMLParser.__init__, initializes self.pieces as an empty list before calling the ancestor method. self.pieces is a data attribute which will hold the pieces of the HTML document you're constructing. Each handler method will reconstruct the HTML that SGMLParser parsed, and each method will append that string to self.pieces. Note that self.pieces is a list. You might be tempted to define it as a string and just keep appending each piece to it. That would work, but Python is much more efficient at dealing with lists. @@ -3037,7 +2967,7 @@ Python is much more efficient at dealing with lists.

                                                                                                                                                  Example 8.9. BaseHTMLProcessor output

                                                                                                                                                  
                                                                                                                                                       def output(self):               
                                                                                                                                                           """Return processed HTML as a single string"""
                                                                                                                                                  -        return "".join(self.pieces) 
                                                                                                                                                  + return "".join(self.pieces)
                                                                                                                                                  1. This is the one method in BaseHTMLProcessor that is never called by the ancestor SGMLParser. Since the other handler methods store their reconstructed HTML in self.pieces, this function is needed to join all those pieces into one string. As noted before, Python is great at lists and mediocre at strings, so you only create the complete string when somebody explicitly asks for it.
                                                                                                                                                  2. If you prefer, you could use the join method of the string module instead: string.join(self.pieces, "")
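The list-then-join pattern described above can be sketched in isolation. This is only an illustration: PieceCollector and its two handler methods are hypothetical stand-ins, not the actual BaseHTMLProcessor code, and the single-argument print call is used so the sketch runs on both older and newer Python versions.

```python
class PieceCollector:
    """Hypothetical stand-in for the buffering done by BaseHTMLProcessor."""

    def __init__(self):
        # Fragments are collected in a list, not concatenated as you go.
        self.pieces = []

    def handle_comment(self, text):
        self.pieces.append("<!--%s-->" % text)

    def handle_data(self, text):
        self.pieces.append(text)

    def output(self):
        """Return everything collected so far as a single string."""
        return "".join(self.pieces)


collector = PieceCollector()
collector.handle_comment("Begin page footer")
collector.handle_data("Hello")
result = collector.output()
print(result)
```

The complete string is only built when output is called; until then, each handler does nothing more expensive than a list append.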
                                                                                                                                                    @@ -3085,7 +3015,7 @@ from __future__ import nested_scopes

                                                                                                                                                    Are you confused yet? Don't despai >>> foo(7) {'arg': 7, 'x': 1} >>> foo('bar') -{'arg': 'bar', 'x': 1}

                                                                                                                                                    +{'arg': 'bar', 'x': 1}
                                                                                                                                                    1. The function foo has two variables in its local namespace: arg, whose value is passed in to the function, and x, which is defined within the function.
                                                                                                                                                    2. locals returns a dictionary of name/value pairs. The keys of this dictionary are the names of the variables as strings; the values @@ -3101,7 +3031,7 @@ access them directly without referencing the original module they came from. Wit

                                                                                                                                                      Look at the following block of code at the bottom of BaseHTMLProcessor.py:

                                                                                                                                                      
                                                                                                                                                       if __name__ == "__main__":
                                                                                                                                                           for k, v in globals().items():             
                                                                                                                                                      -        print k, "=", v
                                                                                                                                                      + print k, "=", v
                                                                                                                                                      1. Just so you don't get intimidated, remember that you've seen all this before. The globals function returns a dictionary, and you're iterating through the dictionary using the items method and multi-variable assignment. The only thing new here is the globals function.
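If you want to try the same idea outside of BaseHTMLProcessor.py, here is a small sketch. The describe_namespace helper is an assumption made for this illustration (it is not part of the book's code); it takes any dictionary, so you can hand it the result of globals() or a dictionary of your own. Names that start with a double underscore are skipped only to keep the output short.

```python
def describe_namespace(namespace):
    """Return 'name = value' lines for a namespace dictionary (hypothetical helper)."""
    lines = []
    # Sort the names so the output is stable from run to run.
    for name in sorted(namespace):
        if name.startswith("__"):
            continue
        lines.append("%s = %r" % (name, namespace[name]))
    return lines


# The list is fully built before the loop variable is bound,
# so the module namespace does not change while it is being read.
for line in describe_namespace(globals()):
    print(line)
```

What you see will depend on what has been imported or defined in your module at the point where you call globals.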

                                                                                                                                                        Now running the script from the command line gives this output (note that your output may be slightly different, depending @@ -3110,7 +3040,7 @@ SGMLParser = sgmllib.SGMLParser htmlentitydefs = <module 'htmlentitydefs' from 'C:\Python23\lib\htmlentitydefs.py'> BaseHTMLProcessor = __main__.BaseHTMLProcessor __name__ = __main__ -... rest of output omitted for brevity...

                                                                                                                                                        +... rest of output omitted for brevity...
                                                                                                                                                        1. SGMLParser was imported from sgmllib, using from module import. That means that it was imported directly into the module's namespace, and here it is.
                                                                                                                                                        2. Contrast this with htmlentitydefs, which was imported using import. That means that the htmlentitydefs module itself is in the namespace, but the entitydefs variable defined within htmlentitydefs is not. @@ -3134,7 +3064,7 @@ print "z=",z foo(3) globals()["z"] = 8 print "z=",z -
                                                                                                                                                          +
                                                                                                                                                          1. Since foo is called with 3, this will print {'arg': 3, 'x': 1}. This should not be a surprise.
                                                                                                                                                          2. locals is a function that returns a dictionary, and here you are setting a value in that dictionary. You might think that this @@ -3156,7 +3086,7 @@ constantly switching between reading the string and reading the tuple of values. >>> "%(pwd)s is not a good password for %(uid)s" % params 'secret is not a good password for sa' >>> "%(database)s of mind, %(database)s of body" % params -'master of mind, master of body'
                                                                                                                                                            +'master of mind, master of body'
                                                                                                                                                            1. Instead of a tuple of explicit values, this form of string formatting uses a dictionary, params. And instead of a simple %s marker in the string, the marker contains a name in parentheses. This name is used as a key in the params dictionary and substitutes the corresponding value, secret, in place of the %(pwd)s marker.
                                                                                                                                                            2. Dictionary-based string formatting works with any number of named keys. Each key must exist in the given dictionary, or the @@ -3168,7 +3098,7 @@ meaningful keys and values already. Like Example 8.14. Dictionary-based string formatting in BaseHTMLProcessor.py
                                                                                                                                                              
                                                                                                                                                                   def handle_comment(self, text):        
                                                                                                                                                                       self.pieces.append("<!--%(text)s-->" % locals()) 
                                                                                                                                                              -
                                                                                                                                                              +
                                                                                                                                                              1. Using the built-in locals function is the most common use of dictionary-based string formatting. It means that you can use the names of local variables within your string (in this case, text, which was passed to the class method as an argument) and each named variable will be replaced by its value. If text is 'Begin page footer', the string formatting "<!--%(text)s-->" % locals() will resolve to the string '<!--Begin page footer-->'. @@ -3176,7 +3106,7 @@ meaningful keys and values already. Like ① self.pieces.append("<%(tag)s%(strattrs)s>" % locals()) -
                                                                                                                                                                +
                                                                                                                                                                1. When this method is called, attrs is a list of key/value tuples, just like the items of a dictionary, which means you can use multi-variable assignment to iterate through it. This should be a familiar pattern by now, but there's a lot going on here, so let's break it down:
                                                                                                                                                                  @@ -3232,7 +3162,7 @@ at all. It is this last side effect that you can take advantage of. <li><a href="toc.html">Table of contents</a></li> <li><a href="history.html">Revision history</a></li> </body> -</html>
                                                                                                                                                                  +</html>
                                                                                                                                                                  1. Note that the attribute values of the href attributes in the <a> tags are not properly quoted. (Also note that you're using triple quotes for something other than a docstring. And directly in the IDE, no less. They're very useful.)
                                                                                                                                                                  2. Feed the parser. @@ -3248,7 +3178,7 @@ at all. It is this last side effect that you can take advantage of. def end_pre(self): self.unknown_endtag("pre") - self.verbatim -= 1
                                                                                                                                                                    + self.verbatim -= 1
                                                                                                                                                                    1. start_pre is called every time SGMLParser finds a <pre> tag in the HTML source. (In a minute, you'll see exactly how this happens.) The method takes a single parameter, attrs, which contains the attributes of the tag (if any). attrs is a list of key/value tuples, the same form that unknown_starttag takes.
                                                                                                                                                                    2. In the reset method, you initialize a data attribute that serves as a counter for <pre> tags. Every time you hit a <pre> tag, you increment the counter; every time you hit a </pre> tag, you'll decrement the counter. (You could just use this as a flag and set it to 1 and reset it to 0, but it's just as easy to do it this way, and this handles the odd (but possible) case of nested <pre> tags.) In a minute, you'll see how this counter is put to good use. @@ -3276,11 +3206,11 @@ at all. It is this last side effect that you can take advantage of. return 1 def handle_starttag(self, tag, method, attrs): - method(attrs)
                                                                                                                                                                      + method(attrs)
                                                                                                                                                                      1. At this point, SGMLParser has already found a start tag and parsed the attribute list. The only thing left to do is figure out whether there is a specific handler method for this tag, or whether you should fall back on the default method (unknown_starttag). -
                                                                                                                                                                      2. The “magic” of SGMLParser is nothing more than your old friend, getattr. What you may not have realized before is that getattr will find methods defined in descendants of an object as well as the object itself. Here the object is self, the current instance. So if tag is 'pre', this call to getattr will look for a start_pre method on the current instance, which is an instance of the Dialectizer class. +
                                                                                                                                                                      3. The “magic” of SGMLParser is nothing more than your old friend, getattr. What you may not have realized before is that getattr will find methods defined in descendants of an object as well as the object itself. Here the object is self, the current instance. So if tag is 'pre', this call to getattr will look for a start_pre method on the current instance, which is an instance of the Dialectizer class.
                                                                                                                                                                      4. getattr raises an AttributeError if the method it's looking for doesn't exist in the object (or any of its descendants), but that's okay, because you wrapped the call to getattr inside a try...except block and explicitly caught the AttributeError.
                                                                                                                                                                      5. Since you didn't find a start_xxx method, you'll also look for a do_xxx method before giving up. This alternate naming scheme is generally used for standalone tags, like <br>, which have no corresponding end tag. But you can use either naming scheme; as you can see, SGMLParser tries both for every tag. (You shouldn't define both a start_xxx and do_xxx handler method for the same tag, though; only the start_xxx method will get called.) @@ -3299,7 +3229,7 @@ at all. It is this last side effect that you can take advantage of. you need to override the handle_data method.

                                                                                                                                                                        Example 8.19. Overriding the handle_data method

                                                                                                                                                                        
                                                                                                                                                                             def handle_data(self, text):     
                                                                                                                                                                        -        self.pieces.append(self.verbatim and text or self.process(text)) 
                                                                                                                                                                        + self.pieces.append(self.verbatim and text or self.process(text))
                                                                                                                                                                        1. handle_data is called with only one argument, the text to process.
                                                                                                                                                                        2. In the ancestor BaseHTMLProcessor, the handle_data method simply appended the text to the output buffer, self.pieces. Here the logic is only slightly more complicated. If you're in the middle of a <pre>...</pre> block, self.verbatim will be some value greater than 0, and you want to put the text in the output buffer unaltered. Otherwise, you will call a separate method to process the @@ -3315,7 +3245,7 @@ def translate(url, dialectName="chef"): sock = urllib.urlopen(url) htmlSource = sock.read() sock.close() -
                                                                                                                                                                          +
                                                                                                                                                                          1. The translate function has an optional argument dialectName, which is a string that specifies the dialect you'll be using. You'll see how this is used in a minute.
                                                                                                                                                                          2. Hey, wait a minute, there's an import statement in this function! That's perfectly legal in Python. You're used to seeing import statements at the top of a program, which means that the imported module is available anywhere in the program. But you can @@ -3328,7 +3258,7 @@ def translate(url, dialectName="chef"): parserName = "%sDialectizer" % dialectName.capitalize() parserClass = globals()[parserName] parser = parserClass() -
                                                                                                                                                                            +
                                                                                                                                                                            1. capitalize is a string method you haven't seen before; it simply capitalizes the first letter of a string and forces everything else to lowercase. Combined with some string formatting, you've taken the name of a dialect and transformed it into the name of the corresponding Dialectizer class. If dialectName is the string 'chef', parserName will be the string 'ChefDialectizer'. @@ -3346,7 +3276,7 @@ appropriately-named file in the plug-ins directory (like foodialect.py① parser.close() return parser.output() -
                                                                                                                                                                              +
                                                                                                                                                                              1. After all that imagining, this is going to seem pretty boring, but the feed function is what does the entire transformation. You had the entire HTML source in a single string, so you only had to call feed once. However, you can call feed as often as you want, and the parser will just keep parsing. So if you were worried about memory usage (or you knew you were going to be dealing with very large HTML pages), you could set this up in a loop, where you read a few bytes of HTML and fed it to the parser. The result would be the same. @@ -3747,7 +3677,7 @@ that the grammar file defines the structure of the output, and the kgp.py< to talk about packages.

                                                                                                                                                                                Example 9.5. Loading an XML document (a sneak peek)

                                                                                                                                                                                 >>> from xml.dom import minidom 
                                                                                                                                                                                ->>> xmldoc = minidom.parse('~/diveintopython3/common/py/kgp/binary.xml')
                                                                                                                                                                                +>>> xmldoc = minidom.parse('~/diveintopython3/common/py/kgp/binary.xml')
                                                                                                                                                                                1. This is a syntax you haven't seen before. It looks almost like the from module import you know and love, but the "." gives it away as something above and beyond a simple import. In fact, xml is what is known as a package, dom is a nested package within xml, and minidom is a module within xml.dom.
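One way to convince yourself of the difference between a package and a module is to look for a __path__ attribute, which packages have and plain modules do not. This is a sketch that assumes the standard library's xml package is the one being imported (a third-party replacement could behave differently).

```python
import xml
import xml.dom
from xml.dom import minidom

# Each name reports where it sits in the package hierarchy.
print(xml.__name__)
print(xml.dom.__name__)
print(minidom.__name__)

# Packages carry a __path__ attribute; a plain module such as minidom does not.
xml_is_package = hasattr(xml, "__path__")
dom_is_package = hasattr(xml.dom, "__path__")
minidom_is_package = hasattr(minidom, "__path__")
print((xml_is_package, dom_is_package, minidom_is_package))
```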

                                                                                                                                                                                  That sounds complicated, but it's really not. Looking at the actual implementation may help. Packages are little more than @@ -3781,7 +3711,7 @@ The syntax is all the same; Python figures out what you mean based on the file l <module 'xml.dom' from 'C:\Python21\lib\xml\dom\__init__.pyc'> >>> import xml >>> xml -<module 'xml' from 'C:\Python21\lib\xml\__init__.pyc'>

                                                                                                                                                                                  +<module 'xml' from 'C:\Python21\lib\xml\__init__.pyc'>
                                                                                                                                                                                  1. Here you're importing a module (minidom) from a nested package (xml.dom). The result is that minidom is imported into your namespace, and in order to reference classes within the minidom module (like Element), you need to preface them with the module name.
                                                                                                                                                                                  2. Here you are importing a class (Element) from a module (minidom) from a nested package (xml.dom). The result is that Element is imported directly into your namespace. Note that this does not interfere with the previous import; the Element class can now be referenced in two ways (but it's all still the same class). @@ -3816,7 +3746,7 @@ package architecture. It's one of the many things Python is good at, so take adv <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\ <xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p> </ref> -</grammar>
                                                                                                                                                                                    +</grammar>
                                                                                                                                                                                    1. As you saw in the previous section, this imports the minidom module from the xml.dom package.
                                                                                                                                                                                    2. Here is the one line of code that does all the work: minidom.parse takes one argument and returns a parsed representation of the XML document. The argument can be many things; in this case, it's simply a filename of an XML document on my local disk. (To follow along, you'll need to change the path to point to your downloaded examples directory.) @@ -3830,7 +3760,7 @@ package architecture. It's one of the many things Python is good at, so take adv >>> xmldoc.childNodes[0] <DOM Element: grammar at 17538908> >>> xmldoc.firstChild -<DOM Element: grammar at 17538908>
                                                                                                                                                                                      +<DOM Element: grammar at 17538908>
                                                                                                                                                                                      1. Every Node has a childNodes attribute, which is a list of the Node objects. A Document always has only one child node, the root element of the XML document (in this case, the grammar element).
                                                                                                                                                                                      2. To get the first (and in this case, the only) child node, just use regular list syntax. Remember, there is nothing special @@ -3848,7 +3778,7 @@ package architecture. It's one of the many things Python is good at, so take adv <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\ <xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p> </ref> -</grammar>
                                                                                                                                                                                        +</grammar>
                                                                                                                                                                                        1. Since the toxml method is defined in the Node class, it is available on any XML node, not just the Document element.
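To see toxml working on something other than the whole document, you can parse a small string and call it on an inner element. The XML below is a made-up, whitespace-free fragment in the spirit of binary.xml, not the real file, and it uses minidom.parseString so that no path on disk is needed.

```python
from xml.dom import minidom

# Hypothetical inline document; no carriage returns, so there are no extra Text nodes.
source = '<grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>'
xmldoc = minidom.parseString(source)

grammar_node = xmldoc.firstChild      # the root element
ref_node = grammar_node.firstChild    # the only ref element

# toxml is available on the element, not just on the document.
ref_xml = ref_node.toxml()
print(ref_xml)

# Drill down to the text inside the first p element.
p_text = ref_node.firstChild.firstChild.data
print(p_text)
```

Because the fragment has no whitespace between tags, grammar_node.firstChild is the ref element itself; with a file laid out like binary.xml, you would have to step past a Text node first.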

                                                                                                                                                                                          Example 9.11. Child nodes can be text

                                                                                                                                                                                          @@ -3872,7 +3802,7 @@ package architecture. It's one of the many things Python is good at, so take adv
                                                                                                                                                                                           >>> print grammarNode.lastChild.toxml()     
                                                                                                                                                                                           
                                                                                                                                                                                           
                                                                                                                                                                                          -
                                                                                                                                                                                          +
                                                                                                                                                                                          1. Looking at the XML in binary.xml, you might think that the grammar has only two child nodes, the two ref elements. But you're missing something: the carriage returns! After the '<grammar>' and before the first '<ref>' is a carriage return, and this text counts as a child node of the grammar element. Similarly, there is a carriage return after each '</ref>'; these also count as child nodes. So grammarNode.childNodes is actually a list of 5 objects: 3 Text objects and 2 Element objects.
                                                                                                                                                                                          2. The first child is a Text object representing the carriage return after the '<grammar>' tag and before the first '<ref>' tag. @@ -3897,7 +3827,7 @@ package architecture. It's one of the many things Python is good at, so take adv >>> pNode.firstChild <DOM Text node "0"> >>> pNode.firstChild.data -u'0'
                                                                                                                                                                                            +u'0'
                                                                                                                                                                                            1. As you saw in the previous example, the first ref element is grammarNode.childNodes[1], since childNodes[0] is a Text node for the carriage return.
                                                                                                                                                                                            2. The ref element has its own set of child nodes, one for the carriage return, a separate one for the spaces, one for the p element, and so forth. @@ -3927,7 +3857,7 @@ you can customize. # but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('iso-8859-1') -
                                                                                                                                                                                              +
                                                                                                                                                                                              1. sitecustomize.py is a special script; Python will try to import it on startup, so any code in it will be run automatically. As the comment mentions, it can go anywhere (as long as import can find it), but it usually goes in the site-packages directory within your Python lib directory. @@ -3938,7 +3868,7 @@ sys.setdefaultencoding('iso-8859-1') 'iso-8859-1' >>> s = u'La Pe\xf1a' >>> print s -La Peña
                                                                                                                                                                                                +La Peña
                                                                                                                                                                                                1. This example assumes that you have made the changes listed in the previous example to your sitecustomize.py file, and restarted Python. If your default encoding still says 'ascii', you didn't set up your sitecustomize.py properly, or you didn't restart Python. The default encoding can only be changed during Python startup; you can't change it later. (Due to some wacky programming tricks that I won't get into right now, you can't even call sys.setdefaultencoding after Python has started up. Dig into site.py and search for “setdefaultencoding” to find out how.) @@ -3988,7 +3918,7 @@ La Peña
                                                                                                                                                                                                  <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\ <xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p> </ref> -
                                                                                                                                                                                                  +
                                                                                                                                                                                                  1. getElementsByTagName takes one argument, the name of the element you wish to find. It returns a list of Element objects, corresponding to the XML elements that have that name. In this case, you find two ref elements.
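The search described above can be sketched in a few lines. This is a minimal illustration in Python 3 syntax, not the book's own session: the inline GRAMMAR string is an assumed, trimmed-down stand-in for binary.xml, and the second ref id ('byte') is made up for the example.

```python
from xml.dom import minidom

# Assumed stand-in for binary.xml; only the 'bit' ref is taken from the text,
# the 'byte' ref is illustrative.
GRAMMAR = """<grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
<ref id="byte">
  <p><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>"""

xmldoc = minidom.parseString(GRAMMAR)

# getElementsByTagName takes the element name and returns the matching Element objects.
reflist = xmldoc.getElementsByTagName("ref")

# Each Element exposes its XML attributes through the dictionary-like .attributes.
ref_ids = [ref.attributes["id"].value for ref in reflist]
```

Here reflist holds one Element per ref element, regardless of how much whitespace text sits between them in the source.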

                                                                                                                                                                                                    Example 9.22. Every element is searchable

                                                                                                                                                                                                    @@ -4004,7 +3934,7 @@ La Peña
                                                                                                                                                                                                    >>> print plist[0].toxml() <p>0</p> >>> print plist[1].toxml() -<p>1</p>
                                                                                                                                                                                                    +<p>1</p>
                                                                                                                                                                                                    1. Continuing from the previous example, the first object in your reflist is the 'bit' ref element.
                                                                                                                                                                                                    2. You can use the same getElementsByTagName method on this Element to find all the <p> elements within the 'bit' ref element. @@ -4019,7 +3949,7 @@ La Peña
                                                                                                                                                                                                      '<p>1</p>' >>> plist[2].toxml() '<p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\ -<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>'
                                                                                                                                                                                                      +<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>'
                                                                                                                                                                                                      1. Note carefully the difference between this and the previous example. Previously, you were searching for p elements within firstref, but here you are searching for p elements within xmldoc, the root-level object that represents the entire XML document. This does find the p elements nested within the ref elements within the root grammar element.
                                                                                                                                                                                                      2. The first two p elements are within the first ref (the 'bit' ref). @@ -4047,7 +3977,7 @@ La Peña
                                                                                                                                                                                                        >>> bitref.attributes.values() [<xml.dom.minidom.Attr instance at 0x81d5044>] >>> bitref.attributes["id"] -<xml.dom.minidom.Attr instance at 0x81d5044>
                                                                                                                                                                                                        +<xml.dom.minidom.Attr instance at 0x81d5044>
                                                                                                                                                                                                        1. Each Element object has an attribute called attributes, which is a NamedNodeMap object. This sounds scary, but it's not, because a NamedNodeMap is an object that acts like a dictionary, so you already know how to use it.
                                                                                                                                                                                                        2. Treating the NamedNodeMap as a dictionary, you can get a list of the names of the attributes of this element by using attributes.keys(). This element has only one attribute, 'id'. @@ -4062,7 +3992,7 @@ La Peña
                                                                                                                                                                                                          >>> a.name u'id' >>> a.value -u'bit'
                                                                                                                                                                                                          +u'bit'
                                                                                                                                                                                                          1. The Attr object completely represents a single XML attribute of a single XML element. The name of the attribute (the same name as you used to find this object in the bitref.attributes NamedNodeMap pseudo-dictionary) is stored in a.name.
                                                                                                                                                                                                          2. The actual text value of this XML attribute is stored in a.value. @@ -4115,7 +4045,7 @@ calls the object's read method, the function can handle any kind of <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\ <xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p> </ref> -</grammar>
                                                                                                                                                                                                            +</grammar>
                                                                                                                                                                                                            1. First, you open the file on disk. This gives you a file object.
                                                                                                                                                                                                            2. You pass the file object to minidom.parse, which calls the read method of fsock and reads the XML document from the file on disk. @@ -4150,7 +4080,7 @@ just going to be parsing a local file, you can pass the filename and minid <link>http://slashdot.org/article.pl?sid=01/12/28/0421241</link> </item> -[...snip...]
                                                                                                                                                                                                              +[...snip...]
                                                                                                                                                                                                              1. As you saw in a previous chapter, urlopen takes a web page URL and returns a file-like object. Most importantly, this object has a read method which returns the HTML source of the web page.
                                                                                                                                                                                                              2. Now you pass the file-like object to minidom.parse, which obediently calls the read method of the object and parses the XML data that the read method returns. The fact that this XML data is now coming straight from a web page is completely irrelevant. minidom.parse doesn't know about web pages, and it doesn't care about web pages; it just knows about file-like objects. @@ -4161,7 +4091,7 @@ just going to be parsing a local file, you can pass the filename and minid >>> xmldoc = minidom.parseString(contents) >>> print xmldoc.toxml() <?xml version="1.0" ?> -<grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>
                                                                                                                                                                                                                +<grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>
                                                                                                                                                                                                                1. minidom has a method, parseString, which takes an entire XML document as a string and parses it. You can use this instead of minidom.parse if you know you already have your entire XML document in a string.

                                                                                                                                                                                                                  OK, so you can use the minidom.parse function for parsing both local files and remote URLs, but for parsing strings, you use... a different function. That means that if you want to be able to take input from a @@ -4182,7 +4112,7 @@ file, a URL, or a string, you'll need special logic to check whethe "d='bit'><p>0</p" >>> ssock.read() '><p>1</p></ref></grammar>' ->>> ssock.close()

                                                                                                                                                                                                                  +>>> ssock.close()
                                                                                                                                                                                                                  1. The StringIO module contains a single class, also called StringIO, which allows you to turn a string into a file-like object. The StringIO class takes the string as a parameter when creating an instance.
                                                                                                                                                                                                                  2. Now you have a file-like object, and you can do all sorts of file-like things with it. Like read, which returns the original string. @@ -4199,7 +4129,7 @@ file, a URL, or a string, you'll need special logic to check whethe >>> ssock.close() >>> print xmldoc.toxml() <?xml version="1.0" ?> -<grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>
                                                                                                                                                                                                                    +<grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>
                                                                                                                                                                                                                    1. Now you can pass the file-like object (really a StringIO) to minidom.parse, which will call the object's read method and happily parse away, never knowing that its input came from a hard-coded string.
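For readers on Python 3, the same idea might look like the sketch below. It rests on two assumptions: the StringIO module became part of io, and this sketch hands the parser bytes through io.BytesIO rather than a text stream. The variable names are illustrative.

```python
import io
from xml.dom import minidom

contents = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

# Wrap the string in a file-like object; minidom.parse only needs a read method.
ssock = io.BytesIO(contents.encode("utf-8"))
xmldoc = minidom.parse(ssock)
ssock.close()

# The parsed document behaves exactly as if it had come from a file on disk.
p_texts = [p.firstChild.data for p in xmldoc.getElementsByTagName("p")]
```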

                                                                                                                                                                                                                      So now you know how to use a single function, minidom.parse, to parse an XML document stored on a web page, in a local file, or in a hard-coded string. For a web page, you use urlopen to get a file-like object; for a local file, you use open; and for a string, you use StringIO. Now let's take it one step further and generalize these differences as well. @@ -4220,7 +4150,7 @@ def openAnything(source): # treat source as string import StringIO - return StringIO.StringIO(str(source))

                                                                                                                                                                                                                      + return StringIO.StringIO(str(source))
                                                                                                                                                                                                                      1. The openAnything function takes a single parameter, source, and returns a file-like object. source is a string of some sort; it can either be a URL (like 'http://slashdot.org/slashdot.rdf'), a full or partial pathname to a local file (like 'binary.xml'), or a string that contains actual XML data to be parsed.
                                                                                                                                                                                                                      2. First, you see if source is a URL. You do this through brute force: you try to open it as a URL and silently ignore errors caused by trying to open something which is not a URL. This is actually elegant in the sense that, if urllib ever supports new types of URLs in the future, you will also support them without recoding. If urllib is able to open source, then the return kicks you out of the function immediately and the following try statements never execute. @@ -4252,7 +4182,7 @@ Dive in Dive inDive inDive in >>> for i in range(3): ... sys.stderr.write('Dive in') -Dive inDive inDive in
                                                                                                                                                                                                                        +Dive inDive inDive in
                                                                                                                                                                                                                        1. As you saw in Example 6.9, “Simple Counters”, you can use Python's built-in range function to build simple counter loops that repeat something a set number of times.
                                                                                                                                                                                                                        2. stdout is a file-like object; calling its write method will print out whatever string you give it. In fact, this is what the print statement really does; it adds a carriage return to the end of the string you're printing, and calls sys.stdout.write. @@ -4275,7 +4205,7 @@ sys.stdout = fsock print 'This message will be logged instead of displayed' sys.stdout = saveout fsock.close() -
                                                                                                                                                                                                                          +
                                                                                                                                                                                                                          1. This will print to the IDE “Interactive Window” (or the terminal, if running the script from the command line).
                                                                                                                                                                                                                          2. Always save stdout before redirecting it, so you can set it back to normal later. @@ -4299,7 +4229,7 @@ import sys fsock = open('error.log', 'w') sys.stderr = fsock raise Exception, 'this error will be logged' -
                                                                                                                                                                                                                            +
                                                                                                                                                                                                                            1. Open the log file where you want to store debugging information.
                                                                                                                                                                                                                            2. Redirect standard error by assigning the file object of the newly-opened log file to stderr. @@ -4313,7 +4243,7 @@ entering function >>> import sys >>> print >> sys.stderr, 'entering function' entering function -
                                                                                                                                                                                                                              +
                                                                                                                                                                                                                              1. This shorthand syntax of the print statement can be used to write to any open file, or file-like object. In this case, you can redirect a single print statement to stderr without affecting subsequent print statements.
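The print >> syntax shown here is specific to Python 2. As a rough Python 3 counterpart (an assumption about the reader's interpreter, not part of the original example), the print function takes a file keyword argument that accepts any object with a write method:

```python
import io
import sys

# Redirect a single print call to stderr; later print calls are unaffected.
print("entering function", file=sys.stderr)

# Any file-like object works, for example an in-memory buffer (illustrative name).
log = io.StringIO()
print("entering function", file=log)
logged = log.getvalue()
```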

                                                                                                                                                                                                                                Standard input, on the other hand, is a read-only file object, and it represents the data flowing into the program from some @@ -4338,7 +4268,7 @@ one program's output to the next program's input. </ref> </grammar> [you@localhost kgp]$ cat binary.xml | python kgp.py -g - -10110001

                                                                                                                                                                                                                                +10110001
                                                                                                                                                                                                                                1. As you saw in Section 9.1, “Diving in”, this will print a string of eight random bits, 0 or 1.
                                                                                                                                                                                                                                2. This simply prints out the entire contents of binary.xml. (Windows users should use type instead of cat.) @@ -4361,7 +4291,7 @@ def openAnything(source): import urllib try: -[... snip ...]
                                                                                                                                                                                                                                  +[... snip ...]
                                                                                                                                                                                                                                  1. This is the openAnything function from toolbox.py, which you previously examined in Section 10.1, “Abstracting input sources”. All you've done is add three lines of code at the beginning of the function to check if the source is “-”; if so, you return sys.stdin. Really, that's it! Remember, stdin is a file-like object with a read method, so the rest of the code (in kgp.py, where you call openAnything) doesn't change a bit.
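To make the control flow concrete, here is a sketch of how such a function might look in Python 3. It is not the toolbox.py code itself: the use of urllib.request and io.StringIO, and the exact exceptions caught, are assumptions made for this illustration.

```python
import io
import sys
import urllib.request


def openAnything(source):
    """Return a file-like object for "-", a URL, a pathname, or a literal string."""
    # "-" means standard input, which is already a file-like object.
    if source == "-":
        return sys.stdin

    # Try the source as a URL; a non-URL string raises ValueError here.
    try:
        return urllib.request.urlopen(source)
    except (OSError, ValueError):
        pass

    # Try the source as a path to a local file.
    try:
        return open(source)
    except (OSError, ValueError):
        pass

    # Fall back to treating the source as the data itself.
    return io.StringIO(str(source))
```

Because every branch returns something with a read method, the caller never needs to know which branch was taken.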

                                                                                                                                                                                                                                    10.3. Caching node lookups

                                                                                                                                                                                                                                    @@ -4375,7 +4305,7 @@ def openAnything(source): self.grammar = self._load(grammar) self.refs = {} for ref in self.grammar.getElementsByTagName("ref"): - self.refs[ref.attributes["id"].value] = ref
                                                                                                                                                                                                                                    + self.refs[ref.attributes["id"].value] = ref
                                                                                                                                                                                                                                    1. Start by creating an empty dictionary, self.refs.
                                                                                                                                                                                                                                    2. As you saw in Section 9.5, “Searching for elements”, getElementsByTagName returns a list of all the elements of a particular name. You can easily get a list of all the ref elements, then simply loop through that list. @@ -4394,7 +4324,7 @@ def openAnything(source): choices = [e for e in node.childNodes if e.nodeType == e.ELEMENT_NODE] chosen = random.choice(choices) - return chosen
                                                                                                                                                                                                                                      + return chosen
                                                                                                                                                                                                                                      1. As you saw in Example 9.9, “Getting child nodes”, the childNodes attribute returns a list of all the child nodes of an element.
                                                                                                                                                                                                                                      2. However, as you saw in Example 9.11, “Child nodes can be text”, the list returned by childNodes contains all different types of nodes, including text nodes. That's not what you're looking for here. You only want the @@ -4412,7 +4342,7 @@ def openAnything(source): >>> xmldoc.__class__ <class xml.dom.minidom.Document at 0x01105D40> >>> xmldoc.__class__.__name__ -'Document'
                                                                                                                                                                                                                                        +'Document'
                                                                                                                                                                                                                                        1. Assume for a moment that kant.xml is in the current directory.
                                                                                                                                                                                                                                        2. As you saw in Section 9.2, “Packages”, the object returned by parsing an XML document is a Document object, as defined in the minidom.py in the xml.dom package. As you saw in Section 5.4, “Instantiating Classes”, __class__ is built-in attribute of every Python object. @@ -4422,7 +4352,7 @@ def openAnything(source):

                                                                                                                                                                                                                                          Example 10.18. parse, a generic XML node dispatcher

                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                               def parse(self, node):          
                                                                                                                                                                                                                                                   parseMethod = getattr(self, "parse_%s" % node.__class__.__name__)  
                                                                                                                                                                                                                                          -        parseMethod(node) 
                                                                                                                                                                                                                                          + parseMethod(node)
                                                                                                                                                                                                                                          1. First off, notice that you're constructing a larger string based on the class name of the node you were passed (in the node argument). So if you're passed a Document node, you're constructing the string 'parse_Document', and so forth.
                                                                                                                                                                                                                                          2. Now you can treat that string as a function name, and get a reference to the function itself using getattr
                                                                                                                                                                                                                                          3. Finally, you can call that function and pass the node itself as an argument. The next example shows the definitions of each @@ -4445,7 +4375,7 @@ def openAnything(source): def parse_Element(self, node): handlerMethod = getattr(self, "do_%s" % node.tagName) - handlerMethod(node)
                                                                                                                                                                                                                                            + handlerMethod(node)
                                                                                                                                                                                                                                            1. parse_Document is only ever called once, since there is only one Document node in an XML document, and only one Document object in the parsed XML representation. It simply turns around and parses the root element of the grammar file.
                                                                                                                                                                                                                                            2. parse_Text is called on nodes that represent bits of text. The function itself does some special processing to handle automatic capitalization @@ -4471,7 +4401,7 @@ Python program, so let's write a simple program to see them. import sys for arg in sys.argv: - print arg
                                                                                                                                                                                                                                              + print arg
                                                                                                                                                                                                                                              1. Each command-line argument passed to the program will be in sys.argv, which is just a list. Here you are printing each argument on a separate line.
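A minimal sketch of the same idea in Python 3 syntax (the example above uses the Python 2 print statement). The helper name echo_args is hypothetical and not part of argecho.py; it exists only so the behavior can be checked without launching a separate process.

```python
import sys


def echo_args(argv):
    """Return the arguments one per line, in the order they were given.

    argv is any list of strings shaped like sys.argv, so the first
    element is expected to be the script name.
    """
    return "\n".join(argv)


if __name__ == "__main__":
    # Python 3 spelling of the loop in argecho.py: print is a function here.
    print(echo_args(sys.argv))
```

Calling echo_args(["argecho.py", "-m", "kant.xml"]) yields the three items on separate lines, script name first.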

                                                                                                                                                                                                                                                Example 10.21. The contents of sys.argv

                                                                                                                                                                                                                                                @@ -4487,7 +4417,7 @@ def
                                                                                                                                                                                                                                                 [you@localhost py]$ python argecho.py -m kant.xml 
                                                                                                                                                                                                                                                 argecho.py
                                                                                                                                                                                                                                                 -m
                                                                                                                                                                                                                                                -kant.xml
                                                                                                                                                                                                                                                +kant.xml
                                                                                                                                                                                                                                                1. The first thing to know about sys.argv is that it contains the name of the script you're calling. You will actually use this knowledge to your advantage later, in Chapter 16, Functional Programming. Don't worry about it for now. @@ -4510,7 +4440,7 @@ def main(argv): ... if __name__ == "__main__": - main(sys.argv[1:])
                                                                                                                                                                                                                                                  + main(sys.argv[1:])
                                                                                                                                                                                                                                                  1. First off, look at the bottom of the example and notice that you're calling the main function with sys.argv[1:]. Remember, sys.argv[0] is the name of the script that you're running; you don't care about that for command-line processing, so you chop it off and pass the rest of the list. @@ -4577,7 +4507,7 @@ def main(argv): source = "".join(args) k = KantGenerator(grammar, source) - print k.output()
                                                                                                                                                                                                                                                    + print k.output()
                                                                                                                                                                                                                                                    1. The grammar variable will keep track of the grammar file you're using. You initialize it here in case it's not specified on the command line (using either the -g or the --grammar flag). @@ -4809,7 +4739,7 @@ def fetch(source, etag=None, last_modified=None, agent=USER_AGENT): <title mode="escaped">dive into mark</title> <link rel="alternate" type="text/html" href="http://diveintomark.org/"/> <-- rest of feed omitted for brevity --> -
                                                                                                                                                                                                                                                      +
                                                                                                                                                                                                                                                      1. Downloading anything over HTTP is incredibly easy in Python; in fact, it's a one-liner. The urllib module has a handy urlopen function that takes the address of the page you want, and returns a file-like object that you can just read() from to get the full contents of the page. It just can't get much easier.

                                                                                                                                                                                                                                                        So what's wrong with this? Well, for a quick one-off during testing or development, there's nothing wrong with it. I do @@ -4891,7 +4821,7 @@ header: ETag: "e8284-68e0-4de30f80" header: Accept-Ranges: bytes header: Content-Length: 26848 header: Connection: close -

                                                                                                                                                                                                                                                        +
                                                                                                                                                                                                                                                        1. urllib relies on another standard Python library, httplib. Normally you don't need to import httplib directly (urllib does that automatically), but you will here so you can set the debugging flag on the HTTPConnection class that urllib uses internally to connect to the HTTP server. This is an incredibly useful technique. Some other Python libraries have similar debug flags, but there's no particular standard for naming them or turning them on; you need to read the documentation of each library to see if such a feature is available. @@ -4933,7 +4863,7 @@ header: ETag: "e8284-68e0-4de30f80" header: Accept-Ranges: bytes header: Content-Length: 26848 header: Connection: close -
                                                                                                                                                                                                                                                          +
                                                                                                                                                                                                                                                          1. If you still have your Python IDE open from the previous section's example, you can skip this, but this turns on HTTP debugging so you can see what you're actually sending over the wire, and what gets sent back.
                                                                                                                                                                                                                                                          2. Fetching an HTTP resource with urllib2 is a three-step process, for good reasons that will become clear shortly. The first step is to create a Request object, which takes the URL of the resource you'll eventually get around to retrieving. Note that this step doesn't actually @@ -4966,7 +4896,7 @@ header: ETag: "e8284-68e0-4de30f80" header: Accept-Ranges: bytes header: Content-Length: 26848 header: Connection: close -
                                                                                                                                                                                                                                                            +
                                                                                                                                                                                                                                                            1. You're continuing from the previous example; you've already created a Request object with the URL you want to access.
                                                                                                                                                                                                                                                            2. Using the add_header method on the Request object, you can add arbitrary HTTP headers to the request. The first argument is the header, the second is the value you're @@ -5014,7 +4944,7 @@ turn it off by setting httplib.HTTPConnection.debuglevel = 0. Or yo File "c:\python23\lib\urllib2.py", line 412, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 304: Not Modified -
                                                                                                                                                                                                                                                              +
                                                                                                                                                                                                                                                              1. Remember all those HTTP headers you saw printed out when you turned on debugging? This is how you can get access to them programmatically: firstdatastream.headers is an object that acts like a dictionary and allows you to get any of the individual headers returned from the HTTP server. @@ -5033,7 +4963,7 @@ class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler): ①③ return result -
                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                1. urllib2 is designed around URL handlers. Each handler is just a class that can define any number of methods. When something happens -- like an HTTP error, or even a 304 code -- urllib2 introspects into the list of defined handlers for a method that can handle it. You used a similar introspection in Chapter 9, XML Processing to define handlers for different node types, but urllib2 is more flexible, and introspects over as many handlers as are defined for the current request. @@ -5051,7 +4981,7 @@ class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler): ①>>> seconddatastream.read() '' -
                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                  1. You're continuing the previous example, so the Request object is already set up, and you've already added the If-Modified-Since header.
                                                                                                                                                                                                                                                                  2. This is the key: now that you've defined your custom URL handler, you need to tell urllib2 to use it. Remember how I said that urllib2 broke up the process of accessing an HTTP resource into three steps, and for good reason? This is why building the URL opener @@ -5085,7 +5015,7 @@ class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler): ①>>> seconddatastream.read() '' -
                                                                                                                                                                                                                                                                    +
                                                                                                                                                                                                                                                                    1. Using the firstdatastream.headers pseudo-dictionary, you can get the ETag returned from the server. (What happens if the server didn't send back an ETag? Then this line would return None.)
                                                                                                                                                                                                                                                                    2. OK, you got the data. @@ -5149,7 +5079,7 @@ header: Content-Type: application/atom+xml Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: addinfourl instance has no attribute 'status' -
                                                                                                                                                                                                                                                                      +
                                                                                                                                                                                                                                                                      1. You'll be better able to see what's happening if you turn on debugging.
                                                                                                                                                                                                                                                                      2. This is a URL which I have set up to permanently redirect to my Atom feed at http://diveintomark.org/xml/atom.xml. @@ -5177,7 +5107,7 @@ class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
                                                                                                                                                                                                                                                                        +
                                                                                                                                                                                                                                                                        1. Redirect behavior is defined in urllib2 in a class called HTTPRedirectHandler. You don't want to completely override the behavior, you just want to extend it a little, so you'll subclass HTTPRedirectHandler so you can call the ancestor class to do all the hard work.
                                                                                                                                                                                                                                                                        2. When it encounters a 301 status code from the server, urllib2 will search through its handlers and call the http_error_301 method. The first thing ours does is just call the http_error_301 method in the ancestor, which handles the grunt work of looking for the Location: header and following the redirect to the new address. @@ -5224,7 +5154,7 @@ header: Content-Type: application/atom+xml 301 >>> f.url 'http://diveintomark.org/xml/atom.xml' -
                                                                                                                                                                                                                                                                          +
                                                                                                                                                                                                                                                                          1. First, build a URL opener with the redirect handler you just defined.
                                                                                                                                                                                                                                                                          2. You sent off a request, and you got a 301 status code in response. At this point, the http_error_301 method gets called. You call the ancestor method, which follows the redirect and sends a request at the new location (http://diveintomark.org/xml/atom.xml). @@ -5269,7 +5199,7 @@ header: Content-Type: application/atom+xml 302 >>> f.url http://diveintomark.org/xml/atom.xml -
                                                                                                                                                                                                                                                                            +
                                                                                                                                                                                                                                                                            1. This is a sample URL I've set up that is configured to tell clients to temporarily redirect to http://diveintomark.org/xml/atom.xml.
                                                                                                                                                                                                                                                                            2. The server sends back a 302 status code, indicating a temporary redirect. The temporary new location of the data is given in the Location: header. @@ -5308,7 +5238,7 @@ header: Content-Encoding: gzip header: Content-Length: 6289 header: Connection: close header: Content-Type: application/atom+xml -
                                                                                                                                                                                                                                                                              +
                                                                                                                                                                                                                                                                              1. This is the key: once you've created your Request object, add an Accept-encoding header to tell the server you can accept gzip-encoded data. gzip is the name of the compression algorithm you're using. In theory there could be other compression algorithms, but gzip is the compression algorithm used by 99% of web servers.
                                                                                                                                                                                                                                                                              2. There's your header going across the wire. @@ -5335,7 +5265,7 @@ header: Content-Type: application/atom+xml <-- rest of feed omitted for brevity --> >>> len(data) 15955 -
                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                1. Continuing from the previous example, f is the file-like object returned from the URL opener. Using its read() method would ordinarily get you the uncompressed data, but since this data has been gzip-compressed, this is just the first step towards getting the data you really want. @@ -5357,7 +5287,7 @@ header: Content-Type: application/atom+xml File "c:\python23\lib\gzip.py", line 252, in _read pos = self.fileobj.tell() # Save current position AttributeError: addinfourl instance has no attribute 'tell' -
                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                  1. Continuing from the previous example, you already have a Request object set up with an Accept-encoding: gzip header.
                                                                                                                                                                                                                                                                                  2. Simply opening the request will get you the headers (though not download any data yet). As you can see from the returned @@ -5384,7 +5314,7 @@ def openAnything(source, etag=None, lastmodified=None, agent=USER_AGENT): request.add_header('Accept-encoding', 'gzip') opener = urllib2.build_opener(SmartRedirectHandler(), DefaultErrorHandler()) return opener.open(request) -
                                                                                                                                                                                                                                                                                    +
1. urlparse is a handy utility module for, you guessed it, parsing URLs. Its primary function, also called urlparse, takes a URL and splits it into a tuple of (scheme, domain, path, params, query string parameters, and fragment identifier). Of these, the only thing you care about is the scheme, to make sure that you're dealing with an HTTP URL (which urllib2 can handle). @@ -5417,7 +5347,7 @@ def fetch(source, etag=None, last_modified=None, agent=USER_AGENT): result['status'] = f.status f.close() return result -
                                                                                                                                                                                                                                                                                      +
                                                                                                                                                                                                                                                                                      1. First, you call the openAnything function with a URL, ETag hash, Last-Modified date, and User-Agent.
                                                                                                                                                                                                                                                                                      2. Read the actual data returned from the server. This may be compressed; if so, you'll decompress it later. @@ -5449,7 +5379,7 @@ def fetch(source, etag=None, last_modified=None, agent=USER_AGENT): 'etag': '"e842a-3e53-55d97640"', 'status': 304, 'data': ''} -
                                                                                                                                                                                                                                                                                        +
                                                                                                                                                                                                                                                                                        1. The very first time you fetch a resource, you don't have an ETag hash or Last-Modified date, so you'll leave those out. (They're optional parameters.)
                                                                                                                                                                                                                                                                                        2. What you get back is a dictionary of several useful headers, the HTTP status code, and the actual data returned from the server. @@ -5548,7 +5478,7 @@ class ToRomanBadInput(unittest.TestCase): def testNonInteger(self): """to_roman should fail with non-integer input""" - self.assertRaises(roman.NotIntegerError, roman.to_roman, 0.5)
                                                                                                                                                                                                                                                                                          + self.assertRaises(roman.NotIntegerError, roman.to_roman, 0.5)
1. The TestCase class of the unittest module provides the assertRaises method, which takes the following arguments: the exception you're expecting, the function you're testing, and the arguments you're passing that function. (If the function you're testing takes more than one argument, pass them all to assertRaises, in order, and it will pass them right along to the function you're testing.) Pay close attention to what you're doing here: @@ -5583,7 +5513,7 @@ class FromRomanBadInput(unittest.TestCase): """from_roman should fail with malformed antecedents""" for s in ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV', 'MCMC', 'XCX', 'IVI', 'LM', 'LD', 'LC'): - self.assertRaises(roman.InvalidRomanNumeralError, roman.from_roman, s)
                                                                                                                                                                                                                                                                                            + self.assertRaises(roman.InvalidRomanNumeralError, roman.from_roman, s)
                                                                                                                                                                                                                                                                                            1. Not much new to say about these; the pattern is exactly the same as the one you used to test bad input to to_roman(). I will briefly note that you have another exception: roman.InvalidRomanNumeralError. That makes a total of three custom exceptions that will need to be defined in roman.py (along with roman.OutOfRangeError and roman.NotIntegerError). You'll see how to define these custom exceptions when you actually start writing roman.py, later in this chapter.
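To try the assertRaises pattern before roman.py exists, something like the following sketch may help. It is written in Python 3 syntax, and both InvalidRomanNumeralError and from_roman here are toy stand-ins defined only for illustration. They are not the real roman module, and this from_roman rejects nothing except a short hard-coded list of bad numerals.

```python
import unittest


class InvalidRomanNumeralError(Exception):
    """Stand-in for roman.InvalidRomanNumeralError (assumed definition)."""


# A few of the malformed numerals from the test above, hard-coded for the sketch.
KNOWN_BAD = ("VX", "LM", "LD", "LC")


def from_roman(s):
    """Toy stand-in: raise on known-bad input, otherwise return a placeholder."""
    if s in KNOWN_BAD:
        raise InvalidRomanNumeralError("Invalid Roman numeral: %s" % s)
    return 0  # placeholder only; a real converter would compute the value


class FromRomanBadInput(unittest.TestCase):
    def test_malformed_antecedents(self):
        """from_roman should fail with malformed antecedents"""
        for s in KNOWN_BAD:
            # Pass the function itself plus its argument; assertRaises
            # makes the call and checks which exception comes out.
            self.assertRaises(InvalidRomanNumeralError, from_roman, s)
```

Note that from_roman is passed uncalled, with its argument alongside. Writing from_roman(s) inside the assertRaises call would raise the exception before assertRaises had a chance to catch it.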

                                                                                                                                                                                                                                                                                              13.6. Testing for sanity

                                                                                                                                                                                                                                                                                              @@ -5604,7 +5534,7 @@ class SanityCheck(unittest.TestCase): for integer in range(1, 4000): numeral = roman.to_roman(integer) result = roman.from_roman(numeral) - self.assertEqual(integer, result)
                                                                                                                                                                                                                                                                                              + self.assertEqual(integer, result)
                                                                                                                                                                                                                                                                                              1. You've seen the range function before, but here it is called with two arguments, which returns a list of integers starting at the first argument (1) and counting consecutively up to but not including the second argument (4000). Thus, 1..3999, which is the valid range for converting to Roman numerals.
                                                                                                                                                                                                                                                                                              2. I just wanted to mention in passing that integer is not a keyword in Python; here it's just a variable name like any other. @@ -5633,7 +5563,7 @@ class CaseCheck(unittest.TestCase): numeral = roman.to_roman(integer) roman.from_roman(numeral.upper()) self.assertRaises(roman.InvalidRomanNumeralError, - roman.from_roman, numeral.lower())
                                                                                                                                                                                                                                                                                                + roman.from_roman, numeral.lower())
                                                                                                                                                                                                                                                                                                1. The most interesting thing about this test case is all the things it doesn't test. It doesn't test that the value returned from to_roman() is right or even consistent; those questions are answered by separate test cases. You have a whole test case just to test for uppercase-ness. You might @@ -5677,7 +5607,7 @@ def to_roman(n): def from_roman(s): """convert Roman numeral to integer""" pass -
                                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                                  1. This is how you define your own custom exceptions in Python. Exceptions are classes, and you create your own by subclassing existing exceptions. It is strongly recommended (but not required) that you subclass Exception, which is the base class that all built-in exceptions inherit from. Here I am defining RomanError (inherited from Exception) to act as the base class for all my other custom exceptions to follow. This is a matter of style; I could just as easily @@ -5810,7 +5740,7 @@ AssertionError: OutOfRangeError ---------------------------------------------------------------------- Ran 12 tests in 0.040s -FAILED (failures=10, errors=2)
                                                                                                                                                                                                                                                                                                    +FAILED (failures=10, errors=2)

                                                                                                                                                                                                                                                                                                    14.2. roman.py, stage 2

                                                                                                                                                                                                                                                                                                    Now that you have the framework of the roman module laid out, it's time to start writing code and passing test cases.

                                                                                                                                                                                                                                                                                                    Example 14.3. roman2.py

                                                                                                                                                                                                                                                                                                    @@ -5852,7 +5782,7 @@ def to_roman(n): def from_roman(s): """convert Roman numeral to integer""" pass -
                                                                                                                                                                                                                                                                                                    +
                                                                                                                                                                                                                                                                                                    1. romanNumeralMap is a tuple of tuples which defines three things:
                                                                                                                                                                                                                                                                                                      @@ -5896,7 +5826,7 @@ from_roman(to_roman(n))==n for all n ... FAIL to_roman should fail with non-integer input ... FAIL to_roman should fail with negative input ... FAIL to_roman should fail with large input ... FAIL -to_roman should fail with 0 input ... FAIL
                                                                                                                                                                                                                                                                                                      +to_roman should fail with 0 input ... FAIL
                                                                                                                                                                                                                                                                                                      1. to_roman() does, in fact, always return uppercase, because romanNumeralMap defines the Roman numeral representations as uppercase. So this test passes already.
                                                                                                                                                                                                                                                                                                      2. Here's the big news: this version of the to_roman() function passes the known values test. Remember, it's not comprehensive, but it does put the function through its paces with a variety of good inputs, including @@ -6044,7 +5974,7 @@ def to_roman(n): def from_roman(s): """convert Roman numeral to integer""" pass -
                                                                                                                                                                                                                                                                                                        +
                                                                                                                                                                                                                                                                                                        1. This is a nice Pythonic shortcut: multiple comparisons at once. This is equivalent to if not ((0 < n) and (n < 4000)), but it's much easier to read. This is the range check, and it should catch inputs that are too large, negative, or zero.
                                                                                                                                                                                                                                                                                                        2. You raise exceptions yourself with the raise statement. You can raise any of the built-in exceptions, or you can raise any of your custom exceptions that you've defined. @@ -6077,7 +6007,7 @@ from_roman(to_roman(n))==n for all n ... FAIL to_roman should fail with non-integer input ... ok to_roman should fail with negative input ... ok to_roman should fail with large input ... ok -to_roman should fail with 0 input ... ok
                                                                                                                                                                                                                                                                                                          +to_roman should fail with 0 input ... ok
                                                                                                                                                                                                                                                                                                          1. to_roman() still passes the known values test, which is comforting. All the tests that passed in stage 2 still pass, so the latest code hasn't broken anything.
                                                                                                                                                                                                                                                                                                          2. More exciting is the fact that all of the bad input tests now pass. This test, testNonInteger, passes because of the int(n) <> n check. When a non-integer is passed to to_roman(), the int(n) <> n check notices it and raises the NotIntegerError exception, which is what testNonInteger is looking for. @@ -6140,7 +6070,7 @@ AssertionError: 1 != None ---------------------------------------------------------------------- Ran 12 tests in 0.401s -FAILED (failures=6)
                                                                                                                                                                                                                                                                                                            +FAILED (failures=6)
                                                                                                                                                                                                                                                                                                            1. You're down to 6 failures, and all of them involve from_roman(): the known values test, the three separate bad input tests, the case check, and the sanity check. That means that to_roman() has passed all the tests it can pass by itself. (It's involved in the sanity check, but that also requires that from_roman() be written, which it isn't yet.) Which means that you must stop coding to_roman() now. No tweaking, no twiddling, no extra checks “just in case”. Stop. Now. Back away from the keyboard. @@ -6188,7 +6118,7 @@ def from_roman(s): result += integer index += len(numeral) return result -
                                                                                                                                                                                                                                                                                                              +
                                                                                                                                                                                                                                                                                                              1. The pattern here is the same as to_roman(). You iterate through your Roman numeral data structure (a tuple of tuples), and instead of matching the highest integer values as often as possible, you match the “highest” Roman numeral character strings as often as possible. @@ -6218,7 +6148,7 @@ from_roman(to_roman(n))==n for all n ... ok to_roman should fail with non-integer input ... ok to_roman should fail with negative input ... ok to_roman should fail with large input ... ok -to_roman should fail with 0 input ... ok
                                                                                                                                                                                                                                                                                                                +to_roman should fail with 0 input ... ok
                                                                                                                                                                                                                                                                                                                1. Two pieces of exciting news here. The first is that from_roman() works for good input, at least for all the known values you test.
                                                                                                                                                                                                                                                                                                                2. The second is that the sanity check also passed. Combined with the known values tests, you can be reasonably sure that both to_roman() and from_roman() work properly for all possible good values. (This is not guaranteed; it is theoretically possible that to_roman() has a bug that produces the wrong Roman numeral for some particular set of inputs, and that from_roman() has a reciprocal bug that produces the same wrong integer values for exactly that set of Roman numerals that to_roman() generated incorrectly. Depending on your application and your requirements, this possibility may bother you; if so, write @@ -6339,7 +6269,7 @@ def from_roman(s): result += integer index += len(numeral) return result -
                                                                                                                                                                                                                                                                                                                  +
1. This is just a continuation of the pattern you discussed in Section 7.3, “Case Study: Roman Numerals”. The tens place is either XC (90), XL (40), or an optional L followed by 0 to 3 optional X characters. The ones place is either IX (9), IV (4), or an optional V followed by 0 to 3 optional I characters.
                                                                                                                                                                                                                                                                                                                  2. Having encoded all that logic into a regular expression, the code to check for invalid Roman numerals becomes trivial. If @@ -6363,7 +6293,7 @@ to_roman should fail with 0 input ... ok ---------------------------------------------------------------------- Ran 12 tests in 2.864s -OK
                                                                                                                                                                                                                                                                                                                    +OK
                                                                                                                                                                                                                                                                                                                    1. One thing I didn't mention about regular expressions is that, by default, they are case-sensitive. Since the regular expression romanNumeralPattern was expressed in uppercase characters, the re.search check will reject any input that isn't completely uppercase. So the uppercase input test passes. @@ -6448,7 +6378,7 @@ kgp g ref test ... ok ---------------------------------------------------------------------- Ran 29 tests in 2.799s -OK
                                                                                                                                                                                                                                                                                                                      +OK
                                                                                                                                                                                                                                                                                                                      1. The first 5 tests are from apihelpertest.py, which tests the example script from Chapter 4, The Power Of Introspection.
                                                                                                                                                                                                                                                                                                                      2. The next 5 tests are from odbchelpertest.py, which tests the example script from Chapter 2, Your First Python Program. @@ -6466,7 +6396,7 @@ import sys, os print 'sys.argv[0] =', sys.argv[0] pathname = os.path.dirname(sys.argv[0]) print 'path =', pathname -print 'full path =', os.path.abspath(pathname)
                                                                                                                                                                                                                                                                                                                        +print 'full path =', os.path.abspath(pathname)
                                                                                                                                                                                                                                                                                                                        1. Regardless of how you run a script, sys.argv[0] will always contain the name of the script, exactly as it appears on the command line. This may or may not include any path information, as you'll see shortly. @@ -6485,7 +6415,7 @@ print 'full path =', os.path.abspath(pathname)
                                                                                                                                                                                                                                                                                                                          >>> os.path.abspath('/home/you/.ssh') /home/you/.ssh >>> os.path.abspath('.ssh/../foo/') -/home/you/foo
                                                                                                                                                                                                                                                                                                                          +/home/you/foo
                                                                                                                                                                                                                                                                                                                          1. os.getcwd() returns the current working directory.
                                                                                                                                                                                                                                                                                                                          2. Calling os.path.abspath with an empty string returns the current working directory, same as os.getcwd(). @@ -6512,7 +6442,7 @@ full path = /home/you/diveintopython3/common/py [you@localhost py]$ python fullpath.py sys.argv[0] = fullpath.py path = -full path = /home/you/diveintopython3/common/py
                                                                                                                                                                                                                                                                                                                            +full path = /home/you/diveintopython3/common/py
                                                                                                                                                                                                                                                                                                                            1. In the first case, sys.argv[0] includes the full path of the script. You can then use the os.path.dirname function to strip off the script name and return the full directory name, and os.path.abspath simply returns what you give it.
                                                                                                                                                                                                                                                                                                                            2. If the script is run by using a partial pathname, sys.argv[0] will still contain exactly what appears on the command line. os.path.dirname will then give you a partial pathname (relative to the current directory), and os.path.abspath will construct a full pathname from the partial pathname. @@ -6529,7 +6459,7 @@ def regressionTest(): path = os.getcwd() sys.path.append(path) files = os.listdir(path) -
                                                                                                                                                                                                                                                                                                                              +
                                                                                                                                                                                                                                                                                                                              1. Instead of setting path to the directory where the currently running script is located, you set it to the current working directory instead. This will be whatever directory you were in before you ran the script, which is not necessarily the same as the directory the script @@ -6557,7 +6487,7 @@ def regressionTest(): ... filteredList.append(n) ... >>> filteredList -[1, 3, 5, 9, -3]
                                                                                                                                                                                                                                                                                                                                +[1, 3, 5, 9, -3]
1. odd uses the built-in modulo operator “%” to return a true value if n is odd and a false value if n is even.
                                                                                                                                                                                                                                                                                                                                2. filter takes two arguments, a function (odd) and a list (li). It loops through the list and calls odd with each element. If odd returns a true value (remember, any non-zero value is true in Python), then the element is included in the returned list, otherwise it is filtered out. The result is a list of only the odd @@ -6568,7 +6498,7 @@ def regressionTest():

                                                                                                                                                                                                                                                                                                                                  Example 16.8. filter in regression.py

                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                                       files = os.listdir(path)              
                                                                                                                                                                                                                                                                                                                                       test = re.compile("test\.py$", re.IGNORECASE)           
                                                                                                                                                                                                                                                                                                                                  -    files = filter(test.search, files)    
                                                                                                                                                                                                                                                                                                                                  + files = filter(test.search, files)
                                                                                                                                                                                                                                                                                                                                  1. As you saw in Section 16.2, “Finding the path”, path may contain the full or partial pathname of the directory of the currently running script, or it may contain an empty string if the script is being run from the current directory. Either way, files will end up with the names of the files in the same directory as this script you're running. @@ -6581,7 +6511,7 @@ There is discussion that map and filter might be depre

                                                                                                                                                                                                                                                                                                                                    Example 16.9. Filtering using list comprehensions instead

                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                         files = os.listdir(path)             
                                                                                                                                                                                                                                                                                                                                         test = re.compile("test\.py$", re.IGNORECASE)          
                                                                                                                                                                                                                                                                                                                                    -    files = [f for f in files if test.search(f)] 
                                                                                                                                                                                                                                                                                                                                    + files = [f for f in files if test.search(f)]
                                                                                                                                                                                                                                                                                                                                    1. This will accomplish exactly the same result as using the filter function. Which way is more expressive? That's up to you.
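As a rough side-by-side, the sketch below runs both forms against a made-up list of file names (the names are illustrative stand-ins for os.listdir(path), not the book's actual directory). It is written for a current Python 3 interpreter, where filter returns a lazy iterator and has to be wrapped in list(); under the Python 2 used in this chapter, filter already returns a list.

```python
import re

# Hypothetical directory listing, standing in for os.listdir(path).
files = ["apihelpertest.py", "regression.py", "ROMANTEST.PY", "notes.txt", "test.pyc"]

# Raw string so the backslash reaches the regex engine untouched.
test = re.compile(r"test\.py$", re.IGNORECASE)

# filter form: keep each name for which test.search returns a match object.
with_filter = list(filter(test.search, files))

# List comprehension form: same condition, spelled out inline.
with_comprehension = [f for f in files if test.search(f)]

print(with_filter)
print(with_comprehension)
```

Both print the same two names: the ones ending in test.py, regardless of case. Names such as test.pyc are dropped because the pattern is anchored at the end of the string.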

                                                                                                                                                                                                                                                                                                                                      16.4. Mapping lists revisited

                                                                                                                                                                                                                                                                                                                                      @@ -6600,7 +6530,7 @@ There is discussion that map and filter might be depre ... newlist.append(double(n)) ... >>> newlist -[2, 4, 6, 10, 18, 20, 512, -6]
                                                                                                                                                                                                                                                                                                                                      +[2, 4, 6, 10, 18, 20, 512, -6]
                                                                                                                                                                                                                                                                                                                                      1. map takes a function and a list[8] and returns a new list by calling the function with each element of the list in order. In this case, the function simply multiplies each element by 2. @@ -6609,7 +6539,7 @@ There is discussion that map and filter might be depre

                                                                                                                                                                                                                                                                                                                                        Example 16.11. map with lists of mixed datatypes

                                                                                                                                                                                                                                                                                                                                         >>> li = [5, 'a', (2, 'b')]
                                                                                                                                                                                                                                                                                                                                         >>> map(double, li)     
                                                                                                                                                                                                                                                                                                                                        -[10, 'aa', (2, 'b', 2, 'b')]
                                                                                                                                                                                                                                                                                                                                        +[10, 'aa', (2, 'b', 2, 'b')]
                                                                                                                                                                                                                                                                                                                                        1. As a side note, I'd like to point out that map works just as well with lists of mixed datatypes, as long as the function you're using correctly handles each type. In this case, the double function simply multiplies the given argument by 2, and Python Does The Right Thing depending on the datatype of the argument. For integers, this means actually multiplying it by 2; for @@ -6618,7 +6548,7 @@ There is discussion that map and filter might be depre

                                                                                                                                                                                                                                                                                                                                          All right, enough play time. Let's look at some real code.

                                                                                                                                                                                                                                                                                                                                          Example 16.12. map in regression.py

                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                               filenameToModuleName = lambda f: os.path.splitext(f)[0] 
                                                                                                                                                                                                                                                                                                                                          -    moduleNames = map(filenameToModuleName, files)          
                                                                                                                                                                                                                                                                                                                                          + moduleNames = map(filenameToModuleName, files)
                                                                                                                                                                                                                                                                                                                                          1. As you saw in Section 4.7, “Using lambda Functions”, lambda defines an inline function. And as you saw in Example 6.17, “Splitting Pathnames”, os.path.splitext takes a filename and returns a tuple (name, extension). So filenameToModuleName is a function which will take a filename and strip off the file extension, and return just the name.
                                                                                                                                                                                                                                                                                                                                          2. Calling map takes each filename listed in files, passes it to the function filenameToModuleName, and returns a list of the return values of each of those function calls. In other words, you strip the file extension off @@ -6654,7 +6584,7 @@ too much, filter it. If it's not what you want, map it. Focus on the data; leave this way, with a comma-separated list. You did this on the very first line of this chapter's script.

                                                                                                                                                                                                                                                                                                                                            Example 16.13. Importing multiple modules at once

                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                             import sys, os, re, unittest 
                                                                                                                                                                                                                                                                                                                                            -
                                                                                                                                                                                                                                                                                                                                            +
                                                                                                                                                                                                                                                                                                                                            1. This imports four modules at once: sys (for system functions and access to the command line parameters), os (for operating system functions like directory listings), re (for regular expressions), and unittest (for unit testing).

                                                                                                                                                                                                                                                                                                                                              Now let's do the same thing, but with dynamic imports. @@ -6667,7 +6597,7 @@ import sys, os, re, unittest >>> <module 'sys' (built-in)> >>> os >>> <module 'os' from '/usr/local/lib/python2.2/os.pyc'> -

                                                                                                                                                                                                                                                                                                                                              +
                                                                                                                                                                                                                                                                                                                                              1. The built-in __import__ function accomplishes the same goal as using the import statement, but it's an actual function, and it takes a string as an argument.
                                                                                                                                                                                                                                                                                                                                              2. The variable sys is now the sys module, just as if you had said import sys. The variable os is now the os module, and so forth. @@ -6689,7 +6619,7 @@ to doesn't need to match the module name, either. You could import a series of m >>> import sys >>> sys.version '2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)]' -
                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                1. moduleNames is just a list of strings. Nothing fancy, except that the strings happen to be names of modules that you could import, if you wanted to. @@ -6725,7 +6655,7 @@ return unittest.TestSuite(map(load, modules)) 'plural.py', 'pluraltest.py', 'pyfontify.py', 'regression.py', 'roman.py', 'romantest.py', 'uncurly.py', 'unicode2koi8r.py', 'urllister.py', 'kgp', 'plural', 'roman', 'colorize.py'] -
                                                                                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                                                                                  1. files is a list of all the files and directories in the script's directory. (If you've been running some of the examples already, you may also see some .pyc files in there as well.) @@ -6734,7 +6664,7 @@ return unittest.TestSuite(map(load, modules)) >>> files = filter(test.search, files) >>> files ['apihelpertest.py', 'kgptest.py', 'odbchelpertest.py', 'pluraltest.py', 'romantest.py'] -
                                                                                                                                                                                                                                                                                                                                                    +
                                                                                                                                                                                                                                                                                                                                                    1. This regular expression will match any string that ends with test.py. Note that you need to escape the period, since a period in a regular expression usually means “match any single character”, but you actually want to match a literal period instead.
                                                                                                                                                                                                                                                                                                                                                    2. The compiled regular expression acts like a function, so you can use it to filter the large list of files and directories, @@ -6749,7 +6679,7 @@ return unittest.TestSuite(map(load, modules)) >>> moduleNames = map(filenameToModuleName, files) >>> moduleNames ['apihelpertest', 'kgptest', 'odbchelpertest', 'pluraltest', 'romantest'] -
                                                                                                                                                                                                                                                                                                                                                      +
                                                                                                                                                                                                                                                                                                                                                      1. As you saw in Section 4.7, “Using lambda Functions”, lambda is a quick-and-dirty way of creating an inline, one-line function. This one takes a filename with an extension and returns just the filename part, using the standard library function os.path.splitext that you saw in Example 6.17, “Splitting Pathnames”. @@ -6766,7 +6696,7 @@ return unittest.TestSuite(map(load, modules)) <module 'romantest' from 'romantest.py'>] >>> modules[-1] <module 'romantest' from 'romantest.py'> -
                                                                                                                                                                                                                                                                                                                                                        +
                                                                                                                                                                                                                                                                                                                                                        1. As you saw in Section 16.6, “Dynamically importing modules”, you can use a combination of map and __import__ to map a list of module names (as strings) into actual modules (which you can call or access like any other module). @@ -6785,7 +6715,7 @@ return unittest.TestSuite(map(load, modules)) ] ] >>> unittest.TestSuite(map(load, modules)) -
                                                                                                                                                                                                                                                                                                                                                          +
                                                                                                                                                                                                                                                                                                                                                          1. These are real module objects. Not only can you access them like any other module, instantiate classes and call functions, you can also introspect into the module to figure out which classes and functions it has in the first place. That's what @@ -6798,7 +6728,7 @@ in the unittest module, but then I'd never finish this one.)

                                                                                                                                                                                                                                                                                                                                                            Example 16.22. Step 6: Telling unittest to use your test suite

                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                             if __name__ == "__main__": 
                                                                                                                                                                                                                                                                                                                                                                 unittest.main(defaultTest="regressionTest") 
                                                                                                                                                                                                                                                                                                                                                            -
                                                                                                                                                                                                                                                                                                                                                            +
                                                                                                                                                                                                                                                                                                                                                            1. Instead of letting the unittest module do all its magic for us, you've done most of it yourself. You've created a function (regressionTest) that imports the modules yourself, calls unittest.defaultTestLoader yourself, and wraps it all up in a test suite. Now all you need to do is tell unittest that, instead of looking for tests and building a test suite in the usual way, it should just call the regressionTest function, which returns a ready-to-use TestSuite.
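The idea of a function that returns a ready-to-use TestSuite can be sketched in a few lines. This is a simplified stand-in, not the chapter's regression.py: SampleTest is a hypothetical test case defined inline, where the real script loads test modules from disk. The sketch runs the suite directly with TextTestRunner; passing defaultTest="regressionTest" to unittest.main asks unittest to look up the function by name and call it for you.

```python
import unittest

class SampleTest(unittest.TestCase):
    """Hypothetical test case standing in for the chapter's test modules."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

def regressionTest():
    """Build and return a ready-to-use TestSuite (simplified sketch)."""
    load = unittest.defaultTestLoader.loadTestsFromTestCase
    # map yields one sub-suite per test case class; TestSuite accepts any iterable
    return unittest.TestSuite(map(load, [SampleTest]))

# Build the suite by hand and run it, instead of letting unittest discover tests
suite = regressionTest()
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The function returns the suite and leaves the running of it to the caller, so the same function serves both a direct call like this one and the defaultTest hook.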

                                                                                                                                                                                                                                                                                                                                                              16.8. Summary

                                                                                                                                                                                                                                                                                                                                                              @@ -6966,7 +6896,7 @@ in your timing framework will irreparably skew your results. 8.21683733547 >>> t.repeat(3, 2000000) [16.48319309109, 16.46128984923, 16.44203948912] -
                                                                                                                                                                                                                                                                                                                                                              +
                                                                                                                                                                                                                                                                                                                                                              1. The timeit module defines one class, Timer, which takes two arguments. Both arguments are strings. The first argument is the statement you wish to time; in this case, you are timing a call to the Soundex function within the soundex with an argument of 'Pilgrim'. The second argument to the Timer class is the import statement that sets up the environment for the statement. Internally, timeit sets up an isolated virtual environment, manually executes the setup statement (importing the soundex module), then manually compiles and executes the timed statement (calling the Soundex function). diff --git a/dip3.css b/dip3.css index 06f035e..572599a 100644 --- a/dip3.css +++ b/dip3.css @@ -101,7 +101,7 @@ h1,h2{letter-spacing:-1px} h1,h1 code{font-size:xx-large} h2,h2 code{font-size:x-large} h3,h3 code{font-size:large} -h1{border-bottom:4px double;width:100%;margin:1em 0;text-shadow:gainsboro 1px 1px 1px} +h1{border-bottom:4px double;width:100%;margin:1em 0} h1:before{content:"Chapter " counter(h1) ". "} h1{counter-reset:h2} h2:before{counter-increment:h2;content:counter(h1) "." counter(h2) ". 
"} @@ -117,4 +117,4 @@ aside{display:block;float:right;font-style:oblique;font-size:xx-large;width:25%; .nav a{text-decoration:none;border:0;display:block} .nav a:first-child{float:left} .nav a:last-child{float:right} -.nav span{font-size:1000%;line-height:1;margin:0} +.nav span{font-size:1000%;line-height:1;margin:0;text-shadow:gainsboro 3px 3px 3px} diff --git a/examples/fibonacci2.py b/examples/fibonacci2.py index 8aeef15..f2c4749 100644 --- a/examples/fibonacci2.py +++ b/examples/fibonacci2.py @@ -1,11 +1,14 @@ """Fibonacci iterator""" class Fib: + """iterator that yields numbers in the Fibanocci sequence""" + def __init__(self, max): self.max = max def __iter__(self): - self.a, self.b = 0, 1 + self.a = 0 + self.b = 1 return self def __next__(self): diff --git a/examples/plural6.py b/examples/plural6.py index bc88b05..91474d6 100644 --- a/examples/plural6.py +++ b/examples/plural6.py @@ -15,8 +15,10 @@ def build_match_and_apply_functions(pattern, search, replace): return (matches_rule, apply_rule) class LazyRules: + rules_f = 'plural6-rules.txt' + def __init__(self): - self.pattern_file = open('plural6-rules.txt') + self.pattern_file = open(self.rules_f) self.cache = [] def __iter__(self): diff --git a/iterators.html b/iterators.html index bfa9a6a..080fccf 100644 --- a/iterators.html +++ b/iterators.html @@ -23,11 +23,14 @@ body{counter-reset:h1 6}

                                                                                                                                                                                                                                                                                                                                                                [download fibonacci2.py]

                                                                                                                                                                                                                                                                                                                                                                class Fib:
                                                                                                                                                                                                                                                                                                                                                                +    """iterator that yields numbers in the Fibanocci sequence"""
                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                     def __init__(self, max):
                                                                                                                                                                                                                                                                                                                                                                         self.max = max
                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                     def __iter__(self):
                                                                                                                                                                                                                                                                                                                                                                -        self.a, self.b = 0, 1
                                                                                                                                                                                                                                                                                                                                                                +        self.a = 0
                                                                                                                                                                                                                                                                                                                                                                +        self.b = 1
                                                                                                                                                                                                                                                                                                                                                                         return self
                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                     def __next__(self):
                                                                                                                                                                                                                                                                                                                                                                @@ -63,37 +66,88 @@ class PapayaWhip:  
                                                                                                                                                                                                                                                                                                                                                                 

                                                                                                                                                                                                                                                                                                                                                                The pass statement in Python is like a empty set of curly braces ({}) in Java or C. -

                                                                                                                                                                                                                                                                                                                                                                Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Although it's not required, Python classes can have something similar to a constructor: the __init__ method. +

                                                                                                                                                                                                                                                                                                                                                                Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Although it's not required, Python classes can have something similar to a constructor: the __init__() method.

                                                                                                                                                                                                                                                                                                                                                                The __init__() Method

                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                FIXME - port from DiP +

                                                                                                                                                                                                                                                                                                                                                                This example shows the initialization of the Fib class using the __init__ method. -

                                                                                                                                                                                                                                                                                                                                                                Know When To Use self and __init__

                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                
+class Fib:
+    """iterator that yields numbers in the Fibonacci sequence"""
                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                FIXME - port from DiP + def __init__(self, max):

                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                                                                                                1. Classes can (and should) have docstrings too, just like modules and functions. +
                                                                                                                                                                                                                                                                                                                                                                2. The __init__() method is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor of the class. It's tempting, because it looks like a constructor (by convention, the __init__() method is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the __init__() method is called, and you already have a valid reference to the new instance of the class. +
                                                                                                                                                                                                                                                                                                                                                                + +

The first argument of every instance method, including the __init__() method, is always a reference to the current instance of the class. By convention, this argument is named self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention. + +

In the __init__() method, self refers to the newly created object; in other instance methods, it refers to the instance whose method was called. Although you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.
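
The behavior of self described above can be sketched as follows. The Greeter class and its name attribute are hypothetical and not part of the chapter's examples; they exist only to show that calling a method on an instance is equivalent to calling it on the class and passing the instance by hand.

```python
class Greeter:
    """hypothetical class used only to illustrate self"""

    def __init__(self, name):
        # self is the newly created instance; name is an ordinary argument
        self.name = name

    def greet(self):
        # self is the instance whose method was called
        return 'hello, ' + self.name


g = Greeter('world')          # Python passes the new instance as self
implicit = g.greet()          # self is supplied automatically
explicit = Greeter.greet(g)   # the same call with self passed by hand
print(implicit == explicit)
```

The second form is rarely written in practice; it is shown only to make the automatic argument visible.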

                                                                                                                                                                                                                                                                                                                                                                Instantiating Classes

                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                FIXME - port from DiP +

                                                                                                                                                                                                                                                                                                                                                                Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the __init__() method requires. The return value will be the newly created object. +

                                                                                                                                                                                                                                                                                                                                                                +>>> import fibonacci2
                                                                                                                                                                                                                                                                                                                                                                +>>> fib = fibonacci2.Fib(100)  
                                                                                                                                                                                                                                                                                                                                                                +>>> fib                        
                                                                                                                                                                                                                                                                                                                                                                +<fibonacci2.Fib object at 0x00DB8810>
                                                                                                                                                                                                                                                                                                                                                                +>>> fib.__class__              
                                                                                                                                                                                                                                                                                                                                                                +<class 'fibonacci2.Fib'>
                                                                                                                                                                                                                                                                                                                                                                +>>> fib.__doc__                
+'iterator that yields numbers in the Fibonacci sequence'
                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                                                                                                1. You are creating an instance of the Fib class (defined in the fibonacci2 module) and assigning the newly created instance to the variable fib. You are passing one parameter, 100, which will end up as the max argument in Fib's __init__() method. +
                                                                                                                                                                                                                                                                                                                                                                2. fib is now an instance of the Fib class. +
3. Every class instance has a built-in attribute, __class__, which is the object's class. Java programmers may be familiar with the Class class, which contains methods like getName and getSuperclass to get metadata about an object. In Python, this kind of metadata is available through attributes: __class__ on the instance itself, and __name__ and __bases__ on the class that __class__ refers to. +
                                                                                                                                                                                                                                                                                                                                                                4. You can access the instance's docstring just as with a function or a module. All instances of a class share the same docstring. +
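
As a rough illustration of the metadata attributes mentioned in the list above, here is a minimal sketch. It assumes a Fib class defined inline rather than imported from the fibonacci2 module, so that the snippet is self-contained.

```python
class Fib:
    """iterator that yields numbers in the Fibonacci sequence"""

    def __init__(self, max):
        self.max = max


fib = Fib(100)
cls = fib.__class__                  # the class object the instance was made from

print(cls is Fib)                    # the instance's class is Fib itself
print(cls.__name__)                  # the name is stored on the class
print(cls.__bases__)                 # tuple of base classes
print(fib.__doc__ == Fib.__doc__)    # instances share the class docstring
print(hasattr(fib, '__name__'))      # instances have no __name__ of their own
```

Note that __name__ and __bases__ are looked up on the class, so the usual route from an instance is fib.__class__.__name__.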
                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                A Note About Garbage Collection

                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                +

In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit new operator as there is in C++ or Java. +

                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                FIXME - port from DiP, verify it's still true +

                                                                                                                                                                                                                                                                                                                                                                Instance Variables

                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                Special Method Names

                                                                                                                                                                                                                                                                                                                                                                +

                                                                                                                                                                                                                                                                                                                                                                On to the next line: -

                                                                                                                                                                                                                                                                                                                                                                FIXME - port from DiP, link to http://docs.python.org/3.0/reference/datamodel.html#special-method-names +

                                                                                                                                                                                                                                                                                                                                                                class Fib:
                                                                                                                                                                                                                                                                                                                                                                +    def __init__(self, max):
                                                                                                                                                                                                                                                                                                                                                                +        self.max = max        
                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                                                                                                1. What is self.max? It's an instance variable. It is completely separate from max, which was passed into the __init__() method as an argument. self.max is “global” to the instance. That means that you can access it from other methods. +
                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                FIXME - do we want to make an appendix out of some of the special methods? The organization in the Python docs is somewhat haphazard and most names have no examples at all +

                                                                                                                                                                                                                                                                                                                                                                class Fib:
                                                                                                                                                                                                                                                                                                                                                                +    def __init__(self, max):
                                                                                                                                                                                                                                                                                                                                                                +        self.max = max        
                                                                                                                                                                                                                                                                                                                                                                +    .
                                                                                                                                                                                                                                                                                                                                                                +    .
                                                                                                                                                                                                                                                                                                                                                                +    .
                                                                                                                                                                                                                                                                                                                                                                +    def __next__(self):
                                                                                                                                                                                                                                                                                                                                                                +        fib = self.a
                                                                                                                                                                                                                                                                                                                                                                +        if fib > self.max:    
                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                  +
                                                                                                                                                                                                                                                                                                                                                                1. self.max is defined in the __init__() method… +
                                                                                                                                                                                                                                                                                                                                                                2. …and referenced in the __next__() method. +
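
To see self.max travel from __init__() to __next__() in a runnable form, here is one possible completion of the class. The excerpt above elides the middle of the class, so the __iter__() method, the self.b attribute, and the StopIteration branch below are assumptions filled in for illustration, not necessarily the code in fibonacci2.

```python
class Fib:
    """iterator that yields numbers in the Fibonacci sequence"""

    def __init__(self, max):
        self.max = max            # instance variable, stored on this instance

    def __iter__(self):
        # assumed starting state; not shown in the excerpt
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:        # reads the value stored by __init__()
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


numbers = list(Fib(20))
print(numbers)
```

Because max is saved as self.max, it remains available to every later method call on the same instance, long after __init__() has returned.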
                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                Class Attributes

                                                                                                                                                                                                                                                                                                                                                                +

                                                                                                                                                                                                                                                                                                                                                                Instance variables are specific to one instance of a class. For example, if you create two Fib instances with different maximum values, they will each remember their own values. -

                                                                                                                                                                                                                                                                                                                                                                FIXME +

                                                                                                                                                                                                                                                                                                                                                                +>>> import fibonacci2
                                                                                                                                                                                                                                                                                                                                                                +>>> fib1 = fibonacci2.Fib(100)
                                                                                                                                                                                                                                                                                                                                                                +>>> fib2 = fibonacci2.Fib(200)
                                                                                                                                                                                                                                                                                                                                                                +>>> fib1.max
                                                                                                                                                                                                                                                                                                                                                                +100
                                                                                                                                                                                                                                                                                                                                                                +>>> fib2.max
                                                                                                                                                                                                                                                                                                                                                                +200

                                                                                                                                                                                                                                                                                                                                                                A Fibonacci Iterator

                                                                                                                                                                                                                                                                                                                                                                -

                                                                                                                                                                                                                                                                                                                                                                FIXME +

Now you're ready to learn how to build an iterator. An iterator is just a class that defines an __iter__() method and a __next__() method.

                                                                                                                                                                                                                                                                                                                                                                [download fibonacci2.py]

                                                                                                                                                                                                                                                                                                                                                                class Fib:                                        
                                                                                                                                                                                                                                                                                                                                                                @@ -112,8 +166,8 @@ class PapayaWhip:  
                                                                                                                                                                                                                                                                                                                                                                         return fib                                
                                                                                                                                                                                                                                                                                                                                                                1. To build an iterator from scratch, fib needs to be a class, not a function. -
                                                                                                                                                                                                                                                                                                                                                                2. “Calling” fib(max) is really creating an instance of this class and calling its __init__() method with max. The __init__() method saves the maximum value as an instance variable so other methods can refer to it later. -
                                                                                                                                                                                                                                                                                                                                                                3. The __iter__() method is called whenever someone calls iter(fib). (As you’ll see in a minute, a for loop will call this automatically, but you can also call it yourself manually.) After performing beginning-of-iteration initialization (in this case, resetting self.a and self.b, our two counters), the __iter__() method can return any object that implements a __next__() method. In this case (and in most cases), __iter__() simply returns self, since this class implements its own __next__() method. +
                                                                                                                                                                                                                                                                                                                                                                4. “Calling” Fib(max) is really creating an instance of this class and calling its __init__() method with max. The __init__() method saves the maximum value as an instance variable so other methods can refer to it later. +
                                                                                                                                                                                                                                                                                                                                                                5. The __iter__() method is called whenever someone calls iter(fib). (As you’ll see in a minute, a for loop will call this automatically, but you can also call it yourself manually.) After performing beginning-of-iteration initialization (in this case, resetting self.a and self.b, our two counters), the __iter__() method can return any object that implements a __next__() method. In this case (and in most cases), __iter__() simply returns self, since this class implements its own __next__() method.
6. The __next__() method is called whenever someone calls next() on an iterator obtained from an instance of the class. That will make more sense in a minute.
                                                                                                                                                                                                                                                                                                                                                                7. When the __next__() method raises a StopIteration exception, this signals to the caller that the iteration is over; no more values are available. If the caller is a for loop, it will notice this StopIteration exception and gracefully exit the loop. (In other words, it will swallow the exception.) This little bit of magic is actually the key to using iterators in for loops.
                                                                                                                                                                                                                                                                                                                                                                8. To spit out the next value, an iterator’s __next__() method simply returns the value. Do not use yield here; that’s a bit of syntactic sugar that only applies when you’re using generators. Here you’re creating your own iterator from scratch; use return instead. @@ -133,12 +187,12 @@ class PapayaWhip:
                                                                                                                                                                                                                                                                                                                                                                  • The for loop calls Fib(1000), as shown. This returns an instance of the Fib class. Call this fib_inst. -
                                                                                                                                                                                                                                                                                                                                                                  • Secretly, and quite cleverly, the for loop calls iter(fib_inst), which returns an iterator object. Call this fib_iter. In this case, fib_iter == fib_inst, because the __iter__() method returns self, but the for loop doesn’t know (or care) about that. +
                                                                                                                                                                                                                                                                                                                                                                  • Secretly, and quite cleverly, the for loop calls iter(fib_inst), which returns an iterator object. Call this fib_iter. In this case, fib_iter == fib_inst, because the __iter__() method returns self, but the for loop doesn’t know (or care) about that.
                                                                                                                                                                                                                                                                                                                                                                  • To “loop through” the iterator, the for loop calls next(fib_iter), which calls the __next__() method on the fib_iter object, which does the next-Fibonacci-number calculations and returns a value. The for loop takes this value and assigns it to n, then executes the body of the for loop for that value of n.
                                                                                                                                                                                                                                                                                                                                                                  • How does the for loop know when to stop? I’m glad you asked! When next(fib_iter) raises a StopIteration exception, the for loop will swallow the exception and gracefully exit. (Any other exception will pass through and be raised as usual.) And where have you seen a StopIteration exception? In the __next__() method, of course!
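The steps above can be traced by hand. The listing of the Fib class is truncated here, so the class below is a reconstruction from the numbered notes rather than the original code, and manual_for is a hypothetical helper that does roughly what a for loop does behind the scenes.

```python
class Fib:
    """Iterator producing Fibonacci numbers up to max (reconstructed sketch)."""

    def __init__(self, max):
        # Save the maximum so other methods can refer to it later.
        self.max = max

    def __iter__(self):
        # Beginning-of-iteration initialization: reset the two counters.
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            # Signals to the caller that no more values are available.
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib  # return, not yield


def manual_for(iterable):
    """Hypothetical helper: collect values the way a for loop would."""
    results = []
    it = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            n = next(it)       # calls it.__next__()
        except StopIteration:
            break              # swallowed; the "loop" exits gracefully
        results.append(n)      # stands in for the body of the loop
    return results


fib_inst = Fib(20)
fib_iter = iter(fib_inst)
same_object = fib_iter is fib_inst   # True, because __iter__() returns self
values = manual_for(Fib(20))
```

Any exception other than StopIteration would pass straight through manual_for, just as it would pass through a real for loop.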
                                                                                                                                                                                                                                                                                                                                                                  -

                                                                                                                                                                                                                                                                                                                                                                  A Plural Rule Iterator

                                                                                                                                                                                                                                                                                                                                                                  +

                                                                                                                                                                                                                                                                                                                                                                  A Plural Rule Iterator

                                                                                                                                                                                                                                                                                                                                                                  Now it’s time for the finale. Let's rewrite the plural rules generator as an iterator. @@ -181,15 +235,43 @@ rules = LazyRules()

                                                                                                                                                                                                                                                                                                                                                                Let’s take the class one bite at a time.

                                                                                                                                                                                                                                                                                                                                                                class LazyRules:
                                                                                                                                                                                                                                                                                                                                                                -    def __init__(self):                                
                                                                                                                                                                                                                                                                                                                                                                -        self.pattern_file = open('plural6-rules.txt')  
                                                                                                                                                                                                                                                                                                                                                                -        self.cache = []                                
                                                                                                                                                                                                                                                                                                                                                                + rules_f = 'plural6-rules.txt' + + def __init__(self): + self.pattern_file = open(self.rules_f) + self.cache = []
                                                                                                                                                                                                                                                                                                                                                                1. The __init__() method is only going to be called once, when you instantiate the class and assign it to rules.
                                                                                                                                                                                                                                                                                                                                                                2. Since this is only going to get called once, it’s the perfect place to open the pattern file. You’ll read it later; no point doing more than you absolutely have to until absolutely necessary!
                                                                                                                                                                                                                                                                                                                                                                3. Also, this is a good place to initialize the cache, which you’ll use later as you read the patterns from the pattern file.
                                                                                                                                                                                                                                                                                                                                                                +

                                                                                                                                                                                                                                                                                                                                                                Before we continue, let's take a closer look at rules_f. It's not defined within the __init__() method. In fact, it's not defined within any method. It's defined at the class level. It's a class variable, and although you can access it just like an instance variable (self.rules_f), it is shared across all instances of the LazyRules class. + +

                                                                                                                                                                                                                                                                                                                                                                +>>> import plural6
                                                                                                                                                                                                                                                                                                                                                                +>>> r1 = plural6.LazyRules()
                                                                                                                                                                                                                                                                                                                                                                +>>> r2 = plural6.LazyRules()
                                                                                                                                                                                                                                                                                                                                                                +>>> r1.rules_f                               
                                                                                                                                                                                                                                                                                                                                                                +'plural6-rules.txt'
                                                                                                                                                                                                                                                                                                                                                                +>>> r2.rules_f
                                                                                                                                                                                                                                                                                                                                                                +'plural6-rules.txt'
                                                                                                                                                                                                                                                                                                                                                                +>>> r1.__class__.rules_f                     
                                                                                                                                                                                                                                                                                                                                                                +'plural6-rules.txt'
                                                                                                                                                                                                                                                                                                                                                                +>>> r1.__class__.rules_f = 'papayawhip.txt'  
                                                                                                                                                                                                                                                                                                                                                                +>>> r1.rules_f
                                                                                                                                                                                                                                                                                                                                                                +'papayawhip.txt'
                                                                                                                                                                                                                                                                                                                                                                +>>> r2.rules_f                               
                                                                                                                                                                                                                                                                                                                                                                +'papayawhip.txt'
                                                                                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                                                                                  +
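1. Each instance of the class sees the rules_f attribute with the value defined by the class. +
2. You can get at the class attribute itself, rather than going through an instance, by using the instance's special __class__ attribute to reach the class. +
3. Assigning to the attribute through the class changes it in one place, the class itself. +
4. Because neither instance has its own rules_f, both r1 and r2 now see the new value. +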
                                                                                                                                                                                                                                                                                                                                                                + +
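One detail the session above does not exercise is what happens when you assign to the attribute through an instance instead of through the class. Here is a minimal sketch; LazyRulesSketch is a hypothetical stand-in that keeps only the class-level attribute and leaves out the __init__() method, so no file is opened.

```python
class LazyRulesSketch:
    # Class attribute: defined at the class level, outside any method.
    rules_f = 'plural6-rules.txt'


r1 = LazyRulesSketch()
r2 = LazyRulesSketch()

# Assigning through an instance creates an instance attribute on r2 only.
# The class attribute, and r1's view of it, are untouched.
r2.rules_f = 'override.txt'
shadowed = (r1.rules_f, r2.rules_f, LazyRulesSketch.rules_f)

# Changing the class attribute affects instances still inheriting it (r1),
# but not instances that have their own attribute of the same name (r2).
LazyRulesSketch.rules_f = 'papayawhip.txt'
after = (r1.rules_f, r2.rules_f)
```

So "shared across all instances" holds only as long as no instance has shadowed the name with an attribute of its own.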

                                                                                                                                                                                                                                                                                                                                                                And now back to our show. +

                                                                                                                                                                                                                                                                                                                                                                    def __iter__(self):       
                                                                                                                                                                                                                                                                                                                                                                         self.cache_index = 0  
         return self
@@ -197,7 +279,7 @@ rules = LazyRules()
 1. The __iter__() method will be called every time someone — say, a for loop — calls iter(rules).
 2. This is the place to reset the counter that we’re going to use to retrieve items from the cache (that we haven’t built yet — patience, grasshopper).
-3. Finally, the __iter__() method returns self, which signals that this class will take care of returning its own values throughout an iteration.
+3. Finally, the __iter__() method returns self, which signals that this class will take care of returning its own values throughout an iteration.
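The three notes above can be illustrated with a minimal, self-contained sketch. The class below is hypothetical (`ResettingCounter`, `limit`, and `index` are illustrative names and do not appear in the patch), and it is not the `LazyRules` implementation. It only shows the general pattern of resetting a position counter in `__iter__()` and returning `self`, assuming the values can be produced on demand in `__next__()`.

```python
class ResettingCounter:
    """Hypothetical iterator used only to illustrate the pattern."""

    def __init__(self, limit):
        self.limit = limit
        self.index = 0

    def __iter__(self):
        # Called every time iter(obj) is invoked, e.g. at the start of a for loop.
        self.index = 0   # reset the position counter for a fresh pass
        return self      # this object hands out its own values

    def __next__(self):
        # Signal the end of the iteration once the limit is reached.
        if self.index >= self.limit:
            raise StopIteration
        value = self.index
        self.index += 1
        return value


counter = ResettingCounter(3)
first_pass = list(counter)    # list() calls iter(counter), then next() repeatedly
second_pass = list(counter)   # iter() is called again, so the counter restarts
```

Because `__iter__()` resets the counter, the same object can be iterated more than once. Without the reset, the second pass would start where the first one stopped and yield nothing.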
     def __next__(self):
@@ -282,3 +364,34 @@ rules = LazyRules()

 © 2001–9 Mark Pilgrim
+
+
diff --git a/unit-testing.html b/unit-testing.html
index f8b36e0..7dfb3d6 100644
--- a/unit-testing.html
+++ b/unit-testing.html
@@ -516,6 +516,8 @@ Ran 5 tests in 0.000s
 OK
+Now stop coding.
+