diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html index 09ff8ed..5bf0cb8 100644 --- a/case-study-porting-chardet-to-python-3.html +++ b/case-study-porting-chardet-to-python-3.html @@ -13,7 +13,6 @@ del{background:#f87} mark{background:#ff8;font-weight:bold} -

skip to main content

  


Case study: porting chardet to Python 3

@@ -100,7 +99,6 @@ mark{background:#ff8;font-weight:bold}

We’re going to migrate the chardet module from Python 2 to Python 3. Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. In some cases this is easy (a function was renamed or moved to a different module), but in other cases it can get pretty complex. To get a sense of all that it can do, refer to the appendix, Porting code to Python 3 with 2to3. In this chapter, we’ll start by running 2to3 on the chardet package, but as you’ll see, there will still be a lot of work to do after the automated tools have performed their magic.

The main chardet package is split across several different files, all in the same directory. The 2to3 script makes it easy to convert multiple files at once: just pass a directory as a command line argument, and 2to3 will convert each of the files in turn.

[The code examples will be easier to follow if you enable Javascript, but whatever.] -

skip over this

C:\home\chardet> python c:\Python30\Tools\Scripts\2to3.py -w chardet\
 RefactoringTool: Skipping implicit fixer: buffer
 RefactoringTool: Skipping implicit fixer: idioms
@@ -567,8 +565,7 @@ RefactoringTool: chardet\sbcsgroupprober.py
 RefactoringTool: chardet\sjisprober.py
 RefactoringTool: chardet\universaldetector.py
 RefactoringTool: chardet\utf8prober.py
-

Now run the 2to3 script on the testing harness, test.py. -

skip over this +

Now run the 2to3 script on the testing harness, test.py.

C:\home\chardet> python c:\Python30\Tools\Scripts\2to3.py -w test.py
 RefactoringTool: Skipping implicit fixer: buffer
 RefactoringTool: Skipping implicit fixer: idioms
@@ -599,12 +596,11 @@ RefactoringTool: Skipping implicit fixer: ws_comma
 +print(count, 'tests')
 RefactoringTool: Files that were modified:
 RefactoringTool: test.py
-

[FIXME explain the difference in import syntax] +

[FIXME explain the difference in import syntax]

Well, that wasn’t so hard. Just a few imports and print statements to convert. Time to run the new version. Do you think it’ll work?

Fixing what 2to3 can’t

False is invalid syntax

Now for the real test: running the test harness against the test suite. Since the test suite is designed to cover all the possible code paths, it’s a good way to test our ported code to make sure there aren’t any bugs lurking anywhere. -

skip over this

C:\home\chardet> python test.py tests\*\*
 Traceback (most recent call last):
   File "test.py", line 1, in <module>
@@ -613,8 +609,7 @@ RefactoringTool: test.py
self.done = constants.False ^ SyntaxError: invalid syntax -

Hmm, a small snag. In Python 3, False is a reserved word, so you can’t use it as a variable name. Let’s look at constants.py to see where it’s defined. Here’s the original version from constants.py, before the 2to3 script changed it: -

skip over this +

Hmm, a small snag. In Python 3, False is a reserved word, so you can’t use it as a variable name. Let’s look at constants.py to see where it’s defined. Here’s the original version from constants.py, before the 2to3 script changed it:

import __builtin__
 if not hasattr(__builtin__, 'False'):
     False = 0
@@ -622,7 +617,7 @@ if not hasattr(__builtin__, 'False'):
 else:
     False = __builtin__.False
     True = __builtin__.True
-

This piece of code is designed to allow this library to run under older versions of Python 2. Prior to Python 2.3 [FIXME-LINK], Python had no built-in Boolean type. This code detects the absence of the built-in constants True and False, and defines them if necessary. +

This piece of code is designed to allow this library to run under older versions of Python 2. Prior to Python 2.3 [FIXME-LINK], Python had no built-in Boolean type. This code detects the absence of the built-in constants True and False, and defines them if necessary.

However, Python 3 will always have a Boolean type, so this entire code snippet is unnecessary. The simplest solution is to replace all instances of constants.True and constants.False with True and False, respectively, then delete this dead code from constants.py.
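If you'd like to convince yourself that False really is off-limits as a name, here is a minimal sketch using only the standard library. The `compiles()` helper is a made-up name for this illustration, not part of chardet; it simply reports whether a snippet of source gets past the Python 3 parser.

```python
import keyword

# In Python 3, True and False are keywords rather than ordinary names.
is_keyword = keyword.iskeyword("False") and keyword.iskeyword("True")


def compiles(source):
    """Hypothetical helper: True if the source parses, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Attribute access through a keyword is rejected by the parser...
attribute_form_parses = compiles("done = constants.False")
# ...and so is assigning to the keyword, as the old compatibility shim did.
assignment_form_parses = compiles("False = 0")
# The ported form uses the built-in constant directly.
ported_form_parses = compiles("done = False")

print(is_keyword, attribute_form_parses, assignment_form_parses, ported_form_parses)
```

Because both the attribute form and the assignment form fail at parse time, the module can't even be imported, which is why the traceback appears before any test runs.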

So this line in universaldetector.py:

self.done = constants.False
@@ -631,7 +626,6 @@ else:

Ah, wasn’t that satisfying? The code is shorter and more readable already.

No module named constants

Time to run test.py again and see how far it gets. -

skip over this

C:\home\chardet> python test.py tests\*\*
 Traceback (most recent call last):
   File "test.py", line 1, in <module>
@@ -639,7 +633,7 @@ else:
   File "C:\home\chardet\chardet\universaldetector.py", line 29, in <module>
     import constants, sys
 ImportError: No module named constants
-

What’s that you say? No module named constants? Of course there’s a module named constants. …Oh wait, no there isn’t. Remember when the 2to3 script fixed up all those import statements? This library has a lot of relative imports — that is, modules that import other modules within the library. In Python 3, all import statements are absolute by default [FIXME-LINK PEP 0328]. To do relative imports, you need to do something like this instead: +

What’s that you say? No module named constants? Of course there’s a module named constants. …Oh wait, no there isn’t. Remember when the 2to3 script fixed up all those import statements? This library has a lot of relative imports — that is, modules that import other modules within the library. In Python 3, all import statements are absolute by default [FIXME-LINK PEP 0328]. To do relative imports, you need to do something like this instead:

from . import constants

But wait. Wasn’t the 2to3 script supposed to take care of these for you? Well, it did, but this particular import statement combines two different types of imports into one line: a relative import of the constants module within the library, and an absolute import of the sys module that is pre-installed in the Python standard library. In Python 2, you could combine these into one import statement. In Python 3, you can’t, and the 2to3 script is not smart enough to split the import statement into two.
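To watch the difference between the two import styles without touching chardet itself, here is a rough sketch that builds a throwaway package in a temporary directory. Every name in it (`porting_demo_pkg`, `demo_constants`, `old_style`, `new_style`) is invented for the demonstration.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Hypothetical package name, used only for this demonstration.
PKG = "porting_demo_pkg"


def write(path, text):
    """Write a small source file to disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, PKG)
os.mkdir(pkg_dir)
write(os.path.join(pkg_dir, "__init__.py"), "")
write(os.path.join(pkg_dir, "demo_constants.py"), "eNotMe = 1\n")
# Python 2 habit: a sibling module and a stdlib module on one line.
write(os.path.join(pkg_dir, "old_style.py"), "import demo_constants, sys\n")
# Python 3 form: explicit relative import, then a separate absolute import.
write(
    os.path.join(pkg_dir, "new_style.py"),
    "from . import demo_constants\nimport sys\nvalue = demo_constants.eNotMe\n",
)

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module(PKG + ".old_style")
    old_style_imported = True
except ImportError:
    # The sibling module is not on sys.path, so the absolute import fails.
    old_style_imported = False

new_style = importlib.import_module(PKG + ".new_style")
print(old_style_imported, new_style.value)

# Clean up the scratch package.
sys.path.remove(root)
shutil.rmtree(root, ignore_errors=True)
```

The old-style module fails because Python 3 looks for `demo_constants` as a top-level module; the new-style module says explicitly that it wants the sibling inside the same package.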

The solution is to split the import statement manually. So this two-in-one import: @@ -651,20 +645,18 @@ import sys

Onward!

Name 'file' is not defined

And here we go again, running test.py to try to execute our test cases…

-

skip over this

C:\home\chardet> python test.py tests\*\*
 tests\ascii\howto.diveintomark.org.xml
 Traceback (most recent call last):
   File "test.py", line 9, in <module>
     for line in file(f, 'rb'):
 NameError: name 'file' is not defined
-

This one surprised me, because I’ve been using this idiom as long as I can remember. In Python 2, the global file() function was an alias for open(), which was the standard way of opening files for reading. In Python 3, the entire system for reading and writing files has been refactored into the io module. [FIXME-LINK PEP 3116] I’ll cover the new I/O module in more detail in Chapter FIXME, but for now, the important bit is that the global file() function no longer exists. However, the open() function does still exist. (Technically, it’s an alias for io.open(), but never mind that right now.) +

This one surprised me, because I’ve been using this idiom as long as I can remember. In Python 2, the global file() function was an alias for open(), which was the standard way of opening files for reading. In Python 3, the entire system for reading and writing files has been refactored into the io module. [FIXME-LINK PEP 3116] I’ll cover the new I/O module in more detail in Chapter FIXME, but for now, the important bit is that the global file() function no longer exists. However, the open() function does still exist. (Technically, it’s an alias for io.open(), but never mind that right now.)

Thus, the simplest solution to the problem of the missing file() is to call open() instead:

for line in open(f, 'rb'):

And that’s all I have to say about that.
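For the skeptical, a quick sketch confirms both halves of that claim: there is no built-in named file, and open() in 'rb' mode still lets you loop over a file line by line. The scratch file and its contents are made up for the example.

```python
import builtins
import os
import tempfile

# The Python 2 built-in file() is gone; open() is still there.
has_file = hasattr(builtins, "file")
has_open = hasattr(builtins, "open")

# Create a small scratch file to read back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first\nsecond\n")
    path = tmp.name

# Same loop shape as the test harness, with open() in place of file().
with open(path, "rb") as f:
    lines = [line for line in f]
os.remove(path)

print(has_file, has_open, lines)
```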

Can’t use a string pattern on a bytes-like object

Now things are starting to get interesting. And by “interesting,” I mean “confusing as all hell.” -

skip over this

C:\home\chardet> python test.py tests\*\*
 tests\ascii\howto.diveintomark.org.xml
 Traceback (most recent call last):
@@ -673,34 +665,29 @@ NameError: name 'file' is not defined
File "C:\home\chardet\chardet\universaldetector.py", line 98, in feed if self._highBitDetector.search(aBuf): TypeError: can't use a string pattern on a bytes-like object -

To debug this, let’s see what self._highBitDetector is. It’s defined in the __init__ method of the UniversalDetector class: -

skip over this

class UniversalDetector:
     def __init__(self):
         self._highBitDetector = re.compile(r'[\x80-\xFF]')
-

This pre-compiles a regular expression designed to find non-ASCII characters in the range 128–255 (0x80–0xFF). Wait, that’s not quite right; I need to be more precise with my terminology. This pattern is designed to find non-ASCII bytes in the range 128-255. +

This pre-compiles a regular expression designed to find non-ASCII characters in the range 128–255 (0x80–0xFF). Wait, that’s not quite right; I need to be more precise with my terminology. This pattern is designed to find non-ASCII bytes in the range 128-255.

And therein lies the problem.

In Python 2, a string was an array of bytes whose character encoding was tracked separately. If you wanted Python 2 to keep track of the character encoding, you had to use a Unicode string (u'') instead. But in Python 3, a string is always what Python 2 called a Unicode string — that is, an array of Unicode characters (of possibly varying byte lengths). Since this regular expression is defined by a string pattern, it can only be used to search a string — again, an array of characters. But what we’re searching is not a string, it’s a byte array. Looking at the traceback, this error occurred in universaldetector.py: -

skip over this

def feed(self, aBuf):
     .
     .
     .
     if self._mInputState == ePureAscii:
         if self._highBitDetector.search(aBuf):
-

And what is aBuf? Let’s backtrack further to a place that calls UniversalDetector.feed(). One place that calls it is the test harness, test.py. -

skip over this +

And what is aBuf? Let’s backtrack further to a place that calls UniversalDetector.feed(). One place that calls it is the test harness, test.py.

u = UniversalDetector()
 .
 .
 .
 for line in open(f, 'rb'):
     u.feed(line)
-

And here we find our answer: in the UniversalDetector.feed() method, aBuf is a line read from a file on disk. Look carefully at the parameters used to open the file: 'rb'. 'r' is for “read”; OK, big deal, we’re reading the file. Ah, but 'b' is for “binary.” Without the 'b' flag, this for loop would read the file, line by line, and convert each line into a string — an array of Unicode characters — according to the system default character encoding. (You could override the system encoding with another parameter to open(), but never mind that for now.) But with the 'b' flag, this for loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to UniversalDetector.feed(), and eventually gets passed to the pre-compiled regular expression, self._highBitDetector, to search for high-bit… characters. But we don’t have characters; we have bytes. Oops. +

And here we find our answer: in the UniversalDetector.feed() method, aBuf is a line read from a file on disk. Look carefully at the parameters used to open the file: 'rb'. 'r' is for “read”; OK, big deal, we’re reading the file. Ah, but 'b' is for “binary.” Without the 'b' flag, this for loop would read the file, line by line, and convert each line into a string — an array of Unicode characters — according to the system default character encoding. (You could override the system encoding with another parameter to open(), but never mind that for now.) But with the 'b' flag, this for loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to UniversalDetector.feed(), and eventually gets passed to the pre-compiled regular expression, self._highBitDetector, to search for high-bit… characters. But we don’t have characters; we have bytes. Oops.
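Here is a small sketch of what that 'b' flag changes. It writes a scratch file (the contents are invented for the example) and reads it back both ways. The text-mode read passes an explicit encoding so the result doesn't depend on the system default.

```python
import os
import tempfile

# "café" plus a newline, encoded as UTF-8: the é takes two bytes.
raw = "caf\u00e9\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Binary mode: each line comes back exactly as stored, as bytes.
with open(path, "rb") as f:
    binary_line = f.readline()

# Text mode: each line is decoded into a string of Unicode characters.
with open(path, "r", encoding="utf-8") as f:
    text_line = f.readline()
os.remove(path)

print(type(binary_line).__name__, len(binary_line))
print(type(text_line).__name__, len(text_line))
```

The byte count and the character count differ because one character occupies two bytes on disk, which is exactly the distinction a character encoding detector has to care about.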

What we need this regular expression to search is not an array of characters, but an array of bytes.
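The mismatch is easy to reproduce outside chardet. This sketch compiles the same character class twice, once from a string and once from a bytes literal, and runs both against some made-up sample data.

```python
import re

# Sample input: mostly ASCII, with one high-bit byte (0xE9).
data = b"plain ascii \xe9 high byte"

# A pattern compiled from a string can only search strings.
str_pattern = re.compile(r"[\x80-\xFF]")
try:
    str_pattern.search(data)
    str_search_error = None
except TypeError as exc:
    str_search_error = exc

# A pattern compiled from bytes can search bytes.
bytes_pattern = re.compile(b"[\x80-\xFF]")
match = bytes_pattern.search(data)
no_match = bytes_pattern.search(b"plain ascii")

print(repr(str_search_error))
print(match.group(), no_match)
```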

Once you realize that, the solution is not difficult. Regular expressions defined with strings can search strings. Regular expressions defined with byte arrays can search byte arrays. To define a byte array pattern, we simply change the type of the argument we use to define the regular expression to a byte array. (There is one other case of this same problem, on the very next line.) -

skip over this code listing

  class UniversalDetector:
       def __init__(self):
 -         self._highBitDetector = re.compile(b'[\x80-\xFF]')
@@ -710,8 +697,7 @@ for line in open(f, 'rb'):
           self._mEscCharSetProber = None
           self._mCharSetProbers = []
           self.reset()
-

Searching the entire codebase for other uses of the re module turns up two more instances, in charsetprober.py. Again, the code is defining regular expressions as strings but executing them on aBuf, which is a byte array. The solution is the same: define the regular expression patterns as byte arrays. -

skip over this code listing +

Searching the entire codebase for other uses of the re module turns up two more instances, in charsetprober.py. Again, the code is defining regular expressions as strings but executing them on aBuf, which is a byte array. The solution is the same: define the regular expression patterns as byte arrays.

  class CharSetProber:
       .
       .
@@ -728,7 +714,6 @@ for line in open(f, 'rb'):
         
 

Can't convert 'bytes' object to str implicitly

Curiouser and curiouser… -

skip over this

C:\home\chardet> python test.py tests\*\*
 tests\ascii\howto.diveintomark.org.xml
 Traceback (most recent call last):
@@ -737,12 +722,10 @@ for line in open(f, 'rb'):
   File "C:\home\chardet\chardet\universaldetector.py", line 100, in feed
     elif (self._mInputState == ePureAscii) and self._escDetector.search(self._mLastChar + aBuf):
 TypeError: Can't convert 'bytes' object to str implicitly
-

There's an unfortunate clash of coding style and Python interpreter here. The TypeError could be anywhere on that line, but the traceback doesn't tell you exactly where it is. It could be in the first conditional or the second, and the traceback would look the same. To narrow it down, you should split the line in half, like this: -

skip over this code listing +

There's an unfortunate clash of coding style and Python interpreter here. The TypeError could be anywhere on that line, but the traceback doesn't tell you exactly where it is. It could be in the first conditional or the second, and the traceback would look the same. To narrow it down, you should split the line in half, like this:

elif (self._mInputState == ePureAscii) and \
     self._escDetector.search(self._mLastChar + aBuf):
-

And re-run the test:

-

skip over this command output listing +

And re-run the test:

C:\home\chardet> python test.py tests\*\*
 tests\ascii\howto.diveintomark.org.xml
 Traceback (most recent call last):
@@ -751,9 +734,8 @@ TypeError: Can't convert 'bytes' object to str implicitly
File "C:\home\chardet\chardet\universaldetector.py", line 101, in feed self._escDetector.search(self._mLastChar + aBuf): TypeError: Can't convert 'bytes' object to str implicitly
-

Aha! The problem was not in the first conditional (self._mInputState == ePureAscii) but in the second one. So what could cause a TypeError there? Perhaps you're thinking that the search() method is expecting a value of a different type, but that wouldn't generate this traceback. Python functions can take any value; if you pass the right number of arguments, the function will execute. It may crash if you pass it a value of a different type than it's expecting, but if that happened, the traceback would point to somewhere inside the function. But this traceback says it never got as far as calling the search() method. So the problem must be in that + operation, as it's trying to construct the value that it will eventually pass to the search() method. +

Aha! The problem was not in the first conditional (self._mInputState == ePureAscii) but in the second one. So what could cause a TypeError there? Perhaps you're thinking that the search() method is expecting a value of a different type, but that wouldn't generate this traceback. Python functions can take any value; if you pass the right number of arguments, the function will execute. It may crash if you pass it a value of a different type than it's expecting, but if that happened, the traceback would point to somewhere inside the function. But this traceback says it never got as far as calling the search() method. So the problem must be in that + operation, as it's trying to construct the value that it will eventually pass to the search() method.

We know from previous debugging that aBuf is a byte array. So what is self._mLastChar? It's an instance variable, defined in the reset() method, which is actually called from the __init__() method. -

skip over this code listing

class UniversalDetector:
     def __init__(self):
         self._highBitDetector = re.compile(b'[\x80-\xFF]')
@@ -769,9 +751,8 @@ TypeError: Can't convert 'bytes' object to str implicitly
self._mGotData = False self._mInputState = ePureAscii self._mLastChar = '' -

And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can't concatenate a string to a byte array — not even a zero-length string. +

And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can't concatenate a string to a byte array — not even a zero-length string.
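A few lines are enough to see this for yourself. The variable names below are illustrative; only the behavior of the + operator matters.

```python
a_buf = b"\xef\xbb\xbf"

# Mixing str and bytes fails, even when the string is empty.
try:
    "" + a_buf
    str_plus_bytes_ok = True
except TypeError:
    str_plus_bytes_ok = False

# It fails in the other order too.
try:
    a_buf + ""
    bytes_plus_str_ok = True
except TypeError:
    bytes_plus_str_ok = False

# An empty bytes object concatenates without complaint.
combined = b"" + a_buf

print(str_plus_bytes_ok, bytes_plus_str_ok, combined)
```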

So what is self._mLastChar anyway? The answer is in the feed() method, just a few lines down from where the traceback occurred. -

skip over this code listing

if self._mInputState == ePureAscii:
     if self._highBitDetector.search(aBuf):
         self._mInputState = eHighbyte
@@ -781,15 +762,13 @@ TypeError: Can't convert 'bytes' object to str implicitly
self._mLastChar = aBuf[-1]

The calling function calls this feed() method over and over again with a few bytes at a time. The method processes the bytes it was given (passed in as aBuf), then stores the last byte in self._mLastChar in case it's needed during the next call. (In a multi-byte encoding, the feed() method might get called with half of a character, then called again with the other half.) But because aBuf is now a byte array instead of a string, self._mLastChar needs to be a byte array as well. Thus: -

skip over this code listing

  def reset(self):
       .
       .
       .
 -     self._mLastChar = ''
 +     self._mLastChar = b''
-

Searching the entire codebase for "mLastChar" turns up a similar problem in mbcharsetprober.py, but instead of tracking the last character, it tracks the last two characters. The MultiByteCharSetProber class uses a list of 1-character strings to track the last two characters; in Python 3, it needs to use a list of integers. -

skip over this code listing +

Searching the entire codebase for "mLastChar" turns up a similar problem in mbcharsetprober.py, but instead of tracking the last character, it tracks the last two characters. The MultiByteCharSetProber class uses a list of 1-character strings to track the last two characters; in Python 3, it needs to use a list of integers.


   class MultiByteCharSetProber(CharSetProber):
       def __init__(self):
@@ -809,7 +788,6 @@ TypeError: Can't convert 'bytes' object to str implicitly
+ self._mLastChar = [0, 0]

Unsupported operand type(s) for +: 'int' and 'bytes'

I have good news, and I have bad news. The good news is we're making progress… -

skip over this command listing

C:\home\chardet> python test.py tests\*\*
 tests\ascii\howto.diveintomark.org.xml
 Traceback (most recent call last):
@@ -818,10 +796,9 @@ TypeError: Can't convert 'bytes' object to str implicitly
File "C:\home\chardet\chardet\universaldetector.py", line 101, in feed self._escDetector.search(self._mLastChar + aBuf): TypeError: unsupported operand type(s) for +: 'int' and 'bytes' -

…The bad news is it doesn't always feel like progress. +

…The bad news is it doesn't always feel like progress.

But this is progress! Really! Even though the traceback calls out the same line of code, it's a different error than it used to be. Progress! So what's the problem now? The last time I checked, this line of code didn't try to concatenate an int with a byte array (bytes). In fact, you just spent a lot of time ensuring that self._mLastChar was a byte array. How did it turn into an int?

The answer lies not in the previous lines of code, but in the following lines. -

skip over this code listing

if self._mInputState == ePureAscii:
     if self._highBitDetector.search(aBuf):
         self._mInputState = eHighbyte
@@ -830,8 +807,7 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'
self._mInputState = eEscAscii self._mLastChar = aBuf[-1] -

This error doesn't occur the first time the feed() method gets called; it occurs the second time, after self._mLastChar has been set to the last byte of aBuf. Well, what's the problem with that? Getting a single element from a byte array yields an integer, not a byte array. To see the difference, follow me to the interactive shell: -

skip over this interpreter listing +

This error doesn't occur the first time the feed() method gets called; it occurs the second time, after self._mLastChar has been set to the last byte of aBuf. Well, what's the problem with that? Getting a single element from a byte array yields an integer, not a byte array. To see the difference, follow me to the interactive shell:

 >>> aBuf = b'\xEF\xBB\xBF'         
 >>> len(aBuf)
@@ -850,7 +826,7 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'
 b'\xbf'
 >>> mLastChar + aBuf               
 b'\xbf\xef\xbb\xbf'
-
    +
    1. Define a byte array of length 3.
    2. The last element of the byte array is 191.
    3. That's an integer. @@ -866,7 +842,6 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes' + self._mLastChar = aBuf[-1:]
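To see why the slice matters across repeated calls, here is a stripped-down sketch of the bookkeeping. `LastByteTracker` is a hypothetical stand-in written for this illustration; it is not chardet code, and it only mimics the "remember the last byte" part of feed().

```python
class LastByteTracker:
    """Hypothetical stand-in for the last-byte bookkeeping in feed()."""

    def __init__(self):
        self.last = b""   # bytes, so it can be concatenated with a_buf
        self.seen = []    # what each call would hand to the regex search

    def feed(self, a_buf):
        # This concatenation only works when self.last is bytes.
        self.seen.append(self.last + a_buf)
        # A slice keeps the type (bytes); a_buf[-1] would yield an int.
        self.last = a_buf[-1:]


tracker = LastByteTracker()
tracker.feed(b"\xef\xbb")
tracker.feed(b"\xbf")

print(tracker.seen, tracker.last)
```

A side benefit of the slice, at least in this sketch: an empty buffer yields an empty bytes object instead of raising an IndexError.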

      ord() expected string of length 1, but int found

      Tired yet? You're almost there… -

      skip over this command output listing

      C:\home\chardet> python test.py tests\*\*
       tests\ascii\howto.diveintomark.org.xml                       ascii with confidence 1.0
       tests\Big5\0804.blogspot.com.xml
      @@ -882,29 +857,25 @@ tests\Big5\0804.blogspot.com.xml
         File "C:\home\chardet\chardet\codingstatemachine.py", line 43, in next_state
           byteCls = self._mModel['classTable'][ord(c)]
       TypeError: ord() expected string of length 1, but int found
      -

      OK, so c is an int, but the ord() function was expecting a 1-character string. Fair enough. Where is c defined? -

      skip over this code listing +

      OK, so c is an int, but the ord() function was expecting a 1-character string. Fair enough. Where is c defined?

      # codingstatemachine.py
       def next_state(self, c):
           # for each byte we get its class
           # if it is first byte, we also get byte length
           byteCls = self._mModel['classTable'][ord(c)]
      -

      That's no help; it's just passed into the function. Let's pop the stack. -

      skip over this code listing +

      That's no help; it's just passed into the function. Let's pop the stack.

      # utf8prober.py
       def feed(self, aBuf):
           for c in aBuf:
               codingState = self._mCodingSM.next_state(c)
      -

      And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That's what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there's no need to call the ord() function because c is already an int! +

      And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That's what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there's no need to call the ord() function because c is already an int!
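Here is the same observation in miniature. The sample bytes and the tiny lookup table are invented for the example; the real class tables in chardet are much larger.

```python
a_buf = b"\xe3\x81\x82"

# Iterating over bytes yields integers, one per byte.
items = [c for c in a_buf]

# Calling ord() on one of those integers is a TypeError.
try:
    ord(a_buf[0])
    ord_worked = True
except TypeError:
    ord_worked = False

# Hypothetical lookup table: class 0 for ASCII bytes, class 1 for the rest.
class_table = [0] * 128 + [1] * 128
# The integer can index the table directly, with no ord() call.
classes = [class_table[c] for c in a_buf]

print(items, ord_worked, classes)
```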

      Thus: -

      skip over this code listing

        def next_state(self, c):
             # for each byte we get its class
             # if it is first byte, we also get byte length
       -     byteCls = self._mModel['classTable'][ord(c)]
       +     byteCls = self._mModel['classTable'][c]

      Searching the entire codebase for instances of "ord(c)" uncovers similar problems in sbcharsetprober.py… -

      skip over this code listing

      # sbcharsetprober.py
       def feed(self, aBuf):
           if not self._mModel['keepEnglishLetter']:
      @@ -914,15 +885,13 @@ def feed(self, aBuf):
               return self.get_state()
           for c in aBuf:
               order = self._mModel['charToOrderMap'][ord(c)]
      -

      …and latin1prober.py… -

      skip over this code listing +

      …and latin1prober.py

      # latin1prober.py
       def feed(self, aBuf):
           aBuf = self.filter_with_english_letters(aBuf)
           for c in aBuf:
               charClass = Latin1_CharToClass[ord(c)]
      -

      c is iterating over aBuf, which means it is an integer, not a 1-character string. The solution is the same: change ord(c) to just plain c. -

      skip over this code listing +

      c is iterating over aBuf, which means it is an integer, not a 1-character string. The solution is the same: change ord(c) to just plain c.

        # sbcharsetprober.py
         def feed(self, aBuf):
             if not self._mModel['keepEnglishLetter']:
      @@ -943,7 +912,6 @@ def feed(self, aBuf):
       

      Unorderable types: int() >= str()

      Let's go again. -

      skip over this command output listing

      C:\home\chardet> python test.py tests\*\*
       tests\ascii\howto.diveintomark.org.xml                       ascii with confidence 1.0
       tests\Big5\0804.blogspot.com.xml
      @@ -961,9 +929,8 @@ tests\Big5\0804.blogspot.com.xml
         File "C:\home\chardet\chardet\jpcntx.py", line 176, in get_order
           if ((aStr[0] >= '\x81') and (aStr[0] <= '\x9F')) or \
       TypeError: unorderable types: int() >= str()
      -

      Did you notice? This time around, the code passed the first test case (tests\ascii\howto.diveintomark.org.xml). You're making real progress here. +

      Did you notice? This time around, the code passed the first test case (tests\ascii\howto.diveintomark.org.xml). You're making real progress here.

      So what's this all about? “Unorderable types”? Once again, the difference between byte arrays and strings is rearing its ugly head. Take a look at the code: -

      skip over this code listing

      class SJISContextAnalysis(JapaneseContextAnalysis):
           def get_order(self, aStr):
               if not aStr: return -1, 1
      @@ -973,8 +940,7 @@ TypeError: unorderable types: int() >= str()
      charLen = 2 else: charLen = 1 -

      And where does aStr come from? Let's pop the stack: -

      skip over this code listing +

      And where does aStr come from? Let's pop the stack:

      def feed(self, aBuf, aLen):
           .
           .
      @@ -982,10 +948,9 @@ TypeError: unorderable types: int() >= str()
      i = self._mNeedToSkipCharNum while i < aLen: order, charLen = self.get_order(aBuf[i:i+2]) -

      Oh look, it's our old friend, aBuf. As you might have guessed from every other issue we've encountered in this chapter, aBuf is a byte array. Here, the feed() method isn't just passing it on wholesale; it's slicing it. But as you saw earlier in this chapter, slicing a byte array returns a byte array, so the aStr parameter that gets passed to the get_order() method is still a byte array. +

      Oh look, it's our old friend, aBuf. As you might have guessed from every other issue we've encountered in this chapter, aBuf is a byte array. Here, the feed() method isn't just passing it on wholesale; it's slicing it. But as you saw earlier in this chapter, slicing a byte array returns a byte array, so the aStr parameter that gets passed to the get_order() method is still a byte array.

      And what is this code trying to do with aStr? It's taking the first element of the byte array and comparing it to a string of length 1. In Python 2, that worked, because aStr and aBuf were strings, and aStr[0] would be a string, and you can compare strings for inequality. But in Python 3, aStr and aBuf are byte arrays, aStr[0] is an integer, and you can't compare integers and strings for inequality without explicitly coercing one of them.
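A short sketch makes the failure and the fix concrete. The two sample bytes are made up; the range check mirrors the one in the traceback, with integer constants in place of 1-character strings.

```python
a_str = b"\x82\xa0"

# Indexing bytes gives an integer.
first = a_str[0]

# Ordering an int against a str is a TypeError in Python 3.
try:
    first >= "\x81"
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

# Comparing against integer constants works and reads just as clearly.
in_range = 0x81 <= first <= 0x9F

print(first, mixed_comparison_ok, in_range)
```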

      In this case, there's no need to make the code more complicated by adding an explicit coercion. aStr[0] yields an integer; the things you're comparing to are all constants. Let's change them from 1-character strings to integers. -

      skip over this code listing

        class SJISContextAnalysis(JapaneseContextAnalysis):
             def get_order(self, aStr):
                 if not aStr: return -1, 1
      @@ -1039,7 +1004,6 @@ TypeError: unorderable types: int() >= str()
      return -1, charLen

      Searching the entire codebase for occurrences of the ord() function uncovers the same problem in chardistribution.py: -

      skip over this command output listing

      C:\home\chardet> python test.py tests\*\*
       tests\ascii\howto.diveintomark.org.xml                       ascii with confidence 1.0
       tests\Big5\0804.blogspot.com.xml
      @@ -1057,8 +1021,7 @@ tests\Big5\0804.blogspot.com.xml
         File "C:\home\chardet\chardet\chardistribution.py", line 174, in get_order
           if (aStr[0] >= '\x81') and (aStr[0] <= '\x9F'):
       TypeError: unorderable types: int() >= str()
      -

      The fix is the same: -

      skip over this code listing +

      The fix is the same:

        class EUCTWDistributionAnalysis(CharDistributionAnalysis):
             def __init__(self):
                 CharDistributionAnalysis.__init__(self)
      @@ -1165,7 +1128,6 @@ TypeError: unorderable types: int() >= str()
      return -1

      Global name 'reduce' is not defined

      Once more into the breach… -

      skip over this command output listing

      C:\home\chardet> python test.py tests\*\*
       tests\ascii\howto.diveintomark.org.xml                       ascii with confidence 1.0
       tests\Big5\0804.blogspot.com.xml
      @@ -1177,16 +1139,14 @@ tests\Big5\0804.blogspot.com.xml
         File "C:\home\chardet\chardet\latin1prober.py", line 126, in get_confidence
           total = reduce(operator.add, self._mFreqCounter)
       NameError: global name 'reduce' is not defined
      -

      According to the official What's New In Python 3.0 guide, the reduce() function has been moved out of the global namespace and into the functools module. Quoting the guide: "Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable." +

      According to the official What's New In Python 3.0 guide, the reduce() function has been moved out of the global namespace and into the functools module. Quoting the guide: "Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."
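For comparison, here is a sketch with the options side by side, using a made-up list of counts in place of self._mFreqCounter. The built-in sum() is included as a third alternative that the guide's quote doesn't mention.

```python
import builtins
import functools
import operator

# Hypothetical frequency counts standing in for self._mFreqCounter.
freq_counter = [3, 0, 5, 2]

# reduce() is no longer a built-in name in Python 3...
reduce_is_builtin = hasattr(builtins, "reduce")

# ...but it is still available from functools.
total_reduce = functools.reduce(operator.add, freq_counter)

# The explicit loop the guide recommends.
total_loop = 0
for frequency in freq_counter:
    total_loop += frequency

# For plain addition, the built-in sum() does the same job.
total_sum = sum(freq_counter)

print(reduce_is_builtin, total_reduce, total_loop, total_sum)
```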

      OK then, let's refactor it to use a for loop. -

      skip over this code listing

      def get_confidence(self):
           if self.get_state() == constants.eNotMe:
               return 0.01
         
           total = reduce(operator.add, self._mFreqCounter)

      The reduce() function takes two arguments — a function and a list (strictly speaking, any iterable object will do) — and applies the function cumulatively to each item of the list. In other words, this is a fancy and roundabout way of adding up all the items in a list and returning the result. It looks much more readable as a for loop. -

      skip over this code listing

        def get_confidence(self):
             if self.get_state() == constants.eNotMe:
                 return 0.01
      @@ -1195,8 +1155,7 @@ NameError: global name 'reduce' is not defined
      +     total = 0
      +     for frequency in self._mFreqCounter:
      +         total += frequency
      -
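As a quick sanity check that the loop and the old reduce() call agree, here is a minimal sketch. The list freq_counter is a made-up stand-in for self._mFreqCounter, and the built-in sum() is shown only as a third option that the text above does not use.

```python
import operator
from functools import reduce  # reduce() lives in functools in Python 3

# Hypothetical stand-in for self._mFreqCounter
freq_counter = [3, 0, 7, 12]

# The original approach, plus the import that Python 3 requires
total_reduce = reduce(operator.add, freq_counter)

# The explicit loop used in the refactored get_confidence()
total_loop = 0
for frequency in freq_counter:
    total_loop += frequency

# The built-in sum() gives the same result for a list of numbers
total_sum = sum(freq_counter)

print(total_reduce, total_loop, total_sum)
```

All three totals should be identical; which form to use is mostly a readability choice.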

      I CAN HAZ TESTZ? -

      skip over this command output listing +

      I CAN HAZ TESTZ?

      C:\home\chardet> python test.py tests\*\*
       tests\ascii\howto.diveintomark.org.xml                       ascii with confidence 1.0
       tests\Big5\0804.blogspot.com.xml                             Big5 with confidence 0.99
      @@ -1231,7 +1190,7 @@ tests\EUC-JP\arclamp.jp.xml                                  EUC-JP with confide
       .
       .
       316 tests
      -

      Holy crap, it actually works! /me does a little dance +

      Holy crap, it actually works! /me does a little dance

      Summary

      What have we learned?

        diff --git a/dip3.css b/dip3.css index 8fc8396..90b847e 100644 --- a/dip3.css +++ b/dip3.css @@ -26,10 +26,6 @@ a:link,.w a{color:#26c} a:visited{color:#93c} .c a{color:inherit} -/* skip links */ -.s a,.s a:hover,.s a:visited{position:absolute;left:0px;top:-500px;width:1px;height:1px;overflow:hidden} -.s a:active,.s a:focus{position:static;width:auto;height:auto} - /* code blocks */ pre{white-space:pre-wrap;padding-left:2.154em;border-left:1px solid #ddd} .w{float:left} diff --git a/native-datatypes.html b/native-datatypes.html index 1ea8ee6..3585462 100644 --- a/native-datatypes.html +++ b/native-datatypes.html @@ -9,7 +9,6 @@ body{counter-reset:h1 2} -

        skip to main content

          

        You are here: Home Dive Into Python 3

        Native datatypes

        diff --git a/porting-code-to-python-3-with-2to3.html b/porting-code-to-python-3-with-2to3.html index 30fae83..f10c1c8 100644 --- a/porting-code-to-python-3-with-2to3.html +++ b/porting-code-to-python-3-with-2to3.html @@ -19,7 +19,6 @@ th,td,td pre{margin:0} td pre{padding:0;border:0} -

        skip to main content

          

        You are here: Home Dive Into Python 3

        Porting code to Python 3 with 2to3

        @@ -89,8 +88,7 @@ td pre{padding:0;border:0}

        print statement

        In Python 2, print was a statement. Whatever you wanted to print simply followed the print keyword. In Python 3, print() is a function — whatever you want to print is passed to print() like any other function.

        [The code examples will be easier to follow if you enable Javascript, but whatever.] -

        skip over this table - +
        @@ -112,7 +110,7 @@ td pre{padding:0;border:0}
        Notes Python 2 Python 3
        print >>sys.stderr, 1, 2, 3 print(1, 2, 3, file=sys.stderr)
        -

          +
          1. To print a blank line, call print() without any arguments.
          2. To print a single value, call print() with one argument.
          3. To print two values separated by a space, call print() with two arguments. @@ -121,8 +119,7 @@ td pre{padding:0;border:0}
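The print() forms described in these notes can be tried directly. This is a small sketch; the io.StringIO buffer is an addition for illustration, used here only to show that the file argument accepts any object with a write() method.

```python
import io
import sys

print()                            # a blank line
print(1)                           # a single value
print(1, 2)                        # two values separated by a space
print(1, 2, 3, file=sys.stderr)    # Python 2: print >>sys.stderr, 1, 2, 3

# Any object with a write() method can stand in for the file argument
buffer = io.StringIO()
print(1, 2, 3, file=buffer)
captured = buffer.getvalue()
```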

          Unicode string literals

          Python 2 had two string types: Unicode strings and non-Unicode strings. Python 3 has one string type: Unicode strings. -

          skip over this table - +
          @@ -134,14 +131,13 @@ td pre{padding:0;border:0}
          Notes Python 2 Python 3
          ur"PapayaWhip\foo" r"PapayaWhip\foo"
          -

            +
            1. Unicode string literals are simply converted into string literals, which, in Python 3, are always Unicode.
            2. Unicode raw strings (in which Python does not auto-escape backslashes) are converted to raw strings. In Python 3, raw strings are always Unicode.

            unicode() global function

            Python 2 had two global functions to coerce objects into strings: unicode() to coerce them into Unicode strings, and str() to coerce them into non-Unicode strings. Python 3 has only one string type, Unicode strings, so the str() function is all you need. (The unicode() function no longer exists.) -

            skip over this table - +
            @@ -150,12 +146,10 @@ td pre{padding:0;border:0}
            Notes Python 2 Python 3
            unicode(anything) str(anything)
            -

            long data type

            Python 2 had separate int and long types for non-floating-point numbers. An int could not be any larger than sys.maxint, which varied by platform. Longs were defined by appending an L to the end of the number, and they could be, well, longer than ints. In Python 3, there is only one integer type, called int, which mostly behaves like the long type in Python 2. Since there are no longer two types, there is no need for special syntax to distinguish them.

            Further reading: PEP 237: Unifying Long Integers and Integers. -

            skip over this table - +
            @@ -176,7 +170,7 @@ td pre{padding:0;border:0}
            Notes Python 2 Python 3
            isinstance(x, long) isinstance(x, int)
            -

              +
              1. Base 10 long integer literals become base 10 integer literals.
              2. Base 16 long integer literals become base 16 integer literals.
              3. In Python 3, the old long() function no longer exists, since longs don't exist. To coerce a variable to an integer, use the int() function. @@ -185,8 +179,7 @@ td pre{padding:0;border:0}
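A short sketch of the unified integer type in Python 3. The specific values are arbitrary examples chosen to exceed what a fixed-width integer could hold.

```python
big = 2 ** 100                              # far beyond any fixed-width limit
hex_value = 0xFFFFFFFFFFFFFFFFFF            # no trailing L needed
from_string = int("12345678901234567890")   # int() replaces long()

print(type(big).__name__)
print(isinstance(big, int))                 # replaces isinstance(x, long)
```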

              <> comparison

              Python 2 supported <> as a synonym for !=, the not-equals comparison operator. Python 3 supports the != operator, but not <>. -

              skip over this table - +
              @@ -198,14 +191,13 @@ td pre{padding:0;border:0}
              Notes Python 2 Python 3
              if x <> y <> z: if x != y != z:
              -

                +
                1. A simple comparison.
                2. A more complex comparison between three values.

                has_key() dictionary method

                In Python 2, dictionaries had a has_key() method to test whether the dictionary had a certain key. In Python 3, this method no longer exists. Instead, you need to use the in operator. -

                skip over this table - +
                @@ -226,7 +218,7 @@ td pre{padding:0;border:0}
                Notes Python 2 Python 3
                x + a_dictionary.has_key(y) x + (y in a_dictionary)
                -

                  +
                  1. The simplest form.
                  2. The in operator takes precedence over the or operator, so there is no need for parentheses here.
                  3. On the other hand, you do need parentheses here, for the same reason: in takes precedence over or. @@ -235,8 +227,7 @@ td pre{padding:0;border:0}
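Because in binds more tightly than or, the placement of parentheses changes the result. The sketch below uses made-up sample data to show the difference; the variable names are illustrative only.

```python
a_dictionary = {"y": 1}   # hypothetical sample data
x, y = "x", "y"

simple = y in a_dictionary                        # was a_dictionary.has_key(y)
either = x in a_dictionary or y in a_dictionary   # (x in d) or (y in d)
grouped = (x or y) in a_dictionary                # was a_dictionary.has_key(x or y)
ungrouped = x or y in a_dictionary                # parsed as x or (y in d)

print(simple, either, grouped, ungrouped)
```

Here grouped tests whether "x" is a key, while ungrouped short-circuits and simply evaluates to the string "x".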

                  Dictionary methods that return lists

                  In Python 2, many dictionary methods returned lists. The most frequently used methods were keys(), items(), and values(). In Python 3, all of these methods return dynamic views. In some contexts, this is not a problem. If the method's return value is immediately passed to another function that iterates through the entire sequence, it makes no difference whether the actual type is a list or a view. In other contexts, it matters a great deal. If you were expecting a complete list with individually addressable elements, your code will choke, because views do not support indexing. -

                  skip over this table - +
                  @@ -257,7 +248,7 @@ td pre{padding:0;border:0}
                  Notes Python 2 Python 3
                  min(a_dictionary.keys()) no change
                  -

                    +
                    1. 2to3 errs on the side of safety, converting the return value from keys() to a static list with the list() function. This will always work, but it will be less efficient than using a view. You should examine the converted code to see if a list is absolutely necessary, or if a view would do.
                    2. Another view-to-list conversion, with the items() method. 2to3 will do the same thing with the values() method.
                    3. Python 3 does not support the iterkeys() method anymore. Use keys(), and if necessary, convert the view to an iterator with the iter() function. @@ -268,8 +259,7 @@ td pre{padding:0;border:0}
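The difference between a view and a list can be seen in a few lines. This is a minimal sketch with made-up data, not output from 2to3.

```python
a_dictionary = {"a": 1, "b": 2}   # hypothetical sample data

keys_view = a_dictionary.keys()
smallest = min(keys_view)         # iterating over a view works without list()

# Views do not support indexing
try:
    keys_view[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Converting to a list gives individually addressable elements
key_list = list(a_dictionary.keys())
first_key = key_list[0]

# A view tracks later changes to the dictionary; a list is a snapshot
a_dictionary["c"] = 3
view_length = len(keys_view)
list_length = len(key_list)

# Replacement for iterkeys(): wrap the view in iter()
key_iterator = iter(a_dictionary.keys())
first_from_iterator = next(key_iterator)
```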

                      Several modules in the Python Standard Library have been renamed. Several other modules which are related to each other have been combined or reorganized to make their association more logical.

                      http

                      In Python 3, several related HTTP modules have been combined into a single package, http. -

                      skip over this table - +
                      @@ -289,7 +279,7 @@ import SimpleHTTPServer import CGIHttpServer
                      Notes Python 2 Python 3 import http.server
                      -

                        +
                        1. The http.client module implements a low-level library that can request HTTP resources and interpret HTTP responses.
                        2. The http.cookies module provides a Pythonic interface to browser cookies that are sent in a Set-Cookie: HTTP header.
                        3. The http.cookiejar module manipulates the actual files on disk that popular web browsers use to store cookies. @@ -297,8 +287,7 @@ import CGIHttpServer

                        urllib

                        Python 2 had a rat's nest of overlapping modules to parse, encode, and fetch URLs. In Python 3, these have all been refactored and combined in a single package, urllib. -

                        skip over this table - +
                        @@ -326,7 +315,7 @@ from urllib2 import HTTPError
                        Notes Python 2 Python 3
                        from urllib.request import Request
                         from urllib.error import HTTPError
                        -

                          +
                          1. The old urllib module in Python 2 had a variety of functions, including urlopen() for fetching data and splittype(), splithost(), and splituser() for splitting a URL into its constituent parts. These functions have been reorganized more logically within the new urllib package. 2to3 will also change all calls to these functions so they use the new naming scheme.
                          2. The old urllib2 module in Python 2 has been folded into the urllib package in Python 3. All your urllib2 favorites — the build_opener() method, Request objects, and HTTPBasicAuthHandler and friends — are still available.
                          3. The urllib.parse module in Python 3 contains all the parsing functions from the old urlparse module in Python 2. @@ -336,8 +325,7 @@ from urllib.error import HTTPError
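A brief sketch of where the pieces now live. The URLs are placeholders, and nothing here touches the network: building a Request only creates an object, whereas urlopen() would actually fetch it.

```python
from urllib.parse import urlparse, urljoin, quote   # parsing and quoting
from urllib.request import Request                  # was in urllib2
from urllib.error import HTTPError, URLError        # was in urllib2

parts = urlparse("http://example.com/a/b?x=1")
joined = urljoin("http://example.com/a/b", "c")
escaped = quote("papaya whip")

# Constructing a Request does not open a connection
request = Request("http://example.com/", headers={"User-Agent": "demo"})

print(parts.netloc, parts.path, parts.query)
print(joined, escaped, request.get_method())
```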

                          dbm

                          All the various DBM clones are now in a single package, dbm. If you need a specific variant like GNU DBM, you can import the appropriate module within the dbm package. -

                          skip over this table - +
                          @@ -359,11 +347,9 @@ from urllib.error import HTTPError import whichdb
                          Notes Python 2 Python 3
                          import dbm
                          -

                          xmlrpc

                          XML-RPC is a lightweight method of performing remote procedure calls over HTTP. The XML-RPC client library and several XML-RPC server implementations are now combined in a single package, xmlrpc. -

                          skip over this table - +
                          @@ -376,10 +362,8 @@ import whichdb import SimpleXMLRPCServer
                          Notes Python 2 Python 3 import xmlrpc.server
                          -

                          Other modules

                          -

                          skip over this table - +
                          @@ -418,7 +402,7 @@ except ImportError:
                          Notes Python 2 Python 3
                          import commands import subprocess
                          -

                            +
                            1. A common idiom in Python 2 was to try to import cStringIO as StringIO, and if that failed, to import StringIO instead. Do not do this in Python 3; the io module does it for you. It will find the fastest implementation available and use it automatically.
                            2. A similar idiom was used to import the fastest pickle implementation. Do not do this in Python 3; the pickle module does it for you.
                            3. The builtins module contains the global functions, classes, and constants used throughout the Python language. Redefining a function in the builtins module will redefine the global function everywhere. That is exactly as powerful and scary as it sounds. @@ -432,7 +416,6 @@ except ImportError:

                              Relative imports within a package

                              A package is a group of related modules that function as a single entity. In Python 2, when modules within a package need to reference each other, you use import foo or from foo import Bar. The Python 2 interpreter first searches within the current package to find foo.py, and then moves on to the other directories in the Python search path (sys.path). Python 3 works a bit differently. Instead of searching the current package, it goes directly to the Python search path. If you want one module within a package to import another module in the same package, you need to explicitly provide the relative path between the two modules.

                              Suppose you had this package, with multiple files in the same directory: -

                              skip over this ASCII art

                              chardet/
                               |
                               +--__init__.py
                              @@ -442,9 +425,8 @@ except ImportError:
                               +--mbcharsetprober.py
                               |
                               +--universaldetector.py
                              -

                              Now suppose that universaldetector.py needs to import the entire constants.py file and one class from mbcharsetprober.py. How do you do it? -

                              skip over this table - +

                              Now suppose that universaldetector.py needs to import the entire constants.py file and one class from mbcharsetprober.py. How do you do it? +

                              @@ -456,14 +438,13 @@ except ImportError:
                              Notes Python 2 Python 3
                              from mbcharsetprober import MultiByteCharSetProber from .mbcharsetprober import MultiByteCharSetProber
                              -

                                +
                                1. When you need to import an entire module from elsewhere in your package, use the new from . import syntax. The period is actually a relative path from this file (universaldetector.py) to the file you want to import (constants.py). In this case, they are in the same directory, thus the single period. You can also import from the parent directory (from .. import anothermodule) or a subdirectory.
                                2. To import a specific class or function from another module directly into your module's namespace, prefix the target module with a relative path, minus the trailing slash. In this case, mbcharsetprober.py is in the same directory as universaldetector.py, so the path is a single period. You can also import from the parent directory (from ..anothermodule import AnotherClass) or a subdirectory.
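To see both relative import forms working, the sketch below builds a throwaway package on disk and imports it. The package name demo_pkg and the file contents are hypothetical; they only loosely mirror the chardet layout described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temporary directory (hypothetical names)
root = Path(tempfile.mkdtemp())
package = root / "demo_pkg"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "constants.py").write_text("eNotMe = 1\n")
(package / "mbcharsetprober.py").write_text(
    "class MultiByteCharSetProber:\n    pass\n"
)
(package / "universaldetector.py").write_text(
    "from . import constants\n"
    "from .mbcharsetprober import MultiByteCharSetProber\n"
)

# Make the temporary directory importable, then load the module
sys.path.insert(0, str(root))
importlib.invalidate_caches()
detector = importlib.import_module("demo_pkg.universaldetector")

print(detector.constants.eNotMe)
print(detector.MultiByteCharSetProber.__name__)
```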

                                next() iterator method

                                In Python 2, iterators had a next() method which returned the next item in the sequence. That's still true in Python 3, but there is now also a global next() function that takes an iterator as an argument. -

                                skip over this table - +
                                @@ -494,7 +475,7 @@ for an_iterator in a_sequence_of_iterators: for an_iterator in a_sequence_of_iterators: an_iterator.__next__()
                                Notes Python 2 Python 3
                                -

                                  +
                                  1. In the simplest case, instead of calling an iterator's next() method, you now pass the iterator itself to the global next() function.
                                  2. If you have a function that returns an iterator, call the function and pass the result to the next() function. (The 2to3 script is smart enough to convert this properly.)
                                  3. If you define your own class and mean to use it as an iterator, define the __next__() special method. @@ -503,8 +484,7 @@ for an_iterator in a_sequence_of_iterators:
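A minimal sketch of a hand-written iterator in Python 3. The Countdown class is a hypothetical example, not part of any library.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):              # this method was named next() in Python 2
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


counter = Countdown(3)
first = next(counter)                # was counter.next()
second = next(counter)
remaining = list(counter)            # consumes whatever is left
exhausted = next(counter, None)      # optional default instead of StopIteration
```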

                                  filter() global function

                                  In Python 2, the filter() function returned a list, the result of filtering a sequence through a function that returned True or False for each item in the sequence. In Python 3, the filter() function returns an iterator, not a list. -

                                  skip over this table - +
                                  @@ -525,7 +505,7 @@ for an_iterator in a_sequence_of_iterators:
                                  Notes Python 2 Python 3
                                  [i for i in filter(a_function, a_sequence)] no change
                                  -

                                    +
                                    1. In the most basic case, 2to3 will wrap a call to filter() with a call to list(), which simply iterates through its argument and returns a real list.
                                    2. However, if the call to filter() is already wrapped in list(), 2to3 will do nothing, since the fact that filter() is returning an iterator is irrelevant.
                                    3. For the special syntax of filter(None, ...), 2to3 will transform the call into a semantically equivalent list comprehension. @@ -534,8 +514,7 @@ for an_iterator in a_sequence_of_iterators:
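The practical consequence of filter() returning an iterator is that the result can be consumed only once. A small sketch with made-up data; is_odd is a hypothetical helper.

```python
a_sequence = [0, 1, 2, 3, 4, 5]   # hypothetical sample data

def is_odd(n):
    return n % 2 == 1

lazy = filter(is_odd, a_sequence)   # an iterator, not a list
odds = list(lazy)                   # the basic conversion: wrap in list()
leftover = list(lazy)               # the iterator is already exhausted

# filter(None, ...) keeps truthy items; an equivalent list comprehension
truthy = [i for i in a_sequence if i]
same_as_filter_none = list(filter(None, a_sequence))
```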

                                    map() global function

                                    In much the same way as filter(), the map() function now returns an iterator. (In Python 2, it returned a list.) -

                                    skip over this table - +
                                    @@ -556,7 +535,7 @@ for an_iterator in a_sequence_of_iterators:
                                    Notes Python 2 Python 3
                                    [i for i in map(a_function, a_sequence)] no change
                                    -

                                      +
                                      1. As with filter(), in the most basic case, 2to3 will wrap a call to map() with a call to list().
                                      2. For the special syntax of map(None, ...), the identity function, 2to3 will convert it to an equivalent call to list().
                                      3. If the first argument to map() is a lambda function, 2to3 will convert it to an equivalent list comprehension. @@ -565,8 +544,7 @@ for an_iterator in a_sequence_of_iterators:
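The same pattern applies to map(). This sketch, with made-up data, shows the list() wrapper next to the list comprehension that a lambda-based call can be rewritten as.

```python
a_sequence = [1, 2, 3]   # hypothetical sample data

doubled = list(map(lambda x: x * 2, a_sequence))      # iterator wrapped in list()
doubled_comprehension = [x * 2 for x in a_sequence]   # equivalent comprehension

# Replacement for the old map(None, a_sequence) idiom
copied = list(a_sequence)
```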

                                      reduce() global function (3.1+)

                                      In Python 3, the reduce() function has been removed from the global namespace and placed in the functools module. -

                                      skip over this table - +
                                      @@ -576,13 +554,12 @@ for an_iterator in a_sequence_of_iterators:
                                      Notes Python 2 Python 3
                                      from functools import reduce
                                       reduce(a, b, c)
                                      -

                                      +

                                      The version of 2to3 that shipped with Python 3.0 would not fix the reduce() function automatically. The fix first appeared in the 2to3 script that shipped with Python 3.1.

                                      apply() global function

                                      Python 2 had a global function called apply(), which took a function f and a list [a, b, c] and returned f(a, b, c). In Python 3, the apply() function no longer exists. Instead, there is a new function calling syntax that allows you to pass a list and have Python apply the list as the function's arguments. -

                                      skip over this table - +
                                      @@ -600,7 +577,7 @@ reduce(a, b, c)
                                      Notes Python 2 Python 3
                                      apply(aModule.a_function, a_list_of_args) aModule.a_function(*a_list_of_args)
                                      -

                                        +
                                        1. In the simplest form, you can call a function with a list of arguments (an actual list like [a, b, c]) by prepending the list with an asterisk (*). This is exactly equivalent to the old apply() function in Python 2.
                                        2. In Python 2, the apply() function could actually take three parameters: a function, a list of arguments, and a dictionary of named arguments. In Python 3, you can accomplish the same thing by prepending the list of arguments with an asterisk (*) and the dictionary of named arguments with two asterisks (**).
                                        3. The + operator, used here for list concatenation, takes precedence over the * operator, so there is no need for extra parentheses around a_list_of_args + z. @@ -608,8 +585,7 @@ reduce(a, b, c)
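The three cases in these notes can be checked with a toy function. Everything below is illustrative; a_function simply echoes its arguments so the effect of each call form is visible.

```python
def a_function(a, b, c=0, d=0):
    """Hypothetical function that returns its arguments as a tuple."""
    return (a, b, c, d)

a_list_of_args = [1, 2]
a_dictionary_of_named_args = {"c": 3, "d": 4}
z = [5]

# was apply(a_function, a_list_of_args)
positional = a_function(*a_list_of_args)

# was apply(a_function, a_list_of_args, a_dictionary_of_named_args)
with_keywords = a_function(*a_list_of_args, **a_dictionary_of_named_args)

# The concatenation happens first, then the result is unpacked
concatenated = a_function(*a_list_of_args + z)
```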

                                        intern() global function

                                        In Python 2, you could call the intern() function on a string to intern it as a performance optimization. In Python 3, the intern() function has been moved to the sys module. -

                                        skip over this table - +
                                        @@ -618,11 +594,9 @@ reduce(a, b, c)
                                        Notes Python 2 Python 3
                                        intern(aString) sys.intern(aString)
                                        -

                                        exec statement

                                        Just as the print statement became a function in Python 3, so too has the exec statement. The exec() function takes a string which contains arbitrary Python code and executes it as if it were just another statement or expression. -

                                        skip over this table - +
                                        @@ -637,15 +611,14 @@ reduce(a, b, c)
                                        Notes Python 2 Python 3
                                        exec codeString in a_global_namespace, a_local_namespace exec(codeString, a_global_namespace, a_local_namespace)
                                        -

                                          +
                                          1. In the simplest form, the 2to3 script simply encloses the code-as-a-string in parentheses, since exec() is now a function instead of a statement.
                                          2. The old exec statement could take a namespace, a private environment of globals in which the code-as-a-string would be executed. Python 3 can also do this; just pass the namespace as the second argument to the exec() function.
                                          3. Even fancier, the old exec statement could also take a local namespace (like the variables defined within a function). In Python 3, the exec() function can do that too.
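A small sketch of exec() with explicit namespaces. The code string and dictionary names are made up for illustration; only run exec() on strings you trust.

```python
code_string = "result = base + 1"   # hypothetical code-as-a-string

# One namespace: names are looked up in it and assignments are stored in it
a_global_namespace = {"base": 41}
exec(code_string, a_global_namespace)
from_globals = a_global_namespace["result"]

# Separate global and local namespaces: assignments land in the local one
globals_only = {"base": 1}
a_local_namespace = {}
exec(code_string, globals_only, a_local_namespace)
from_locals = a_local_namespace["result"]
```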

                                          execfile statement (3.1+)

                                          Like the old exec statement, the old execfile statement will execute strings as if they were Python code. Where exec took a string, execfile took a filename. In Python 3, the execfile statement has been eliminated. If you really need to take a file of Python code and execute it (but you're not willing to simply import it), you can accomplish the same thing by opening the file, reading its contents, calling the global compile() function to force the Python interpreter to compile the code, and then calling the new exec() function. -

                                          skip over this table - +
                                          @@ -654,13 +627,12 @@ reduce(a, b, c)
                                          Notes Python 2 Python 3
                                          execfile("a_filename") exec(compile(open("a_filename").read(), "a_filename", "exec"))
                                          -

                                          +

                                          The version of 2to3 that shipped with Python 3.0 would not fix the execfile statement automatically. The fix first appeared in the 2to3 script that shipped with Python 3.1.
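The open, read, compile, exec sequence can be sketched as follows. The temporary file and its contents are made up for the demonstration, and the with block is an addition so that the file handle is closed promptly.

```python
import os
import tempfile

# Create a small Python file to execute (hypothetical contents)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("greeting = 'hello from a file'\n")
    a_filename = handle.name

# Equivalent of the old execfile(a_filename, namespace)
namespace = {}
with open(a_filename) as source_file:
    source = source_file.read()
exec(compile(source, a_filename, "exec"), namespace)

os.remove(a_filename)   # clean up the temporary file
print(namespace["greeting"])
```

Passing the filename to compile() means tracebacks from the executed code point at the right file.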

                                          repr literals (backticks)

                                          In Python 2, there was a special syntax of wrapping any object in backticks (like `x`) to get a representation of the object. In Python 3, this capability still exists, but you can no longer use backticks to get it. Instead, use the global repr() function. -

                                          skip over this table - +
                                          @@ -672,14 +644,13 @@ reduce(a, b, c)
                                          Notes Python 2 Python 3
                                          `"PapayaWhip" + `2`` repr("PapayaWhip" + repr(2))
                                          -

                                            +
                                            1. Remember, x can be anything — a class, a function, a module, a primitive data type, etc. The repr() function works on everything.
                                            2. In Python 2, backticks could be nested, leading to this sort of confusing (but valid) expression. The 2to3 tool is smart enough to convert this into nested calls to repr().

                                            try...except statement

                                            The syntax for catching exceptions has changed slightly between Python 2 and Python 3. -

                                            skip over this table - +
                                            @@ -715,7 +686,7 @@ except: pass
                                            Notes Python 2 Python 3 no change
                                            -

                                              +
                                              1. Instead of a comma after the exception type, Python 3 uses a new keyword, as.
                                              2. The as keyword also works for catching multiple types of exceptions at once.
                                              3. If you catch an exception but don't actually care about accessing the exception object itself, the syntax is identical between Python 2 and Python 3. @@ -726,8 +697,7 @@ except:
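The three forms from these notes, written out as a sketch. The helper functions are hypothetical and exist only to exercise each except clause.

```python
def parse_number(text):
    try:
        return int(text)
    except ValueError as e:                # Python 2: except ValueError, e:
        return "bad value: {}".format(e)

def first_item(container, key):
    try:
        return container[key]
    except (KeyError, IndexError) as e:    # Python 2: except (KeyError, IndexError), e:
        return type(e).__name__

def quiet_import(name):
    try:
        return __import__(name)
    except ImportError:                    # no exception object needed: unchanged
        return None
```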

                                          raise statement

                                          The syntax for raising your own exceptions has changed slightly between Python 2 and Python 3. -

                                          skip over this table - +
                                          @@ -745,7 +715,7 @@ except:
                                          Notes Python 2 Python 3
                                          raise "error message" unsupported
                                          -

                                            +
                                            1. In the simplest form, raising an exception without a custom error message, the syntax is unchanged.
                                            2. The change becomes noticeable when you want to raise an exception with a custom error message. Python 2 separated the exception class and the message with a comma; Python 3 passes the error message as a parameter.
                                            3. Python 2 supported a more complex syntax to raise an exception with a custom traceback (stack trace). You can do this in Python 3 as well, but the syntax is quite different. @@ -753,8 +723,7 @@ except:
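A sketch of the Python 3 forms, including the with_traceback() method that replaces the old three-argument raise. MyException is a hypothetical class defined just for this example.

```python
import sys

class MyException(Exception):
    """Hypothetical exception class for this example."""

def reraise_with_traceback():
    try:
        1 / 0
    except ZeroDivisionError:
        a_traceback = sys.exc_info()[2]
        # Python 2: raise MyException, "error message", a_traceback
        raise MyException("error message").with_traceback(a_traceback)

try:
    reraise_with_traceback()
except MyException as e:
    message = str(e)
    has_traceback = e.__traceback__ is not None

# Raising a bare string is rejected at runtime in Python 3
try:
    raise "error message"
except TypeError:
    string_rejected = True
```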

                                            throw method on generators

                                            In Python 2, generators have a throw() method. Calling a_generator.throw() raises an exception at the point where the generator was paused, then returns the next value yielded by the generator function. In Python 3, this functionality is still available, but the syntax is slightly different. -

                                            skip over this table - +
                                            @@ -769,15 +738,14 @@ except:
                                            Notes Python 2 Python 3
                                            a_generator.throw("error message") unsupported
                                            -

                                              +
                                              1. In the simplest form, a generator throws an exception without a custom error message. In this case, the syntax has not changed between Python 2 and Python 3.
                                              2. If the generator throws an exception with a custom error message, you need to pass the error string to the exception when you create it.
                                              3. Python 2 also supported throwing an exception with only a custom error message. Python 3 does not support this, and the 2to3 script will display a warning telling you that you will need to fix this code manually.
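A minimal sketch of throwing an exception with a custom message into a paused generator. The generator function and MyException are hypothetical.

```python
class MyException(Exception):
    """Hypothetical exception class for this example."""

def a_generator_function():
    while True:
        try:
            yield "running"
        except MyException as e:
            # The exception surfaces here, where the generator was paused
            yield "caught: {}".format(e)

a_generator = a_generator_function()
started = next(a_generator)   # advance to the first yield

# Python 2: a_generator.throw(MyException, "error message")
reply = a_generator.throw(MyException("error message"))
```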

                                              xrange() global function

                                              In Python 2, there were two ways to get a range of numbers: range(), which returned a list, and xrange(), which returned an iterator. In Python 3, range() returns an iterator, and xrange() doesn't exist. -

                                              skip over this table - +
                                              @@ -798,7 +766,7 @@ except:
                                              Notes Python 2 Python 3
                                              sum(range(10)) no change
                                              -

                                                +
                                                1. In the simplest case, the 2to3 script will simply convert xrange() to range().
                                                2. If your Python 2 code used range(), the 2to3 script does not know whether you needed a list, or whether an iterator would do. It errs on the side of caution and coerces the return value into a list by calling the list() function.
                                                3. If the xrange() function was inside a list comprehension, there is no need to coerce the result to a list, since the list comprehension will work just fine with an iterator. @@ -807,8 +775,7 @@ except:
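A short sketch of range() in Python 3, showing when the list() wrapper matters and when it does not.

```python
numbers = range(10)                     # was xrange(10); no list is built
a_list = list(range(10))                # was range(10) when a real list is needed
squares = [i * i for i in range(5)]     # a comprehension needs no list() wrapper
total = sum(range(10))                  # neither does a function that just iterates

print(type(numbers).__name__, a_list, squares, total)
```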

                                                raw_input() and input() global functions

                                                Python 2 had two global functions for asking the user for input on the command line. The first, called input(), expected the user to enter a Python expression (and returned the result). The second, called raw_input(), just returned whatever the user typed. This was wildly confusing for beginners and widely regarded as a “wart” in the language. Python 3 excises this wart by renaming raw_input() to input(), so it works the way everyone naively expects it to work. -

                                                skip over this table - +
                                                @@ -823,15 +790,14 @@ except:
                                                Notes | Python 2 | Python 3
                                                 | input() | eval(input())
                                                -

                                                  +
                                                  1. In the simplest form, raw_input() becomes input().
                                                  2. In Python 2, the raw_input() function could take a prompt as a parameter. This has been retained in Python 3.
                                                  3. If you actually need to ask the user for a Python expression to evaluate, use the input() function and pass the result to eval().
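Here is a small sketch of both behaviors in Python 3. The function names and the read parameter are hypothetical (read exists only so the code can be tried without a keyboard), and ast.literal_eval() is swapped in for eval() as a stricter stand-in that accepts literals only, not arbitrary expressions:

```python
import ast

def ask_for_text(prompt, read=input):
    """What raw_input() used to do: return the typed text unchanged."""
    return read(prompt)

def ask_for_literal(prompt, read=input):
    """Roughly what Python 2's input() did, restricted to literals.

    Uses ast.literal_eval() instead of eval(), so "[1, 2, 3]" is accepted
    but a function call or other expression is rejected.
    """
    return ast.literal_eval(read(prompt))
```

Calling ask_for_text("Name? ") waits for a line and returns it as a string; ask_for_literal("List? ") returns whatever Python value the typed literal describes.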

                                                  func_* function attributes

                                                  In Python 2, code within functions can access special attributes about the function itself. In Python 3, these special function attributes have been renamed for consistency with other attributes. -

                                                  skip over this table - +
                                                  @@ -858,7 +824,7 @@ except:
                                                  Notes | Python 2 | Python 3
                                                   | a_function.func_code | a_function.__code__
                                                  -

                                                    +
                                                    1. The __name__ attribute (previously func_name) contains the function's name.
                                                    2. The __doc__ attribute (previously func_doc) contains the docstring that you defined in the function's source code.
                                                    3. The __defaults__ attribute (previously func_defaults) is a tuple containing default argument values for those arguments that have default values. @@ -869,8 +835,7 @@ except:
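To see the renamed attributes in action, here is a minimal sketch using a made-up function called greet():

```python
def greet(name, greeting="Hello"):
    """Return a short greeting."""
    return "{0}, {1}!".format(greeting, name)

print(greet.__name__)               # was func_name
print(greet.__doc__)                # was func_doc
print(greet.__defaults__)           # was func_defaults
print(greet.__code__.co_argcount)   # __code__ was func_code
```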

                                                    xreadlines() I/O method

                                                    In Python 2, file objects had an xreadlines() method which returned an iterator that would read the file one line at a time. This was useful in for loops, among other places. In fact, it was so useful that later versions of Python 2 added the capability to file objects themselves. -

                                                    skip over this table - +
                                                    @@ -882,15 +847,14 @@ except:
                                                    Notes | Python 2 | Python 3
                                                     | for line in a_file.xreadlines(5): | no change
                                                    -

                                                      +
                                                      1. If you used to call xreadlines() with no arguments, 2to3 will convert it to just the file object. In Python 3, this will accomplish the same thing: read the file one line at a time and execute the body of the for loop.
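                                                      2. If you used to call xreadlines() with an argument (the number of lines to read at a time), 2to3 will not change it. The call will then fail in Python 3 with an AttributeError, because file objects no longer have an xreadlines() method; change it to readlines() by hand.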
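A quick sketch of iterating over a file object directly in Python 3. To keep it self-contained, io.StringIO stands in for a real file opened with open(); that substitution is an assumption of this example, not something 2to3 does:

```python
import io

# io.StringIO behaves like a text file for the purposes of this sketch.
a_file = io.StringIO("first\nsecond\nthird\n")

lines = []
for line in a_file:                 # replaces "for line in a_file.xreadlines():"
    lines.append(line.rstrip("\n"))

print(lines)
print(hasattr(a_file, "xreadlines"))   # the old method is gone in Python 3
```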

                                                      lambda functions with multiple parameters

                                                      In Python 2, you could define anonymous lambda functions which took multiple parameters by defining the function as taking a tuple with a specific number of items. In effect, Python 2 would “unpack” the tuple into named arguments, which you could then reference (by name) within the lambda function. In Python 3, you can still pass a tuple to a lambda function, but the Python interpreter will not unpack the tuple into named arguments. Instead, you will need to reference each argument by its positional index. -

                                                      skip over this table - +
                                                      @@ -905,15 +869,14 @@ except:
                                                      Notes | Python 2 | Python 3
                                                       | lambda (x, (y, z)): x + y + z | lambda x_y_z: x_y_z[0] + x_y_z[1][0] + x_y_z[1][1]
                                                      -

                                                        +
                                                        1. If you had defined a lambda function that took a tuple of one item, in Python 3 that would become a lambda with references to x1[0]. The name x1 is autogenerated by the 2to3 script, based on the named arguments in the original tuple.
                                                        2. A lambda function with a two-item tuple (x, y) gets converted to x_y with positional arguments x_y[0] and x_y[1].
                                                        3. The 2to3 script can even handle lambda functions with nested tuples of named arguments. The resulting Python 3 code is a bit unreadable, but it works the same as the old code did in Python 2.
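The converted lambdas look like this in practice. The names add_pair, add_nested, and add_nested_readable are invented for this sketch, and the last function is a hand-written alternative, not something 2to3 generates:

```python
# Python 2: lambda (x, y): x + y
add_pair = lambda x_y: x_y[0] + x_y[1]

# Python 2: lambda (x, (y, z)): x + y + z
add_nested = lambda x_y_z: x_y_z[0] + x_y_z[1][0] + x_y_z[1][1]

# Hand-written alternative: unpack the tuple inside a named function.
def add_nested_readable(args):
    x, (y, z) = args
    return x + y + z

print(add_pair((1, 2)))
print(add_nested((1, (2, 3))))
print(add_nested_readable((1, (2, 3))))
```

Tuple unpacking still works in assignment statements, so a small named function is often easier to read than the index-heavy lambda.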

                                                        Special method attributes

                                                        In Python 2, class methods can reference the class object they are defined in, as well as the method object itself. im_self is the class instance object; im_func is the function object; im_class is the class of im_self (for bound methods) or the class that asked for the method (for unbound methods). In Python 3, these special method attributes have been renamed to follow the naming conventions of other attributes. -

                                                        skip over this table - +
                                                        @@ -928,11 +891,9 @@ except:
                                                        Notes | Python 2 | Python 3
                                                         | aClassInstance.aClassMethod.im_class | aClassInstance.aClassMethod.__self__.__class__
                                                        -
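A minimal sketch of the Python 3 spellings, using a throwaway Greeter class invented for this example:

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
bound = g.greet   # a bound method object

print(bound.__self__ is g)                   # was im_self
print(bound.__func__ is Greeter.greet)       # was im_func
print(bound.__self__.__class__ is Greeter)   # was im_class
```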

                                                        __nonzero__ special class attribute

                                                        In Python 2, you could build your own classes that could be used in a boolean context. For example, you could instantiate the class and then use the instance in an if statement. To do this, you defined a special __nonzero__() method which returned True or False, and it was called whenever the instance was used in a boolean context. In Python 3, you can still do this, but the name of the method has changed to __bool__(). -

                                                        skip over this table - +
                                                        @@ -950,14 +911,13 @@ except: pass
                                                        Notes Python 2 Python 3 no change
                                                        -

                                                          +
                                                          1. Instead of __nonzero__(), Python 3 calls the __bool__() method when evaluating an instance in a boolean context.
                                                          2. However, if you have a __nonzero__() method that takes arguments, the 2to3 tool will assume that you were using it for some other purpose, and it will not make any changes.
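For instance, a class might define __bool__() like this. Basket is a hypothetical class made up for the sketch:

```python
class Basket:
    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):
        # Called whenever an instance is used in a boolean context.
        return len(self.items) > 0

empty = Basket()
full = Basket(["apple"])

print(bool(empty))
print(bool(full))
if full:
    print("the basket has something in it")
```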

                                                          Octal literals

                                                          The syntax for defining base 8 (octal) numbers has changed slightly between Python 2 and Python 3. -

                                                          skip over this table - +
                                                          @@ -966,11 +926,9 @@ except:
                                                          Notes | Python 2 | Python 3
                                                           | x = 0755 | x = 0o755
                                                          -
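A short sketch of the new spelling, using a Unix-style permission value as the example:

```python
mode = 0o755              # Python 3 spelling of the Python 2 literal 0755

print(mode)               # the same number in decimal
print(oct(mode))          # back to an octal string, with the 0o prefix
print(int("755", 8))      # parsing octal digits from text
```

Writing 0755 in Python 3 is a syntax error, so this is one change you can't forget by accident.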

                                                          sys.maxint

                                                          Due to the integration of the long and int types, the sys.maxint constant is no longer accurate. Because the value may still be useful in determining platform-specific capabilities, it has been retained but renamed as sys.maxsize. -

                                                          skip over this table - +
                                                          @@ -982,14 +940,13 @@ except:
                                                          Notes | Python 2 | Python 3
                                                           | a_function(sys.maxint) | a_function(sys.maxsize)
                                                          -

                                                            +
                                                            1. maxint becomes maxsize.
                                                            2. Any usage of sys.maxint becomes sys.maxsize.
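A small sketch of what this means in practice: sys.maxsize is still a platform-dependent number, but it is no longer the largest integer Python can represent.

```python
import sys

print(sys.maxsize)            # platform-dependent value

bigger = sys.maxsize + 1      # Python 3 integers just keep growing
print(bigger > sys.maxsize)
print(type(bigger).__name__)
```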

                                                            callable() global function

                                                            In Python 2, you could check whether an object was callable (like a function) with the global callable() function. In Python 3, this global function has been eliminated. To check whether an object is callable, check for the existence of the __call__() special method. -

                                                            skip over this table - +
                                                            @@ -998,11 +955,9 @@ except:
                                                            Notes | Python 2 | Python 3
                                                             | callable(anything) | hasattr(anything, "__call__")
                                                            -
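The replacement check can be wrapped in a helper, sketched here under the hypothetical name is_callable(). Note that Python 3.2 and later brought the global callable() function back, so this mostly matters for Python 3.0 and 3.1:

```python
def is_callable(anything):
    """Return True if the object can be called like a function."""
    return hasattr(anything, "__call__")

class Adder:
    def __call__(self, a, b):
        return a + b

print(is_callable(len))       # built-in function
print(is_callable(str))       # classes are callable too
print(is_callable(Adder()))   # instance that defines __call__()
print(is_callable(42))        # plain integers are not
```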

                                                            zip() global function

                                                            In Python 2, the global zip() function took any number of sequences and returned a list of tuples. The first tuple contained the first item from each sequence; the second tuple contained the second item from each sequence; and so on. In Python 3, zip() returns an iterator instead of a list. -

                                                            skip over this table - +
                                                            @@ -1014,14 +969,13 @@ except:
                                                            Notes | Python 2 | Python 3
                                                             | d.join(zip(a, b, c)) | no change
                                                            -

                                                              +
                                                              1. In the simplest form, you can get the old behavior of the zip() function by wrapping the return value in a call to list(), which will run through the iterator that zip() returns and return a real list of the results.
                                                              2. In contexts that already iterate through all the items of a sequence (such as this call to the join() method), the iterator that zip() returns will work just fine. The 2to3 script is smart enough to detect these cases and make no change to your code.
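One practical consequence is worth seeing once: the iterator that zip() returns can only be walked through a single time. A small sketch with made-up lists:

```python
a = [1, 2, 3]
b = ["x", "y", "z"]

pairs = zip(a, b)          # a lazy iterator in Python 3
as_list = list(pairs)      # the old Python 2 behavior
leftover = list(pairs)     # the iterator is now exhausted

print(as_list)
print(leftover)
```

If you need to loop over the pairs more than once, keep the list, not the iterator.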

                                                              StandardError exception

                                                              In Python 2, StandardError was the base class for all built-in exceptions other than StopIteration, GeneratorExit, KeyboardInterrupt, and SystemExit. In Python 3, StandardError has been eliminated; use Exception instead. -

                                                              skip over this table - +
                                                              @@ -1033,11 +987,9 @@ except:
                                                              Notes | Python 2 | Python 3
                                                               | x = StandardError(a, b, c) | x = Exception(a, b, c)
                                                              -

                                                              types module constants

                                                              The types module contains a variety of constants to help you determine the type of an object. In Python 2, it contained constants for all primitive types like dict and int. In Python 3, these constants have been eliminated; just use the primitive type name instead. -

                                                              skip over this table - +
                                                              @@ -1061,11 +1013,9 @@ except:
                                                              Notes | Python 2 | Python 3
                                                               | types.NoneType | type(None)
                                                              -

                                                              isinstance() global function (3.1+)

                                                              The isinstance() function checks whether an object is an instance of a particular class or type. In Python 2, you could pass a tuple of types, and isinstance() would return True if the object was any of those types. In Python 3, you can still do this, but passing the same type twice is deprecated. -

                                                              skip over this table - +
                                                              @@ -1074,13 +1024,12 @@ except:
                                                              Notes | Python 2 | Python 3
                                                               | isinstance(x, (int, float, int)) | isinstance(x, (int, float))
                                                              -

                                                              +

                                                              The version of 2to3 that shipped with Python 3.0 would not fix these cases of isinstance() automatically. The fix first appeared in the 2to3 script that shipped with Python 3.1.

                                                              basestring datatype

                                                              Python 2 had two string types: Unicode and non-Unicode. But there was also another type, basestring. It was an abstract type, a superclass for both the str and unicode types. It couldn't be called or instantiated directly, but you could pass it to the global isinstance() function to check whether an object was either a Unicode or non-Unicode string. In Python 3, there is only one string type, so basestring has no reason to exist. -

                                                              skip over this table - +
                                                              @@ -1089,10 +1038,9 @@ except:
                                                              Notes | Python 2 | Python 3
                                                               | isinstance(x, basestring) | isinstance(x, str)
                                                              -

                                                              itertools module

                                                              Python 2.3 introduced the itertools module, which defined variants of the global zip(), map(), and filter() functions that returned iterators instead of lists. In Python 3, those global functions return iterators, so those functions in the itertools module have been eliminated. - +
                                                              @@ -1110,7 +1058,7 @@ except:
                                                              Notes | Python 2 | Python 3
                                                               | from itertools import imap, izip, foo | from itertools import foo
                                                              -

                                                                +
                                                                1. Instead of itertools.izip(), just use the global zip() function.
                                                                2. Instead of itertools.imap(), just use map().
                                                                3. itertools.ifilter() becomes filter(). @@ -1118,8 +1066,7 @@ except:
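A brief sketch of the three global functions doing the work their itertools cousins used to do. The sample data and variable names are invented for this example:

```python
numbers = [1, 2, 3, 4]
letters = "abcd"

lazy_doubled = map(lambda n: n * 2, numbers)   # was itertools.imap()

doubled = list(lazy_doubled)
evens = list(filter(lambda n: n % 2 == 0, numbers))   # was itertools.ifilter()
paired = list(zip(numbers, letters))                  # was itertools.izip()

print(doubled)
print(evens)
print(paired)
```

Each call returns an iterator; list() is used here only to make the results visible.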

                                                                sys.exc_type, sys.exc_value, sys.exc_traceback

                                                                Python 2 had three variables in the sys module that you could access while an exception was being handled: sys.exc_type, sys.exc_value, sys.exc_traceback. (Actually, these date all the way back to Python 1.) Ever since Python 1.5, these variables have been deprecated in favor of sys.exc_info(), a function that returns a tuple containing all three values. In Python 3, these individual variables have finally gone away; you must use sys.exc_info(). -

                                                                skip over this table - +
                                                                @@ -1134,11 +1081,9 @@ except:
                                                                Notes | Python 2 | Python 3
                                                                 | sys.exc_traceback | sys.exc_info()[2]
                                                                -
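A minimal sketch of unpacking all three values at once while an exception is being handled. The failing int() call is just a convenient way to raise something:

```python
import sys

try:
    int("not a number")
except ValueError:
    # was: sys.exc_type, sys.exc_value, sys.exc_traceback
    exc_type, exc_value, exc_traceback = sys.exc_info()

print(exc_type.__name__)
print(exc_value)
print(exc_traceback is not None)
```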

                                                                List comprehensions over tuples

                                                                In Python 2, if you wanted to code a list comprehension that iterated over a tuple, you did not need to put parentheses around the tuple values. In Python 3, explicit parentheses are required. -

                                                                skip over this table - +
                                                                @@ -1147,11 +1092,9 @@ except:
                                                                Notes | Python 2 | Python 3
                                                                 | [i for i in 1, 2] | [i for i in (1, 2)]
                                                                -

                                                                os.getcwdu() function

                                                                Python 2 had a function named os.getcwd(), which returned the current working directory as a (non-Unicode) string. Because modern file systems can handle directory names in any character encoding, Python 2.3 introduced os.getcwdu(). The os.getcwdu() function returned the current working directory as a Unicode string. In Python 3, there is only one string type (Unicode), so os.getcwd() is all you need. -

                                                                skip over this table - +
                                                                @@ -1160,11 +1103,9 @@ except:
                                                                Notes | Python 2 | Python 3
                                                                 | os.getcwdu() | os.getcwd()
                                                                -

                                                                Metaclasses

                                                                In Python 2, you could create metaclasses either by defining the metaclass argument in the class declaration, or by defining a special class-level __metaclass__ attribute. In Python 3, the class-level attribute has been eliminated. -

                                                                skip over this table - +
                                                                @@ -1184,7 +1125,7 @@ except:
                                                                Notes Python 2 Python 3
                                                                class C(Whipper, Beater, metaclass=PapayaMeta):
                                                                     pass
                                                                -

                                                                  +
                                                                  1. Declaring the metaclass in the class declaration worked in Python 2, and it still works the same in Python 3.
                                                                  2. Declaring the metaclass in a class attribute worked in Python 2, but doesn't work in Python 3.
                                                                  3. The 2to3 script is smart enough to construct a valid class declaration, even if the class is inherited from one or more base classes. @@ -1196,8 +1137,7 @@ except:
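Here is a runnable sketch of the Python 3 declaration. The class names come from the table above, but the bodies (including the made_by attribute) are assumptions made up to show that the metaclass actually runs:

```python
class PapayaMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.made_by = mcls.__name__   # hypothetical marker, for illustration only
        return cls

class Whipper:
    pass

class Beater:
    pass

class C(Whipper, Beater, metaclass=PapayaMeta):
    pass

print(type(C).__name__)
print(C.made_by)
```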

                                                                    The 2to3 script will not fix set() literals by default. To enable this fix, specify -f set_literal on the command line when you call 2to3.

                                                                    -

                                                                    skip over this table - +
                                                                    @@ -1212,14 +1152,12 @@ except:
                                                                    Notes | Before | After
                                                                     | set([i for i in a_sequence]) | {i for i in a_sequence}
                                                                    -
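A short sketch of the literal and comprehension forms, with one trap to watch for: empty braces still mean an empty dictionary, not an empty set.

```python
a_sequence = [1, 2, 2, 3]

literal = {1, 2, 3}                           # was set([1, 2, 3])
from_comprehension = {i for i in a_sequence}  # was set([i for i in a_sequence])
empty = set()                                 # {} would be an empty dict

print(literal == from_comprehension)
print(type({}).__name__)
```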

                                                                    buffer() global function (explicit)

                                                                    Python objects implemented in C can export a “buffer interface,” which is a block of memory that is directly readable and writeable without copying. (That is exactly as powerful and scary as it sounds.) In Python 3, buffer() has been renamed to memoryview(). (It's a little more complicated than that, but you can almost certainly ignore the differences.)

                                                                    The 2to3 script will not fix the buffer() function by default. To enable this fix, specify -f buffer on the command line when you call 2to3.

                                                                    -

                                                                    skip over this table - +
                                                                    @@ -1228,14 +1166,12 @@ except:
                                                                    Notes | Before | After
                                                                     | x = buffer(y) | x = memoryview(y)
                                                                    -
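To get a feel for "readable and writeable without copying," here is a small sketch using a bytearray as the underlying object. That choice is an assumption of the example; any object that exports a writable buffer would do:

```python
data = bytearray(b"hello")
view = memoryview(data)

view[0] = ord("j")        # writes through to the underlying bytearray
middle = bytes(view[1:3]) # slicing the view does not copy until bytes() is called

print(data)
print(middle)
```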

                                                                    Whitespace around commas (explicit)

                                                                    Despite being draconian about whitespace for indenting and outdenting, Python is actually quite liberal about whitespace in other areas. Within lists, tuples, sets, and dictionaries, whitespace can appear before and after commas with no ill effects. However, the Python style guide states that commas should be preceded by zero spaces and followed by one. Although this is purely an aesthetic issue (the code works either way, in both Python 2 and Python 3), the 2to3 script can optionally fix this for you.

                                                                    The 2to3 script will not fix whitespace around commas by default. To enable this fix, specify -f ws_comma on the command line when you call 2to3.

                                                                    -

                                                                    skip over this table - +
                                                                    @@ -1247,14 +1183,12 @@ except:
                                                                    Notes | Before | After
                                                                     | {a :b} | {a: b}
                                                                    -

                                                                    Common idioms (explicit)

                                                                    There were a number of common idioms built up in the Python community. Some, like the while 1: loop, date back to Python 1. (Python didn't have a true boolean type until version 2.3, so developers used 1 and 0 instead.) Modern Python programmers should train their brains to use modern versions of these idioms instead.

                                                                    The 2to3 script will not fix common idioms by default. To enable this fix, specify -f idioms on the command line when you call 2to3.

                                                                    -

                                                                    skip over this table - +
                                                                    @@ -1277,7 +1211,6 @@ do_stuff(a_list)
                                                                    Notes Before After
                                                                    a_list = sorted(a_sequence)
                                                                     do_stuff(a_list)
                                                                    -
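A compact sketch of three idioms in their modern form, with the older spelling noted in comments. The sample values are made up:

```python
a_sequence = (3, 1, 2)

# Before: a_list = list(a_sequence); a_list.sort()
a_list = sorted(a_sequence)

# Before: type(x) == int
x = 5
is_int = isinstance(x, int)

# Before: while 1:
count = 0
while True:
    count += 1
    if count == 3:
        break

print(a_list)
print(is_int)
print(count)
```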

                                                                    FIXME: once the rest of the book is written, this appendix should contain copious links back to any chapter or section that touches on these features.

                                                                    © 2001–9 Mark Pilgrim diff --git a/regular-expressions.html b/regular-expressions.html index c363623..7616d3d 100644 --- a/regular-expressions.html +++ b/regular-expressions.html @@ -9,7 +9,6 @@ body{counter-reset:h1 4} -

                                                                    skip to main content

                                                                      

                                                                    You are here: Home Dive Into Python 3

                                                                    Regular expressions

                                                                    diff --git a/strings.html b/strings.html index 4b1c2a0..db5c795 100644 --- a/strings.html +++ b/strings.html @@ -9,7 +9,6 @@ body{counter-reset:h1 3} -

                                                                    skip to main content

                                                                      

                                                                    You are here: Home Dive Into Python 3

                                                                    Strings

                                                                    diff --git a/unit-testing.html b/unit-testing.html index 53fa82d..71cdb21 100644 --- a/unit-testing.html +++ b/unit-testing.html @@ -9,7 +9,6 @@ body{counter-reset:h1 7} -

                                                                    skip to main content

                                                                      

                                                                    You are here: Home Dive Into Python 3

                                                                    Unit testing

                                                                    diff --git a/your-first-python-program.html b/your-first-python-program.html index 1143c59..407ef44 100644 --- a/your-first-python-program.html +++ b/your-first-python-program.html @@ -10,7 +10,6 @@ body{counter-reset:h1 1} th{font-family:inherit !important} -

                                                                    skip to main content

                                                                      

                                                                    You are here: Home Dive Into Python 3

                                                                    Your first Python program

                                                                    @@ -41,7 +40,6 @@ th{font-family:inherit !important}

                                                                    Diving in

                                                                    Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.

                                                                    [The code examples will be easier to follow if you enable Javascript, but whatever.] -

                                                                    skip over this code listing

                                                                    [download humansize.py]

                                                                    SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
                                                                                 1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
                                                                    @@ -71,8 +69,7 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
                                                                     if __name__ == "__main__":
                                                                         print(approximate_size(1000000000000, False))
                                                                         print(approximate_size(1000000000000))
                                                                    -

                                                                    Now let's run this program on the command line. On Windows, it will look something like this: -

                                                                    skip over this command output listing +

                                                                    Now let's run this program on the command line. On Windows, it will look something like this:

                                                                     c:\home\diveintopython3> c:\python30\python.exe humansize.py
                                                                     1.0 TB
                                                                    @@ -82,7 +79,7 @@ if __name__ == "__main__":
                                                                     you@localhost:~$ python3 humansize.py
                                                                     1.0 TB
                                                                     931.3 GiB
                                                                    -

                                                                    FIXME: this would be a good place to explain what the program, you know, actually does. +

                                                                    FIXME: this would be a good place to explain what the program, you know, actually does.

                                                                    Declaring functions

                                                                    Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it, like this:

                                                                    def approximate_size(size, a_kilobyte_is_1024_bytes=True):
                                                                    @@ -122,7 +119,6 @@ if __name__ == "__main__":

                                                                    I won't bore you with a long finger-wagging speech about the importance of documenting your code. Just know that code is written once but read many times, and the most important audience for your code is yourself, six months after writing it (i.e. after you've forgotten everything but need to fix something). Python makes it easy to write readable code, so take advantage of it. You'll thank me in six months.

                                                                    Documentation strings

                                                                    You can document a Python function by giving it a documentation string (docstring for short). In this program, the approximate_size function has a docstring: -

                                                                    skip over this code listing

                                                                    def approximate_size(size, a_kilobyte_is_1024_bytes=True):
                                                                         """Convert a file size to human-readable form.
                                                                     
                                                                    @@ -134,7 +130,7 @@ if __name__ == "__main__":
                                                                         Returns: string
                                                                     
                                                                         """
                                                                    -

                                                                    Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including carriage returns, leading white space, and other quote characters. You can use them anywhere, but you'll see them most often used when defining a docstring. +

                                                                    Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including carriage returns, leading white space, and other quote characters. You can use them anywhere, but you'll see them most often used when defining a docstring.

                                                                    Triple quotes are also an easy way to define a string with both single and double quotes, like qq/.../ in Perl 5.
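As a minimal sketch of the two points above (the variable names and sample text are invented for illustration and are not part of humansize.py), a triple-quoted string can hold both kinds of quote characters and keeps its line breaks and leading white space:

```python
# Hypothetical sample strings, not taken from humansize.py.

# Both quote characters can appear without escaping.
mixed = """It's easy to say "hello" here."""

# The line break and the four leading spaces are part of the string.
multi_line = """first line
    second line, indented"""

print(mixed)
print(repr(multi_line))
```

Printing with repr makes the embedded newline and the preserved indentation visible.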

                                                                    @@ -149,7 +145,6 @@ if __name__ == "__main__":

                                                                    Everything is an object

                                                                    In case you missed it, I just said that Python functions have attributes, and that those attributes are available at runtime. A function, like everything else in Python, is an object.

                                                                    Run the interactive Python shell and follow along: -

                                                                    skip over this interpreter listing

                                                                     >>> import humansize                               
                                                                     >>> print(humansize.approximate_size(4096, True))  
                                                                    @@ -165,7 +160,7 @@ if __name__ == "__main__":
                                                                         Returns: string
                                                                     
                                                                     
                                                                    -
                                                                      +
                                                                      1. The first line imports the humansize program as a module -- a chunk of code that you can use interactively, or from a larger Python program. (You'll see examples of multi-module Python programs in [FIXME xref].) Once you import a module, you can reference any of its public functions, classes, or attributes. Modules can do this to access functionality in other modules, and you can do it in the Python interactive shell too. This is an important concept, and you'll see a lot more of it throughout this book.
                                                                      2. When you want to use functions defined in imported modules, you need to include the module name. So you can't just say approximate_size; it must be humansize.approximate_size. If you've used classes in Java, this should feel vaguely familiar.
                                                                      3. Instead of calling the function as you would expect to, you asked for one of the function's attributes, __doc__.
                                                                    @@ -175,7 +170,6 @@ if __name__ == "__main__":

                                                              The import search path

                                                              Before this goes any further, I want to briefly mention the library search path. Python looks in several places when you try to import a module. Specifically, it looks in all the directories defined in sys.path. This is just a list, and you can easily view it or modify it with standard list methods. (You'll learn more about lists later in this chapter.) -

                                                              skip over this interpreter listing

                                                               >>> import sys                       
                                                               >>> sys.path                         
                                                              @@ -183,7 +177,7 @@ if __name__ == "__main__":
                                                               >>> sys                              
                                                               <module 'sys' (built-in)>
                                                               >>> sys.path.append('/my/new/path')  
                                                              -
                                                                +
                                                                1. Importing the sys module makes all of its functions and attributes available.
                                                                2. sys.path is a list of directory names that constitute the current search path. (Yours will look different, depending on your operating system, what version of Python you're running, and where it was originally installed.) Python will look through these directories (in this order) for a .py file whose name matches what you're trying to import.
                                                                3. Actually, I lied; the truth is more complicated than that, because not all modules are stored as .py files. Some, like the sys module, are "built-in modules"; they are actually baked right into Python itself. Built-in modules behave just like regular modules, but their Python source code is not available, because they are not written in Python! (The sys module is written in C.)
                                                              @@ -195,7 +189,6 @@ if __name__ == "__main__":

                                                                  This is so important that I'm going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects.
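A small sketch of that claim, assuming a throwaway function named shout (hypothetical, not part of humansize.py): a function can be bound to a second name and inspected like any other value, and strings, lists, functions, and modules all pass the same isinstance check.

```python
import sys


def shout(text):
    """Return text in upper case."""
    return text.upper()


# A function is an object: it can be bound to another name
# and it carries attributes that are available at runtime.
alias = shout
print(alias.__name__)
print(alias.__doc__)
print(alias("object"))

# Strings, lists, functions, and modules are all objects.
things = ["a string", [1, 2, 3], shout, sys]
type_names = [type(thing).__name__ for thing in things]
all_objects = all(isinstance(thing, object) for thing in things)
print(type_names, all_objects)
```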

                                                                  Indenting code

                                                                  Python functions have no explicit begin or end, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (:) and the indentation of the code itself. -

                                                                  skip over this code listing

                                                                  
                                                                   def approximate_size(size, a_kilobyte_is_1024_bytes=True):  
                                                                       if size < 0:                                            
                                                                  @@ -208,7 +201,7 @@ if __name__ == "__main__":
                                                                               return "{0:.1f} {1}".format(size, suffix)
                                                                   
                                                                       raise ValueError('number too large')
                                                                  -
                                                                    +
                                                                    1. Code blocks are defined by their indentation. By "code block," I mean functions, if statements, for loops, while loops, and so forth. Indenting starts a block and unindenting ends it. There are no explicit braces, brackets, or keywords. This means that whitespace is significant, and must be consistent. In this example, the function code is indented four spaces. It doesn't need to be four spaces, it just needs to be consistent. The first line that is not indented marks the end of the function.
                                                                    2. In Python, an if statement is followed by a code block. If the if expression evaluates to true, the indented block is executed, otherwise it falls to the else block (if any). (Note the lack of parentheses around the expression.)
                                                                    3. This line is inside the if code block. This raise statement will raise an exception (of type ValueError), but only if size < 0.
                                                                  @@ -221,22 +214,19 @@ if __name__ == "__main__":

                                      Running scripts

                                      Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them, by including a special block of code that executes when you run the Python file on the command line. Take the last few lines of humansize.py: -

                                      skip over this code listing

                                      
                                       if __name__ == "__main__":
                                           print(approximate_size(1000000000000, False))
                                           print(approximate_size(1000000000000))
                                      -
                                      +

                                      Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not allow a plain = assignment inside an expression, so there's no chance of accidentally assigning the value you thought you were comparing.
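One way to check this note without crashing a script is to hand both spellings to the built-in compile function and see which one Python accepts. This is only a sketch; the helper name compiles and the two source snippets are made up for the demonstration.

```python
def compiles(source):
    """Return True if source is syntactically valid Python (hypothetical helper)."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# Comparison with == inside an if statement is fine.
comparison_ok = compiles("x = 1\nif x == 1:\n    pass\n")

# A plain = inside the if condition is rejected before the code ever runs.
assignment_ok = compiles("x = 1\nif x = 1:\n    pass\n")

print(comparison_ok, assignment_ok)
```

The mistake is caught when the code is compiled, so it can never silently change a value at runtime.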

                                      So what makes this if statement special? Well, modules are objects, and all modules have a built-in attribute __name__. A module's __name__ depends on how you're using the module. If you import the module, then __name__ is the module's filename, without a directory path or file extension. -

                                      skip over this interpreter listing

                                       >>> import humansize
                                       >>> humansize.__name__
                                       'humansize'
                                      -
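Both sides of that dependence can be sketched in one place. The example below does not use humansize.py; it assumes a throwaway one-line module (demo_module is a hypothetical name) written to a temporary directory, loads it once the way import would, and then runs it the way the command line would, using the standard importlib and runpy modules.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# The whole hypothetical module: it just records the __name__ it sees.
SOURCE = "seen_name = __name__\n"


def names_seen(module_name="demo_module"):
    """Return the __name__ seen when imported and when run as a script."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / (module_name + ".py")
        path.write_text(SOURCE)

        # Load the file as a module, as an import statement would.
        spec = importlib.util.spec_from_file_location(module_name, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        imported = module.seen_name

        # Execute the same file as if it were started from the command line.
        ran = runpy.run_path(str(path), run_name="__main__")["seen_name"]
    return imported, ran


print(names_seen())
```

The first value is the file name without its directory or extension; the second is the special name that the if statement in humansize.py tests for.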

                                      But you can also run the module directly as a standalone program, in which case __name__ will be a special default value, __main__. Python will evaluate this if statement, find a true expression, and execute the if code block. In this case, to print two values. -

                                      skip over this command output listing +

                                      But you can also run the module directly as a standalone program, in which case __name__ will be a special default value, __main__. Python will evaluate this if statement, find a true expression, and execute the if code block. In this case, to print two values.

                                       c:\home\diveintopython3> c:\python30\python.exe humansize.py
                                       1.0 TB