diff --git a/advanced-classes.html b/advanced-classes.html index b078fe8..e96eae1 100644 --- a/advanced-classes.html +++ b/advanced-classes.html @@ -20,6 +20,8 @@ body{counter-reset:h1 11}
FIXME +
import collections
import itertools
@@ -92,6 +94,8 @@ class OrderedDict(dict, collections.MutableMapping):
return all(p==q for p, q in itertools.zip_longest(self.items(), other.items()))
return dict.__eq__(self, other)
+© 2001–9 Mark Pilgrim diff --git a/advanced-iterators.html b/advanced-iterators.html index a119e7d..3d5027d 100644 --- a/advanced-iterators.html +++ b/advanced-iterators.html @@ -17,7 +17,7 @@ body{counter-reset:h1 7}
HAWAII + IDAHO + IOWA + OHIO == STATES. Or, to put it another way, 510199 + 98153 + 9301 + 3593 == 621246. Am I speaking in tongues? No, it's just a puzzle.
+
HAWAII + IDAHO + IOWA + OHIO == STATES. Or, to put it another way, 510199 + 98153 + 9301 + 3593 == 621246. Am I speaking in tongues? No, it’s just a puzzle.
Let me spell it out for you. @@ -38,7 +38,7 @@ E = 4
The most well-known alphametic puzzle is SEND + MORE = MONEY.
-
In this chapter, we'll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles in just 14 lines of code. +
In this chapter, we’ll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles in just 14 lines of code.
import re
@@ -91,13 +91,13 @@ if __name__ == '__main__':
>>> re.findall('[A-Z]+', 'SEND + MORE == MONEY') ②
['SEND', 'MORE', 'MONEY']
re module is Python's implementation of regular expressions. It has a nifty function called findall() which takes a regular expression pattern and a string, and finds all occurrences of the pattern within the string. In this case, the pattern matches sequences of numbers. The findall() function returns a list of all the substrings that matched the pattern.
+re module is Python’s implementation of regular expressions. It has a nifty function called findall() which takes a regular expression pattern and a string, and finds all occurrences of the pattern within the string. In this case, the pattern matches sequences of numbers. The findall() function returns a list of all the substrings that matched the pattern.
Set comprehensions make it trivial to find the unique items in a sequence. [FIXME-not sure if I'm going to cover set comprehensions in an earlier chapter; if not, this is certainly an abrupt and inadequate introduction to the topic.] +
Set comprehensions make it trivial to find the unique items in a sequence. [FIXME-not sure if I’m going to cover set comprehensions in an earlier chapter; if not, this is certainly an abrupt and inadequate introduction to the topic.]
>>> a_list = ['a', 'c', 'b', 'a', 'd', 'b']
@@ -112,7 +112,7 @@ if __name__ == '__main__':
>>> {c for c in ''.join(words)} ④
{'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}
for loop. Take the first item from the list, put it in the set. Second. Third. Fourth — wait, that's in the set already, so it only gets listed once. Fifth. Sixth — again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn't even need to be sorted first.
+for loop. Take the first item from the list, put it in the set. Second. Third. Fourth — wait, that’s in the set already, so it only gets listed once. Fifth. Sixth — again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn’t even need to be sorted first.
''.join(a_list) concatenates all the strings together into one.
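To see the pieces from this walkthrough working together, here is a minimal sketch that combines re.findall(), ''.join(), and a set comprehension on the puzzle string used above. The variable names puzzle, unique_characters, and sorted_characters are my own choices for illustration, not necessarily the ones the solver uses.

```python
import re

puzzle = 'SEND + MORE == MONEY'

# Find every run of capital letters in the puzzle.
words = re.findall('[A-Z]+', puzzle)

# Join the words into one string, then keep each letter only once.
unique_characters = {c for c in ''.join(words)}

# Sets have no order, so sort before printing for a stable display.
sorted_characters = sorted(unique_characters)

print(words)
print(sorted_characters)
```

Sorting is only there to make the printed result predictable; the set itself is all you need to know which letters appear in the puzzle.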
Like many programming languages, Python has an assert statement. Here's how it works.
+
Like many programming languages, Python has an assert statement. Here’s how it works.
>>> assert 1 + 1 == 2 ① @@ -172,9 +172,9 @@ AssertionError
First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you're doing. Here I'm talking about combinatorics, but if that doesn't mean anything to you, don't worry about it. As always, Wikipedia is your friend.) +
First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you’re doing. Here I’m talking about combinatorics, but if that doesn’t mean anything to you, don’t worry about it. As always, Wikipedia is your friend.) -
The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller lists have the same size, which can be as small as 1 and as large as the total number of items. Oh, and nothing can be repeated. Mathematicians say things like "let's find the permutations of 3 different items taken 2 at a time," which means you have a sequence of 3 items and you want to find all the possible ordered pairs. +
The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller lists have the same size, which can be as small as 1 and as large as the total number of items. Oh, and nothing can be repeated. Mathematicians say things like “let’s find the permutations of 3 different items taken 2 at a time,” which means you have a sequence of 3 items and you want to find all the possible ordered pairs.
>>> import itertools ① @@ -197,13 +197,13 @@ AssertionErrorStopIteration
itertools module has all kinds of fun stuff in it, including a permutations() function that takes a sequence (here a list of three integers) and a number, which is the number of items you want in each smaller group. The function returns an iterator, which you can use in a for loop or any old place that iterates. Here I'll step through the iterator manually to show all the values.
+permutations() function takes a sequence (here a list of three integers) and a number, which is the number of items you want in each smaller group. The function returns an iterator, which you can use in a for loop or any old place that iterates. Here I’ll step through the iterator manually to show all the values.
[1, 2, 3] taken 2 at a time is (1, 2).
(2, 1) is different than (1, 2).
-[1, 2, 3] taken 2 at a time. Pairs like (1, 1) and (2, 2) never show up, because they contain repeats so they aren't valid permutations. When there are no more permutations, the iterator raises a StopIteration exception.
+[1, 2, 3] taken 2 at a time. Pairs like (1, 1) and (2, 2) never show up, because they contain repeats so they aren’t valid permutations. When there are no more permutations, the iterator raises a StopIteration exception.
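If stepping through the iterator by hand feels tedious, here is a short sketch of the same idea in script form. It collects all the permutations of [1, 2, 3] taken 2 at a time, then shows that an exhausted iterator raises StopIteration. The names pairs, remaining, and exhausted are just for illustration.

```python
import itertools

# Collect every ordered pair at once instead of calling next() repeatedly.
pairs = list(itertools.permutations([1, 2, 3], 2))
print(pairs)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

# Step through a fresh iterator manually.
perms = itertools.permutations([1, 2, 3], 2)
first = next(perms)      # the first permutation
remaining = list(perms)  # consume everything that is left

# Asking for one more value after the end raises StopIteration.
try:
    next(perms)
    exhausted = False
except StopIteration:
    exhausted = True
```

A for loop handles the StopIteration for you, which is why you rarely see it in everyday code.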
The permutations() function doesn't have to take a list. It can take any sequence — even a string.
+
The permutations() function doesn’t have to take a list. It can take any sequence — even a string.
>>> import itertools
@@ -245,7 +245,7 @@ StopIteration
[('A', 'B'), ('A', 'C'), ('B', 'C')]
itertools.product() function returns an iterator containing the Cartesian product of two sequences.
-itertools.combinations() function returns an iterator containing all the possible combinations of the given sequence of the given length. This is like the itertools.permutations() function, except combinations don't include items that are duplicates of other items in a different order. So itertools.permutations('ABC', 2) will return both ('A', 'B') and ('B', 'A') (among others), but itertools.combinations('ABC', 2) will not return ('B', 'A') because it is a duplicate of ('A', 'B') in a different order.
+itertools.combinations() function returns an iterator containing all the possible combinations of the given sequence of the given length. This is like the itertools.permutations() function, except combinations don’t include items that are duplicates of other items in a different order. So itertools.permutations('ABC', 2) will return both ('A', 'B') and ('B', 'A') (among others), but itertools.combinations('ABC', 2) will not return ('B', 'A') because it is a duplicate of ('A', 'B') in a different order.
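The difference between these three functions is easiest to see side by side. This is a small sketch using the same 'ABC' input discussed above; the inputs to product() ('AB' and '12') are my own example values.

```python
import itertools

# Order matters: ('A', 'B') and ('B', 'A') are both included.
perms = list(itertools.permutations('ABC', 2))

# Order does not matter: ('B', 'A') is skipped as a duplicate of ('A', 'B').
combos = list(itertools.combinations('ABC', 2))

# Cartesian product: every item of the first paired with every item of the second.
products = list(itertools.product('AB', '12'))

print(len(perms), len(combos), len(products))
```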
[download favorite-people.txt]
@@ -273,7 +273,7 @@ StopIteration
sorted() function can also take a function as the key parameter, and it sorts by that key. In this case, the sort function is len(), so it sorts by len(each item). Shorter names come first, then longer, then longest.
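Here is a quick sketch of sorting with a key function. The list of names is a made-up sample, not the actual contents of favorite-people.txt.

```python
# Hypothetical sample data, standing in for the names read from the file.
names = ['Wesley', 'Dora', 'Ethan', 'Anne']

# Default sort: alphabetical.
alphabetical = sorted(names)

# Sort by length instead. Names of equal length keep their original
# relative order, because Python's sort is stable.
by_length = sorted(names, key=len)

print(alphabetical)
print(by_length)
```

Note that you pass len itself, not len(); sorted() calls it once per item to work out each sort key.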
What does this have to do with the itertools module? I'm glad you asked.
+
What does this have to do with the itertools module? I’m glad you asked.
…continuing from the previous interactive shell… @@ -330,7 +330,7 @@ Wesley
itertools.zip_longest() function stops at the end of the longest sequence, inserting None values for items past the end of the shorter sequences.
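To compare the two behaviors directly, here is a minimal sketch that zips the same pair of ranges both ways. The ranges are arbitrary example values.

```python
import itertools

# zip() stops at the end of the shortest sequence.
short = list(zip(range(0, 3), range(10, 14)))

# zip_longest() keeps going, filling the gaps with None.
padded = list(itertools.zip_longest(range(0, 3), range(10, 14)))

# The fillvalue argument swaps None for a value of your choosing.
padded_zero = list(itertools.zip_longest(range(0, 3), range(10, 14), fillvalue=0))

print(short)
print(padded)
```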
-OK, that was all very interesting, but how does it relate to the alphametics solver? Here's how: +
OK, that was all very interesting, but how does it relate to the alphametics solver? Here’s how:
>>> characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
@@ -343,7 +343,7 @@ Wesley
'N': '5', 'S': '1', 'R': '6', 'Y': '7'}
zip function will create a pairing of letters and digits, in order.
-dict() function to create a dictionary that uses letters as keys and their associated digits as values. Although the printed representation of the dictionary lists the pairs in a different order (dictionaries have no "order" per se), you can see that each letter is associated with the digit, based on the ordering of the original characters and guess sequences.
+dict() function to create a dictionary that uses letters as keys and their associated digits as values. Although the printed representation of the dictionary lists the pairs in a different order (dictionaries have no “order” per se), you can see that each letter is associated with the digit, based on the ordering of the original characters and guess sequences.
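Put together, the technique looks something like this sketch. The guess tuple is an assumption on my part, chosen to agree with the dictionary entries visible above. The last step shows how such a mapping could feed str.translate(), which wants its keys as Unicode ordinals rather than 1-character strings, so the sketch converts them with ord().

```python
characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
guess = ('1', '2', '0', '3', '4', '5', '6', '7')  # assumed example guess

# Pair each letter with the digit in the same position.
mapping = dict(zip(characters, guess))

# str.translate() expects integer (ordinal) keys, so convert the letters.
table = {ord(letter): digit for letter, digit in mapping.items()}
translated = 'SEND'.translate(table)

print(mapping['S'], mapping['Y'])
print(translated)
```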
The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution. @@ -355,7 +355,7 @@ for guess in itertools.permutations(digits, len(characters)): ... equation = puzzle.translate(dict(zip(characters, guess))) -
But what is this translate() method? Ah, now you're getting to the really fun part.
+
But what is this translate() method? Ah, now you’re getting to the really fun part.
Needs to become two separate imports:
from . import constants
import sys
-There are variations of this problem scattered throughout the chardet library. In some places it’s "import constants, sys"; in other places, it’s "import constants, re". The fix is the same: manually split the import statement into two lines, one for the relative import, the other for the absolute import.
+
There are variations of this problem scattered throughout the chardet library. In some places it’s “import constants, sys”; in other places, it’s “import constants, re”. The fix is the same: manually split the import statement into two lines, one for the relative import, the other for the absolute import.
Onward!
There's an unfortunate clash of coding style and Python interpreter here. The TypeError could be anywhere on that line, but the traceback doesn't tell you exactly where it is. It could be in the first conditional or the second, and the traceback would look the same. To narrow it down, you should split the line in half, like this:
+
There’s an unfortunate clash of coding style and Python interpreter here. The TypeError could be anywhere on that line, but the traceback doesn’t tell you exactly where it is. It could be in the first conditional or the second, and the traceback would look the same. To narrow it down, you should split the line in half, like this:
elif (self._mInputState == ePureAscii) and \
self._escDetector.search(self._mLastChar + aBuf):
And re-run the test:
@@ -709,8 +709,8 @@ TypeError: Can't convert 'bytes' object to str implicitly File "C:\home\chardet\chardet\universaldetector.py", line 101, in feed self._escDetector.search(self._mLastChar + aBuf): TypeError: Can't convert 'bytes' object to str implicitly -Aha! The problem was not in the first conditional (self._mInputState == ePureAscii) but in the second one. So what could cause a TypeError there? Perhaps you're thinking that the search() method is expecting a value of a different type, but that wouldn't generate this traceback. Python functions can take any value; if you pass the right number of arguments, the function will execute. It may crash if you pass it a value of a different type than it's expecting, but if that happened, the traceback would point to somewhere inside the function. But this traceback says it never got as far as calling the search() method. So the problem must be in that + operation, as it's trying to construct the value that it will eventually pass to the search() method.
-
We know from previous debugging that aBuf is a byte array. So what is self._mLastChar? It's an instance variable, defined in the reset() method, which is actually called from the __init__() method.
+
Aha! The problem was not in the first conditional (self._mInputState == ePureAscii) but in the second one. So what could cause a TypeError there? Perhaps you’re thinking that the search() method is expecting a value of a different type, but that wouldn’t generate this traceback. Python functions can take any value; if you pass the right number of arguments, the function will execute. It may crash if you pass it a value of a different type than it’s expecting, but if that happened, the traceback would point to somewhere inside the function. But this traceback says it never got as far as calling the search() method. So the problem must be in that + operation, as it’s trying to construct the value that it will eventually pass to the search() method.
+
We know from previous debugging that aBuf is a byte array. So what is self._mLastChar? It’s an instance variable, defined in the reset() method, which is actually called from the __init__() method.
class UniversalDetector:
def __init__(self):
self._highBitDetector = re.compile(b'[\x80-\xFF]')
@@ -726,7 +726,7 @@ TypeError: Can't convert 'bytes' object to str implicitly
self._mGotData = False
self._mInputState = ePureAscii
self._mLastChar = ''
-And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can't concatenate a string to a byte array — not even a zero-length string. +
And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can’t concatenate a string to a byte array — not even a zero-length string.
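You can reproduce this mismatch without chardet at all. The following is a minimal sketch, using a made-up three-byte value for a_buf, that shows the failure and the bytes-only version that works.

```python
a_buf = b'\xEF\xBB\xBF'  # hypothetical input bytes

# A string plus a byte array fails, even when the string is empty.
try:
    combined = '' + a_buf
except TypeError as exc:
    error_name = type(exc).__name__
else:
    error_name = None

# An empty byte array plus a byte array is fine.
fixed = b'' + a_buf

print(error_name)
print(fixed == a_buf)
```

The exact wording of the error message varies between Python 3 releases, so the sketch only checks the exception type.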
So what is self._mLastChar anyway? The answer is in the feed() method, just a few lines down from where the traceback occurred.
if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):
@@ -736,14 +736,14 @@ TypeError: Can't convert 'bytes' object to str implicitly
self._mInputState = eEscAscii
self._mLastChar = aBuf[-1]
-The calling function calls this feed() method over and over again with a few bytes at a time. The method processes the bytes it was given (passed in as aBuf), then stores the last byte in self._mLastChar in case it's needed during the next call. (In a multi-byte encoding, the feed() method might get called with half of a character, then called again with the other half.) But because aBuf is now a byte array instead of a string, self._mLastChar needs to be a byte array as well. Thus:
+
The calling function calls this feed() method over and over again with a few bytes at a time. The method processes the bytes it was given (passed in as aBuf), then stores the last byte in self._mLastChar in case it’s needed during the next call. (In a multi-byte encoding, the feed() method might get called with half of a character, then called again with the other half.) But because aBuf is now a byte array instead of a string, self._mLastChar needs to be a byte array as well. Thus:
def reset(self):
.
.
.
- self._mLastChar = ''
+ self._mLastChar = b''
-Searching the entire codebase for "mLastChar" turns up a similar problem in mbcharsetprober.py, but instead of tracking the last character, it tracks the last two characters. The MultiByteCharSetProber class uses a list of 1-character strings to track the last two characters; in Python 3, it needs to use a list of integers.
+
Searching the entire codebase for “mLastChar” turns up a similar problem in mbcharsetprober.py, but instead of tracking the last character, it tracks the last two characters. The MultiByteCharSetProber class uses a list of 1-character strings to track the last two characters; in Python 3, it needs to use a list of integers.
class MultiByteCharSetProber(CharSetProber):
def __init__(self):
@@ -762,7 +762,7 @@ TypeError: Can't convert 'bytes' object to str implicitly
'int' and 'bytes'
I have good news, and I have bad news. The good news is we're making progress… +
I have good news, and I have bad news. The good news is we’re making progress…
C:\home\chardet> python test.py tests\*\* tests\ascii\howto.diveintomark.org.xml Traceback (most recent call last): @@ -771,8 +771,8 @@ TypeError: Can't convert 'bytes' object to str implicitlyFile "C:\home\chardet\chardet\universaldetector.py", line 101, in feed self._escDetector.search(self._mLastChar + aBuf): TypeError: unsupported operand type(s) for +: 'int' and 'bytes' -
…The bad news is it doesn't always feel like progress. -
But this is progress! Really! Even though the traceback calls out the same line of code, it's a different error than it used to be. Progress! So what's the problem now? The last time I checked, this line of code didn't try to concatenate an int with a byte array (bytes). In fact, you just spent a lot of time ensuring that self._mLastChar was a byte array. How did it turn into an int?
+
…The bad news is it doesn’t always feel like progress. +
But this is progress! Really! Even though the traceback calls out the same line of code, it’s a different error than it used to be. Progress! So what’s the problem now? The last time I checked, this line of code didn’t try to concatenate an int with a byte array (bytes). In fact, you just spent a lot of time ensuring that self._mLastChar was a byte array. How did it turn into an int?
The answer lies not in the previous lines of code, but in the following lines.
if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):
@@ -783,7 +783,7 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'
self._mLastChar = aBuf[-1]
-This error doesn't occur the first time the feed() method gets called; it occurs the second time, after self._mLastChar has been set to the last byte of aBuf. Well, what's the problem with that? Getting a single element from a byte array yields an integer, not a byte array. To see the difference, follow me to the interactive shell:
+
This error doesn’t occur the first time the feed() method gets called; it occurs the second time, after self._mLastChar has been set to the last byte of aBuf. Well, what’s the problem with that? Getting a single element from a byte array yields an integer, not a byte array. To see the difference, follow me to the interactive shell:
>>> aBuf = b'\xEF\xBB\xBF' ① >>> len(aBuf) @@ -805,19 +805,19 @@ TypeError: unsupported operand type(s) for +: 'int' and 'bytes'--
- Define a byte array of length 3.
- The last element of the byte array is 191. -
- That's an integer. -
- Concatenating an integer with a byte array doesn't work. You've now replicated the error you just found in
universaldetector.py. -- Ah, here's the fix. Instead of taking the last element of the byte array, use list slicing to create a new byte array containing just the last element. That is, start with the last element and continue the slice until the end of the byte array. Now mLastChar is a byte array of length 1. +
- That’s an integer. +
- Concatenating an integer with a byte array doesn’t work. You’ve now replicated the error you just found in
universaldetector.py. +- Ah, here’s the fix. Instead of taking the last element of the byte array, use list slicing to create a new byte array containing just the last element. That is, start with the last element and continue the slice until the end of the byte array. Now mLastChar is a byte array of length 1.
- Concatenating a byte array of length 1 with a byte array of length 3 returns a new byte array of length 4.
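The same steps can be run as a script rather than typed into the shell. This is a minimal sketch with the same three-byte value; the names last_item, last_slice, and combined are mine.

```python
a_buf = b'\xEF\xBB\xBF'

# Indexing a byte array gives you an integer.
last_item = a_buf[-1]

# Slicing a byte array gives you another byte array, here of length 1.
last_slice = a_buf[-1:]

# An integer can't be concatenated with a byte array.
try:
    last_item + a_buf
    int_concat_works = True
except TypeError:
    int_concat_works = False

# A byte array can.
combined = last_slice + a_buf

print(last_item, last_slice, len(combined))
```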
So, to ensure that the feed() method in universaldetector.py continues to work no matter how often it's called, you need to initialize self._mLastChar as a 0-length byte array, then make sure it stays a byte array. +
So, to ensure that the feed() method in universaldetector.py continues to work no matter how often it’s called, you need to initialize self._mLastChar as a 0-length byte array, then make sure it stays a byte array.
self._escDetector.search(self._mLastChar + aBuf):
self._mInputState = eEscAscii
- self._mLastChar = aBuf[-1]
+ self._mLastChar = aBuf[-1:]
-
ord() expected string of length 1, but int found
Tired yet? You're almost there… +
Tired yet? You’re almost there…
C:\home\chardet> python test.py tests\*\* tests\ascii\howto.diveintomark.org.xml ascii with confidence 1.0 tests\Big5\0804.blogspot.com.xml @@ -839,19 +839,19 @@ def next_state(self, c): # for each byte we get its class # if it is first byte, we also get byte length byteCls = self._mModel['classTable'][ord(c)]-That's no help; it's just passed into the function. Let's pop the stack. +
That’s no help; it’s just passed into the function. Let’s pop the stack.
-# utf8prober.py def feed(self, aBuf): for c in aBuf: codingState = self._mCodingSM.next_state(c)And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That's what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an
int, not a 1-character string. In other words, there's no need to call theord()function because c is already anint! +And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an
int, not a 1-character string. In other words, there’s no need to call theord()function because c is already anint!Thus:
-def next_state(self, c): # for each byte we get its class # if it is first byte, we also get byte length- byteCls = self._mModel['classTable'][ord(c)]+ byteCls = self._mModel['classTable'][c]Searching the entire codebase for instances of
"ord(c)"uncovers similar problems insbcharsetprober.py… +Searching the entire codebase for instances of “
ord(c)” uncovers similar problems insbcharsetprober.py…# sbcharsetprober.py def feed(self, aBuf): if not self._mModel['keepEnglishLetter']: @@ -887,7 +887,7 @@ def feed(self, aBuf): + charClass = Latin1_CharToClass[c]Unorderable types:
-int()>=str()Let's go again. +
Let’s go again.
C:\home\chardet> python test.py tests\*\* tests\ascii\howto.diveintomark.org.xml ascii with confidence 1.0 tests\Big5\0804.blogspot.com.xml @@ -905,8 +905,8 @@ tests\Big5\0804.blogspot.com.xml File "C:\home\chardet\chardet\jpcntx.py", line 176, in get_order if ((aStr[0] >= '\x81') and (aStr[0] <= '\x9F')) or \ TypeError: unorderable types: int() >= str()-Did you notice? This time around, the code passed the first test case (
tests\ascii\howto.diveintomark.org.xml). You're making real progress here. -So what's this all about? “Unorderable types”? Once again, the difference between byte arrays and strings is rearing its ugly head. Take a look at the code: +
Did you notice? This time around, the code passed the first test case (
tests\ascii\howto.diveintomark.org.xml). You’re making real progress here. +So what’s this all about? “Unorderable types”? Once again, the difference between byte arrays and strings is rearing its ugly head. Take a look at the code:
charLen = 2 else: charLen = 1class SJISContextAnalysis(JapaneseContextAnalysis): def get_order(self, aStr): if not aStr: return -1, 1 @@ -916,7 +916,7 @@ TypeError: unorderable types: int() >= str()
And where does aStr come from? Let's pop the stack: +
And where does aStr come from? Let’s pop the stack:
def feed(self, aBuf, aLen):
.
.
@@ -924,9 +924,9 @@ TypeError: unorderable types: int() >= str()
i = self._mNeedToSkipCharNum
while i < aLen:
order, charLen = self.get_order(aBuf[i:i+2])
-Oh look, it's our old friend, aBuf. As you might have guessed from every other issue we've encountered in this chapter, aBuf is a byte array. Here, the feed() method isn't just passing it on wholesale; it's slicing it. But as you saw earlier in this chapter, slicing a byte array returns a byte array, so the aStr parameter that gets passed to the get_order() method is still a byte array.
-
And what is this code trying to do with aStr? It's taking the first element of the byte array and comparing it to a string of length 1. In Python 2, that worked, because aStr and aBuf were strings, and aStr[0] would be a string, and you can compare strings for inequality. But in Python 3, aStr and aBuf are byte arrays, aStr[0] is an integer, and you can't compare integers and strings for inequality without explicitly coercing one of them. -
In this case, there's no need to make the code more complicated by adding an explicit coercion. aStr[0] yields an integer; the things you're comparing to are all constants. Let's change them from 1-character strings to integers. +
Oh look, it’s our old friend, aBuf. As you might have guessed from every other issue we’ve encountered in this chapter, aBuf is a byte array. Here, the feed() method isn’t just passing it on wholesale; it’s slicing it. But as you saw earlier in this chapter, slicing a byte array returns a byte array, so the aStr parameter that gets passed to the get_order() method is still a byte array.
+
And what is this code trying to do with aStr? It’s taking the first element of the byte array and comparing it to a string of length 1. In Python 2, that worked, because aStr and aBuf were strings, and aStr[0] would be a string, and you can compare strings for inequality. But in Python 3, aStr and aBuf are byte arrays, aStr[0] is an integer, and you can’t compare integers and strings for inequality without explicitly coercing one of them. +
In this case, there’s no need to make the code more complicated by adding an explicit coercion. aStr[0] yields an integer; the things you’re comparing to are all constants. Let’s change them from 1-character strings to integers.
class SJISContextAnalysis(JapaneseContextAnalysis):
def get_order(self, aStr):
if not aStr: return -1, 1
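The comparison problem described above can be reproduced in isolation. This is a rough sketch, assuming a made-up two-byte value for a_str. It shows that comparing the first element to a 1-character string fails, while comparing it to integer constants works.

```python
a_str = b'\x82\xa0'  # hypothetical two-byte input

# Indexing a byte array yields an integer.
first = a_str[0]

# Ordering an integer against a string raises a TypeError in Python 3.
try:
    first >= '\x81'
    comparable = True
except TypeError:
    comparable = False

# Comparing against integer constants needs no coercion at all.
in_range = (first >= 0x81) and (first <= 0x9F)

print(first, comparable, in_range)
```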
@@ -1115,7 +1115,7 @@ tests\Big5\0804.blogspot.com.xml
File "C:\home\chardet\chardet\latin1prober.py", line 126, in get_confidence
total = reduce(operator.add, self._mFreqCounter)
NameError: global name 'reduce' is not defined
-According to the official What's New In Python 3.0 guide, the reduce() function has been moved out of the global namespace and into the functools module. Quoting the guide: "Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable." You can read more about the decision from Guido van Rossum's weblog: The fate of reduce() in Python 3000.
+
According to the official What’s New In Python 3.0 guide, the reduce() function has been moved out of the global namespace and into the functools module. Quoting the guide: “Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.” You can read more about the decision from Guido van Rossum’s weblog: The fate of reduce() in Python 3000.
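To make the guide's advice concrete, here is a sketch of three ways to add up a sequence of counts. The freq_counter list is made-up sample data, and using sum() is one plausible replacement for this particular reduce() call, not necessarily the only one.

```python
import functools
import operator

freq_counter = [0, 3, 1, 5]  # hypothetical frequency counts

# The relocated reduce() still works if you import it from functools.
total_reduce = functools.reduce(operator.add, freq_counter)

# The explicit for loop that the guide recommends.
total_loop = 0
for count in freq_counter:
    total_loop += count

# For plain addition, the built-in sum() does the same job.
total_sum = sum(freq_counter)

print(total_reduce, total_loop, total_sum)
```

The sum() version needs neither functools nor operator, which is why dropping reduce() can let you drop an import too.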
def get_confidence(self):
if self.get_state() == constants.eNotMe:
return 0.01
@@ -1129,7 +1129,7 @@ NameError: global name 'reduce' is not defined
Since you're no longer using the operator module, you can remove that import from the top of the file as well.
+
Since you’re no longer using the operator module, you can remove that import from the top of the file as well.
from .charsetprober import CharSetProber
from . import constants
- import operator
@@ -1172,11 +1172,11 @@ tests\EUC-JP\arclamp.jp.xml EUC-JP with confide
What have we learned?
2to3 tool is helpful as far as it goes, but it will only do the easy parts — function renames, module renames, syntax changes. It's an impressive piece of engineering, but in the end it's just an intelligent search-and-replace bot.
-chardet library is to convert a stream of bytes into a string. But “a stream of bytes” comes up more often than you might think. Reading a file in “binary” mode? You'll get a stream of bytes. Fetching a web page? Calling a web API? They return a stream of bytes, too.
+2to3 tool is helpful as far as it goes, but it will only do the easy parts — function renames, module renames, syntax changes. It’s an impressive piece of engineering, but in the end it’s just an intelligent search-and-replace bot.
+chardet library is to convert a stream of bytes into a string. But “a stream of bytes” comes up more often than you might think. Reading a file in “binary” mode? You’ll get a stream of bytes. Fetching a web page? Calling a web API? They return a stream of bytes, too.
chardet works in Python 3 is because I had a test suite that exercised every line of code in the entire library. I never would have found half of these problems with manual spot-checking.
+chardet works in Python 3 is because I had a test suite that exercised every line of code in the entire library. I never would have found half of these problems with manual spot-checking.
You can see the full table of contents (not finalized), or read what I’ve written so far:
-Let's take that one line at a time. +
Let’s take that one line at a time.
class Fib:
-class? What's a class?
+
class? What’s a class?
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you've defined. +
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you’ve defined. -
Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word class, followed by the class name. Technically, that's all that's required, since a class doesn't need to inherit from any other class.
+
Defining a class in Python is simple. As with functions, there is no separate interface definition. Just define the class and start coding. A Python class starts with the reserved word class, followed by the class name. Technically, that’s all that’s required, since a class doesn’t need to inherit from any other class.
class PapayaWhip: ①
pass ②
PapayaWhip, and it doesn't inherit from any other class. Class names are usually capitalized, EachWordLikeThis, but this is only a convention, not a requirement.
+PapayaWhip, and it doesn’t inherit from any other class. Class names are usually capitalized, EachWordLikeThis, but this is only a convention, not a requirement.
if statement, for loop, or any other block of code. The first line not indented is outside the class.
This PapayaWhip class doesn't define any methods or attributes, but syntactically, there needs to be something in the definition, thus the pass statement. This is a Python reserved word that just means “move along, nothing to see here”. It's a statement that does nothing, and it's a good placeholder when you're stubbing out functions or classes.
+
This PapayaWhip class doesn’t define any methods or attributes, but syntactically, there needs to be something in the definition, thus the pass statement. This is a Python reserved word that just means “move along, nothing to see here”. It’s a statement that does nothing, and it’s a good placeholder when you’re stubbing out functions or classes.
-☞The
pass statement in Python is like an empty set of curly braces ({}) in Java or C.
Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Although it's not required, Python classes can have something similar to a constructor: the __init__() method.
+
Many classes are inherited from other classes, but this one is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don’t have explicit constructors and destructors. Although it’s not required, Python classes can have something similar to a constructor: the __init__() method.
__init__() Methoddocstrings too, just like modules and functions.
-__init__() method is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor of the class. It's tempting, because it looks like a constructor (by convention, the __init__() method is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the __init__() method is called, and you already have a valid reference to the new instance of the class.
+__init__() method is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor of the class. It’s tempting, because it looks like a constructor (by convention, the __init__() method is the first method defined for the class), acts like one (it’s the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the __init__() method is called, and you already have a valid reference to the new instance of the class.
The first argument of every class method, including the __init__() method, is always a reference to the current instance of the class. By convention, this argument is named self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.
+
The first argument of every class method, including the __init__() method, is always a reference to the current instance of the class. By convention, this argument is named self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don’t call it anything but self; this is a very strong convention.
In the __init__() method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.
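Here is a minimal sketch of how self behaves, assuming a cut-down version of the Fib class. The limit() method is a hypothetical helper added only for illustration; it is not part of the original class.

```python
class Fib:
    '''cut-down sketch of the Fib class'''

    def __init__(self, max):
        # self is the newly created instance; max is the caller's argument
        self.max = max

    def limit(self):
        # hypothetical helper: self is the instance whose method was called
        return self.max


fib = Fib(100)                    # you pass only 100; Python supplies self
via_instance = fib.limit()        # self is filled in automatically
via_class = Fib.limit(fib)        # the same call with self spelled out
```

Both calls return the same value, which is all that "Python will add it for you" really means.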
@@ -99,10 +99,10 @@ class Fib:
>>> fib.__doc__ ④
'iterator that yields numbers in the Fibanocci sequence'
Fib class (defined in the fibonacci2 module) and assigning the newly created instance to the variable fib. You are passing one parameter, 100, which will end up as the max argument in Fib's __init__() method.
+Fib class (defined in the fibonacci2 module) and assigning the newly created instance to the variable fib. You are passing one parameter, 100, which will end up as the max argument in Fib’s __init__() method.
Fib class.
-__class__, which is the object's class. Java programmers may be familiar with the Class class, which contains methods like getName and getSuperclass to get metadata information about an object. In Python, this kind of metadata is available directly on the object itself through attributes like __class__, __name__, and __bases__.
-docstring just as with a function or a module. All instances of a class share the same docstring.
+__class__, which is the object’s class. Java programmers may be familiar with the Class class, which contains methods like getName and getSuperclass to get metadata information about an object. In Python, this kind of metadata is available directly on the object itself through attributes like __class__, __name__, and __bases__.
+docstring just as with a function or a module. All instances of a class share the same docstring.
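The attributes mentioned above can be poked at directly. This is a sketch using a cut-down Fib class with a stand-in docstring, so the exact strings are assumptions rather than the output of the real fibonacci2 module.

```python
class Fib:
    '''stand-in docstring for this sketch'''

    def __init__(self, max):
        self.max = max


fib = Fib(100)
other = Fib(5)

cls = fib.__class__               # the object's class
class_name = cls.__name__         # the class's name, as a string
base_classes = cls.__bases__      # a tuple of the classes it inherits from
shared_doc = fib.__doc__ is other.__doc__   # every instance sees the class docstring
```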
@@ -117,7 +117,7 @@ class Fib: def __init__(self, max): self.max = max ①-
- What is self.max? It's an instance variable. It is completely separate from max, which was passed into the
__init__()method as an argument. self.max is “global” to the instance. That means that you can access it from other methods. +- What is self.max? It’s an instance variable. It is completely separate from max, which was passed into the
__init__()method as an argument. self.max is “global” to the instance. That means that you can access it from other methods.class Fib: @@ -147,7 +147,7 @@ class Fib:A Fibonacci Iterator
-Now you're ready to learn how to build an iterator. An iterator is just a class that defines an
__iter__()method. +Now you’re ready to learn how to build an iterator. An iterator is just a class that defines an
__iter__()method.class Fib: ① @@ -195,7 +195,7 @@ class Fib:A Plural Rule Iterator
-Now it’s time for the finale. Let's rewrite the plural rules generator as an iterator. +
Now it’s time for the finale. Let’s rewrite the plural rules generator as an iterator.
class LazyRules: @@ -246,7 +246,7 @@ rules = LazyRules()- Also, this is a good place to initialize the cache, which you’ll use later as you read the patterns from the pattern file.
Before we continue, let's take a closer look at rules_f. It's not defined within the __init__() method. In fact, it's not defined within any method. It's defined at the class level. It's a class variable, and although you can access it just like an instance variable (self.rules_f), it is shared across all instances of the LazyRules class.
+
Before we continue, let’s take a closer look at rules_f. It’s not defined within the __init__() method. In fact, it’s not defined within any method. It’s defined at the class level. It’s a class variable, and although you can access it just like an instance variable (self.rules_f), it is shared across all instances of the LazyRules class.
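To see the difference between a class variable and an instance variable, here is a small sketch. The Rules class and its source attribute are hypothetical stand-ins, not the real LazyRules class.

```python
class Rules:
    source = 'rules.txt'          # class variable, defined outside any method

    def __init__(self):
        self.cache = []           # instance variable, one per instance


r1 = Rules()
r2 = Rules()

# Both instances read the same class-level value.
before = (r1.source, r2.source)

# Assigning through an instance creates an instance attribute on r2 only.
r2.source = 'r2-override.txt'

# Changing the class attribute affects every instance that has no override.
Rules.source = 'papayawhip.txt'
after = (r1.source, r2.source)
```

Reading through self finds the class variable; assigning through self quietly creates a separate instance attribute, which is an easy way to surprise yourself.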
>>> import plural6 @@ -364,34 +364,3 @@ rules = LazyRules()
© 2001–9 Mark Pilgrim - - diff --git a/native-datatypes.html b/native-datatypes.html index a19541e..c219056 100644 --- a/native-datatypes.html +++ b/native-datatypes.html @@ -17,7 +17,7 @@ body{counter-reset:h1 2}
Cast aside your first Python program for just a minute, and let's talk about datatypes. In Python, every variable has a datatype, but you don't need to declare it explicitly. Based on each variable's original assignment, Python figures out what type it is and keeps track of that internally. +
Cast aside your first Python program for just a minute, and let’s talk about datatypes. In Python, every variable has a datatype, but you don’t need to declare it explicitly. Based on each variable’s original assignment, Python figures out what type it is and keeps track of that internally.
Python has many native datatypes. Here are the important ones:
True or False.
@@ -28,8 +28,8 @@ body{counter-reset:h1 2}
Of course, there are a lot more types than these seven. Everything is an object in Python, so there are types like module, function, class, method, file, and even compiled code. You've already seen some of these: modules have names, functions have docstrings, &c. You'll learn about classes in [FIXME xref] and files in [FIXME xref].
-
Strings and bytes are important enough — and complicated enough — that they get their own chapter. Let's look at the others first. +
Of course, there are a lot more types than these seven. Everything is an object in Python, so there are types like module, function, class, method, file, and even compiled code. You’ve already seen some of these: modules have names, functions have docstrings, &c. You’ll learn about classes in [FIXME xref] and files in [FIXME xref].
+
Strings and bytes are important enough — and complicated enough — that they get their own chapter. Let’s look at the others first.
Booleans are either true or false. Python has two constants, True and False, which can be used to assign boolean values directly. Expressions can also evaluate to a boolean value. In certain places (like if statements), Python expects an expression to evaluate to a boolean value. These places are called boolean contexts. You can use virtually any expression in a boolean context, and Python will try to determine its truth value. Different datatypes have different rules about which values are true or false in a boolean context. (This will make more sense once you see some concrete examples later in this chapter.)
@@ -48,7 +48,7 @@ body{counter-reset:h1 2}
>>> size < 0
True
Numbers are awesome. There are so many to choose from. Python supports both integers and floating point numbers. There's no type declaration to distinguish them; Python tells them apart by the presence or absence of a decimal point. +
Numbers are awesome. There are so many to choose from. Python supports both integers and floating point numbers. There’s no type declaration to distinguish them; Python tells them apart by the presence or absence of a decimal point.
>>> type(1) ① <class 'int'> @@ -82,7 +82,7 @@ body{counter-reset:h1 2}
int to a float by calling the float() function.
float to an int by calling int().
int() function will truncate, not round.
-int() function truncates negative numbers towards 0. It's a true truncate function, not a floor function.
+int() function truncates negative numbers towards 0. It’s a true truncate function, not a floor function.
/ operator performs floating point division. It returns a float even if both the numerator and denominator are ints.
// operator performs a quirky kind of integer division. When the result is positive, you can think of it as truncating (not rounding) to 0 decimal places, but be careful with that.
-// operator rounds “up” to the nearest integer. Mathematically speaking, it's rounding “down” since −6 is less than −5, but it could trip you up if you were expecting it to truncate to −5.
-// operator doesn't always return an integer. If either the numerator or denominator is a float, it will still round down to a whole number, but the actual return value will be a float.
+// operator rounds “up” to the nearest integer. Mathematically speaking, it’s rounding “down” since −6 is less than −5, but it could trip you up if you were expecting it to truncate to −5.
+// operator doesn’t always return an integer. If either the numerator or denominator is a float, it will still round down to a whole number, but the actual return value will be a float.
** operator means “raised to the power of.” 11² is 121.
% operator gives the remainder after performing integer division. 11 divided by 2 is 5 with a remainder of 1, so the result here is 1.
☞In Python 2, the / operator usually meant integer division, but you could make it behave like floating point division by including a special directive in your code. In Python 3, the / operator always means floating point division. See PEP 238 for details.
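The rules above are easy to check for yourself. This sketch collects each case in a dictionary named results, which is just an illustrative container.

```python
results = {
    'int(2.5)': int(2.5),         # truncates, does not round
    'int(-2.5)': int(-2.5),       # truncates towards 0, so not a floor
    '11 / 2': 11 / 2,             # always floating point division
    '4 / 2': 4 / 2,               # still a float, even when it divides evenly
    '11 // 2': 11 // 2,           # positive result: looks like truncation
    '-11 // 2': -11 // 2,         # negative result: rounds down, away from 0
    '11.0 // 2': 11.0 // 2,       # whole number, but the type is float
    '11 ** 2': 11 ** 2,           # raised to the power of
    '11 % 2': 11 % 2,             # remainder after integer division
}

for expression, value in results.items():
    print(expression, '=', repr(value))
```

The pair to remember is int(-2.5) versus -11 // 2: the first moves towards zero, the second moves away from it.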
Python isn't limited to integers and floating point numbers. It can also do all the fancy math you learned in high school and promptly forgot about. +
Python isn’t limited to integers and floating point numbers. It can also do all the fancy math you learned in high school and promptly forgot about.
>>> import fractions ① >>> x = fractions.Fraction(1, 3) ② @@ -144,7 +144,7 @@ body{counter-reset:h1 2} >>> math.tan(math.pi / 4) ③ 0.99999999999999989
math module has a constant for π, the ratio of a circle's circumference to its diameter.
+math module has a constant for π, the ratio of a circle’s circumference to its diameter.
math module has all the basic trigonometric functions, including sin(), cos(), tan(), and variants like asin().
tan(π / 4) should return 1.0, not 0.99999999999999989.
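Since floating point results can be off by a hair, comparing them with == is fragile. One common workaround, sketched here, is math.isclose(), which assumes Python 3.5 or later and is not something the original text uses.

```python
import math

value = math.tan(math.pi / 4)

# Exact comparison depends on the last bit of a floating point result.
exactly_one = (value == 1.0)

# Comparison within a tolerance is what you usually mean.
close_to_one = math.isclose(value, 1.0)

print(repr(value), exactly_one, close_to_one)
```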
0 is false.
-0.0 is false. Be careful with this one! If there's the slightest rounding error (not impossible, as you saw in the previous section) then Python will be testing 0.0000000000001 instead of 0 and will return True.
+0.0 is false. Be careful with this one! If there’s the slightest rounding error (not impossible, as you saw in the previous section) then Python will be testing 0.0000000000001 instead of 0 and will return True.
Fraction(0, n) is false for all values of n. All other fractions are true.
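As a quick sketch of these rules, bool() gives the same answer Python would reach in an if statement. The variable tiny_error is an illustrative name for the kind of rounding error described above.

```python
import fractions

zero_int = bool(0)                              # zero is false
nonzero_int = bool(-1)                          # any other integer is true
zero_float = bool(0.0)                          # 0.0 is false

tiny_error = 0.1 + 0.2 - 0.3                    # mathematically 0, but not in floating point
tiny_error_truth = bool(tiny_error)             # true, because it is not exactly 0.0

zero_fraction = bool(fractions.Fraction(0, 5))  # false for any denominator
half = bool(fractions.Fraction(1, 2))           # all other fractions are true
```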
Lists are Python's workhorse datatype. When I say “list,” you might be thinking “array whose size I have to declare in advance, that can only contain items of the same type, &c.” Don't think that. Lists are much cooler than that. +
Lists are Python’s workhorse datatype. When I say “list,” you might be thinking “array whose size I have to declare in advance, that can only contain items of the same type, &c.” Don’t think that. Lists are much cooler than that.
☞A list in Python is like an array in Perl 5. In Perl 5, variables that store arrays always start with the
@character; in Python, variables can be named anything, and Python keeps track of the datatype internally.
-☞A list in Python is much more than an array in Java (although it can be used as one if that's really all you want out of life). A better analogy would be to the
ArrayListclass, which can hold arbitrary objects and can expand dynamically as new items are added. +☞A list in Python is much more than an array in Java (although it can be used as one if that’s really all you want out of life). A better analogy would be to the
ArrayListclass, which can hold arbitrary objects and can expand dynamically as new items are added.
Creating a list is easy: use square brackets to wrap a comma-separated list of values. @@ -210,7 +210,7 @@ body{counter-reset:h1 2}
Once you've defined a list, you can get any part of it as a new list. This is called slicing the list. +
Once you’ve defined a list, you can get any part of it as a new list. This is called slicing the list.
>>> a_list
['a', 'b', 'mpilgrim', 'z', 'example']
@@ -228,7 +228,7 @@ body{counter-reset:h1 2}
['a', 'b', 'mpilgrim', 'z', 'example']
a_list[1]), up to but not including the second slice index (in this case a_list[3]).
-a_list[0:3] returns the first three items of the list, starting at a_list[0], up to but not including a_list[3].
0, you can leave it out, and 0 is implied. So a_list[:3] is the same as a_list[0:3], because the starting 0 is implied.
a_list[3:] is the same as a_list[3:5], because this list has five items. There is a pleasing symmetry here. In this five-item list, a_list[:3] returns the first 3 items, and a_list[3:] returns the last two items. In fact, a_list[:n] will always return the first n items, and a_list[n:] will return the rest, regardless of the length of the list.
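The slices described above, as a sketch you can paste into a file. The loop at the end checks the symmetry between a_list[:n] and a_list[n:] for every n, including values past the end of the list.

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example']

middle = a_list[1:3]          # from a_list[1] up to but not including a_list[3]
first_three = a_list[0:3]     # the first three items
also_first_three = a_list[:3] # the starting 0 is implied
last_two = a_list[3:]         # same as a_list[3:5] for this five-item list

# a_list[:n] + a_list[n:] always rebuilds the whole list.
symmetry_holds = all(a_list[:n] + a_list[n:] == a_list for n in range(8))
```

Slicing returns a new list each time; a_list itself is never modified.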
@@ -251,12 +251,12 @@ body{counter-reset:h1 2}
>>> a_list
['a', 'a', 2.0, 3, True, 'four', 'e']
+ operator concatenates lists. A list can contain any number of items; there is no size limit (other than available memory). A list can contain items of any datatype; they don't all need to be the same type. Here we have a list containing a string, a floating point number, and an integer.
++ operator concatenates lists. A list can contain any number of items; there is no size limit (other than available memory). A list can contain items of any datatype; they don’t all need to be the same type. Here we have a list containing a string, a floating point number, and an integer.
append() method adds a single item to the end of the list. (Now we have four different datatypes in the list!)
extend() method takes one argument, a list, and appends each of the items of the argument to the original list.
insert() method inserts a single item into a list. The first argument is the index of the first item in the list that will get bumped out of position. List items do not need to be unique; for example, there are now two separate items with the value 'a', a_list[0] and a_list[1].
Let's look closer at the difference between append() and extend().
+
Let’s look closer at the difference between append() and extend().
>>> a_list = ['a', 'b', 'c'] >>> a_list.extend(['d', 'e', 'f']) ① @@ -276,8 +276,8 @@ body{counter-reset:h1 2}
- The
extend()method takes a single argument, which is always a list, and adds each of the items of that list to a_list.- If you start with a list of three items and extend it with a list of another three items, you end up with a list of six items. -
- On the other hand, the
append() method takes a single argument, which can be any datatype. Here, you're calling the append() method with a list of three items. -- If you start with a list of six items and append a list onto it, you end up with... a list of seven items. Why seven? Because the last item (which you just appended) is itself a list. Lists can contain any type of data, including other lists. That may be what you want, or it may not. But it's what you asked for, and it's what you got. +
- On the other hand, the
append() method takes a single argument, which can be any datatype. Here, you’re calling the append() method with a list of three items. +- If you start with a list of six items and append a list onto it, you end up with... a list of seven items. Why seven? Because the last item (which you just appended) is itself a list. Lists can contain any type of data, including other lists. That may be what you want, or it may not. But it’s what you asked for, and it’s what you got.
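Here is the same comparison as a short script, in case you want to watch the lengths change. The variable names other than a_list are illustrative.

```python
a_list = ['a', 'b', 'c']

a_list.extend(['d', 'e', 'f'])    # adds each of the three items
length_after_extend = len(a_list)

a_list.append(['g', 'h', 'i'])    # adds one item, which happens to be a list
length_after_append = len(a_list)

last_item = a_list[-1]            # the nested list you just appended
```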
Searching For Values In A List
@@ -324,7 +324,7 @@ ValueError: list.index(x): x not in listFIXME -->
Dictionaries
-One of Python's most important datatypes is the dictionary, which defines one-to-one relationships between keys and values. +
One of Python’s most important datatypes is the dictionary, which defines one-to-one relationships between keys and values.
@@ -346,7 +346,7 @@ KeyError: 'db.diveintopython3.org'☞A dictionary in Python is like a hash in Perl 5. In Perl 5, variables that store hashes always start with a
%character. In Python, variables can be named anything, and Python keeps track of the datatype internally.
'server' is a key, and its associated value, referenced by a_dict["server"], is 'db.diveintopython3.org'.
'database' is a key, and its associated value, referenced by a_dict["database"], is 'mysql'.
-a_dict["server"] is 'db.diveintopython3.org', but a_dict["db.diveintopython3.org"] raises an exception, because 'db.diveintopython3.org' is not a key.
+a_dict["server"] is 'db.diveintopython3.org', but a_dict["db.diveintopython3.org"] raises an exception, because 'db.diveintopython3.org' is not a key.
Dictionaries do not have any predefined size limit. You can add new key-value pairs to a dictionary at any time, or you can modify the value of an existing key. Continuing from the previous example: @@ -370,11 +370,11 @@ KeyError: 'db.diveintopython3.org'
'user', value 'mark') appears to be in the middle. In fact, it was just a coincidence that the items appeared to be in order in the first example; it is just as much a coincidence that they appear to be out of order now.
user key back to "mark"? No! Look at the key closely — that's a capital U in "User". Dictionary keys are case-sensitive, so this statement is creating a new key-value pair, not overwriting an existing one. It may look similar to you, but as far as Python is concerned, it's completely different.
+user key back to "mark"? No! Look at the key closely — that’s a capital U in "User". Dictionary keys are case-sensitive, so this statement is creating a new key-value pair, not overwriting an existing one. It may look similar to you, but as far as Python is concerned, it’s completely different.
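The steps above, sketched as a script. The values 'blog' and 'dora' are illustrative; what matters is which assignments overwrite a value and which create a new key.

```python
a_dict = {'server': 'db.diveintopython3.org', 'database': 'mysql'}

a_dict['database'] = 'blog'   # existing key: the old value is replaced
a_dict['user'] = 'mark'       # new key: a new key-value pair is added
a_dict['user'] = 'dora'       # existing key again: replaced
a_dict['User'] = 'mark'       # capital U: a different key entirely

# Values are not keys, so looking one up as a key fails.
try:
    a_dict['db.diveintopython3.org']
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

You end up with four keys, two of which differ only by case.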
Dictionaries aren't just for strings. Dictionary values can be any datatype, including integers, booleans, arbitrary objects, or even other dictionaries. And within a single dictionary, the values don't all need to be the same type; you can mix and match as needed. Dictionary keys are more restricted, but they can be strings, integers, and a few other types. You can also mix and match key datatypes within a dictionary. -
In fact, you've already seen a dictionary with non-string keys and values, in your first Python program. +
Dictionaries aren’t just for strings. Dictionary values can be any datatype, including integers, booleans, arbitrary objects, or even other dictionaries. And within a single dictionary, the values don’t all need to be the same type; you can mix and match as needed. Dictionary keys are more restricted, but they can be strings, integers, and a few other types. You can also mix and match key datatypes within a dictionary. +
In fact, you’ve already seen a dictionary with non-string keys and values, in your first Python program.
SUFFIXES = {1000: ('KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'),
1024: ('KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB')}
Let's tear that apart in the interactive shell. diff --git a/porting-code-to-python-3-with-2to3.html b/porting-code-to-python-3-with-2to3.html index 42fe31e..7a7d8a5 100644 --- a/porting-code-to-python-3-with-2to3.html +++ b/porting-code-to-python-3-with-2to3.html @@ -27,7 +27,7 @@ td pre{padding:0;border:0}
Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can't fix automatically. This appendix documents what it can fix automatically.
+
Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can’t fix automatically. This appendix documents what it can fix automatically.
print statementIn Python 2, print was a statement. Whatever you wanted to print simply followed the print keyword. In Python 3, print() is a function — whatever you want to print is passed to print() like any other function.
| Notes | Python 2
@@ -219,7 +219,7 @@ import CGIHttpServer
http.server module provides a basic HTTP server.
- |
|---|
| Notes | Python 2 @@ -368,10 +368,10 @@ except ImportError: |
|---|
from . import syntax. The period is actually a relative path from this file (universaldetector.py) to the file you want to import (constants.py). In this case, they are in the same directory, thus the single period. You can also import from the parent directory (from .. import anothermodule) or a subdirectory.
-mbcharsetprober.py is in the same directory as universaldetector.py, so the path is a single period. You can also import from the parent directory (from ..anothermodule import AnotherClass) or a subdirectory.
+mbcharsetprober.py is in the same directory as universaldetector.py, so the path is a single period. You can also import from the parent directory (from ..anothermodule import AnotherClass) or a subdirectory.
next() iterator methodIn Python 2, iterators had a next() method which returned the next item in the sequence. That's still true in Python 3, but there is now also a global next() function that takes an iterator as an argument.
+
In Python 2, iterators had a next() method which returned the next item in the sequence. That’s still true in Python 3, but there is now also a global next() function that takes an iterator as an argument.
| Notes | Python 2 @@ -403,11 +403,11 @@ for an_iterator in a_sequence_of_iterators: an_iterator.__next__() |
|---|
next() method, you now pass the iterator itself to the global next() function.
+next() method, you now pass the iterator itself to the global next() function.
next() function. (The 2to3 script is smart enough to convert this properly.)
__next__() special method.
next() that takes one or more arguments, 2to3 will not touch it. This class can not be used as an iterator, because its next() method takes arguments.
-next() function. In this case, you need to call the iterator's special __next__() method to get the next item in the sequence. (Alternatively, you could also refactor the code so the local variable wasn't named next, but 2to3 will not do that for you automatically.)
+next() function. In this case, you need to call the iterator’s special __next__() method to get the next item in the sequence. (Alternatively, you could also refactor the code so the local variable wasn’t named next, but 2to3 will not do that for you automatically.)
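A small sketch of the Python 3 side of these cases. The Countdown class and the shadowed() function are hypothetical examples written for this illustration, not code from the conversion table.

```python
class Countdown:
    '''hypothetical iterator that counts down from start to 1'''

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):               # Python 3 spelling of the next() method
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


an_iterator = Countdown(3)
first = next(an_iterator)             # the global next() function


def shadowed(iterator):
    next = 'not a function any more'  # local variable hides the global next()
    return iterator.__next__()        # so call the special method directly


second = shadowed(an_iterator)
remaining = list(an_iterator)
```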
filter() global functionIn Python 2, the filter() function returned a list, the result of filtering a sequence through a function that returned True or False for each item in the sequence. In Python 3, the filter() function returns an iterator, not a list.
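To see what "returns an iterator, not a list" means in practice, here is a sketch under Python 3. The is_even function and the variable names are illustrative.

```python
def is_even(n):
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

evens = filter(is_even, numbers)
is_a_list = isinstance(evens, list)   # no longer a list in Python 3

evens_as_list = list(evens)           # wrap it in list() when you need a real list
second_pass = list(evens)             # the iterator is now exhausted
```

That last line is the one that bites: an iterator can only be consumed once.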
@@ -482,7 +482,7 @@ reduce(a, b, c)
☞The version of 2to3 that shipped with Python 3.0 would not fix the reduce() function automatically. The fix first appeared in the 2to3 script that shipped with Python 3.1.
apply() global functionPython 2 had a global function called apply(), which took a function f and a list [a, b, c] and returned f(a, b, c). In Python 3, the apply() function no longer exists. Instead, there is a new function calling syntax that allows you to pass a list and have Python apply the list as the function's arguments.
+
Python 2 had a global function called apply(), which took a function f and a list [a, b, c] and returned f(a, b, c). In Python 3, the apply() function no longer exists. Instead, there is a new function calling syntax that allows you to pass a list and have Python apply the list as the function’s arguments.
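As a sketch of the calling syntax described above, an asterisk in front of a list spreads its items out as positional arguments. The function f and the argument values are placeholders.

```python
def f(a, b, c):
    return a + b + c


a_list_of_args = [1, 2, 3]

# Python 2 would have written apply(f, a_list_of_args).
result = f(*a_list_of_args)

# A dictionary of named arguments works the same way with two asterisks.
result_with_names = f(*[1, 2], **{'c': 3})
```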
| Notes | Python 2
@@ -538,7 +538,7 @@ reduce(a, b, c)
exec statement could also take a local namespace (like the variables defined within a function). In Python 3, the exec() function can do that too.
- |
|---|
| Notes | Python 2
@@ -607,7 +607,7 @@ except:
@@ -660,7 +660,7 @@ except:
- |
|---|
| Notes | Python 2
@@ -966,7 +966,7 @@ except:
itertools.izip(), just use the global zip() function.
itertools.imap(), just use map().
itertools.ifilter() becomes filter().
-itertools module still exists in Python 3, it just doesn't have the functions that have migrated to the global namespace. The 2to3 script is smart enough to remove the specific imports that no longer exist, while leaving other imports intact.
+itertools module still exists in Python 3, it just doesn’t have the functions that have migrated to the global namespace. The 2to3 script is smart enough to remove the specific imports that no longer exist, while leaving other imports intact.
|
|---|
2to3 script is smart enough to construct a valid class declaration, even if the class is inherited from one or more base classes.
The rest of the “fixes” listed here aren't really fixes per se. That is, the things they change are matters of style, not substance. They work just as well in Python 3 as they do in Python 2, but the developers of Python have a vested interest in making Python code as uniform as possible. To that end, there is an official Python style guide which outlines — in excruciating detail — all sorts of nitpicky details that you almost certainly don't care about. And given that 2to3 provides such a great infrastructure for converting Python code from one thing to another, the authors took it upon themselves to add a few optional features to improve the readability of your Python programs.
+
The rest of the “fixes” listed here aren’t really fixes per se. That is, the things they change are matters of style, not substance. They work just as well in Python 3 as they do in Python 2, but the developers of Python have a vested interest in making Python code as uniform as possible. To that end, there is an official Python style guide which outlines — in excruciating detail — all sorts of nitpicky details that you almost certainly don’t care about. And given that 2to3 provides such a great infrastructure for converting Python code from one thing to another, the authors took it upon themselves to add a few optional features to improve the readability of your Python programs.
set() literals (explicit)In Python 2, the only way to define a literal set in your code was to call set(a_sequence). This still works in Python 3, but a clearer way of doing it is to use the new set literal notation: curly braces. (Dictionaries are also defined with curly braces, which makes sense once you think about it, because dictionaries are just sets of key-value pairs.)
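Here is a short sketch comparing the two spellings. One wrinkle worth knowing: a bare pair of curly braces is still an empty dictionary, so an empty set needs set().

```python
from_call = set([1, 2, 3])        # still works in Python 3
from_literal = {1, 2, 3}          # set literal notation

from_comprehension = {i for i in [1, 2, 2, 3]}   # duplicates collapse

empty_braces = {}                 # this is a dictionary, not a set
empty_set = set()                 # the only way to write an empty set
```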
@@ -1053,7 +1053,7 @@ except:{i for i in a_sequence}
buffer() global function (explicit)Python objects implemented in C can export a “buffer interface,” which is a block of memory that is directly readable and writeable without copying. (That is exactly as powerful and scary as it sounds.) In Python 3, buffer() has been renamed to memoryview(). (It's a little more complicated than that, but you can almost certainly ignore the differences.)
+
Python objects implemented in C can export a “buffer interface,” which is a block of memory that is directly readable and writeable without copying. (That is exactly as powerful and scary as it sounds.) In Python 3, buffer() has been renamed to memoryview(). (It’s a little more complicated than that, but you can almost certainly ignore the differences.)
@@ -1084,7 +1084,7 @@ except:☞The
2to3script will not fix thebuffer()function by default. To enable this fix, specify -f buffer on the command line when you call2to3.
{a: b}
There were a number of common idioms built up in the Python community. Some, like the while 1: loop, date back to Python 1. (Python didn't have a true boolean type until version 2.3, so developers used 1 and 0 instead.) Modern Python programmers should train their brains to use modern versions of these idioms instead.
+
There were a number of common idioms built up in the Python community. Some, like the while 1: loop, date back to Python 1. (Python didn’t have a true boolean type until version 2.3, so developers used 1 and 0 instead.) Modern Python programmers should train their brains to use modern versions of these idioms instead.
diff --git a/refactoring.html b/refactoring.html index 031e383..ef7d02f 100644 --- a/refactoring.html +++ b/refactoring.html @@ -17,13 +17,13 @@ body{counter-reset:h1 10}☞The
2to3script will not fix common idioms by default. To enable this fix, specify -f idioms on the command line when you call2to3.
Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven't written yet. +
Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven’t written yet.
>>> import roman7 >>> roman7.from_roman("") ① 0
InvalidRomanNumeralError exception just like any other sequence of characters that don't represent a valid Roman numeral.
+InvalidRomanNumeralError exception just like any other sequence of characters that don’t represent a valid Roman numeral.
After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug. @@ -107,15 +107,15 @@ Ran 11 tests in 0.156s OK ②
Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn't pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test is tomorrow's regression test. +
Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn’t pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today’s unit test is tomorrow’s regression test.
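The workflow described above (reproduce the bug, write a test that pins it down, then fix the code) might look roughly like the following sketch. The `from_roman()` here is a hypothetical stand-in backed by a tiny table, not the real `roman7` code, and the exception class is redefined locally so the example is self-contained.

```python
import unittest


class InvalidRomanNumeralError(ValueError):
    """Local stand-in for the module's exception class."""


# Hypothetical miniature table; the real module parses arbitrary numerals.
KNOWN_NUMERALS = {"I": 1, "V": 5, "X": 10}


def from_roman(s):
    """Convert a Roman numeral to an integer (stand-in implementation)."""
    if not s:
        # The fix for the bug: a blank string is not a valid Roman numeral.
        raise InvalidRomanNumeralError("Input can not be blank")
    if s not in KNOWN_NUMERALS:
        raise InvalidRomanNumeralError("Invalid Roman numeral: {}".format(s))
    return KNOWN_NUMERALS[s]


class FromRomanBadInput(unittest.TestCase):
    def test_blank(self):
        """from_roman should fail with a blank string"""
        self.assertRaises(InvalidRomanNumeralError, from_roman, "")
```

Without the `if not s` check this test would fail, which is exactly what you want to see before you write the fix.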
Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change. +
Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don’t know what they want until they see it, and even if they do, they aren’t that good at articulating what they want precisely enough to be useful. And even if they do, they’ll want more in the next release anyway. So be prepared to update your test cases as requirements change. -
Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember [FIXME-xref] the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 M characters in a row to represent 4000. If you make this change, you'll be able to expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to your test cases.
+
Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember [FIXME-xref] the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 M characters in a row to represent 4000. If you make this change, you’ll be able to expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to your test cases.
@@ -157,7 +157,7 @@ class RoundtripCheck(unittest.TestCase):
result = roman8.from_roman(numeral)
self.assertEqual(integer, result)
4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest).
+4000 range. Here I’ve included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest).
to_roman() with 4000 and expect an error; now that 4000-4999 are good values, you need to bump this up to 5000.
from_roman() with 'MMMM' and expect an error; now that MMMM is considered a valid Roman numeral, you need to bump this up to 'MMMMM'.
1 to 3999. Since the range has now expanded, this for loop needs to be updated as well to go up to 4999.
@@ -220,7 +220,7 @@ FAILED (errors=3)
4000, because to_roman() still thinks this is out of range.
Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being tested is never “ahead” of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding.) +
Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being tested is never “ahead” of the test cases. While it’s behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding.)
@@ -255,11 +255,11 @@ def from_roman(s):
.
.
from_roman() function at all. The only change is to roman_numeral_pattern. If you look closely, you'll notice that I changed the maximum number of optional M characters from 3 to 4 in the first section of the regular expression. This will allow the Roman numeral equivalents of 4999 instead of 3999. The actual from_roman() function is completely generic; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn't handle 'MMMM' before is that you explicitly stopped it with the regular expression pattern matching.
-to_roman() function only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn't do this before is that you explicitly stopped it with the range check.)
+from_roman() function at all. The only change is to roman_numeral_pattern. If you look closely, you’ll notice that I changed the maximum number of optional M characters from 3 to 4 in the first section of the regular expression. This will allow the Roman numeral equivalents of 4999 instead of 3999. The actual from_roman() function is completely generic; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn’t handle 'MMMM' before is that you explicitly stopped it with the regular expression pattern matching.
+to_roman() function only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don’t need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn’t do this before is that you explicitly stopped it with the range check.)
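To make the two changes concrete, here is a self-contained sketch. The overall shape of the pattern and the `roman_numeral_map` table are assumptions made for this example; the only details taken from the text are the `M{0,4}` quantifier and the `0 < n < 5000` range check.

```python
import re


class OutOfRangeError(ValueError):
    """Local stand-in for the module's exception class."""


# Assumed layout of the validation pattern. The change described above is
# M{0,3} becoming M{0,4} in the thousands section.
roman_numeral_pattern = re.compile(
    r"""
    ^                   # beginning of string
    M{0,4}              # thousands: 0 to 4 Ms (previously 0 to 3)
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """,
    re.VERBOSE,
)

# Assumed numeral table, ordered from largest to smallest value.
roman_numeral_map = (
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
    ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
    ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1),
)


def to_roman(n):
    """Convert an integer in 1..4999 to a Roman numeral."""
    if not (0 < n < 5000):  # previously: 0 < n < 4000
        raise OutOfRangeError("number out of range (must be 1..4999)")
    result = ""
    for numeral, integer in roman_numeral_map:
        # Appends 'M' once per thousand, so 4000 becomes 'MMMM' unaided.
        while n >= integer:
            result += numeral
            n -= integer
    return result
```

Nothing in the conversion loop knows about the upper bound; the limit lives entirely in the range check and the pattern, which is why two small edits are enough.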
You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself. +
You may be skeptical that these two small changes are all that you need. Hey, don’t take my word for it; see for yourself.
you@localhost:~$ python3 romantest9.py -v @@ -288,13 +288,13 @@ Ran 12 tests in 0.203s-Refactoring
-The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly. +
The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn’t. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
Refactoring is the process of taking working code and making it work better. Usually, “better” means “faster”, although it can also mean “using less memory”, or “using less disk space”, or simply “more elegantly”. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program. -
Here, “better” means both “faster” and “easier to maintain.” Specifically, the
from_roman()function is slower and more complex than I'd like, because of that big nasty regular expression that you use to validate Roman numerals. Now, you might think, "Sure, the regular expression is big and hairy, but how else am I supposed to validate that an arbitrary string is a valid a Roman numeral?" +Here, “better” means both “faster” and “easier to maintain.” Specifically, the
from_roman() function is slower and more complex than I’d like, because of that big nasty regular expression that you use to validate Roman numerals. Now, you might think, "Sure, the regular expression is big and hairy, but how else am I supposed to validate that an arbitrary string is a valid Roman numeral?" -Answer: there's only 5000 of them; why don't you just build a lookup table? This idea gets even better when you realize that you don't need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. “Validating” is reduced to a single dictionary lookup. +
Answer: there’s only 5000 of them; why don’t you just build a lookup table? This idea gets even better when you realize that you don’t need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. “Validating” is reduced to a single dictionary lookup.
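The lookup-table idea described above can be sketched in a few lines. This is a simplified illustration, not the module's actual code: the numeral table and the helper name `is_valid_roman` are assumptions made for the example.

```python
# Assumed numeral table, ordered from largest to smallest value.
roman_numeral_map = (
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
    ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
    ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1),
)

to_roman_table = [None]   # index 0 is a placeholder so that table[n] matches n
from_roman_table = {}     # numeral string -> integer


def build_lookup_tables():
    """Fill both tables in one pass over 1..4999."""
    for integer in range(1, 5000):
        remaining = integer
        numeral = ""
        for symbol, value in roman_numeral_map:
            while remaining >= value:
                numeral += symbol
                remaining -= value
        to_roman_table.append(numeral)
        from_roman_table[numeral] = integer


def is_valid_roman(s):
    """Hypothetical helper: validation is a single dictionary lookup."""
    return s in from_roman_table


build_lookup_tables()
```

Once the tables exist, conversion in either direction is an index or key lookup, and any string that is not a key of `from_roman_table` (including the empty string) is invalid by construction.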
And best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove — to yourself and to others — that the new code works just as well as the original. @@ -357,13 +357,13 @@ def build_lookup_tables(): build_lookup_tables()
Let's break that down into digestable pieces. Arguably, the most important line is the last one: +
Let’s break that down into digestible pieces. Arguably, the most important line is the last one:
build_lookup_tables()
-You will note that is a function call, but there's no if statement around it. This is not an if __name__ == '__main__' block; it gets called when the module is imported. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. So this code will only get called the first time you import this module.)
+
You will note that this is a function call, but there’s no if statement around it. This is not an if __name__ == '__main__' block; it gets called when the module is imported. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. So this code will only get called the first time you import this module.)
-
So what does the build_lookup_tables() function do? I'm glad you asked.
+
So what does the build_lookup_tables() function do? I’m glad you asked.
to_roman_table = [ None ]
from_roman_table = {}
@@ -438,7 +438,7 @@ to_roman should fail with 0 input ... ok
OK
to_roman() and from_roman() functions. Since the tests make several thousand function calls (the roundtrip test alone makes 10,000), this savings adds up in a hurry!
+to_roman() and from_roman() functions. Since the tests make several thousand function calls (the roundtrip test alone makes 10,000), this savings adds up in a hurry!
The moral of the story? @@ -451,9 +451,9 @@ OK
Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever got along without it. +
Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you’ve seen it work, you’ll wonder how you ever got along without it. -
These few chapters have covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts: +
These few chapters have covered a lot of ground, and much of it wasn’t even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts:
© 2001–9 Mark Pilgrim diff --git a/special-method-names.html b/special-method-names.html index 31bb0fb..b19fa86 100644 --- a/special-method-names.html +++ b/special-method-names.html @@ -50,7 +50,8 @@ __ne__ __gt__ - covered in fractions.py __ge__ - covered in fractions.py __bool__ - covered in fractions.py -__cmp__ (*) + +(__cmp__ is gone)
FIXME binary operator intro +
Using the appropriate special methods, you can define your own classes that act like numbers. That is, you can add them, subtract them, and perform other mathematical operations on them. This is how fractions are implemented — the Fraction class implements these special methods, then you can do things like this:
+
+
+>>> from fractions import Fraction
+>>> x = Fraction(1, 3)
+>>> x / 3
+Fraction(1, 9)
+
+
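To show what "implementing these special methods" involves, here is a stripped-down sketch of a number-like class that supports only division. The class name `Ratio` is invented for this example and is not how `Fraction` is actually written; it defines `__truediv__` for `ratio / other` and the reflected `__rtruediv__` for `other / ratio`.

```python
from math import gcd


class Ratio:
    """Hypothetical, minimal rational number (not the real Fraction)."""

    def __init__(self, numerator, denominator=1):
        if denominator == 0:
            raise ZeroDivisionError("Ratio with zero denominator")
        if denominator < 0:
            numerator, denominator = -numerator, -denominator
        divisor = gcd(numerator, denominator)
        self.numerator = numerator // divisor
        self.denominator = denominator // divisor

    def __repr__(self):
        return "Ratio({}, {})".format(self.numerator, self.denominator)

    def __truediv__(self, other):
        # Called for: ratio / other
        if isinstance(other, int):
            other = Ratio(other)
        if not isinstance(other, Ratio):
            return NotImplemented
        return Ratio(self.numerator * other.denominator,
                     self.denominator * other.numerator)

    def __rtruediv__(self, other):
        # Called for: other / ratio, after other's own __truediv__
        # has returned NotImplemented (as int's does for a Ratio).
        if isinstance(other, int):
            return Ratio(other * self.denominator, self.numerator)
        return NotImplemented
```

With this in place, `Ratio(1, 3) / 3` goes through `__truediv__`, while `1 / Ratio(1, 3)` falls back to `__rtruediv__` because `int` does not know how to divide by a `Ratio`.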
Here is the comprehensive list of special methods you need to implement a number-like class.
| Notes
@@ -195,7 +204,24 @@ __xor__
__or__
-->
- FIXME explain circumstances under which reflected methods will be called. + That’s all well and good if x is an instance of a class that implements those methods. But what if it doesn’t implement one of them? Or worse, what if it implements it, but it can’t handle certain kinds of arguments? For example: + + +>>> from fractions import Fraction +>>> x = Fraction(1, 3) +>>> 1 / x +Fraction(3, 1)+ + This is not a case of taking a The answer lies in a second set of arithmetic special methods with reflected operands. Given an arithmetic operation that takes two operands (e.g.
The set of special methods above take the first approach: given
|
|---|