diff --git a/advanced-iterators.html b/advanced-iterators.html index aa13534..f299960 100644 --- a/advanced-iterators.html +++ b/advanced-iterators.html @@ -119,7 +119,7 @@ if __name__ == '__main__': >>> {c for c in ''.join(words)} ④ {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}
for loop. Take the first item from the list, put it in the set. Second. Third. Fourth — wait, that’s in the set already, so it only gets listed once. Fifth. Sixth — again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn’t even need to be sorted first.
+for loop. Take the first item from the list, put it in the set. Second. Third. Fourth — wait, that’s in the set already, so it only gets listed once. Fifth. Sixth — again, a duplicate, so it only gets listed once. The end result? All the unique items in the original list, without any duplicates. The original list doesn’t even need to be sorted first.
''.join(a_list) concatenates all the strings together into one.
[1, 2, 3] taken 2 at a time. Pairs like (1, 1) and (2, 2) never show up, because they contain repeats so they aren’t valid permutations. When there are no more permutations, the iterator raises a StopIteration exception.
The permutations() function doesn’t have to take a list. It can take any sequence — even a string.
+
The permutations() function doesn’t have to take a list. It can take any sequence — even a string.
>>> import itertools
@@ -255,7 +255,7 @@ StopIteration
('C', 'A', 'B'), ('C', 'B', 'A')]
'ABC' is equivalent to the list ['A', 'B', 'C'].
-['A', 'B', 'C'], taken 3 at a time, is ('A', 'B', 'C'). There are five other permutations — the same three characters in every conceivable order.
+['A', 'B', 'C'], taken 3 at a time, is ('A', 'B', 'C'). There are five other permutations — the same three characters in every conceivable order.
permutations() function always returns an iterator, an easy way to debug permutations is to pass that iterator to the built-in list() function to see all the permutations immediately.
ord() function returns the ASCII value of a character, which, in the case of A–Z, is always a byte from 65 to 90.
translate() method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, “translating” MARK to MORK.
eval() function act as the global and local namespaces for evaluating the expression. In this case, they are both empty, which means that when the string "x * 5" is evaluated, there is no reference to x in either the global or local namespace, so eval() throws an exception.
-math module, you didn’t include it in the namespace passed to the eval() function, so the evaluation failed.
⁂
2to3We’re going to migrate the chardet module from Python 2 to Python 3. Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. In some cases this is easy — a function was renamed or moved to a different modules — but in other cases it can get pretty complex. To get a sense of all that it can do, refer to the appendix, Porting code to Python 3 with 2to3. In this chapter, we’ll start by running 2to3 on the chardet package, but as you’ll see, there will still be a lot of work to do after the automated tools have performed their magic.
+
We’re going to migrate the chardet module from Python 2 to Python 3. Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. In some cases this is easy — a function was renamed or moved to a different modules — but in other cases it can get pretty complex. To get a sense of all that it can do, refer to the appendix, Porting code to Python 3 with 2to3. In this chapter, we’ll start by running 2to3 on the chardet package, but as you’ll see, there will still be a lot of work to do after the automated tools have performed their magic.
The main chardet package is split across several different files, all in the same directory. The 2to3 script makes it easy to convert multiple files at once: just pass a directory as a command line argument, and 2to3 will convert each of the files in turn.
C:\home\chardet> python c:\Python30\Tools\Scripts\2to3.py -w chardet\
RefactoringTool: Skipping implicit fixer: buffer
@@ -616,7 +616,7 @@ else:
File "C:\home\chardet\chardet\universaldetector.py", line 29, in <module>
import constants, sys
ImportError: No module named constants
-What’s that you say? No module named constants? Of course there’s a module named constants. …Oh wait, no there isn’t. Remember when the 2to3 script fixed up all those import statements? This library has a lot of relative imports — that is, modules that import other modules within the library. In Python 3, all import statements are absolute by default [FIXME-LINK PEP 0328]. To do relative imports, you need to do something like this instead:
+
What’s that you say? No module named constants? Of course there’s a module named constants. …Oh wait, no there isn’t. Remember when the 2to3 script fixed up all those import statements? This library has a lot of relative imports — that is, modules that import other modules within the library. In Python 3, all import statements are absolute by default [FIXME-LINK PEP 0328]. To do relative imports, you need to do something like this instead:
from . import constants
But wait. Wasn’t the 2to3 script supposed to take care of these for you? Well, it did, but this particular import statement combines two different types of imports into one line: a relative import of the constants module within the library, and an absolute import of the sys module that is pre-installed in the Python standard library. In Python 2, you could combine these into one import statement. In Python 3, you can’t, and the 2to3 script is not smart enough to split the import statement into two.
The solution is to split the import statement manually. So this two-in-one import: @@ -656,7 +656,7 @@ TypeError: can't use a string pattern on a bytes-like object self._highBitDetector = re.compile(r'[\x80-\xFF]')
This pre-compiles a regular expression designed to find non-ASCII characters in the range 128–255 (0x80–0xFF). Wait, that’s not quite right; I need to be more precise with my terminology. This pattern is designed to find non-ASCII bytes in the range 128-255.
And therein lies the problem. -
In Python 2, a string was an array of bytes whose character encoding was tracked separately. If you wanted Python 2 to keep track of the character encoding, you had to use a Unicode string (u'') instead. But in Python 3, a string is always what Python 2 called a Unicode string — that is, an array of Unicode characters (of possibly varying byte lengths). Since this regular expression is defined by a string pattern, it can only be used to search a string — again, an array of characters. But what we’re searching is not a string, it’s a byte array. Looking at the traceback, this error occurred in universaldetector.py:
+
In Python 2, a string was an array of bytes whose character encoding was tracked separately. If you wanted Python 2 to keep track of the character encoding, you had to use a Unicode string (u'') instead. But in Python 3, a string is always what Python 2 called a Unicode string — that is, an array of Unicode characters (of possibly varying byte lengths). Since this regular expression is defined by a string pattern, it can only be used to search a string — again, an array of characters. But what we’re searching is not a string, it’s a byte array. Looking at the traceback, this error occurred in universaldetector.py:
def feed(self, aBuf):
.
.
@@ -671,7 +671,7 @@ TypeError: can't use a string pattern on a bytes-like object
for line in open(f, 'rb'):
u.feed(line)
-And here we find our answer: in the UniversalDetector.feed() method, aBuf is a line read from a file on disk. Look carefully at the parameters used to open the file: 'rb'. 'r' is for “read”; OK, big deal, we’re reading the file. Ah, but 'b' is for “binary.” Without the 'b' flag, this for loop would read the file, line by line, and convert each line into a string — an array of Unicode characters — according to the system default character encoding. (You could override the system encoding with another parameter to the open() function, but never mind that for now.) But with the 'b' flag, this for loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to UniversalDetector.feed(), and eventually gets passed to the pre-compiled regular expression, self._highBitDetector, to search for high-bit… characters. But we don’t have characters; we have bytes. Oops.
+
And here we find our answer: in the UniversalDetector.feed() method, aBuf is a line read from a file on disk. Look carefully at the parameters used to open the file: 'rb'. 'r' is for “read”; OK, big deal, we’re reading the file. Ah, but 'b' is for “binary.” Without the 'b' flag, this for loop would read the file, line by line, and convert each line into a string — an array of Unicode characters — according to the system default character encoding. (You could override the system encoding with another parameter to the open() function, but never mind that for now.) But with the 'b' flag, this for loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to UniversalDetector.feed(), and eventually gets passed to the pre-compiled regular expression, self._highBitDetector, to search for high-bit… characters. But we don’t have characters; we have bytes. Oops.
What we need this regular expression to search is not an array of characters, but an array of bytes.
Once you realize that, the solution is not difficult. Regular expressions defined with strings can search strings. Regular expressions defined with byte arrays can search byte arrays. To define a byte array pattern, we simply change the type of the argument we use to define the regular expression to a byte array. (There is one other case of this same problem, on the very next line.)
class UniversalDetector:
@@ -737,7 +737,7 @@ TypeError: Can't convert 'bytes' object to str implicitly
self._mGotData = False
self._mInputState = ePureAscii
self._mLastChar = ''
-And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can’t concatenate a string to a byte array — not even a zero-length string. +
And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can’t concatenate a string to a byte array — not even a zero-length string.
So what is self._mLastChar anyway? The answer is in the feed() method, just a few lines down from where the trackback occurred.
if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):
@@ -854,7 +854,7 @@ def next_state(self, c):
def feed(self, aBuf):
for c in aBuf:
codingState = self._mCodingSM.next_state(c)
-And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there’s no need to call the ord() function because c is already an int!
+
And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there’s no need to call the ord() function because c is already an int!
Thus:
def next_state(self, c):
# for each byte we get its class
@@ -1131,7 +1131,7 @@ NameError: global name 'reduce' is not defined
return 0.01
total = reduce(operator.add, self._mFreqCounter)
-The reduce() function takes two arguments — a function and a list (strictly speaking, any iterable object will do) — and applies the function cumulatively to each item of the list. In other words, this is a fancy and roundabout way of adding up all the items in a list and returning the result.
+
The reduce() function takes two arguments — a function and a list (strictly speaking, any iterable object will do) — and applies the function cumulatively to each item of the list. In other words, this is a fancy and roundabout way of adding up all the items in a list and returning the result.
This monstrosity was so common that Python added a global sum() function.
def get_confidence(self):
if self.get_state() == constants.eNotMe:
@@ -1185,7 +1185,7 @@ tests\EUC-JP\arclamp.jp.xml EUC-JP with confide
What have we learned?
- Porting any non-trivial amount of code from Python 2 to Python 3 is going to be a pain. There’s no way around it. It’s hard.
-
- The automated
2to3 tool is helpful as far as it goes, but it will only do the easy parts — function renames, module renames, syntax changes. It’s an impressive piece of engineering, but in the end it’s just an intelligent search-and-replace bot.
+ - The automated
2to3 tool is helpful as far as it goes, but it will only do the easy parts — function renames, module renames, syntax changes. It’s an impressive piece of engineering, but in the end it’s just an intelligent search-and-replace bot.
- The #1 porting problem in this library was the difference between strings and bytes. In this case that seems obvious, since the whole point of the
chardet library is to convert a stream of bytes into a string. But “a stream of bytes” comes up more often than you might think. Reading a file in “binary” mode? You’ll get a stream of bytes. Fetching a web page? Calling a web API? They return a stream of bytes, too.
- You need to understand your program. Thoroughly. Preferably because you wrote it, but at the very least, you need to be comfortable with all its quirks and musty corners. The bugs are everywhere.
- Test cases are essential. Don’t port anything without them. Don’t even try. The only reason I have any confidence at all that
chardet works in Python 3 is because I had a test suite that exercised every line of code in the entire library. I never would have found half of these problems with manual spot-checking.
diff --git a/dip3.css b/dip3.css
index b13d916..4aeb36b 100644
--- a/dip3.css
+++ b/dip3.css
@@ -37,7 +37,7 @@ Classname Legend
.c = "centered" = centered footer text (also clears floats)
.a = "asterism" = section break
.v = "navigation" = prev/next navigation links (not breadcrumbs)
-.u = "Unicode" = text contains Unicode characters (requires special font declaration)
+.u = "Unicode" = text contains Unicode characters (requires special font declaration to accomodate *cough* a certain browser)
.nm = "no mobile" = hide this section on mobile devices
.nd = "no decoration" = hide the widgets on this code block
diff --git a/generators.html b/generators.html
index 7fd1521..85c263d 100644
--- a/generators.html
+++ b/generators.html
@@ -20,7 +20,7 @@ body{counter-reset:h1 5}
Diving In
-For reasons passing all understanding, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, “borrows” is the wrong word; “pillages” is more like it. Or perhaps “assimilates” — like the Borg. Yes, I like that.
+
For reasons passing all understanding, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, “borrows” is the wrong word; “pillages” is more like it. Or perhaps “assimilates” — like the Borg. Yes, I like that.
We are the Borg. Your linguistic and etymological distinctiveness will be added to our own. Resistance is futile.
In this chapter, you’re going to learn about plural nouns. Also, functions that return other functions, advanced regular expressions, and generators. But first, let’s talk about how to make plural nouns. (If you haven’t read the chapter on regular expressions, now would be a good time. This chapter assumes you understand the basics of regular expressions, and it quickly descends into more advanced uses.)
If you grew up in an English-speaking country or learned English in a formal school setting, you’re probably familiar with the basic rules:
@@ -170,7 +170,7 @@ def plural(noun):
-The reason this technique works is that everything in Python is an object, including functions. The rules data structure contains functions — not names of functions, but actual function objects. When they get assigned in the for loop, then matches_rule and apply_rule are actual functions that you can call. On the first iteration of the for loop, this is equivalent to calling matches_sxz(noun), and if it returns a match, calling apply_sxz(noun).
+
The reason this technique works is that everything in Python is an object, including functions. The rules data structure contains functions — not names of functions, but actual function objects. When they get assigned in the for loop, then matches_rule and apply_rule are actual functions that you can call. On the first iteration of the for loop, this is equivalent to calling matches_sxz(noun), and if it returns a match, calling apply_sxz(noun).
If this additional level of abstraction is confusing, try unrolling the function to see the equivalence. The entire for loop is equivalent to the following:
@@ -392,7 +392,7 @@ def plural(noun):
What have you gained over stage 4? Startup time. In stage 4, when you imported the plural4 module, it read the entire patterns file and built a list of all the possible rules, before you could even think about calling the plural() function. With generators, you can do everything lazily: you read the first rule and create functions and try them, and if that works you don’t ever read the rest of the file or create any other functions.
-
What have you lost? Performance! Every time you call the plural() function, the rules() generator starts over from the beginning — which means re-opening the patterns file and reading from the beginning, one line at a time.
+
What have you lost? Performance! Every time you call the plural() function, the rules() generator starts over from the beginning — which means re-opening the patterns file and reading from the beginning, one line at a time.
What if you could have the best of both worlds: minimal startup cost (don’t execute any code on import), and maximum performance (don’t build the same functions over and over again). Oh, and you still want to keep the rules in a separate file (because code is code and data is data), just as long as you never have to read the same line twice.
diff --git a/http-web-services.html b/http-web-services.html
index 4543c83..ff8f360 100644
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -23,7 +23,7 @@ mark{display:inline}
Diving In
HTTP web services are programmatic ways of sending and receiving data from remote servers using nothing but the operations of HTTP. If you want to get data from the server, use HTTP GET; if you want to send new data to the server, use HTTP POST. Some more advanced HTTP web service APIs also define ways of creating, modifying, and deleting data, using HTTP PUT and HTTP DELETE. In other words, the “verbs” built into the HTTP protocol (GET, POST, PUT, and DELETE) can map directly to application-level operations for retrieving, creating, modifying, and deleting data.
-
The main advantage of this approach is simplicity, and its simplicity has proven popular. Data — usually XML data — can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an HTTP library for downloading it. Debugging is also easier; because each resource in an HTTP web service has a unique address (in the form of a URL), you can load it in your web browser and immediately see the raw data.
+
The main advantage of this approach is simplicity, and its simplicity has proven popular. Data — usually XML data — can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an HTTP library for downloading it. Debugging is also easier; because each resource in an HTTP web service has a unique address (in the form of a URL), you can load it in your web browser and immediately see the raw data.
Examples of HTTP web services:
@@ -52,7 +52,7 @@ mark{display:inline}
Caching
-The most important thing to understand about any type of web service is that network access is incredibly expensive. I don’t mean “dollars and cents” expensive (although bandwidth ain’t free). I mean that it takes an extraordinary long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, latency (the time it takes to send a request and start retrieving data in a response) can still be higher than you anticipated. A router misbehaves, a packet is dropped, an intermediate proxy is under attack — there’s never a dull moment on the public internet, and there may be nothing you can do about it.
+
The most important thing to understand about any type of web service is that network access is incredibly expensive. I don’t mean “dollars and cents” expensive (although bandwidth ain’t free). I mean that it takes an extraordinary long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, latency (the time it takes to send a request and start retrieving data in a response) can still be higher than you anticipated. A router misbehaves, a packet is dropped, an intermediate proxy is under attack — there’s never a dull moment on the public internet, and there may be nothing you can do about it.
HTTP is designed with caching in mind. There is an entire class of devices (called “caching proxies”) whose only job is to sit between you and the rest of the world and minimize network access. Your company or ISP almost certainly maintains caching proxies, even if you’re unaware of them. They work because caching built into the HTTP protocol.
@@ -295,7 +295,7 @@ Content-Type: application/xml
- …the exact same 3070 bytes you downloaded last time.
-
HTTP is designed to work better than this. urllib speaks HTTP like I speak Spanish — enough to get by in a jam, but not enough to hold a conversation. HTTP is a conversation. It’s time to upgrade to a library that speaks HTTP fluently.
+
HTTP is designed to work better than this. urllib speaks HTTP like I speak Spanish — enough to get by in a jam, but not enough to hold a conversation. HTTP is a conversation. It’s time to upgrade to a library that speaks HTTP fluently.
⁂
@@ -363,9 +363,9 @@ Content-Type: application/xml
- Let’s turn on debugging and see what’s on the wire. This is the
httplib2 equivalent of turning on debugging in http.client. httplib2 will print all the data being sent to the server and some key information being sent back.
- Create an
httplib2.Http object with the same directory name as before.
- Request the same URL as before. Nothing appears to happen. More precisely, nothing gets sent to the server, and nothing gets returned from the server. There is absolutely no network activity whatsoever.
-
- Yet we did “receive” some data — in fact, we received all of it.
+
- Yet we did “receive” some data — in fact, we received all of it.
- We also “received” an HTTP status code indicating that the “request” was successful.
-
- Here’s the rub: this “response” was generated from
httplib2’s local cache. That directory name you passed in when you created the httplib2.Http object — that directory holds httplib2’s cache of all the operations it’s ever performed.
+ - Here’s the rub: this “response” was generated from
httplib2’s local cache. That directory name you passed in when you created the httplib2.Http object — that directory holds httplib2’s cache of all the operations it’s ever performed.
You previously requested the data at this URL. That request was successful (status: 200). That response included not only the feed data, but also a set of caching headers that told anyone who was listening that they could cache this resource for up to 24 hours (Cache-Control: max-age=86400, which is 24 hours measured in seconds). httplib2 understand and respects those caching headers, and it stored the previous response in the .cache directory (which you passed in when you create the Http object). That cache hasn’t expired yet, so the second time you request the data at this URL, httplib2 simply returns the cached result without ever hitting the network.
@@ -409,7 +409,7 @@ reply: 'HTTP/1.1 200 OK'
'content-type': 'application/xml'}
httplib2 allows you to add arbitrary HTTP headers to any outgoing request. In order to bypass all caches (not just your local disk cache, but also any caching proxies between you and the remote server), add a no-cache header in the headers dictionary.
-httplib2 initiating a network request. httplib2 understands and respects caching headers in both directions — as part of the incoming response and as part of the outgoing request. It noticed that you added the no-cache header, so it bypassed its local cache altogether and then had no choice but to hit the network to request the data.
+httplib2 initiating a network request. httplib2 understands and respects caching headers in both directions — as part of the incoming response and as part of the outgoing request. It noticed that you added the no-cache header, so it bypassed its local cache altogether and then had no choice but to hit the network to request the data.
httplib2 uses to update its local cache, in the hopes of avoiding network access the next time you request this feed. Everything about HTTP caching is designed to maximize cache hits and minimize network access. Even though you bypassed the cache this time, the remote server would really appreciate it if you would cache the result for next time.
httplib2 also sends the Last-Modified validator back to the server in the If-Modified-Since header.
304 status code and no data.
httplib2 notices the 304 status code and loads the content of the page from its cache.
-304 (returned from the server this time, which caused httplib2 to look in its cache), and 200 (returned from the server last time, and stored in httplib2’s cache along with the page data). response.status returns the status from the cache.
+304 (returned from the server this time, which caused httplib2 to look in its cache), and 200 (returned from the server last time, and stored in httplib2’s cache along with the page data). response.status returns the status from the cache.
response.dict, which is a dictionary of the actual headers returned from the server.
httplib2 is smart enough to let you act dumb.) By the time the request() method returns to the caller, httplib2 has already updated its cache and returned the data to you.
diff --git a/iterators.html b/iterators.html
index 1faccaf..9a01917 100644
--- a/iterators.html
+++ b/iterators.html
@@ -288,8 +288,8 @@ rules = LazyRules()
return self ③
__iter__() method will be called every time someone — say, a for loop — calls iter(rules).
-__iter__() method will be called every time someone — say, a for loop — calls iter(rules).
+__iter__() method returns self, which signals that this class will take care of returning its own values throughout an iteration.
__next__() method gets called whenever someone — say, a for loop — calls next(rules). This method will only make sense if we start at the end and work backwards. So let’s do that.
+__next__() method gets called whenever someone — say, a for loop — calls next(rules). This method will only make sense if we start at the end and work backwards. So let’s do that.
build_match_and_apply_functions() function hasn’t changed; it’s the same as it ever was. Each line of the pattern file will be read exactly once, as late as possible.
self.cache. Each match and apply function will be built exactly once, as late as possible, then cached.
self.cache will be a list of the functions we need to match and apply individual rules. (At least that should sound familiar!) self.cache_index keeps track of which cached item we should return next. If we haven’t exhausted the cache yet (i.e. if the length of self.cache is greater than self.cache_index), then we have a cache hit! Hooray! We can return the match and apply functions from the cache instead of building them from scratch.
-Putting it all together, here’s what happens when: @@ -352,7 +352,7 @@ rules = LazyRules()
plural() function again to pluralize a different word. The for loop in the plural() function will call iter(rules), which will reset the cache index but will not reset the open file object.
for loop will ask for a value from rules, which will invoke its __next__() method. This time, however, the cache is primed with a single pair of match and apply functions, corresponding to the patterns in the first line of the pattern file. Since they were built and cached in the course of pluralizing the previous word, they’re retrieved from the cache. The cache index increments, and the open file is never touched.
-for loop comes around again and asks for another value from rules. This invokes the __next__() method a second time. This time, the cache is exhausted — it only contained one item, and we’re asking for a second — so the __next__() method continues. It reads another line from the open file, builds match and apply functions out of the patterns, and caches them.
+for loop comes around again and asks for another value from rules. This invokes the __next__() method a second time. This time, the cache is exhausted — it only contained one item, and we’re asking for a second — so the __next__() method continues. It reads another line from the open file, builds match and apply functions out of the patterns, and caches them.
readline() command. In the meantime, the cache now has more items in it, and if we start all over again trying to pluralize a new word, each of those items in the cache will be tried before reading the next line from the pattern file.
diff --git a/native-datatypes.html b/native-datatypes.html
index b6251a1..932f461 100644
--- a/native-datatypes.html
+++ b/native-datatypes.html
@@ -32,7 +32,7 @@ body{counter-reset:h1 2}
Of course, there are a lot more types than these seven. Everything is an object in Python, so there are types like module, function, class, method, file, and even compiled code. You’ve already seen some of these: modules have names, functions have docstrings, &c. You’ll learn about classes in [FIXME xref] and files in [FIXME xref].
-
Strings and bytes are important enough — and complicated enough — that they get their own chapter. Let’s look at the others first. +
Strings and bytes are important enough — and complicated enough — that they get their own chapter. Let’s look at the others first.
⁂
>>> a_list = ['a'] >>> a_list = a_list + [2.0, 3] ① ->>> a_list +>>> a_list ② ['a', 2.0, 3] ->>> a_list.append(True) ② +>>> a_list.append(True) ③ >>> a_list ['a', 2.0, 3, True] ->>> a_list.extend(['four', 'Ω']) ③ +>>> a_list.extend(['four', 'Ω']) ④ >>> a_list ['a', 2.0, 3, True, 'four', 'Ω'] ->>> a_list.insert(0, 'Ω') ④ +>>> a_list.insert(0, 'Ω') ⑤ >>> a_list ['Ω', 'a', 2.0, 3, True, 'four', 'Ω']
+ operator concatenates lists. A list can contain any number of items; there is no size limit (other than available memory). A list can contain items of any datatype; they don’t all need to be the same type. Here we have a list containing a string, a floating point number, and an integer.
++ operator concatenates lists to create a new list. A list can contain any number of items; there is no size limit (other than available memory). However, if memory is a concern, you should be aware that list concatenation creates a second list in memory. In this case, that new list is immediately assigned to the existing variable a_list. So this line of code is really a two-step process — concatenation then assignment — which can (temporarily) consume a lot of memory when you’re dealing with large lists.
+append() method adds a single item to the end of the list. (Now we have four different datatypes in the list!)
extend() method takes one argument, a list, and appends each of the items of the argument to the original list.
insert() method inserts a single item into a list. The first argument is the index of the first item in the list that will get bumped out of position. List items do not need to be unique; for example, there are now two separate items with the value 'Ω': the first item, a_list[0], and the last item, a_list[6].
@@ -487,7 +488,7 @@ KeyError: 'db.diveintopython3.org'
'user', value 'mark') appears to be in the middle. In fact, it was just a coincidence that the items appeared to be in order in the first example; it is just as much a coincidence that they appear to be out of order now.
user key back to "mark"? No! Look at the key closely — that’s a capital U in "User". Dictionary keys are case-sensitive, so this statement is creating a new key-value pair, not overwriting an existing one. It may look similar to you, but as far as Python is concerned, it’s completely different.
+user key back to "mark"? No! Look at the key closely — that’s a capital U in "User". Dictionary keys are case-sensitive, so this statement is creating a new key-value pair, not overwriting an existing one. It may look similar to you, but as far as Python is concerned, it’s completely different.
Dictionaries aren’t just for strings. Dictionary values can be any datatype, including integers, booleans, arbitrary objects, or even other dictionaries. And within a single dictionary, the values don’t all need to be the same type; you can mix and match as needed. Dictionary keys are more restricted, but they can be strings, integers, and a few other types. You can also mix and match key datatypes within a dictionary. diff --git a/porting-code-to-python-3-with-2to3.html b/porting-code-to-python-3-with-2to3.html index d48fc42..5290ddf 100644 --- a/porting-code-to-python-3-with-2to3.html +++ b/porting-code-to-python-3-with-2to3.html @@ -32,7 +32,7 @@ td pre{padding:0;border:0}
Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can’t fix automatically. This appendix documents what it can fix automatically.
print statementIn Python 2, print was a statement. Whatever you wanted to print simply followed the print keyword. In Python 3, print() is a function — whatever you want to print is passed to print() like any other function.
+
In Python 2, print was a statement. Whatever you wanted to print simply followed the print keyword. In Python 3, print() is a function — whatever you want to print is passed to print() like any other function.
| Notes | Python 2
@@ -58,7 +58,7 @@ td pre{padding:0;border:0}
print() with one argument
print() with two arguments.
print statement with a comma, it would print the values separated by spaces, then print a trailing space, then stop without printing a carriage return. In Python 3, the way to do this is to pass end=' ' as a keyword argument to the print() function. The end argument defaults to '\n' (a carriage return), so overriding it will suppress the carriage return after printing the other arguments.
-sys.stderr — by using the >>pipe_name syntax. In Python 3, the way to do this is to pass the pipe in the file keyword argument. The file argument defaults to sys.stdout (standard out), so overriding it will output to a different pipe instead.
+sys.stderr — by using the >>pipe_name syntax. In Python 3, the way to do this is to pass the pipe in the file keyword argument. The file argument defaults to sys.stdout (standard out), so overriding it will output to a different pipe instead.
Unicode string literalsPython 2 had two string types: Unicode strings and non-Unicode strings. Python 3 has one string type: Unicode strings. @@ -159,7 +159,7 @@ td pre{padding:0;border:0}
|
|---|
urllib module in Python 2 had a variety of functions, including urlopen() for fetching data and splittype(), splithost(), and splituser() for splitting a URL into its constituent parts. These functions have been reorganized more logically within the new urllib package. 2to3 will also change all calls to these functions so they use the new naming scheme.
-urllib2 module in Python 2 has been folded into the urllib package in Python 3. All your urllib2 favorites — the build_opener() method, Request objects, and HTTPBasicAuthHandler and friends — are still available.
+urllib2 module in Python 2 has been folded into the urllib package in Python 3. All your urllib2 favorites — the build_opener() method, Request objects, and HTTPBasicAuthHandler and friends — are still available.
urllib.parse module in Python 3 contains all the parsing functions from the old urlparse module in Python 2.
urllib.robotparser module parses robots.txt files.
FancyURLopener class, which handles HTTP redirects and other status codes, is still available in the new urllib.request module. The urlencode() function has moved to urllib.parse.
@@ -567,7 +567,7 @@ reduce(a, b, c)
repr('PapayaWhip' + repr(2))
repr() function works on everything.
+repr() function works on everything.
2to3 tool is smart enough to convert this into nested calls to repr().
try...except statement2to3 script is smart enough to construct a valid class declaration, even if the class is inherited from one or more base classes.
The rest of the “fixes” listed here aren’t really fixes per se. That is, the things they change are matters of style, not substance. They work just as well in Python 3 as they do in Python 2, but the developers of Python have a vested interest in making Python code as uniform as possible. To that end, there is an official Python style guide which outlines — in excruciating detail — all sorts of nitpicky details that you almost certainly don’t care about. And given that 2to3 provides such a great infrastructure for converting Python code from one thing to another, the authors took it upon themselves to add a few optional features to improve the readability of your Python programs.
+
The rest of the “fixes” listed here aren’t really fixes per se. That is, the things they change are matters of style, not substance. They work just as well in Python 3 as they do in Python 2, but the developers of Python have a vested interest in making Python code as uniform as possible. To that end, there is an official Python style guide which outlines — in excruciating detail — all sorts of nitpicky details that you almost certainly don’t care about. And given that 2to3 provides such a great infrastructure for converting Python code from one thing to another, the authors took it upon themselves to add a few optional features to improve the readability of your Python programs.
set() literals (explicit)In Python 2, the only way to define a literal set in your code was to call set(a_sequence). This still works in Python 3, but a clearer way of doing it is to use the new set literal notation: curly braces. (Dictionaries are also defined with curly braces, which makes sense once you think about it, because dictionaries are just sets of key-value pairs.)
diff --git a/refactoring.html b/refactoring.html index 1d40a56..8b72d17 100644 --- a/refactoring.html +++ b/refactoring.html @@ -301,7 +301,7 @@ Ran 12 tests in 0.203sAnswer: there’s only 5000 of them; why don’t you just build a lookup table? This idea gets even better when you realize that you don’t need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. “Validating” is reduced to a single dictionary lookup. -
And best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove — to yourself and to others — that the new code works just as well as the original. +
And best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove — to yourself and to others — that the new code works just as well as the original.
class OutOfRangeError(ValueError): pass @@ -392,7 +392,7 @@ def build_lookup_tables(): to_roman_table.append(roman_numeral) ③ from_roman_table[roman_numeral] = integer-
diff --git a/special-method-names.html b/special-method-names.html index fcbf27f..51a4a3d 100644 --- a/special-method-names.html +++ b/special-method-names.html @@ -31,7 +31,7 @@ td a:link, td a:visited{border:0}- This is a clever bit of programming… perhaps too clever. The
to_roman()function is defined above; it looks up values in the lookup table and returns them. But thebuild_lookup_tables()function redefines theto_roman()function to actually do work (like the previous examples did, before you added a lookup table). Within thebuild_lookup_tables()function, callingto_roman()will call this redefined version. Once thebuild_lookup_tables()function exits, the redefined version disappears — it is only defined in the local scope of thebuild_lookup_tables()function. +- This is a clever bit of programming… perhaps too clever. The
to_roman()function is defined above; it looks up values in the lookup table and returns them. But thebuild_lookup_tables()function redefines theto_roman()function to actually do work (like the previous examples did, before you added a lookup table). Within thebuild_lookup_tables()function, callingto_roman()will call this redefined version. Once thebuild_lookup_tables()function exits, the redefined version disappears — it is only defined in the local scope of thebuild_lookup_tables()function.- This line of code will call the redefined
to_roman()function, which actually calculates the Roman numeral.- Once you have the result (from the redefined
to_roman()function), you add the integer and its Roman numeral equivalent to both lookup tables.
We’ve already covered a few special method names elsewhere in this book — “magic” methods that Python invokes when you use certain syntax. Using special methods, your classes can act like sequences, like dictionaries, like functions, like iterators, or even like numbers! This appendix serves both as a reference for the special methods we’ve seen already and a brief introduction to some of the more esoteric ones. +
We’ve already covered a few special method names elsewhere in this book — “magic” methods that Python invokes when you use certain syntax. Using special methods, your classes can act like sequences, like dictionaries, like functions, like iterators, or even like numbers! This appendix serves both as a reference for the special methods we’ve seen already and a brief introduction to some of the more esoteric ones.
You can make an instance of a class callable — exactly like a function is callable — by defining the __call__() method.
+
You can make an instance of a class callable — exactly like a function is callable — by defining the __call__() method.
Notes
@@ -255,7 +255,7 @@ bytes = zef_file.read(12)
Classes That Act Like Sequences-If your class acts as a container for a set of values — that is, if it makes sense to ask whether your class “contains” a value — then it should probably define the following special methods that make it act like a sequence. + If your class acts as a container for a set of values — that is, if it makes sense to ask whether your class “contains” a value — then it should probably define the following special methods that make it act like a sequence.
|
|---|