diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html
index 3d4b7f3..2f68afb 100644
--- a/case-study-porting-chardet-to-python-3.html
+++ b/case-study-porting-chardet-to-python-3.html
@@ -45,7 +45,7 @@ body{counter-reset:h1 20}
chardet: a mini-FAQ
-When you think of “text,” you probably think of “characters and symbols I see on my computer screen.” But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular character encoding. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
+Usually, when people talk about “text,” they’re thinking of “characters and symbols on the computer screen.” But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular character encoding. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
In reality, it’s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key for the text. Whenever someone gives you a sequence of bytes and claims it’s “text”, you need to know what character encoding they used so you can decode the bytes into characters and display them (or process them, or whatever).
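To make the "decryption key" idea concrete, here is a minimal sketch. The sample string and the choice of UTF-8 and Latin-1 are assumptions picked for illustration; any pair of encodings that disagree on a character would show the same thing.

```python
# The same characters can map to different byte sequences,
# depending on which character encoding is used.
text = "café"  # hypothetical sample string

utf8_bytes = text.encode("utf-8")      # "é" is stored as two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # "é" is stored as one byte: 0xE9

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Decoding with the encoding that was actually used recovers the text.
assert utf8_bytes.decode("utf-8") == text

# Decoding with the wrong encoding "succeeds" but produces garbage.
wrong = utf8_bytes.decode("latin-1")
print(wrong)         # cafÃ©
```

Nothing in the bytes themselves says which encoding produced them, which is why you need to be told (or need to guess).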
It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key.
diff --git a/native-datatypes.html b/native-datatypes.html
index e96a4e4..b19e6a4 100644
--- a/native-datatypes.html
+++ b/native-datatypes.html
@@ -28,8 +28,8 @@ body{counter-reset:h1 2}
One of Python's most important datatypes is the dictionary, which defines one-to-one relationships between keys and values.
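As a small sketch of what that key-to-value relationship looks like in practice (the dictionary name and its contents are made up for this example):

```python
# Hypothetical data: each key (a unit suffix) maps to exactly one value.
multiples = {"KB": 1000, "KiB": 1024}

# Look up a value by its key.
print(multiples["KiB"])   # 1024

# Assigning to an existing key replaces its value; keys are never duplicated.
multiples["KB"] = 10 ** 3

# Assigning to a new key adds a new key-value pair.
multiples["MB"] = 1000 ** 2

print(len(multiples))     # 3
print("MB" in multiples)  # True
```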
diff --git a/porting-code-to-python-3-with-2to3.html b/porting-code-to-python-3-with-2to3.html
index 815f606..b7fbebf 100644
--- a/porting-code-to-python-3-with-2to3.html
+++ b/porting-code-to-python-3-with-2to3.html
@@ -78,7 +78,7 @@ h3:before{counter-increment:h3;content:"A." counter(h2) "." counter(h3) ". "}
Diving in
-Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can't fix automatically. This appendix documents what it can fix automatically.
+Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can't fix automatically. This appendix documents what it can fix automatically.
In Python 2, print was a statement; in Python 3, print() is a function: whatever you want to print is passed to print() like any other function.
diff --git a/regular-expressions.html b/regular-expressions.html
index 3c293f4..e0501c5 100644
--- a/regular-expressions.html
+++ b/regular-expressions.html
@@ -34,11 +34,12 @@ body{counter-reset:h1 4}
Summary Diving in
-Regular expressions are a powerful and standardized way of searching, replacing, and parsing text with complex patterns of
-characters. If you’ve used regular expressions in other languages (like Perl), the syntax will be very familiar, and you get by just reading the summary of the re module to get an overview of the available functions and their arguments.
-Strings have methods for searching and replacing: index(), find(), split(), count(), replace(), &c. But these methods are limited to the simplest of cases. For example, the index() method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace() and split() methods have the same limitations.
-If your goal can be accomplished with string functions, you should use them. They’re fast and simple and easy to read, and there’s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with if statements to handle special cases, or if you’re combining them with split() and join() and list comprehensions in weird unreadable ways, you may need to move up to regular expressions.
-Although the regular expression syntax is tight and unlike normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
+Every modern programming language has built-in functions for working with strings. In Python, strings have methods for searching and replacing: index(), find(), split(), count(), replace(), &c. But these methods are limited to the simplest of cases. For example, the index() method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace() and split() methods have the same limitations.
+If your goal can be accomplished with string methods, you should use them. They’re fast and simple and easy to read, and there’s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with if statements to handle special cases, or if you’re chaining calls to split() and join() to slice-and-dice your strings, you may need to move up to regular expressions.
+Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing text with complex patterns of characters. Although the regular expression syntax is tight and unlike normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
+
+☞If you’ve used regular expressions in other languages (like Perl 5), Python’s syntax will be very familiar. Read the summary of the re module to get an overview of the available functions and their arguments.
+Case study: street addresses
This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don’t just make this stuff up; it’s actually useful.) This example shows how I approached the problem.
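As a rough sketch of where plain string methods run out of steam on this kind of cleanup, consider abbreviating a trailing "ROAD". The sample addresses and the ROAD-to-RD. rule are assumptions made up for illustration; they are not taken from the legacy system described above.

```python
import re

address = "100 NORTH BROAD ROAD"  # hypothetical sample address

# str.replace() rewrites every occurrence, including the one inside "BROAD".
naive = address.replace("ROAD", "RD.")
print(naive)   # 100 NORTH BRD. RD.

# A regular expression can insist on a whole word (\b) at the end of the string ($).
fixed = re.sub(r"\bROAD$", "RD.", address)
print(fixed)   # 100 NORTH BROAD RD.

# Case-insensitive matching is a flag, not a round trip through lower() or upper().
mixed = re.sub(r"\broad$", "Rd.", "100 North Broad Road", flags=re.IGNORECASE)
print(mixed)   # 100 North Broad Rd.

# Verbose mode ignores whitespace in the pattern and allows embedded comments.
pattern = re.compile(r"""
    \b      # start at a word boundary
    ROAD    # the literal word to abbreviate
    $       # which must end the string
    """, re.VERBOSE)
verbose_fixed = pattern.sub("RD.", address)
print(verbose_fixed)  # 100 NORTH BROAD RD.
```

The verbose pattern matches exactly what the compact one does; it only trades brevity for inline documentation.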
diff --git a/table-of-contents.html b/table-of-contents.html
index b05c4d3..615e739 100644
--- a/table-of-contents.html
+++ b/table-of-contents.html
@@ -51,7 +51,9 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
Booleans
Numbers
Lists
+ Dictionaries
None
Further reading
@@ -248,7 +250,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
Case study: porting chardet to Python 3
-- Introducing chardet: a mini-FAQ
+- Introducing chardet: a mini-FAQ
- What is character encoding auto-detection?
- Isn't that impossible?
@@ -258,7 +260,7 @@ ul li ol{margin:0;padding:0 0 0 2.5em}
- Diving in
-UTF-n with a BOM
+UTF-n with a BOM
- Escaped encodings
- Multi-byte encodings
- Single-byte encodings
diff --git a/unit-testing.html b/unit-testing.html
index 2424e55..549fc68 100644
--- a/unit-testing.html
+++ b/unit-testing.html
@@ -24,8 +24,8 @@ body{counter-reset:h1 7}
- ...
(Not) diving in
-In previous chapters, you “dived in” by immediately looking at code and trying to understand it as quickly as possible. Now that you have some Python under your belt, you're going to step back and look at the steps that happen before the code gets written.
-In this chapter, you're going to write, debug, and optimize a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: roman numerals”. Now let's step back and consider what it would take to expand that into a two-way utility.
+How do you know that the code you wrote yesterday still works after the changes you made today? Every seasoned programmer has war stories of an “innocent” change that couldn't possibly have affected that other “unrelated” module… If this sounds familiar, this chapter is for you.
+In this chapter, you're going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: roman numerals”. Now step back and consider what it would take to expand that into a two-way utility.
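To give a feel for what a "two-way utility" guarded by tests might look like, here is a minimal sketch. The names to_roman, from_roman, and ROMAN_NUMERAL_MAP, the greedy conversion strategy, and the 1..3999 range are all assumptions made for this example; it is not necessarily how the chapter builds its functions.

```python
import unittest

# Hypothetical lookup table, largest values first so greedy conversion works.
ROMAN_NUMERAL_MAP = (
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
    ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
    ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1),
)

def to_roman(n):
    """Convert an integer to a Roman numeral (assumed range: 1..3999)."""
    if not 0 < n < 4000:
        raise ValueError("number out of range (must be 1..3999)")
    result = ""
    for numeral, value in ROMAN_NUMERAL_MAP:
        while n >= value:
            result += numeral
            n -= value
    return result

def from_roman(s):
    """Convert a well-formed Roman numeral to an integer (no validation)."""
    result = 0
    index = 0
    for numeral, value in ROMAN_NUMERAL_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result

class RoundTripCheck(unittest.TestCase):
    def test_known_values(self):
        """A spot check against a value worked out by hand."""
        self.assertEqual(to_roman(1994), "MCMXCIV")
        self.assertEqual(from_roman("MCMXCIV"), 1994)

    def test_round_trip(self):
        """from_roman(to_roman(n)) should give back n for every valid n."""
        for n in range(1, 4000):
            self.assertEqual(from_roman(to_roman(n)), n)
```

Saved in a file, the tests can be run with `python -m unittest`. If tomorrow's "innocent" change breaks either direction, the round-trip test is the one that notices.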
The rules for Roman numerals lead to a number of interesting observations:
- There is only one correct way to represent a particular number as Roman numerals.
diff --git a/your-first-python-program.html b/your-first-python-program.html
index bc400bc..55ee7d4 100644
--- a/your-first-python-program.html
+++ b/your-first-python-program.html
@@ -39,7 +39,7 @@ body{counter-reset:h1 1}
- Further reading
Diving in
-You know how other books go on and on about programming fundamentals and finally work up to building something useful? Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
+Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'], 1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}