diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html
index 3fa7e12..615b6ac 100644
--- a/case-study-porting-chardet-to-python-3.html
+++ b/case-study-porting-chardet-to-python-3.html
@@ -18,36 +18,7 @@ mark{background:#ff8;font-weight:bold}

Words, words. They’re all we have to go on.
Rosencrantz and Guildenstern are Dead

-  1. Diving in
-  2. What is character encoding auto-detection?
-    1. Isn’t that impossible?
-    2. Does such an algorithm exist?
-  3. Diving in
-    1. UTF-n with a BOM
-    2. Escaped encodings
-    3. Multi-byte encodings
-    4. Single-byte encodings
-    5. windows-1252
-  4. Running 2to3
-  5. Fixing what 2to3 can’t
-    1. False is invalid syntax
-    2. No module named constants
-    3. Name 'file' is not defined
-    4. Can’t use a string pattern on a bytes-like object
-    5. Can’t convert 'bytes' object to str implicitly
-    6. Unsupported operand type(s) for +: 'int' and 'bytes'
-    7. ord() expected string of length 1, but int found
-    8. Unorderable types: int() >= str()
-    9. Global name 'reduce' is not defined
-  6. Summary
+

 

Diving in

Unknown or incorrect character encoding is the #1 cause of gibberish text on the web, in your inbox, and indeed across every computer system ever written. In Chapter 3, I talked about the history of character encoding and the creation of Unicode, the “one encoding to rule them all.” I’d love it if I never had to see a gibberish character on a web page again, because all authoring systems stored accurate encoding information, all transfer protocols were Unicode-aware, and every system that handled text maintained perfect fidelity when converting between encodings.

I’d also like a pony.
diff --git a/dip3.js b/dip3.js
index ebc61a0..85e272d 100644
--- a/dip3.js
+++ b/dip3.js
@@ -16,6 +16,8 @@ $(document).ready(function() {
     }
     */
 
+    $("#toc").html('table of contents');
+
     // "hide", "open in new window", and (optionally) "download" widgets on code & screen blocks
     $("pre > code").each(function(i) {
         var pre = $(this.parentNode);
@@ -81,3 +83,23 @@ function plainTextOnClick(id) {
   win.document.write('' + clone.html());
   win.document.close();
 }
+
+function showTOC() {
+  var toc = '';
+  var old_level = 1;
+  $('h2,h3').each(function(i, h) {
+    level = parseInt(h.tagName.substring(1));
+    if (level < old_level) {
+      toc += '</ol>';
+    } else if (level > old_level) {
+      toc += '<ol>';
+    }
+    toc += '<li>' + h.innerHTML + '';
+    old_level = level;
+  });
+  while (level > 1) {
+    toc += '</ol>';
+    level -= 1;
+  }
+  $("#toc").html(toc);
+}
diff --git a/native-datatypes.html b/native-datatypes.html
index 85dc6ed..2925c99 100644
--- a/native-datatypes.html
+++ b/native-datatypes.html
@@ -13,48 +13,7 @@ body{counter-reset:h1 2}

Wonder is the foundation of all philosophy, research its progress, ignorance its end.
Michel de Montaigne

-  1. Diving in
-  2. Booleans
-  3. Numbers
-    1. Coercing integers to floats and vice-versa
-    2. Common numerical operations
-    3. Fractions
-    4. Trigonometry
-    5. Numbers in a boolean context
-  4. Lists
-    1. Creating a list
-    2. Slicing a list
-    3. Adding items to a list
-    4. Searching for values in a list
-    5. Lists in a boolean context
-  5. Dictionaries
-    1. Creating a dictionary
-    2. Modifying a dictionary
-    3. Mixed-value dictionaries
-    4. Dictionaries in a boolean context
-  6. None
-    1. None in a boolean context
-  7. Further reading
+

 

Diving in

Cast aside your first Python program for just a minute, and let's talk about datatypes. In Python, every variable has a datatype, but you don't need to declare it explicitly. Based on each variable's original assignment, Python figures out what type it is and keeps track of that internally.
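
To make the point about undeclared datatypes concrete, here is a minimal sketch. The variable names and values are made up for illustration; only the built-in type() function is assumed.

```python
# No type declarations anywhere: Python infers each datatype
# from the value assigned to the variable.
size = 1                 # an integer
ratio = 1.5              # a float
name = "Rotokas"         # a string
flags = [True, False]    # a list (of booleans)

# Ask Python what it decided for each variable.
inferred = {
    "size": type(size).__name__,
    "ratio": type(ratio).__name__,
    "name": type(name).__name__,
    "flags": type(flags).__name__,
}
print(inferred)
# {'size': 'int', 'ratio': 'float', 'name': 'str', 'flags': 'list'}
```

The type belongs to the value rather than to the name, which is why no declaration is needed.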

Python has many native datatypes. Here are the important ones:
diff --git a/porting-code-to-python-3-with-2to3.html b/porting-code-to-python-3-with-2to3.html
index fc1c994..d1b49f8 100644
--- a/porting-code-to-python-3-with-2to3.html
+++ b/porting-code-to-python-3-with-2to3.html
@@ -23,64 +23,7 @@ td pre{padding:0;border:0}

Life is pleasant. Death is peaceful. It’s the transition that’s troublesome.
— Isaac Asimov (attributed)

-  1. Diving in
-  2. print statement
-  3. Unicode string literals
-  4. unicode() global function
-  5. long data type
-  6. <> comparison
-  7. has_key() dictionary method
-  8. Dictionary methods that return lists
-  9. Modules that have been renamed or reorganized
-    1. http
-    2. urllib
-    3. dbm
-    4. xmlrpc
-    5. Other modules
-  10. Relative imports within a package
-  11. next() iterator method
-  12. filter() global function
-  13. map() global function
-  14. reduce() global function (3.1+)
-  15. apply() global function
-  16. intern() global function
-  17. exec statement
-  18. execfile statement (3.1+)
-  19. repr literals (backticks)
-  20. try...except statement
-  21. raise statement
-  22. throw method on generators
-  23. xrange() global function
-  24. raw_input() and input() global functions
-  25. func_* function attributes
-  26. xreadlines() I/O method
-  27. lambda functions with multiple parameters
-  28. Special method attributes
-  29. __nonzero__ special class attribute
-  30. Octal literals
-  31. sys.maxint
-  32. callable() global function
-  33. zip() global function
-  34. StandardError() exception
-  35. types module constants
-  36. isinstance() global function (3.1+)
-  37. basestring datatype
-  38. itertools module
-  39. sys.exc_type, sys.exc_value, sys.exc_traceback
-  40. List comprehensions over tuples
-  41. os.getcwdu() function
-  42. Metaclasses
-  43. Matters of style
-    1. set() literals
-    2. buffer() global function
-    3. Whitespace around commas
-    4. Common idioms
+

 

Diving in

Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can't fix automatically. This appendix documents what it can fix automatically.

print statement

diff --git a/regular-expressions.html b/regular-expressions.html
index 5de2f40..6582a1b 100644
--- a/regular-expressions.html
+++ b/regular-expressions.html
@@ -13,22 +13,7 @@ body{counter-reset:h1 4}

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
Jamie Zawinski

-  1. Diving in
-  2. Case study: street addresses
-  3. Case study: Roman numerals
-    1. Checking for thousands
-    2. Checking for hundreds
-  4. Using the {n,m} Syntax
-    1. Checking for tens and ones
-  5. Verbose regular expressions
-  6. Case study: parsing phone numbers
-  7. Summary
+

 

Diving in

Every modern programming language has built-in functions for working with strings. In Python, strings have methods for searching and replacing: index(), find(), split(), count(), replace(), &c. But these methods are limited to the simplest of cases. For example, the index() method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace() and split() methods have the same limitations.
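
The case-insensitive workaround described above might look like the following sketch. The helper name find_ignoring_case and the sample address are hypothetical, chosen only for illustration; lower() and find() are the standard string methods.

```python
def find_ignoring_case(haystack, needle):
    """Return the index of needle in haystack, ignoring case, or -1 if absent."""
    # Lowercase both sides so they are "the appropriate case to match".
    return haystack.lower().find(needle.lower())

s = "100 NORTH BROAD ROAD"
print(s.find("road"))                  # -1: find() is case-sensitive
print(find_ignoring_case(s, "road"))   # 11: matches the "ROAD" inside "BROAD"
```

The second result also hints at another limit of plain substring search: it cannot tell a whole word from part of a longer one.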

If your goal can be accomplished with string methods, you should use them. They’re fast and simple and easy to read, and there’s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with if statements to handle special cases, or if you’re chaining calls to split() and join() to slice-and-dice your strings, you may need to move up to regular expressions.
diff --git a/strings.html b/strings.html
index f93d603..758083b 100644
--- a/strings.html
+++ b/strings.html
@@ -14,21 +14,7 @@ body{counter-reset:h1 3}

I’m telling you this ’cause you’re one of my friends.
My alphabet starts where your alphabet ends!
Dr. Seuss, On Beyond Zebra!

-  1. Diving in
-  2. Unicode
-    1. How strings are stored in memory
-    2. Converting between different character encodings
-    3. Specifying character encoding in .py files
-  3. Strings in Python 3
-  4. Common string operations
-  5. Formatting strings
-  6. The string module
-  7. Strings vs. bytes
-  8. Further reading
+

 

Diving in

Chinese has thousands of characters. The Rotokas alphabet of Bougainville is the smallest alphabet in the world, with just 12 letters. English has 26, plus a handful of punctuation marks. Python 3 can handle all of these languages, and more.
diff --git a/unit-testing.html b/unit-testing.html
index 4a18e67..9f5a3f9 100644
--- a/unit-testing.html
+++ b/unit-testing.html
@@ -13,13 +13,7 @@ body{counter-reset:h1 7}

Certitude is not the test of certainty. We have been cocksure of many things that were not so.
Oliver Wendell Holmes, Jr.

-  1. (Not) diving in
-  2. A single question
-  3. “Halt and catch fire”
-  4. More halting, more fire
-  5. ...
+

 

(Not) diving in

How do you know that the code you wrote yesterday still works after the changes you made today? Every seasoned programmer has war stories of an “innocent” change that couldn't possibly have affected that other “unrelated” module… If this sounds familiar, this chapter is for you.

In this chapter, you're going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: Roman numerals”. Now step back and consider what it would take to expand that into a two-way utility.
diff --git a/your-first-python-program.html b/your-first-python-program.html
index 847f256..e0f9cd9 100644
--- a/your-first-python-program.html
+++ b/your-first-python-program.html
@@ -14,27 +14,7 @@ th{font-family:inherit !important}

Don’t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate.
Ven. Henepola Gunaratana

-  1. Diving in
-  2. Declaring functions
-    1. How Python's datatypes compare to other programming languages
-  3. Writing readable code
-    1. Docstrings
-    2. Function annotations
-    3. Style conventions
-  4. Everything is an object
-    1. The import search path
-    2. What's an object?
-  5. Indenting code
-  6. Running scripts
-  7. Further reading
+

 

Diving in

Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.

[The code examples will be easier to follow if you enable Javascript, but whatever.]