diff --git a/advanced-iterators.html b/advanced-iterators.html
index d012cc9..fb099d8 100755
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -19,9 +19,7 @@ mark{display:inline}

 

Diving In

-

HAWAII + IDAHO + IOWA + OHIO == STATES. Or, to put it another way, 510199 + 98153 + 9301 + 3593 == 621246. Am I speaking in tongues? No, it’s just a puzzle. - -

Let me spell it out for you. +

Just as regular expressions put strings on steroids, the itertools module puts iterators on steroids. But first, I want to show you a classic puzzle.

HAWAII + IDAHO + IOWA + OHIO == STATES
 510199 + 98153 + 9301 + 3593 == 621246
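The puzzle above is an alphametic. Each letter stands for a distinct digit, and no word may begin with zero. The brute-force search below is a rough sketch, not necessarily the solver this chapter goes on to build. It tries every digit assignment with `itertools.permutations`. The helper names `solve` and `is_solution` are illustrative.

```python
import itertools
import re


def is_solution(equation):
    """Return True if an all-digit equation such as '1 + 99 == 100' holds."""
    left, right = equation.split('==')
    return sum(int(term) for term in left.split('+')) == int(right)


def solve(puzzle):
    """Brute-force an alphametic puzzle of the form 'WORD + WORD == WORD'.

    Returns the first digit substitution that makes the equation true,
    or None if there is none.
    """
    puzzle = puzzle.upper()
    words = re.findall('[A-Z]+', puzzle)
    unique_letters = sorted(set(''.join(words)))
    if len(unique_letters) > 10:
        raise ValueError('Too many letters: each letter needs its own digit')
    first_letters = {word[0] for word in words}
    # Try every way of assigning distinct digits to the distinct letters.
    for digits in itertools.permutations('0123456789', len(unique_letters)):
        mapping = dict(zip(unique_letters, digits))
        # No word may start with a zero.
        if any(mapping[letter] == '0' for letter in first_letters):
            continue
        # Swap letters for digits; spaces, '+' and '=' pass through unchanged.
        equation = ''.join(mapping.get(char, char) for char in puzzle)
        if is_solution(equation):
            return equation
    return None


print(solve('I + BB == ILL'))  # 1 + 99 == 100
print(is_solution('510199 + 98153 + 9301 + 3593 == 621246'))  # True
```

The tiny puzzle `I + BB == ILL` finishes instantly. The HAWAII puzzle has nine distinct letters, so this naive loop may examine up to 3,628,800 permutations. Expect it to take a while.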
diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html
index 21d53e2..5de2c7a 100755
--- a/case-study-porting-chardet-to-python-3.html
+++ b/case-study-porting-chardet-to-python-3.html
@@ -21,7 +21,7 @@ del{background:#f87}
 
 

 

Diving In

-

Unknown or incorrect character encoding is the #1 cause of gibberish text on the web, in your inbox, and indeed across every computer system ever written. In Chapter 3, I talked about the history of character encoding and the creation of Unicode, the “one encoding to rule them all.” I’d love it if I never had to see a gibberish character on a web page again, because all authoring systems stored accurate encoding information, all transfer protocols were Unicode-aware, and every system that handled text maintained perfect fidelity when converting between encodings. +

Question: what’s the #1 cause of gibberish text on the web, in your inbox, and across every computer system ever written? It’s character encoding. In Chapter 3, I talked about the history of character encoding and the creation of Unicode, the “one encoding to rule them all.” I’d love it if I never had to see a gibberish character on a web page again, because all authoring systems stored accurate encoding information, all transfer protocols were Unicode-aware, and every system that handled text maintained perfect fidelity when converting between encodings.

I’d also like a pony.

A Unicode pony.

A Unipony, as it were.
diff --git a/comprehensions.html b/comprehensions.html
index 7fb2071..57e0215 100644
--- a/comprehensions.html
+++ b/comprehensions.html
@@ -18,7 +18,7 @@ body{counter-reset:h1 3}

 

Diving In

-

This chapter will teach you about list comprehensions, dictionary comprehensions, and set comprehensions: three related concepts centered around one very powerful technique. But first, I want to take a little detour into two modules that will help you navigate your local file system. +

Every programming language has that one feature, a complicated thing intentionally made simple. If you’re coming from another language, you could easily miss it, because your old language didn’t make that thing simple (because it was busy making something else simple instead). This chapter will teach you about list comprehensions, dictionary comprehensions, and set comprehensions: three related concepts centered around one very powerful technique. But first, I want to take a little detour into two modules that will help you navigate your local file system.
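Here is a minimal sketch of the one technique in all three forms, to give a feel for it before the detour. The list `numbers` is made-up input data.

```python
numbers = [1, 2, 3, 4, 5, 6]   # made-up input data

# List comprehension: build a new list from an existing iterable.
squares = [n * n for n in numbers]

# An optional if clause filters items as they go by.
even_squares = [n * n for n in numbers if n % 2 == 0]

# Dictionary comprehension: same syntax, but produces key: value pairs.
square_of = {n: n * n for n in numbers}

# Set comprehension: curly braces without colons; duplicates collapse.
remainders = {n % 3 for n in numbers}

print(squares)             # [1, 4, 9, 16, 25, 36]
print(even_squares)        # [4, 16, 36]
print(square_of[5])        # 25
print(sorted(remainders))  # [0, 1, 2]
```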

⁂
diff --git a/generators.html b/generators.html
index fdaa3da..1f965fe 100755
--- a/generators.html
+++ b/generators.html
@@ -18,7 +18,7 @@ body{counter-reset:h1 6}

 

Diving In

-

For reasons passing all understanding, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, “borrows” is the wrong word; “pillages” is more like it. Or perhaps “assimilates” — like the Borg. Yes, I like that. +

Having grown up the son of a librarian and an English major, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, “borrows” is the wrong word; “pillages” is more like it. Or perhaps “assimilates” — like the Borg. Yes, I like that.

We are the Borg. Your linguistic and etymological distinctiveness will be added to our own. Resistance is futile.

In this chapter, you’re going to learn about plural nouns. Also, functions that return other functions, advanced regular expressions, and generators. But first, let’s talk about how to make plural nouns. (If you haven’t read the chapter on regular expressions, now would be a good time. This chapter assumes you understand the basics of regular expressions, and it quickly descends into more advanced uses.)

If you grew up in an English-speaking country or learned English in a formal school setting, you’re probably familiar with the basic rules:
diff --git a/http-web-services.html b/http-web-services.html
index a72fda5..6518ab4 100755
--- a/http-web-services.html
+++ b/http-web-services.html
@@ -19,7 +19,7 @@ mark{display:inline}

 

Diving In

-

HTTP web services are programmatic ways of sending and receiving data from remote servers using nothing but the operations of HTTP. If you want to get data from the server, use HTTP GET; if you want to send new data to the server, use HTTP POST. Some more advanced HTTP web service APIs also allow creating, modifying, and deleting data, using HTTP PUT and HTTP DELETE. In other words, the “verbs” built into the HTTP protocol (GET, POST, PUT, and DELETE) can map directly to application-level operations for retrieving, creating, modifying, and deleting data. +

Philosophically, I can describe HTTP web services in 12 words: exchanging data with remote servers using nothing but the operations of HTTP. If you want to get data from the server, use HTTP GET. If you want to send new data to the server, use HTTP POST. Some more advanced HTTP web service APIs also allow creating, modifying, and deleting data, using HTTP PUT and HTTP DELETE. That’s it. No registries, no envelopes, no wrappers, no tunneling. The “verbs” built into the HTTP protocol (GET, POST, PUT, and DELETE) map directly to application-level operations for retrieving, creating, modifying, and deleting data.
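The standard library's `urllib.request.Request` accepts a `method` argument, which makes the verb-to-operation mapping concrete. This is only a sketch. The example.com URLs and the JSON bodies are hypothetical, and the requests are built but never sent. Posting to the collection URL rather than to an individual article follows a common convention, assumed here.

```python
import urllib.request

# Hypothetical URLs for illustration; nothing is sent over the network.
COLLECTION_URL = 'http://example.com/articles/'
ARTICLE_URL = 'http://example.com/articles/42'

# HTTP verb -> the application-level operation it maps to.
VERB_TO_OPERATION = {
    'GET': 'retrieve',
    'POST': 'create',
    'PUT': 'modify',
    'DELETE': 'delete',
}


def build_request(verb, url, body=None):
    """Build (but do not send) a request that uses the given HTTP verb."""
    data = body.encode('utf-8') if body is not None else None
    return urllib.request.Request(url, data=data, method=verb)


examples = [
    build_request('GET', ARTICLE_URL),
    build_request('POST', COLLECTION_URL, body='{"title": "New article"}'),
    build_request('PUT', ARTICLE_URL, body='{"title": "Edited article"}'),
    build_request('DELETE', ARTICLE_URL),
]

for request in examples:
    verb = request.get_method()
    print(verb, request.full_url, '->', VERB_TO_OPERATION[verb])
```

Passing one of these objects to `urllib.request.urlopen()` would actually send it to the server.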

The main advantage of this approach is simplicity, and its simplicity has proven popular. Data — usually XML or JSON — can be built and stored statically, or generated dynamically by a server-side script, and all major programming languages (including Python, of course!) include an HTTP library for downloading it. Debugging is also easier; because each resource in an HTTP web service has a unique address (in the form of a URL), you can load it in your web browser and immediately see the raw data.
diff --git a/installing-python.html b/installing-python.html
index 113e92a..4793167 100755
--- a/installing-python.html
+++ b/installing-python.html
@@ -21,12 +21,10 @@ h2,.i>li{clear:both}

 

Diving In

-

Welcome to Python 3. Let's dive in. In this chapter, you'll install the version of Python 3 that's right for you. +

Before you can start programming in Python 3, you need to install it. Or do you?

Which Python Is Right For You?

-

The first thing you need to do with Python is install it. Or do you? -

If you're using an account on a hosted server, your ISP may have already installed Python 3. If you’re running Linux at home, you may already have Python 3, too. Most popular GNU/Linux distributions come with Python 2 in the default installation; a small but growing number of distributions also include Python 3. Mac OS X includes a command-line version of Python 2, but as of this writing it does not include Python 3. Microsoft Windows does not come with any version of Python. But don’t despair! You can point-and-click your way through installing Python, regardless of what operating system you have.

The easiest way to check for Python 3 on your Linux or Mac OS X system is to get to a command line. On Linux, look in your Applications menu for a program called Terminal. (It may be in a submenu like Accessories or System.) On Mac OS X, there is an application called Terminal.app in your /Applications/Utilities/ folder.
diff --git a/iterators.html b/iterators.html
index ca1a532..4b4a3f5 100755
--- a/iterators.html
+++ b/iterators.html
@@ -18,7 +18,7 @@ body{counter-reset:h1 7}

 

Diving In

-

Generators are really just a special case of iterators. A function that yields values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that. +

Iterators are the “secret sauce” of Python 3. They’re everywhere, underlying everything, always just out of sight. Comprehensions are just a simple form of iterators. Generators are just a simple form of iterators. A function that yields values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that.

Remember the Fibonacci generator? Here it is as a built-from-scratch iterator:
diff --git a/native-datatypes.html b/native-datatypes.html
index 1393d9c..90934cf 100755
--- a/native-datatypes.html
+++ b/native-datatypes.html
@@ -18,7 +18,7 @@ body{counter-reset:h1 2}

 

Diving In

-

Cast aside your first Python program for just a minute, and let’s talk about datatypes. In Python, every value has a datatype, but you don’t need to declare the datatype of variables. How does that work? Based on each variable’s original assignment, Python figures out what type it is and keeps tracks of that internally. +

Datatypes. Set aside your first Python program for just a minute, and let’s talk about datatypes. In Python, every value has a datatype, but you don’t need to declare the datatype of variables. How does that work? Based on each variable’s original assignment, Python figures out what type it is and keeps track of that internally.
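Here is a quick sketch of what that means in practice. No declaration is needed, and the same name can later refer to a value of a different type. The variable name `answer` is illustrative.

```python
answer = 42                   # no declaration: 42 is an int
first_type = type(answer)

answer = 'forty-two'          # the same name now refers to a str
second_type = type(answer)

print(first_type)   # <class 'int'>
print(second_type)  # <class 'str'>

# Every value carries its datatype, which you can check at run time.
print(isinstance(3.14, float), isinstance(True, bool))  # True True
```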

Python has many native datatypes. Here are the important ones:

  1. Booleans are either True or False.
diff --git a/packaging.html b/packaging.html
index 43edd2c..cb085ef 100644
--- a/packaging.html
+++ b/packaging.html
@@ -19,9 +19,7 @@ mark{display:inline}

     

    Diving In

    -

    So you want to release a Python script, library, framework, or application. Excellent. The world needs more Python code. - -

    Python 3 comes with a packaging framework called Distutils. Distutils is many things: a build tool (for you), an installation tool (for your users), a package metadata format (for search engines), and more. It integrates with the Python Package Index (“PyPI”), a central repository for open source Python libraries. +

    Real artists ship. Or so says Steve Jobs. Do you want to release a Python script, library, framework, or application? Excellent. The world needs more Python code. Python 3 comes with a packaging framework called Distutils. Distutils is many things: a build tool (for you), an installation tool (for your users), a package metadata format (for search engines), and more. It integrates with the Python Package Index (“PyPI”), a central repository for open source Python libraries.

    All of these facets of Distutils center around the setup script, traditionally called setup.py. In fact, you’ve already seen several Distutils setup scripts in this book. You used Distutils to install httplib2 in HTTP Web Services and again to install chardet in Case Study: Porting chardet to Python 3.
    diff --git a/porting-code-to-python-3-with-2to3.html b/porting-code-to-python-3-with-2to3.html
    index 1c48f58..958b14a 100644
    --- a/porting-code-to-python-3-with-2to3.html
    +++ b/porting-code-to-python-3-with-2to3.html
    @@ -24,7 +24,7 @@ h3:before{counter-increment:h3;content:'A.' counter(h2) '.' counter(h3) '. '}

    Diving In

    -

    Virtually all Python 2 programs will need at least some tweaking to run properly under Python 3. To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can’t fix automatically. This appendix documents what it can fix automatically. +

    So much has changed between Python 2 and Python 3 that there are vanishingly few programs that will run unmodified under both. But don’t despair! To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can’t fix automatically. This appendix documents what it can fix automatically.

    print statement

    diff --git a/refactoring.html b/refactoring.html
    index cfd4573..af177e3 100755
    --- a/refactoring.html
    +++ b/refactoring.html
    @@ -18,7 +18,7 @@ body{counter-reset:h1 10}

     

    Diving In

    -

    Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven’t written yet. +

    Like it or not, bugs happen. Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven’t written yet.

    >>> import roman7
     >>> roman7.from_roman('') 
    diff --git a/regular-expressions.html b/regular-expressions.html
    index edd23e9..a680d50 100755
    --- a/regular-expressions.html
    +++ b/regular-expressions.html
    @@ -18,7 +18,7 @@ body{counter-reset:h1 5}
     
     

     

    Diving In

    -

    Every modern programming language has built-in functions for working with strings. In Python, strings have methods for searching and replacing: index(), find(), split(), count(), replace(), &c. But these methods are limited to the simplest of cases. For example, the index() method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace() and split() methods have the same limitations. +

    Getting a small bit of text out of a large block of text is a challenge. In Python, strings have methods for searching and replacing: index(), find(), split(), count(), replace(), &c. But these methods are limited to the simplest of cases. For example, the index() method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string s, you must call s.lower() or s.upper() and make sure your search strings are the appropriate case to match. The replace() and split() methods have the same limitations.

    If your goal can be accomplished with string methods, you should use them. They’re fast and simple and easy to read, and there’s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with if statements to handle special cases, or if you’re chaining calls to split() and join() to slice-and-dice your strings, you may need to move up to regular expressions.

    Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing text with complex patterns of characters. Although the regular expression syntax is tight and unlike normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
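Here is a small sketch contrasting the two approaches on made-up address strings. It shows a string-method search that has to normalize case by hand, a regular expression that ignores case for you, and a verbose pattern with comments embedded in it. Note that the plain `find()` happily matches ROAD inside BROAD, while the `\b` word boundaries do not.

```python
import re

address = '100 NORTH MAIN road'   # made-up sample data

# String methods: normalize the case yourself, then search.
position = address.lower().find('road')

# Regular expression: the IGNORECASE flag handles case for you.
match = re.search('road', address, re.IGNORECASE)

print(position, match.start())  # 15 15

# A plain substring search cannot tell words apart...
naive_position = '100 BROAD ROAD APT. 3'.find('ROAD')  # 5, inside BROAD

# ...but a pattern can. re.VERBOSE ignores whitespace in the pattern
# and lets you embed comments that document each piece.
whole_word_road = re.compile(r'''
    \b      # a word boundary
    road    # the literal text, matched in any case
    \b      # another word boundary
    ''', re.VERBOSE | re.IGNORECASE)

print(naive_position, whole_word_road.search('100 BROAD ROAD APT. 3').start())  # 5 10
```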

    diff --git a/serializing.html b/serializing.html
    index d5746de..1b98b26 100644
    --- a/serializing.html
    +++ b/serializing.html
    @@ -18,7 +18,7 @@ body{counter-reset:h1 13}

     

    Diving In

    -

    The concept of serialization is simple. You have a data structure in memory that you want to save, reuse, or send to someone else. How would you do that? Well, that depends on how you want to save it, how you want to reuse it, and to whom you want to send it. Many games allow you to save your progress when you quit the game and pick up where you left off when you relaunch the game. (Actually, many non-gaming applications do this as well.) In this case, a data structure that captures “your progress so far” needs to be stored on disk when you quit, then loaded from disk when you relaunch. The data is only meant to be used by the same program that created it, never sent over a network, and never read by anything other than the program that created it. Therefore, the interoperability issues are limited to ensuring that later versions of the program can read data written by earlier versions. +

    On the surface, the concept of serialization is simple. You have a data structure in memory that you want to save, reuse, or send to someone else. How would you do that? Well, that depends on how you want to save it, how you want to reuse it, and to whom you want to send it. Many games allow you to save your progress when you quit the game and pick up where you left off when you relaunch the game. (Actually, many non-gaming applications do this as well.) In this case, a data structure that captures “your progress so far” needs to be stored on disk when you quit, then loaded from disk when you relaunch. The data is only meant to be used by the same program that created it, never sent over a network, and never read by anything other than the program that created it. Therefore, the interoperability issues are limited to ensuring that later versions of the program can read data written by earlier versions.
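Python's standard `pickle` module is one way to handle this save-and-relaunch case. The sketch below round-trips a hypothetical progress dictionary through a file in a temporary directory. The file name `savegame.pickle` and the helper functions are illustrative.

```python
import os
import pickle
import tempfile

# Hypothetical "progress so far" for a game.
progress = {
    'level': 7,
    'inventory': ['sword', 'lamp', 'rope'],
    'position': (12, 34),
    'visited': {'cave', 'forest'},
}


def save_progress(data, path):
    """On quit: serialize the data structure to a binary file."""
    with open(path, 'wb') as file:
        pickle.dump(data, file)


def load_progress(path):
    """On relaunch: rebuild the data structure from the file."""
    with open(path, 'rb') as file:
        return pickle.load(file)


with tempfile.TemporaryDirectory() as folder:
    save_path = os.path.join(folder, 'savegame.pickle')
    save_progress(progress, save_path)
    restored = load_progress(save_path)

print(restored == progress)  # True: an equal copy...
print(restored is progress)  # False: ...but a new object
```

Only load pickles that you created yourself or otherwise trust. The pickle documentation warns that unpickling malicious data can execute arbitrary code.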

    For cases like this, the pickle module is ideal. It’s part of the Python standard library, so it’s always available. It’s fast; the bulk of it is written in C, like the Python interpreter itself. It can store arbitrarily complex Python data structures.
    diff --git a/special-method-names.html b/special-method-names.html
    index 160fa94..cb7c807 100644
    --- a/special-method-names.html
    +++ b/special-method-names.html
    @@ -20,7 +20,7 @@ h3:before{counter-increment:h3;content:'B.' counter(h2) '.' counter(h3) '. '}

     

    Diving In

    -

    We’ve already covered a few special method names elsewhere in this book — “magic” methods that Python invokes when you use certain syntax. Using special methods, your classes can act like sequences, like dictionaries, like functions, like iterators, or even like numbers! This appendix serves both as a reference for the special methods we’ve seen already and a brief introduction to some of the more esoteric ones. +

    Throughout this book, you’ve seen examples of “special methods” — certain “magic” methods that Python invokes when you use certain syntax. Using special methods, your classes can act like sequences, like dictionaries, like functions, like iterators, or even like numbers. This appendix serves both as a reference for the special methods we’ve seen already and a brief introduction to some of the more esoteric ones.
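Here is a small sketch of a class that acts like a sequence. The hypothetical `Countdown` class below defines `__len__`, `__getitem__`, and `__contains__`, so `len()`, indexing, `for` loops, and the `in` operator all work on it.

```python
class Countdown:
    """A sequence-like object holding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        # len(countdown) calls this.
        return self.start

    def __getitem__(self, index):
        # countdown[index] calls this. With no __iter__ defined, a for loop
        # also falls back to calling it with 0, 1, 2, ... until IndexError.
        if index < 0:
            index += self.start  # support negative indexes like a list
        if not 0 <= index < self.start:
            raise IndexError('Countdown index out of range')
        return self.start - index

    def __contains__(self, value):
        # `value in countdown` calls this.
        return isinstance(value, int) and 1 <= value <= self.start


countdown = Countdown(3)
print(len(countdown), countdown[0], countdown[-1])  # 3 3 1
print(list(countdown))                              # [3, 2, 1]
print(2 in countdown, 5 in countdown)               # True False
```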

    Basics

    diff --git a/strings.html b/strings.html
    index 188fa6d..829536a 100755
    --- a/strings.html
    +++ b/strings.html
    @@ -19,9 +19,9 @@ My alphabet starts where your alphabet ends!
    &m

     

    Some Boring Stuff You Need To Understand Before You Can Dive In

    -

    Did you know that the people of Bougainville have the smallest alphabet in the world? Their Rotokas alphabet is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters — 52 if you count uppercase and lowercase separately — plus a handful of !@#$%& punctuation marks. +

    Few people think about it, but text is incredibly complicated. Start with the alphabet. The people of Bougainville have the smallest alphabet in the world; their Rotokas alphabet is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters — 52 if you count uppercase and lowercase separately — plus a handful of !@#$%& punctuation marks. -

    When people talk about “text,” they’re thinking of “characters and symbols on the computer screen.” But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular character encoding. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. +

    When you talk about “text,” you’re probably thinking of “characters and symbols on my computer screen.” But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular character encoding. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages.
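Here is a short sketch that makes the mapping concrete, using an arbitrary sample string. The same characters become different bytes under UTF-8 and Latin-1. Decoding with the wrong encoding either produces gibberish or fails outright.

```python
text = 'café'  # arbitrary sample text with one non-ASCII character

# The same characters map to different bytes in different encodings.
utf8_bytes = text.encode('utf-8')       # b'caf\xc3\xa9'
latin1_bytes = text.encode('latin-1')   # b'caf\xe9'

# With the right encoding, decoding recovers the original text.
assert utf8_bytes.decode('utf-8') == text

# With the wrong one, you may silently get gibberish...
garbled = utf8_bytes.decode('latin-1')  # 'cafÃ©'
print(garbled)

# ...or the decoder may reject the bytes outright.
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as error:
    print('Could not decode as UTF-8:', error.reason)
```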

    In reality, it’s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key. Whenever someone gives you a sequence of bytes — a file, a web page, whatever — and claims it’s “text,” you need to know what character encoding they used so you can decode the bytes into characters. If they give you the wrong key or no key at all, you’re left with the unenviable task of cracking the code yourself. Chances are you’ll get it wrong, and the result will be gibberish.
    diff --git a/unit-testing.html b/unit-testing.html
    index 6e2a263..ee2faf4 100755
    --- a/unit-testing.html
    +++ b/unit-testing.html
    @@ -18,7 +18,9 @@ body{counter-reset:h1 9}

     

    (Not) Diving In

    -

    In this chapter, you’re going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: roman numerals”. Now step back and consider what it would take to expand that into a two-way utility. +

    Kids today. So spoiled by these fast computers and fancy “dynamic” languages. Write first, ship second, debug third (if ever). In my day, we had discipline. Discipline, I say! We had to write programs by hand, on paper, and feed them to the computer on punchcards. And we liked it! + +

    In this chapter, you’re going to write and debug a set of utility functions to convert to and from Roman numerals. You saw the mechanics of constructing and validating Roman numerals in “Case study: roman numerals”. Now step back and consider what it would take to expand that into a two-way utility.

    The rules for Roman numerals lead to a number of interesting observations:

    1. There is only one correct way to represent a particular number as a Roman numeral.
    diff --git a/whats-new.html b/whats-new.html
    index 774b96b..49bb3e9 100644
    --- a/whats-new.html
    +++ b/whats-new.html
    @@ -19,7 +19,7 @@ h3:before{content:''}

       

      a.k.a. “the minus level”

      -

      You read the original “Dive Into Python” and maybe even bought it on paper. (Thanks!) You already know Python 2 pretty well. You’re ready to take the plunge into Python 3. … If all of that is true, read on. (If none of that is true, you’d be better off starting at the beginning.) +

      Are you already a Python programmer? Did you read the original “Dive Into Python”, or maybe even buy it on paper? (Thanks!) Are you ready to take the plunge into Python 3? … If so, read on. (If none of that is true, you’d be better off starting at the beginning.)

      Python 3 comes with a script called 2to3. Learn it. Love it. Use it. Porting Code to Python 3 with 2to3 is a reference of all the things that the 2to3 tool can fix automatically. Since a lot of those things are syntax changes, it’s a good starting point to learn about a lot of the syntax changes in Python 3. (print is now a function, `x` doesn’t work, &c.)
      diff --git a/where-to-go-from-here.html b/where-to-go-from-here.html
      index 55616f2..a52b7d6 100644
      --- a/where-to-go-from-here.html
      +++ b/where-to-go-from-here.html
      @@ -19,7 +19,7 @@ h3:before{counter-increment:h3;content:'C.' counter(h2) '.' counter(h3) '. '}

       

      Things to Read

      -

      There are a number of topics that I decided not to cover in this book, for which free tutorials exist. +

      Unfortunately, I cannot cover every facet of Python 3 in this book. Fortunately, there are many wonderful, freely available tutorials elsewhere.

      Decorators:
      diff --git a/xml.html b/xml.html
      index fe99b3a..40a20b7 100755
      --- a/xml.html
      +++ b/xml.html
      @@ -19,7 +19,7 @@ mark{display:inline}

       

      Diving In

      -

      Most of the chapters in this book have centered around a piece of sample code. But XML isn’t about code; it’s about data. One common use of XML is “syndication feeds” that list the latest articles on a blog, forum, or other frequently-updated website. Most popular blogging software can produce a feed and update it whenever new articles, discussion threads, or blog posts are published. You can follow a blog by “subscribing” to its feed, and you can follow multiple blogs with a dedicated “feed aggregator” like Google Reader. +

      Nearly all the chapters in this book revolve around a piece of sample code. But XML isn’t about code; it’s about data. One common use of XML is “syndication feeds” that list the latest articles on a blog, forum, or other frequently-updated website. Most popular blogging software can produce a feed and update it whenever new articles, discussion threads, or blog posts are published. You can follow a blog by “subscribing” to its feed, and you can follow multiple blogs with a dedicated “feed aggregator” like Google Reader.

      Here, then, is the XML data we’ll be working with in this chapter. It’s a feed — specifically, an Atom syndication feed.
      diff --git a/your-first-python-program.html b/your-first-python-program.html
      index a8cefc5..3c726ae 100755
      --- a/your-first-python-program.html
      +++ b/your-first-python-program.html
      @@ -22,7 +22,7 @@ mark{display:inline}

       

      Diving In

      -

      Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let’s skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don’t worry about that, because you’re going to dissect it line by line. But read through it first and see what, if anything, you can make of it. +

      Convention dictates that I should bore you with the fundamental building blocks of programming, so we can slowly work up to building something useful. Let’s skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don’t worry about that, because you’re going to dissect it line by line. But read through it first and see what, if anything, you can make of it.

      [download humansize.py]

      SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
                   1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}