iterators and generators chapter

--HG--
rename : humansize.py => examples/humansize.py
rename : roman1.py => examples/roman1.py
rename : roman2.py => examples/roman2.py
rename : roman3.py => examples/roman3.py
rename : roman4.py => examples/roman4.py
rename : roman5.py => examples/roman5.py
rename : roman6.py => examples/roman6.py
rename : roman7.py => examples/roman7.py
rename : roman8.py => examples/roman8.py
rename : romantest1.py => examples/romantest1.py
rename : romantest2.py => examples/romantest2.py
rename : romantest3.py => examples/romantest3.py
rename : romantest4.py => examples/romantest4.py
rename : romantest5.py => examples/romantest5.py
rename : romantest6.py => examples/romantest6.py
rename : romantest7.py => examples/romantest7.py
rename : romantest8.py => examples/romantest8.py
Mark Pilgrim
2009-03-27 01:43:33 -05:00
parent 18b0144075
commit 933dc9459a
52 changed files with 2247 additions and 695 deletions
+58 -47
@@ -13,13 +13,13 @@ body{counter-reset:h1 3}
<h1>Strings</h1>
<blockquote class=q>
<p><span>&#x275D;</span> I&#8217;m telling you this &#8217;cause you&#8217;re one of my friends.<br>
My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; <cite>Dr. Seuss, On Beyond Zebra!</cite>
My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr. Seuss, On Beyond Zebra!
</blockquote>
<p id=toc>&nbsp;
<h2 id=boring-stuff>Some boring stuff you need to understand before you can dive in</h2>
<p class=f>Did you know that the people of <a href="http://en.wikipedia.org/wiki/Bougainville_Province">Bougainville</a> have the smallest alphabet in the world? Their <a href="http://en.wikipedia.org/wiki/Rotokas_alphabet">Rotokas alphabet</a> is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters &mdash; 52 if you count uppercase and lowercase separately &mdash; plus a handful of <i class=baa>!@#$%&</i> punctuation marks.
<p>When people talk about &#8220;text,&#8221; they&#8217;re thinking of &#8220;characters and symbols on the computer screen.&#8221; But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <i>character encoding</i>. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
<p>When people talk about &#8220;text,&#8221; they&#8217;re thinking of &#8220;characters and symbols on the computer screen.&#8221; But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <i>character encoding</i>. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages.
<p>In reality, it&#8217;s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key. Whenever someone gives you a sequence of bytes &mdash; a file, a web page, whatever &mdash; and claims it&#8217;s &#8220;text,&#8221; you need to know what character encoding they used so you can decode the bytes into characters. If they give you the wrong key or no key at all, you&#8217;re left with the unenviable task of cracking the code yourself. Chances are you&#8217;ll get it wrong, and the result will be gibberish.
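<p>To make the &#8220;decryption key&#8221; idea concrete, here is a minimal Python 3 sketch that stores the same characters under two different encodings. The sample string is an arbitrary choice. Only <code>str.encode()</code>, <code>bytes.decode()</code>, and the standard <code>utf-8</code> and <code>latin-1</code> codecs are used.

```python
text = "La Peña"  # arbitrary sample string with one non-ASCII character

# The same characters, stored under two different encodings.
utf8_bytes = text.encode("utf-8")      # 'ñ' is stored as two bytes
latin1_bytes = text.encode("latin-1")  # 'ñ' is stored as one byte
print(utf8_bytes)    # b'La Pe\xc3\xb1a'
print(latin1_bytes)  # b'La Pe\xf1a'

# With the right key, decoding recovers the original characters.
print(utf8_bytes.decode("utf-8") == text)      # True
print(latin1_bytes.decode("latin-1") == text)  # True

# With the wrong key, you may silently get gibberish...
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # La PeÃ±a

# ...or an outright error, because these bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("UnicodeDecodeError:", err.reason)
```

<p>Note that the first wrong guess raises no error at all: every byte is a valid <code>latin-1</code> character, so the mistake only shows up as garbled text.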
@@ -101,7 +101,7 @@ La Pe&ntilde;a</pre>
<p>Let's take another look at <a href=your-first-python-program.html#divingin><code>humansize.py</code></a>:
<p class=d>[<a href=humansize.py>download <code>humansize.py</code></a>]
<p class=d>[<a href=examples/humansize.py>download <code>humansize.py</code></a>]
<pre><code>
<a>SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'], <span>&#x2460;</span></a>
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
@@ -149,6 +149,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<li>There's a lot going on here. First, that's a method call on a string literal. <em>Strings are objects</em>, and objects have methods. Second, the whole expression evaluates to a string. Third, <code>{0}</code> and <code>{1}</code> are <i>replacement fields</i>, which are replaced by the arguments passed to the <code>format()</code> method.
</ol>
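<p>Here is a minimal sketch of positional replacement fields in Python 3. The values of <var>username</var> and <var>password</var> are made up for illustration. Each <code>{<var>n</var>}</code> is filled in from the <var>n</var>th argument to <code>format()</code>, counting from zero, and the same index may be used more than once.

```python
username = "mark"        # hypothetical values, for illustration only
password = "PapayaWhip"

# {0} is replaced by the first argument to format(), {1} by the second.
message = "{0}'s password is {1}".format(username, password)
print(message)  # mark's password is PapayaWhip

# Indices can appear in any order, and more than once.
reordered = "{1} belongs to {0}; {0} should change {1}".format(username, password)
print(reordered)  # PapayaWhip belongs to mark; mark should change PapayaWhip
```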
<h3 id=compound-field-names>Compound field names</h3>
<p>The previous example shows the simplest case, where the replacement fields are simply integers. Integer replacement fields are treated as positional indices into the argument list of the <code>format()</code> method. That means that <code>{0}</code> is replaced by the first argument (<var>username</var> in this case), <code>{1}</code> is replaced by the second argument (<var>password</var>), <i class=baa>&amp;</i>c. You can have as many positional indices as you have arguments, and you can have as many arguments as you want. But replacement fields are much more powerful than that.
<pre class=screen>
@@ -160,11 +162,11 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<samp>'1000KB = 1MB'</samp>
</pre>
<ol>
<li>Rather than calling any function in the <code>humansize</code> module, you'll just grab one of the data structures it defines: the list of "SI" (powers-of-1000) suffixes.
<li>Rather than calling any function in the <code>humansize</code> module, you're just grabbing one of the data structures it defines: the list of "SI" (powers-of-1000) suffixes.
<li>This looks complicated, but it's not. <code>{0}</code> would refer to the first argument passed to the <code>format()</code> method, <var>si_suffixes</var>. But <var>si_suffixes</var> is a list. So <code>{0[0]}</code> refers to the first item of that list: <code>'KB'</code>. Meanwhile, <code>{0[1]}</code> refers to the second item of the same list: <code>'MB'</code>. Everything outside the curly braces, including <code>1000</code>, the equals sign, and the spaces, is untouched. The final result is the string <code>'1000KB = 1MB'</code>.
</ol>
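<p>A self-contained version of that example follows. Because <code>humansize</code> isn't assumed to be importable here, the list is copied from the <code>SUFFIXES</code> definition above. The second call shows why the number before the brackets matters: with only one argument, <code>{1[0]}</code> asks for a second positional argument that was never passed, and <code>format()</code> raises <code>IndexError</code>.

```python
# Copied from humansize.SUFFIXES[1000] so this sketch runs on its own.
si_suffixes = ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']

# {0[0]} and {0[1]} both index into the first (and only) argument.
result = '1000{0[0]} = 1{0[1]}'.format(si_suffixes)
print(result)  # 1000KB = 1MB

# {1[0]} refers to a second positional argument, which doesn't exist.
try:
    '1000{0[0]} = 1{1[0]}'.format(si_suffixes)
except IndexError as err:
    print('IndexError:', err)
```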
<p>What this example shows is that <em>replacement fields can access items and properties of data structures using (almost) Python syntax</em>. The following things "just work":
<p>What this example shows is that <em>replacement fields can access items and properties of data structures using (almost) Python syntax</em>. These are called <i>compound field names</i>. The following compound field names "just work":
<ul>
<li>Passing a list, and accessing an item of the list by index (as in the previous example)
@@ -193,6 +195,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<li><code>sys.modules["humansize"].SUFFIXES[1000][0]</code> is the first item of the list of <abbr>SI</abbr> suffixes: <code>'KB'</code>. Therefore, the complete replacement field <code>{0.modules[humansize].SUFFIXES[1000][0]}</code> is replaced by the two-character string <code>KB</code>.
</ul>
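<p>The same kinds of lookups can be tried without <code>humansize</code>. This sketch uses a made-up dictionary, and the standard <code>math</code> module as a stand-in for <code>humansize</code>. Note that an unquoted key such as <code>[server]</code> inside a replacement field is looked up as the string <code>'server'</code>.

```python
import math
import sys

# Hypothetical dictionary; [server] inside the braces looks up the key 'server'.
params = {'server': 'mpilgrim', 'database': 'master'}
print('{0[server]}/{0[database]}'.format(params))  # mpilgrim/master

# Attribute access on a module object.
print('{0.pi}'.format(math) == str(math.pi))  # True

# Chained lookups: attribute, then key, then attribute. This mirrors
# {0.modules[humansize].SUFFIXES[1000][0]}, with math standing in for humansize.
print('{0.modules[math].pi}'.format(sys) == str(math.pi))  # True
```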
<h3 id=format-specifiers>Format specifiers</h3>
<p>But wait! There's more! Let's take another look at that strange line of code from <code>humansize.py</code>:
<pre><code>if size < multiple:
@@ -210,58 +214,54 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<samp class=p>>>> </samp><kbd>"{0:.1f} {1}".format(698.25, 'GB')</kbd>
<samp>'698.2 GB'</samp></pre>
<p>For all the gory details on presentation types, check the <a href="http://docs.python.org/dev/3.0/library/string.html#format-specification-mini-language">Format Specification Mini-Language</a> in the official Python documentation.
<p>For all the gory details on format specifiers, consult the <a href="http://docs.python.org/dev/3.0/library/string.html#format-specification-mini-language">Format Specification Mini-Language</a> in the official Python documentation.
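<p>A few format specifiers side by side, as a quick sketch with arbitrary numbers. The part after the colon controls presentation: <code>.1f</code> means fixed-point with one digit after the decimal point (the value is rounded, not truncated), a leading <code>+</code> forces a sign to be shown, and <code>&gt;8</code> right-aligns the value in a field eight characters wide.

```python
# '.1f': fixed-point notation, rounded to one decimal place.
print('{0:.1f} {1}'.format(698.24, 'GB'))  # 698.2 GB
print('{0:.1f} {1}'.format(698.26, 'GB'))  # 698.3 GB

# '+.2f': always show the sign, with two decimal places.
print('{0:+.2f}'.format(1.5))   # +1.50
print('{0:+.2f}'.format(-1.5))  # -1.50

# '>8': right-align in a field 8 characters wide.
print('[{0:>8}]'.format('KB'))  # [      KB]
```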
<div class=s>
<p>Note that <code>(k, v)</code> is a tuple. I told you they were good for something.
<h2 id=common-string-methods>Other common string methods</h2>
<p>You might be thinking that this is a lot of work just to do simple string concatenation, and you would be right, except that string formatting isn't just concatenation. It's not even just formatting. It's also type coercion.
<p>Besides formatting, strings can do a number of other useful tricks.
<pre class=screen>
<samp class=p>>>> </samp><kbd>uid = "sa"</kbd>
<samp class=p>>>> </samp><kbd>pwd = "secret"</kbd>
<samp class=p>>>> </samp><kbd>print pwd + " is not a good password for " + uid</kbd> <span>&#x2460;</span>
secret is not a good password for sa
<samp class=p>>>> </samp><kbd>print "%s is not a good password for %s" % (pwd, uid)</kbd> <span>&#x2461;</span>
secret is not a good password for sa
<samp class=p>>>> </samp><kbd>userCount = 6</kbd>
<samp class=p>>>> </samp><kbd>print "Users connected: %d" % (userCount, )</kbd> <span>&#x2462;</span> <span>&#x2463;</span>
Users connected: 6
<samp class=p>>>> </samp><kbd>print "Users connected: " + userCount</kbd> <span>&#x2464;</span>
<samp class=traceback>Traceback (innermost last):
File "&lt;interactive input>", line 1, in ?
TypeError: cannot concatenate 'str' and 'int' objects</samp></pre>
<a><samp class=p>>>> </samp><kbd>s = """Finished files are the re-</kbd> <span>&#x2460;</span></a>
<samp class=p>... </samp><kbd>sult of years of scientif-</kbd>
<samp class=p>... </samp><kbd>ic study combined with the</kbd>
<samp class=p>... </samp><kbd>experience of years."""</kbd>
<a><samp class=p>>>> </samp><kbd>s.splitlines()</kbd> <span>&#x2461;</span></a>
<samp>['Finished files are the re-',
'sult of years of scientif-',
'ic study combined with the',
'experience of years.']</samp>
<a><samp class=p>>>> </samp><kbd>print(s.lower())</kbd> <span>&#x2462;</span></a>
<samp>finished files are the re-
sult of years of scientif-
ic study combined with the
experience of years.</samp>
<a><samp class=p>>>> </samp><kbd>s.lower().count("f")</kbd> <span>&#x2463;</span></a>
<samp>6</samp></pre>
<ol>
<li><code>+</code> is the string concatenation operator.
<li>In this trivial case, string formatting accomplishes the same result as concatenation.
<li><code>(userCount, )</code> is a tuple with one element. Yes, the syntax is a little strange, but there's a good reason for it: it's unambiguously a tuple. In fact, you can always include a comma after the last element when defining a list, tuple, or dictionary, but the comma is required when defining a tuple with one element. If the comma weren't required, Python wouldn't know whether <code>(userCount)</code> was a tuple with one element or just the value of <var>userCount</var>.
<li>String formatting works with integers by specifying <code>%d</code> instead of <code>%s</code>.
<li>Trying to concatenate a string with a non-string raises an exception. Unlike string formatting, string concatenation works only when everything is already a string.
<li>You can input multi-line strings in the Python interactive shell. Once you start a multi-line string with triple quotation marks, just hit <kbd>ENTER</kbd> and the interactive shell will prompt you to continue the string. Typing the closing triple quotation marks ends the string, and the next <kbd>ENTER</kbd> will execute the command (in this case, assigning the string to <var>s</var>).
<li>The <code>splitlines()</code> method takes one multi-line string and returns a list of strings, one for each line of the original. Note that the line breaks at the end of each line are not included.
<li>The <code>lower()</code> method converts the entire string to lowercase. (Similarly, the <code>upper()</code> method converts a string to uppercase.)
<li>The <code>count()</code> method counts the number of occurrences of a substring. Yes, there really are six &#8220;f&#8221;s in that sentence!
</ol>
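<p>Here is the same session as a script, with two extra checks: passing <code>True</code> to <code>splitlines()</code> keeps the line endings, and <code>count()</code> is case-sensitive, which is why the example lowercases the string before counting.

```python
s = """Finished files are the re-
sult of years of scientif-
ic study combined with the
experience of years."""

lines = s.splitlines()
print(len(lines))  # 4
print(lines[0])    # Finished files are the re-

# Passing True keeps the line ending on each line.
print(repr(s.splitlines(True)[0]))  # 'Finished files are the re-\n'

print(s.upper().splitlines()[0])  # FINISHED FILES ARE THE RE-

# count() is case-sensitive: the capital F in "Finished" is missed here.
print(s.count("f"))          # 5
print(s.lower().count("f"))  # 6
```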
<p>As with <code>printf</code> in <abbr>C</abbr>, string formatting in Python is like a Swiss Army knife. There are options galore, and modifiers for formatting many different types of values.
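<p>The <var>uid</var>/<var>pwd</var> session above uses Python 2's <code>print</code> statement. As a sketch, here is roughly the same thing in Python 3, where <code>print()</code> is a function and the <code>%</code> operator still works on strings. The variable names come from that session (with <var>userCount</var> renamed to <var>user_count</var>).

```python
uid = "sa"
pwd = "secret"

# For strings, concatenation and %-formatting give the same result.
print(pwd + " is not a good password for " + uid)
print("%s is not a good password for %s" % (pwd, uid))

user_count = 6
# The trailing comma makes (user_count,) a one-element tuple;
# without it, (user_count) is just the integer 6.
print(type((user_count,)).__name__, type((user_count)).__name__)  # tuple int
print("Users connected: %d" % (user_count,))  # Users connected: 6

# Concatenation does no type coercion, so mixing str and int fails...
try:
    print("Users connected: " + user_count)
except TypeError as err:
    print("TypeError:", err)

# ...unless you convert explicitly.
print("Users connected: " + str(user_count))  # Users connected: 6
```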
<!--
<p>What else can strings do? Here's a common idiom I use for getting bits of data out of semi-structured strings.
<pre class=screen>
<samp class=p>>>> </samp><kbd>print("Today's stock price: %f" % 50.4625)</kbd> <span>&#x2460;</span>
<samp>Today's stock price: 50.462500</samp>
<samp class=p>>>> </samp><kbd>print("Today's stock price: %.2f" % 50.4625)</kbd> <span>&#x2461;</span>
<samp>Today's stock price: 50.46</samp>
<samp class=p>>>> </samp><kbd>print("Change since yesterday: %+.2f" % 1.5)</kbd> <span>&#x2462;</span>
<samp>Change since yesterday: +1.50</samp></pre>
<ol>
<li>The <code>%f</code> string formatting option treats the value as a floating-point number and prints it to six decimal places.
<li>The ".2" modifier of the <code>%f</code> option rounds the value to two decimal places.
<li>You can even combine modifiers. Adding the <code>+</code> modifier displays a plus or minus sign before the value. Note that the ".2" modifier is still in place, and is padding the value to exactly two decimal places.
</ol>
</div>
<h2 id=common-string-methods>Common string methods</h2>
<samp class=p>>>> </samp><kbd>import subprocess</kbd>
<samp class=p>>>> </samp><kbd>df = subprocess.getoutput('df -x tmpfs')</kbd>
<samp class=p>>>> </samp><kbd>print(df)</kbd>
<samp>Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 461215812 73256908 364529712 17% /
/dev/sdb1 721075720 620495832 63951288 91% /backup</samp>
<samp class=p>>>> </samp><kbd>
-->
<!--
['capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
-->
<!--
<p>[FIXME is it worth keeping this section on joining lists / splitting strings? All the examples are from an old code sample that isn't used at all anymore.]
<div class=s>
@@ -275,7 +275,13 @@ TypeError: cannot concatenate 'str' and 'int' objects</samp></pre>
is an object. You might have thought I meant that string <em>variables</em> are objects. But no, look closely at this example and you'll see that the string <code>";"</code> itself is an object, and you are calling its <code>join</code> method.
<p>The <code>join</code> method joins the elements of the list into a single string, with each element separated by a semi-colon. The delimiter doesn't need to be a semi-colon; it doesn't even need to be a single character. It can be any string.
<!--<code>join</code> works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception.-->
<code>join</code> works only on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception.
<pre class=screen>
<samp class=p>>>> </samp><kbd>params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}</kbd>
@@ -302,10 +308,15 @@ is an object. You might have thought I meant that string <em>variables</em> are
<li><code>split</code> takes an optional second argument, which is the number of times to split. (&#8220;Oooooh, optional arguments...&#8221; You'll learn how to do this in your own functions in the next chapter.)
</ol>
<!--<code><var>anystring</var>.<code>split</code>(<var>delimiter</var>, 1)</code> is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).-->
</div>
<h2 id=string-operations>Common string operations</h2>
<code><var>anystring</var>.<code>split</code>(<var>delimiter</var>, 1)</code> is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).
</div>
-->
<h2 id=string-module>The <code>string</code> module</h2>