asterisms for everyone!

Mark Pilgrim
2009-05-29 22:12:00 -07:00
parent b5c0538af2
commit 5b0405f6a7
14 changed files with 159 additions and 3 deletions
+2
@@ -98,6 +98,8 @@ class OrderedDict(dict, collections.MutableMapping):
return all(p==q for p, q in itertools.zip_longest(self.items(), other.items()))
return dict.__eq__(self, other)</code></pre>
<p class=a>&#x2042;
<h2 id=implementing-fractions>Implementing Fractions</h2>
<p class=nav><a rel=prev class=todo><span>&#x261C;</span></a> <a rel=next class=todo><span>&#x261E;</span></a>
+20
@@ -83,6 +83,8 @@ if __name__ == '__main__':
<samp>SEND + MORE == MONEY
9567 + 1085 == 10652</samp></pre>
<p class=a>&#x2042;
<h2 id=re-findall>Finding all occurrences of a pattern</h2>
<p>The first thing this alphametics solver does is find all the letters (A&ndash;Z) in the puzzle.
@@ -98,6 +100,8 @@ if __name__ == '__main__':
<li>Here the regular expression pattern matches sequences of letters. Again, the return value is a list, and each item in the list is a string that matched the regular expression pattern.
</ol>
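<p>As a quick check of the callout above, here is a minimal sketch of <code>re.findall()</code> applied to the puzzle shown earlier. The variable names are illustrative and not necessarily the ones the solver uses.

```python
import re

puzzle = "SEND + MORE == MONEY"

# Each run of uppercase letters becomes one item in the returned list;
# spaces and operators are skipped because they do not match [A-Z].
words = re.findall(r"[A-Z]+", puzzle)
print(words)  # ['SEND', 'MORE', 'MONEY']
```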
<p class=a>&#x2042;
<h2 id=unique-items>Finding the unique items in a sequence</h2>
<p>Set comprehensions make it trivial to find the unique items in a sequence. [FIXME-not sure if I&#8217;m going to cover set comprehensions in an earlier chapter; if not, this is certainly an abrupt and inadequate introduction to the topic.]
@@ -127,6 +131,8 @@ if __name__ == '__main__':
<p>This list is later used to assign digits to characters as the solver iterates through the possible solutions.
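<p>To make the set comprehension idea from the start of this section concrete, here is a small sketch. The <var>words</var> list is an assumption chosen to match the sample puzzle.

```python
words = ["SEND", "MORE", "MONEY"]

# A set comprehension keeps one copy of each character, however often it repeats.
unique_characters = {c for c in "".join(words)}

# Sets are unordered, so sort before printing to get a stable result.
print(sorted(unique_characters))  # ['D', 'E', 'M', 'N', 'O', 'R', 'S', 'Y']
```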
<p class=a>&#x2042;
<h2 id=assert>Making assertions</h2>
<p>Like many programming languages, Python has an <code>assert</code> statement. Here&#8217;s how it works.
@@ -155,6 +161,8 @@ AssertionError</samp></pre>
<p>The alphametics solver uses this exact <code>assert</code> statement to bail out early if the puzzle contains more than ten unique letters. Since each letter is assigned a unique digit, and there are only ten digits, a puzzle with more than ten unique letters is unsolvable.
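<p>The early bail-out described above might look roughly like this. The helper name <code>count_unique_letters()</code> and the message text are assumptions made for this sketch.

```python
def count_unique_letters(puzzle_letters):
    """Return the number of distinct letters, or fail if there are more than ten."""
    unique_characters = set(puzzle_letters)
    # Only ten digits exist, so an eleventh letter makes the puzzle unsolvable.
    assert len(unique_characters) <= 10, "Too many letters"
    return len(unique_characters)

print(count_unique_letters("SENDMOREMONEY"))  # 8

try:
    count_unique_letters("ABCDEFGHIJK")  # eleven distinct letters
except AssertionError as error:
    print("unsolvable:", error)
```

<p>Keep in mind that <code>assert</code> statements are skipped when Python runs with the <code>-O</code> option, so they suit sanity checks better than input validation.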
<p class=a>&#x2042;
<h2 id=generator-expressions>Generator expressions</h2>
<p>A generator expression is like a <a href=generators.html>generator function</a> without the function.
@@ -185,6 +193,8 @@ AssertionError</samp></pre>
gen = ord_map(unique_characters)</code></pre>
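<p>For comparison, here is a sketch of the same idea written as a generator expression, with no function at all. The sample set of characters is an assumption for illustration.

```python
unique_characters = {"E", "D", "M", "O", "N", "S", "R", "Y"}

# Parentheses instead of square brackets: nothing is computed yet.
gen = (ord(c) for c in unique_characters)

first = next(gen)   # computes exactly one value
rest = tuple(gen)   # consumes whatever is left

print(len(rest) + 1)  # 8
```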
<p class=a>&#x2042;
<h2 id=permutations>Calculating Permutations&hellip; The Lazy Way!</h2>
<p>First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you&#8217;re doing. Here I&#8217;m talking about combinatorics, but if that doesn&#8217;t mean anything to you, don&#8217;t worry about it. As always, <a href=http://en.wikipedia.org/wiki/Permutation>Wikipedia is your friend</a>.)
@@ -249,6 +259,8 @@ StopIteration</samp>
<li>Since the <code>permutations()</code> function always returns an iterator, an easy way to debug permutations is to pass that iterator to the built-in <code>list()</code> function to see all the permutations immediately.
</ol>
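<p>The debugging tip in the last callout can be tried on a tiny input. This is only a sketch; the input list is arbitrary.

```python
import itertools

perms = itertools.permutations([1, 2, 3], 2)

# The iterator hands out one permutation at a time...
first = next(perms)
print(first)  # (1, 2)

# ...and list() drains whatever remains in a single step.
remaining = list(perms)
print(remaining)  # [(1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```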
<p class=a>&#x2042;
<h2 id=more-itertools>Other Fun Stuff in the <code>itertools</code> Module</h2>
<pre class=screen>
<samp class=p>>>> </samp><kbd>import itertools</kbd>
@@ -372,6 +384,8 @@ for guess in itertools.permutations(digits, len(characters)):
<p>But what is this <code>translate()</code> method? Ah, now you&#8217;re getting to the <em>really</em> fun part.
<p class=a>&#x2042;
<h2 id=string-translate>A New Kind Of String Manipulation</h2>
<p>Python strings have many methods. You learned about some of those methods in <a href=strings.html>the Strings chapter</a>: <code>lower()</code>, <code>count()</code>, and <code>format()</code>. Now I want to introduce you to a powerful but little-known string manipulation technique: the <code>translate()</code> method.
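<p>As a preview, here is a minimal sketch of <code>translate()</code> using the solution shown at the top of the chapter. The letter-to-digit pairing is written out by hand here; the solver itself would build it from a guess.

```python
# Map each letter to its digit: S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2.
translation_table = str.maketrans("SENDMORY", "95671082")

equation = "SEND + MORE == MONEY".translate(translation_table)
print(equation)  # 9567 + 1085 == 10652
```

<p>Characters that have no entry in the table, such as spaces and the operators, pass through unchanged.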
@@ -411,6 +425,8 @@ for guess in itertools.permutations(digits, len(characters)):
<p>That&#8217;s pretty impressive. But what can you do with a string that happens to be a valid Python expression?
<p class=a>&#x2042;
<h2 id=eval>Evaluating Arbitrary Strings As Python Expressions</h2>
<p>This is the final piece of the puzzle (or rather, the final piece of the puzzle solver). After all that fancy string manipulation, we&#8217;re left with a string like <code>'9567 + 1085 == 10652'</code>. But that&#8217;s a string, and what good is a string? Enter <code>eval()</code>, the universal Python evaluation tool.
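<p>A minimal sketch of that last step, using the string quoted above. This is only reasonable because the string was produced by the program itself and not supplied by a stranger.

```python
equation = "9567 + 1085 == 10652"

# eval() parses the string as a Python expression and returns its value.
result = eval(equation)
print(result)  # True

# A wrong guess simply evaluates to False.
print(eval("9567 + 1085 == 10653"))  # False
```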
@@ -542,6 +558,8 @@ NameError: name '__import__' is not defined</samp></pre>
<p>So, in the end, it <em>is</em> possible to safely evaluate untrusted Python expressions. Passing <code>{"__builtins__": None}</code> as the second parameter to the <code>eval()</code> function is non-intuitive (and not the default behavior), but it does work. If you understand <em>why</em> it works, you&#8217;re less likely to use <code>eval()</code> incorrectly, in a way that works with trusted input but has potentially devastating consequences with untrusted input.
<p class=a>&#x2042;
<h2 id=alphametics-finale>Putting It All Together</h2>
<p>To recap: this program solves alphametic puzzles by brute force, <i>i.e.</i> through an exhaustive search of all possible solutions. To do this, it&hellip;
@@ -559,6 +577,8 @@ NameError: name '__import__' is not defined</samp></pre>
<p>&hellip;in just 14 lines of code.
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
@@ -29,6 +29,8 @@ del{background:#f87}
<p>A Unipony, as it were.
<p>I&#8217;ll settle for character encoding auto-detection.
<p class=a>&#x2042;
<h2 id=faq.what>What is Character Encoding Auto-Detection?</h2>
<p>It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It&#8217;s like cracking a code when you don&#8217;t have the decryption key.
@@ -39,6 +41,8 @@ del{background:#f87}
<h3 id=faq.who>Does Such An Algorithm Exist?</h3>
<p>As it turns out, yes. All major browsers have character encoding auto-detection, because the web is full of pages that have no encoding information whatsoever. <a href=http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/base/>Mozilla Firefox contains an encoding auto-detection library</a> which is open source. <a href=http://chardet.feedparser.org/>I ported the library to Python 2</a> and dubbed it the <code>chardet</code> module. This chapter will take you step-by-step through the process of porting the <code>chardet</code> module from Python 2 to Python 3.
<p class=a>&#x2042;
<h2 id=divingin2>Introducing The <code>chardet</code> Module</h2>
<p>[FIXME download link, possibly on chardet.feedparser.org, possibly local]
<p>Before we set off porting the code, it would help if you understood how the code worked! This is a brief guide to navigating the code itself.
@@ -70,6 +74,8 @@ del{background:#f87}
<p>Hebrew is handled as a special case. If the text appears to be Hebrew based on 2-character distribution analysis, <code>HebrewProber</code> (defined in <code>hebrewprober.py</code>) tries to distinguish between Visual Hebrew (where the source text is actually stored &#8220;backwards&#8221; line-by-line, and then displayed verbatim so it can be read from right to left) and Logical Hebrew (where the source text is stored in reading order and then rendered right-to-left by the client). Because certain characters are encoded differently based on whether they appear in the middle of or at the end of a word, we can make a reasonable guess about direction of the source text, and return the appropriate encoding (<code>windows-1255</code> for Logical Hebrew, or <code>ISO-8859-8</code> for Visual Hebrew).
<h3 id=how.windows1252><code>windows-1252</code></h3>
<p>If <code>UniversalDetector</code> detects a high-bit character in the text, but none of the other multi-byte or single-byte encoding probers return a confident result, it creates a <code>Latin1Prober</code> (defined in <code>latin1prober.py</code>) to try to detect English text in a <code>windows-1252</code> encoding. This detection is inherently unreliable, because English letters are encoded in the same way in many different encodings. The only way to distinguish <code>windows-1252</code> is through commonly used symbols like smart quotes, curly apostrophes, copyright symbols, and the like. <code>Latin1Prober</code> automatically reduces its confidence rating to allow more accurate probers to win if at all possible.
<p class=a>&#x2042;
<h2 id=running2to3>Running <code>2to3</code></h2>
<p>We&#8217;re going to migrate the <code>chardet</code> module from Python 2 to Python 3. Python 3 comes with a utility script called <code>2to3</code>, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. In some cases this is easy &mdash; a function was renamed or moved to a different module &mdash; but in other cases it can get pretty complex. To get a sense of all that it <em>can</em> do, refer to the appendix, <a href=porting-code-to-python-3-with-2to3.html>Porting code to Python 3 with <code>2to3</code></a>. In this chapter, we&#8217;ll start by running <code>2to3</code> on the <code>chardet</code> package, but as you&#8217;ll see, there will still be a lot of work to do after the automated tools have performed their magic.
<p>The main <code>chardet</code> package is split across several different files, all in the same directory. The <code>2to3</code> script makes it easy to convert multiple files at once: just pass a directory as a command line argument, and <code>2to3</code> will convert each of the files in turn.
@@ -572,6 +578,8 @@ RefactoringTool: Files that were modified:
RefactoringTool: test.py</samp></pre>
<p>[FIXME explain the difference in import syntax]
<p>Well, that wasn&#8217;t so hard. Just a few imports and print statements to convert. Time to run the new version. Do you think it&#8217;ll work?
<p class=a>&#x2042;
<h2 id=manual>Fixing What <code>2to3</code> Can&#8217;t</h2>
<h3 id=falseisinvalidsyntax><code>False</code> is invalid syntax</h3>
<aside>You do have tests, right?</aside>
@@ -1171,6 +1179,8 @@ tests\EUC-JP\arclamp.jp.xml EUC-JP with confide
.
316 tests</samp></pre>
<p>Holy crap, it actually works! <em><a href=http://www.hampsterdance.com/>/me does a little dance</a></em>
<p class=a>&#x2042;
<h2 id=summary>Summary</h2>
<p>What have we learned?
<ol>
+11 -3
@@ -35,7 +35,7 @@ Classname Legend
.q = "quote" = quote at beginning of each chapter
.f = "fancy" = first paragraph of each chapter (gets a fancy drop-cap)
.c = "centered" = centered footer text (also clears floats)
.s = "simple" =
.a = "asterism" = section break
.nm = "no mobile" = hide this section on mobile devices
.nd = "no decoration" = hide the widgets on this code block
@@ -52,6 +52,7 @@ Acknowledgements & Inspirations
"Compose to a Vertical Rhythm" ........................... http://24ways.org/2006/compose-to-a-vertical-rhythm
"Use the Best Available Ampersand" ....................... http://simplebits.com/notebook/2008/08/14/ampersands.html
"Unicode Support in HTML, Fonts, and Web Browsers" ....... http://alanwood.net/unicode/
"Punctuation" ............................................ http://en.wikipedia.org/wiki/Punctuation
*/
/* typography */
@@ -67,7 +68,7 @@ pre, kbd, samp, code, var, .b {
span {
font: medium 'Arial Unicode MS', FreeSerif, OpenSymbol, 'DejaVu Sans', sans-serif;
}
pre span {
pre span, .a {
font-family: 'Arial Unicode MS', 'DejaVu Sans', FreeSerif, OpenSymbol, sans-serif;
}
.baa {
@@ -119,11 +120,18 @@ html {
body {
margin: 1.75em 28px;
}
.c {
.c, .a {
clear: both;
text-align: center;
}
.c {
margin: 2.154em 0;
}
.a {
font-size: xx-large;
line-height: .875;
color: #444;
}
form div, #level {
float: right;
}
+12
@@ -33,6 +33,8 @@ body{counter-reset:h1 5}
<p>(I know, there are a lot of exceptions. <i>Man</i> becomes <i>men</i> and <i>woman</i> becomes <i>women</i>, but <i>human</i> becomes <i>humans</i>. <i>Mouse</i> becomes <i>mice</i> and <i>louse</i> becomes <i>lice</i>, but <i>house</i> becomes <i>houses</i>. <i>Knife</i> becomes <i>knives</i> and <i>wife</i> becomes <i>wives</i>, but <i>lowlife</i> becomes <i>lowlifes</i>. And don&#8217;t even get me started on words that are their own plural, like <i>sheep</i>, <i>deer</i>, and <i>haiku</i>.)
<p>Other languages, of course, are completely different.
<p>Let&#8217;s design a Python library that automatically pluralizes English nouns. We&#8217;ll start with just these four rules, but keep in mind that you&#8217;ll inevitably need to add more.
<p class=a>&#x2042;
<h2 id=i-know>I Know, Let&#8217;s Use Regular Expressions!</h2>
<p>So you&#8217;re looking at words, which, at least in English, means you&#8217;re looking at strings of characters. You have rules that say you need to find different combinations of characters, then do different things to them. This sounds like a job for regular expressions!
<p class=d>[<a href=examples/plural1.py>download <code>plural1.py</code></a>]
@@ -117,6 +119,8 @@ def plural(noun):
</ol>
<p>Regular expression substitutions are extremely powerful, and the <code>\1</code> syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder to read, and it doesn&#8217;t directly map to the way you first described the pluralizing rules. You originally laid out rules like &#8220;if the word ends in S, X, or Z, then add ES&#8221;. If you look at this function, you have two lines of code that say &#8220;if the word ends in S, X, or Z, then add ES&#8221;. It doesn&#8217;t get much more direct than that.
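<p>Here is a cut-down sketch of that direct style. It covers only the S, X, or Z rule quoted above plus a default, so it is an illustration and not the full function.

```python
import re

def plural(noun):
    # "If the word ends in S, X, or Z, then add ES."
    if re.search(r"[sxz]$", noun):
        return re.sub(r"$", "es", noun)
    # Default for this sketch: just add S.
    return noun + "s"

print(plural("fax"))  # faxes
print(plural("bus"))  # buses
print(plural("cat"))  # cats
```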
<p class=a>&#x2042;
<h2 id=a-list-of-functions>A List Of Functions</h2>
<p>Now you&#8217;re going to add a level of abstraction. You started by defining a list of rules: if this, do that, otherwise go to the next rule. Let&#8217;s temporarily complicate part of the program so you can simplify another part.
@@ -195,6 +199,8 @@ def plural(noun):
<p>But this is really just a stepping stone to the next section. Let&#8217;s move on&hellip;
<p class=a>&#x2042;
<h2 id=a-list-of-patterns>A List Of Patterns</h2>
<p>Defining separate named functions for each match and apply rule isn&#8217;t really necessary. You never call them directly; you add them to the <var>rules</var> list and call them through there. Furthermore, each function follows one of two patterns. All the match functions call <code>re.search()</code>, and all the apply functions call <code>re.sub()</code>. Let&#8217;s factor out the patterns so that defining new rules can be easier.
@@ -241,6 +247,8 @@ def build_match_and_apply_functions(pattern, search, replace):
<li>Since the <var>rules</var> list is the same as the previous example (really, it is), it should come as no surprise that the <code>plural()</code> function hasn&#8217;t changed at all. It&#8217;s completely generic; it takes a list of rule functions and calls them in order. It doesn&#8217;t care how the rules are defined. In the previous example, they were defined as separate named functions. Now they are built dynamically by mapping the output of the <code>build_match_and_apply_functions()</code> function onto a list of raw strings. It doesn&#8217;t matter; the <code>plural()</code> function still works the same way.
</ol>
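<p>Pulling the callouts above together, a minimal sketch could look like this. The signature of <code>build_match_and_apply_functions()</code> comes from the hunk header; the three patterns are a shortened, assumed rule set.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    # Each inner function remembers the strings it was built with (a closure).
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

# Assumed sample patterns: (match pattern, search pattern, replacement).
patterns = (
    (r"[sxz]$", r"$", "es"),
    (r"[^aeiou]y$", r"y$", "ies"),
    (r"$", r"$", "s"),
)
rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

def plural(noun):
    # Generic loop: the first rule that matches wins.
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural("vacancy"))  # vacancies
```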
<p class=a>&#x2042;
<h2 id=a-file-of-patterns>A File Of Patterns</h2>
<p>You&#8217;ve factored out all the duplicate code and added enough abstractions so that the pluralization rules are defined in a list of strings. The next logical step is to take these strings and put them in a separate file, where they can be maintained separately from the code that uses them.
@@ -286,6 +294,8 @@ finally:
<p>The improvement here is that you&#8217;ve completely separated the pluralization rules into an external file, so it can be maintained separately from the code that uses it. Code is code, data is data, and life is good.
<p class=a>&#x2042;
<h2 id=generators>Generators</h2>
<p>Wouldn&#8217;t it be grand to have a generic <code>plural()</code> function that parses the rules file? Get rules, check for a match, apply appropriate transformation, go to next rule. That&#8217;s all the <code>plural()</code> function has to do, and that&#8217;s all the <code>plural()</code> function should do.
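<p>One way to sketch that idea is a generator that yields one rule at a time, so parsing stops as soon as a rule matches. To stay self-contained, this sketch reads the rules from an in-memory <code>io.StringIO</code> object instead of a real file; the sample rules and the extra <var>pattern_file</var> parameter are assumptions.

```python
import io
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

def rules(pattern_file):
    # A generator: each line is parsed only when the caller asks for the next rule.
    for line in pattern_file:
        pattern, search, replace = line.split()
        yield build_match_and_apply_functions(pattern, search, replace)

def plural(noun, pattern_file):
    for matches_rule, apply_rule in rules(pattern_file):
        if matches_rule(noun):
            return apply_rule(noun)

sample_rules = "[sxz]$ $ es\n[^aeiou]y$ y$ ies\n$ $ s\n"
print(plural("fax", io.StringIO(sample_rules)))      # faxes
print(plural("vacancy", io.StringIO(sample_rules)))  # vacancies
```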
@@ -389,6 +399,8 @@ def plural(noun):
<p>To do that, you&#8217;ll need to build your own iterator. But before you do <em>that</em>, you need to learn about Python classes.
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
<li><a href=http://www.python.org/dev/peps/pep-0255/>PEP 255: Simple Generators</a>
+10
@@ -30,22 +30,32 @@ h1:before{counter-increment:h1;content:""}
<!--
<p>But since you&#8217;re here, I&#8217;d like to talk about some of the small stuff I sweated while writing this book.
<p class=a>&#x2042;
<h2 id=typography>Typography</h2>
<p>vertical rhythm, best available ampersand, curly quotes/apostrophes, other stuff from webtypography.net
<p class=a>&#x2042;
<h2 id=graphics>Graphics</h2>
<p>Unicode, callouts, font-family issues on Windows
<p class=a>&#x2042;
<h2 id=performance>Performance</h2>
<p>"Dive Into History 2009 edition", minimizing CSS + JS + HTML, inline CSS, async jQuery
<p class=a>&#x2042;
<h2 id=fun>Fun stuff</h2>
<p>Quotes, constrained writing, PapayaWhip
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
+12
@@ -49,6 +49,8 @@ body{counter-reset:h1 6}
<p><code>class</code>? What&#8217;s a class?
<p class=a>&#x2042;
<h2 id=defining-classes>Defining Classes</h2>
<p>Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you&#8217;ve defined.
@@ -89,6 +91,8 @@ class Fib:
<p>In the <code>__init__()</code> method, <var>self</var> refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify <var>self</var> explicitly when defining the method, you do <em>not</em> specify it when calling the method; Python will add it for you automatically.
<p class=a>&#x2042;
<h2 id=instantiating-classes>Instantiating Classes</h2>
<p>Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the <code>__init__()</code> method requires. The return value will be the newly created object.
@@ -112,6 +116,8 @@ class Fib:
<p><span>&#x261E;</span>In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit <code>new</code> operator like <abbr>C++</abbr> or Java.
</blockquote>
<p class=a>&#x2042;
<h2 id=instance-variables>Instance Variables</h2>
<p>On to the next line:
@@ -148,6 +154,8 @@ class Fib:
<samp class=p>>>> </samp><kbd>fib2.max</kbd>
<samp>200</samp></pre>
<p class=a>&#x2042;
<h2 id=a-fibonacci-iterator>A Fibonacci Iterator</h2>
<p><em>Now</em> you&#8217;re ready to learn how to build an iterator. An iterator is just a class that defines an <code>__iter__()</code> method.
@@ -195,6 +203,8 @@ class Fib:
<li>How does the <code>for</code> loop know when to stop? I&#8217;m glad you asked! When <code>next(fib_iter)</code> raises a <code>StopIteration</code> exception, the <code>for</code> loop will swallow the exception and gracefully exit. (Any other exception will pass through and be raised as usual.) And where have you seen a <code>StopIteration</code> exception? In the <code>__next__()</code> method, of course!
</ul>
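<p>To see the whole protocol in one place, here is a compact sketch of a Fibonacci iterator along the lines described above. The class name <code>Fib</code> and the <var>max</var> attribute appear in this chapter; the method bodies are a reconstruction and may differ from the chapter&#8217;s listing.

```python
class Fib:
    """Iterator that yields Fibonacci numbers up to and including max."""

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        # Called once by iter(); resets the state and returns the iterator.
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        # Called by next(); raising StopIteration ends a for loop cleanly.
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

for n in Fib(20):
    print(n, end=" ")  # 0 1 1 2 3 5 8 13
print()
```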
<p class=a>&#x2042;
<h2 id=a-plural-rule-iterator>A Plural Rule Iterator</h2>
<aside>iter(f) calls f.__iter__<br>next(f) calls f.__next__</aside>
@@ -356,6 +366,8 @@ rules = LazyRules()</code></pre>
<li><strong>Separation of code and data.</strong> All the patterns are stored in a separate file. Code is code, and data is data, and never the twain shall meet.
</ol>
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
<li><a href=http://www.python.org/dev/peps/pep-0234/>PEP 234: Iterators</a>
+14
@@ -33,6 +33,8 @@ body{counter-reset:h1 2}
</ol>
<p>Of course, there are a lot more types than these seven. <a href=your-first-python-program.html#everythingisanobject>Everything is an object</a> in Python, so there are types like <i>module</i>, <i>function</i>, <i>class</i>, <i>method</i>, <i>file</i>, and even <i>compiled code</i>. You&#8217;ve already seen some of these: <a href=your-first-python-program.html#runningscripts>modules have names</a>, <a href=your-first-python-program.html#docstrings>functions have <code>docstrings</code></a>, <i class=baa>&amp;</i>c. You&#8217;ll learn about classes in [FIXME xref] and files in [FIXME xref].
<p>Strings and bytes are important enough &mdash; and complicated enough &mdash; that they get their own chapter. Let&#8217;s look at the others first.
<p class=a>&#x2042;
<h2 id=booleans>Booleans</h2>
<aside>You can use virtually any expression in a boolean context.</aside>
<p>Booleans are either true or false. Python has two constants, <code>True</code> and <code>False</code>, which can be used to assign boolean values directly. Expressions can also evaluate to a boolean value. In certain places (like <code>if</code> statements), Python expects an expression to evaluate to a boolean value. These places are called <i>boolean contexts</i>. You can use virtually any expression in a boolean context, and Python will try to determine its truth value. Different datatypes have different rules about which values are true or false in a boolean context. (This will make more sense once you see some concrete examples later in this chapter.)
@@ -50,6 +52,8 @@ body{counter-reset:h1 2}
<samp class=p>>>> </samp><kbd>size = -1</kbd>
<samp class=p>>>> </samp><kbd>size &lt; 0</kbd>
<samp>True</samp></pre>
<p class=a>&#x2042;
<h2 id=numbers>Numbers</h2>
<p>Numbers are awesome. There are so many to choose from. Python supports both integers and floating point numbers. There&#8217;s no type declaration to distinguish them; Python tells them apart by the presence or absence of a decimal point.
<pre class=screen>
@@ -182,6 +186,8 @@ body{counter-reset:h1 2}
<li>Non-zero floating point numbers are true; <code>0.0</code> is false. Be careful with this one! If there&#8217;s the slightest rounding error (not impossible, as you saw in the previous section) then Python will be testing <code>0.0000000000001</code> instead of <code>0</code> and will return <code>True</code>.
<li>Fractions can also be used in a boolean context. <code>Fraction(0, n)</code> is false for all values of <var>n</var>. All other fractions are true.
</ol>
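<p>The floating point warning above is easy to reproduce. In this small sketch the variable names are illustrative.

```python
from fractions import Fraction

# Mathematically this is zero, but binary rounding leaves a tiny remainder.
almost_zero = 0.1 + 0.2 - 0.3

print(almost_zero == 0)       # False
print(bool(almost_zero))      # True, because the remainder is non-zero
print(bool(0.0))              # False
print(bool(Fraction(0, 5)))   # False
print(bool(Fraction(1, 5)))   # True
```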
<p class=a>&#x2042;
<h2 id=lists>Lists</h2>
<p>Lists are Python&#8217;s workhorse datatype. When I say &#8220;list,&#8221; you might be thinking &#8220;array whose size I have to declare in advance, that can only contain items of the same type, <i class=baa>&amp;</i>c.&#8221; Don&#8217;t think that. Lists are much cooler than that.
<blockquote class="note compare perl5">
@@ -326,9 +332,13 @@ ValueError: list.index(x): x not in list</samp></pre>
<li>Any list with at least one item is true. The value of the items is irrelevant.
</ol>
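<p>A short sketch of lists in a boolean context. The <code>is_it_true()</code> helper is named later in this chapter; its body here is an assumption that returns the message instead of printing it.

```python
def is_it_true(anything):
    # The if statement puts the argument in a boolean context.
    if anything:
        return "yes, it's true"
    return "no, it's false"

print(is_it_true([]))       # no, it's false
print(is_it_true(["a"]))    # yes, it's true
print(is_it_true([False]))  # yes, it's true: one item, whatever its value
```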
<!--
<p class=a>&#x2042;
<h2 id=sets>Sets</h2>
<p>FIXME
-->
<p class=a>&#x2042;
<h2 id=dictionaries>Dictionaries</h2>
<p>One of Python&#8217;s most important datatypes is the dictionary, which defines one-to-one relationships between keys and values.
<blockquote class="note compare perl5">
@@ -419,6 +429,8 @@ KeyError: 'db.diveintopython3.org'</samp></pre>
<li>In a boolean context, an empty dictionary is false.
<li>Any dictionary with at least one key-value pair is true.
</ol>
<p class=a>&#x2042;
<h2 id=none><code>None</code></h2>
<p><code>None</code> is a special constant in Python. It is a null value. <code>None</code> is not the same as <code>False</code>. <code>None</code> is not <code>0</code>. <code>None</code> is not an empty string. Comparing <code>None</code> to anything other than <code>None</code> will always return <code>False</code>.
<p><code>None</code> is the only null value. It has its own datatype (<code>NoneType</code>). You can assign <code>None</code> to any variable, but you cannot create other <code>NoneType</code> objects. All variables whose value is <code>None</code> are equal to each other.
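<p>These claims can be checked in a few lines. This sketch uses the variable names <var>x</var> and <var>y</var> purely for illustration.

```python
print(type(None).__name__)  # NoneType

# None is not False, not 0, and not an empty string.
print(None == False)  # False
print(None == 0)      # False
print(None == "")     # False

# Every variable holding None refers to the same single object.
x = None
y = None
print(x == y)  # True
print(x is y)  # True
```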
@@ -453,6 +465,8 @@ KeyError: 'db.diveintopython3.org'</samp></pre>
<samp>no, it's false</samp>
<samp class=p>>>> </samp><kbd>is_it_true(not None)</kbd>
<samp>yes, it's true</samp></pre>
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
<li><a href=http://docs.python.org/3.0/library/fractions.html>The <code>fractions</code> module</a>
+6
@@ -115,6 +115,8 @@ Ran 11 tests in 0.156s
<p>Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may <em>seem</em> like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn&#8217;t pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run <em>all</em> the test cases along with your new one, you are much less likely to break old code when fixing new code. Today&#8217;s unit test is tomorrow&#8217;s regression test.
<p class=a>&#x2042;
<h2 id=changing-requirements>Handling Changing Requirements</h2>
<p>Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don&#8217;t know what they want until they see it, and even if they do, they aren&#8217;t that good at articulating what they want precisely enough to be useful. And even if they do, they&#8217;ll want more in the next release anyway. So be prepared to update your test cases as requirements change.
@@ -289,6 +291,8 @@ Ran 12 tests in 0.203s
<p>Comprehensive unit testing means never having to rely on a programmer who says &#8220;Trust me.&#8221;
<p class=a>&#x2042;
<h2 id=refactoring>Refactoring</h2>
<p>The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually <em>prove</em> that you didn&#8217;t. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
@@ -452,6 +456,8 @@ OK</samp></pre>
<li>Unit tests can give you the confidence to do large-scale refactoring.
</ul>
<p class=a>&#x2042;
<h2 id=summary>Summary</h2>
<p>Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you&#8217;ve seen it work, you&#8217;ll wonder how you ever got along without it.
+12
@@ -26,6 +26,8 @@ body{counter-reset:h1 4}
<blockquote class="note compare perl5">
<p><span>&#x261E;</span>If you&#8217;ve used regular expressions in other languages (like Perl 5), Python&#8217;s syntax will be very familiar. Read the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
</blockquote>
<p class=a>&#x2042;
<h2 id=streetaddresses>Case Study: Street Addresses</h2>
<p>This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don&#8217;t just make this stuff up; it&#8217;s actually useful.) This example shows how I approached the problem.
<pre class=screen>
@@ -68,6 +70,8 @@ body{counter-reset:h1 4}
<li><em>*sigh*</em> Unfortunately, I soon found more cases that contradicted my logic. In this case, the street address contained the word <code>'ROAD'</code> as a whole word by itself, but it wasn&#8217;t at the end, because the address had an apartment number after the street designation. Because <code>'ROAD'</code> isn&#8217;t at the very end of the string, it doesn&#8217;t match, so the entire call to <code>re.sub()</code> ends up replacing nothing at all, and you get the original string back, which is not what you want.
<li>To solve this problem, I removed the <code>$</code> character and added another <code>\b</code>. Now the regular expression reads &#8220;match <code>'ROAD'</code> when it&#8217;s a whole word by itself anywhere in the string,&#8221; whether at the end, the beginning, or somewhere in the middle.
</ol>
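<p>The last two callouts can be compared side by side. This is a sketch; the sample address is made up for illustration.

```python
import re

s = "100 BROAD ROAD APT. 3"

# With $, 'ROAD' must be the last word, so nothing is replaced here.
print(re.sub(r"\bROAD$", "RD.", s))    # 100 BROAD ROAD APT. 3

# With \b on both sides, 'ROAD' matches as a whole word anywhere,
# and the 'ROAD' inside 'BROAD' is left alone.
print(re.sub(r"\bROAD\b", "RD.", s))   # 100 BROAD RD. APT. 3
```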
<p class=a>&#x2042;
<h2 id=romannumerals>Case Study: Roman Numerals</h2>
<p>You&#8217;ve most likely seen Roman numerals, even if you didn&#8217;t recognize them. You may have seen them in copyrights of old movies and television shows (&#8220;Copyright <code>MCMXLVI</code>&#8221; instead of &#8220;Copyright <code>1946</code>&#8221;), or on the dedication walls of libraries or universities (&#8220;established <code>MDCCCLXXXVIII</code>&#8221; instead of &#8220;established <code>1888</code>&#8221;). You may also have seen them in outlines and bibliographical references. It&#8217;s a system of representing numbers that really does date back to the ancient Roman empire (hence the name).
<p>In Roman numerals, there are seven characters that are repeated and combined in various ways to represent numbers.
@@ -157,6 +161,8 @@ body{counter-reset:h1 4}
<li>Interestingly, an empty string still matches this pattern, because all the <code>M</code> characters are optional and ignored, and the empty string matches the <code>D?C?C?C?</code> pattern where all the characters are optional and ignored.
</ol>
<p>Whew! See how quickly regular expressions can get nasty? And you&#8217;ve only covered the thousands and hundreds places of Roman numerals. But if you followed all that, the tens and ones places are easy, because they&#8217;re exactly the same pattern. But let&#8217;s look at another way to express the pattern.
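<p>For reference, here is a sketch of a thousands-and-hundreds pattern of the kind discussed above. The exact pattern string is an assumption assembled from the pieces named in the callouts.

```python
import re

# Up to three optional M characters, then one of the hundreds forms.
pattern = r"^M?M?M?(CM|CD|D?C?C?C?)$"

print(bool(re.search(pattern, "MCM")))     # True  (1900)
print(bool(re.search(pattern, "MMMCCC")))  # True  (3300)
print(bool(re.search(pattern, "MCMC")))    # False (nothing may follow CM)
print(bool(re.search(pattern, "")))        # True  (every part is optional)
```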
<p class=a>&#x2042;
<h2 id=nmsyntax>Using The <code>{n,m}</code> Syntax</h2>
<aside>{1,4} matches between 1 and 4 occurrences of a pattern.</aside>
<p>In the previous section, you were dealing with a pattern where the same character could be repeated up to three times. There is another way to express this in regular expressions, which some people find more readable. First look at the method we already used in the previous example.
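<p>As a small preview, the sketch below checks that the two spellings accept the same strings for the thousands place. The patterns are assumptions written for this comparison.

```python
import re

old_style = r"^M?M?M?$"
new_style = r"^M{0,3}$"   # between zero and three M characters

for candidate in ["", "M", "MM", "MMM", "MMMM"]:
    # Both patterns agree on every candidate; only 'MMMM' is rejected.
    print(repr(candidate),
          bool(re.search(old_style, candidate)),
          bool(re.search(new_style, candidate)))
```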
@@ -240,6 +246,8 @@ body{counter-reset:h1 4}
</ol>
<p>If you followed all that and understood it on the first try, you&#8217;re doing better than I did. Now imagine trying to understand someone else&#8217;s regular expressions, in the middle of a critical function of a large program. Or even imagine coming back to your own regular expressions a few months later. I&#8217;ve done it, and it&#8217;s not a pretty sight.
<p>Now let&#8217;s explore an alternate syntax that can help keep your expressions maintainable.
<p class=a>&#x2042;
<h2 id=verbosere>Verbose Regular Expressions</h2>
<p>So far you&#8217;ve just been dealing with what I&#8217;ll call &#8220;compact&#8221; regular expressions. As you&#8217;ve seen, they are difficult to read, and even if you figure out what one does, that&#8217;s no guarantee that you&#8217;ll be able to understand it six months later. What you really need is inline documentation.
<p>Python allows you to do this with something called <i>verbose regular expressions</i>. A verbose regular expression is different from a compact regular expression in two ways:
@@ -273,6 +281,8 @@ body{counter-reset:h1 4}
<li>This matches the start of the string, then three of a possible three <code>M</code>, then <code>D</code> and three of a possible three <code>C</code>, then <code>L</code> and three of a possible three <code>X</code>, then <code>V</code> and three of a possible three <code>I</code>, then the end of the string.
<li>This does not match. Why? Because it doesn&#8217;t have the <code>re.VERBOSE</code> flag, so the <code>re.search</code> function is treating the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can&#8217;t auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose.
</ol>
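<p>The last two points can be sketched in a few lines. The pattern below is an illustrative verbose Roman numeral pattern of the kind this section describes; treat the exact layout and comments as an assumption, not a quotation of the chapter&#8217;s listing.

```python
import re

# An illustrative verbose pattern; whitespace and "#" comments are only
# ignored when the re.VERBOSE flag is passed.
pattern = r'''
    ^                   # beginning of string
    M{0,3}              # thousands: 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds: 900 (CM), 400 (CD), 0-300 or 500-800
    (XC|XL|L?X{0,3})    # tens: 90 (XC), 40 (XL), 0-30 or 50-80
    (IX|IV|V?I{0,3})    # ones: 9 (IX), 4 (IV), 0-3 or 5-8
    $                   # end of string
    '''

with_flag = re.search(pattern, 'MMMDCCCLXXXVIII', re.VERBOSE)
without_flag = re.search(pattern, 'MMMDCCCLXXXVIII')

print(with_flag is not None)     # the verbose pattern matches
print(without_flag is not None)  # treated as compact, so it does not match
```

<p>Without the flag, the leading newline, the spaces, and the hash marks are all treated as literal characters to match, which is why the second search comes back empty.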
<p class=a>&#x2042;
<h2 id=phonenumbers>Case study: Parsing Phone Numbers</h2>
<aside>\d matches any numeric digit (0&ndash;9). \D matches anything but digits.</aside>
<p>So far you&#8217;ve concentrated on matching whole patterns. Either the pattern matches, or it doesn&#8217;t. But regular expressions are much more powerful than that. When a regular expression <em>does</em> match, you can pick out specific pieces of it. You can find out what matched where.
@@ -404,6 +414,8 @@ body{counter-reset:h1 4}
<li>Other than being spread out over multiple lines, this is exactly the same regular expression as the last step, so it&#8217;s no surprise that it parses the same inputs.
<li>Final sanity check. Yes, this still works. You&#8217;re done.
</ol>
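<p>To make &#8220;picking out specific pieces&#8221; concrete, here is a hedged sketch of a verbose phone number pattern with four remembered groups. The pattern, the name <code>phone_pattern</code>, and the sample inputs are assumptions for illustration; they follow the approach of this case study but are not guaranteed to match its final listing character for character.

```python
import re

# Hypothetical pattern: three digit groups plus an optional extension,
# separated by any run of non-digit characters.
phone_pattern = re.compile(r'''
    (\d{3})     # area code is 3 digits
    \D*         # optional separator is any number of non-digits
    (\d{3})     # trunk is 3 digits
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits
    \D*         # optional separator
    (\d*)       # extension is optional and can be any number of digits
    $           # end of string
    ''', re.VERBOSE)

with_extension = phone_pattern.search('work 1-(800) 555.1212 #1234').groups()
plain = phone_pattern.search('800-555-1212').groups()

print(with_extension)   # ('800', '555', '1212', '1234')
print(plain)            # ('800', '555', '1212', '')
```

<p>Because the pattern is not anchored at the start, leading text such as <code>work 1-(</code> is skipped, and a missing extension shows up as an empty string, not a failed match.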
<p class=a>&#x2042;
<h2 id=summary>Summary</h2>
<p>This is just the tiniest tip of the iceberg of what regular expressions can do. In other words, even though you&#8217;re completely overwhelmed by them now, believe me, you ain&#8217;t seen nothing yet.
<p>You should now be familiar with the following techniques:
+14
@@ -47,6 +47,8 @@ My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr
<p>Now cry a lot, because everything you thought you knew about strings is wrong, and there ain&#8217;t no such thing as &#8220;plain text.&#8221;
<p class=a>&#x2042;
<h2 id=one-ring-to-rule-them-all>Unicode</h2>
<p><i>Enter Unicode.</i>
@@ -75,6 +77,8 @@ My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr
<p>Advantages: super-efficient encoding of common <abbr>ASCII</abbr> characters. No worse than UTF-16 for extended Latin characters. Better than UTF-32 for Chinese characters. Also (and you&#8217;ll have to trust me on this, because I&#8217;m not going to show you the math), due to the exact nature of the bit twiddling, there are no byte-ordering issues. A document encoded in UTF-8 uses the exact same stream of bytes on any computer.
<p class=a>&#x2042;
<h2 id=divingin>Diving In</h2>
<p>In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python string encoded as CP-1252. &#8220;Is this string UTF-8?&#8221; is an invalid question. UTF-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
@@ -94,6 +98,8 @@ My alphabet starts where your alphabet ends! <span>&#x275E;</span><br>&mdash; Dr
<li>Just like lists, you can concatenate strings using the <code>+</code> operator.
</ol>
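<p>The basic operations on strings can be tried out in a few lines. This is a minimal sketch; the sample string is an assumption chosen to include non-<abbr>ASCII</abbr> characters.

```python
s = '深入 Python'   # sample string mixing Chinese and ASCII characters

print(len(s))      # counts characters, not bytes
print(s[0])        # indexing returns a single character
print(s + ' 3')    # + concatenates strings, just as it does for lists
```

<p>The length is 9 because each Chinese character counts as one character, no matter how many bytes a given encoding would need to store it.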
<p class=a>&#x2042;
<h2 id=formatting-strings>Formatting Strings</h2>
<aside>Strings can be defined with either single or double quotes.</aside>
@@ -213,6 +219,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<p>For all the gory details on format specifiers, consult the <a href=http://docs.python.org/3.0/library/string.html#format-specification-mini-language>Format Specification Mini-Language</a> in the official Python documentation.
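<p>A few format specifiers in action, as a small sketch. The values and variable names here are made up for illustration.

```python
size, suffix = 698.24, 'GB'

one_decimal = '{0:.1f} {1}'.format(size, suffix)   # round to one decimal place
right_aligned = '{0:>10}|'.format('right')         # pad on the left to width 10
zero_padded = '{0:08.3f}'.format(3.14159)          # width 8, zero fill, 3 decimals

print(one_decimal)     # 698.2 GB
print(right_aligned)
print(zero_padded)     # 0003.142
```

<p>Everything after the colon inside the braces is the format specifier; the number before the colon picks which argument of <code>format()</code> to use.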
<p class=a>&#x2042;
<h2 id=common-string-methods>Other Common String Methods</h2>
<p>Besides formatting, strings can do a number of other useful tricks.
@@ -261,6 +269,8 @@ experience of years.</samp>
<li>Finally, Python can turn that list-of-lists into a dictionary simply by passing it to the <code>dict()</code> function.
</ol>
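<p>The steps in this list can be sketched end to end as follows. The query string is an assumed sample input, and the variable names are illustrative.

```python
query = 'user=pilgrim&database=master&password=PapayaWhip'

# Step 1: split the string into a list of 'key=value' strings.
a_list = query.split('&')

# Step 2: split each item once on '=' to get a list of [key, value] lists.
a_list_of_lists = [item.split('=', 1) for item in a_list]

# Step 3: dict() accepts any sequence of two-item sequences.
a_dict = dict(a_list_of_lists)

print(a_list_of_lists)
print(a_dict)
```

<p>The second argument to <code>split()</code> limits the number of splits, so a value that itself contains an equals sign would stay in one piece.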
<p class=a>&#x2042;
<h2 id=byte-arrays>Strings vs. Bytes</h2>
<p>Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a <i>string</i>. An immutable sequence of numbers-between-0-and-255 is called a <i>bytes</i> object.
@@ -365,6 +375,8 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<li>This is a string. It has nine characters. It is the sequence of characters you get when you take <var>by</var> and decode it using the Big5 encoding algorithm. It is identical to the original string.
</ol>
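<p>The round trip between strings and bytes can be sketched like this. The sample string is an assumption; <var>by</var> follows the naming used in the list above.

```python
a_string = '深入 Python'          # nine characters

by = a_string.encode('big5')     # string -> bytes, using the Big5 encoding
roundtrip = by.decode('big5')    # bytes -> string, using the same encoding

print(type(by).__name__, len(by))        # a bytes object; length is in bytes
print(len(a_string.encode('utf-8')))     # same characters, different byte count
print(roundtrip == a_string)

# Bytes and strings cannot be mixed without an explicit conversion.
try:
    by + a_string
except TypeError as error:
    print('TypeError:', error)
```

<p>The number of bytes depends on the encoding, but decoding with the same encoding that produced the bytes always gives back the original string.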
<p class=a>&#x2042;
<h2 id=py-encoding>Postscript: Character Encoding Of Python Source Code</h2>
<p>Python 3 assumes that your source code &mdash; <i>i.e.</i> each <code>.py</code> file &mdash; is encoded in UTF-8.
@@ -384,6 +396,8 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<p>For more information, consult <a href=http://www.python.org/dev/peps/pep-0263/><abbr>PEP</abbr> 263: Defining Python Source Code Encodings</a>.
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<p>On Unicode in Python:
+8
@@ -41,6 +41,8 @@ body{counter-reset:h1 8}
<li>When maintaining code, it helps you cover your ass when someone comes screaming that your latest change broke their old code. (&#8220;But <em>sir</em>, all the unit tests passed when I checked it in...&#8221;)
<li>When writing code in a team, it increases confidence that the code you&#8217;re about to commit isn&#8217;t going to break someone else&#8217;s code, because you can run their unit tests first. (I&#8217;ve seen this sort of thing in code sprints. A team breaks up the assignment, everybody takes the specs for their task, writes unit tests for it, then shares their unit tests with the rest of the team. That way, nobody goes off too far into developing code that doesn&#8217;t play well with others.)
</ul>
<p class=a>&#x2042;
<h2 id=romantest1>A Single Question</h2>
<aside>Every test is an island.</aside>
<p>A test case answers a single question about the code it is testing. A test case should be able to...
@@ -221,6 +223,8 @@ OK</samp></pre>
<li>Hooray! The <code>to_roman()</code> function passes the &#8220;known values&#8221; test case. It&#8217;s not comprehensive, but it does put the function through its paces with a variety of inputs, including inputs that produce every single-character Roman numeral, the largest possible input (<code>3999</code>), and the input that produces the longest possible Roman numeral (<code>3888</code>). At this point, you can be reasonably confident that the function works for any good input value you could throw at it.
</ol>
<p>&#8220;Good&#8221; input? Hmm. What about bad input?
<p class=a>&#x2042;
<h2 id=romantest2>&#8220;Halt And Catch Fire&#8221;</h2>
<aside>The Pythonic way to halt and catch fire is to raise an exception.</aside>
<p>It is not enough to test that functions succeed when given good input; you must also test that they fail when given bad input. And not just any sort of failure; they must fail in the way you expect.
@@ -334,6 +338,8 @@ OK</samp></pre>
<li>Hooray! Both tests pass. Because you worked iteratively, bouncing back and forth between testing and coding, you can be sure that the two lines of code you just wrote were the cause of that one test going from &#8220;fail&#8221; to &#8220;pass.&#8221; That kind of confidence doesn&#8217;t come cheap, but it will pay for itself over the lifetime of your code.
</ol>
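<p>For readers who want the whole loop in one place, here is a self-contained sketch of a &#8220;bad input&#8221; test next to the code it exercises. The <code>to_roman()</code> body, the <code>ROMAN_NUMERAL_MAP</code> name, and the class names are stand-ins written for this example; they are assumptions, not the chapter&#8217;s exact listing.

```python
import unittest


class OutOfRangeError(ValueError):
    """Raised when a number cannot be expressed as a Roman numeral."""


# Stand-in lookup table, ordered from largest value to smallest.
ROMAN_NUMERAL_MAP = (
    ('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
    ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
    ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1),
)


def to_roman(n):
    """Convert an integer to a Roman numeral (illustrative stand-in)."""
    # These two lines are what make the "too large" test pass.
    if n > 3999:
        raise OutOfRangeError('number out of range (must be less than 4000)')
    result = ''
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result


class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        """to_roman should fail with large input"""
        self.assertRaises(OutOfRangeError, to_roman, 4000)


class KnownValues(unittest.TestCase):
    def test_longest_numeral(self):
        """to_roman should give a known result for a known input"""
        self.assertEqual(to_roman(3888), 'MMMDCCCLXXXVIII')
```

<p>Saved as a module, this can be run with <code>python3 -m unittest</code>. Commenting out the two guard lines should make <code>test_too_large</code> fail while the known-value test keeps passing.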
<p class=a>&#x2042;
<h2 id=romantest3>More Halting, More Fire</h2>
<p>Along with testing numbers that are too large, you need to test numbers that are too small. As <a href=#divingin>we noted in our functional requirements</a>, Roman numerals cannot express <code>0</code> or negative numbers.
@@ -430,6 +436,8 @@ Ran 4 tests in 0.016s
OK</samp></pre>
<p class=a>&#x2042;
<h2 id=romantest4>And One More Thing&hellip;</h2>
<p>There was one more <a href=#divingin>functional requirement</a> for converting numbers to Roman numerals: dealing with non-integers.
+16
@@ -91,6 +91,8 @@ mark{display:inline}
&lt;/entry>
&lt;/feed></code></pre>
<p class=a>&#x2042;
<h2 id=xml-intro>A 5-Minute Crash Course in XML</h2>
<p>If you already know about <abbr>XML</abbr>, you can skip this section.
@@ -173,6 +175,8 @@ mark{display:inline}
<p>And now you know just enough <abbr>XML</abbr> to be dangerous!
<p class=a>&#x2042;
<h2 id=xml-structure>The Structure Of An Atom Feed</h2>
<p>Think of a weblog, or in fact any website with frequently updated content, like <a href=http://www.cnn.com/>CNN.com</a>. The site itself has a title (&#8220;CNN.com&#8221;), a subtitle (&#8220;Breaking News, U.S., World, Weather, Entertainment <i class=baa>&amp;</i> Video News&#8221;), a last-updated date (&#8220;updated 12:43 p.m. EDT, Sat May 16, 2009&#8221;), and a list of articles posted at different times. Each article also has a title, a first-published date (and maybe also a last-updated date, if they published a correction or fixed a typo), and a unique URL.
@@ -242,6 +246,8 @@ mark{display:inline}
<li>Finally, the end tag for the <code>entry</code> element, signaling the end of the metadata for this article.
</ol>
<p class=a>&#x2042;
<h2 id=xml-parse>Parsing XML</h2>
<p>Python can parse <abbr>XML</abbr> documents in several ways. It has traditional <a href=http://en.wikipedia.org/wiki/XML#DOM><abbr>DOM</abbr></a> and <a href=http://en.wikipedia.org/wiki/Simple_API_for_XML><abbr>SAX</abbr></a> parsers, but I will focus on a different library called ElementTree.
@@ -320,6 +326,8 @@ mark{display:inline}
<li>The <code>updated</code> element has no attributes, so its <code>.attrib</code> is just an empty dictionary.
</ol>
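<p>The same ideas can be tried on a tiny document without a file on disk. This sketch uses the standard library&#8217;s <code>xml.etree.ElementTree</code>; the three-element feed is an invented sample, not the feed used in this chapter.

```python
import xml.etree.ElementTree as etree

# A made-up, minimal Atom feed for illustration.
FEED = '''<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>example feed</title>
  <updated>2009-03-27T21:56:07Z</updated>
  <link rel="alternate" type="text/html" href="http://example.org/"/>
</feed>'''

root = etree.fromstring(FEED)

print(root.tag)        # namespace + local name: {http://www.w3.org/2005/Atom}feed
print(len(root))       # number of child elements: 3
print(root.attrib)     # xml:lang appears under the built-in XML namespace

updated, link = root[1], root[2]
print(link.attrib)     # attributes are exposed as a dictionary
print(updated.attrib)  # no attributes, so an empty dictionary: {}
```

<p>An element behaves like a list of its children, and its attributes behave like a dictionary, so ordinary indexing and key lookups are all that is needed here.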
<p class=a>&#x2042;
<h2 id=xml-find>Searching For Nodes Within An XML Document</h2>
<p>So far, we&#8217;ve worked with this <abbr>XML</abbr> document &#8220;from the top down,&#8221; starting with the root element, getting its child elements, and so on throughout the document. But many uses of <abbr>XML</abbr> require you to find specific elements. ElementTree can do that, too.
@@ -433,6 +441,8 @@ StopIteration</samp></pre>
<p>Overall, ElementTree&#8217;s <code>findall()</code> method is a very powerful feature, but the query language can be a bit surprising. It is officially described as &#8220;<a href=http://effbot.org/zone/element-xpath.htm>limited support for XPath expressions</a>.&#8221; <a href=http://www.w3.org/TR/xpath>XPath</a> is a W3C standard for querying <abbr>XML</abbr> documents. ElementTree&#8217;s query language is similar enough to XPath to do basic searching, but dissimilar enough that it may annoy you if you already know XPath. Now let&#8217;s look at a third-party <abbr>XML</abbr> library that extends the ElementTree <abbr>API</abbr> with full XPath support.
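<p>For reference, the basic searches that ElementTree does support can be sketched with the standard library alone. The feed below is an invented sample and the variable names are assumptions.

```python
import xml.etree.ElementTree as etree

NS = '{http://www.w3.org/2005/Atom}'   # Atom namespace in ElementTree notation

# A made-up feed with two entries, for illustration only.
FEED = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example feed</title>
  <entry><title>first</title><link href="http://example.org/1"/></entry>
  <entry><title>second</title><link href="http://example.org/2"/></entry>
</feed>'''

root = etree.fromstring(FEED)

entries = root.findall(NS + 'entry')           # direct children only
direct_links = root.findall(NS + 'link')       # links are grandchildren here
all_links = root.findall('.//' + NS + 'link')  # './/' searches at any depth
first_title = root.find(NS + 'entry').find(NS + 'title').text

print(len(entries), len(direct_links), len(all_links))
print([link.attrib['href'] for link in all_links])
print(first_title)
```

<p>The surprise for XPath users is mostly in what is missing: a plain tag name only looks at direct children, and the <code>.//</code> prefix is needed to search the whole subtree.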
<p class=a>&#x2042;
<h2 id=xml-lxml>Going Further With lxml</h2>
<p><a href=http://codespeak.net/lxml/>lxml</a> is an open source third-party library that builds on the popular <a href=http://www.xmlsoft.org/>libxml2 parser</a>. It provides a 100% compatible ElementTree <abbr>API</abbr>, then extends it with full XPath support and a few other niceties. There are <a href=http://pypi.python.org/pypi/lxml/>installers available for Windows</a>; Linux users should always try to use distribution-specific tools like <code>yum</code> or <code>apt-get</code> to install precompiled binaries from their repositories. Otherwise you&#8217;ll need to <a href=http://codespeak.net/lxml/installation.html>install lxml manually</a>.
@@ -480,6 +490,8 @@ except ImportError:
<li>XPath expressions don&#8217;t always return a list of elements. Technically, the <abbr>DOM</abbr> of a parsed <abbr>XML</abbr> document doesn&#8217;t contain elements; it contains <i>nodes</i>. Depending on their type, nodes can be elements, attributes, or even text content. The result of an XPath query is a list of nodes. This query returns a list of text nodes: the text content (<code>text()</code>) of the <code>title</code> element (<code>atom:title</code>) that is a child of the current element (<code>./</code>).
</ol>
<p class=a>&#x2042;
<h2 id=xml-generate>Generating XML</h2>
<p>Python&#8217;s support for <abbr>XML</abbr> is not limited to parsing existing documents. You can also create <abbr>XML</abbr> documents from scratch.
@@ -549,6 +561,8 @@ except ImportError:
<li>You can also apply &#8220;pretty printing&#8221; to the serialization, which inserts line breaks after end tags, and after start tags of elements that contain child elements but no text content. In technical terms, lxml adds &#8220;insignificant whitespace&#8221; to make the output more readable.
</ol>
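<p>The core of building a document from scratch also works with the standard library&#8217;s ElementTree, without lxml&#8217;s pretty printing. The sketch below is a minimal example under that assumption; the element names and text are invented.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

# Create the root element, with an attribute in the built-in XML namespace.
new_feed = etree.Element(ATOM + 'feed', attrib={XML_LANG: 'en'})

# Add a child element, then set its text content.
title = etree.SubElement(new_feed, ATOM + 'title', attrib={'type': 'html'})
title.text = 'dive into &hellip;'

# Serialize to a string; special characters in text are escaped for you.
serialized = etree.tostring(new_feed, encoding='unicode')
print(serialized)

# Parsing the output again should give back the same text and attributes.
reparsed = etree.fromstring(serialized)
print(reparsed[0].text)
```

<p>The serialized output may look different from what you would write by hand (for example, in the namespace prefix it picks), but an <abbr>XML</abbr> parser will read it back as the same document.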
<p class=a>&#x2042;
<h2 id=xml-custom-parser>Customizing Your XML Parser</h2>
<p>The <abbr>XML</abbr> specification mandates that all conforming <abbr>XML</abbr> parsers employ &#8220;draconian error handling.&#8221; That is, they must halt and catch fire as soon as they detect any sort of well-formedness error in the <abbr>XML</abbr> document. Well-formedness errors include mismatched start and end tags, undefined entities, illegal Unicode characters, and a number of other esoteric rules. This is in stark contrast to other common formats like <abbr>HTML</abbr> &mdash; your browser doesn&#8217;t stop rendering a web page if you forget to close an <abbr>HTML</abbr> tag or escape an ampersand in an attribute value. (It is a common misconception that <abbr>HTML</abbr> has no defined error handling. <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#parsing><abbr>HTML</abbr> error handling</a> is actually quite well-defined, but it&#8217;s significantly more complicated than &#8220;halt and catch fire on first error.&#8221;)
@@ -610,6 +624,8 @@ lxml.etree.XMLSyntaxError: Entity 'hellip' not defined, line 3, column 28</samp>
<p>It is important to reiterate that there is <strong>no guarantee of interoperability</strong> with &#8220;recovering&#8221; <abbr>XML</abbr> parsers. A different parser might decide that it recognized the <code>&amp;hellip;</code> entity from <abbr>HTML</abbr>, and replace it with <code>&amp;amp;hellip;</code> instead. Is that &#8220;better&#8221;? Maybe. Is it &#8220;more correct&#8221;? No, they are both equally incorrect. The correct behavior (according to the <abbr>XML</abbr> specification) is to halt and catch fire. If you&#8217;ve decided not to do that, you&#8217;re on your own.
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
+12
@@ -70,6 +70,8 @@ if __name__ == "__main__":
<p>So why does running the script on the command line give you the same output every time? We&#8217;ll get to that. First, let&#8217;s look at that <code>approximate_size()</code> function.
<p class=a>&#x2042;
<h2 id=declaringfunctions>Declaring Functions</h2>
<p>Python has functions like most other languages, but it does not have separate header files like <abbr>C++</abbr> or <code>interface</code>/<code>implementation</code> sections like Pascal. When you need a function, just declare it, like this:
<pre><code>def approximate_size(size, a_kilobyte_is_1024_bytes=True):</code></pre>
@@ -129,6 +131,8 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<li>This call fails too, for the same reason as the previous call. Is that surprising? After all, you passed <code>4000</code> for the argument named <code>size</code>, so &#8220;obviously&#8221; that <code>False</code> value was meant for the <var>a_kilobyte_is_1024_bytes</var> argument. But Python doesn&#8217;t work that way. As soon as you have a named argument, all arguments to the right of that need to be named arguments, too.
</ol>
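<p>The rules in this list can be checked with a short script. The function body below is a simplified stand-in that only handles one unit; only the signature comes from the declaration shown earlier, and everything else is an assumption for illustration.

```python
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    """Simplified stand-in: express size in kilobytes only."""
    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    suffix = 'KiB' if a_kilobyte_is_1024_bytes else 'KB'
    return '{0:.1f} {1}'.format(size / multiple, suffix)


# All four of these calls are valid and equivalent.
print(approximate_size(4000, False))
print(approximate_size(4000, a_kilobyte_is_1024_bytes=False))
print(approximate_size(size=4000, a_kilobyte_is_1024_bytes=False))
print(approximate_size(a_kilobyte_is_1024_bytes=False, size=4000))

# A positional argument after a named one is rejected before the call runs.
# compile() is used so the SyntaxError can be caught and shown.
try:
    compile('approximate_size(size=4000, False)', '<demo>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```

<p>Note that the bad call never reaches the function at all. Python refuses to compile it, which is why the error is a <code>SyntaxError</code> and not a <code>TypeError</code>.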
<p class=a>&#x2042;
<h2 id=readability>Writing Readable Code</h2>
<p>I won&#8217;t bore you with a long finger-wagging speech about the importance of documenting your code. Just know that code is written once but read many times, and the most important audience for your code is yourself, six months after writing it (i.e. after you&#8217;ve forgotten everything but need to fix something). Python makes it easy to write readable code, so take advantage of it. You&#8217;ll thank me in six months.
<h3 id=docstrings>Documentation Strings</h3>
@@ -153,6 +157,8 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<blockquote class=note>
<p><span>&#x261E;</span>Many Python <abbr>IDE</abbr>s use the <code>docstring</code> to provide context-sensitive documentation, so that when you type a function name, its <code>docstring</code> appears as a tooltip. This can be incredibly helpful, but it&#8217;s only as good as the <code>docstring</code>s you write.
</blockquote>
<p class=a>&#x2042;
<h2 id=everythingisanobject>Everything Is An Object</h2>
<p>In case you missed it, I just said that Python functions have attributes, and that those attributes are available at runtime. A function, like everything else in Python, is an object.
<p>Run the interactive Python shell and follow along:
@@ -215,6 +221,8 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<p>Still, this doesn&#8217;t answer the more fundamental question: what is an object? Different programming languages define &#8220;object&#8221; in different ways. In some, it means that <em>all</em> objects <em>must</em> have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser. Some objects have neither attributes nor methods, <em>but they could</em>. Not all objects are subclassable. But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.
<p>You may have heard the term &#8220;first-class object&#8221; in other programming contexts. In Python, functions are <i>first-class objects</i>. You can pass a function as an argument to another function. Modules are <i>first-class objects</i>. You can pass an entire module as an argument to a function. Classes are first-class objects, and individual instances of a class are also first-class objects.
<p>This is important, so I&#8217;m going to repeat it in case you missed it the first few times: <em>everything in Python is an object</em>. Strings are objects. Lists are objects. Functions are objects. Classes are objects. Class instances are objects. Even modules are objects.
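<p>A short sketch makes the point tangible. The <code>describe()</code> helper is hypothetical, written only for this example.

```python
import math


def describe(obj):
    """Return the type name of whatever object is passed in."""
    return type(obj).__name__


print(describe.__doc__)    # a function has attributes, such as its docstring

alias = describe           # a function can be assigned to a variable...
print(alias(describe))     # ...and passed as an argument: 'function'

print(describe(math))      # a module is an object too: 'module'
print(describe(int))       # so is a class: 'type'
print(describe(42))        # and so is an instance: 'int'
print(describe('text'))    # strings are objects as well: 'str'
```

<p>Nothing special was needed to pass a function, a module, or a class around; they are handled exactly like the number and the string.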
<p class=a>&#x2042;
<h2 id=indentingcode>Indenting Code</h2>
<p>Python functions have no explicit <code>begin</code> or <code>end</code>, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (<code>:</code>) and the indentation of the code itself.
<pre><code>
@@ -240,6 +248,8 @@ SyntaxError: non-keyword arg after keyword arg</samp></pre>
<blockquote class="note compare java">
<p><span>&#x261E;</span>Python uses line breaks to separate statements and a colon and indentation to separate code blocks. <abbr>C++</abbr> and Java use semicolons to separate statements and curly braces to separate code blocks.
</blockquote>
<p class=a>&#x2042;
<h2 id=runningscripts>Running Scripts</h2>
<aside>Everything in Python is an object.</aside>
<p>Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them, by including a special block of code that executes when you run the Python file on the command line. Take the last few lines of <code>humansize.py</code>:
@@ -261,6 +271,8 @@ if __name__ == "__main__":
<samp>1.0 TB
931.3 GiB</samp></pre>
<p>And that&#8217;s your first Python program!
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
<li><a href=http://www.python.org/dev/peps/pep-0257/>PEP 257: Docstring Conventions</a> explains what distinguishes a good <code>docstring</code> from a great <code>docstring</code>.