diff --git a/advanced-classes.html b/advanced-classes.html index 5cf84e4..fd2bb4e 100644 --- a/advanced-classes.html +++ b/advanced-classes.html @@ -98,6 +98,8 @@ class OrderedDict(dict, collections.MutableMapping): return all(p==q for p, q in itertools.zip_longest(self.items(), other.items())) return dict.__eq__(self, other) +
⁂ +
⁂ +
The first thing this alphametics solver does is find all the letters (A–Z) in the puzzle. @@ -98,6 +100,8 @@ if __name__ == '__main__':
⁂ +
Set comprehensions make it trivial to find the unique items in a sequence. [FIXME-not sure if I’m going to cover set comprehensions in an earlier chapter; if not, this is certainly an abrupt and inadequate introduction to the topic.] @@ -127,6 +131,8 @@ if __name__ == '__main__':
This list is later used to assign digits to characters as the solver iterates through the possible solutions. +
⁂ +
Like many programming languages, Python has an assert statement. Here’s how it works.
@@ -155,6 +161,8 @@ AssertionError
The alphametics solver uses this exact assert statement to bail out early if the puzzle contains more than ten unique letters. Since each letter is assigned a unique digit, and there are only ten digits, a puzzle with more than ten unique letters is unsolvable.
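Here is a minimal sketch of that early bail-out. It assumes the puzzle is a plain string; the helper name `unique_letters` is hypothetical and is not taken from the solver itself.

```python
import re

def unique_letters(puzzle):
    """Return the distinct letters A-Z used in a puzzle (hypothetical helper)."""
    words = re.findall('[A-Z]+', puzzle.upper())
    # A set comprehension keeps exactly one copy of each letter.
    return {letter for word in words for letter in word}

letters = unique_letters('SEND + MORE == MONEY')
# Only ten digits exist, so a puzzle with more than ten letters is unsolvable.
assert len(letters) <= 10, 'Too many letters'
print(sorted(letters))
```

If the puzzle had eleven or more distinct letters, the assert would raise an AssertionError before any permutations were tried.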
+
⁂ +
A generator expression is like a generator function without the function. @@ -185,6 +193,8 @@ AssertionError gen = ord_map(unique_characters) +
⁂ +
First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you’re doing. Here I’m talking about combinatorics, but if that doesn’t mean anything to you, don’t worry about it. As always, Wikipedia is your friend.) @@ -249,6 +259,8 @@ StopIteration
permutations() function always returns an iterator, an easy way to debug permutations is to pass that iterator to the built-in list() function to see all the permutations immediately.
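A small sketch of that debugging trick, using a three-item list chosen purely for illustration:

```python
import itertools

perms = itertools.permutations([1, 2, 3], 2)  # an iterator; nothing is computed yet
all_perms = list(perms)                       # consume it to see every permutation
print(all_perms)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

leftover = list(perms)                        # the iterator is now exhausted
print(leftover)
# []
```

Keep in mind that list() uses up the iterator, so this is for inspection only. Build a fresh iterator before looping over the permutations for real.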
+⁂ +
itertools Module >>> import itertools @@ -372,6 +384,8 @@ for guess in itertools.permutations(digits, len(characters)): But what is this
translate() method? Ah, now you’re getting to the really fun part. +⁂ +
A New Kind Of String Manipulation
Python strings have many methods. You learned about some of those methods in the Strings chapter:
lower(), count(), and format(). Now I want to introduce you to a powerful but little-known string manipulation technique: the translate() method. @@ -411,6 +425,8 @@ for guess in itertools.permutations(digits, len(characters)): That’s pretty impressive. But what can you do with a string that happens to be a valid Python expression? +
⁂ +
Evaluating Arbitrary Strings As Python Expressions
This is the final piece of the puzzle (or rather, the final piece of the puzzle solver). After all that fancy string manipulation, we’re left with a string like
'9567 + 1085 == 10652'. But that’s a string, and what good is a string? Enter eval(), the universal Python evaluation tool. @@ -542,6 +558,8 @@ NameError: name '__import__' is not defined
So, in the end, it is possible to safely evaluate untrusted Python expressions. Passing {"__builtins__": None} as the second parameter to the eval() function is non-intuitive (and not the default behavior), but it does work. If you understand why it works, you’re less likely to use eval() incorrectly, in a way that works with trusted input but has potentially devastating consequences with untrusted input.
+
⁂ +
To recap: this program solves alphametic puzzles by brute force, i.e. through an exhaustive search of all possible solutions. To do this, it… @@ -559,6 +577,8 @@ NameError: name '__import__' is not defined
…in just 14 lines of code. +
⁂ +
A Unipony, as it were.
I’ll settle for character encoding auto-detection. +
⁂ +
It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key. @@ -39,6 +41,8 @@ del{background:#f87}
As it turns out, yes. All major browsers have character encoding auto-detection, because the web is full of pages that have no encoding information whatsoever. Mozilla Firefox contains an encoding auto-detection library which is open source. I ported the library to Python 2 and dubbed it the chardet module. This chapter will take you step-by-step through the process of porting the chardet module from Python 2 to Python 3.
+
⁂ +
chardet Module[FIXME download link, possibly on chardet.feedparser.org, possibly local]
Before we set off porting the code, it would help if you understood how the code worked! This is a brief guide to navigating the code itself. @@ -70,6 +74,8 @@ del{background:#f87}
Hebrew is handled as a special case. If the text appears to be Hebrew based on 2-character distribution analysis, HebrewProber (defined in hebrewprober.py) tries to distinguish between Visual Hebrew (where the source text is actually stored “backwards” line-by-line, and then displayed verbatim so it can be read from right to left) and Logical Hebrew (where the source text is stored in reading order and then rendered right-to-left by the client). Because certain characters are encoded differently based on whether they appear in the middle of or at the end of a word, we can make a reasonable guess about direction of the source text, and return the appropriate encoding (windows-1255 for Logical Hebrew, or ISO-8859-8 for Visual Hebrew).
windows-1252 If UniversalDetector detects a high-bit character in the text, but none of the other multi-byte or single-byte encoding probers return a confident result, it creates a Latin1Prober (defined in latin1prober.py) to try to detect English text in a windows-1252 encoding. This detection is inherently unreliable, because English letters are encoded in the same way in many different encodings. The only way to distinguish windows-1252 is through commonly used symbols like smart quotes, curly apostrophes, copyright symbols, and the like. Latin1Prober automatically reduces its confidence rating to allow more accurate probers to win if at all possible.
+
⁂ +
2to3 We’re going to migrate the chardet module from Python 2 to Python 3. Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. In some cases this is easy (a function was renamed or moved to a different module), but in other cases it can get pretty complex. To get a sense of all that it can do, refer to the appendix, Porting code to Python 3 with 2to3. In this chapter, we’ll start by running 2to3 on the chardet package, but as you’ll see, there will still be a lot of work to do after the automated tools have performed their magic.
The main chardet package is split across several different files, all in the same directory. The 2to3 script makes it easy to convert multiple files at once: just pass a directory as a command line argument, and 2to3 will convert each of the files in turn.
@@ -572,6 +578,8 @@ RefactoringTool: Files that were modified:
RefactoringTool: test.py
[FIXME explain the difference in import syntax]
Well, that wasn’t so hard. Just a few imports and print statements to convert. Time to run the new version. Do you think it’ll work? +
⁂ +
2to3 Can’tFalse is invalid syntaxHoly crap, it actually works! /me does a little dance +
⁂ +
What have we learned?
(I know, there are a lot of exceptions. Man becomes men and woman becomes women, but human becomes humans. Mouse becomes mice and louse becomes lice, but house becomes houses. Knife becomes knives and wife becomes wives, but lowlife becomes lowlifes. And don’t even get me started on words that are their own plural, like sheep, deer, and haiku.)
Other languages, of course, are completely different.
Let’s design a Python library that automatically pluralizes English nouns. We’ll start with just these four rules, but keep in mind that you’ll inevitably need to add more. +
⁂ +
So you’re looking at words, which, at least in English, means you’re looking at strings of characters. You have rules that say you need to find different combinations of characters, then do different things to them. This sounds like a job for regular expressions!
[download plural1.py]
@@ -117,6 +119,8 @@ def plural(noun):
Regular expression substitutions are extremely powerful, and the \1 syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder to read, and it doesn’t directly map to the way you first described the pluralizing rules. You originally laid out rules like “if the word ends in S, X, or Z, then add ES”. If you look at this function, you have two lines of code that say “if the word ends in S, X, or Z, then add ES”. It doesn’t get much more direct than that.
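To make the "one rule, two lines" idea concrete, here is a rough sketch of a rule-per-branch plural() function. The exact patterns are my assumption of what four such rules could look like; treat them as illustrative, not as the definitive contents of plural1.py.

```python
import re

def plural(noun):
    # Rule 1: if the word ends in S, X, or Z, add ES.
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    # Rule 2: if the word ends in a noisy H (as in "coach"), add ES.
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    # Rule 3: if the word ends in a consonant followed by Y, change Y to IES.
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    # Rule 4: otherwise, just add S.
    else:
        return noun + 's'

for word in ('box', 'coach', 'vacancy', 'day', 'dog'):
    print(word, '->', plural(word))
```

Each branch reads almost exactly like the English rule it implements, which is the point of keeping the match and the substitution separate.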
+
⁂ +
Now you’re going to add a level of abstraction. You started by defining a list of rules: if this, do that, otherwise go to the next rule. Let’s temporarily complicate part of the program so you can simplify another part. @@ -195,6 +199,8 @@ def plural(noun):
But this is really just a stepping stone to the next section. Let’s move on… +
⁂ +
Defining separate named functions for each match and apply rule isn’t really necessary. You never call them directly; you add them to the rules list and call them through there. Furthermore, each function follows one of two patterns. All the match functions call re.search(), and all the apply functions call re.sub(). Let’s factor out the patterns so that defining new rules can be easier.
@@ -241,6 +247,8 @@ def build_match_and_apply_functions(pattern, search, replace):
plural() function hasn’t changed at all. It’s completely generic; it takes a list of rule functions and calls them in order. It doesn’t care how the rules are defined. In the previous example, they were defined as separate named functions. Now they are built dynamically by mapping the output of the build_match_and_apply_functions() function onto a list of raw strings. It doesn’t matter; the plural() function still works the same way.
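The following sketch shows one way the pieces described above could fit together. The function name and its three parameters come from the text; the body, the pattern strings, and the names `patterns` and `rules` are assumptions made for illustration.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    # Each call builds a pair of closures that remember their own strings.
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

# Illustrative rule data: (pattern to match, text to search for, replacement).
patterns = (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's'),          # catch-all: every word has an end
)
rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

def plural(noun):
    # Generic driver: it has no idea how the rule functions were built.
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('fax'), plural('boy'), plural('vacancy'))
```

Adding a new rule now means adding one tuple of strings; no new function definitions are needed.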
+⁂ +
You’ve factored out all the duplicate code and added enough abstractions so that the pluralization rules are defined in a list of strings. The next logical step is to take these strings and put them in a separate file, where they can be maintained separately from the code that uses them. @@ -286,6 +294,8 @@ finally:
The improvement here is that you’ve completely separated the pluralization rules into an external file, so it can be maintained separately from the code that uses it. Code is code, data is data, and life is good. +
⁂ +
Wouldn’t it be grand to have a generic plural() function that parses the rules file? Get rules, check for a match, apply appropriate transformation, go to next rule. That’s all the plural() function has to do, and that’s all the plural() function should do.
@@ -389,6 +399,8 @@ def plural(noun):
To do that, you’ll need to build your own iterator. But before you do that, you need to learn about Python classes. +
⁂ +
⁂ +
One of Python’s most important datatypes is the dictionary, which defines one-to-one relationships between keys and values.
@@ -419,6 +429,8 @@ KeyError: 'db.diveintopython3.org'- In a boolean context, an empty dictionary is false.
- Any dictionary with at least one key-value pair is true. +
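A quick sketch of those two rules in action. The helper is_it_true() here returns its message instead of printing it, which is a simplification assumed for this example.

```python
def is_it_true(anything):
    # Any object can be used directly in a boolean context.
    if anything:
        return "yes, it's true"
    return "no, it's false"

empty = {}
one_pair = {'server': 'db.diveintopython3.org'}

print(is_it_true(empty))     # an empty dictionary is false
print(is_it_true(one_pair))  # at least one key-value pair makes it true
```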
⁂ +
None
None is a special constant in Python. It is a null value. None is not the same as False. None is not 0. None is not an empty string. Comparing None to anything other than None will always return False.
None is the only null value. It has its own datatype (NoneType). You can assign None to any variable, but you cannot create other NoneType objects. All variables whose value is None are equal to each other. @@ -453,6 +465,8 @@ KeyError: 'db.diveintopython3.org' no, it's false >>> is_it_true(not None) yes, it's true +⁂ +
Further Reading
+
- The
fractions module diff --git a/refactoring.html b/refactoring.html index c5c0074..26cfda7 100644 --- a/refactoring.html +++ b/refactoring.html @@ -115,6 +115,8 @@ Ran 11 tests in 0.156s Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn’t pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today’s unit test is tomorrow’s regression test. +
⁂ +
Handling Changing Requirements
Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don’t know what they want until they see it, and even if they do, they aren’t that good at articulating what they want precisely enough to be useful. And even if they do, they’ll want more in the next release anyway. So be prepared to update your test cases as requirements change. @@ -289,6 +291,8 @@ Ran 12 tests in 0.203s
Comprehensive unit testing means never having to rely on a programmer who says “Trust me.” +
⁂ +
Refactoring
The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn’t. The best thing about unit testing is that it gives you the freedom to refactor mercilessly. @@ -452,6 +456,8 @@ OK
- Unit tests can give you the confidence to do large-scale refactoring.
⁂ +
Summary
Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you’ve seen it work, you’ll wonder how you ever got along without it. diff --git a/regular-expressions.html b/regular-expressions.html index 1f5de1c..45380c3 100644 --- a/regular-expressions.html +++ b/regular-expressions.html @@ -26,6 +26,8 @@ body{counter-reset:h1 4}
+☞If you’ve used regular expressions in other languages (like Perl 5), Python’s syntax will be very familiar. Read the summary of the
re module to get an overview of the available functions and their arguments. ⁂ +
Case Study: Street Addresses
This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don’t just make this stuff up; it’s actually useful.) This example shows how I approached the problem.
@@ -68,6 +70,8 @@ body{counter-reset:h1 4}- *sigh* Unfortunately, I soon found more cases that contradicted my logic. In this case, the street address contained the word
'ROAD' as a whole word by itself, but it wasn’t at the end, because the address had an apartment number after the street designation. Because 'ROAD' isn’t at the very end of the string, it doesn’t match, so the entire call to re.sub() ends up replacing nothing at all, and you get the original string back, which is not what you want. - To solve this problem, I removed the
$ character and added another \b. Now the regular expression reads “match 'ROAD' when it’s a whole word by itself anywhere in the string,” whether at the end, the beginning, or somewhere in the middle. +⁂ +
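The difference between the two patterns is easy to see side by side. This is a small sketch; the sample address is chosen for illustration.

```python
import re

s = '100 BROAD ROAD APT. 3'

# Anchored at the end of the string: 'ROAD' is not last, so nothing is replaced.
anchored = re.sub(r'\bROAD$', 'RD.', s)
print(anchored)   # 100 BROAD ROAD APT. 3

# Word boundaries on both sides: matches the whole word anywhere in the string,
# while leaving the 'ROAD' inside 'BROAD' alone.
bounded = re.sub(r'\bROAD\b', 'RD.', s)
print(bounded)    # 100 BROAD RD. APT. 3
```

The raw string prefix matters here: without it, Python would read `\b` as a backspace character before the regular expression engine ever saw it.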
Case Study: Roman Numerals
You’ve most likely seen Roman numerals, even if you didn’t recognize them. You may have seen them in copyrights of old movies and television shows (“Copyright
MCMXLVI” instead of “Copyright 1946”), or on the dedication walls of libraries or universities (“established MDCCCLXXXVIII” instead of “established 1888”). You may also have seen them in outlines and bibliographical references. It’s a system of representing numbers that really does date back to the ancient Roman empire (hence the name). In Roman numerals, there are seven characters that are repeated and combined in various ways to represent numbers. @@ -157,6 +161,8 @@ body{counter-reset:h1 4}
- Interestingly, an empty string still matches this pattern, because all the
M characters are optional and ignored, and the empty string matches the D?C?C?C? pattern where all the characters are optional and ignored. Whew! See how quickly regular expressions can get nasty? And you’ve only covered the thousands and hundreds places of Roman numerals. But if you followed all that, the tens and ones places are easy, because they’re exactly the same pattern. But let’s look at another way to express the pattern. +
⁂ +
Using The
{n,m} Syntax In the previous section, you were dealing with a pattern where the same character could be repeated up to three times. There is another way to express this in regular expressions, which some people find more readable. First look at the method we already used in the previous example. @@ -240,6 +246,8 @@ body{counter-reset:h1 4}
If you followed all that and understood it on the first try, you’re doing better than I did. Now imagine trying to understand someone else’s regular expressions, in the middle of a critical function of a large program. Or even imagine coming back to your own regular expressions a few months later. I’ve done it, and it’s not a pretty sight.
Now let’s explore an alternate syntax that can help keep your expressions maintainable. +
⁂ +
Verbose Regular Expressions
So far you’ve just been dealing with what I’ll call “compact” regular expressions. As you’ve seen, they are difficult to read, and even if you figure out what one does, that’s no guarantee that you’ll be able to understand it six months later. What you really need is inline documentation.
Python allows you to do this with something called verbose regular expressions. A verbose regular expression is different from a compact regular expression in two ways: @@ -273,6 +281,8 @@ body{counter-reset:h1 4}
- This matches the start of the string, then three of a possible three
M, then D and three of a possible three C, then L and three of a possible three X, then V and three of a possible three I, then the end of the string. - This does not match. Why? Because it doesn’t have the
re.VERBOSE flag, so the re.search function is treating the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can’t auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose. +⁂ +
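For reference, here is a sketch of what a verbose Roman numeral pattern can look like, together with the flag that makes it work. The layout and comments are my own arrangement, assumed for illustration.

```python
import re

pattern = '''
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds - 900, 400, or 0-300 with optional 500
    (XC|XL|L?X{0,3})    # tens - 90, 40, or 0-30 with optional 50
    (IX|IV|V?I{0,3})    # ones - 9, 4, or 0-3 with optional 5
    $                   # end of string
    '''

# With re.VERBOSE, whitespace and comments in the pattern are ignored.
with_flag = re.search(pattern, 'MMMDCCCLXXXVIII', re.VERBOSE)

# Without the flag, the spaces, newlines, and hash marks are taken literally.
without_flag = re.search(pattern, 'MMMDCCCLXXXVIII')

print(with_flag is not None, without_flag is not None)
```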
Case study: Parsing Phone Numbers
So far you’ve concentrated on matching whole patterns. Either the pattern matches, or it doesn’t. But regular expressions are much more powerful than that. When a regular expression does match, you can pick out specific pieces of it. You can find out what matched where. @@ -404,6 +414,8 @@ body{counter-reset:h1 4}
- Other than being spread out over multiple lines, this is exactly the same regular expression as the last step, so it’s no surprise that it parses the same inputs.
- Final sanity check. Yes, this still works. You’re done. +
⁂ +
Summary
This is just the tiniest tip of the iceberg of what regular expressions can do. In other words, even though you’re completely overwhelmed by them now, believe me, you ain’t seen nothing yet.
You should now be familiar with the following techniques: diff --git a/strings.html b/strings.html index 8720d8c..1fe3d0d 100644 --- a/strings.html +++ b/strings.html @@ -47,6 +47,8 @@ My alphabet starts where your alphabet ends! ❞
— DrNow cry a lot, because everything you thought you knew about strings is wrong, and there ain’t no such thing as “plain text.” +
⁂ +
Unicode
Enter Unicode. @@ -75,6 +77,8 @@ My alphabet starts where your alphabet ends! ❞
— DrAdvantages: super-efficient encoding of common ASCII characters. No worse than UTF-16 for extended Latin characters. Better than UTF-32 for Chinese characters. Also (and you’ll have to trust me on this, because I’m not going to show you the math), due to the exact nature of the bit twiddling, there are no byte-ordering issues. A document encoded in UTF-8 uses the exact same stream of bytes on any computer. +
⁂ +
Diving In
In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python string encoded as CP-1252. "Is this string UTF-8?" is an invalid question. UTF-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions. @@ -94,6 +98,8 @@ My alphabet starts where your alphabet ends! ❞
— Dr- Just like lists, you can concatenate strings using the
+ operator. +⁂ +
Formatting Strings
@@ -213,6 +219,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):For all the gory details on format specifiers, consult the Format Specification Mini-Language in the official Python documentation. +
⁂ +
Other Common String Methods
Besides formatting, strings can do a number of other useful tricks. @@ -261,6 +269,8 @@ experience of years.
- Finally, Python can turn that list-of-lists into a dictionary simply by passing it to the
dict() function. +⁂ +
Strings vs. Bytes
Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a string. An immutable sequence of numbers-between-0-and-255 is called a bytes object. @@ -365,6 +375,8 @@ TypeError: Can't convert 'bytes' object to str implicitly
- This is a string. It has nine characters. It is the sequence of characters you get when you take by and decode it using the Big5 encoding algorithm. It is identical to the original string. +
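The whole round trip fits in a few lines. This sketch assumes a nine-character sample string and reuses the variable name `by` from the text; the other names are illustrative.

```python
a_string = '深入 Python'          # a string: a sequence of nine Unicode characters
print(len(a_string))

by = a_string.encode('big5')      # string -> bytes, using the Big5 encoding
print(type(by).__name__)

roundtrip = by.decode('big5')     # bytes -> string, using the same encoding
print(roundtrip == a_string)

utf8_bytes = a_string.encode('utf-8')   # same characters, different bytes
print(len(utf8_bytes))
```

The character count never changes, but the byte count depends entirely on which encoding you pick.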
⁂ +
Postscript: Character Encoding Of Python Source Code
Python 3 assumes that your source code (i.e. each
.py file) is encoded in UTF-8. @@ -384,6 +396,8 @@ TypeError: Can't convert 'bytes' object to str implicitly For more information, consult PEP 263: Defining Python Source Code Encodings. +
⁂ +
Further Reading
On Unicode in Python: diff --git a/unit-testing.html b/unit-testing.html index 7428c35..f8a0b30 100644 --- a/unit-testing.html +++ b/unit-testing.html @@ -41,6 +41,8 @@ body{counter-reset:h1 8}
- When maintaining code, it helps you cover your ass when someone comes screaming that your latest change broke their old code. (“But sir, all the unit tests passed when I checked it in...”)
- When writing code in a team, it increases confidence that the code you’re about to commit isn’t going to break someone else’s code, because you can run their unit tests first. (I’ve seen this sort of thing in code sprints. A team breaks up the assignment, everybody takes the specs for their task, writes unit tests for it, then shares their unit tests with the rest of the team. That way, nobody goes off too far into developing code that doesn’t play well with others.)
⁂ +
A test case answers a single question about the code it is testing. A test case should be able to... @@ -221,6 +223,8 @@ OK
to_roman() function passes the “known values” test case. It’s not comprehensive, but it does put the function through its paces with a variety of inputs, including inputs that produce every single-character Roman numeral, the largest possible input (3999), and the input that produces the longest possible Roman numeral (3888). At this point, you can be reasonably confident that the function works for any good input value you could throw at it.
“Good” input? Hmm. What about bad input? +
⁂ +
It is not enough to test that functions succeed when given good input; you must also test that they fail when given bad input. And not just any sort of failure; they must fail in the way you expect. @@ -334,6 +338,8 @@ OK
⁂ +
Along with testing numbers that are too large, you need to test numbers that are too small. As we noted in our functional requirements, Roman numerals cannot express 0 or negative numbers.
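Here is a self-contained sketch of what such bad-input tests might look like. The exception name `OutOfRangeError`, the lookup table, and the test class are assumptions for illustration; only to_roman() and the 1 to 3999 range come from the text.

```python
import unittest

class OutOfRangeError(ValueError):
    """Hypothetical exception for inputs outside 1..3999."""

ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def to_roman(n):
    """Convert an integer to a Roman numeral, rejecting out-of-range input."""
    if not (0 < n < 4000):
        raise OutOfRangeError('number out of range (must be 1..3999)')
    result = ''
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        """to_roman should fail with large input"""
        self.assertRaises(OutOfRangeError, to_roman, 4000)

    def test_zero(self):
        """to_roman should fail with 0 input"""
        self.assertRaises(OutOfRangeError, to_roman, 0)

    def test_negative(self):
        """to_roman should fail with negative input"""
        self.assertRaises(OutOfRangeError, to_roman, -1)
```

Saved to a file, these tests can be run with `python -m unittest`. Note that the tests check for one specific exception, not just "some failure."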
@@ -430,6 +436,8 @@ Ran 4 tests in 0.016s
OK
+
⁂ +
There was one more functional requirement for converting numbers to Roman numerals: dealing with non-integers. diff --git a/xml.html b/xml.html index 818515c..c90aa8d 100644 --- a/xml.html +++ b/xml.html @@ -91,6 +91,8 @@ mark{display:inline} </entry> </feed> +
⁂ +
If you already know about XML, you can skip this section. @@ -173,6 +175,8 @@ mark{display:inline}
And now you know just enough XML to be dangerous! +
⁂ +
Think of a weblog, or in fact any website with frequently updated content, like CNN.com. The site itself has a title (“CNN.com”), a subtitle (“Breaking News, U.S., World, Weather, Entertainment & Video News”), a last-updated date (“updated 12:43 p.m. EDT, Sat May 16, 2009”), and a list of articles posted at different times. Each article also has a title, a first-published date (and maybe also a last-updated date, if they published a correction or fixed a typo), and a unique URL. @@ -242,6 +246,8 @@ mark{display:inline}
entry element, signaling the end of the metadata for this article.
+⁂ +
Python can parse XML documents in several ways. It has traditional DOM and SAX parsers, but I will focus on a different library called ElementTree. @@ -320,6 +326,8 @@ mark{display:inline}
updated element has no attributes, so its .attrib is just an empty dictionary.
+⁂ +
So far, we’ve worked with this XML document “from the top down,” starting with the root element, getting its child elements, and so on throughout the document. But many uses of XML require you to find specific elements. Etree can do that, too. @@ -433,6 +441,8 @@ StopIteration
Overall, ElementTree’s findall() method is a very powerful feature, but the query language can be a bit surprising. It is officially described as “limited support for XPath expressions.” XPath is a W3C standard for querying XML documents. ElementTree’s query language is similar enough to XPath to do basic searching, but dissimilar enough that it may annoy you if you already know XPath. Now let’s look at a third-party XML library that extends the ElementTree API with full XPath support.
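Before moving on, here is a minimal sketch of the ElementTree query style described above, using the standard library module and a tiny inline Atom document invented for this example.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'   # namespace in {braces}, as ElementTree expects

doc = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>dive into mark</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = etree.fromstring(doc)

# Direct children of the root only.
entries = root.findall(ATOM + 'entry')
print(len(entries))

# The .// prefix searches descendants at any depth.
titles = [t.text for t in root.findall('.//' + ATOM + 'title')]
print(titles)
```

Notice that the namespace has to be spelled out in every query, which is one of the ways this differs from what XPath users may expect.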
+
⁂ +
lxml is an open source third-party library that builds on the popular libxml2 parser. It provides a 100% compatible ElementTree API, then extends it with full XPath support and a few other niceties. There are installers available for Windows; Linux users should always try to use distribution-specific tools like yum or apt-get to install precompiled binaries from their repositories. Otherwise you’ll need to install lxml manually.
@@ -480,6 +490,8 @@ except ImportError:
text()) of the title element (atom:title) that is a child of the current element (./).
+⁂ +
Python’s support for XML is not limited to parsing existing documents. You can also create XML documents from scratch. @@ -549,6 +561,8 @@ except ImportError:
⁂ +
The XML specification mandates that all conforming XML parsers employ “draconian error handling.” That is, they must halt and catch fire as soon as they detect any sort of wellformedness error in the XML document. Wellformedness errors include mismatched start and end tags, undefined entities, illegal Unicode characters, and a number of other esoteric rules. This is in stark contrast to other common formats like HTML — your browser doesn’t stop rendering a web page if you forget to close an HTML tag or escape an ampersand in an attribute value. (It is a common misconception that HTML has no defined error handling. HTML error handling is actually quite well-defined, but it’s significantly more complicated than “halt and catch fire on first error.”) @@ -610,6 +624,8 @@ lxml.etree.XMLSyntaxError: Entity 'hellip' not defined, line 3, column 28
It is important to reiterate that there is no guarantee of interoperability with “recovering” XML parsers. A different parser might decide that it recognized the &hellip; entity from HTML, and replace it with &amp;hellip; instead. Is that “better”? Maybe. Is it “more correct”? No, they are both equally incorrect. The correct behavior (according to the XML specification) is to halt and catch fire. If you’ve decided not to do that, you’re on your own.
+
⁂ +
So why does running the script on the command line give you the same output every time? We’ll get to that. First, let’s look at that approximate_size() function.
+
⁂ +
Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it, like this:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
@@ -129,6 +131,8 @@ SyntaxError: non-keyword arg after keyword arg
4000 for the argument named size, then “obviously” that False value was meant for the a_kilobyte_is_1024_bytes argument. But Python doesn’t work that way. As soon as you have a named argument, all arguments to the right of that need to be named arguments, too.
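To see the rule in action, here is a sketch built around the function signature shown earlier. The body and the `SUFFIXES` table are assumptions filled in so the example runs on its own.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size to a human-readable string (illustrative body)."""
    if size < 0:
        raise ValueError('number must be non-negative')
    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)
    raise ValueError('number too large')

# All of these calls are fine: positional first, named afterwards, in any order.
a = approximate_size(4000, False)
b = approximate_size(4000, a_kilobyte_is_1024_bytes=False)
c = approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)
print(a, b, c)

# A positional argument after a named one is rejected before the code even runs.
try:
    compile('approximate_size(size=4000, False)', '<example>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```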
+⁂ +
I won’t bore you with a long finger-wagging speech about the importance of documenting your code. Just know that code is written once but read many times, and the most important audience for your code is yourself, six months after writing it (i.e. after you’ve forgotten everything but need to fix something). Python makes it easy to write readable code, so take advantage of it. You’ll thank me in six months.
+☞Many Python IDEs use the
docstring to provide context-sensitive documentation, so that when you type a function name, its docstring appears as a tooltip. This can be incredibly helpful, but it’s only as good as the docstrings you write.
⁂ +
In case you missed it, I just said that Python functions have attributes, and that those attributes are available at runtime. A function, like everything else in Python, is an object.
Run the interactive Python shell and follow along: @@ -215,6 +221,8 @@ SyntaxError: non-keyword arg after keyword arg
Still, this doesn’t answer the more fundamental question: what is an object? Different programming languages define “object” in different ways. In some, it means that all objects must have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser. Some objects have neither attributes nor methods, but they could. Not all objects are subclassable. But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.
You may have heard the term “first-class object” in other programming contexts. In Python, functions are first-class objects. You can pass a function as an argument to another function. Modules are first-class objects. You can pass an entire module as an argument to a function. Classes are first-class objects, and individual instances of a class are also first-class objects.
This is important, so I’m going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Classes are objects. Class instances are objects. Even modules are objects. +
⁂ +
Python functions have no explicit begin or end, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (:) and the indentation of the code itself.
@@ -240,6 +248,8 @@ SyntaxError: non-keyword arg after keyword arg
+☞Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks.
⁂ +
Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them, by including a special block of code that executes when you run the Python file on the command line. Take the last few lines of humansize.py:
@@ -261,6 +271,8 @@ if __name__ == "__main__":
1.0 TB
931.3 GiB
And that’s your first Python program! +
⁂ +
docstring from a great docstring.