You are here: Home ‣ Dive Into Python 3 ‣
Difficulty level: ♦♢♢♢♢
❝ Don’t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. ❞
— Ven. Henepola Gunaratana
Books about programming usually start with a bunch of boring chapters about fundamentals and eventually work up to building something useful. Let’s skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don’t worry about that, because you’re going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
'''Convert a file size to human-readable form.
Keyword arguments:
size -- file size in bytes
a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
if False, use multiples of 1000
Returns: string
'''
if size < 0:
raise ValueError('number must be non-negative')
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
for suffix in SUFFIXES[multiple]:
size /= multiple
if size < multiple:
return '{0:.1f} {1}'.format(size, suffix)
raise ValueError('number too large')
if __name__ == '__main__':
print(approximate_size(1000000000000, False))
print(approximate_size(1000000000000))
Now let’s run this program on the command line. On Windows, it will look something like this:
c:\home\diveintopython3> c:\python30\python.exe humansize.py 1.0 TB 931.3 GiB
On Mac OS X or Linux, it would look something like this:
you@localhost:~$ python3 humansize.py 1.0 TB 931.3 GiB
What just happened? You executed your first Python program. You called the Python intepreter on the command line, and you passed the name of the script you wanted Python to execute. The script defines a single function, the approximate_size() function, which takes an exact file size in bytes and calculates a “pretty” (but approximate) size. (You’ve probably seen this in Windows Explorer, or the Mac OS X Finder, or Nautilus or Dolphin or Thunar on Linux. If you display a folder of documents as a multi-column list, it will display a table with the document icon, the document name, the size, type, last-modified date, and so on. If the folder contains a 1093-byte file named TODO, your file manager won’t display TODO 1093 bytes; it’ll say something like TODO 1 KB instead. That’s what the approximate_size() function does.)
Look at the bottom of the script, and you’ll see two calls to print(approximate_size(arguments)). These are function calls — first calling the approximate_size() function and passing a number of arguments, then taking the return value and passing it straight on to the print() function. The print() function is built-in; you’ll never see an explicit declaration of it. You can just use it, anytime, anywhere. (There are lots of built-in functions, and lots more functions that are separated into modules. Patience, grasshopper.)
So why does running the script on the command line give you the same output every time? We’ll get to that. First, let’s look at that approximate_size() function.
⁂
Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it, like this:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
The keyword def starts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments are separated with commas.
Also note that the function doesn’t define a return datatype. Python functions do not specify the datatype of their return value; they don’t even specify whether or not they return a value. (In fact, every Python function returns a value; if the function ever executes a return statement, it will return that value, otherwise it will return None, the Python null value.)
☞In some languages, functions (that return a value) start with
function, and subroutines (that do not return a value) start withsub. There are no subroutines in Python. Everything is a function, all functions return a value (even if it’sNone), and all functions start withdef.
The approximate_size() function takes the two arguments — size and a_kilobyte_is_1024_bytes — but neither argument specifies a datatype. In Python, variables are never explicitly typed. Python figures out what type a variable is and keeps track of it internally.
☞In Java and other statically-typed languages, you must specify the datatype of the function return value and each function argument. In Python, you never explicitly specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally.
Python allows function arguments to have default values; if the function is called without the argument, the argument gets its default value. Futhermore, arguments can be specified in any order by using named arguments.
Let’s take another look at that approximate_size() function declaration:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
The second argument, a_kilobyte_is_1024_bytes, specifies a default value of True. This means the argument is optional; you can call the function without it, and Python will act as if you had called it with True as a second parameter.
Now look at the bottom of the script:
if __name__ == '__main__':
print(approximate_size(1000000000000, False)) ①
print(approximate_size(1000000000000)) ②
approximate_size() function with two argument. Within the approximate_size() function, a_kilobyte_is_1024_bytes will be False, since you explicitly passed False as the second argument.
approximate_size() function with only one argument. But that’s OK, because the second argument is optional! Since the caller doesn’t specify, the second argument defaults to True, as defined by the function declaration.
You can also pass values into a function by name.
>>> from humansize import approximate_size >>> approximate_size(4000, a_kilobyte_is_1024_bytes=False) ① '4.0 KB' >>> approximate_size(size=4000, a_kilobyte_is_1024_bytes=False) ② '4.0 KB' >>> approximate_size(a_kilobyte_is_1024_bytes=False, size=4000) ③ '4.0 KB' >>> approximate_size(a_kilobyte_is_1024_bytes=False, 4000) ④ File "<stdin>", line 1 SyntaxError: non-keyword arg after keyword arg >>> approximate_size(size=4000, False) ⑤ File "<stdin>", line 1 SyntaxError: non-keyword arg after keyword arg
approximate_size() function with 4000 for the first argument (size) and False for the argument named a_kilobyte_is_1024_bytes. (That happens to be the second argument, but doesn’t matter, as you’ll see in a minute.)
approximate_size() function with 4000 for the argument named size and False for the argument named a_kilobyte_is_1024_bytes. (These named arguments happen to be in the same order as the arguments are listed in the function declaration, but that doesn’t matter either.)
approximate_size() function with False for the argument named a_kilobyte_is_1024_bytes and 4000 for the argument named size. (See? I told you the order didn’t matter.)
4000 for the argument named size, then “obviously” that False value was meant for the a_kilobyte_is_1024_bytes argument. But Python doesn’t work that way. As soon as you have a named argument, all arguments to the right of that need to be named arguments, too.
⁂
I won’t bore you with a long finger-wagging speech about the importance of documenting your code. Just know that code is written once but read many times, and the most important audience for your code is yourself, six months after writing it (i.e. after you’ve forgotten everything but need to fix something). Python makes it easy to write readable code, so take advantage of it. You’ll thank me in six months.
You can document a Python function by giving it a documentation string (docstring for short). In this program, the approximate_size() function has a docstring:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
'''Convert a file size to human-readable form.
Keyword arguments:
size -- file size in bytes
a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
if False, use multiples of 1000
Returns: string
'''
Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including carriage returns, leading white space, and other quote characters. You can use them anywhere, but you’ll see them most often used when defining a docstring.
☞Triple quotes are also an easy way to define a string with both single and double quotes, like
qq/.../in Perl 5.
Everything between the triple quotes is the function’s docstring, which documents what the function does. A docstring, if it exists, must be the first thing defined in a function (that is, on the next line after the function declaration). You don’t technically need to give your function a docstring, but you always should. I know you’ve heard this in every programming class you’ve ever taken, but Python gives you an added incentive: the docstring is available at runtime as an attribute of the function.
☞Many Python IDEs use the
docstringto provide context-sensitive documentation, so that when you type a function name, itsdocstringappears as a tooltip. This can be incredibly helpful, but it’s only as good as thedocstrings you write.
⁂
In case you missed it, I just said that Python functions have attributes, and that those attributes are available at runtime. A function, like everything else in Python, is an object.
Run the interactive Python shell and follow along:
>>> import humansize ① >>> print(humansize.approximate_size(4096, True)) ② 4.0 KiB >>> print(humansize.approximate_size.__doc__) ③ Convert a file size to human-readable form. Keyword arguments: size -- file size in bytes a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024 if False, use multiples of 1000 Returns: string
humansize program as a module -- a chunk of code that you can use interactively, or from a larger Python program. (You’ll see examples of multi-module Python programs in [FIXME xref].) Once you import a module, you can reference any of its public functions, classes, or attributes. Modules can do this to access functionality in other modules, and you can do it in the Python interactive shell too. This is an important concept, and you’ll see a lot more of it throughout this book.
approximate_size; it must be humansize.approximate_size. If you’ve used classes in Java, this should feel vaguely familiar.
__doc__.
☞
importin Python is likerequirein Perl. Once youimporta Python module, you access its functions withmodule.function; once yourequirea Perl module, you access its functions withmodule::function.
import Search PathBefore this goes any further, I want to briefly mention the library search path. Python looks in several places when you try to import a module. Specifically, it looks in all the directories defined in sys.path. This is just a list, and you can easily view it or modify it with standard list methods. (You’ll learn more about lists in Native Datatypes.)
>>> import sys ① >>> sys.path ② ['', '/usr/lib/python30.zip', '/usr/lib/python3.0', '/usr/lib/python3.0/plat-linux2@EXTRAMACHDEPPATH@', '/usr/lib/python3.0/lib-dynload', '/usr/lib/python3.0/dist-packages', '/usr/local/lib/python3.0/dist-packages'] >>> sys ③ <module 'sys' (built-in)> >>> sys.path.insert(0, '/home/mark/py') ④ >>> sys.path ⑤ ['/home/mark/py', '', '/usr/lib/python30.zip', '/usr/lib/python3.0', '/usr/lib/python3.0/plat-linux2@EXTRAMACHDEPPATH@', '/usr/lib/python3.0/lib-dynload', '/usr/lib/python3.0/dist-packages', '/usr/local/lib/python3.0/dist-packages']
sys module makes all of its functions and attributes available.
sys.path is a list of directory names that constitute the current search path. (Yours will look different, depending on your operating system, what version of Python you’re running, and where it was originally installed.) Python will look through these directories (in this order) for a .py file whose name matches what you’re trying to import.
.py files. Some, like the sys module, are built-in modules; they are actually baked right into Python itself. Built-in modules behave just like regular modules, but their Python source code is not available, because they are not written in Python! (The sys module is written in C.)
sys.path, and then Python will look in that directory as well, whenever you try to import a module. The effect lasts as long as Python is running.
sys.path.insert(0, new_path), you inserted a new directory as the first item of the sys.path list, and therefore at the beginning of Python’s search path. This is almost always what you want. In case of naming conflicts (for example, if Python ships with version 2 of a particular library but you want to use version 3), this ensures that your modules will be found and used instead of the modules that came with Python.
Everything in Python is an object, and everything can have attributes and methods. All functions have a built-in attribute __doc__, which returns the docstring defined in the function’s source code. The sys module is an object which has (among other things) an attribute called path. And so forth.
Still, this doesn’t answer the more fundamental question: what is an object? Different programming languages define “object” in different ways. In some, it means that all objects must have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser. Some objects have neither attributes nor methods, but they could. Not all objects are subclassable. But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.
You may have heard the term “first-class object” in other programming contexts. In Python, functions are first-class objects. You can pass a function as an argument to another function. Modules are first-class objects. You can pass an entire module as an argument to a function. Classes are first-class objects, and individual instances of a class are also first-class objects.
This is important, so I’m going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Classes are objects. Class instances are objects. Even modules are objects.
⁂
Python functions have no explicit begin or end, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (:) and the indentation of the code itself.
def approximate_size(size, a_kilobyte_is_1024_bytes=True): ①
if size < 0: ②
raise ValueError('number must be non-negative') ③
④
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
for suffix in SUFFIXES[multiple]: ⑤
size /= multiple
if size < multiple:
return '{0:.1f} {1}'.format(size, suffix)
raise ValueError('number too large')
if statements, for loops, while loops, and so forth. Indenting starts a block and unindenting ends it. There are no explicit braces, brackets, or keywords. This means that whitespace is significant, and must be consistent. In this example, the function code is indented four spaces. It doesn’t need to be four spaces, it just needs to be consistent. The first line that is not indented marks the end of the function.
if statement is followed by a code block. If the if expression evaluates to true, the indented block is executed, otherwise it falls to the else block (if any). (Note the lack of parentheses around the expression.)
if code block. This raise statement will raise an exception (of type ValueError), but only if size < 0.
for loop also marks the start of a code block. Code blocks can contain multiple lines, as long as they are all indented the same amount. This for loop has three lines of code in it. There is no other special syntax for multi-line code blocks. Just indent and get on with your life.
After some initial protests and several snide analogies to Fortran, you will make peace with this and start seeing its benefits. One major benefit is that all Python programs look similar, since indentation is a language requirement and not a matter of style. This makes it easier to read and understand other people’s Python code.
☞Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks.
⁂
Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them, by including a special block of code that executes when you run the Python file on the command line. Take the last few lines of humansize.py:
if __name__ == '__main__':
print(approximate_size(1000000000000, False))
print(approximate_size(1000000000000))
☞Like C, Python uses
==for comparison and=for assignment. Unlike C, Python does not support in-line assignment, so there’s no chance of accidentally assigning the value you thought you were comparing.
So what makes this if statement special? Well, modules are objects, and all modules have a built-in attribute __name__. A module’s __name__ depends on how you’re using the module. If you import the module, then __name__ is the module’s filename, without a directory path or file extension.
>>> import humansize >>> humansize.__name__ 'humansize'
But you can also run the module directly as a standalone program, in which case __name__ will be a special default value, __main__. Python will evaluate this if statement, find a true expression, and execute the if code block. In this case, to print two values.
c:\home\diveintopython3> c:\python30\python.exe humansize.py 1.0 TB 931.3 GiB
And that’s your first Python program!
⁂
docstring from a great docstring.
© 2001–9 Mark Pilgrim