<!DOCTYPE html>
<head>
<meta charset=utf-8>
<title>Comprehensions - Dive into Python 3</title>
<!--[if IE]><script src=j/html5.js></script><![endif]-->
<link rel=stylesheet href=dip3.css>
<style>
body{counter-reset:h1 3}
</style>
<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
<link rel=stylesheet media=print href=print.css>
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#comprehensions>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span class=u title=beginner>&#x2666;&#x2666;&#x2662;&#x2662;&#x2662;</span>
<h1>Comprehensions</h1>
<blockquote class=q>
<p><span class=u>&#x275D;</span> Our imagination is stretched to the utmost, not, as in fiction, to imagine things which are not really there, but just to comprehend those things which are. <span class=u>&#x275E;</span><br>&mdash; <a href=http://en.wikiquote.org/wiki/Richard_Feynman>Richard Feynman</a>
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
<p class=f>This chapter will teach you about list comprehensions, dictionary comprehensions, and set comprehensions: three related concepts centered around one very powerful technique. But first, I want to take a little detour into two modules that will help you navigate your local file system.
<h2 id=os>Working With Files And Directories</h2>
<p>Python 3 comes with a module called <code>os</code>, which stands for &#8220;operating system.&#8221; The <a href=http://docs.python.org/3.1/library/os.html><code>os</code> module</a> contains a plethora of functions to get information on&nbsp;&mdash;&nbsp;and in some cases, to manipulate&nbsp;&mdash;&nbsp;local directories, files, processes, and environment variables. Python does its best to offer a unified <abbr>API</abbr> across <a href=installing-python.html>all supported operating systems</a> so your programs can run on any computer with as little platform-specific code as possible.
<h3 id=getcwd>The Current Working Directory</h3>
<p>When you&#8217;re just getting started with Python, you&#8217;re going to spend a lot of time in <a href=installing-python.html#idle>the Python Shell</a>. Throughout this book, you will see examples that go like this:
<ol>
<li>Import one of the modules in the <a href=examples/><code>examples</code> folder</a>
<li>Call a function in that module
<li>Explain the result
</ol>
<p>If you don&#8217;t know about the current working directory, step 1 will probably fail with an <code>ImportError</code>. Why? Because Python will look for the example module in <a href=your-first-python-program.html#importsearchpath>the import search path</a>, but it won&#8217;t find it because the <code>examples</code> folder isn&#8217;t one of the directories in the search path. To get past this, you can do one of two things:
<ol>
<li>Add the <code>examples</code> folder to the import search path
<li>Change the current working directory to the <code>examples</code> folder
</ol>
<p>The current working directory is an invisible property that Python holds in memory at all times. There is always a current working directory, whether you&#8217;re in the Python Shell, running your own Python script from the command line, or running a Python <abbr>CGI</abbr> script on a web server somewhere.
<p>The <code>os</code> module contains two functions to deal with the current working directory.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>import os</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.getcwd())</kbd> <span class=u>&#x2461;</span></a>
<samp>C:\Python31</samp>
<a><samp class=p>>>> </samp><kbd class=pp>os.chdir('/Users/pilgrim/diveintopython3/examples')</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.getcwd())</kbd> <span class=u>&#x2463;</span></a>
<samp>C:\Users\pilgrim\diveintopython3\examples</samp></pre>
<ol>
<li>The <code>os</code> module comes with Python; you can import it anytime, anywhere.
<li>Use the <code>os.getcwd()</code> function to get the current working directory. When you run the graphical Python Shell, the current working directory starts as the directory where the Python Shell executable is. On Windows, this depends on where you installed Python; the default directory is <code>c:\Python31</code>. If you run the Python Shell from the command line, the current working directory starts as the directory you were in when you ran <code>python3</code>.
<li>Use the <code>os.chdir()</code> function to change the current working directory.
<li>When I called the <code>os.chdir()</code> function, I used a Linux-style pathname (forward slashes, no drive letter) even though I&#8217;m on Windows. This is one of the places where Python tries to paper over the differences between operating systems.
</ol>
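<p>If you only need to be in another directory for a moment, one possible pattern is to remember the current working directory, change it, and change it back when you&#8217;re done. The following is a minimal sketch, not part of the <code>examples</code> folder: the helper name <code>working_directory_inside()</code> is made up for illustration, and a temporary directory stands in for the <code>examples</code> folder.

```python
import os
import tempfile


def working_directory_inside(path):
    """Hypothetical helper: chdir to path, report getcwd(), then go back."""
    original = os.getcwd()      # remember where we started
    os.chdir(path)              # change the current working directory
    try:
        return os.getcwd()      # what Python now considers "here"
    finally:
        os.chdir(original)      # always restore the original directory


before = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    inside = working_directory_inside(scratch)
after = os.getcwd()

print(inside)            # the temporary directory, as an absolute path
print(before == after)   # True: the original directory was restored
```

<p>The <code>try</code>/<code>finally</code> block is there so that the original directory is restored even if something goes wrong in between.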
<h3 id=ospath>Working With Filenames and Directory Names</h3>
<p>While we&#8217;re on the subject of directories, I want to point out the <code>os.path</code> submodule. <code>os.path</code> contains functions for manipulating filenames and directory names.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>import os</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.path.join('/Users/pilgrim/diveintopython3/examples/', 'humansize.py'))</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>/Users/pilgrim/diveintopython3/examples/humansize.py</samp>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.path.join('/Users/pilgrim/diveintopython3/examples', 'humansize.py'))</kbd> <span class=u>&#x2461;</span></a>
<samp class=pp>/Users/pilgrim/diveintopython3/examples\humansize.py</samp>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.path.expanduser('~'))</kbd> <span class=u>&#x2462;</span></a>
<samp class=pp>c:\Users\pilgrim</samp>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.path.join(os.path.expanduser('~'), 'diveintopython3', 'examples', 'humansize.py'))</kbd> <span class=u>&#x2463;</span></a>
<samp class=pp>c:\Users\pilgrim\diveintopython3\examples\humansize.py</samp></pre>
<ol>
<li>The <code>os.path.join()</code> function constructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings.
<li>In this slightly less trivial case, <code>os.path.join()</code> adds a path separator to the pathname before joining it to the filename. The separator is a backslash here because I constructed this example on Windows; on Linux or Mac OS X, it would be a forward slash. I was overjoyed when I discovered this, since <code>addSlashIfNecessary()</code> is one of the stupid little functions I always need to write when building up my toolbox in a new language. <em>Do not</em> write this stupid little function in Python; smart people have already taken care of it for you.
<li>The <code>os.path.expanduser()</code> function will expand a pathname that uses <code>~</code> to represent the current user&#8217;s home directory. This works on any platform where users have a home directory, including Linux, Mac OS X, and Windows.
<li>Combining these techniques, you can easily construct pathnames for directories and files under the user&#8217;s home directory.
</ol>
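<p>To check that <code>os.path.join()</code> takes care of the separator on whatever platform you&#8217;re using, you could join the same pieces with and without a trailing separator and compare the results. This is only a sketch; the directory names are borrowed from the example above and do not need to exist, and <code>os.sep</code> is the path separator of the current platform.

```python
import os

# Build the directory name from pieces, starting at the home directory.
examples_dir = os.path.join(os.path.expanduser('~'), 'diveintopython3', 'examples')

# Once with a trailing separator, once without.
with_trailing = os.path.join(examples_dir + os.sep, 'humansize.py')
without_trailing = os.path.join(examples_dir, 'humansize.py')

print(without_trailing)
print(with_trailing == without_trailing)   # True
```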
<p><code>os.path</code> also contains functions to split full pathnames, directory names, and filenames into their constituent parts.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>pathname = '/Users/pilgrim/diveintopython3/examples/humansize.py'</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>os.path.split(pathname)</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>('/Users/pilgrim/diveintopython3/examples', 'humansize.py')</samp>
<a><samp class=p>>>> </samp><kbd class=pp>(dirname, filename) = os.path.split(pathname)</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>dirname</kbd> <span class=u>&#x2462;</span></a>
<samp class=pp>'/Users/pilgrim/diveintopython3/examples'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>filename</kbd> <span class=u>&#x2463;</span></a>
<samp class=pp>'humansize.py'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>(shortname, extension) = os.path.splitext(filename)</kbd> <span class=u>&#x2464;</span></a>
<samp class=p>>>> </samp><kbd class=pp>shortname</kbd>
<samp class=pp>'humansize'</samp>
<samp class=p>>>> </samp><kbd class=pp>extension</kbd>
<samp class=pp>'.py'</samp></pre>
<ol>
<li>The <code>os.path.split()</code> function splits a full pathname and returns a tuple containing the path and filename. Remember when I said you could use <a href=native-datatypes.html#multivar>multi-variable assignment</a> to return multiple values from a function? The <code>os.path.split()</code> function does exactly that.
<li>You assign the return value of the <code>os.path.split()</code> function into a tuple of two variables. Each variable receives the value of the corresponding element of the returned tuple.
<li>The first variable, <var>dirname</var>, receives the value of the first element of the tuple returned from the <code>os.path.split()</code> function, the file path.
<li>The second variable, <var>filename</var>, receives the value of the second element of the tuple returned from the <code>os.path.split()</code> function, the filename.
<li><code>os.path</code> also contains the <code>os.path.splitext()</code> function, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique to assign each of them to separate variables.
</ol>
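<p>Splitting and joining are inverse operations, which the following sketch tries to demonstrate on a relative pathname built for the current platform. It also shows what <code>os.path.splitext()</code> does with a filename that contains more than one dot; the filename <code>archive.tar.gz</code> is made up for illustration.

```python
import os

# A relative pathname using this platform's separator.
pathname = os.path.join('diveintopython3', 'examples', 'humansize.py')

dirname, filename = os.path.split(pathname)         # directory part, file part
shortname, extension = os.path.splitext(filename)   # 'humansize', '.py'

# Putting the pieces back together gives the original pathname.
rebuilt = os.path.join(dirname, shortname + extension)
print(rebuilt == pathname)                  # True

# Only the last dot starts the extension.
print(os.path.splitext('archive.tar.gz'))   # ('archive.tar', '.gz')
```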
<h3 id=glob>Listing Directories</h3>
<p>The <code>glob</code> module is another tool in the Python standard library. It&#8217;s an easy way to get the contents of a directory programmatically, and it uses the sort of wildcards that you may already be familiar with from working on the command line.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>os.chdir('/Users/pilgrim/diveintopython3/')</kbd>
<samp class=p>>>> </samp><kbd class=pp>import glob</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>glob.glob('examples/*.xml')</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>['examples\\feed-broken.xml',
'examples\\feed-ns0.xml',
'examples\\feed.xml']</samp>
<a><samp class=p>>>> </samp><kbd class=pp>os.chdir('examples/')</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>glob.glob('*test*.py')</kbd> <span class=u>&#x2462;</span></a>
<samp class=pp>['alphameticstest.py',
'pluraltest1.py',
'pluraltest2.py',
'pluraltest3.py',
'pluraltest4.py',
'pluraltest5.py',
'pluraltest6.py',
'romantest1.py',
'romantest10.py',
'romantest2.py',
'romantest3.py',
'romantest4.py',
'romantest5.py',
'romantest6.py',
'romantest7.py',
'romantest8.py',
'romantest9.py']</samp></pre>
<ol>
<li>The <code>glob.glob()</code> function takes a wildcard and returns the paths of all files and directories matching the wildcard. In this example, the wildcard is a directory path plus &#8220;<code>*.xml</code>&#8221;, which will match all <code>.xml</code> files in the <code>examples</code> subdirectory.
<li>Now change the current working directory to the <code>examples</code> subdirectory. The <code>os.chdir()</code> function can take relative pathnames.
<li>You can include multiple wildcards in your glob pattern. This example finds all the files in the current working directory that end in a <code>.py</code> extension and contain the word <code>test</code> anywhere in their filename.
</ol>
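<p>If you don&#8217;t have the <code>examples</code> folder handy, you can try the same wildcards against a scratch directory. The sketch below creates a few empty files (their names are borrowed from the listings above) in a temporary directory and globs for them. The results are sorted because <code>glob.glob()</code> does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Create a few empty files to match against.
    for name in ('feed.xml', 'feed-broken.xml', 'roman1.py', 'romantest1.py'):
        with open(os.path.join(scratch, name), 'w') as f:
            f.write('')

    xml_paths = sorted(glob.glob(os.path.join(scratch, '*.xml')))
    test_paths = sorted(glob.glob(os.path.join(scratch, '*test*.py')))

for path in xml_paths:
    print(os.path.basename(path))    # feed-broken.xml, then feed.xml
for path in test_paths:
    print(os.path.basename(path))    # romantest1.py
```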
<h3 id=osstat>Getting File Metadata</h3>
<p>Every modern file system stores metadata about each file: creation date, last-modified date, file size, and so on. Python provides a single <abbr>API</abbr> to access this metadata. You don&#8217;t need to open the file; all you need is the filename.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>import os</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>print(os.getcwd())</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>c:\Users\pilgrim\diveintopython3\examples</samp>
<a><samp class=p>>>> </samp><kbd class=pp>metadata = os.stat('feed.xml')</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd class=pp>metadata.st_mtime</kbd>
<samp class=pp>1247520344.9537716</samp>
<a><samp class=p>>>> </samp><kbd class=pp>import time</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>time.localtime(metadata.st_mtime)</kbd> <span class=u>&#x2463;</span></a>
<samp class=pp>time.struct_time(tm_year=2009, tm_mon=7, tm_mday=13, tm_hour=17,
tm_min=25, tm_sec=44, tm_wday=0, tm_yday=194, tm_isdst=1)</samp>
</pre>
<ol>
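<li>The current working directory is the <code>examples</code> folder.
<li><code>feed.xml</code> is a file in the <code>examples</code> folder. Calling the <code>os.stat()</code> function returns an object that contains several different types of metadata about the file. One of them, <code>st_mtime</code>, is the modification time, expressed as the number of seconds since the Epoch (the first second of January 1st, 1970).
<li>The <code>time</code> module is part of the standard Python library. It contains functions to convert between different time representations and to format time values into strings.
<li>The <code>time.localtime()</code> function converts a time value from seconds-since-the-Epoch into a more useful structure of year, month, day, hour, minute, second, and so on. This file was last modified on July 13, 2009, at around 5:25 PM.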
</ol>
<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd class=pp>metadata.st_size</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>3070</samp>
<samp class=p>>>> </samp><kbd class=pp>import humansize</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>humansize.approximate_size(metadata.st_size)</kbd> <span class=u>&#x2461;</span></a>
<samp class=pp>'3.0 KiB'</samp></pre>
<ol>
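<li>The object returned by the <code>os.stat()</code> function also holds the size of the file, in bytes, in its <code>st_size</code> property. The file <code>feed.xml</code> is 3070 bytes.
<li>You can pass the <code>st_size</code> property to the <code>approximate_size()</code> function of the <code>humansize</code> module to get a more readable figure.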
</ol>
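<p>Here is a small sketch that uses only the standard library to read the size and modification time of a file and turn the timestamp into a readable string. It writes its own 7-byte stand-in for <code>feed.xml</code> into a temporary directory, so the numbers will not match the ones above; the format string passed to <code>time.strftime()</code> is just one reasonable choice.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, 'feed.xml')
    with open(path, 'wb') as f:
        f.write(b'<feed/>')          # exactly 7 bytes
    metadata = os.stat(path)         # no need to open the file for this

size = metadata.st_size                        # size in bytes
modified = time.localtime(metadata.st_mtime)   # seconds since the Epoch -> struct_time
stamp = time.strftime('%Y-%m-%d %H:%M:%S', modified)

print(size)    # 7
print(stamp)   # the moment the file was written, in local time
```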
<h3 id=abspath>Constructing Absolute Pathnames</h3>
<p>In the earlier directory listing examples, the <code>glob.glob()</code> function returned a list of relative pathnames. The first example had pathnames like <code>'examples\feed.xml'</code>, and the second example had even shorter relative pathnames like <code>'romantest1.py'</code>. As long as you stay in the same current working directory, these relative pathnames will work for opening files or getting file metadata. But if you want to construct an absolute pathname (<i>i.e.</i> one that includes all the directory names back to the root directory or drive letter), then you&#8217;ll need the <code>os.path.abspath()</code> function.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>import os</kbd>
<samp class=p>>>> </samp><kbd class=pp>print(os.getcwd())</kbd>
<samp class=pp>c:\Users\pilgrim\diveintopython3\examples</samp>
<samp class=p>>>> </samp><kbd class=pp>print(os.path.abspath('feed.xml'))</kbd>
<samp class=pp>c:\Users\pilgrim\diveintopython3\examples\feed.xml</samp></pre>
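<p>The <code>os.path.abspath()</code> function works purely on the pathname and the current working directory; the file itself does not need to exist. As a rough sketch of that behavior, the following compares a relative pathname, its absolute form, and a roundabout relative pathname that goes into a directory and straight back out (<code>os.pardir</code> is the &#8220;parent directory&#8221; marker, usually <code>..</code>).

```python
import os

relative = 'feed.xml'
absolute = os.path.abspath(relative)   # based on the current working directory

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True

# A detour through a subdirectory and back is normalized away.
roundabout = os.path.join('examples', os.pardir, 'feed.xml')
print(os.path.abspath(roundabout) == absolute)   # True
```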
<h2 id=list-comprehensions>List Comprehensions</h2>
<p>A <dfn>list comprehension</dfn> provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.
<pre class=screen>
<samp class=p>>>> </samp><kbd>a_list = [1, 9, 8, 4]</kbd>
<a><samp class=p>>>> </samp><kbd>[elem * 2 for elem in a_list]</kbd> <span>&#x2460;</span></a>
<samp class=pp>[2, 18, 16, 8]</samp>
<a><samp class=p>>>> </samp><kbd>a_list</kbd> <span>&#x2461;</span></a>
<samp class=pp>[1, 9, 8, 4]</samp>
<a><samp class=p>>>> </samp><kbd>a_list = [elem * 2 for elem in a_list]</kbd> <span>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd>a_list</kbd>
<samp class=pp>[2, 18, 16, 8]</samp></pre>
<ol>
<li>To make sense of this, look at it from right to left. <var>a_list</var> is the list you&#8217;re mapping. The Python interpreter loops through <var>a_list</var> one element at a time, temporarily assigning the value of each element to the variable <var>elem</var>. Python then applies the function <code><var>elem</var> * 2</code> and appends that result to the returned list.
<li>A list comprehension creates a new list; it does not change the original list.
<li>It is safe to assign the result of a list comprehension to the variable that you&#8217;re mapping. Python constructs the new list in memory, and when the list comprehension is complete, it assigns the result to the original variable.
</ol>
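<p>One way to read a list comprehension is as shorthand for a loop that appends to a new list. The sketch below spells out that loop next to the equivalent comprehension; the variable names are only illustrative.

```python
a_list = [1, 9, 8, 4]

# The long way: loop over the list and append each result to a new list.
doubled_by_loop = []
for elem in a_list:
    doubled_by_loop.append(elem * 2)

# The same thing as a list comprehension.
doubled_by_comprehension = [elem * 2 for elem in a_list]

print(doubled_by_loop)                               # [2, 18, 16, 8]
print(doubled_by_loop == doubled_by_comprehension)   # True
print(a_list)                                        # [1, 9, 8, 4]
```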
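<p>You can use any Python expression in a list comprehension, including the functions in the <code>os</code> module for manipulating files and directories.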
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>glob.glob('*.xml')</kbd>
<samp class=pp>['feed-broken.xml', 'feed-ns0.xml', 'feed.xml']</samp>
<samp class=p>>>> </samp><kbd class=pp>[os.path.abspath(filename) for filename in glob.glob('*.xml')]</kbd>
<samp class=pp>['c:\\Users\\pilgrim\\diveintopython3\\examples\\feed-broken.xml',
'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed-ns0.xml',
'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed.xml']</samp>
</pre>
<pre>
>>> print("\n".join(["{0:>8} {1}".format(humansize.approximate_size(os.stat(f).st_size, False), os.path.abspath(f)) for f in glob.glob('*.py')]))
2.5 KB c:\Users\pilgrim\diveintopython3\examples\alphametics.py
2.5 KB c:\Users\pilgrim\diveintopython3\examples\alphameticstest.py
1.5 KB c:\Users\pilgrim\diveintopython3\examples\fibonacci.py
1.8 KB c:\Users\pilgrim\diveintopython3\examples\fibonacci2.py
2.5 KB c:\Users\pilgrim\diveintopython3\examples\humansize.py
0.2 KB c:\Users\pilgrim\diveintopython3\examples\oneline.py
1.9 KB c:\Users\pilgrim\diveintopython3\examples\plural1.py
2.3 KB c:\Users\pilgrim\diveintopython3\examples\plural2.py
2.3 KB c:\Users\pilgrim\diveintopython3\examples\plural3.py
2.3 KB c:\Users\pilgrim\diveintopython3\examples\plural4.py
2.4 KB c:\Users\pilgrim\diveintopython3\examples\plural5.py
2.8 KB c:\Users\pilgrim\diveintopython3\examples\plural6.py
3.0 KB c:\Users\pilgrim\diveintopython3\examples\pluraltest1.py
3.0 KB c:\Users\pilgrim\diveintopython3\examples\pluraltest2.py
3.0 KB c:\Users\pilgrim\diveintopython3\examples\pluraltest3.py
3.0 KB c:\Users\pilgrim\diveintopython3\examples\pluraltest4.py
3.0 KB c:\Users\pilgrim\diveintopython3\examples\pluraltest5.py
6.1 KB c:\Users\pilgrim\diveintopython3\examples\pluraltest6.py
0.5 KB c:\Users\pilgrim\diveintopython3\examples\regression.py
2.2 KB c:\Users\pilgrim\diveintopython3\examples\roman1.py
3.4 KB c:\Users\pilgrim\diveintopython3\examples\roman10.py
2.3 KB c:\Users\pilgrim\diveintopython3\examples\roman2.py
2.3 KB c:\Users\pilgrim\diveintopython3\examples\roman3.py
2.5 KB c:\Users\pilgrim\diveintopython3\examples\roman4.py
2.7 KB c:\Users\pilgrim\diveintopython3\examples\roman5.py
3.6 KB c:\Users\pilgrim\diveintopython3\examples\roman6.py
3.7 KB c:\Users\pilgrim\diveintopython3\examples\roman7.py
3.7 KB c:\Users\pilgrim\diveintopython3\examples\roman8.py
3.7 KB c:\Users\pilgrim\diveintopython3\examples\roman9.py
4.0 KB c:\Users\pilgrim\diveintopython3\examples\romantest1.py
6.7 KB c:\Users\pilgrim\diveintopython3\examples\romantest10.py
4.2 KB c:\Users\pilgrim\diveintopython3\examples\romantest2.py
4.5 KB c:\Users\pilgrim\diveintopython3\examples\romantest3.py
4.7 KB c:\Users\pilgrim\diveintopython3\examples\romantest4.py
5.3 KB c:\Users\pilgrim\diveintopython3\examples\romantest5.py
6.1 KB c:\Users\pilgrim\diveintopython3\examples\romantest6.py
6.3 KB c:\Users\pilgrim\diveintopython3\examples\romantest7.py
6.5 KB c:\Users\pilgrim\diveintopython3\examples\romantest8.py
6.6 KB c:\Users\pilgrim\diveintopython3\examples\romantest9.py
0.4 KB c:\Users\pilgrim\diveintopython3\examples\stdout.py</pre>
<p class=a>&#x2042;
<h2 id=set-comprehensions>Set Comprehensions</h2>
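<p>A <dfn>set comprehension</dfn> uses the same syntax as a list comprehension, except that it is enclosed in curly braces instead of square brackets, and it constructs a set instead of a list.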
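<p>As a minimal sketch, the following builds a set from a list that contains duplicates. Because the result is a set, the duplicates collapse and the elements have no guaranteed order, so the example compares against a set literal rather than printing the set itself. The input values are arbitrary.

```python
a_list = [1, 9, 8, 4, 9, 1]

# Curly braces instead of square brackets: the result is a set.
doubled = {elem * 2 for elem in a_list}

print(type(doubled) is set)          # True
print(doubled == {2, 18, 16, 8})     # True
print(len(doubled))                  # 4, because duplicates are dropped
```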
<p class=a>&#x2042;
<h2 id=dictionary-comprehensions>Dictionary Comprehensions</h2>
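<p>A <dfn>dictionary comprehension</dfn> is like a list comprehension, but it constructs a dictionary instead of a list. It is enclosed in curly braces, and its expression has two parts separated by a colon: the key and the value.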
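<p>The following sketch maps a few filenames (borrowed from the listings earlier in this chapter; the files do not need to exist) to their extensions, then swaps the keys and values of a small dictionary. The expression before the colon becomes the key, and the expression after it becomes the value.

```python
import os

filenames = ['feed.xml', 'humansize.py', 'roman1.py']

# Key: the filename. Value: its extension, as returned by os.path.splitext().
extensions = {name: os.path.splitext(name)[1] for name in filenames}
print(extensions['feed.xml'])       # .xml
print(extensions['humansize.py'])   # .py

# Swapping keys and values only works cleanly when the values are unique.
a_dict = {'a': 1, 'b': 2, 'c': 3}
swapped = {value: key for key, value in a_dict.items()}
print(swapped == {1: 'a', 2: 'b', 3: 'c'})   # True
```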
<p class=a>&#x2042;
<h2 id=furtherreading>Further Reading</h2>
<ul>
<li><a href=http://docs.python.org/3.1/library/os.html><code>os</code> module</a>
</ul>
<p class=v><a href=native-datatypes.html rel=prev title='back to &#8220;Native Datatypes&#8221;'><span class=u>&#x261C;</span></a> <a href=strings.html rel=next title='onward to &#8220;Strings&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>