dive-into-python3/files.html

<!DOCTYPE html>
<head>
<meta charset=utf-8>
<title>Files - Dive into Python 3</title>
<!--[if IE]><script src=j/html5.js></script><![endif]-->
<link rel=stylesheet href=dip3.css>
<style>
body{counter-reset:h1 12}
</style>
<link rel=stylesheet type=text/css media='only screen and (max-device-width: 480px)' href=mobile.css>
<link rel=stylesheet media=print href=print.css>
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=sa value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#files>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span class=u title=intermediate>&#x2666;&#x2666;&#x2666;&#x2662;&#x2662;</span>
<h1>Files</h1>
<blockquote class=q>
<p><span class=u>&#x275D;</span> FIXME <span class=u>&#x275E;</span><br>&mdash; FIXME
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
<p class=f>FIXME

<h2 id=reading>Reading From Text Files</h2>

<p>Before you can read from a file, you need to open it. Opening a file in Python couldn&#8217;t be easier:

<pre class=nd><code class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</code></pre>

<p>Python has a built-in <code>open()</code> function, which takes a filename as an argument. Here the filename is <code class=pp>'examples/chinese.txt'</code>. There are five interesting things about this filename:

<ol>
<li>It&#8217;s not just the name of a file; it&#8217;s a combination of a directory path and a filename. A hypothetical file-opening function could have taken two arguments&nbsp;&mdash;&nbsp;a directory path and a filename&nbsp;&mdash;&nbsp;but the <code>open()</code> function only takes one. In Python, whenever you need a &#8220;filename,&#8221; you can include some or all of a directory path as well.
<li>The directory path uses a forward slash, but I didn&#8217;t say what operating system I was using. Windows uses backward slashes to denote subdirectories, while Mac OS X and Linux use forward slashes. But in Python, forward slashes always Just Work, even on Windows.
<li>The directory path does not begin with a slash or a drive letter, so it is called a <i>relative path</i>. Relative to what, you might ask? Patience, grasshopper.
<li>It&#8217;s a string. All modern operating systems (even Windows!) use Unicode to store the names of files and directories. Python 3 fully supports non-<abbr>ASCII</abbr> pathnames.
<li>It doesn&#8217;t need to be on your local disk. You might have a network drive mounted. That &#8220;file&#8221; might be a figment of <a href=http://en.wikipedia.org/wiki/Filesystem_in_Userspace>an entirely virtual filesystem</a>. If your computer considers it a file and can access it as a file, Python can open it.
</ol>

<p>But that call to the <code>open()</code> function didn&#8217;t stop at the filename. There&#8217;s another argument, called <code>encoding</code>. Oh dear, <a href=strings.html#boring-stuff>that sounds dreadfully familiar</a>.

<h3 id=encoding>Character Encoding Rears Its Ugly Head</h3>

<p>Bytes are bytes; <a href=strings.html#byte-arrays>characters are an abstraction</a>. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a &#8220;text file&#8221; from disk, how does Python convert that sequence of bytes into a sequence of characters? It decodes the bytes according to a specific character encoding algorithm and returns a sequence of Unicode characters (otherwise known as a string).

<pre>
# This example was created on Windows. Other platforms may
# behave differently, for reasons outlined below.
<samp class=p>>>> </samp><kbd class=pp>file = open('examples/chinese.txt')</kbd>
<samp class=p>>>> </samp><kbd class=pp>a_string = file.read()</kbd>
<samp class=traceback>Traceback (most recent call last):
  File "&lt;stdin>", line 1, in &lt;module>
  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: character maps to &lt;undefined></samp>
<samp class=p>>>> </samp></pre>

<p>What just happened? You didn&#8217;t specify a character encoding, so Python is forced to use the default encoding. What&#8217;s the default encoding? If you look closely at the traceback, you can see that it&#8217;s dying in <code>cp1252.py</code>, meaning that Python is using CP-1252 as the default encoding here. (CP-1252 is a common encoding on computers running Microsoft Windows.) The CP-1252 character set doesn&#8217;t support the characters that are in this file, so the read fails with an ugly <code>UnicodeDecodeError</code>.

<p>But wait, it&#8217;s worse than that! The default encoding is <em>platform-dependent</em>, so this code <em>might</em> work on your computer (if your default encoding is UTF-8), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252).

<blockquote class=note>
<p><span class=u>&#x261E;</span>If you need to get the default character encoding, import the <code>locale</code> module and call <code>locale.getpreferredencoding()</code>. On my Windows laptop, it returns <code>'cp1252'</code>, but on my Linux box upstairs, it returns <code>'UTF8'</code>. I can&#8217;t even maintain consistency in my own house! Your results may be different (even on Windows) depending on which version of your operating system you have installed and how your regional/language settings are configured. This is why it&#8217;s so important to specify the encoding every time you open a file.

</blockquote>

<h3 id=file-objects>File Objects</h3>

<p>So far, all we know is that Python has a built-in function called <code>open()</code>. The <code>open()</code> function returns a <i>file object</i>, which has methods and attributes for getting information about and manipulating the file.

<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.name</kbd>                                              <span class=u>&#x2460;</span></a>
<samp class=pp>'examples/chinese.txt'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.encoding</kbd>                                          <span class=u>&#x2461;</span></a>
<samp class=pp>'utf-8'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.mode</kbd>                                              <span class=u>&#x2462;</span></a>
<samp class=pp>'r'</samp></pre>
<ol>
<li>The <code>name</code> attribute reflects the name you passed in to the <code>open()</code> function when you opened the file. It is not normalized to an absolute pathname.
<li>Likewise, <code>encoding</code> attribute reflects the encoding you passed in to the <code>open()</code> function. If you didn&#8217;t specify the encoding when you opened the file (bad developer!) then the <code>encoding</code> attribute will reflect <code>locale.getpreferredencoding()</code>.
<li>The <code>mode</code> attribute tells you in which mode the file was opened. You can pass an optional <var>mode</var> parameter to the <code>open()</code> function. You didn&#8217;t specify a mode when you opened this file, so Python defaults to <code>'r'</code>, which means &#8220;open for reading only, in text mode.&#8221; As you&#8217;ll see later in this chapter, the file mode serves several purposes; different modes let you write to a file, append to a file, or open a file in binary mode (in which you deal with bytes instead of strings).
</ol>

<blockquote class=note>
<p><span class=u>&#x261E;</span>The <a href=http://docs.python.org/3.1/library/io.html#module-interface>documentation for the <code>open()</code> function</a> lists all the possible file modes.
</blockquote>

<h3 id=read>Reading Data From A Text File</h3>

<p>After you open a file for reading, you&#8217;ll probably want to read from it at some point.

<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>a_file = open('examples/chinese.txt', encoding='utf-8')</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                                            <span class=u>&#x2460;</span></a>
<samp class=pp>'Dive Into Python 是为有经验的程序员编写的一本 Python 书。\n'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                                            <span class=u>&#x2461;</span></a>
<samp class=pp>''</samp></pre>
<ol>
<li>Once you open a file (with the correct encoding), reading from it is just a matter of calling the file object&#8217;s <code>read()</code> method. The result is a string.
<li>Perhaps somewhat surprisingly, reading the file again does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
</ol>

<p>What if you want to re-read a file?

<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                      <span class=u>&#x2460;</span></a>
<samp class=pp>''</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                     <span class=u>&#x2461;</span></a>
<samp class=pp>0</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(16)</kbd>                    <span class=u>&#x2462;</span></a>
<samp class=pp>'Dive Into Python'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                     <span class=u>&#x2463;</span></a>
<samp class=pp>' '</samp>
<samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>
<samp class=pp>'是'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                      <span class=u>&#x2464;</span></a>
<samp class=pp>20</samp></pre>
<ol>
<li>Since you&#8217;re still at the end of the file, further calls to the file object&#8217;s <code>read()</code> method simply return an empty string.
<li>The <code>seek()</code> method moves to a specific byte position in a file.
<li>The <code>read()</code> method can take an optional parameter, the number of characters to read.
<li>If you like, you can even read one character at a time.
<li>16 + 1 + 1 = &hellip; 20?
</ol>

<p>Let&#8217;s see that again.

<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(17)</kbd>                    <span class=u>&#x2460;</span></a>
<samp class=pp>17</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                     <span class=u>&#x2461;</span></a>
<samp class=pp>'是'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                      <span class=u>&#x2462;</span></a>
<samp class=pp>20</samp></pre>
<ol>
<li>Move to the 17<sup>th</sup> byte.
<li>Read one character.
<li>Now you&#8217;re on the 20<sup>th</sup> byte.
</ol>

<p>Do you see it yet? The <code>seek()</code> and <code>tell()</code> methods always count <em>bytes</em>, but since you opened this file as text, the <code>read()</code> method counts <em>characters</em>. Chinese characters <a href=strings.html#boring-stuff>require multiple bytes to encode in UTF-8</a>. The English characters in the file only require one byte each, so you might be misled into thinking that they&#8217;re counting the same thing. But that&#8217;s only true for some characters.

<p>But wait, it gets worse!

<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(18)</kbd>                         <span class=u>&#x2460;</span></a>
<samp class=pp>18</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read(1)</kbd>                          <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
  File "&lt;pyshell#12>", line 1, in &lt;module>
    a_file.read(1)
  File "C:\Python31\lib\codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x98 in position 0: unexpected code byte</samp></pre>
<ol>
<li>Move to the 18<sup>th</sup> byte and try to read one character.
<li>Why does this fail? Because there isn&#8217;t a character at the 18<sup>th</sup> byte. The nearest character starts at the 17<sup>th</sup> byte (and goes for three bytes). Trying to read a character from the middle will fail with a <code>UnicodeDecodeError</code>.
</ol>

<h3 id=close>Closing Files</h3>

<p>Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It&#8217;s important to close files as soon as you&#8217;re finished with them.

<pre class='nd screen'>
# continued from the previous example
<samp class=p>>>> </samp><kbd class=pp>a_file.close()</kbd></pre>

<p>Well <em>that</em> was anticlimactic.

<p>The file object <var>a_file</var> still exists; calling its <code>close()</code> method doesn&#8217;t destroy the object itself. But it&#8217;s not terribly useful.

<pre class=screen>
# continued from the previous example
<a><samp class=p>>>> </samp><kbd class=pp>a_file.read()</kbd>                           <span class=u>&#x2460;</span></a>
<samp class=traceback>Traceback (most recent call last):
  File "&lt;pyshell#24>", line 1, in &lt;module>
    a_file.read()
ValueError: I/O operation on closed file.</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.seek(0)</kbd>                          <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
  File "&lt;pyshell#25>", line 1, in &lt;module>
    a_file.seek(0)
ValueError: I/O operation on closed file.</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.tell()</kbd>                           <span class=u>&#x2462;</span></a>
<samp class=traceback>Traceback (most recent call last):
  File "&lt;pyshell#26>", line 1, in &lt;module>
    a_file.tell()
ValueError: I/O operation on closed file.</samp>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.close()</kbd>                          <span class=u>&#x2463;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>a_file.closed</kbd>                           <span class=u>&#x2464;</span></a>
<samp class=pp>True</samp></pre>
<ol>
<li>You can&#8217;t read from a closed file; that raises an <code>IOError</code> exception.
<li>You can&#8217;t seek in a closed file either.
<li>There&#8217;s no current position in a closed file, so the <code>tell()</code> method also fails.
<li>Perhaps surprisingly, calling the <code>close()</code> method on a file object whose file has been closed does <em>not</em> raise an exception. It&#8217;s just a no-op.
<li>Closed file objects do have one useful attribute: the <code>closed</code> attribute will confirm that the file is closed.
</ol>

<h3 id=with>Using The <code>with</code> Statement</h3>

<p>FIXME "with open(...) as file" pattern

<h3 id=for>Reading Data One Line At A Time</h3>

<p>A &#8220;line&#8221; of a text file is just what you think it is&nbsp;&mdash;&nbsp;you type a few words and press <kbd>ENTER</kbd>, and now you&#8217;re on a new line. A line of text is a sequence of characters delimited by&hellip; what exactly? Well, it&#8217;s complicated, because text files can use several different characters to mark the end of a line. Every operating system has its own convention. Some use a carriage return character, others use a line feed character, and some use both characters at the end of every line.

<p>Now breathe a sigh of relief, because <em>Python handles line endings automatically</em> by default. If you say, &#8220;I want to read this text file one line at a time,&#8221; Python will figure out which kind of line ending the text file uses and and it will all Just Work.

<blockquote class=note>
<p><span class=u>&#x261E;</span>If you need fine-grained control over what&#8217;s considered a line ending, you can pass the optional <code>newline</code> parameter to the <code>open()</code> function. See <a href=http://docs.python.org/3.1/library/io.html#module-interface>the <code>open()</code> function documentation</a> for all the gory details.
</blockquote>

<p>So, how do you actually do it? Read a file one line at a time, that is. It&#8217;s so simple, it&#8217;s beautiful.

<p class=d>[<a href=examples/oneline.py>download <code>oneline.py</code></a>]
<pre><code class=pp>line_number = 0
<a>with open('examples/favorite-people.txt', encoding='utf-8') as a_file:  <span class=u>&#x2460;</span></a>
<a>    for a_line in a_file:                                               <span class=u>&#x2461;</span></a>
        line_number += 1
<a>        print('{} {}'.format(line_number, a_line.rstrip()))             <span class=u>&#x2462;</span></a></code></pre>
<ol>
<li>FIXME
<li>
<li>
</ol>

<p>FIXME

<pre class=screen>
<samp class=p>you@localhost:~/diveintopython3$ </samp><kbd class=pp>python3 examples/oneline.py</kbd>
<samp>1 Dora
2 Ethan
3 Wesley
4 John
5 Anne
6 Mike
7 Chris
8 Sarah
9 Alex
10 Lizzie</samp></pre>

<h2 id=writing>Writing to Text Files</h2>

<p>FIXME

<!--
<p>As you would expect, you can also write to files in much the same way that you read from them. There are two basic file modes:
<div class=itemizedlist>
<ul>
<li>"Append" mode will add data to the end of the file.
<li>"write" mode will overwrite the file.
</ul>
<p>Either mode will create the file automatically if it doesn&#8217;t already exist, so there&#8217;s never a need for any sort of fiddly
   "if the log file doesn't exist yet, create a new empty file just so you can open it for the first time" logic. Just open
   it and start writing.
<div class=example><h3 id="fileinfo.files.writeandappend">Example 6.7. Writing to Files</h3><pre class=screen>
<samp class=p>>>> </samp><kbd>logfile = open('test.log', 'w')</kbd> <span>&#x2460;</span>
<samp class=p>>>> </samp><kbd>logfile.write('test succeeded')</kbd> <span>&#x2461;</span>
<samp class=p>>>> </samp><kbd>logfile.close()</kbd>
<samp class=p>>>> </samp><kbd>print file('test.log').read()</kbd>   <span>&#x2462;</span>
test succeeded
<samp class=p>>>> </samp><kbd>logfile = open('test.log', 'a')</kbd> <span>&#x2463;</span>
<samp class=p>>>> </samp><kbd>logfile.write('line 2')</kbd>
<samp class=p>>>> </samp><kbd>logfile.close()</kbd>
<samp class=p>>>> </samp><kbd>print file('test.log').read()</kbd>   <span>&#x2464;</span>
test succeededline 2
</pre>
<ol>
<li>You start boldly by creating either the new file <code>test.log</code> or overwrites the existing file, and opening the file for writing. (The second parameter <code>"w"</code> means open the file for writing.)  Yes, that&#8217;s all as dangerous as it sounds. I hope you didn&#8217;t care about the previous    contents of that file, because it&#8217;s gone now.
<li>You can add data to the newly opened file with the <code>write</code> method of the file object returned by <code>open</code>.
<li><code>file</code> is a synonym for <code>open</code>. This one-liner opens the file, reads its contents, and prints them.
<li>You happen to know that <code>test.log</code> exists (since you just finished writing to it), so you can open it and append to it. (The <code>"a"</code> parameter means open the file for appending.)  Actually you could do this even if the file didn&#8217;t exist, because opening    the file for appending will create the file if necessary. But appending will <em>never</em> harm the existing contents of the file.
<li>As you can see, both the original line you wrote and the second line you appended are now in <code>test.log</code>. Also note that carriage returns are not included. Since you didn&#8217;t write them explicitly to the file either time, the    file doesn&#8217;t include them. You can write a carriage return with the <code>"\n"</code> character. Since you didn&#8217;t do this, everything you wrote to the file ended up smooshed together on the same line.
-->

<h3 id=encoding-again>Character Encoding Again</h3>

<p>FIXME

<h3 id=write>Write A Little, Write A Lot</h3>

<p>FIXME write(), writelines(), .writeable

<h2 id=ioerror>Handling I/O Errors</h2>

<p>FIXME

<!--
<p>Now you&#8217;ve seen enough to understand the file handling code in the <code>fileinfo.py</code> sample code from the previous chapter. This example shows how to safely open and read from a file and gracefully handle
   errors.
<div class=example><h3 id="fileinfo.files.incode">Example 6.6. File Objects in <code>MP3FileInfo</code></h3><pre><code>
        try:              <span>&#x2460;</span> fsock = open(filename, "rb", 0) <span>&#x2461;</span> try:              fsock.seek(-128, 2)         <span>&#x2462;</span>     tagdata = fsock.read(128)   <span>&#x2463;</span> finally:      <span>&#x2464;</span>     fsock.close()               . . .
        except IOError:   <span>&#x2465;</span> pass         </pre>
<ol>
<li>Because opening and reading files is risky and may raise an exception, all of this code is wrapped in a <code>try...except</code> block. (Hey, isn&#8217;t <a href="#odbchelper.indenting" title="2.5. Indenting Code">standardized indentation</a> great?  This is where you start to appreciate it.)
<li>The <code>open</code> function may raise an <code>IOError</code>. (Maybe the file doesn&#8217;t exist.)
<li>The <code>seek</code> method may raise an <code>IOError</code>. (Maybe the file is smaller than 128 bytes.)
<li>The <code>read</code> method may raise an <code>IOError</code>. (Maybe the disk has a bad sector, or it&#8217;s on a network drive and the network just went down.)
<li>This is new: a <code>try...finally</code> block. Once the file has been opened successfully by the <code>open</code> function, you want to make absolutely sure that you close it, even if an exception is raised by the <code>seek</code> or <code>read</code> methods. That&#8217;s what a <code>try...finally</code> block is for: code in the <code>finally</code> block will <em>always</em> be executed, even if something in the <code>try</code> block raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.
<li>At last, you handle your <code>IOError</code> exception. This could be the <code>IOError</code> exception raised by the call to <code>open</code>, <code>seek</code>, or <code>read</code>. Here, you really don&#8217;t care, because all you&#8217;re going to do is ignore it silently and continue. (Remember, <code>pass</code> is a Python statement that <a href="#fileinfo.class.simplest" title="Example 5.3. The Simplest Python Class">does nothing</a>.)  That&#8217;s perfectly legal; &#8220;handling&#8221; an exception can mean explicitly doing nothing. It still counts as handled, and processing will continue normally on the    next line of code after the <code>try...except</code> block.
-->

<h2 id=binary>Binary Files</h2>

<p>FIXME

<pre>
>>> image = open('examples/beauregard-100x100.jpg', 'rb')
>>> image
&lt;io.BufferedReader object at 0x00C7A390>
>>> image.mode
'rb'
>>> image.name
'examples/beauregard-100x100.jpg'
</pre>

<pre>
>>> image
&lt;io.BufferedReader object at 0x00C7A390>
>>> image.tell()
0
>>> data = image.read(3)
>>> data
b'\xff\xd8\xff'
>>> image.tell()
3
>>> image.seek(0)
0
>>> data = image.read()
>>> len(data)
3150
</pre>

<h2 id=file-like-objects>File-like Objects</h2>

<p>FIXME

<!--
<p>One of Python&#8217;s greatest strengths is its dynamic binding, and one powerful use of dynamic binding is the <dfn>file-like object</dfn>.

<p>Many functions which require an input source could simply take a filename, go open the file for reading, read it, and close it when they&#8217;re done. But they don&#8217;t. Instead, they take a <em>file-like object</em>.

<p>In the simplest case, a <em>file-like object</em> is any object with a <code>read</code> method with an optional <var>size</var> parameter, which returns a string. When called with no <var>size</var> parameter, it reads everything there is to read from the input source and returns all the data as a single string. When
called with a <var>size</var> parameter, it reads that much from the input source and returns that much data; when called again, it picks up where it left
off and returns the next chunk of data.
<p>This is how <a href="#fileinfo.files" title="6.2. Working with File Objects">reading from real files</a> works; the difference is that you&#8217;re not limiting yourself to real files. The input source could be anything: a file on
disk, a web page, even a hard-coded string. As long as you pass a file-like object to the function, and the function simply
calls the object&#8217;s <code>read</code> method, the function can handle any kind of input source without specific code to handle each kind.
-->

<!--
<div class=example><h3 id="kgp.openanything.stringio.example">Example 10.4. Introducing <code>StringIO</code></h3><pre class=screen>
<samp class=p>>>> </samp><kbd>contents = "&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"</kbd>
<samp class=p>>>> </samp><kbd>import StringIO</kbd>
<samp class=p>>>> </samp><kbd>ssock = StringIO.StringIO(contents)</kbd>   <span>&#x2460;</span>
<samp class=p>>>> </samp><kbd>ssock.read()</kbd>        <span>&#x2461;</span>
"&lt;grammar>&lt;ref id='bit'>&lt;p>0&lt;/p>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>"
<samp class=p>>>> </samp><kbd>ssock.read()</kbd>        <span>&#x2462;</span>
''
<samp class=p>>>> </samp><kbd>ssock.seek(0)</kbd>       <span>&#x2463;</span>
<samp class=p>>>> </samp><kbd>ssock.read(15)</kbd>      <span>&#x2464;</span>
'&lt;grammar>&lt;ref i'
<samp class=p>>>> </samp><kbd>ssock.read(15)</kbd>
"d='bit'>&lt;p>0&lt;/p"
<samp class=p>>>> </samp><kbd>ssock.read()</kbd>
'>&lt;p>1&lt;/p>&lt;/ref>&lt;/grammar>'
<samp class=p>>>> </samp><kbd>ssock.close()</kbd>       <span>&#x2465;</span></pre>
<ol>
<li>The <code>StringIO</code> module contains a single class, also called <code>StringIO</code>, which allows you to turn a string into a file-like object. The <code>StringIO</code> class takes the string as a parameter when creating an instance.
<li>Now you have a file-like object, and you can do all sorts of file-like things with it. Like <code>read</code>, which returns the original string.
<li>Calling <code>read</code> again returns an empty string. This is how real file objects work too; once you read the entire file, you can&#8217;t read any more without explicitly seeking to the beginning of the file. The <code>StringIO</code> object works the same way.
<li>You can explicitly seek to the beginning of the string, just like seeking through a file, by using the <code>seek</code> method of the <code>StringIO</code> object.
<li>You can also read the string in chunks, by passing a <var>size</var> parameter to the <code>read</code> method.
<li>At any time, <code>read</code> will return the rest of the string that you haven&#8217;t read yet. All of this is exactly how file objects work; hence the term
<em>file-like object</em>.
-->

<h2 id=stdio>Standard Input, Output, and Error</h2>

<p>FIXME

<!--
<p><abbr>UNIX</abbr> users are already familiar with the concept of standard input, standard output, and standard error. This section is for
   the rest of you.
<p>Standard output and standard error (commonly abbreviated <code>stdout</code> and <code>stderr</code>) are pipes that are built into every <abbr>UNIX</abbr> system. When you <code>print</code> something, it goes to the <code>stdout</code> pipe; when your program crashes and prints out debugging information (like a traceback in Python), it goes to the <code>stderr</code> pipe. Both of these pipes are ordinarily just connected to the terminal window where you are working, so when a program
prints, you see the output, and when a program crashes, you see the debugging information. (If you&#8217;re working on a system
with a window-based Python <abbr>IDE</abbr>, <code>stdout</code> and <code>stderr</code> default to your &#8220;Interactive Window&#8221;.)
<div class=example><h3>Example 10.8. Introducing <code>stdout</code> and <code>stderr</code></h3><pre class=screen>
<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
<samp class=p>...    </samp>print 'Dive in'             <span>&#x2460;</span>
<samp>Dive in
Dive in
Dive in</samp>
<samp class=p>>>> </samp><kbd>import sys</kbd>
<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
<samp class=p>...    </samp>sys.stdout.write('Dive in') <span>&#x2461;</span>
Dive inDive inDive in
<samp class=p>>>> </samp><kbd>for i in range(3):</kbd>
<samp class=p>...    </samp>sys.stderr.write('Dive in') <span>&#x2462;</span>
Dive inDive inDive in</pre>
<ol>
<li>As you saw in <a href="#fileinfo.for.counter" title="Example 6.9. Simple Counters">Example 6.9, &#8220;Simple Counters&#8221;</a>, you can use Python&#8217;s built-in <code>range</code> function to build simple counter loops that repeat something a set number of times.
<li><code>stdout</code> is a file-like object; calling its <code>write</code> function will print out whatever string you give it. In fact, this is what the <code>print</code> function really does; it adds a carriage return to the end of the string you&#8217;re printing, and calls <code>sys.stdout.write</code>.
<li>In the simplest case, <code>stdout</code> and <code>stderr</code> send their output to the same place: the Python <abbr>IDE</abbr> (if you&#8217;re in one), or the terminal (if you&#8217;re running Python from the command line). Like <code>stdout</code>, <code>stderr</code> does not add carriage returns for you; if you want them, add them yourself.
<p><code>stdout</code> and <code>stderr</code> are both file-like objects, like the ones you discussed in <a href="#kgp.openanything" title="10.1. Abstracting input sources">Section 10.1, &#8220;Abstracting input sources&#8221;</a>, but they are both write-only. They have no <code>read</code> method, only <code>write</code>. Still, they are file-like objects, and you can assign any other file- or file-like object to them to redirect their output.
<div class=example><h3>Example 10.9. Redirecting output</h3><pre class=screen>
<samp class=p>[you@localhost kgp]$ </samp>python stdout.py
Dive in
<samp class=p>[you@localhost kgp]$ </samp>cat out.log
This message will be logged instead of displayed</pre><p>(On Windows, you can use <code>type</code> instead of <code>cat</code> to display the contents of a file.)
<p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
#stdout.py
import sys

print 'Dive in'      <span>&#x2460;</span>
saveout = sys.stdout <span>&#x2461;</span>
fsock = open('out.log', 'w')           <span>&#x2462;</span>
sys.stdout = fsock   <span>&#x2463;</span>
print 'This message will be logged instead of displayed' <span>&#x2464;</span>
sys.stdout = saveout <span>&#x2465;</span>
fsock.close()        <span>&#x2466;</span>
</pre>
<ol>
<li>This will print to the <abbr>IDE</abbr> &#8220;Interactive Window&#8221; (or the terminal, if running the script from the command line).
<li>Always save <code>stdout</code> before redirecting it, so you can set it back to normal later.
<li>Open a file for writing. If the file doesn&#8217;t exist, it will be created. If the file does exist, it will be overwritten.
<li>Redirect all further output to the new file you just opened.
<li>This will be &#8220;printed&#8221; to the log file only; it will not be visible in the <abbr>IDE</abbr> window or on the screen.
<li>Set <code>stdout</code> back to the way it was before you mucked with it.
<li>Close the log file.
<p>Redirecting <code>stderr</code> works exactly the same way, using <code>sys.stderr</code> instead of <code>sys.stdout</code>.
<div class=example><h3>Example 10.10. Redirecting error information</h3><pre class=screen>
<samp class=p>[you@localhost kgp]$ </samp>python stderr.py
<samp class=p>[you@localhost kgp]$ </samp>cat error.log
<samp>Traceback (most recent line last):
  File "stderr.py", line 5, in ?
    raise Exception, 'this error will be logged'
Exception: this error will be logged</span></pre><p>If you have not already done so, you can <a href="http://diveintopython3.org/download/diveintopython3-examples-5.4.zip" title="Download example scripts">download this and other examples</a> used in this book.
<pre><code>
#stderr.py
import sys

fsock = open('error.log', 'w')               <span>&#x2460;</span>
sys.stderr = fsock         <span>&#x2461;</span>
raise Exception, 'this error will be logged' <span>&#x2462;</span> <span>&#x2463;</span>
</pre>
<ol>
<li>Open the log file where you want to store debugging information.
<li>Redirect standard error by assigning the file object of the newly-opened log file to <code>stderr</code>.
<li>Raise an exception. Note from the screen output that this does <em>not</em> print anything on screen. All the normal traceback information has been written to <code>error.log</code>.
<li>Also note that you&#8217;re not explicitly closing your log file, nor are you setting <code>stderr</code> back to its original value. This is fine, since once the program crashes (because of the exception), Python will clean up and close the file for us, and it doesn&#8217;t make any difference that <code>stderr</code> is never restored, since, as I mentioned, the program crashes and Python ends. Restoring the original is more important for <code>stdout</code>, if you expect to go do other stuff within the same script afterwards.
<p>Since it is so common to write error messages to standard error, there is a shorthand syntax that can be used instead of going
through the hassle of redirecting it outright.
<div class=example><h3 id="kgp.stdio.print.example">Example 10.11. Printing to <code>stderr</code></h3><pre class=screen>
<samp class=p>>>> </samp><kbd>print 'entering function'</kbd>
entering function
<samp class=p>>>> </samp><kbd>import sys</kbd>
<samp class=p>>>> </samp><kbd>print >> sys.stderr, 'entering function'</kbd> <span>&#x2460;</span>
entering function
</pre>
<ol>
<li>This shorthand syntax of the <code>print</code> statement can be used to write to any open file, or file-like object. In this case, you can redirect a single <code>print</code> statement to <code>stderr</code> without affecting subsequent <code>print</code> statements.
<p>Standard input, on the other hand, is a read-only file object, and it represents the data flowing into the program from some
previous program. This will likely not make much sense to classic Mac OS users, or even Windows users unless you were ever fluent on the <abbr>MS-DOS</abbr> command line. The way it works is that you can construct a chain of commands in a single line, so that one program&#8217;s output
becomes the input for the next program in the chain. The first program simply outputs to standard output (without doing any
special redirecting itself, just doing normal <code>print</code> statements or whatever), and the next program reads from standard input, and the operating system takes care of connecting
one program&#8217;s output to the next program&#8217;s input.
-->

<h2 id=furtherreading>Further Reading</h2>

<p>FIXME

<!--
<ul>
<li><a href="http://www.python.org/doc/current/tut/tut.html"><i class=citetitle>Python Tutorial</i></a> discusses reading and writing files, including how to <a href="http://www.python.org/doc/current/tut/node9.html#SECTION009210000000000000000">read a file one line at a time into a list</a>.

<li><a href="http://www.effbot.org/guides/">eff-bot</a> discusses efficiency and performance of <a href="http://www.effbot.org/guides/readline-performance.htm">various ways of reading a file</a>.

<li><a href="http://www.faqts.com/knowledge-base/index.phtml/fid/199/">Python Knowledge Base</a> answers <a href="http://www.faqts.com/knowledge-base/index.phtml/fid/552">common questions about files</a>.

<li><a href="http://www.python.org/doc/current/lib/"><i class=citetitle>Python Library Reference</i></a> summarizes <a href="http://www.python.org/doc/current/lib/bltin-file-objects.html">all the file object methods</a>.

</ul>
-->

<p class=v><a href=advanced-classes.html rel=prev title='back to &#8220;Advanced Classes&#8221;'><span class=u>&#x261C;</span></a> <a href=xml.html rel=next title='onward to &#8220;XML&#8221;'><span class=u>&#x261E;</span></a>

<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>