From 3583007e4db0a1fae8fe8f1966fdbbd9883b21ab Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Sun, 16 Aug 2009 13:10:01 -0400
Subject: [PATCH] "file objects" --> "stream objects". "file-like objects" --> "stream objects".

---
 files.html | 68 ++++++++++++++++++++++++++----------------------------
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/files.html b/files.html
index 3e435eb..c67d5df 100644
--- a/files.html
+++ b/files.html
@@ -65,9 +65,9 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara
-

File Objects

+

Stream Objects

-

So far, all we know is that Python has a built-in function called open(). The open() function returns a file object, which has methods and attributes for getting information about and manipulating the file. +

So far, all we know is that Python has a built-in function called open(). The open() function returns a stream object, which has methods and attributes for getting information about and manipulating a stream of characters.

 >>> a_file = open('examples/chinese.txt', encoding='utf-8')
@@ -98,7 +98,7 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara
 >>> a_file.read()                                            
 ''
    -
  1. Once you open a file (with the correct encoding), reading from it is just a matter of calling the file object’s read() method. The result is a string. +
  2. Once you open a file (with the correct encoding), reading from it is just a matter of calling the stream object’s read() method. The result is a string.
  3. Perhaps somewhat surprisingly, reading the file again does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
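The end-of-file behavior described in these notes can be sketched as follows. This is a minimal illustration, not part of the patched file: the scratch file `sample.txt` and the variable names are assumptions made so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical scratch file; any small UTF-8 text file would behave the same way.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, mode='w', encoding='utf-8') as a_file:
        a_file.write('Dive Into Python')

    a_file = open(path, encoding='utf-8')
    first_read = a_file.read()    # the whole file, as a single string
    second_read = a_file.read()   # already at end-of-file: an empty string, no exception
    a_file.close()
```

The second call returns `''` rather than raising, which is why loops that read in chunks usually stop when they get an empty string back.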
@@ -119,7 +119,7 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara
 >>> a_file.tell()
 20
    -
  1. Since you’re still at the end of the file, further calls to the file object’s read() method simply return an empty string. +
  2. Since you’re still at the end of the file, further calls to the stream object’s read() method simply return an empty string.
  3. The seek() method moves to a specific byte position in a file.
  4. The read() method can take an optional parameter, the number of characters to read.
  5. If you like, you can even read one character at a time.
@@ -171,7 +171,7 @@ UnicodeDecodeError: 'utf8' codec can't decode byte 0x98 in position 0: unexpecte

    Well that was anticlimactic. -

    The file object a_file still exists; calling its close() method doesn’t destroy the object itself. But it’s not terribly useful. +

    The stream object a_file still exists; calling its close() method doesn’t destroy the object itself. But it’s not terribly useful.

     # continued from the previous example
    @@ -197,13 +197,13 @@ ValueError: I/O operation on closed file.
     
  6. You can’t read from a closed file; that raises a ValueError exception.
  7. You can’t seek in a closed file either.
  8. There’s no current position in a closed file, so the tell() method also fails. -
  9. Perhaps surprisingly, calling the close() method on a file object whose file has been closed does not raise an exception. It’s just a no-op. -
  10. Closed file objects do have one useful attribute: the closed attribute will confirm that the file is closed. +
  11. Perhaps surprisingly, calling the close() method on a stream object whose file has been closed does not raise an exception. It’s just a no-op. +
  12. Closed stream objects do have one useful attribute: the closed attribute will confirm that the file is closed.

Closing Files Automatically

-

File objects have an explicit close() method, but what happens if your code has a bug and crashes before you call close()? That file could theoretically stay open for much longer than necessary. While you’re debugging on your local computer, that’s not a big deal. On a production server, maybe it is. +

Stream objects have an explicit close() method, but what happens if your code has a bug and crashes before you call close()? That file could theoretically stay open for much longer than necessary. While you’re debugging on your local computer, that’s not a big deal. On a production server, maybe it is.

Python 2 had a solution for this: the try..finally block. That still works in Python 3, and you may see it in other people’s code or in older code that was ported to Python 3. But there is a cleaner solution, available since Python 2.6 and standard in Python 3: the with statement.
@@ -212,15 +212,15 @@ ValueError: I/O operation on closed file.
     a_character = a_file.read(1)
     print(a_character)
-

This code calls open(), but it never calls a_file.close(). The with statement starts a code block, like an if statement or a for loop. Inside this code block, you can use the variable a_file as the file object returned from the call to open(). All the regular file object methods are available — seek(), read(), whatever you need. When the with block ends, Python calls a_file.close() automatically. +

This code calls open(), but it never calls a_file.close(). The with statement starts a code block, like an if statement or a for loop. Inside this code block, you can use the variable a_file as the stream object returned from the call to open(). All the regular stream object methods are available — seek(), read(), whatever you need. When the with block ends, Python calls a_file.close() automatically.

Here’s the kicker: no matter how or when you exit the with block, Python will close that file… even if you “exit” it via an unhandled exception. That’s right, even if your code raises an exception and your entire program comes to a screeching halt, that file will get closed. Guaranteed.
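One way to convince yourself of that guarantee is to raise an exception on purpose inside the with block and then inspect the closed attribute. This is a sketch under assumptions: the scratch file and the simulated RuntimeError are hypothetical stand-ins for a real bug.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, mode='w', encoding='utf-8') as a_file:
        a_file.write('test')

    try:
        with open(path, encoding='utf-8') as a_file:
            a_file.read(1)
            # Simulated bug: leave the with block through an exception.
            raise RuntimeError('simulated bug inside the with block')
    except RuntimeError:
        pass  # caught here only so the example can keep going

    # The stream object still exists, but it was closed on the way out.
    closed_after_error = a_file.closed
```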

-

In technical terms, the with statement creates a runtime context. In these examples, the file object acts as a context manager. Python creates the file object a_file and tells it that it is entering a runtime context. When the with code block is completed, Python tells the file object that it is exiting the runtime context, and the file object calls its own close() method. See Appendix B, “Context Managers” for details. +

In technical terms, the with statement creates a runtime context. In these examples, the stream object acts as a context manager. Python creates the stream object a_file and tells it that it is entering a runtime context. When the with code block is completed, Python tells the stream object that it is exiting the runtime context, and the stream object calls its own close() method. See Appendix B, “Context Managers” for details.

-

There’s nothing file-specific about the with statement; it’s just a generic framework for creating runtime contexts and telling objects that they’re entering and exiting a runtime context. If the object in question is a file object, then it does useful file-like things (like closing the file automatically). But that behavior is defined in the file object, not in the with statement. There are lots of other ways to use context managers that have nothing to do with files. You can even create your own, as you’ll see later in this chapter. +

There’s nothing file-specific about the with statement; it’s just a generic framework for creating runtime contexts and telling objects that they’re entering and exiting a runtime context. If the object in question is a stream object, then it does useful file-like things (like closing the file automatically). But that behavior is defined in the stream object, not in the with statement. There are lots of other ways to use context managers that have nothing to do with files. You can even create your own, as you’ll see later in this chapter.

Reading Data One Line At A Time

@@ -242,7 +242,7 @@ ValueError: I/O operation on closed file.
         print('{} {}'.format(line_number, a_line.rstrip()))
  1. Using the with pattern, you safely open the file and let Python close it for you. -
  2. To read a file one line at a time, use a for loop. That’s it. Besides having explicit methods like read(), the file object is also an iterator which spits out a single line every time you ask for a value. +
  3. To read a file one line at a time, use a for loop. That’s it. Besides having explicit methods like read(), the stream object is also an iterator which spits out a single line every time you ask for a value.
  4. Using the format() string method, you can print out the line number and the line itself. (The a_line variable contains the complete line, carriage returns and all. The rstrip() string method removes the trailing whitespace, including the carriage return characters.)
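Putting the three notes together, a self-contained version of the pattern might look like this. The file name and its contents are made up for the example, and the formatted lines are collected in a list instead of printed so the result is easy to inspect.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical three-line input file.
    path = os.path.join(tmp_dir, 'favorite-people.txt')
    with open(path, mode='w', encoding='utf-8') as a_file:
        a_file.write('Dora\nEthan\nWesley\n')

    numbered_lines = []
    line_number = 0
    with open(path, encoding='utf-8') as a_file:
        for a_line in a_file:            # the stream object yields one line per iteration
            line_number += 1
            # rstrip() drops the trailing line ending before formatting
            numbered_lines.append('{} {}'.format(line_number, a_line.rstrip()))
```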
@@ -263,7 +263,7 @@ ValueError: I/O operation on closed file.

Writing to Text Files

-

You can write to files in much the same way that you read from them. First you open a file and get a file object, then you use methods on the file object to write data to the file, then you close the file. +

You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, then you close the file.

To open a file for writing, use the open() function and specify the write mode. There are two file modes for writing:
@@ -274,7 +274,7 @@ ValueError: I/O operation on closed file.

Either mode will create the file automatically if it doesn’t already exist, so there’s never a need for any sort of fiddly “if the file doesn’t exist yet, create a new empty file just so you can open it for the first time” function. Just open a file and start writing. -

You should always close a file as soon as you’re done writing to it, to release the file handle and ensure that the data is actually written to disk. As with reading data from a file, you can call the file object’s close() method, or you can use the with statement and let Python close the file for you. I bet you can guess which technique I recommend. +

You should always close a file as soon as you’re done writing to it, to release the file handle and ensure that the data is actually written to disk. As with reading data from a file, you can call the stream object’s close() method, or you can use the with statement and let Python close the file for you. I bet you can guess which technique I recommend.

 >>> with open('test.log', mode='w', encoding='utf-8') as a_file:  
@@ -289,7 +289,7 @@ ValueError: I/O operation on closed file.
 test succeededand again                                           
  1. You start boldly by creating the new file test.log (or overwriting the existing file), and opening the file for writing. The mode='w' parameter means open the file for writing. Yes, that’s all as dangerous as it sounds. I hope you didn’t care about the previous contents of that file (if any), because that data is gone now. -
  2. You can add data to the newly opened file with the write method of the file object returned by the open() function. After the with block ends, Python automatically closes the file. +
  3. You can add data to the newly opened file with the write method of the stream object returned by the open() function. After the with block ends, Python automatically closes the file.
  4. That was so fun, let’s do it again. But this time, with mode='a' to append to the file instead of overwriting it. Appending will never harm the existing contents of the file.
  5. Both the original line you wrote and the second line you appended are now in the file test.log. Also note that carriage returns are not included. Since you didn’t write them explicitly to the file either time, the file doesn’t include them. You can write a carriage return with the '\n' character. Since you didn’t do this, everything you wrote to the file ended up on one line.
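The same write-then-append sequence can be run as a script. This sketch assumes a temporary directory so it never touches a real test.log, and it adds one extra append with an explicit '\n' to show the difference.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.log')

    with open(path, mode='w', encoding='utf-8') as a_file:   # create or overwrite
        a_file.write('test succeeded')
    with open(path, mode='a', encoding='utf-8') as a_file:   # append, keep existing data
        a_file.write('and again')
    with open(path, encoding='utf-8') as a_file:
        one_line = a_file.read()

    # Line endings are never added for you; write '\n' yourself when you want one.
    with open(path, mode='a', encoding='utf-8') as a_file:
        a_file.write('\nsecond line')
    with open(path, encoding='utf-8') as a_file:
        two_lines = a_file.read().splitlines()
```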
@@ -317,10 +317,10 @@ ValueError: I/O operation on closed file.
   File "<stdin>", line 1, in <module>
 AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
    -
  1. Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the mode parameter contains a 'b'. -
  2. The file object you get from opening a file in binary mode has many of the same attributes, including mode, which reflects the mode parameter you passed into the open() function. -
  3. File objects for binary files also have a name attribute, just like file objects for text files. -
  4. Here’s one difference, though: the file object for a binary file has no encoding attribute. That makes sense, right? You’re reading (or writing) bytes, not strings, so there’s no conversion for Python to do. What you get out of a binary file is exactly what you put into it, no conversion necessary. +
  5. Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the mode parameter contains a 'b' character. +
  6. The stream object you get from opening a file in binary mode has many of the same attributes, including mode, which reflects the mode parameter you passed into the open() function. +
  7. Binary stream objects also have a name attribute, just like text stream objects. +
  8. Here’s one difference, though: a binary stream object has no encoding attribute. That makes sense, right? You’re reading (or writing) bytes, not strings, so there’s no conversion for Python to do. What you get out of a binary file is exactly what you put into it, no conversion necessary.
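Here is a small sketch of those points using a throwaway binary file. The file name and the four bytes written to it are assumptions for illustration, not the image file from the chapter.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.bin')   # hypothetical scratch file
    with open(path, mode='wb') as a_file:
        a_file.write(b'\x89PNG')

    with open(path, mode='rb') as a_file:
        stream_mode = a_file.mode                    # reflects the mode you passed in
        has_encoding = hasattr(a_file, 'encoding')   # binary streams have no encoding
        data = a_file.read()                         # bytes, exactly as written
```

Using hasattr() avoids the AttributeError while still showing that the attribute is missing.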

Did I mention you’re reading bytes? Oh yes you are.
@@ -349,15 +349,13 @@ AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
⁂
-

File-like Objects

+

Stream Objects From Non-File Sources

-

One of Python’s greatest strengths is its dynamic binding, and one powerful use of dynamic binding is the file-like object. +

Imagine you’re writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it, and close it before exiting. But you shouldn’t do that. Instead, your API should take an arbitrary stream object. -

Your functions which require an input source could simply take a filename as a string, go open the file for reading, read it, and close it when they’re done. But they shouldn’t. Instead, they should take a file-like object. +

In the simplest case, a stream object is anything with a read() method which takes an optional size parameter and returns a string. When called with no size parameter, the read() method should read everything there is to read from the input source and return all the data as a single value. When called with a size parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data. -

In the simplest case, a file-like object is any object with a read() method with an optional size parameter, which returns a string. When called with no size parameter, it reads everything there is to read from the input source and returns all the data as a single string. When called with a size parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data. - -

You know, like a real file object. The difference is that you’re not limiting yourself to real files. The input source that’s being “read” could be anything: a web page, a string in memory, even the output of another program. As long as your functions take a file-like object and simply call the object’s read() method, you can handle any input source that acts like a file, without specific code to handle each kind of input. +

That sounds exactly like the stream object you get from opening a real file. The difference is that you’re not limiting yourself to real files. The input source that’s being “read” could be anything: a web page, a string in memory, even the output of another program. As long as your functions take a stream object and simply call the object’s read() method, you can handle any input source that acts like a file, without specific code to handle each kind of input.
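To make the "anything with a read() method" idea concrete, here is a rough sketch of a hand-rolled input source and a library-style function that consumes it. Both InMemorySource and count_words are hypothetical names invented for this illustration; neither comes from the standard library or from the chapter.

```python
class InMemorySource:
    """Hypothetical non-file input source that offers a file-style read() method."""

    def __init__(self, text):
        self._text = text
        self._position = 0

    def read(self, size=-1):
        # No size (or a negative one) means "everything that is left".
        if size is None or size < 0:
            chunk = self._text[self._position:]
        else:
            chunk = self._text[self._position:self._position + size]
        self._position += len(chunk)   # remember where to pick up next time
        return chunk


def count_words(a_stream):
    """Hypothetical library function: only requires that a_stream has read()."""
    return len(a_stream.read().split())


source = InMemorySource('PapayaWhip is the new black.')
first_chunk = source.read(10)   # reads that much...
next_chunk = source.read(3)     # ...then picks up where it left off
the_rest = source.read()        # everything remaining
at_the_end = source.read()      # nothing left: empty string

word_count = count_words(InMemorySource('PapayaWhip is the new black.'))
```

Because count_words() never asks where the data comes from, the same function would work unchanged with a stream object returned by open().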

 >>> a_string = 'PapayaWhip is the new black.'
@@ -379,7 +377,7 @@ AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
 'new black.'
  1. The io module contains the definition of the StringIO class that you can use to treat a string in memory as a file. -
  2. To create a file-like object out of a string, create an instance of the io.StringIO() class and pass it the string you want to use as your “file” data. Now you have a file-like object, and you can do all sorts of file-like things with it. +
  3. To create a stream object out of a string, create an instance of the io.StringIO() class and pass it the string you want to use as your “file” data. Now you have a stream object, and you can do all sorts of stream-like things with it.
  4. Calling the read() method “reads” the entire “file,” which in the case of a StringIO object simply returns the original string.
  5. Just like a real file, calling the read() method again returns an empty string.
  6. You can explicitly seek to the beginning of the string, just like seeking through a real file, by using the seek() method of the StringIO object.
@@ -388,9 +386,9 @@ AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
Handling Compressed Files
-

    The Python standard library contains modules that support reading and writing compressed files. There are a number of different compression schemes; the most popular for single files are gzip and bzip2. (You may have also encountered PKZIP archives and GNU Tar archives. Python has modules for those, too.) +

    The Python standard library contains modules that support reading and writing compressed files. There are a number of different compression schemes; the two most popular on non-Windows systems are gzip and bzip2. (You may have also encountered PKZIP archives and GNU Tar archives. Python has modules for those, too.) -

    The gzip module lets you create a file-like object for reading or writing a gzip-compressed file. The file-like object it gives you supports the read() method (if you opened it for reading) or the write() method (if you opened it for writing). That means you can use the methods you’ve already learned for regular files to directly read or write a gzip-compressed file, without creating a temporary file to store the decompressed data. +

    The gzip module lets you create a stream object for reading or writing a gzip-compressed file. The stream object it gives you supports the read() method (if you opened it for reading) or the write() method (if you opened it for writing). That means you can use the methods you’ve already learned for regular files to directly read or write a gzip-compressed file, without creating a temporary file to store the decompressed data.
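A minimal round trip might look like the sketch below. The file name and the sentence written to it are placeholders chosen for the example; gzip.open() in binary mode deals in bytes, so the string is encoded on the way in and decoded on the way out.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.log.gz')   # hypothetical scratch file

    # Write compressed data with the same write() method you already know.
    with gzip.open(path, mode='wb') as z_file:
        z_file.write('A nine mile walk is no joke.'.encode('utf-8'))

    # Read it back with read(); no temporary decompressed copy is needed.
    with gzip.open(path, mode='rb') as z_file:
        round_trip = z_file.read().decode('utf-8')
```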

    As an added bonus, it supports the with statement too, so you can let Python automatically close your gzip-compressed file when you’re done with it. @@ -432,11 +430,11 @@ PapayaWhip new blacknew blacknew black

    1. The print() statement, in a loop. Nothing surprising here. -
    2. stdout is defined in the sys module, and it is a file-like object. Calling its write function will print out whatever string you give it. In fact, this is what the print function really does; it adds a carriage return to the end of the string you’re printing, and calls sys.stdout.write. +
    3. stdout is defined in the sys module, and it is a stream object. Calling its write function will print out whatever string you give it. In fact, this is what the print function really does; it adds a carriage return to the end of the string you’re printing, and calls sys.stdout.write.
    4. In the simplest case, sys.stdout and sys.stderr send their output to the same place: the Python IDE (if you’re in one), or the terminal (if you’re running Python from the command line). Like standard output, standard error does not add carriage returns for you. If you want carriage returns, you’ll need to write carriage return characters.
    -

    sys.stdout and sys.stderr are file-like objects, but they are write-only. Attempting to call their read() method will always raise an IOError. +

    sys.stdout and sys.stderr are stream objects, but they are write-only. Attempting to call their read() method will always raise an IOError.

     >>> import sys
    @@ -447,7 +445,7 @@ IOError: not readable

    Redirecting Standard Output

    -

    So sys.stdout and sys.stderr are file-like objects, albeit ones that only support writing. But they’re not constants; they’re variables. That means you can assign them a new value — another file object, or another file-like object — and redirect their output. +

    sys.stdout and sys.stderr are stream objects, albeit ones that only support writing. But they’re not constants; they’re variables. That means you can assign them a new value — any other stream object — to redirect their output.

    [download stdout.py]

    import sys
    @@ -486,12 +484,12 @@ C
     print('C')                                                                             
    1. This will print to the IDE “Interactive Window” (or the terminal, if running the script from the command line). -
    2. This is a with statement, which you’ve seen before. But unlike all previous example, this one doesn’t stop at as a_file. Instead, there’s a comma and another function call. The with statement can actually take a comma-separated list of contexts. The first is a context you’ve seen several times already: it opens a file, assigns the file object to a_file, and closes the file automatically when the context ends. The second context is a custom-built context that redirects sys.stdout to the file object that was created in the first context. +
    3. This is a with statement, which you’ve seen before. But unlike all previous examples, this one doesn’t stop at as a_file. Instead, there’s a comma and another function call. The with statement can actually take a comma-separated list of contexts. The first is a context you’ve seen several times already: it opens a file, assigns the stream object to a_file, and closes the file automatically when the context ends. The second context is a custom-built context that redirects sys.stdout to the stream object that was created in the first context.
    4. Because this print() statement is executed with the contexts created by the with statement, it will not print to the screen; it will write to the file out.log.
    5. The with code block is over. Python has told each context manager to do whatever it is they do upon exiting a context. The first context closed the file; the second context changed sys.stdout back to its original value. That means that this call to the print() function will once again print to the screen.
    -

    Now take a look at the RedirectStdoutTo class. It is a custom context manager. Upon entering the context, it redirects sys.stdout to a given file-like object. Upon exiting the context, it restores sys.stdout to its original value. +

    Now take a look at the RedirectStdoutTo class. It is a custom context manager. Upon entering the context, it redirects sys.stdout to a given stream object. Upon exiting the context, it restores sys.stdout to its original value.

    class RedirectStdoutTo:
         def __init__(self, out_new):    
    @@ -504,7 +502,7 @@ C
         def __exit__(self, *args):      
             sys.stdout = self.out_old
      -
    1. The __init__() method is called immediately after an instance is created. It takes one parameter, the file-like object that you want to use as standard output for the life of the context. This method just saves the file-like object in an instance variable so other methods can use it later. +
    2. The __init__() method is called immediately after an instance is created. It takes one parameter, the stream object that you want to use as standard output for the life of the context. This method just saves the stream object in an instance variable so other methods can use it later.
    3. The __enter__() method is a special method; Python calls it when entering a context (i.e. at the beginning of the with statement). This method saves the current value of sys.stdout in self.out_old, then redirects standard output by assigning self.out_new to sys.stdout.
    4. The __exit__() method is another special method; Python calls it when exiting the context (i.e. at the end of the with statement). This method restores standard output to its original value by assigning the saved self.out_old value to sys.stdout.
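For reference, here is a runnable sketch of the context manager as these notes describe it. The class body is reconstructed from the description above and may not match the file’s exact text; capturing the output in an io.StringIO buffer instead of a log file is an assumption made to keep the example self-contained.

```python
import io
import sys


class RedirectStdoutTo:
    def __init__(self, out_new):
        # Save the stream object that should receive standard output.
        self.out_new = out_new

    def __enter__(self):
        # Remember the current stdout, then swap in the new stream.
        self.out_old = sys.stdout
        sys.stdout = self.out_new

    def __exit__(self, *args):
        # Put the original stdout back, even if the block raised.
        sys.stdout = self.out_old


buffer = io.StringIO()            # in-memory stand-in for an opened log file
original_stdout = sys.stdout
with RedirectStdoutTo(buffer):
    print('B')                    # goes to the buffer, not the screen
restored = sys.stdout is original_stdout
captured = buffer.getvalue()
```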
    @@ -518,7 +516,7 @@ C