From d74c0ce05b1e3246719ba0ae1767eedb32ec3f1b Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Sun, 19 Jul 2009 14:51:57 -0400
Subject: [PATCH] finished #gzip section

---
 files.html | 112 ++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 64 insertions(+), 48 deletions(-)

diff --git a/files.html b/files.html
index 318fd04..aadfad4 100644
--- a/files.html
+++ b/files.html
@@ -294,67 +294,60 @@ ValueError: I/O operation on closed file.

Character Encoding Again

-

FIXME +

Did you notice the encoding parameter that was passed to the open() function when you opened a file for writing? It’s important; don’t ever leave it out! As you saw at the beginning of this chapter, files don’t contain strings; they contain bytes. Reading a “string” from a text file only works because you told Python what encoding to use to read a stream of bytes and convert it to a string. Writing text to a file presents the same problem in reverse. You can’t write characters to a file; characters are an abstraction. In order to write to the file, Python needs to know how to convert your string into a sequence of bytes. The only way to be sure it’s performing the correct conversion is to specify the encoding parameter when you open the file for writing.
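To make the point concrete, here is a small sketch. It writes the same four-character string with two different encodings, then looks at the raw bytes that actually landed on disk. The string 'café' and the temporary file names are illustrative choices, not part of the chapter’s examples. Note the last step: reading a file back with the wrong encoding doesn’t necessarily raise an error. It can just as easily hand you garbage without complaint.

```python
import os
import tempfile

text = 'café'  # four characters, one of them non-ASCII (illustrative data)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names inside a throwaway directory.
    utf8_path = os.path.join(tmp_dir, 'utf8.txt')
    latin1_path = os.path.join(tmp_dir, 'latin1.txt')

    # Same string, two encodings: two different byte sequences on disk.
    with open(utf8_path, mode='w', encoding='utf-8') as a_file:
        a_file.write(text)
    with open(latin1_path, mode='w', encoding='latin-1') as a_file:
        a_file.write(text)

    # Open in binary mode to see the bytes that were actually stored.
    with open(utf8_path, mode='rb') as a_file:
        utf8_bytes = a_file.read()
    with open(latin1_path, mode='rb') as a_file:
        latin1_bytes = a_file.read()

    # Decoding UTF-8 bytes as Latin-1 does not fail here; it silently garbles.
    with open(utf8_path, mode='r', encoding='latin-1') as a_file:
        misread = a_file.read()

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
print(misread)       # cafÃ©
```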

Write A Little, Write A Lot

FIXME write(), writelines(), .writeable -

Handling I/O Errors

- -

FIXME - - -

Binary Files

FIXME -

->>> image = open('examples/beauregard-100x100.jpg', 'rb')
->>> image
-<io.BufferedReader object at 0x00C7A390>
->>> image.mode
-'rb'
->>> image.name
-'examples/beauregard-100x100.jpg'
-
+
+>>> an_image = open('examples/beauregard-100x100.jpg', mode='rb')
+>>> an_image.mode
+'rb'
+>>> an_image.name
+'examples/beauregard-100x100.jpg'
+>>> an_image.encoding
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
+
    +
  1. FIXME +
  2. +
  3. +
  4. +
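If you don’t have the chapter’s sample JPEG handy, the following sketch checks the same properties against a throwaway file. It assumes that three bytes, the start of a JPEG header, are a good enough stand-in for a real image. The temporary path is hypothetical. Opened in 'rb' mode, the stream is a BufferedReader: it has a mode and a name, but no encoding attribute, because no decoding ever happens.

```python
import os
import tempfile

# Stand-in data: just the first three bytes of a JPEG file (assumption).
fake_jpeg = b'\xff\xd8\xff'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'fake.jpg')  # hypothetical file name
    with open(path, mode='wb') as out_file:
        out_file.write(fake_jpeg)

    with open(path, mode='rb') as an_image:
        mode = an_image.mode                          # 'rb'
        name_matches = an_image.name == path          # .name echoes the path you passed
        has_encoding = hasattr(an_image, 'encoding')  # binary streams never decode
        kind = type(an_image).__name__                # 'BufferedReader'

print(mode, name_matches, has_encoding, kind)  # rb True False BufferedReader
```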
-
->>> image
-<io.BufferedReader object at 0x00C7A390>
->>> image.tell()
-0
->>> data = image.read(3)
->>> data
-b'\xff\xd8\xff'
->>> image.tell()
-3
->>> image.seek(0)
-0
->>> data = image.read()
->>> len(data)
-3150
-
+
+# continued from the previous example
+>>> an_image.tell()
+0
+>>> data = an_image.read(3)
+>>> data
+b'\xff\xd8\xff'
+>>> type(data)
+<class 'bytes'>
+>>> an_image.tell()
+3
+>>> an_image.seek(0)
+0
+>>> data = an_image.read()
+>>> len(data)
+3150
+
    +
  1. FIXME +
  2. +
  3. +
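The same sequence of tell(), read(), and seek() can be reproduced on a throwaway binary file. In this sketch the 100 bytes of payload are made up: a JPEG-style header followed by zero bytes. What it shows is that read(size) on a binary stream returns a bytes object, and that positions are counted in bytes.

```python
import os
import tempfile

# Made-up payload: a JPEG-style header followed by 97 zero bytes.
payload = b'\xff\xd8\xff' + bytes(97)   # 100 bytes total

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'fake.jpg')  # hypothetical file name
    with open(path, mode='wb') as out_file:
        out_file.write(payload)

    with open(path, mode='rb') as an_image:
        start = an_image.tell()          # 0: nothing has been read yet
        header = an_image.read(3)        # exactly 3 bytes, as a bytes object
        after_header = an_image.tell()   # 3: positions count bytes
        an_image.seek(0)                 # rewind to the start of the file
        everything = an_image.read()     # no size: read to the end

print(start, header, after_header, len(everything))  # 0 b'\xff\xd8\xff' 3 100
```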

File-like Objects

One of Python’s greatest strengths is its dynamic binding, and one powerful use of dynamic binding is the file-like object. -

Your functions which require an input source could simply take a filename, go open the file for reading, read it, and close it when they’re done. But they shouldn’t. Instead, they should take a file-like object. +

Your functions which require an input source could simply take a filename as a string, go open the file for reading, read it, and close it when they’re done. But they shouldn’t. Instead, they should take a file-like object.
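Here is one way that advice might look in practice. This is a sketch: count_words() is a hypothetical helper, not something from the standard library. It never opens anything itself. It only calls read(size) until it gets back an empty string, so it works the same on a real text file and on an in-memory io.StringIO object. It also remembers whether the previous chunk ended mid-word, so words split across chunk boundaries are counted once.

```python
import io
import os
import tempfile

def count_words(a_file, chunk_size=8):
    """Count whitespace-separated words in any object with read(size).

    Hypothetical helper: it relies only on read(size) returning ''
    once the input source is exhausted.
    """
    count = 0
    in_word = False            # carried across chunks
    while True:
        chunk = a_file.read(chunk_size)
        if not chunk:          # '' means there is nothing left to read
            break
        for char in chunk:
            if char.isspace():
                in_word = False
            elif not in_word:  # first character of a new word
                in_word = True
                count += 1
    return count

text = 'A nine mile walk is no joke'

# An in-memory string wrapped as a file-like object...
from_string = count_words(io.StringIO(text))

# ...and a real file on disk; the function doesn't know the difference.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'walk.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as out_file:
        out_file.write(text)
    with open(path, encoding='utf-8') as in_file:
        from_disk = count_words(in_file)

print(from_string, from_disk)  # 7 7
```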

In the simplest case, a file-like object is any object with a read() method with an optional size parameter, which returns a string. When called with no size parameter, it reads everything there is to read from the input source and returns all the data as a single string. When called with a size parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data.
@@ -379,14 +372,37 @@ b'\xff\xd8\xff'
>>> a_file.read()
'new black.'

    -
  1. FIXME -
  2. FIXME Now you have a file-like object, and you can do all sorts of file-like things with it. +
  3. The io module contains the definition of the StringIO class that you can use to treat a string in memory as a file. +
  4. To create a file-like object out of a string, create an instance of the io.StringIO() class and pass it the string you want to use as your “file” data. Now you have a file-like object, and you can do all sorts of file-like things with it.
  5. Calling the read() method “reads” the entire “file,” which in the case of a StringIO object simply returns the original string.
  6. Just like a real file, calling the read() method again returns an empty string.
  7. You can explicitly seek to the beginning of the string, just like seeking through a real file, by using the seek() method of the StringIO object.
  8. You can also read the string in chunks, by passing a size parameter to the read() method.
+
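The StringIO walkthrough can be condensed into a few lines you can run on their own. The string here is an arbitrary stand-in. Each step mirrors one of the points in the list: read everything, get '' at the end, seek back, then read in sized chunks.

```python
import io

a_string = 'Python reads strings as files.'  # arbitrary sample text
a_file = io.StringIO(a_string)   # wrap the string in a file-like object

whole = a_file.read()            # the entire "file": the original string
again = a_file.read()            # '': already at the end, like a real file
a_file.seek(0)                   # rewind to the beginning of the string
first_chunk = a_file.read(6)     # 'Python'
position = a_file.tell()         # 6
rest = a_file.read()             # ' reads strings as files.'

print(repr(again), first_chunk, position, repr(rest))
```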

Handling Compressed Files

+ +

The Python standard library contains modules that support reading and writing compressed files. There are a number of different compression schemes; the most popular for single files are gzip and bzip2. (You may have also encountered PKZIP archives and GNU Tar archives. Python has modules for those, too.) + +
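Both compression schemes named here have standard library modules, gzip and bz2, and both offer one-shot compress() and decompress() functions for data already in memory. The sketch below round-trips some deliberately repetitive sample text through each one. It assumes Python 3.2 or later, where gzip.compress() and gzip.decompress() exist. The exact compressed sizes depend on the library versions, so none are predicted here.

```python
import bz2
import gzip

# Deliberately repetitive sample data, so both schemes have something to squeeze.
data = ('A nine mile walk is no joke. ' * 20).encode('utf-8')

gz = gzip.compress(data)   # gzip-format bytes (header, deflate stream, trailer)
bz = bz2.compress(data)    # bzip2-format bytes

# Decompressing must give back exactly the original bytes.
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data

print(len(data), len(gz), len(bz))
```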

The gzip module lets you create a file-like object for reading or writing a gzip-compressed file. The file-like object it gives you supports the read() method (if you opened it for reading) or the write() method (if you opened it for writing). That means you can use the methods you’ve already learned for regular files to directly read or write a gzip-compressed file, without creating a temporary file to store the decompressed data. + +

As an added bonus, it supports the with statement too, so you can let Python automatically close your gzip-compressed file when you’re done with it. + +
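The following sketch puts those pieces together: gzip.open() inside a with statement, once for writing and once for reading. It assumes Python 3.3 or later, where gzip.open() accepts text modes ('wt', 'rt') along with an encoding parameter, so you don’t have to call .encode() yourself. The temporary path is hypothetical. Peeking at the raw file confirms that what lands on disk is compressed data starting with the gzip magic number.

```python
import gzip
import os
import tempfile

message = 'A nine mile walk is no joke, especially in the rain.'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.log.gz')  # hypothetical path

    # Text mode plus an encoding: gzip.open encodes the string for you.
    with gzip.open(path, mode='wt', encoding='utf-8') as z_file:
        z_file.write(message)

    # The raw bytes on disk are compressed; gzip files start with 1f 8b.
    with open(path, mode='rb') as raw_file:
        magic = raw_file.read(2)

    # Reading back through gzip.open decompresses and decodes transparently.
    with gzip.open(path, mode='rt', encoding='utf-8') as z_file:
        round_trip = z_file.read()

print(magic, round_trip == message)  # b'\x1f\x8b' True
```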

+you@localhost:~$ python3
+
+>>> import gzip
+>>> with gzip.open('out.log.gz', mode='wb') as z_file:
+...   z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))
+... 
+>>> exit()
+
+you@localhost:~$ ls -l out.log.gz
+-rw-r--r--  1 mark mark    79 2009-07-19 14:29 out.log.gz
+you@localhost:~$ gunzip out.log.gz
+you@localhost:~$ cat out.log
+A nine mile walk is no joke, especially in the rain.
+
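The shell session relies on gunzip to check the result. Because gzip works in terms of file-like objects, the whole round trip can also stay in memory. In this sketch, gzip.GzipFile wraps an io.BytesIO buffer through its fileobj parameter. Closing the GzipFile, here by leaving the with block, writes the gzip trailer but leaves the underlying buffer open, so its contents can be read back afterwards.

```python
import gzip
import io

message = 'A nine mile walk is no joke, especially in the rain.'.encode('utf-8')

buffer = io.BytesIO()  # an in-memory binary "file"

# Compress into the buffer; GzipFile only needs a file-like object.
with gzip.GzipFile(fileobj=buffer, mode='wb') as z_file:
    z_file.write(message)

# GzipFile.close() does not close a fileobj it was handed, so this still works.
compressed = buffer.getvalue()

# Rewind the buffer and decompress it through a second GzipFile.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode='rb') as z_file:
    decompressed = z_file.read()

print(decompressed.decode('utf-8'))
```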

Standard Input, Output, and Error

Command-line gurus are already familiar with the concept of standard input, standard output, and standard error. This section is for the rest of you.