From 1de06137af63d44f2a3f76a7463269c1c91e9aba Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Thu, 16 Jul 2009 11:19:05 -0400
Subject: [PATCH] basic outline of files chapter

---
 files.html | 349 ++++++++++++++++++++++++-----------------------------
 1 file changed, 155 insertions(+), 194 deletions(-)

diff --git a/files.html b/files.html
index 0c8f790..a3bc05d 100644
--- a/files.html
+++ b/files.html
@@ -19,58 +19,51 @@ body{counter-reset:h1 12}

FIXME
— FIXME

  -

Diving in

+

Diving In

FIXME -

File Objects

+

Reading From Text Files

+ +

FIXME + +

+open(..., encoding='...')
+open(..., 'r', encoding='...')
+
+ +

Character Encoding Rears Its Ugly Head

+ +

FIXME + +

File Objects

+ +

FIXME

Python has a built-in function, open(), for opening a file on disk. The open() function returns a file object, which has methods and attributes for getting information about and manipulating the file. -

->>> image = open('examples/beauregard-100x100.jpg', 'rb')
->>> image
-<io.BufferedReader object at 0x00C7A390>
->>> image.mode
-'rb'
->>> image.name
-'examples/beauregard-100x100.jpg'
->>>
-
>>> f = open("/music/_singles/kairo.mp3", "rb") 
->>> f       
-<open file '/music/_singles/kairo.mp3', mode 'rb' at 010E3988>
->>> f.mode  
-'rb'
->>> f.name  
-'/music/_singles/kairo.mp3'
+ + +
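As a rough illustration of the file object that open() hands back, here is a minimal sketch. It assumes nothing about the example files used in this chapter: it creates its own throwaway file with the standard tempfile module, and the variable names (path, opened_mode, opened_name) are made up for the illustration.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any real path.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as tmp:
    tmp.write('hello\n')

f = open(path, 'rb')       # open() returns a file object
opened_mode = f.mode       # the mode string passed to open()
opened_name = f.name       # the path passed to open()
f.close()

os.remove(path)            # clean up the throwaway file
```

The mode and name attributes only report how the file was opened; reading them does not touch the file's contents.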

Reading Data From A Text File

+ +

FIXME + + +

6.2.2. Closing Files

-

Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It's - important to close files as soon as you're finished with them. + +

Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It’s important to close files as soon as you’re finished with them. + + + +
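Here is a small sketch of closing a file and checking the result. It relies on the file object's closed attribute and on the ValueError that Python raises when you try to read from a closed file. The temporary file and the variable names are assumptions made for the illustration, not part of the chapter's examples.

```python
import os
import tempfile

# A throwaway file, so the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'rb')
closed_before = f.closed    # the file is still open here
f.close()
closed_after = f.closed     # now the file object reports that it is closed

error = None
try:
    f.read()                # reading from a closed file is an error
except ValueError as exc:
    error = str(exc)

os.remove(path)
```

Calling close() a second time on an already closed file is harmless; it is only operations like read() that fail.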

FIXME checking if a file is closed + +

Using The with Statement

+ +

FIXME "with open(...) as file" pattern + +

Reading Data One Line At A Time

+ +

FIXME + +

FIXME what's a "line"? (line endings discussion, universal line endings, etc.) + +

Writing to Text Files

+ +

FIXME + + -

  • eff-bot discusses efficiency and performance of various ways of reading a file. +

    Character Encoding Again

    -
  • Python Knowledge Base answers common questions about files. +

    FIXME -

  • Python Library Reference summarizes all the file object methods. +

    Write A Little, Write A Lot

    - +

    FIXME write(), writelines(), .writable() +

    Handling I/O Errors

    +

    FIXME + +

    Binary Files

    + +

    FIXME + +

    +>>> image = open('examples/beauregard-100x100.jpg', 'rb')
    +>>> image
    +<io.BufferedReader object at 0x00C7A390>
    +>>> image.mode
    +'rb'
    +>>> image.name
    +'examples/beauregard-100x100.jpg'
    +
    + +
    +>>> image
    +<io.BufferedReader object at 0x00C7A390>
    +>>> image.tell()
    +0
    +>>> data = image.read(3)
    +>>> data
    +b'\xff\xd8\xff'
    +>>> image.tell()
    +3
    +>>> image.seek(0)
    +0
    +>>> data = image.read()
    +>>> len(data)
    +3150
    +
    + +
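The session above depends on a particular JPEG file. If you want to try the same read(), tell(), and seek() calls without it, the following sketch builds a 100-byte stand-in file first. The payload and the variable names are invented for the illustration; only the first three bytes mimic the JPEG header shown above.

```python
import os
import tempfile

# 100 bytes of stand-in data: three header bytes followed by 97 zero bytes.
payload = b'\xff\xd8\xff' + bytes(97)
fd, path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as tmp:
    tmp.write(payload)

with open(path, 'rb') as image:
    start = image.tell()        # position before reading anything
    head = image.read(3)        # read exactly three bytes
    after_head = image.tell()   # position has advanced by three
    rewound = image.seek(0)     # seek() returns the new position
    everything = image.read()   # read from the start to the end

os.remove(path)
```

In binary mode, read() returns bytes objects, and positions are counted in bytes rather than characters.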

    File-like Objects

    + +

    FIXME + +
    -<channel>
    -<title>Slashdot</title>
    -<link>http://slashdot.org/</link>
    -<description>News for nerds, stuff that matters</description>
    -</channel>
    -
    -<image>
    -<title>Slashdot</title>
    -<url>http://images.slashdot.org/topics/topicslashdot.gif</url>
    -<link>http://slashdot.org/</link>
    -</image>
    -
    -<item>
    -<title>To HDTV or Not to HDTV?</title>
    -<link>http://slashdot.org/article.pl?sid=01/12/28/0421241</link>
    -</item>
    -
    -[...snip...]

  • -
      -
    1. As you saw in a previous chapter, urlopen takes a web page URL and returns a file-like object. Most importantly, this object has a read method which returns the HTML source of the web page. -
    2. Now you pass the file-like object to minidom.parse, which obediently calls the read method of the object and parses the XML data that the read method returns. The fact that this XML data is now coming straight from a web page is completely irrelevant. minidom.parse doesn't know about web pages, and it doesn't care about web pages; it just knows about file-like objects. -
    3. As soon as you're done with it, be sure to close the file-like object that urlopen gives you. -
    4. By the way, this URL is real, and it really is XML. It's an XML representation of the current headlines on Slashdot, a technical news and gossip site. -

      Example 10.3. Parsing XML from a string (the easy but inflexible way)

      ->>> contents = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"
      ->>> xmldoc = minidom.parseString(contents) 
      ->>> print xmldoc.toxml()
      -<?xml version="1.0" ?>
      -<grammar><ref id="bit"><p>0</p><p>1</p></ref></grammar>
      -
        -
      1. minidom has a method, parseString, which takes an entire XML document as a string and parses it. You can use this instead of minidom.parse if you know you already have your entire XML document in a string. -

        OK, so you can use the minidom.parse function for parsing both local files and remote URLs, but for parsing strings, you use... a different function. That means that if you want to be able to take input from a -file, a URL, or a string, you'll need special logic to check whether it's a string, and call the parseString function instead. How unsatisfying. -

        If there were a way to turn a string into a file-like object, then you could simply pass this object to minidom.parse. And in fact, there is a module specifically designed for doing just that: StringIO.
        +
        -    # try to open with native open function (if source is pathname)
        -    try:
        -        return open(source)
        -    except (IOError, OSError):
        -        pass
        +

        Standard Input, Output, and Error

        -    # treat source as string
        -    import StringIO
        -    return StringIO.StringIO(str(source))
        -
          -
        1. The openAnything function takes a single parameter, source, and returns a file-like object. source is a string of some sort; it can either be a URL (like 'http://slashdot.org/slashdot.rdf'), a full or partial pathname to a local file (like 'binary.xml'), or a string that contains actual XML data to be parsed. -
        2. First, you see if source is a URL. You do this through brute force: you try to open it as a URL and silently ignore errors caused by trying to open something which is not a URL. This is actually elegant in the sense that, if urllib ever supports new types of URLs in the future, you will also support them without recoding. If urllib is able to open source, then the return kicks you out of the function immediately and the following try statements never execute. -
        3. On the other hand, if urllib yelled at you and told you that source wasn't a valid URL, you assume it's a path to a file on disk and try to open it. Again, you don't do anything fancy to check whether source is a valid filename or not (the rules for valid filenames vary wildly between different platforms anyway, so you'd probably get them wrong anyway). Instead, you just blindly open the file, and silently trap any errors. -
        4. By this point, you need to assume that source is a string that has hard-coded data in it (since nothing else worked), so you use StringIO to create a file-like object out of it and return that. (In fact, since you're using the str function, source doesn't even need to be a string; it could be any object, and you'll use its string representation, as defined by its __str__ special method.) -
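The fallback steps above are written against Python 2. In Python 3, the StringIO class lives in the io module. The following is only a reduced sketch of the same idea, not the original openAnything function: it leaves out the URL step so that it runs without a network connection, and the name open_path_or_string is made up for the illustration.

```python
import io

def open_path_or_string(source):
    """Return a file-like object for a pathname or, failing that, for the text itself.

    Hypothetical helper: a cut-down variant of the steps described above,
    without the URL branch.
    """
    # First, blindly try to treat source as a path to a file on disk.
    try:
        return open(source, encoding='utf-8')
    except (OSError, ValueError):
        pass

    # Nothing else worked, so wrap the text itself in a file-like object.
    return io.StringIO(str(source))

xml_text = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"
stream = open_path_or_string(xml_text)
contents = stream.read()    # same read() call you would make on a real file
stream.close()
```

Either way, the caller gets back an object with read() and close() methods, which is all that a parser expecting a file-like object needs.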

          Now you can use this openAnything function in conjunction with minidom.parse to make a function that takes a source that refers to an XML document somehow (either as a URL, or a local filename, or a hard-coded XML document in a string) and parses it. -

          Example 10.7. Using openAnything in kgp.py

          
          -class KantGenerator:
          -    def _load(self, source):
          -        sock = toolbox.openAnything(source)
          -        xmldoc = minidom.parse(sock).documentElement
          -        sock.close()
          -        return xmldoc

          10.2. Standard input, output, and error

          +

          FIXME + + +

          Further Reading

          +

          FIXME +