Difficulty level: ♦♦♦♢♢
❝ FIXME ❞
— FIXME
FIXME
Before you can read from a file, you need to open it. Opening a file in Python couldn’t be easier:
a_file = open('examples/chinese.txt', encoding='utf-8')
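Python has a built-in open() function, which takes a filename as an argument. Here the filename is 'examples/chinese.txt'. One interesting thing about this filename is that it is not just the name of a file; it is a combination of a directory path and a filename. A file-opening function could have taken two arguments, a directory path and a filename, but the open() function only takes one. In Python, whenever you need a “filename,” you can include some or all of a directory path as well.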
But that call to the open() function didn’t stop at the filename. There’s another argument, called encoding. Oh dear, that sounds dreadfully familiar.
Bytes are bytes; characters are an abstraction. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a “text file” from disk, how does Python convert that sequence of bytes into a sequence of characters? It decodes the bytes according to a specific character encoding algorithm and returns a sequence of Unicode characters (otherwise known as a string).
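The decoding step can be seen without touching a file at all. The following is a minimal sketch; the sample string and the variable names are illustrative assumptions and have nothing to do with the contents of the example file.

```python
# A string is a sequence of Unicode characters (hypothetical sample text).
text = '\u6df1\u5165 Python'

# Encoding turns the characters into bytes; bytes are what a file stores.
raw = text.encode('utf-8')
print(len(text), 'characters,', len(raw), 'bytes')

# Decoding with the same encoding recovers the original characters.
round_trip = raw.decode('utf-8')

# Decoding with a different encoding either fails outright or silently
# produces the wrong characters, depending on the bytes involved.
try:
    wrong = raw.decode('cp1252')
except UnicodeDecodeError as error:
    wrong = None
    print('cp1252 could not decode these bytes:', error)
```

Either way, the bytes only turn back into the intended string when they are decoded with the encoding that was used to write them.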
# This example was created on Windows. Other platforms may
# behave differently, for reasons outlined below.
>>> file = open('examples/chinese.txt')
>>> a_string = file.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: character maps to <undefined>
>>>
What just happened? You didn’t specify a character encoding, so Python is forced to use the default encoding. What’s the default encoding? If you look closely at the traceback, you can see that it’s dying in cp1252.py, meaning that Python is using CP-1252 as the default encoding here. (CP-1252 is a common encoding on computers running Microsoft Windows.) The CP-1252 character set doesn’t support the characters that are in this file, so the read fails with an ugly UnicodeDecodeError.
But wait, it’s worse than that! The default encoding is platform-dependent, so this code might work on your computer (if your default encoding is UTF-8), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252).
☞ If you need to get the default character encoding, import the locale module and call locale.getpreferredencoding(). On my Windows laptop, it returns 'cp1252', but on my Linux box upstairs, it returns 'UTF8'. I can’t even maintain consistency in my own house! Your results may be different (even on Windows) depending on which version of your operating system you have installed and how your regional/language settings are configured. This is why it’s so important to specify the encoding every time you open a file.
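Here is a small sketch of both halves of that advice: asking for the platform default, then ignoring it by naming the encoding explicitly. The temporary file and its contents are hypothetical sample data, not one of the book's example files.

```python
import locale
import os
import tempfile

# The platform default; this value differs from machine to machine.
default_encoding = locale.getpreferredencoding()
print('default encoding:', default_encoding)

# Write a small UTF-8 file into a temporary directory (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, mode='w', encoding='utf-8') as out_file:
    out_file.write('caf\u00e9')

# Naming the encoding explicitly gives the same result on every platform,
# whatever the default happens to be.
with open(path, encoding='utf-8') as in_file:
    contents = in_file.read()
```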
So far, all we know is that Python has a built-in function called open(). The open() function returns a file object, which has methods and attributes for getting information about and manipulating the file.
>>> a_file = open('examples/chinese.txt', encoding='utf-8')
>>> a_file.name
'examples/chinese.txt'
>>> a_file.mode
'r'
>>> a_file.encoding
'utf-8'
FIXME
Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It’s important to close files as soon as you’re finished with them.
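As a rough sketch of what closing looks like in practice, a file object has a close() method and a closed attribute. The temporary file below is a hypothetical stand-in so the example can run anywhere.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

a_file = open(path, mode='w', encoding='utf-8')
a_file.write('hello')
closed_before = a_file.closed   # still open at this point

a_file.close()
closed_after = a_file.closed    # now closed

# Once a file is closed, further I/O on it raises ValueError.
try:
    a_file.write('more')
    write_failed = False
except ValueError:
    write_failed = True
```

Calling close() on a file that is already closed is harmless, but reading from or writing to it is an error.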
FIXME checking if a file is closed
with Statement
FIXME "with open(...) as file" pattern
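Until this section is written, here is a minimal sketch of the "with open(...) as file" pattern, assuming a hypothetical scratch file. The with statement closes the file when the block ends, even if the block ends because of an exception.

```python
import os
import tempfile

# Hypothetical scratch file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, mode='w', encoding='utf-8') as a_file:
    a_file.write('hello')

# Normal case: the file is open inside the block and closed after it.
with open(path, encoding='utf-8') as a_file:
    contents = a_file.read()
    closed_inside = a_file.closed
closed_outside = a_file.closed

# Error case: the file is still closed, even though the block raised.
try:
    with open(path, encoding='utf-8') as a_file:
        raise RuntimeError('something went wrong mid-read')
except RuntimeError:
    closed_after_error = a_file.closed
```

There is no explicit call to close() anywhere; leaving the block is what closes the file.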
FIXME
FIXME what's a "line"? (line endings discussion, universal line endings, etc.)
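Pending that discussion, the sketch below shows how text mode treats line endings when reading. It assumes a hypothetical scratch file written in binary mode so that the three common line endings land on disk exactly as given.

```python
import os
import tempfile

# Hypothetical file containing Windows, old Mac, and Unix line endings.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, mode='wb') as raw_file:
    raw_file.write(b'first\r\nsecond\rthird\n')

# Default text mode: every line ending is translated to '\n' on read.
with open(path, encoding='utf-8') as a_file:
    translated = list(a_file)

# newline='' still splits on all three endings but leaves them untouched.
with open(path, encoding='utf-8', newline='') as a_file:
    untranslated = list(a_file)

print(translated)    # ['first\n', 'second\n', 'third\n']
print(untranslated)  # ['first\r\n', 'second\r', 'third\n']
```

Iterating over a file object yields one line at a time, each with its line ending still attached.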
FIXME
FIXME
FIXME write(), writelines(), .writable()
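As a placeholder for that section, here is a hedged sketch of the writing methods on a text file object. The scratch file and its contents are hypothetical. Note that the method is spelled writable().

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# mode='w' creates the file, or overwrites it if it already exists.
with open(path, mode='w', encoding='utf-8') as a_file:
    can_write = a_file.writable()
    # write() returns the number of characters written.
    count = a_file.write('first line\n')
    # writelines() does not add line endings; supply them yourself.
    a_file.writelines(['second line\n', 'third line\n'])

# mode='a' appends to the end instead of overwriting.
with open(path, mode='a', encoding='utf-8') as a_file:
    a_file.write('appended line\n')

# A file opened for reading is not writable.
with open(path, encoding='utf-8') as a_file:
    can_write_when_reading = a_file.writable()
    contents = a_file.read()
```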
FIXME
FIXME
>>> image = open('examples/beauregard-100x100.jpg', 'rb')
>>> image
<io.BufferedReader object at 0x00C7A390>
>>> image.mode
'rb'
>>> image.name
'examples/beauregard-100x100.jpg'
>>> image
<io.BufferedReader object at 0x00C7A390>
>>> image.tell()
0
>>> data = image.read(3)
>>> data
b'\xff\xd8\xff'
>>> image.tell()
3
>>> image.seek(0)
0
>>> data = image.read()
>>> len(data)
3150
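The same sequence of calls can be tried without the JPEG file. The sketch below writes a few hypothetical bytes to a scratch file and then reads them back in binary mode; the byte values are made up for illustration and are not a valid image.

```python
import os
import tempfile

# Hypothetical binary data: three header-like bytes plus ten more.
payload = b'\xff\xd8\xff' + bytes(range(10))
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
with open(path, mode='wb') as out_file:
    out_file.write(payload)

# Binary mode takes no encoding argument; read() returns bytes, not str.
with open(path, mode='rb') as image:
    start = image.tell()        # position before reading anything
    header = image.read(3)      # read exactly three bytes
    position = image.tell()     # position is counted in bytes
    rewound = image.seek(0)     # seek() returns the new position
    everything = image.read()   # read from the start to the end
```

In binary mode, tell() and seek() count bytes, and no decoding happens, so what you read is exactly what is on disk.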
FIXME
FIXME
FIXME
© 2001–9 Mark Pilgrim