diff --git a/files.html b/files.html
index 2390083..79da47c 100644
--- a/files.html
+++ b/files.html
@@ -24,12 +24,20 @@ body{counter-reset:h1 12}

Reading From Text Files

-FIXME
+Before you can read from a file, you need to open it. Opening a file in Python couldn’t be easier:
-
-open(..., encoding='...')
-open(..., 'r', encoding='...')
-
+a_file = open('examples/chinese.txt', encoding='utf-8')
+
+Python has a built-in open() function, which takes a filename as an argument. Here the filename is 'examples/chinese.txt'. There are four interesting things about this filename:
+
+
+  1. It’s not just the name of a file; it’s a combination of a directory path and a filename. A hypothetical file-opening function could have taken two arguments (a directory path and a filename), but the open() function only takes one. In Python, whenever you need a “filename,” you can include some or all of a directory path as well.
+  2. The directory path uses a forward slash, but I didn’t say what operating system I was using. Windows uses backslashes to denote subdirectories, while Mac OS X and Linux use forward slashes. But in Python, forward slashes always Just Work, even on Windows.
+  3. The directory path does not begin with a slash or a drive letter, so it is called a relative path. Relative to what, you might ask? Patience, grasshopper.
+  4. It’s a string. All modern operating systems (even Windows!) use Unicode to store the names of files and directories. Python 3 fully supports non-ASCII pathnames.
+
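The observations in the list above can be checked with a short sketch. It is not part of the original change: the temporary directory, the 'hello' file contents, and the variable names are assumptions made up for illustration, so that the example runs without the book's examples/ directory being present.

```python
import os
import shutil
import tempfile

filename = 'examples/chinese.txt'

# Item 3: no leading slash or drive letter, so this is a relative path.
is_relative = not os.path.isabs(filename)

# Item 1: a single string holds both a directory part and a file part.
directory_part, file_part = os.path.split(filename)

# Hypothetical setup: create examples/chinese.txt under a throwaway
# directory so the sketch does not depend on any pre-existing file.
base_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(base_dir, 'examples'))

# Item 2: the forward slashes are passed to open() unchanged.
full_path = base_dir + '/' + filename
with open(full_path, mode='w', encoding='utf-8') as out_file:
    out_file.write('hello\n')

a_file = open(full_path, encoding='utf-8')
first_line = a_file.readline()
a_file.close()

# Clean up the throwaway directory.
shutil.rmtree(base_dir)

print(is_relative)                # True
print(directory_part, file_part)  # examples chinese.txt
print(repr(first_line))           # 'hello\n'
```

The sketch builds an absolute path by hand only so that it never has to rely on the current working directory, which the text has not discussed yet.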
+
+But that call to the open() function didn’t stop at the filename. There’s another argument, called encoding. Oh dear, that sounds dreadfully familiar.
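To see why the encoding argument is worth spelling out, here is a minimal sketch, assuming a scratch file created with tempfile (the file and the variable names are hypothetical, not from the original text). It writes two Chinese characters as UTF-8, then reads the same bytes back twice: once as UTF-8 and once as Latin-1.

```python
import os
import tempfile

text = '中文'

# Hypothetical scratch file, used only for this sketch.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, mode='w', encoding='utf-8') as out_file:
    out_file.write(text)

# Same encoding on the way back in: the text round-trips.
with open(path, encoding='utf-8') as in_file:
    round_trip = in_file.read()

# A different encoding: the same six bytes decode to six unrelated characters.
with open(path, encoding='latin-1') as in_file:
    misread = in_file.read()

os.remove(path)

print(round_trip == text)       # True
print(len(text), len(misread))  # 2 6
```

Latin-1 happens to accept every byte value, so the mismatch here is silent. Other encodings may instead raise a UnicodeDecodeError. When the argument is omitted, the default depends on the platform and on how Python is configured, which is one reason to pass it explicitly.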

Character Encoding Rears Its Ugly Head

@@ -63,6 +71,15 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara

Python has a built-in function, open(), for opening a file on disk. The open() function returns a file object, which has methods and attributes for getting information about and manipulating the file.
+

+>>> a_file = open('examples/chinese.txt', encoding='utf-8')
+>>> a_file.name
+'examples/chinese.txt'
+>>> a_file.mode
+'r'
+>>> a_file.encoding
+'utf-8'
+