diff --git a/files.html b/files.html
index b041677..e7d08c3 100644
--- a/files.html
+++ b/files.html
@@ -126,7 +126,7 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara
FIXME
+Let’s see that again.
# continued from the previous example
@@ -137,12 +137,14 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara
>>> a_file.tell() ③
20
FIXME
+Do you see it yet? The seek() and tell() methods always count bytes, but since you opened this file as text, the read() method counts characters. Chinese characters require multiple bytes to encode in UTF-8. The English characters in the file only require one byte each, so you might be misled into thinking that they’re counting the same thing. But that’s only true for some characters.
+
+
But wait, it gets worse!
>>> a_file.seek(18) ①
@@ -155,8 +157,8 @@ UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: chara
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x98 in position 0: unexpected code byte
UnicodeDecodeError.