mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
clarifications [thanks G.P.]
@@ -107,7 +107,7 @@ class OrderedDict(dict, collections.MutableMapping):
<pre class=screen>
<samp class=p>>>> </samp><kbd>import ordereddict</kbd>
<samp class=p>>>> </samp><kbd>od = ordereddict.OrderedDict()</kbd>
-<samp class=p>>>> </samp><kbd>klass = od.__class__</kbd>
+<a><samp class=p>>>> </samp><kbd>klass = od.__class__</kbd> <span>①</span></a>
<samp class=p>>>> </samp><kbd>type(klass)</kbd>
<samp><class 'abc.ABCMeta'></samp>
<samp class=p>>>> </samp><kbd>klass.__name__</kbd>
@@ -600,8 +600,8 @@ if not hasattr(__builtin__, 'False'):
else:
False = __builtin__.False
True = __builtin__.True</code></pre>
-<p>This piece of code is designed to allow this library to run under older versions of Python 2. Prior to Python 2.3 [FIXME-LINK], Python had no built-in <code>Boolean</code> type. This code detects the absence of the built-in constants <code>True</code> and <code>False</code>, and defines them if necessary.
-<p>However, Python 3 will always have a <code>Boolean</code> type, so this entire code snippet is unnecessary. The simplest solution is to replace all instances of <code>constants.True</code> and <code>constants.False</code> with <code>True</code> and <code>False</code>, respectively, then delete this dead code from <code>constants.py</code>.
+<p>This piece of code is designed to allow this library to run under older versions of Python 2. Prior to Python 2.3 [FIXME-LINK], Python had no built-in <code>bool</code> type. This code detects the absence of the built-in constants <code>True</code> and <code>False</code>, and defines them if necessary.
+<p>However, Python 3 will always have a <code>bool</code> type, so this entire code snippet is unnecessary. The simplest solution is to replace all instances of <code>constants.True</code> and <code>constants.False</code> with <code>True</code> and <code>False</code>, respectively, then delete this dead code from <code>constants.py</code>.
<p>So this line in <code>universaldetector.py</code>:
<pre><code>self.done = constants.False</code></pre>
<p>Becomes
@@ -635,8 +635,8 @@ import sys</code></pre>
File "test.py", line 9, in <module>
for line in file(f, 'rb'):
NameError: name 'file' is not defined</samp></pre>
-<p>This one surprised me, because I’ve been using this idiom as long as I can remember. In Python 2, the global <var>file()</var> function was an alias for <var>open()</var>, which was the standard way of opening files for reading. In Python 3, the entire system for reading and writing files has been refactored into the <code>io</code> module. [FIXME-LINK PEP 3116] I’ll cover the new I/O module in more detail in Chapter FIXME, but for now, the important bit is that the global <var>file()</var> function no longer exists. However, the <var>open()</var> function does still exist. (Technically, it’s an alias for <var>io.open()</var>, but never mind that right now.)
-<p>Thus, the simplest solution to the problem of the missing <var>file()</var> is to call <var>open()</var> instead:
+<p>This one surprised me, because I’ve been using this idiom as long as I can remember. In Python 2, the global <code>file()</code> function was an alias for the <code>open()</code> function, which was the standard way of opening files for reading. In Python 3, the entire system for reading and writing files has been refactored into the <code>io</code> module. [FIXME-LINK PEP 3116] I’ll cover the new I/O module in more detail in Chapter FIXME, but for now, the important bit is that the global <code>file()</code> function no longer exists. However, the <code>open()</code> function does still exist. (Technically, it’s an alias for <var>io.open()</var>, but never mind that right now.)
+<p>Thus, the simplest solution to the problem of the missing <code>file()</code> is to call the <code>open()</code> function instead:
<pre><code>for line in open(f, 'rb'):</code></pre>
<p>And that’s all I have to say about that.
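<p>As a minimal sketch of the replacement described above (the temporary file and its contents are made up for illustration and do not come from the library being ported), opening a file with the <code>open()</code> function in <code>'rb'</code> mode and looping over it yields one <code>bytes</code> object per line:

```python
import os
import tempfile

# Hypothetical sample data, written to a temporary file for the demonstration.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'first line\nsecond line\n')

# Python 3 has no global file() function; open() is the replacement.
with open(path, 'rb') as f:
    lines = [line for line in f]

os.remove(path)
print(lines)
```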
<h3 id=cantuseastringpattern>Can’t use a string pattern on a bytes-like object</h3>
@@ -670,7 +670,7 @@ TypeError: can't use a string pattern on a bytes-like object</samp></pre>
for line in open(f, 'rb'):
u.feed(line)</code></pre>
<aside>Not an array of characters, but an array of bytes.</aside>
-<p>And here we find our answer: in the <code>UniversalDetector.feed()</code> method, <var>aBuf</var> is a line read from a file on disk. Look carefully at the parameters used to open the file: <code>'rb'</code>. <code>'r'</code> is for “read”; OK, big deal, we’re reading the file. Ah, but <code>'b'</code> is for “binary.” Without the <code>'b'</code> flag, this <code>for</code> loop would read the file, line by line, and convert each line into a string — an array of Unicode characters — according to the system default character encoding. (You could override the system encoding with another parameter to <var>open()</var>, but never mind that for now.) But with the <code>'b'</code> flag, this <code>for</code> loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to <code>UniversalDetector.feed()</code>, and eventually gets passed to the pre-compiled regular expression, <var>self._highBitDetector</var>, to search for high-bit… characters. But we don’t have characters; we have bytes. Oops.
+<p>And here we find our answer: in the <code>UniversalDetector.feed()</code> method, <var>aBuf</var> is a line read from a file on disk. Look carefully at the parameters used to open the file: <code>'rb'</code>. <code>'r'</code> is for “read”; OK, big deal, we’re reading the file. Ah, but <code>'b'</code> is for “binary.” Without the <code>'b'</code> flag, this <code>for</code> loop would read the file, line by line, and convert each line into a string — an array of Unicode characters — according to the system default character encoding. (You could override the system encoding with another parameter to the <code>open()</code> function, but never mind that for now.) But with the <code>'b'</code> flag, this <code>for</code> loop reads the file, line by line, and stores each line exactly as it appears in the file, as an array of bytes. That byte array gets passed to <code>UniversalDetector.feed()</code>, and eventually gets passed to the pre-compiled regular expression, <var>self._highBitDetector</var>, to search for high-bit… characters. But we don’t have characters; we have bytes. Oops.
<p>What we need this regular expression to search is not an array of characters, but an array of bytes.
<p>Once you realize that, the solution is not difficult. Regular expressions defined with strings can search strings. Regular expressions defined with byte arrays can search byte arrays. To define a byte array pattern, we simply change the type of the argument we use to define the regular expression to a byte array. (There is one other case of this same problem, on the very next line.)
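<p>The rule can be checked in isolation. This is only a sketch: the pattern and the sample data below are illustrative stand-ins, not the actual <var>_highBitDetector</var> definition from <code>universaldetector.py</code>.

```python
import re

# Illustrative sample: the UTF-8 encoding of 'café' is an array of bytes.
sample = 'caf\u00e9'.encode('utf-8')

str_pattern = re.compile(r'[\x80-\xFF]')     # defined with a string
bytes_pattern = re.compile(b'[\x80-\xFF]')   # defined with a byte array

# A pattern defined with a string cannot search bytes; Python 3 raises TypeError.
try:
    str_pattern.search(sample)
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

# A pattern defined with a byte array searches bytes without complaint.
match = bytes_pattern.search(sample)
print(mixed_types_ok, match.group())
```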
<pre><code> class UniversalDetector: