From c5aad4947307386be27359256a90ee67e561fdb7 Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Thu, 16 Jul 2009 12:44:50 -0400
Subject: [PATCH] s/The answer is that//g

---
 case-study-porting-chardet-to-python-3.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html
index b13ef5c..fb65de6 100755
--- a/case-study-porting-chardet-to-python-3.html
+++ b/case-study-porting-chardet-to-python-3.html
@@ -737,7 +737,7 @@ TypeError: Can't convert 'bytes' object to str implicitly
 self._mInputState = ePureAscii
 self._mLastChar = ''

 And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can’t concatenate a string to a byte array — not even a zero-length string.

-So what is self._mLastChar anyway? The answer is in the feed() method, just a few lines down from where the trackback occurred.

+So what is self._mLastChar anyway? In the feed() method, just a few lines down from where the trackback occurred.

if self._mInputState == ePureAscii:
     if self._highBitDetector.search(aBuf):
         self._mInputState = eHighbyte
@@ -853,7 +853,7 @@ def next_state(self, c):
 def feed(self, aBuf):
     for c in aBuf:
         codingState = self._mCodingSM.next_state(c)

-And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there’s no need to call the ord() function because c is already an int!

+Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there’s no need to call the ord() function because c is already an int!

Thus:

  def next_state(self, c):
       # for each byte we get its class