diff --git a/case-study-porting-chardet-to-python-3.html b/case-study-porting-chardet-to-python-3.html
index b13ef5c..fb65de6 100755
--- a/case-study-porting-chardet-to-python-3.html
+++ b/case-study-porting-chardet-to-python-3.html
@@ -737,7 +737,7 @@ TypeError: Can't convert 'bytes' object to str implicitly
self._mInputState = ePureAscii
self._mLastChar = ''
And now we have our answer. Do you see it? self._mLastChar is a string, but aBuf is a byte array. And you can’t concatenate a string to a byte array — not even a zero-length string.
-So what is self._mLastChar anyway? The answer is in the feed() method, just a few lines down from where the trackback occurred.
+So what is self._mLastChar anyway? In the feed() method, just a few lines down from where the trackback occurred.
if self._mInputState == ePureAscii:
if self._highBitDetector.search(aBuf):
self._mInputState = eHighbyte
@@ -853,7 +853,7 @@ def next_state(self, c):
def feed(self, aBuf):
for c in aBuf:
codingState = self._mCodingSM.next_state(c)
-And now we have the answer. Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there’s no need to call the ord() function because c is already an int!
+Do you see it? In Python 2, aBuf was a string, so c was a 1-character string. (That’s what you get when you iterate over a string — all the characters, one by one.) But now, aBuf is a byte array, so c is an int, not a 1-character string. In other words, there’s no need to call the ord() function because c is already an int!
Thus:
def next_state(self, c):
# for each byte we get its class