various clarifications about generator expressions

Mark Pilgrim
2009-07-15 10:36:02 -04:00
parent 8843481267
commit 29240baeee
3 changed files with 22 additions and 9 deletions
+1 -1
@@ -69,7 +69,7 @@ My alphabet starts where your alphabet ends! <span class=u>&#x275E;</span><br>&m
<p>Other people pondered these questions, and they came up with a solution:
-<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>UTF-8
+<p class=xxxl>UTF-8
<p>UTF-8 is a <em>variable-length</em> encoding system for Unicode. That is, different characters take up a different number of bytes. For <abbr>ASCII</abbr> characters (A-Z, <i class=baa>&amp;</i>c.) UTF-8 uses just one byte per character. In fact, it uses the exact same bytes; the first 128 characters (0&ndash;127) in UTF-8 are indistinguishable from <abbr>ASCII</abbr>. &#8220;Extended Latin&#8221; characters like &ntilde; and &ouml; end up taking two bytes. (The bytes are not simply the Unicode code point like they would be in UTF-16; there is some serious bit-twiddling involved.) Chinese characters like &#x4E2D; end up taking three bytes. The rarely-used &#8220;astral plane&#8221; characters take four bytes.
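The byte counts described in that paragraph can be checked directly. The following is a minimal sketch, not part of the commit: the helper name `utf8_length` and the sample characters (including U+1D11E as a stand-in for an "astral plane" character) are assumptions chosen for illustration.

```python
# Sample characters, one per size class mentioned in the paragraph.
# These particular characters are illustrative choices, not from the commit.
samples = [
    "A",           # ASCII
    "\u00f1",      # n with tilde (extended Latin)
    "\u4e2d",      # the Chinese character U+4E2D
    "\U0001d11e",  # musical G clef, outside the Basic Multilingual Plane
]


def utf8_length(ch):
    """Return how many bytes a single character occupies in UTF-8 (hypothetical helper)."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point and the raw bytes rather than the glyph itself,
    # so the output does not depend on the terminal's encoding.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")

# ASCII characters encode to the same single byte in UTF-8 and in ASCII.
print("A".encode("utf-8") == "A".encode("ascii"))
```

Running this should report 1, 2, 3, and 4 bytes respectively, and the hex for the two-byte case (`c3b1`) shows that the bytes are not simply the code point value (`00F1`).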