search box and more skip links

Mark Pilgrim
2009-02-04 01:22:01 -05:00
parent ae6b88465b
commit 8441b2f82c
6 changed files with 25 additions and 11 deletions
+4 -2
@@ -10,12 +10,14 @@ body{counter-reset:h1 19}
</style>
</head>
<body>
+<p class="skip"><a href="#divingin">skip to main content</a>
+<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8">&nbsp;<input name="q" size="31">&nbsp;<input type="submit" name="sa" value="Search"></div><p>You are here: <a href="index.html">Dive Into Python 3</a> <span>&#8227;</span> <b>Chapter 20</b></form>
<h1>Case study: porting <code class="filename">chardet</code> to Python 3</h1>
<blockquote class="q">
<p><span>&#x275D;</span> Words, words. They&#8217;re all we have to go on. <span>&#x275E;</span><br>&mdash; <cite>Rosencrantz and Guildenstern are Dead</cite>
</blockquote>
<ol>
-<li><a href="#faq">Introducing <code class="filename">chardet</code>: a mini-FAQ</a>
+<li><a href="#divingin">Introducing <code class="filename">chardet</code></a>
<ol>
<li><a href="#faq.what">What is character encoding auto-detection?</a>
<li><a href="#faq.impossible">Isn&#8217;t that impossible?</a>
@@ -41,7 +43,7 @@ body{counter-reset:h1 19}
<li><a href="#cantconvertbytesobject">Can&#8217;t convert '<code>bytes</code>' object to <code>str</code> implicitly</a>
</ol>
</ol>
-<h2 id="faq">Introducing <code class="filename">chardet</code>: a mini-FAQ</h2>
+<h2 id="divingin">Introducing <code class="filename">chardet</code>: a mini-FAQ</h2>
<p class="fancy">When you think of &#8220;text,&#8221; you probably think of &#8220;characters and symbols I see on my computer screen.&#8221; But computers don&#8217;t deal in characters and symbols; they deal in bits and bytes. Every piece of text you&#8217;ve ever seen on a computer screen is actually stored in a particular <em>character encoding</em>. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
<p>In reality, it&#8217;s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key for the text. Whenever someone gives you a sequence of bytes and claims it&#8217;s &#8220;text&#8221;, you need to know what character encoding they used so you can decode the bytes into characters and display them (or process them, or whatever).
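<p>As a rough illustration of the point above, the following sketch stores one short string under three encodings and then decodes it with the wrong one. The sample word and the choice of encodings are assumptions made for this example; they are not taken from <code class="filename">chardet</code> itself.

```python
# Illustrative sketch: the same characters, stored as different bytes.
# The sample text and the three encodings are arbitrary choices for this example.
text = "caf\u00e9"  # the word "café"

for encoding in ("utf-8", "latin-1", "utf-16-le"):
    data = text.encode(encoding)
    print(encoding, len(data), data)

# Decoding with the encoding that was used to store the bytes recovers the text.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Decoding with a different encoding raises no error here;
# it just yields different characters than the ones that were stored.
wrong = utf8_bytes.decode("latin-1")
print(wrong)
```

<p>The last step is why knowing (or guessing) the encoding matters: the bytes decode without complaint, but the result is not the text that was written.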
<h3 id="faq.what">What is character encoding auto-detection?</h3>