mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
more validation fiddling
@@ -8,12 +8,14 @@
 <link rel="shortcut icon" href="data:image/ico,">
 <link rel="alternate" type="application/atom+xml" href="http://hg.diveintopython3.org/atom-log">
 <style type="text/css">
-body{counter-reset:h1 19}
+body{counter-reset:h1 20}
 </style>
 </head>
 <body>
 <p class="skip"><a href="#divingin">skip to main content</a>
-<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8"> <input name="q" size="31"> <input type="submit" name="sa" value="Search"></div><p>You are here: <a href="/">Dive Into Python 3</a> <span>‣</span></p> <h1>Case study: porting <code>chardet</code> to Python 3</h1></form>
+<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8"> <input name="q" size="31"> <input type="submit" name="sa" value="Search"></div></form>
+<p class="nav">You are here: <a href="/">Dive Into Python 3</a> <span>‣</span>
+<h1>Case study: porting <code>chardet</code> to Python 3</h1>
 <blockquote class="q">
 <p><span>❝</span> Words, words. They’re all we have to go on. <span>❞</span><br>— <cite>Rosencrantz and Guildenstern are Dead</cite>
 </blockquote>
@@ -26,7 +28,7 @@ body{counter-reset:h1 19}
 <li><a href="#faq.yippie">Yippie! Screw the standards, I’ll just auto-detect everything!</a>
 <li><a href="#faq.why">Why bother with auto-detection if it’s slow, inaccurate, and non-standard?</a>
 </ol>
-<li><a href="#divingin">Diving in</a>
+<li><a href="#divingin2">Diving in</a>
 <ol>
 <li><a href="#how.bom"><code>UTF-n</code> with a <abbr title="Byte Order Mark">BOM</abbr></a>
 <li><a href="#how.esc">Escaped encodings</a>
@@ -67,7 +69,7 @@ body{counter-reset:h1 19}
 <h3 id="faq.why">Why bother with auto-detection if it’s slow, inaccurate, and non-standard?</h3>
 <p>Sometimes you receive text with verifiably inaccurate encoding information. Or text without any encoding information, and the specified default encoding doesn’t work. There are also some poorly designed standards that have no way to specify encoding at all.
 <p>If following the relevant standards gets you nowhere, <em>and</em> you decide that processing the text is more important than maintaining interoperability, then you can try to auto-detect the character encoding as a last resort. An example is my <a href="http://feedparser.org/">Universal Feed Parser</a>, which calls this auto-detection library <a href="http://feedparser.org/docs/character-encoding.html">only after exhausting all other options</a>.
-<h2 id="divingin">Diving in</h2>
+<h2 id="divingin2">Diving in</h2>
 <p>This is a brief guide to navigating the code itself.
 <p>The main entry point for the detection algorithm is <code class="filename">universaldetector.py</code>, which has one class, <code>UniversalDetector</code>. (You might think the main entry point is the <code>detect</code> function in <code class="filename">chardet/__init__.py</code>, but that’s really just a convenience function that creates a <code>UniversalDetector</code> object, calls it, and returns its result.)
 <p>There are 5 categories of encodings that <code>UniversalDetector</code> handles:
@@ -7,10 +7,10 @@ a:link{color:#1b67c9}
 a:visited{color:darkorchid}
 h1 a,h2 a,h3 a,#nav a{color:inherit !important}
 abbr,acronym{letter-spacing:0.1em;text-transform:lowercase;font-variant:small-caps}
-h1,h2,h3,p,ul,ol,#search{margin:1.75em 0;font-size:medium}
+h1,h2,h3,p,ul,ol{margin:1.75em 0;font-size:medium}
+h1,.nav{display:inline}
 h2,h3{clear:both}
 form div{float:right}
-form p,form h1{display:inline}
 pre{white-space:pre-wrap;padding-left:2.154em;line-height:1.75;border-left:1px dotted}
 pre,kbd,code,samp{font-family:Consolas,Inconsolata,Monaco,monospace;font-size:medium;word-spacing:0}
 pre a{padding:0.4375em 0;border:0}
@@ -20,7 +20,7 @@ pre a:hover{border:0}
 kbd{font-weight:bold}
 .prompt{color:#667}/*the neighbor of the beast*/
 td pre{margin:0;padding:0;border:0}
-li ol{margin:0 inherit}
+li ol{margin:0}
 .c{text-align:center;font-size:small}
 p.fancy:first-letter{float:left;background:transparent;color:gainsboro;padding:0.11em 4px 0 0;font:normal 4em/0.68 serif}
 blockquote.q{margin:auto;text-align:right;font-style:oblique}
@@ -38,9 +38,9 @@ span,tr + tr th:first-child{font-family:'Arial Unicode MS',sans-serif;font-style
 table.simple th{font-family:inherit !important}
 .fr{width:100%;border:1px dotted}
 .fr h4{margin-top:-1.2em;margin-left:-1em;width:8.5em;border:1px dotted;padding: 3px 3px 3px 13px;background:#fff;color:inherit;position:relative}
-.hover{background:#eee !important;color:inherit !important;cursor:default !important}
+.hover{background:#eee;color:inherit;cursor:default}
 body{counter-reset:h1}
-h1:before{counter-increment:h1;content:"Chapter " counter(h1) ". "}
+h1:before{content:"Chapter " counter(h1) ". "}
 h1{counter-reset:h2}
 h2:before{counter-increment:h2;content:counter(h1) "." counter(h2) ". "}
 h2{counter-reset:h3}
+1
-1
@@ -8,7 +8,7 @@
 <link rel="alternate" type="application/atom+xml" href="http://hg.diveintopython3.org/atom-log">
 <style type="text/css">
 body{counter-reset:h1 -1}
-h1{background:papayawhip}
+h1{background:papayawhip;display:block}
 h2{margin-left:1.75em}
 h3{margin-left:3.5em}
 .appendix h1:before{content:""}
@@ -15,7 +15,9 @@ h3:before{counter-increment:h3;content:"A." counter(h2) "." counter(h3) ". "}
 </head>
 <body>
 <p class="skip"><a href="#divingin">skip to main content</a>
-<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8"> <input name="q" size="31"> <input type="submit" name="sa" value="Search"></div><p>You are here: <a href="/">Dive Into Python 3</a> <span>‣</span></p> <h1>Porting code to Python 3 with <code>2to3</code></h1></form>
+<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8"> <input name="q" size="31"> <input type="submit" name="sa" value="Search"></div></form>
+<p class="nav">You are here: <a href="/">Dive Into Python 3</a> <span>‣</span>
+<h1>Porting code to Python 3 with <code>2to3</code></h1>
 <blockquote class="q">
 <p><span>❝</span> Life is pleasant. Death is peaceful. It’s the transition that’s troublesome. <span>❞</span><br>— Isaac Asimov (attributed)
 </blockquote>
@@ -46,7 +48,6 @@ h3:before{counter-increment:h3;content:"A." counter(h2) "." counter(h3) ". "}
 <li><a href="#exec"><code>exec</code> statement</a>
 <li><a href="#execfile"><code>execfile</code> statement</a> (3.1+)
 <li><a href="#repr"><code>repr</code> literals (backticks)</a>
-<li><a href="#exceptions">Exceptions</a>
 <li><a href="#except"><code>try...except</code> statement</a>
 <li><a href="#raise"><code>raise</code> statement</a>
 <li><a href="#throw"><code>throw</code> method on generators</a>
@@ -8,12 +8,14 @@
 <link rel="shortcut icon" href="data:image/ico,">
 <link rel="alternate" type="application/atom+xml" href="http://hg.diveintopython3.org/atom-log">
 <style type="text/css">
-body{counter-reset:h1 0}
+body{counter-reset:h1 1}
 </style>
 </head>
 <body>
 <p class="skip"><a href="#divingin">skip to main content</a>
-<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8"> <input name="q" size="31"> <input type="submit" name="sa" value="Search"></div><p>You are here: <a href="/">Dive Into Python 3</a> <span>‣</span></p> <h1>Your first Python program</h1></form>
+<form action="http://www.google.com/cse" id="search"><div><input type="hidden" name="cx" value="014021643941856155761:l5eihuescdw"><input type="hidden" name="ie" value="UTF-8"> <input name="q" size="31"> <input type="submit" name="sa" value="Search"></div></form>
+<p class="nav">You are here: <a href="/">Dive Into Python 3</a> <span>‣</span>
+<h1>Your first Python program</h1>
 <blockquote class="q">
 <p><span>❝</span> Don’t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. <span>❞</span><br>— <cite>Ven. Henepola Gunararatana</cite>
 </blockquote>