mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
asterisms for everyone!
@@ -47,6 +47,8 @@ My alphabet starts where your alphabet ends! <span>❞</span><br>— Dr
<p>Now cry a lot, because everything you thought you knew about strings is wrong, and there ain’t no such thing as “plain text.”

<p class=a>⁂

<h2 id=one-ring-to-rule-them-all>Unicode</h2>

<p><i>Enter Unicode.</i>
@@ -75,6 +77,8 @@ My alphabet starts where your alphabet ends! <span>❞</span><br>— Dr
<p>Advantages: super-efficient encoding of common <abbr>ASCII</abbr> characters. No worse than UTF-16 for extended Latin characters. Better than UTF-32 for Chinese characters. Also (and you’ll have to trust me on this, because I’m not going to show you the math), due to the exact nature of the bit twiddling, there are no byte-ordering issues. A document encoded in UTF-8 uses the exact same stream of bytes on any computer.

<p class=a>⁂

<h2 id=divingin>Diving In</h2>

<p>In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python string encoded as CP-1252. "Is this string UTF-8?" is an invalid question. UTF-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
@@ -94,6 +98,8 @@ My alphabet starts where your alphabet ends! <span>❞</span><br>— Dr
<li>Just like lists, you can concatenate strings using the <code>+</code> operator.
</ol>

<p class=a>⁂

<h2 id=formatting-strings>Formatting Strings</h2>

<aside>Strings can be defined with either single or double quotes.</aside>
@@ -213,6 +219,8 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
<p>For all the gory details on format specifiers, consult the <a href=http://docs.python.org/3.0/library/string.html#format-specification-mini-language>Format Specification Mini-Language</a> in the official Python documentation.

<p class=a>⁂

<h2 id=common-string-methods>Other Common String Methods</h2>

<p>Besides formatting, strings can do a number of other useful tricks.
@@ -261,6 +269,8 @@ experience of years.</samp>
<li>Finally, Python can turn that list-of-lists into a dictionary simply by passing it to the <code>dict()</code> function.
</ol>

<p class=a>⁂

<h2 id=byte-arrays>Strings vs. Bytes</h2>

<p>Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a <i>string</i>. An immutable sequence of numbers-between-0-and-255 is called a <i>bytes</i> object.
@@ -365,6 +375,8 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<li>This is a string. It has nine characters. It is the sequence of characters you get when you take <var>by</var> and decode it using the Big5 encoding algorithm. It is identical to the original string.
</ol>

<p class=a>⁂

<h2 id=py-encoding>Postscript: Character Encoding Of Python Source Code</h2>

<p>Python 3 assumes that your source code — <i>i.e.</i> each <code>.py</code> file — is encoded in UTF-8.
@@ -384,6 +396,8 @@ TypeError: Can't convert 'bytes' object to str implicitly</samp>
<p>For more information, consult <a href=http://www.python.org/dev/peps/pep-0263/><abbr>PEP</abbr> 263: Defining Python Source Code Encodings</a>.

<p class=a>⁂

<h2 id=furtherreading>Further Reading</h2>

<p>On Unicode in Python: