diff --git a/strings.html b/strings.html
index ba172cc..c802a39 100644
--- a/strings.html
+++ b/strings.html
@@ -51,7 +51,7 @@ My alphabet starts where your alphabet ends!
&m

Unicode

-

Enter Unicode. +

Enter Unicode.

Unicode is a system designed to represent every character from every language. Unicode represents each letter, character, or ideograph as a 4-byte number. Each number represents a unique character used in at least one of the world’s languages. (Not all the numbers are used, but more than 65535 of them are, so 2 bytes wouldn’t be sufficient.) Characters that are used in multiple languages generally have the same number, unless there is a good etymological reason not to. Regardless, there is exactly 1 number per character, and exactly 1 character per number. Every number always means just one thing; there are no “modes” to keep track of. U+0041 is always 'A', even if your language doesn’t have an 'A' in it. @@ -93,9 +93,9 @@ My alphabet starts where your alphabet ends!
&m '深入 Python 3'

  1. To create a string, enclose it in quotes. Python strings can be defined with either single quotes (') or double quotes ("). -
  2. The built-in len() function returns the length of the string, i.e. the number of characters. This is the same function you use to find the length of a list. A string is like a list of characters. +
  3. The built-in len() function returns the length of the string, i.e. the number of characters. This is the same function you use to find the length of a list. A string is like a list of characters.
  4. Just like getting individual items out of a list, you can get individual characters out of a string using index notation. -
  5. Just like lists, you can concatenate strings using the + operator. +
  6. Just like lists, you can concatenate strings using the + operator.
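The four points above can be sketched in a few lines. This is only an illustration; the variable names (`length`, `first_char`, `combined`) are made up for the example and do not come from the chapter.

```python
# A string can use single or double quotes; both produce the same type.
s = '深入 Python'
same = "深入 Python"

length = len(s)          # counts characters, not bytes
first_char = s[0]        # index notation, exactly as with a list
combined = s + ' 3'      # concatenation builds a new string

print(length)            # 9
print(first_char)        # 深
print(combined)          # 深入 Python 3
```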

⁂ @@ -138,7 +138,7 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):

  • There’s a… whoa, what the heck is that? -

    Python 3 supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is to insert a value into a string with a single placeholder. +

    Python 3 supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is to insert a value into a string with a single placeholder.

     >>> username = 'mark'
    @@ -147,7 +147,7 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):
     "mark's password is PapayaWhip"
    1. No, my password is not really PapayaWhip. -
    2. There’s a lot going on here. First, that’s a method call on a string literal. Strings are objects, and objects have methods. Second, the whole expression evaluates to a string. Third, {0} and {1} are replacement fields, which are replaced by the arguments passed to the format() method. +
    3. There’s a lot going on here. First, that’s a method call on a string literal. Strings are objects, and objects have methods. Second, the whole expression evaluates to a string. Third, {0} and {1} are replacement fields, which are replaced by the arguments passed to the format() method.
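As a small sketch of the replacement fields just described, the numbered fields refer to positional arguments, so they can be reordered or repeated. The `swapped` and `repeated` names below are illustrative assumptions, not part of the original example.

```python
username = 'mark'
password = 'PapayaWhip'

# {0} and {1} are filled from the arguments of format(), by position.
message = "{0}'s password is {1}".format(username, password)

# Because fields are numbered, their order in the template is free.
swapped = '{1} belongs to {0}'.format(username, password)

# The same field may appear more than once.
repeated = '{0}, {0}, {0}!'.format(username)

print(message)   # mark's password is PapayaWhip
```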

    Compound Field Names

    @@ -207,7 +207,7 @@ def approximate_size(size, a_kilobyte_is_1024_bytes=True):

    {1} is replaced with the second argument passed to the format() method, which is suffix. But what is {0:.1f}? It’s two things: {0}, which you recognize, and :.1f, which you don’t. The second half (including and after the colon) defines the format specifier, which further refines how the replaced variable should be formatted.

    -

    Format specifiers allow you to munge the replacement text in a variety of useful ways, like the printf() function in C. You can add zero- or space-padding, align strings, control decimal precision, and even convert numbers to hexadecimal. +

    Format specifiers allow you to munge the replacement text in a variety of useful ways, like the printf() function in C. You can add zero- or space-padding, align strings, control decimal precision, and even convert numbers to hexadecimal.
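To make the list of capabilities above concrete, here is a hedged sketch showing one specifier for each: zero-padding, alignment, decimal precision, and hexadecimal conversion. The sample values are arbitrary and chosen only for illustration.

```python
zero_padded = '{0:05d}'.format(42)             # pad an integer with zeros to width 5
right_aligned = '{0:>8}'.format('abc')         # right-align a string in 8 columns
left_aligned = '{0:<8}|'.format('abc')         # left-align, with a marker after the padding
one_decimal = '{0:.1f} {1}'.format(698.24, 'GB')  # one digit after the decimal point
as_hex = '{0:x}'.format(255)                   # lowercase hexadecimal

print(zero_padded)    # 00042
print(right_aligned)  #      abc
print(left_aligned)   # abc     |
print(one_decimal)    # 698.2 GB
print(as_hex)         # ff
```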

    Within a replacement field, a colon (:) marks the start of the format specifier. The format specifier “.1” means “round to the nearest tenth” (i.e. display only one digit after the decimal point). The format specifier “f” means “fixed-point number” (as opposed to exponential notation or some other decimal representation). Thus, given a size of 698.24 and suffix of 'GB', the formatted string would be '698.2 GB', because 698.24 gets rounded to one decimal place, then the suffix is appended after the number. @@ -242,8 +242,8 @@ experience of years. >>> s.lower().count('f') 6

      -
    1. You can input multi-line strings in the Python interactive shell. Once you start a multi-line string with triple quotation marks, just hit ENTER and the interactive shell will prompt you to continue the string. Typing the closing triple quotation marks ends the string, and the next ENTER will execute the command (in this case, assigning the string to s). -
    2. The splitlines() method takes one multi-line string and returns a list of strings, one for each line of the original. Note that the carriage returns at the end of each line are not included. +
    3. You can input multiline strings in the Python interactive shell. Once you start a multiline string with triple quotation marks, just hit ENTER and the interactive shell will prompt you to continue the string. Typing the closing triple quotation marks ends the string, and the next ENTER will execute the command (in this case, assigning the string to s). +
    4. The splitlines() method takes one multiline string and returns a list of strings, one for each line of the original. Note that the carriage returns at the end of each line are not included.
    5. The lower() method converts the entire string to lowercase. (Similarly, the upper() method converts a string to uppercase.)
    6. The count() method counts the number of occurrences of a substring. Yes, there really are six “f”s in that sentence!
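The same methods can be tried on any multiline string. The sketch below uses a short made-up string (`text`) rather than the one from the session above, so the counts differ from those in the chapter.

```python
# A triple-quoted literal may span several lines.
text = '''First line
Second LINE
third line'''

lines = text.splitlines()       # one list item per line, line endings removed
lowered = text.lower()          # lower() returns a new string; text is unchanged
count = lowered.count('line')   # occurrences of a substring

print(lines)   # ['First line', 'Second LINE', 'third line']
print(count)   # 3
```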
    @@ -263,7 +263,7 @@ experience of years. {'password': 'PapayaWhip', 'user': 'pilgrim', 'database': 'master'}
      -
    1. The split() string method takes one argument, a delimiter, and splits a string into a list of strings based on the delimiter. Here, the delimiter is an ampersand character, but it could be anything. +
    2. The split() string method takes one argument, a delimiter, and splits a string into a list of strings based on the delimiter. Here, the delimiter is an ampersand character, but it could be anything.
    3. Now we have a list of strings, each with a key, followed by an equals sign, followed by a value. We want to iterate over the entire list and split each string into two strings based on the first equals sign. (In theory, a value could contain an equals sign too. If we just used 'key=value=foo'.split('='), we would end up with a three-item list ['key', 'value', 'foo'].)
    4. Finally, Python can turn that list-of-lists into a dictionary simply by passing it to the dict() function.
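The three steps above can be sketched as follows. The `query` value is an assumed input reconstructed from the dictionary shown in the output; the optional second argument to split() limits the number of splits, which is what keeps a value containing an equals sign intact.

```python
query = 'user=pilgrim&database=master&password=PapayaWhip'

# Step 1: split on the delimiter to get one 'key=value' string per item.
pairs = query.split('&')

# Step 2: split each item on the first equals sign only (maxsplit=1).
pair_lists = [pair.split('=', 1) for pair in pairs]

# Step 3: dict() accepts a list of two-item lists.
params = dict(pair_lists)

print(params['user'])                    # pilgrim
print('key=value=foo'.split('=', 1))     # ['key', 'value=foo']
```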
    @@ -272,7 +272,7 @@ experience of years.

    Strings vs. Bytes

    -

    Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a string. An immutable sequence of numbers-between-0-and-255 is called a bytes object. +

    Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a string. An immutable sequence of numbers-between-0-and-255 is called a bytes object.

     >>> by = b'abcd\x65'  
    @@ -294,7 +294,7 @@ experience of years.
       File "<stdin>", line 1, in <module>
     TypeError: 'bytes' object does not support item assignment
      -
    1. To define a bytes object, use the b'' “byte literal” syntax. Each byte within the byte literal can be an ASCII character or an encoded hexadecimal number from \x00 to \xff (0–255). +
    2. To define a bytes object, use the b'' “byte literal” syntax. Each byte within the byte literal can be an ASCII character or an encoded hexadecimal number from \x00 to \xff (0–255).
    3. The type of a bytes object is bytes.
    4. Just like lists and strings, you can get the length of a bytes object with the built-in len() function.
    5. Just like lists and strings, you can use the + operator to concatenate bytes objects. The result is a new bytes object. @@ -336,11 +336,11 @@ TypeError: Can't convert 'bytes' object to str implicitly 1
      1. You can’t concatenate bytes and strings. They are two different data types. -
      2. You can’t count the occurrences of bytes in a string, because there are no bytes in a string. A string is a sequence of characters. Perhaps you meant “count the occurrences of the string that you would get after decoding this sequence of bytes in a particular character encoding”? Well then, you’ll need to say that explicitly. Python 3 won’t implicitly convert bytes to strings or strings to bytes. +
      3. You can’t count the occurrences of bytes in a string, because there are no bytes in a string. A string is a sequence of characters. Perhaps you meant “count the occurrences of the string that you would get after decoding this sequence of bytes in a particular character encoding”? Well then, you’ll need to say that explicitly. Python 3 won’t implicitly convert bytes to strings or strings to bytes.
      4. By an amazing coincidence, this line of code says “count the occurrences of the string that you would get after decoding this sequence of bytes in this particular character encoding.”
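The behaviour described in these notes can be reproduced in a short script. This is a sketch under assumptions: the names `text` and `needle` are invented for the example, and the exact wording of the TypeError messages varies between Python 3 versions, so the code only records that an error occurred.

```python
by = b'abcd\x65'            # \x65 is the byte for 'e', so this equals b'abcde'
longer = by + b'\xff'       # concatenating bytes objects yields a new bytes object

# bytes objects are immutable: item assignment raises TypeError.
try:
    by[0] = 102
    assignment_failed = False
except TypeError:
    assignment_failed = True

text = 'abcde'
needle = b'd'

# Mixing str and bytes is refused rather than silently converted.
try:
    text + needle
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Decoding explicitly turns the bytes into a string that count() accepts.
occurrences = text.count(needle.decode('ascii'))

print(type(by), len(by), len(longer))   # <class 'bytes'> 5 6
print(occurrences)                      # 1
```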
      -

      And here is the link between strings and bytes: bytes objects have a decode() method that takes a character encoding and returns a string, and strings have an encode() method that takes a character encoding and returns a bytes object. In the previous example, the decoding was relatively straightforward: converting a sequence of bytes in the ASCII encoding into a string of characters. But the same process works with any encoding that supports the characters of the string, even legacy (non-Unicode) encodings. +

      And here is the link between strings and bytes: bytes objects have a decode() method that takes a character encoding and returns a string, and strings have an encode() method that takes a character encoding and returns a bytes object. In the previous example, the decoding was relatively straightforward: converting a sequence of bytes in the ASCII encoding into a string of characters. But the same process works with any encoding that supports the characters of the string, even legacy (non-Unicode) encodings.

       >>> a_string = '深入 Python'         
      @@ -381,7 +381,7 @@ TypeError: Can't convert 'bytes' object to str implicitly
       

      Python 3 assumes that your source code — i.e. each .py file — is encoded in UTF-8.

      -

      In Python 2, the default encoding for .py files was ASCII. In Python 3, the default encoding is UTF-8. +

      In Python 2, the default encoding for .py files was ASCII. In Python 3, the default encoding is UTF-8.

      If you would like to use a different encoding within your Python code, you can put an encoding declaration on the first line of each file. This declaration defines a .py file to be windows-1252: