Merge pull request #772 from Lukasa/develop

First pass at documenting encodings and RFC compliance.
2026-06-05 22:50:18 +00:00 · 2012-08-10 09:05:08 -07:00
parent 56b01bf0e6 7a9419ce35
commit 64646182b2
2 changed files with 40 additions and 5 deletions
@@ -343,6 +343,31 @@ To use HTTP Basic Auth with your proxy, use the `http://user:password@host/` syn
        "http": "http://user:pass@10.10.1.10:3128/",
    }

+Compliance
+----------
+
+Requests is intended to be compliant with all relevant specifications and
+RFCs where that compliance will not cause difficulties for users. This
+attention to the specification can lead to some behaviour that may seem
+unusual to those not familiar with the relevant specification.
+
+Encodings
+^^^^^^^^^
+
+When you receive a response, Requests guesses which encoding to use for
+decoding the response body when you access the ``Response.text`` property.
+Requests first checks for a charset in the HTTP headers and, if none is
+present, uses `chardet <http://pypi.python.org/pypi/chardet>`_ to attempt to
+guess the encoding.
+
+The only time Requests does not fall back to chardet is when no explicit
+charset is present in the HTTP headers **and** the ``Content-Type`` header
+contains ``text``. In this situation,
+`RFC 2616 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_
+specifies that the default charset is ``ISO-8859-1``. Requests follows
+the specification in this case. If you require a different encoding, you can
+manually set the ``Response.encoding`` property, or work with the raw bytes in
+``Response.content``.
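The header rule described above can be sketched with the standard library alone. The helper name ``encoding_from_headers`` is hypothetical and is not part of Requests' API. The sketch assumes the headers arrive as a plain dict keyed exactly by ``Content-Type`` (a real client would want case-insensitive lookup), and it reads "contains ``text``" as a ``text/*`` media type, the case RFC 2616 section 3.7.1 covers. Returning ``None`` marks the point where a content-based detector such as chardet would take over.

```python
from email.message import Message


def encoding_from_headers(headers):
    """Return the charset implied by the Content-Type header, or None.

    Hypothetical helper that mirrors the documented rule; not Requests' API.
    """
    content_type = headers.get("Content-Type")
    if not content_type:
        # No Content-Type at all: leave it to content-based detection.
        return None

    # Let the standard library parse parameters such as charset="UTF-8".
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, quotes removed
    if charset:
        return charset

    # No explicit charset: RFC 2616 gives text/* types ISO-8859-1 by default.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("text/"):
        return "ISO-8859-1"

    # Headers give no answer; a detector such as chardet would be used instead.
    return None


print(encoding_from_headers({"Content-Type": "text/html; charset=UTF-8"}))  # utf-8
print(encoding_from_headers({"Content-Type": "text/plain"}))  # ISO-8859-1
print(encoding_from_headers({"Content-Type": "application/json"}))  # None
```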

 HTTP Verbs
 ----------
@@ -86,12 +86,22 @@ again::
 Requests will automatically decode content from the server. Most unicode
 charsets are seamlessly decoded.

-When you make a request, ``r.encoding`` is set, based on the HTTP headers.
-Requests will use that encoding when you access ``r.text``.  If ``r.encoding``
-is ``None``, Requests will make an extremely educated guess of the encoding
-of the response body. You can manually set ``r.encoding`` to any encoding
-you'd like, and that charset will be used.
+When you make a request, Requests makes educated guesses about the encoding of
+the response based on the HTTP headers. The text encoding guessed by Requests
+is used when you access ``r.text``. You can find out which encoding Requests
+is using, and change it, through the ``r.encoding`` property::

+    >>> r.encoding
+    'utf-8'
+    >>> r.encoding = 'ISO-8859-1'
+
+If you change the encoding, Requests will use the new value of ``r.encoding``
+whenever you access ``r.text``.
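To see why changing ``r.encoding`` matters, here is a minimal sketch using a hypothetical ``SketchResponse`` class (not Requests' own ``Response``). It assumes the text is decoded from the raw bytes on every access, so the same body reads differently once the encoding is changed. The ``errors="replace"`` choice is this sketch's own, so undecodable bytes become replacement characters instead of raising.

```python
class SketchResponse:
    """Hypothetical stand-in for a response object (not Requests' class)."""

    def __init__(self, content, encoding):
        self.content = content    # raw body bytes, as received
        self.encoding = encoding  # codec name used to build .text

    @property
    def text(self):
        # Decode on every access, so a new .encoding takes effect
        # the next time .text is read.
        return str(self.content, self.encoding, errors="replace")


r = SketchResponse("café".encode("utf-8"), "utf-8")
print(r.text)  # café
r.encoding = "ISO-8859-1"
print(r.text)  # cafÃ©
```

The UTF-8 bytes for ``é`` (``0xC3 0xA9``) are two separate characters in ISO-8859-1, which is the kind of mojibake a wrong encoding produces.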
+
+Requests can also use custom encodings if you need them. If you have created
+your own encoding and registered it with the ``codecs`` module, you can use
+the codec name as the value of ``r.encoding`` and Requests will handle the
+decoding for you.
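As an illustration, the sketch below registers a toy codec with ``codecs.register``. The codec name ``myrot13`` and its ROT13-over-ASCII behaviour are hypothetical, chosen only to make the effect visible. Once registered, Python's normal decoding calls such as ``bytes.decode`` and ``str(data, encoding)`` can find it by name. Following the paragraph above, you would then set ``r.encoding = 'myrot13'`` on a response.

```python
import codecs

# Translation table for ROT13 on ASCII letters (toy example only).
_ROT13 = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
)


def _rot13_decode(data, errors="strict"):
    # bytes -> str: decode as ASCII, then rotate letters.
    text = bytes(data).decode("ascii", errors)
    return text.translate(_ROT13), len(data)


def _rot13_encode(text, errors="strict"):
    # str -> bytes: rotate letters, then encode as ASCII.
    return text.translate(_ROT13).encode("ascii", errors), len(text)


def _search(name):
    # codecs passes a normalized (lowercased) name to search functions.
    if name == "myrot13":
        return codecs.CodecInfo(_rot13_encode, _rot13_decode, name="myrot13")
    return None


codecs.register(_search)

body = b"Uryyb, jbeyq!"
print(body.decode("myrot13"))  # Hello, world!
print(str(body, "myrot13", errors="replace"))  # Hello, world!
```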

 Binary Response Content
 -----------------------