mirror of
https://github.com/kennethreitz/requests.git
synced 2026-06-05 22:50:18 +00:00
Switch LGPL'd chardet for MIT licensed charset_normalizer (#5797)
Using the (non-vendored) chardet library is fine for requests itself, but with an LGPL dependency the story is a lot less clear for downstream projects, particularly ones that might like to bundle requests (and thus chardet) into a single binary (think of something similar to what docker-compose is doing). By including an LGPL'd module it is no longer clear whether the resulting artefact must also be LGPL'd.

By changing out this dependency for one under MIT we remove all license ambiguity.

As an "escape hatch" I have made the code so that it will use chardet first if it is installed, but we no longer depend upon it directly, although there is a new extra added, `requests[lgpl]`. This should minimize the impact on users, and give them a fallback if charset_normalizer turns out to be not as good. (In my non-exhaustive tests it detects the same encoding as chardet in every case I threw at it.)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
committed by GitHub
parent 33d448eb21
commit 2ed84f55b2
+15 -3
@@ -697,10 +697,22 @@ Encodings
 When you receive a response, Requests makes a guess at the encoding to
 use for decoding the response when you access the :attr:`Response.text
 <requests.Response.text>` attribute. Requests will first check for an
-encoding in the HTTP header, and if none is present, will use `chardet
-<https://pypi.org/project/chardet/>`_ to attempt to guess the encoding.
+encoding in the HTTP header, and if none is present, will use
+`charset_normalizer <https://pypi.org/project/charset_normalizer/>`_
+or `chardet <https://github.com/chardet/chardet>`_ to attempt to
+guess the encoding.
 
-The only time Requests will not do this is if no explicit charset
+If ``chardet`` is installed, ``requests`` uses it, however for python3
+``chardet`` is no longer a mandatory dependency. The ``chardet``
+library is an LGPL-licenced dependency and some users of requests
+cannot depend on mandatory LGPL-licensed dependencies.
+
+When you install ``request`` without specifying ``[use_chardet_on_py3]]`` extra,
+and ``chardet`` is not already installed, ``requests`` uses ``charset-normalizer``
+(MIT-licensed) to guess the encoding. For Python 2, ``requests`` uses only
+``chardet`` and is a mandatory dependency there.
+
+The only time Requests will not guess the encoding is if no explicit charset
 is present in the HTTP headers **and** the ``Content-Type``
 header contains ``text``. In this situation, `RFC 2616
 <https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_ specifies
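The selection order described in the added documentation (prefer ``chardet`` when it is installed, otherwise fall back to ``charset_normalizer``) can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names ``pick_detector`` and ``guess_encoding`` are hypothetical, and the sketch assumes only that both libraries expose a ``detect(bytes)`` function returning a dict with an ``"encoding"`` key.

```python
import importlib

# Preference order taken from the documentation change: chardet first
# (if the user has installed it), then the MIT-licensed charset_normalizer.
DEFAULT_CANDIDATES = ("chardet", "charset_normalizer")


def pick_detector(candidates=DEFAULT_CANDIDATES):
    """Return (name, module) for the first importable candidate.

    Hypothetical helper for illustration; returns (None, None) when
    none of the candidate modules can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


def guess_encoding(raw, candidates=DEFAULT_CANDIDATES):
    """Guess the encoding of ``raw`` bytes, or return None if no detector is available."""
    _, module = pick_detector(candidates)
    if module is None:
        return None
    # Both chardet and charset_normalizer provide detect(bytes) -> dict.
    return module.detect(raw)["encoding"]


if __name__ == "__main__":
    name, _ = pick_detector()
    print("detector in use:", name)
    print("guessed encoding:", guess_encoding("naïve café".encode("utf-8")))
```

Which detector is reported depends on what is installed in the environment. According to the documentation above, installing the ``use_chardet_on_py3`` extra is what pulls in ``chardet`` on Python 3.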