mirror of
https://github.com/kennethreitz/langchain.git
synced 2026-06-05 23:00:18 +00:00
62603f2664
I was trying to use web loaders on some Spanish documentation (e.g. [this site](https://www.fromdoppler.com/es/mailing-tendencias/)), but the automatic encoding detection introduced in https://github.com/langchain-ai/langchain/pull/3602 identified the page as "MacRoman" instead of the correct "UTF-8". To address this, I've added the ability to disable automatic encoding detection, as well as the ability to tell the loader explicitly which encoding to use.

- **Description:** Makes auto-setting the encoding optional in `WebBaseLoader`, and introduces an `encoding` option to explicitly set it.
- **Dependencies:** N/A
- **Tag maintainer:** @hwchase17
- **Twitter handle:** @czue
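To illustrate the behavior described above, here is a minimal, standard-library-only sketch of how an explicit encoding can take precedence over auto-detection. It is not the actual `WebBaseLoader` code: `FakeResponse`, `resolve_text`, and the `autoset_encoding` flag name are assumptions made for this example, and the "MacRoman" guess is hard-coded to simulate the misdetection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response; not a real library class."""

    content: bytes
    encoding: Optional[str] = None        # encoding used when decoding the body
    apparent_encoding: str = "mac_roman"  # simulated (wrong) auto-detected guess

    @property
    def text(self) -> str:
        # Fall back to UTF-8 when no encoding has been set (an assumption
        # made for this sketch, not a claim about any real HTTP client).
        return self.content.decode(self.encoding or "utf-8", errors="replace")


def resolve_text(
    response: FakeResponse,
    encoding: Optional[str] = None,
    autoset_encoding: bool = True,
) -> str:
    """Decode the body, preferring an explicit encoding over auto-detection."""
    if encoding is not None:
        # An explicit encoding always wins.
        response.encoding = encoding
    elif autoset_encoding:
        # Otherwise use the detected encoding, unless detection is disabled.
        response.encoding = response.apparent_encoding
    return response.text


if __name__ == "__main__":
    page = "Tendencias de mailing: diseño y automatización".encode("utf-8")
    print(resolve_text(FakeResponse(page)))                          # garbled accents
    print(resolve_text(FakeResponse(page), encoding="utf-8"))        # decoded correctly
    print(resolve_text(FakeResponse(page), autoset_encoding=False))  # detection skipped
```

With the simulated misdetection, the default path yields mojibake for the accented characters, while either passing `encoding="utf-8"` or disabling auto-detection recovers the original text.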