mirror of
https://github.com/kennethreitz/requests-html.git
synced 2026-06-05 23:00:20 +00:00
Create LXML from raw_html
Create LXML from `self.raw_html` instead of `self.html` so that LXML can process plain XML pages, per beda42's findings in issue https://github.com/kennethreitz/requests-html/issues/145. I have tested this change against 200 sites and it seems to fix the issue. HTML pages all appear to work as expected; I haven't run into a problem with any that I've tested.
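A minimal sketch of the idea behind this change, not the library's actual code: lxml is known to raise `ValueError` when given a decoded `str` that still carries an XML encoding declaration, whereas the same document passed as `bytes` parses. The helper name `parse_from_text_or_bytes` and the sample feed below are hypothetical and exist only for illustration.

```python
import lxml.html

# Hypothetical sample: a plain XML page with an encoding declaration.
RAW = b'<?xml version="1.0" encoding="UTF-8"?><feed><entry>Example</entry></feed>'
TEXT = RAW.decode("utf-8")


def parse_from_text_or_bytes(text, raw):
    """Try the decoded text first; fall back to the raw bytes on ValueError.

    This only illustrates the str-versus-bytes difference. The real
    BaseParser tries soup_parse() first and falls back to
    lxml.html.fromstring().
    """
    try:
        return lxml.html.fromstring(text)
    except ValueError:
        # Assumption: str input with an encoding declaration is rejected,
        # so hand lxml the undecoded bytes instead.
        return lxml.html.fromstring(raw)


root = parse_from_text_or_bytes(TEXT, RAW)
entries = root.xpath("//entry/text()")
print(entries)
```

Passing bytes lets lxml handle the declared encoding itself, which is presumably why the fallback in this commit switches to `self.raw_html`.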
+1
-1
@@ -159,7 +159,7 @@ class BaseParser:
             try:
                 self._lxml = soup_parse(self.html, features='html.parser')
             except ValueError:
-                self._lxml = lxml.html.fromstring(self.html)
+                self._lxml = lxml.html.fromstring(self.raw_html)

         return self._lxml