Create LXML from raw_html

Create LXML from `self.raw_html` instead of `self.html` to allow LXML to process plain XML pages as per beda42's findings in issue https://github.com/kennethreitz/requests-html/issues/145

I tested this change against 200 sites and it appears to fix the issue. HTML pages still parse as expected; I did not run into a problem with any of the pages I tested.
Angus Dippenaar
2018-04-05 13:47:39 +02:00
committed by GitHub
parent c59480bf15
commit c21f0784cd
+1 -1
@@ -159,7 +159,7 @@ class BaseParser:
 try:
     self._lxml = soup_parse(self.html, features='html.parser')
 except ValueError:
-    self._lxml = lxml.html.fromstring(self.html)
+    self._lxml = lxml.html.fromstring(self.raw_html)
 return self._lxml
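
The reason for the change can be sketched in isolation. The snippet below is a minimal illustration, not code from this repository: the sample document, the `parse` helper, and the variable names are hypothetical stand-ins for `self.raw_html` (bytes) and `self.html` (decoded text). It assumes that lxml may raise `ValueError` when given a `str` that carries an XML encoding declaration, which is the failure described in the linked issue, and that passing the original bytes avoids it.

```python
import lxml.html

# Hypothetical sample standing in for a plain XML page (not taken from the issue).
raw_html = b'<?xml version="1.0" encoding="utf-8"?><entry><name>Example</name></entry>'
html = raw_html.decode("utf-8")  # decoded text, analogous to self.html


def parse(text: str, raw: bytes):
    """Try the decoded text first; fall back to the raw bytes on ValueError."""
    try:
        return lxml.html.fromstring(text)
    except ValueError:
        # lxml can refuse a str that includes an encoding declaration;
        # bytes input lets the parser handle the declaration itself.
        return lxml.html.fromstring(raw)


tree = parse(html, raw_html)
name = tree.xpath("//name")[0].text
print(name)
```

Whichever branch runs, the parsed tree exposes the `name` element, so callers see the same result for XML pages as for ordinary HTML.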