Within the render function, the page is rendered through the _async_render function. This function will try to render content by first creating a page, and currently will only close said page if the content is generated. However, if at any point there's a timeout beforehand, the current page isn't closed, and instead _async_render will be called again [as per the # assigned to retries in render()] and end up leaving behind an unused page. This change will enable render to close the "failed" attempt BEFORE opening a new page to try again, and should fix the issue of massive cpu buildup with multiple chromium instances. Sorry if this is messy, it's my first time using git to make a change.
I ran into `pyppeteer.errors.BrowserError: Failed to connect to browser port:` and after a bit of snooping found that some Linux packages needed to be installed on my machine for pyppeteer to run. Suggest to add a note to save others time!
PyQuery with XML sites also has the same issue that LXML does with unicode encoded strings because it uses LXML to parse the page.
The fix has already been applied to LXML, so we can fix the issue with PyQuery by passing the already parsed LXML into PyQuery.
Some websites don't have valid bytes, even when the encoding is specified. I'm not 100% sure if replacing "bad" bytes is the correct way to fix the problem. It seems to fix the issues I've run into with some sites.
Create LXML from `self.raw_html` instead of `self.html` to allow LXML to process plain XML pages as per beda42's findings in issue https://github.com/kennethreitz/requests-html/issues/145
I have tested this change with 200 sites and it seems to fix the issue. HTML pages seem to all be working as expected. I haven't run into an issue with any that I've tested.