Commit Graph

227 Commits

Author SHA1 Message Date
Xiao Tan f23ccbfcb5 fix: fstring formatted with bytes 2019-02-21 13:53:17 +08:00
David Hinschberger b2cb028b32 Update requests_html.py
small grammar/typos
2019-01-12 18:19:14 -06:00
Alessandro Romano 4f4cfc0c1d Set verify default value to True 2018-09-21 15:17:57 +02:00
Alessandro Romano 8c6b3b3f92 Merge branch 'master' into verify-parameter 2018-09-19 15:50:12 +02:00
Alessandro Romano f4d9de9f9c Delete old class member browser_argsverify 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 1faa61b7d0 Add asyncsession.run method 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 3d6b82862d Fix r.html.next() for next url 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 69bc5dcb20 Fix HTML class to use async iter and render on bare mode 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 99f9c89766 Add arender method to HTML 2018-09-19 15:45:17 +02:00
Ordanis Sanchez a25b3737ad Add async iterator to HTML class 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 2d4c58deb8 Add HTMLSession.browser runtime exception, AsyncSession an async close method 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 47a3646405 Create a base session 2018-09-19 15:45:17 +02:00
Ordanis Sanchez d0a5642de7 Fix merge errors on HTMLSession 2018-09-19 15:43:13 +02:00
Alessandro Romano 5cc1ca4a70 Add verify parameter coherently with ignoreHTTPSErrors pyppeteer parameter 2018-09-19 15:42:20 +02:00
Ordanis Sanchez 69dd1cc77f Add asyncsession.run method 2018-09-18 16:59:14 -04:00
Ordanis Sanchez 09c7b683cc Fix r.html.next() for next url 2018-09-18 16:59:14 -04:00
Ordanis Sanchez fc1fabd8dc Fix HTML class to use async iter and render on bare mode 2018-09-18 16:59:14 -04:00
Ordanis Sanchez 85e77d134a Add arender method to HTML 2018-09-18 16:56:55 -04:00
Ordanis Sanchez c12d7c6aca Add async iterator to HTML class 2018-09-18 16:49:22 -04:00
Ordanis Sanchez dd05a02de7 Add HTMLSession.browser runtime exception, AsyncSession an async close method 2018-09-18 16:49:22 -04:00
Ordanis Sanchez 2e460d93c3 Create a base session 2018-09-18 16:49:22 -04:00
Ordanis Sanchez 9cef8a06b9 Fix merge errors on HTMLSession 2018-09-18 16:37:31 -04:00
Alessandro Romano 52ddd80824 Merge branch 'master' into add-ignoreHTTPSError-parameter 2018-09-18 17:31:17 +02:00
kennethreitz 51afd9e474 Merge pull request #157 from CodeMogul/fixes
Made basic edits
2018-09-18 02:56:12 -04:00
kennethreitz 29acbaabc7 Merge pull request #160 from SN9NV/patch-1
Create LXML from raw_html
2018-09-18 02:54:05 -04:00
kennethreitz f760df2be2 Merge pull request #162 from SN9NV/patch-2
Replace errors when decoding raw_html
2018-09-18 02:52:37 -04:00
kennethreitz 625910e1a5 Merge branch 'master' into master 2018-09-18 02:51:54 -04:00
kennethreitz c6f6858ea0 Merge pull request #200 from meetmangukiya/patch-1
requests_html.py: Typo HTTPSession -> HTMLSession
2018-09-18 02:44:41 -04:00
kennethreitz e37b40e59f Merge pull request #189 from carrionc/patch-2
Multiple chromium tab fix
2018-09-18 02:42:54 -04:00
kennethreitz e05933acfc Merge pull request #191 from montenegrodr/patch-1
fix: typo
2018-09-18 02:42:43 -04:00
kennethreitz 5d7c859975 Merge pull request #193 from pennyarcade/master
Update requests_html.py
2018-09-18 02:41:02 -04:00
kennethreitz 16c0dbe13d Merge pull request #201 from timotk/patch-1
Fix minor typo
2018-09-18 02:40:16 -04:00
kennethreitz 87b183c7fc Merge pull request #205 from leven-cn/develop
Add "tag" attribute for Element objects
2018-09-18 02:40:06 -04:00
m9mhmdy cb6e5fb557 Fix a small typo 2018-08-30 16:25:25 +02:00
Alessandro Romano b1a7acf33a Added ignoreHTTPErrors parameter 2018-08-09 15:03:15 +02:00
Li Yun 1c21f63672 Add "lineno" attribute for Element object 2018-07-04 11:30:59 +08:00
Li Yun 116a4b08eb Add "tag" attribute for Element object 2018-07-04 11:20:08 +08:00
Timo 71e2571d3a Fix minor typo
lopp -> loop in init docstring of AsyncHTMLSession.
2018-06-25 13:18:57 +02:00
Meet Mangukiya 4db2931ddc requests_html.py: Typo HTTPSession -> HTMLSession 2018-06-24 22:23:47 +05:30
Martin Rotwang 96dbba8fbd Update requests_html.py
e.g. to add a proxy setting
usage: s=Session(browser_args=['--no-sandbox', '--proxy-server=127.0.0.1:9876'])
@see: https://github.com/GoogleChrome/puppeteer/issues/336
2018-06-05 12:39:46 +02:00
Robson D. Montenegro a1c5e6ac8b fix: typo 2018-06-03 21:52:51 +01:00
carrionc 956e60054c Multiple chromium tab fix
Within the render function, the page is rendered through the _async_render function. This function will try to render content by first creating a page, and currently will only close said page if the content is generated. However, if at any point there's a timeout beforehand, the current page isn't closed, and instead _async_render will be called again [as per the # assigned to retries in render()] and end up leaving behind an unused page. This change will enable render to close the "failed" attempt BEFORE opening a new page to try again, and should fix the issue of massive cpu buildup with multiple chromium instances. Sorry if this is messy, it's my first time using git to make a change.
2018-05-30 00:40:37 -04:00
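A minimal sketch of the retry behaviour described in the commit message above: each attempt opens a page, and the page is closed whether or not the attempt timed out, so failed attempts leave nothing behind. `FakePage`, `FakeBrowser`, and `render_with_retries` are hypothetical stand-ins written for illustration; they are not the requests_html or pyppeteer API.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a browser tab."""

    def __init__(self, fail):
        self.fail = fail
        self.closed = False

    async def content(self):
        # Simulate a render that times out on failing pages.
        if self.fail:
            raise asyncio.TimeoutError
        return "<html></html>"

    async def close(self):
        self.closed = True


class FakeBrowser:
    """Hypothetical browser whose first `failures` pages time out."""

    def __init__(self, failures):
        self.failures = failures
        self.pages = []

    async def new_page(self):
        page = FakePage(fail=len(self.pages) < self.failures)
        self.pages.append(page)
        return page


async def render_with_retries(browser, retries=3):
    """Try up to `retries` times, closing every page before moving on."""
    for _ in range(retries):
        page = await browser.new_page()
        try:
            return await page.content()
        except asyncio.TimeoutError:
            continue  # the finally clause still closes the failed page
        finally:
            await page.close()
    return None
```

Placing the close call in `finally` is one way to guarantee cleanup on both the success and the timeout path; the actual change may structure this differently.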
Angus Dippenaar 2a7d08722d Initialize PyQuery with lxml
PyQuery with XML sites also has the same issue that LXML does with unicode encoded strings because it uses LXML to parse the page.
The fix has already been applied to LXML, so we can fix the issue with PyQuery by passing the already parsed LXML into PyQuery.
2018-04-14 21:32:00 +02:00
Shay Elmualem 50c9058d04 Minor typo fix 2018-04-07 22:18:46 +03:00
Angus Dippenaar 05ff6e87ca Replace errors when decoding raw_html
Some websites don't have valid bytes, even when the encoding is specified. I'm not 100% sure if replacing "bad" bytes is the correct way to fix the problem. It seems to fix the issues I've run into with some sites.
2018-04-07 17:15:51 +02:00
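A small illustration of the decoding behaviour this commit message refers to, using only the standard `bytes.decode` call. The sample bytes and the `decode_html` helper are hypothetical; the point is only that `errors="replace"` substitutes U+FFFD for invalid bytes instead of raising.

```python
# Hypothetical payload: valid UTF-8 text with one stray 0xff byte in it.
raw_html = b"<p>caf\xc3\xa9 \xff menu</p>"


def decode_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes, replacing bytes that are invalid in `encoding`."""
    return raw.decode(encoding, errors="replace")


text = decode_html(raw_html)
```

With the default strict error handling the same call raises `UnicodeDecodeError`, which is the failure mode the message describes for sites that serve invalid bytes.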
Angus Dippenaar c21f0784cd Create LXML from raw_html
Create LXML from `self.raw_html` instead of `self.html` to allow LXML to process plain XML pages as per beda42's findings in issue https://github.com/kennethreitz/requests-html/issues/145

I have tested this change with 200 sites and it seems to fix the issue. HTML pages seem to all be working as expected. I haven't run into an issue with any that I've tested.
2018-04-05 13:47:39 +02:00
Siddhesh Nachane cb55034b42 Made basic fixes
1. Corrected Comments and DocStrings Spell Errors.
2. Added .vscode folder to .gitignore
3. Replaced `i` with place holder `_` (as i is never used)
2018-03-31 22:51:34 +05:30
kennethreitz 122b42a144 cleanup
Signed-off-by: Kenneth Reitz <me@kennethreitz.org>
2018-03-21 07:46:27 -04:00
Ordanis Sanchez a2cc6bfa55 Update HTML.render to use session.browser and close pages automatically 2018-03-20 19:50:04 -04:00
Ordanis Sanchez 9a53202ce5 Extend session close method to shutdown browser 2018-03-20 19:20:20 -04:00