Commit Graph

212 Commits

Author SHA1 Message Date
Ordanis Sanchez 69bc5dcb20 Fix HTML class to use async iter and render on bare mode 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 99f9c89766 Add arender method to HTML 2018-09-19 15:45:17 +02:00
Ordanis Sanchez a25b3737ad Add async iterator to HTML class 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 2d4c58deb8 Add HTMLSession.browser runtime exception, AsyncSession an async close method 2018-09-19 15:45:17 +02:00
Ordanis Sanchez 47a3646405 Create a base session 2018-09-19 15:45:17 +02:00
Ordanis Sanchez d0a5642de7 Fix merge errors on HTMLSession 2018-09-19 15:43:13 +02:00
Alessandro Romano 5cc1ca4a70 Add verify parameter coherently with ignoreHTTPSErrors pyppeteer parameter 2018-09-19 15:42:20 +02:00
Alessandro Romano 52ddd80824 Merge branch 'master' into add-ignoreHTTPSError-parameter 2018-09-18 17:31:17 +02:00
kennethreitz 51afd9e474 Merge pull request #157 from CodeMogul/fixes
Made basic edits
2018-09-18 02:56:12 -04:00
kennethreitz 29acbaabc7 Merge pull request #160 from SN9NV/patch-1
Create LXML from raw_html
2018-09-18 02:54:05 -04:00
kennethreitz f760df2be2 Merge pull request #162 from SN9NV/patch-2
Replace errors when decoding raw_html
2018-09-18 02:52:37 -04:00
kennethreitz 625910e1a5 Merge branch 'master' into master 2018-09-18 02:51:54 -04:00
kennethreitz c6f6858ea0 Merge pull request #200 from meetmangukiya/patch-1
requests_html.py: Typo HTTPSession -> HTMLSession
2018-09-18 02:44:41 -04:00
kennethreitz e37b40e59f Merge pull request #189 from carrionc/patch-2
Multiple chromium tab fix
2018-09-18 02:42:54 -04:00
kennethreitz e05933acfc Merge pull request #191 from montenegrodr/patch-1
fix: typo
2018-09-18 02:42:43 -04:00
kennethreitz 5d7c859975 Merge pull request #193 from pennyarcade/master
Update requests_html.py
2018-09-18 02:41:02 -04:00
kennethreitz 16c0dbe13d Merge pull request #201 from timotk/patch-1
Fix minor typo
2018-09-18 02:40:16 -04:00
kennethreitz 87b183c7fc Merge pull request #205 from leven-cn/develop
Add "tag" attribute for Element objects
2018-09-18 02:40:06 -04:00
m9mhmdy cb6e5fb557 Fix a small typo 2018-08-30 16:25:25 +02:00
Alessandro Romano b1a7acf33a Added ignoreHTTPErrors parameter 2018-08-09 15:03:15 +02:00
Li Yun 1c21f63672 Add "lineno" attribute for Element object 2018-07-04 11:30:59 +08:00
Li Yun 116a4b08eb Add "tag" attribute for Element object 2018-07-04 11:20:08 +08:00
Timo 71e2571d3a Fix minor typo
lopp -> loop in init docstring of AsyncHTMLSession.
2018-06-25 13:18:57 +02:00
Meet Mangukiya 4db2931ddc requests_html.py: Typo HTTPSession -> HTMLSession 2018-06-24 22:23:47 +05:30
Martin Rotwang 96dbba8fbd Update requests_html.py
e.g. to add a proxy setting
usage: s=Session(browser_args=['--no-sandbox', '--proxy-server=127.0.0.1:9876'])
@see: https://github.com/GoogleChrome/puppeteer/issues/336
2018-06-05 12:39:46 +02:00
Robson D. Montenegro a1c5e6ac8b fix: typo 2018-06-03 21:52:51 +01:00
carrionc 956e60054c Multiple chromium tab fix
In render(), the page is rendered by the _async_render function. That function first creates a page and currently closes it only if content is generated. If a timeout occurs before that point, the page is left open and _async_render is called again (up to the number of retries set in render()), leaving an unused page behind each time. This change makes render close the page from the failed attempt BEFORE opening a new page to try again, which should fix the heavy CPU buildup from multiple chromium instances. Sorry if this is messy, it's my first time using git to make a change.
2018-05-30 00:40:37 -04:00
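The retry behaviour described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: FakePage, FakeBrowser, render_once and this render are hypothetical stand-ins, and no real pyppeteer calls are used.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a browser page (tab); not pyppeteer's API."""

    def __init__(self, fail: bool):
        self.fail = fail
        self.closed = False

    async def load(self) -> str:
        if self.fail:
            raise asyncio.TimeoutError("simulated navigation timeout")
        return "<html>rendered</html>"

    async def close(self) -> None:
        self.closed = True


class FakeBrowser:
    """Hypothetical browser whose first `failures` pages time out."""

    def __init__(self, failures: int):
        self.failures = failures
        self.pages = []

    async def new_page(self) -> FakePage:
        page = FakePage(fail=len(self.pages) < self.failures)
        self.pages.append(page)
        return page


async def render_once(browser):
    """One attempt: the page is closed whether or not loading succeeds."""
    page = await browser.new_page()
    try:
        return await page.load()
    except asyncio.TimeoutError:
        return None  # signal the caller to retry
    finally:
        await page.close()  # runs on success and on timeout


async def render(browser, retries: int = 3):
    """Retry up to `retries` times; each failed page is closed before the next opens."""
    for _ in range(retries):
        content = await render_once(browser)
        if content is not None:
            return content
    raise RuntimeError("unable to render the page after all retries")


browser = FakeBrowser(failures=2)
content = asyncio.run(render(browser, retries=3))
open_pages = [p for p in browser.pages if not p.closed]
```

With two simulated timeouts, three pages are created in total and none remain open once render returns, which is the property the commit message is after.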
Angus Dippenaar 2a7d08722d Initialize PyQuery with lxml
PyQuery has the same issue with XML sites that LXML has with Unicode-encoded strings, because it uses LXML to parse the page.
The fix has already been applied to LXML, so we can fix the issue with PyQuery by passing the already parsed LXML into PyQuery.
2018-04-14 21:32:00 +02:00
Shay Elmualem 50c9058d04 Minor typo fix 2018-04-07 22:18:46 +03:00
Angus Dippenaar 05ff6e87ca Replace errors when decoding raw_html
Some websites don't have valid bytes, even when the encoding is specified. I'm not 100% sure if replacing "bad" bytes is the correct way to fix the problem. It seems to fix the issues I've run into with some sites.
2018-04-07 17:15:51 +02:00
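A minimal sketch of what replacing "bad" bytes means here, using only Python's built-in bytes.decode with errors="replace". The decode_html helper and the sample bytes are assumptions made for illustration and are not taken from requests_html.py.

```python
# Sample payload: valid UTF-8 text followed by one byte that is invalid in UTF-8.
raw_html = "café".encode("utf-8") + b"\xff"


def decode_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode, substituting U+FFFD for undecodable bytes."""
    return raw.decode(encoding, errors="replace")


# Strict decoding (the default) raises on the invalid byte.
try:
    raw_html.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding keeps the readable text and marks the bad byte.
html = decode_html(raw_html)
```

The trade-off is the one the commit author raises: the page becomes parseable, but the original bytes behind each replacement character are lost.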
Angus Dippenaar c21f0784cd Create LXML from raw_html
Create LXML from `self.raw_html` instead of `self.html` to allow LXML to process plain XML pages as per beda42's findings in issue https://github.com/kennethreitz/requests-html/issues/145

I have tested this change with 200 sites and it seems to fix the issue. HTML pages seem to all be working as expected. I haven't run into an issue with any that I've tested.
2018-04-05 13:47:39 +02:00
Siddhesh Nachane cb55034b42 Made basic fixes
1. Corrected spelling errors in comments and docstrings.
2. Added .vscode folder to .gitignore
3. Replaced `i` with placeholder `_` (as `i` is never used)
2018-03-31 22:51:34 +05:30
kennethreitz 122b42a144 cleanup
Signed-off-by: Kenneth Reitz <me@kennethreitz.org>
2018-03-21 07:46:27 -04:00
Ordanis Sanchez a2cc6bfa55 Update HTML.render to use session.browser and close pages automatically 2018-03-20 19:50:04 -04:00
Ordanis Sanchez 9a53202ce5 Extend session close method to shutdown browser 2018-03-20 19:20:20 -04:00
Ordanis Sanchez c279bd3d63 Add browser obj to HTMLSession 2018-03-20 18:47:06 -04:00
kennethreitz ef67e9f96f Merge pull request #141 from oldani/bugfix/issue_135
Catch TypeError on render, add max retries exception
2018-03-20 17:11:53 -04:00
Ordanis Sanchez ff95aded81 Catch TypeError on render, add max retries exception 2018-03-16 12:02:03 -04:00
Ordanis Sanchez 9b21faf291 Update Sessions classes to be passed down to HTML class 2018-03-14 10:31:36 -04:00
Ordanis Sanchez a79e5479de Move next method from BaseParser to HTML class 2018-03-14 10:16:40 -04:00
bonfy 76f2f6434c Add func add_next_symbol to make it possible to append words to the default next page symbols 2018-03-13 11:11:07 +08:00
kennethreitz 6f8b676ac3 Merge branch 'master' of github.com:kennethreitz/requests-html 2018-03-11 09:44:55 -04:00
shaunpud d55bcfb34f Shorten 2018-03-11 19:53:54 +08:00
kennethreitz bcb0881d15 Merge pull request #126 from frostming/bugfix/links
Fix bugs related to links
2018-03-11 07:37:20 -04:00
Frost Ming af97ddd5f1 Fix bugs related to links
* #121 KeyError of special base tag
* #124 Remove 'mailto:' links out from links
2018-03-11 16:26:58 +08:00
miyakogi dc932571ee Pyppeteer's api has been changed
Today I released a new version of pyppeteer (0.0.13).
In that release, `pyppeteer.launch` was changed to a coroutine function.
2018-03-10 14:57:07 +09:00
kennethreitz d9ee89eaf4 Merge branch 'master' of github.com:kennethreitz/requests-html 2018-03-09 10:42:08 -05:00
kennethreitz 3a5a94eb85 cleaning
Signed-off-by: Kenneth Reitz <me@kennethreitz.org>
2018-03-09 10:42:04 -05:00
Andrew Gorcester 14da46f03d Add tests for ._make_absolute() and make them pass. 2018-03-06 16:06:23 -08:00
kennethreitz 89c001a02e Merge branch 'master' of github.com:kennethreitz/requests-html 2018-03-06 11:45:44 -05:00