Mirror of https://github.com/kennethreitz/requests-html.git (synced 2026-06-05 23:00:20 +00:00)

Update docs

Committed by Alessandro Romano
Parent: 1faa61b7d0
Commit: b769fc3dac
+30
@@ -20,6 +20,7 @@ When using this library you automatically get:
- Automatic following of redirects.
- Connection–pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.
- **Async Support**

.. Other nice features include:

@@ -38,6 +39,24 @@ Make a GET request to 'python.org', using Requests:

    >>> r = session.get('https://python.org/')

Try async and fetch several sites at the same time:

.. code-block:: pycon

    >>> from requests_html import AsyncHTMLSession
    >>> asession = AsyncHTMLSession()

    >>> async def get_pythonorg():
    ...     r = await asession.get('https://python.org/')

    >>> async def get_reddit():
    ...     r = await asession.get('https://reddit.com/')

    >>> async def get_google():
    ...     r = await asession.get('https://google.com/')

    >>> result = asession.run(get_pythonorg, get_reddit, get_google)

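The idea of running several coroutine functions at once can be sketched with the standard library alone. The following is a minimal illustration, not the library's implementation: ``fake_get`` and ``run_all`` are hypothetical names, and ``fake_get`` sleeps instead of touching the network. Each coroutine returns its value so that the caller can collect the results.

```python
import asyncio


# Hypothetical stand-in for a network request: it sleeps briefly
# instead of contacting a server, then returns a fake result string.
async def fake_get(url, delay):
    await asyncio.sleep(delay)
    return f"fetched {url}"


async def get_pythonorg():
    return await fake_get("https://python.org/", 0.03)


async def get_reddit():
    return await fake_get("https://reddit.com/", 0.01)


async def get_google():
    return await fake_get("https://google.com/", 0.02)


def run_all(*coro_functions):
    """Call each coroutine function and wait for all of them concurrently."""

    async def runner():
        # gather() returns results in argument order,
        # regardless of which coroutine finishes first.
        return await asyncio.gather(*(fn() for fn in coro_functions))

    return asyncio.run(runner())


results = run_all(get_pythonorg, get_reddit, get_google)
print(results)
```

Because the three sleeps overlap, the total wait is roughly that of the slowest coroutine rather than the sum of all three.
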
Grab a list of all links on the page, as–is (anchors excluded):

.. code-block:: pycon
@@ -140,6 +159,17 @@ Let's grab some text that's rendered by JavaScript:
    >>> r.html.search('Python 2 will retire in only {months} months!')['months']
    '<time>25</time>'

You can also do this asynchronously:

.. code-block:: pycon

    >>> r = await asession.get('http://python-requests.org/')

    >>> await r.html.arender()

    >>> r.html.search('Python 2 will retire in only {months} months!')['months']
    '<time>25</time>'

Note, the first time you ever run the ``render()`` method, it will download
Chromium into your home directory (e.g. ``~/.pyppeteer/``). This only happens
once.

@@ -28,6 +28,7 @@ When using this library you automatically get:
- Automatic following of redirects.
- Connection–pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.
- **Async Support**

.. Other nice features include:

@@ -57,6 +58,33 @@ Make a GET request to `python.org <https://python.org/>`_, using `Requests <http

    >>> r = session.get('https://python.org/')

Or try the async session:

.. code-block:: pycon

    >>> from requests_html import AsyncHTMLSession
    >>> asession = AsyncHTMLSession()

    >>> r = await asession.get('https://python.org/')

Async is most useful when fetching several sites at the same time:

.. code-block:: pycon

    >>> from requests_html import AsyncHTMLSession
    >>> asession = AsyncHTMLSession()

    >>> async def get_pythonorg():
    ...     r = await asession.get('https://python.org/')

    >>> async def get_reddit():
    ...     r = await asession.get('https://reddit.com/')

    >>> async def get_google():
    ...     r = await asession.get('https://google.com/')

    >>> asession.run(get_pythonorg, get_reddit, get_google)

Grab a list of all links on the page, as–is (anchors excluded):

.. code-block:: pycon
@@ -179,6 +207,17 @@ Let's grab some text that's rendered by JavaScript:
    >>> r.html.search('Python 2 will retire in only {months} months!')['months']
    '<time>25</time>'

You can also do this asynchronously:

.. code-block:: pycon

    >>> r = await asession.get('http://python-requests.org/')

    >>> await r.html.arender()

    >>> r.html.search('Python 2 will retire in only {months} months!')['months']
    '<time>25</time>'

Note, the first time you ever run the ``render()`` method, it will download
Chromium into your home directory (e.g. ``~/.pyppeteer/``). This only happens
once. You may also need to install a few `Linux packages <https://github.com/miyakogi/pyppeteer/issues/60>`_ to get pyppeteer working.
@@ -202,6 +241,17 @@ There's also intelligent pagination support (always improving):
    <HTML url='https://www.reddit.com/?count=150&after=t3_81nrcd'>
    …

For asynchronous pagination, use ``async for``:

.. code-block:: pycon

    >>> r = await asession.get('https://reddit.com')
    >>> async for html in r.html:
    ...     print(html)
    <HTML url='https://www.reddit.com/'>
    <HTML url='https://www.reddit.com/?count=25&after=t3_81puu5'>
    …

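``async for`` works with any object that implements ``__aiter__`` and ``__anext__``. The sketch below shows that protocol in isolation. It makes no claim about how requests-html implements pagination: the ``Paginator`` class, the ``collect`` helper, and the ``example.com`` URLs are all hypothetical, and the "fetch" is simulated with ``asyncio.sleep``.

```python
import asyncio


class Paginator:
    """Hypothetical paginator that yields page URLs one at a time."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._index = 0

    def __aiter__(self):
        # An async iterator returns itself from __aiter__.
        return self

    async def __anext__(self):
        if self._index >= len(self._urls):
            # Tells ``async for`` that there are no more pages.
            raise StopAsyncIteration
        await asyncio.sleep(0)  # stand-in for fetching the next page
        url = self._urls[self._index]
        self._index += 1
        return url


async def collect(paginator):
    pages = []
    async for url in paginator:
        pages.append(url)
    return pages


pages = asyncio.run(collect(Paginator([
    "https://example.com/?page=1",
    "https://example.com/?page=2",
])))
print(pages)
```

Each iteration awaits the next page, so other tasks on the event loop can run while a page is being fetched.
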
You can also just request the next URL easily:

.. code-block:: pycon
@@ -246,6 +296,16 @@ You can also render JavaScript pages without Requests:
    >>> print(html.html)
    <html><head></head><body><a href="https://httpbin.org"></a></body></html>

To use ``arender``, pass ``async_=True`` to ``HTML``:

.. code-block:: pycon

    >>> # Reusing doc and script from the example above.
    >>> html = HTML(html=doc, async_=True)
    >>> val = await html.arender(script=script, reload=False)
    >>> print(val)
    {'width': 800, 'height': 600, 'deviceScaleFactor': 1}


API Documentation
=================
@@ -278,6 +338,10 @@ These sessions are for making HTTP requests:
    :inherited-members:


.. autoclass:: AsyncHTMLSession
    :inherited-members:



Indices and tables
==================
