Requests-HTML: HTML Parsing for Humans™
=======================================

This library intends to make parsing HTML (e.g. scraping the web) as
simple and intuitive as possible.

When using this library you automatically get:

- jQuery selectors (thanks to PyQuery).
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magic parsing abilities.

Other nice features include:

- Markdown export of pages and elements.

Usage
=====

Make a GET request to 'python.org', using Requests:

.. code-block:: pycon

    >>> from requests_html import session

    >>> r = session.get('https://python.org/')

Grab a list of all links on the page, as-is (anchors excluded):

.. code-block:: pycon

    >>> r.html.links
    {'/users/membership/', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/about/success/', 'http://flask.pocoo.org/', 'http://www.djangoproject.com/', '/blogs/', ... '/psf-landing/', 'https://wiki.python.org/moin/PythonBooks'}

Grab a list of all links on the page, in absolute form (anchors excluded):

.. code-block:: pycon

    >>> r.html.absolute_links
    {'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/downloads/mac-osx/', 'http://flask.pocoo.org/', 'https://www.python.org//docs.python.org/3/tutorial/', 'http://www.djangoproject.com/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org//docs.python.org/3/tutorial/controlflow.html#defining-functions', 'https://www.python.org/about/success/', 'http://twitter.com/ThePSF', 'https://www.python.org/events/python-user-group/634/', ..., 'https://wiki.python.org/moin/PythonBooks'}

Select an element with a jQuery selector:

.. code-block:: pycon

    >>> about = r.html.find('#about', first=True)

Grab an element's text contents:

.. code-block:: pycon

    >>> print(about.text)
    About
    Applications
    Quotes
    Getting Started
    Help
    Python Brochure

Introspect an element's attributes:

.. code-block:: pycon

    >>> about.attrs
    {'id': 'about', 'class': 'tier-1 element-1 ', 'aria-haspopup': 'true'}

Select elements within elements:

.. code-block:: pycon

    >>> about.find('a')
    [<Element 'a' href='/about/'>, <Element 'a' href='/about/apps/'>, <Element 'a' href='/about/quotes/'>, <Element 'a' href='/about/gettingstarted/'>, <Element 'a' href='/about/help/'>, <Element 'a' href='http://brochure.getpython.info/'>]

Render an element as Markdown:

.. code-block:: pycon

    >>> print(about.markdown)

    * [About](/about/)

      * [Applications](/about/apps/)
      * [Quotes](/about/quotes/)
      * [Getting Started](/about/gettingstarted/)
      * [Help](/about/help/)
      * [Python Brochure](http://brochure.getpython.info/)

Search for text on the page:

.. code-block:: pycon

    >>> r.html.search('Python is a {} language')[0]
    programming

A more complex CSS selector example (copied from Chrome dev tools):

.. code-block:: pycon

    >>> r = session.get('https://github.com/')
    >>> sel = 'body > div.application-main > div.jumbotron.jumbotron-codelines > div > div > div.col-md-7.text-center.text-md-left > p'

    >>> print(r.html.find(sel)[0].text)
    GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside millions of other developers.

Installation
============

.. code-block:: shell

    $ pipenv install requests-html
    ✨🍰✨
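
A side note on the ``absolute_links`` example in the Usage section: turning a relative link into an absolute one can be sketched with the standard library alone. The helper below, ``to_absolute``, is a hypothetical name used for illustration only; it is not part of Requests-HTML, and the library's own resolution logic may behave differently. It assumes the page's base URL is known.

```python
from urllib.parse import urljoin


def to_absolute(base_url, links):
    """Resolve each link against base_url and return a set of absolute URLs.

    Hypothetical helper for illustration; not the Requests-HTML implementation.
    """
    return {urljoin(base_url, link) for link in links}


# A few of the as-is links shown in the Usage section.
base = 'https://www.python.org/'
raw_links = {'/about/success/', '/blogs/', 'http://flask.pocoo.org/'}

absolute = to_absolute(base, raw_links)
for link in sorted(absolute):
    print(link)
```

Links that are already absolute pass through unchanged, while site-relative paths pick up the scheme and host of the base URL.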