diff --git a/docs/docs_skeleton/static/img/qa_data_load.png b/docs/docs_skeleton/static/img/qa_data_load.png new file mode 100644 index 000000000..680e32331 Binary files /dev/null and b/docs/docs_skeleton/static/img/qa_data_load.png differ diff --git a/docs/docs_skeleton/static/img/qa_flow.jpeg b/docs/docs_skeleton/static/img/qa_flow.jpeg new file mode 100644 index 000000000..301099c49 Binary files /dev/null and b/docs/docs_skeleton/static/img/qa_flow.jpeg differ diff --git a/docs/docs_skeleton/static/img/qa_intro.png b/docs/docs_skeleton/static/img/qa_intro.png new file mode 100644 index 000000000..cb31050dd Binary files /dev/null and b/docs/docs_skeleton/static/img/qa_intro.png differ diff --git a/docs/docs_skeleton/static/img/summary_chains.png b/docs/docs_skeleton/static/img/summary_chains.png new file mode 100644 index 000000000..3efef8d7e Binary files /dev/null and b/docs/docs_skeleton/static/img/summary_chains.png differ diff --git a/docs/extras/modules/chains/additional/graph_sparql_qa.ipynb b/docs/extras/modules/chains/additional/graph_sparql_qa.ipynb index 78990b253..883a32513 100644 --- a/docs/extras/modules/chains/additional/graph_sparql_qa.ipynb +++ b/docs/extras/modules/chains/additional/graph_sparql_qa.ipynb @@ -1,300 +1,301 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "c94240f5", - "metadata": {}, - "source": [ - "# GraphSparqlQAChain\n", - "\n", - "Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends Semantic Web Technologies, cp. [Semantic Web](https://www.w3.org/standards/semanticweb/). [SPARQL](https://www.w3.org/TR/sparql11-query/) serves as a query language analogously to SQL or Cypher for these graphs. This notebook demonstrates the application of LLMs as a natural language interface to a graph database by generating SPARQL.\\\n", - "Disclaimer: To date, SPARQL query generation via LLMs is still a bit unstable. Be especially careful with UPDATE queries, which alter the graph." - ] - }, - { - "cell_type": "markdown", - "id": "dbc0ee68", - "metadata": {}, - "source": [ - "There are several sources you can run queries against, including files on the web, files you have available locally, SPARQL endpoints, e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "62812aad", - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [], - "source": [ - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphSparqlQAChain\n", - "from langchain.graphs import RdfGraph" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "0928915d", - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [], - "source": [ - "graph = RdfGraph(\n", - " source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n", - " standard=\"rdf\",\n", - " local_copy=\"test.ttl\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Note that providing a `local_file` is necessary for storing changes locally if the source is read-only." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "id": "58c1a8ea", - "metadata": {}, - "source": [ - "## Refresh graph schema information\n", - "If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "4e3de44f", - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [], - "source": [ - "graph.load_schema()" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "1fe76ccd", - "metadata": {}, - "outputs": [ + "cells": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "In the following, each IRI is followed by the local name and optionally its description in parentheses. \n", - "The RDF graph supports the following node types:\n", - " (PersonalProfileDocument, None), (RSAPublicKey, None), (Male, None), (Person, None), (Work, None)\n", - "The RDF graph supports the following relationships:\n", - " (seeAlso, None), (title, None), (mbox_sha1sum, None), (maker, None), (oidcIssuer, None), (publicHomePage, None), (openid, None), (storage, None), (name, None), (country, None), (type, None), (profileHighlightColor, None), (preferencesFile, None), (label, None), (modulus, None), (participant, None), (street2, None), (locality, None), (nick, None), (homepage, None), (license, None), (givenname, None), (street-address, None), (postal-code, None), (street, None), (lat, None), (primaryTopic, None), (fn, None), (location, None), (developer, None), (city, None), (region, None), (member, None), (long, None), (address, None), (family_name, None), (account, None), (workplaceHomepage, None), (title, None), (publicTypeIndex, None), (office, None), (homePage, None), (mbox, None), (preferredURI, None), (profileBackgroundColor, None), (owns, None), (based_near, None), (hasAddress, None), (img, None), (assistant, None), (title, None), (key, None), (inbox, None), (editableProfile, None), (postalCode, None), (weblog, None), (exponent, None), (avatar, None)\n", - "\n" - ] - } - ], - "source": [ - "graph.get_schema" - ] - }, - { - "cell_type": "markdown", - "id": "68a3c677", - "metadata": {}, - "source": [ - "## Querying the graph\n", - "\n", - "Now, you can use the graph SPARQL QA chain to ask questions about the graph." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "7476ce98", - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [], - "source": [ - "chain = GraphSparqlQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "ef8ee27b", - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n", - "Identified intent:\n", - "\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n", - "Generated SPARQL:\n", - "\u001B[32;1m\u001B[1;3mPREFIX foaf: \n", - "SELECT ?homepage\n", - "WHERE {\n", - " ?person foaf:name \"Tim Berners-Lee\" .\n", - " ?person foaf:workplaceHomepage ?homepage .\n", - "}\u001B[0m\n", - "Full Context:\n", - "\u001B[32;1m\u001B[1;3m[]\u001B[0m\n", - "\n", - "\u001B[1m> Finished chain.\u001B[0m\n" - ] + "cell_type": "markdown", + "id": "c94240f5", + "metadata": {}, + "source": [ + "# GraphSparqlQAChain\n", + "\n", + "Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends Semantic Web Technologies, cp. [Semantic Web](https://www.w3.org/standards/semanticweb/). [SPARQL](https://www.w3.org/TR/sparql11-query/) serves as a query language analogously to SQL or Cypher for these graphs. This notebook demonstrates the application of LLMs as a natural language interface to a graph database by generating SPARQL.\\\n", + "Disclaimer: To date, SPARQL query generation via LLMs is still a bit unstable. Be especially careful with UPDATE queries, which alter the graph." + ] }, { - "data": { - "text/plain": [ - "\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\"" + "cell_type": "markdown", + "id": "dbc0ee68", + "metadata": {}, + "source": [ + "There are several sources you can run queries against, including files on the web, files you have available locally, SPARQL endpoints, e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)." ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "chain.run(\"What is Tim Berners-Lee's work homepage?\")" - ] - }, - { - "cell_type": "markdown", - "id": "af4b3294", - "metadata": {}, - "source": [ - "## Updating the graph\n", - "\n", - "Analogously, you can update the graph, i.e., insert triples, using natural language." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "fdf38841", - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n", - "Identified intent:\n", - "\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n", - "Generated SPARQL:\n", - "\u001B[32;1m\u001B[1;3mPREFIX foaf: \n", - "INSERT {\n", - " ?person foaf:workplaceHomepage .\n", - "}\n", - "WHERE {\n", - " ?person foaf:name \"Timothy Berners-Lee\" .\n", - "}\u001B[0m\n", - "\n", - "\u001B[1m> Finished chain.\u001B[0m\n" - ] }, { - "data": { - "text/plain": [ - "'Successfully inserted triples into the graph.'" + "cell_type": "code", + "execution_count": 1, + "id": "62812aad", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "outputs": [], + "source": [ + "from langchain.chat_models import ChatOpenAI\n", + "from langchain.chains import GraphSparqlQAChain\n", + "from langchain.graphs import RdfGraph" ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")" - ] - }, - { - "cell_type": "markdown", - "id": "5e0f7fc1", - "metadata": {}, - "source": [ - "Let's verify the results:" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "f874171b", - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/plain": [ - "[(rdflib.term.URIRef('https://www.w3.org/'),),\n", - " (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]" + "cell_type": "code", + "execution_count": 8, + "id": "0928915d", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "outputs": [], + "source": [ + "graph = RdfGraph(\n", + " source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n", + " standard=\"rdf\",\n", + " local_copy=\"test.ttl\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Note that providing a `local_file` is necessary for storing changes locally if the source is read-only." + ], + "metadata": { + "collapsed": false + }, + "id": "7af596b5" + }, + { + "cell_type": "markdown", + "id": "58c1a8ea", + "metadata": {}, + "source": [ + "## Refresh graph schema information\n", + "If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "4e3de44f", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "outputs": [], + "source": [ + "graph.load_schema()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1fe76ccd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In the following, each IRI is followed by the local name and optionally its description in parentheses. \n", + "The RDF graph supports the following node types:\n", + " (PersonalProfileDocument, None), (RSAPublicKey, None), (Male, None), (Person, None), (Work, None)\n", + "The RDF graph supports the following relationships:\n", + " (seeAlso, None), (title, None), (mbox_sha1sum, None), (maker, None), (oidcIssuer, None), (publicHomePage, None), (openid, None), (storage, None), (name, None), (country, None), (type, None), (profileHighlightColor, None), (preferencesFile, None), (label, None), (modulus, None), (participant, None), (street2, None), (locality, None), (nick, None), (homepage, None), (license, None), (givenname, None), (street-address, None), (postal-code, None), (street, None), (lat, None), (primaryTopic, None), (fn, None), (location, None), (developer, None), (city, None), (region, None), (member, None), (long, None), (address, None), (family_name, None), (account, None), (workplaceHomepage, None), (title, None), (publicTypeIndex, None), (office, None), (homePage, None), (mbox, None), (preferredURI, None), (profileBackgroundColor, None), (owns, None), (based_near, None), (hasAddress, None), (img, None), (assistant, None), (title, None), (key, None), (inbox, None), (editableProfile, None), (postalCode, None), (weblog, None), (exponent, None), (avatar, None)\n", + "\n" + ] + } + ], + "source": [ + "graph.get_schema" + ] + }, + { + "cell_type": "markdown", + "id": "68a3c677", + "metadata": {}, + "source": [ + "## Querying the graph\n", + "\n", + "Now, you can use the graph SPARQL QA chain to ask questions about the graph." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "7476ce98", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "outputs": [], + "source": [ + "chain = GraphSparqlQAChain.from_llm(\n", + " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "ef8ee27b", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n", + "Identified intent:\n", + "\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n", + "Generated SPARQL:\n", + "\u001b[32;1m\u001b[1;3mPREFIX foaf: \n", + "SELECT ?homepage\n", + "WHERE {\n", + " ?person foaf:name \"Tim Berners-Lee\" .\n", + " ?person foaf:workplaceHomepage ?homepage .\n", + "}\u001b[0m\n", + "Full Context:\n", + "\u001b[32;1m\u001b[1;3m[]\u001b[0m\n", + "\n", + "\u001b[1m> Finished chain.\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\"" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain.run(\"What is Tim Berners-Lee's work homepage?\")" + ] + }, + { + "cell_type": "markdown", + "id": "af4b3294", + "metadata": {}, + "source": [ + "## Updating the graph\n", + "\n", + "Analogously, you can update the graph, i.e., insert triples, using natural language." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "fdf38841", + "metadata": { + "pycharm": { + "is_executing": true + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n", + "Identified intent:\n", + "\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n", + "Generated SPARQL:\n", + "\u001b[32;1m\u001b[1;3mPREFIX foaf: \n", + "INSERT {\n", + " ?person foaf:workplaceHomepage .\n", + "}\n", + "WHERE {\n", + " ?person foaf:name \"Timothy Berners-Lee\" .\n", + "}\u001b[0m\n", + "\n", + "\u001b[1m> Finished chain.\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "'Successfully inserted triples into the graph.'" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")" + ] + }, + { + "cell_type": "markdown", + "id": "5e0f7fc1", + "metadata": {}, + "source": [ + "Let's verify the results:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f874171b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[(rdflib.term.URIRef('https://www.w3.org/'),),\n", + " (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "query = (\n", + " \"\"\"PREFIX foaf: \\n\"\"\"\n", + " \"\"\"SELECT ?hp\\n\"\"\"\n", + " \"\"\"WHERE {\\n\"\"\"\n", + " \"\"\" ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n", + " \"\"\" ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n", + " \"\"\"}\"\"\"\n", + ")\n", + "graph.query(query)" ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" } - ], - "source": [ - "query = (\n", - " \"\"\"PREFIX foaf: \\n\"\"\"\n", - " \"\"\"SELECT ?hp\\n\"\"\"\n", - " \"\"\"WHERE {\\n\"\"\"\n", - " \"\"\" ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n", - " \"\"\" ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n", - " \"\"\"}\"\"\"\n", - ")\n", - "graph.query(query)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "lc", - "language": "python", - "name": "lc" + ], + "metadata": { + "kernelspec": { + "display_name": "lc", + "language": "python", + "name": "lc" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.4" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/docs/extras/modules/data_connection/document_loaders/integrations/web_base.ipynb b/docs/extras/modules/data_connection/document_loaders/integrations/web_base.ipynb index 5a91c3cd1..ca7937b1d 100644 --- a/docs/extras/modules/data_connection/document_loaders/integrations/web_base.ipynb +++ b/docs/extras/modules/data_connection/document_loaders/integrations/web_base.ipynb @@ -1,277 +1,279 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "bf920da0", - "metadata": {}, - "source": [ - "# WebBaseLoader\n", - "\n", - "This covers how to use `WebBaseLoader` to load all text from `HTML` webpages into a document format that we can use downstream. For more custom logic for loading webpages look at some child class examples such as `IMSDbLoader`, `AZLyricsLoader`, and `CollegeConfidentialLoader`" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "00b6de21", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders import WebBaseLoader" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "0231df35", - "metadata": {}, - "outputs": [], - "source": [ - "loader = WebBaseLoader(\"https://www.espn.com/\")" - ] - }, - { - "cell_type": "markdown", - "id": "c162b300-5f4b-4e37-bab3-17f590fc07cc", - "metadata": {}, - "source": [ - "To bypass SSL verification errors during fetching, you can set the \"verify\" option:\n", - "\n", - "loader.requests_kwargs = {'verify':False}" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "f06bdc4e", - "metadata": {}, - "outputs": [], - "source": [ - "data = loader.load()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "a390d79f", - "metadata": {}, - "outputs": [ + "cells": [ { - "data": { - "text/plain": [ - "[Document(page_content=\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANCAAMNCAAWNHLSoccer…MLBNCAAFGolfTennisSports BettingBoxingCFLNCAACricketF1HorseLLWSMMANASCARNBA G LeagueOlympic SportsRacingRN BBRN FBRugbyWNBAWorld Baseball ClassicWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nAre you ready for Opening Day? Here's your guide to MLB's offseason chaosWait, Jacob deGrom is on the Rangers now? Xander Bogaerts and Trea Turner signed where? And what about Carlos Correa? Yeah, you're going to need to read up before Opening Day.12hESPNIllustration by ESPNEverything you missed in the MLB offseason3h2:33World Series odds, win totals, props for every teamPlay fantasy baseball for free!TOP HEADLINESQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersLAMAR WANTS OUT OF BALTIMOREMarcus Spears identifies the two teams that need Lamar Jackson the most8h2:00Would Lamar sit out? Will Ravens draft a QB? Jackson trade request insightsLamar Jackson has asked Baltimore to trade him, but Ravens coach John Harbaugh hopes the QB will be back.3hJamison HensleyBallard, Colts will consider trading for QB JacksonJackson to Indy? Washington? Barnwell ranks the QB's trade fitsSNYDER'S TUMULTUOUS 24-YEAR RUNHow Washington’s NFL franchise sank on and off the field under owner Dan SnyderSnyder purchased one of the NFL's marquee franchises in 1999. Twenty-four years later, and with the team up for sale, he leaves a legacy of on-field futility and off-field scandal.13hJohn KeimESPNIOWA STAR STEPS UP AGAINJ-Will: Caitlin Clark is the biggest brand in college sports right now8h0:47'The better the opponent, the better she plays': Clark draws comparisons to TaurasiCaitlin Clark's performance on Sunday had longtime observers going back decades to find comparisons.16hKevin PeltonWOMEN'S ELITE EIGHT SCOREBOARDMONDAY'S GAMESCheck your bracket!NBA DRAFTHow top prospects fared on the road to the Final FourThe 2023 NCAA tournament is down to four teams, and ESPN's Jonathan Givony recaps the players who saw their NBA draft stock change.11hJonathan GivonyAndy Lyons/Getty ImagesTALKING BASKETBALLWhy AD needs to be more assertive with LeBron on the court10h1:33Why Perk won't blame Kyrie for Mavs' woes8h1:48WHERE EVERY TEAM STANDSNew NFL Power Rankings: Post-free-agency 1-32 poll, plus underrated offseason movesThe free agent frenzy has come and gone. Which teams have improved their 2023 outlook, and which teams have taken a hit?12hNFL Nation reportersIllustration by ESPNTHE BUCK STOPS WITH BELICHICKBruschi: Fair to criticize Bill Belichick for Patriots' struggles10h1:27 Top HeadlinesQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersFavorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InMarch Madness LiveESPNMarch Madness LiveWatch every men's NCAA tournament game live! ICYMI1:42Austin Peay's coach, pitcher and catcher all ejected after retaliation pitchAustin Peay's pitcher, catcher and coach were all ejected after a pitch was thrown at Liberty's Nathan Keeter, who earlier in the game hit a home run and celebrated while running down the third-base line. Men's Tournament ChallengeIllustration by ESPNMen's Tournament ChallengeCheck your bracket(s) in the 2023 Men's Tournament Challenge, which you can follow throughout the Big Dance. Women's Tournament ChallengeIllustration by ESPNWomen's Tournament ChallengeCheck your bracket(s) in the 2023 Women's Tournament Challenge, which you can follow throughout the Big Dance. Best of ESPN+AP Photo/Lynne SladkyFantasy Baseball ESPN+ Cheat Sheet: Sleepers, busts, rookies and closersYou've read their names all preseason long, it'd be a shame to forget them on draft day. The ESPN+ Cheat Sheet is one way to make sure that doesn't happen.Steph Chambers/Getty ImagesPassan's 2023 MLB season preview: Bold predictions and moreOpening Day is just over a week away -- and Jeff Passan has everything you need to know covered from every possible angle.Photo by Bob Kupbens/Icon Sportswire2023 NFL free agency: Best team fits for unsigned playersWhere could Ezekiel Elliott land? Let's match remaining free agents to teams and find fits for two trade candidates.Illustration by ESPN2023 NFL mock draft: Mel Kiper's first-round pick predictionsMel Kiper Jr. makes his predictions for Round 1 of the NFL draft, including projecting a trade in the top five. Trending NowAnne-Marie Sorvin-USA TODAY SBoston Bruins record tracker: Wins, points, milestonesThe B's are on pace for NHL records in wins and points, along with some individual superlatives as well. Follow along here with our updated tracker.Mandatory Credit: William Purnell-USA TODAY Sports2023 NFL full draft order: AFC, NFC team picks for all roundsStarting with the Carolina Panthers at No. 1 overall, here's the entire 2023 NFL draft broken down round by round. How to Watch on ESPN+Gregory Fisher/Icon Sportswire2023 NCAA men's hockey: Results, bracket, how to watchThe matchups in Tampa promise to be thrillers, featuring plenty of star power, high-octane offense and stellar defense.(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+.Hailie Lynch/XFLHow to watch the XFL: 2023 schedule, teams, players, news, moreEvery XFL game will be streamed on ESPN+. Find out when and where else you can watch the eight teams compete. Sign up to play the #1 Fantasy Baseball GameReactivate A LeagueCreate A LeagueJoin a Public LeaguePractice With a Mock DraftSports BettingAP Photo/Mike KropfMarch Madness betting 2023: Bracket odds, lines, tips, moreThe 2023 NCAA tournament brackets have finally been released, and we have everything you need to know to make a bet on all of the March Madness games. Sign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft Now\\n\\nESPN+\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: © ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\", lookup_str='', metadata={'source': 'https://www.espn.com/'}, lookup_index=0)]" + "cell_type": "markdown", + "id": "bf920da0", + "metadata": {}, + "source": [ + "# WebBaseLoader\n", + "\n", + "This covers how to use `WebBaseLoader` to load all text from `HTML` webpages into a document format that we can use downstream. For more custom logic for loading webpages look at some child class examples such as `IMSDbLoader`, `AZLyricsLoader`, and `CollegeConfidentialLoader`" ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "data" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "878179f7", - "metadata": {}, - "outputs": [], - "source": [ - "\"\"\"\n", - "# Use this piece of code for testing new custom BeautifulSoup parsers\n", - "\n", - "import requests\n", - "from bs4 import BeautifulSoup\n", - "\n", - "html_doc = requests.get(\"{INSERT_NEW_URL_HERE}\")\n", - "soup = BeautifulSoup(html_doc.text, 'html.parser')\n", - "\n", - "# Beautiful soup logic to be exported to langchain.document_loaders.webpage.py\n", - "# Example: transcript = soup.select_one(\"td[class='scrtext']\").text\n", - "# BS4 documentation can be found here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/\n", - "\n", - "\"\"\";" - ] - }, - { - "cell_type": "markdown", - "id": "150988e6", - "metadata": {}, - "source": [ - "## Loading multiple webpages\n", - "\n", - "You can also load multiple webpages at once by passing in a list of urls to the loader. This will return a list of documents in the same order as the urls passed in." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "e25bbd3b", - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/plain": [ - "[Document(page_content=\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANCAAMNCAAWNHLSoccer…MLBNCAAFGolfTennisSports BettingBoxingCFLNCAACricketF1HorseLLWSMMANASCARNBA G LeagueOlympic SportsRacingRN BBRN FBRugbyWNBAWorld Baseball ClassicWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nAre you ready for Opening Day? Here's your guide to MLB's offseason chaosWait, Jacob deGrom is on the Rangers now? Xander Bogaerts and Trea Turner signed where? And what about Carlos Correa? Yeah, you're going to need to read up before Opening Day.12hESPNIllustration by ESPNEverything you missed in the MLB offseason3h2:33World Series odds, win totals, props for every teamPlay fantasy baseball for free!TOP HEADLINESQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersLAMAR WANTS OUT OF BALTIMOREMarcus Spears identifies the two teams that need Lamar Jackson the most7h2:00Would Lamar sit out? Will Ravens draft a QB? Jackson trade request insightsLamar Jackson has asked Baltimore to trade him, but Ravens coach John Harbaugh hopes the QB will be back.3hJamison HensleyBallard, Colts will consider trading for QB JacksonJackson to Indy? Washington? Barnwell ranks the QB's trade fitsSNYDER'S TUMULTUOUS 24-YEAR RUNHow Washington’s NFL franchise sank on and off the field under owner Dan SnyderSnyder purchased one of the NFL's marquee franchises in 1999. Twenty-four years later, and with the team up for sale, he leaves a legacy of on-field futility and off-field scandal.13hJohn KeimESPNIOWA STAR STEPS UP AGAINJ-Will: Caitlin Clark is the biggest brand in college sports right now8h0:47'The better the opponent, the better she plays': Clark draws comparisons to TaurasiCaitlin Clark's performance on Sunday had longtime observers going back decades to find comparisons.16hKevin PeltonWOMEN'S ELITE EIGHT SCOREBOARDMONDAY'S GAMESCheck your bracket!NBA DRAFTHow top prospects fared on the road to the Final FourThe 2023 NCAA tournament is down to four teams, and ESPN's Jonathan Givony recaps the players who saw their NBA draft stock change.11hJonathan GivonyAndy Lyons/Getty ImagesTALKING BASKETBALLWhy AD needs to be more assertive with LeBron on the court9h1:33Why Perk won't blame Kyrie for Mavs' woes8h1:48WHERE EVERY TEAM STANDSNew NFL Power Rankings: Post-free-agency 1-32 poll, plus underrated offseason movesThe free agent frenzy has come and gone. Which teams have improved their 2023 outlook, and which teams have taken a hit?12hNFL Nation reportersIllustration by ESPNTHE BUCK STOPS WITH BELICHICKBruschi: Fair to criticize Bill Belichick for Patriots' struggles10h1:27 Top HeadlinesQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersFavorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InMarch Madness LiveESPNMarch Madness LiveWatch every men's NCAA tournament game live! ICYMI1:42Austin Peay's coach, pitcher and catcher all ejected after retaliation pitchAustin Peay's pitcher, catcher and coach were all ejected after a pitch was thrown at Liberty's Nathan Keeter, who earlier in the game hit a home run and celebrated while running down the third-base line. Men's Tournament ChallengeIllustration by ESPNMen's Tournament ChallengeCheck your bracket(s) in the 2023 Men's Tournament Challenge, which you can follow throughout the Big Dance. Women's Tournament ChallengeIllustration by ESPNWomen's Tournament ChallengeCheck your bracket(s) in the 2023 Women's Tournament Challenge, which you can follow throughout the Big Dance. Best of ESPN+AP Photo/Lynne SladkyFantasy Baseball ESPN+ Cheat Sheet: Sleepers, busts, rookies and closersYou've read their names all preseason long, it'd be a shame to forget them on draft day. The ESPN+ Cheat Sheet is one way to make sure that doesn't happen.Steph Chambers/Getty ImagesPassan's 2023 MLB season preview: Bold predictions and moreOpening Day is just over a week away -- and Jeff Passan has everything you need to know covered from every possible angle.Photo by Bob Kupbens/Icon Sportswire2023 NFL free agency: Best team fits for unsigned playersWhere could Ezekiel Elliott land? Let's match remaining free agents to teams and find fits for two trade candidates.Illustration by ESPN2023 NFL mock draft: Mel Kiper's first-round pick predictionsMel Kiper Jr. makes his predictions for Round 1 of the NFL draft, including projecting a trade in the top five. Trending NowAnne-Marie Sorvin-USA TODAY SBoston Bruins record tracker: Wins, points, milestonesThe B's are on pace for NHL records in wins and points, along with some individual superlatives as well. Follow along here with our updated tracker.Mandatory Credit: William Purnell-USA TODAY Sports2023 NFL full draft order: AFC, NFC team picks for all roundsStarting with the Carolina Panthers at No. 1 overall, here's the entire 2023 NFL draft broken down round by round. How to Watch on ESPN+Gregory Fisher/Icon Sportswire2023 NCAA men's hockey: Results, bracket, how to watchThe matchups in Tampa promise to be thrillers, featuring plenty of star power, high-octane offense and stellar defense.(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+.Hailie Lynch/XFLHow to watch the XFL: 2023 schedule, teams, players, news, moreEvery XFL game will be streamed on ESPN+. Find out when and where else you can watch the eight teams compete. Sign up to play the #1 Fantasy Baseball GameReactivate A LeagueCreate A LeagueJoin a Public LeaguePractice With a Mock DraftSports BettingAP Photo/Mike KropfMarch Madness betting 2023: Bracket odds, lines, tips, moreThe 2023 NCAA tournament brackets have finally been released, and we have everything you need to know to make a bet on all of the March Madness games. Sign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft Now\\n\\nESPN+\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: © ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\", lookup_str='', metadata={'source': 'https://www.espn.com/'}, lookup_index=0),\n", - " Document(page_content='GoogleSearch Images Maps Play YouTube News Gmail Drive More »Web History | Settings | Sign in\\xa0Advanced searchAdvertisingBusiness SolutionsAbout Google© 2023 - Privacy - Terms ', lookup_str='', metadata={'source': 'https://google.com'}, lookup_index=0)]" + "cell_type": "code", + "execution_count": 1, + "id": "00b6de21", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import WebBaseLoader" ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "loader = WebBaseLoader([\"https://www.espn.com/\", \"https://google.com\"])\n", - "docs = loader.load()\n", - "docs" - ] - }, - { - "cell_type": "markdown", - "id": "641be294", - "metadata": {}, - "source": [ - "### Load multiple urls concurrently\n", - "\n", - "You can speed up the scraping process by scraping and parsing multiple urls concurrently.\n", - "\n", - "There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the server you are scraping and don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but may cause the server to block you. Be careful!" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "9f9cf30f", - "metadata": {}, - "outputs": [ + }, { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: nest_asyncio in /Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages (1.5.6)\n" - ] - } - ], - "source": [ - "!pip install nest_asyncio\n", - "\n", - "# fixes a bug with asyncio and jupyter\n", - "import nest_asyncio\n", - "\n", - "nest_asyncio.apply()" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "49586eac", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[Document(page_content=\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANCAAMNCAAWNHLSoccer…MLBNCAAFGolfTennisSports BettingBoxingCFLNCAACricketF1HorseLLWSMMANASCARNBA G LeagueOlympic SportsRacingRN BBRN FBRugbyWNBAWorld Baseball ClassicWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nAre you ready for Opening Day? Here's your guide to MLB's offseason chaosWait, Jacob deGrom is on the Rangers now? Xander Bogaerts and Trea Turner signed where? And what about Carlos Correa? Yeah, you're going to need to read up before Opening Day.12hESPNIllustration by ESPNEverything you missed in the MLB offseason3h2:33World Series odds, win totals, props for every teamPlay fantasy baseball for free!TOP HEADLINESQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersLAMAR WANTS OUT OF BALTIMOREMarcus Spears identifies the two teams that need Lamar Jackson the most7h2:00Would Lamar sit out? Will Ravens draft a QB? Jackson trade request insightsLamar Jackson has asked Baltimore to trade him, but Ravens coach John Harbaugh hopes the QB will be back.3hJamison HensleyBallard, Colts will consider trading for QB JacksonJackson to Indy? Washington? Barnwell ranks the QB's trade fitsSNYDER'S TUMULTUOUS 24-YEAR RUNHow Washington’s NFL franchise sank on and off the field under owner Dan SnyderSnyder purchased one of the NFL's marquee franchises in 1999. Twenty-four years later, and with the team up for sale, he leaves a legacy of on-field futility and off-field scandal.13hJohn KeimESPNIOWA STAR STEPS UP AGAINJ-Will: Caitlin Clark is the biggest brand in college sports right now8h0:47'The better the opponent, the better she plays': Clark draws comparisons to TaurasiCaitlin Clark's performance on Sunday had longtime observers going back decades to find comparisons.16hKevin PeltonWOMEN'S ELITE EIGHT SCOREBOARDMONDAY'S GAMESCheck your bracket!NBA DRAFTHow top prospects fared on the road to the Final FourThe 2023 NCAA tournament is down to four teams, and ESPN's Jonathan Givony recaps the players who saw their NBA draft stock change.11hJonathan GivonyAndy Lyons/Getty ImagesTALKING BASKETBALLWhy AD needs to be more assertive with LeBron on the court9h1:33Why Perk won't blame Kyrie for Mavs' woes8h1:48WHERE EVERY TEAM STANDSNew NFL Power Rankings: Post-free-agency 1-32 poll, plus underrated offseason movesThe free agent frenzy has come and gone. Which teams have improved their 2023 outlook, and which teams have taken a hit?12hNFL Nation reportersIllustration by ESPNTHE BUCK STOPS WITH BELICHICKBruschi: Fair to criticize Bill Belichick for Patriots' struggles10h1:27 Top HeadlinesQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersFavorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InMarch Madness LiveESPNMarch Madness LiveWatch every men's NCAA tournament game live! ICYMI1:42Austin Peay's coach, pitcher and catcher all ejected after retaliation pitchAustin Peay's pitcher, catcher and coach were all ejected after a pitch was thrown at Liberty's Nathan Keeter, who earlier in the game hit a home run and celebrated while running down the third-base line. Men's Tournament ChallengeIllustration by ESPNMen's Tournament ChallengeCheck your bracket(s) in the 2023 Men's Tournament Challenge, which you can follow throughout the Big Dance. Women's Tournament ChallengeIllustration by ESPNWomen's Tournament ChallengeCheck your bracket(s) in the 2023 Women's Tournament Challenge, which you can follow throughout the Big Dance. Best of ESPN+AP Photo/Lynne SladkyFantasy Baseball ESPN+ Cheat Sheet: Sleepers, busts, rookies and closersYou've read their names all preseason long, it'd be a shame to forget them on draft day. The ESPN+ Cheat Sheet is one way to make sure that doesn't happen.Steph Chambers/Getty ImagesPassan's 2023 MLB season preview: Bold predictions and moreOpening Day is just over a week away -- and Jeff Passan has everything you need to know covered from every possible angle.Photo by Bob Kupbens/Icon Sportswire2023 NFL free agency: Best team fits for unsigned playersWhere could Ezekiel Elliott land? Let's match remaining free agents to teams and find fits for two trade candidates.Illustration by ESPN2023 NFL mock draft: Mel Kiper's first-round pick predictionsMel Kiper Jr. makes his predictions for Round 1 of the NFL draft, including projecting a trade in the top five. Trending NowAnne-Marie Sorvin-USA TODAY SBoston Bruins record tracker: Wins, points, milestonesThe B's are on pace for NHL records in wins and points, along with some individual superlatives as well. Follow along here with our updated tracker.Mandatory Credit: William Purnell-USA TODAY Sports2023 NFL full draft order: AFC, NFC team picks for all roundsStarting with the Carolina Panthers at No. 1 overall, here's the entire 2023 NFL draft broken down round by round. How to Watch on ESPN+Gregory Fisher/Icon Sportswire2023 NCAA men's hockey: Results, bracket, how to watchThe matchups in Tampa promise to be thrillers, featuring plenty of star power, high-octane offense and stellar defense.(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+.Hailie Lynch/XFLHow to watch the XFL: 2023 schedule, teams, players, news, moreEvery XFL game will be streamed on ESPN+. Find out when and where else you can watch the eight teams compete. Sign up to play the #1 Fantasy Baseball GameReactivate A LeagueCreate A LeagueJoin a Public LeaguePractice With a Mock DraftSports BettingAP Photo/Mike KropfMarch Madness betting 2023: Bracket odds, lines, tips, moreThe 2023 NCAA tournament brackets have finally been released, and we have everything you need to know to make a bet on all of the March Madness games. Sign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft Now\\n\\nESPN+\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: © ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\", lookup_str='', metadata={'source': 'https://www.espn.com/'}, lookup_index=0),\n", - " Document(page_content='GoogleSearch Images Maps Play YouTube News Gmail Drive More »Web History | Settings | Sign in\\xa0Advanced searchAdvertisingBusiness SolutionsAbout Google© 2023 - Privacy - Terms ', lookup_str='', metadata={'source': 'https://google.com'}, lookup_index=0)]" + "cell_type": "code", + "execution_count": 2, + "id": "0231df35", + "metadata": {}, + "outputs": [], + "source": [ + "loader = WebBaseLoader(\"https://www.espn.com/\")" ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "loader = WebBaseLoader([\"https://www.espn.com/\", \"https://google.com\"])\n", - "loader.requests_per_second = 1\n", - "docs = loader.aload()\n", - "docs" - ] - }, - { - "cell_type": "markdown", - "id": "e337b130", - "metadata": {}, - "source": [ - "## Loading a xml file, or using a different BeautifulSoup parser\n", - "\n", - "You can also look at `SitemapLoader` for an example of how to load a sitemap file, which is an example of using this feature." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "16530c50", - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/plain": [ - "[Document(page_content='\\n\\n10\\nEnergy\\n3\\n2018-01-01\\n2018-01-01\\nfalse\\nUniform test method for the measurement of energy efficiency of commercial packaged boilers.\\n§ 431.86\\nSection § 431.86\\n\\nEnergy\\nDEPARTMENT OF ENERGY\\nENERGY CONSERVATION\\nENERGY EFFICIENCY PROGRAM FOR CERTAIN COMMERCIAL AND INDUSTRIAL EQUIPMENT\\nCommercial Packaged Boilers\\nTest Procedures\\n\\n\\n\\n\\n§\\u2009431.86\\nUniform test method for the measurement of energy efficiency of commercial packaged boilers.\\n(a) Scope. This section provides test procedures, pursuant to the Energy Policy and Conservation Act (EPCA), as amended, which must be followed for measuring the combustion efficiency and/or thermal efficiency of a gas- or oil-fired commercial packaged boiler.\\n(b) Testing and Calculations. Determine the thermal efficiency or combustion efficiency of commercial packaged boilers by conducting the appropriate test procedure(s) indicated in Table 1 of this section.\\n\\nTable 1—Test Requirements for Commercial Packaged Boiler Equipment Classes\\n\\nEquipment category\\nSubcategory\\nCertified rated inputBtu/h\\n\\nStandards efficiency metric(§\\u2009431.87)\\n\\nTest procedure(corresponding to\\nstandards efficiency\\nmetric required\\nby §\\u2009431.87)\\n\\n\\n\\nHot Water\\nGas-fired\\n≥300,000 and ≤2,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nHot Water\\nGas-fired\\n>2,500,000\\nCombustion Efficiency\\nAppendix A, Section 3.\\n\\n\\nHot Water\\nOil-fired\\n≥300,000 and ≤2,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nHot Water\\nOil-fired\\n>2,500,000\\nCombustion Efficiency\\nAppendix A, Section 3.\\n\\n\\nSteam\\nGas-fired (all*)\\n≥300,000 and ≤2,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nSteam\\nGas-fired (all*)\\n>2,500,000 and ≤5,000,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\n\\u2003\\n\\n>5,000,000\\nThermal Efficiency\\nAppendix A, Section 2.OR\\nAppendix A, Section 3 with Section 2.4.3.2.\\n\\n\\n\\nSteam\\nOil-fired\\n≥300,000 and ≤2,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nSteam\\nOil-fired\\n>2,500,000 and ≤5,000,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\n\\u2003\\n\\n>5,000,000\\nThermal Efficiency\\nAppendix A, Section 2.OR\\nAppendix A, Section 3. with Section 2.4.3.2.\\n\\n\\n\\n*\\u2009Equipment classes for commercial packaged boilers as of July 22, 2009 (74 FR 36355) distinguish between gas-fired natural draft and all other gas-fired (except natural draft).\\n\\n(c) Field Tests. The field test provisions of appendix A may be used only to test a unit of commercial packaged boiler with rated input greater than 5,000,000 Btu/h.\\n[81 FR 89305, Dec. 9, 2016]\\n\\n\\nEnergy Efficiency Standards\\n\\n', lookup_str='', metadata={'source': 'https://www.govinfo.gov/content/pkg/CFR-2018-title10-vol3/xml/CFR-2018-title10-vol3-sec431-86.xml'}, lookup_index=0)]" + "cell_type": "markdown", + "id": "c162b300-5f4b-4e37-bab3-17f590fc07cc", + "metadata": {}, + "source": [ + "To bypass SSL verification errors during fetching, you can set the \"verify\" option:\n", + "\n", + "loader.requests_kwargs = {'verify':False}" ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "f06bdc4e", + "metadata": {}, + "outputs": [], + "source": [ + "data = loader.load()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "a390d79f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(page_content=\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANCAAMNCAAWNHLSoccer\u2026MLBNCAAFGolfTennisSports BettingBoxingCFLNCAACricketF1HorseLLWSMMANASCARNBA G LeagueOlympic SportsRacingRN BBRN FBRugbyWNBAWorld Baseball ClassicWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nAre you ready for Opening Day? Here's your guide to MLB's offseason chaosWait, Jacob deGrom is on the Rangers now? Xander Bogaerts and Trea Turner signed where? And what about Carlos Correa? Yeah, you're going to need to read up before Opening Day.12hESPNIllustration by ESPNEverything you missed in the MLB offseason3h2:33World Series odds, win totals, props for every teamPlay fantasy baseball for free!TOP HEADLINESQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersLAMAR WANTS OUT OF BALTIMOREMarcus Spears identifies the two teams that need Lamar Jackson the most8h2:00Would Lamar sit out? Will Ravens draft a QB? Jackson trade request insightsLamar Jackson has asked Baltimore to trade him, but Ravens coach John Harbaugh hopes the QB will be back.3hJamison HensleyBallard, Colts will consider trading for QB JacksonJackson to Indy? Washington? Barnwell ranks the QB's trade fitsSNYDER'S TUMULTUOUS 24-YEAR RUNHow Washington\u2019s NFL franchise sank on and off the field under owner Dan SnyderSnyder purchased one of the NFL's marquee franchises in 1999. Twenty-four years later, and with the team up for sale, he leaves a legacy of on-field futility and off-field scandal.13hJohn KeimESPNIOWA STAR STEPS UP AGAINJ-Will: Caitlin Clark is the biggest brand in college sports right now8h0:47'The better the opponent, the better she plays': Clark draws comparisons to TaurasiCaitlin Clark's performance on Sunday had longtime observers going back decades to find comparisons.16hKevin PeltonWOMEN'S ELITE EIGHT SCOREBOARDMONDAY'S GAMESCheck your bracket!NBA DRAFTHow top prospects fared on the road to the Final FourThe 2023 NCAA tournament is down to four teams, and ESPN's Jonathan Givony recaps the players who saw their NBA draft stock change.11hJonathan GivonyAndy Lyons/Getty ImagesTALKING BASKETBALLWhy AD needs to be more assertive with LeBron on the court10h1:33Why Perk won't blame Kyrie for Mavs' woes8h1:48WHERE EVERY TEAM STANDSNew NFL Power Rankings: Post-free-agency 1-32 poll, plus underrated offseason movesThe free agent frenzy has come and gone. Which teams have improved their 2023 outlook, and which teams have taken a hit?12hNFL Nation reportersIllustration by ESPNTHE BUCK STOPS WITH BELICHICKBruschi: Fair to criticize Bill Belichick for Patriots' struggles10h1:27 Top HeadlinesQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersFavorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InMarch Madness LiveESPNMarch Madness LiveWatch every men's NCAA tournament game live! ICYMI1:42Austin Peay's coach, pitcher and catcher all ejected after retaliation pitchAustin Peay's pitcher, catcher and coach were all ejected after a pitch was thrown at Liberty's Nathan Keeter, who earlier in the game hit a home run and celebrated while running down the third-base line. Men's Tournament ChallengeIllustration by ESPNMen's Tournament ChallengeCheck your bracket(s) in the 2023 Men's Tournament Challenge, which you can follow throughout the Big Dance. Women's Tournament ChallengeIllustration by ESPNWomen's Tournament ChallengeCheck your bracket(s) in the 2023 Women's Tournament Challenge, which you can follow throughout the Big Dance. Best of ESPN+AP Photo/Lynne SladkyFantasy Baseball ESPN+ Cheat Sheet: Sleepers, busts, rookies and closersYou've read their names all preseason long, it'd be a shame to forget them on draft day. The ESPN+ Cheat Sheet is one way to make sure that doesn't happen.Steph Chambers/Getty ImagesPassan's 2023 MLB season preview: Bold predictions and moreOpening Day is just over a week away -- and Jeff Passan has everything you need to know covered from every possible angle.Photo by Bob Kupbens/Icon Sportswire2023 NFL free agency: Best team fits for unsigned playersWhere could Ezekiel Elliott land? Let's match remaining free agents to teams and find fits for two trade candidates.Illustration by ESPN2023 NFL mock draft: Mel Kiper's first-round pick predictionsMel Kiper Jr. makes his predictions for Round 1 of the NFL draft, including projecting a trade in the top five. Trending NowAnne-Marie Sorvin-USA TODAY SBoston Bruins record tracker: Wins, points, milestonesThe B's are on pace for NHL records in wins and points, along with some individual superlatives as well. Follow along here with our updated tracker.Mandatory Credit: William Purnell-USA TODAY Sports2023 NFL full draft order: AFC, NFC team picks for all roundsStarting with the Carolina Panthers at No. 1 overall, here's the entire 2023 NFL draft broken down round by round. How to Watch on ESPN+Gregory Fisher/Icon Sportswire2023 NCAA men's hockey: Results, bracket, how to watchThe matchups in Tampa promise to be thrillers, featuring plenty of star power, high-octane offense and stellar defense.(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+.Hailie Lynch/XFLHow to watch the XFL: 2023 schedule, teams, players, news, moreEvery XFL game will be streamed on ESPN+. Find out when and where else you can watch the eight teams compete. Sign up to play the #1 Fantasy Baseball GameReactivate A LeagueCreate A LeagueJoin a Public LeaguePractice With a Mock DraftSports BettingAP Photo/Mike KropfMarch Madness betting 2023: Bracket odds, lines, tips, moreThe 2023 NCAA tournament brackets have finally been released, and we have everything you need to know to make a bet on all of the March Madness games. Sign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft Now\\n\\nESPN+\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: \u00a9 ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\", lookup_str='', metadata={'source': 'https://www.espn.com/'}, lookup_index=0)]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "878179f7", + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "# Use this piece of code for testing new custom BeautifulSoup parsers\n", + "\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "\n", + "html_doc = requests.get(\"{INSERT_NEW_URL_HERE}\")\n", + "soup = BeautifulSoup(html_doc.text, 'html.parser')\n", + "\n", + "# Beautiful soup logic to be exported to langchain.document_loaders.webpage.py\n", + "# Example: transcript = soup.select_one(\"td[class='scrtext']\").text\n", + "# BS4 documentation can be found here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/\n", + "\n", + "\"\"\";" + ] + }, + { + "cell_type": "markdown", + "id": "150988e6", + "metadata": {}, + "source": [ + "## Loading multiple webpages\n", + "\n", + "You can also load multiple webpages at once by passing in a list of urls to the loader. This will return a list of documents in the same order as the urls passed in." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "e25bbd3b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(page_content=\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANCAAMNCAAWNHLSoccer\u2026MLBNCAAFGolfTennisSports BettingBoxingCFLNCAACricketF1HorseLLWSMMANASCARNBA G LeagueOlympic SportsRacingRN BBRN FBRugbyWNBAWorld Baseball ClassicWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nAre you ready for Opening Day? Here's your guide to MLB's offseason chaosWait, Jacob deGrom is on the Rangers now? Xander Bogaerts and Trea Turner signed where? And what about Carlos Correa? Yeah, you're going to need to read up before Opening Day.12hESPNIllustration by ESPNEverything you missed in the MLB offseason3h2:33World Series odds, win totals, props for every teamPlay fantasy baseball for free!TOP HEADLINESQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersLAMAR WANTS OUT OF BALTIMOREMarcus Spears identifies the two teams that need Lamar Jackson the most7h2:00Would Lamar sit out? Will Ravens draft a QB? Jackson trade request insightsLamar Jackson has asked Baltimore to trade him, but Ravens coach John Harbaugh hopes the QB will be back.3hJamison HensleyBallard, Colts will consider trading for QB JacksonJackson to Indy? Washington? Barnwell ranks the QB's trade fitsSNYDER'S TUMULTUOUS 24-YEAR RUNHow Washington\u2019s NFL franchise sank on and off the field under owner Dan SnyderSnyder purchased one of the NFL's marquee franchises in 1999. Twenty-four years later, and with the team up for sale, he leaves a legacy of on-field futility and off-field scandal.13hJohn KeimESPNIOWA STAR STEPS UP AGAINJ-Will: Caitlin Clark is the biggest brand in college sports right now8h0:47'The better the opponent, the better she plays': Clark draws comparisons to TaurasiCaitlin Clark's performance on Sunday had longtime observers going back decades to find comparisons.16hKevin PeltonWOMEN'S ELITE EIGHT SCOREBOARDMONDAY'S GAMESCheck your bracket!NBA DRAFTHow top prospects fared on the road to the Final FourThe 2023 NCAA tournament is down to four teams, and ESPN's Jonathan Givony recaps the players who saw their NBA draft stock change.11hJonathan GivonyAndy Lyons/Getty ImagesTALKING BASKETBALLWhy AD needs to be more assertive with LeBron on the court9h1:33Why Perk won't blame Kyrie for Mavs' woes8h1:48WHERE EVERY TEAM STANDSNew NFL Power Rankings: Post-free-agency 1-32 poll, plus underrated offseason movesThe free agent frenzy has come and gone. Which teams have improved their 2023 outlook, and which teams have taken a hit?12hNFL Nation reportersIllustration by ESPNTHE BUCK STOPS WITH BELICHICKBruschi: Fair to criticize Bill Belichick for Patriots' struggles10h1:27 Top HeadlinesQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersFavorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InMarch Madness LiveESPNMarch Madness LiveWatch every men's NCAA tournament game live! ICYMI1:42Austin Peay's coach, pitcher and catcher all ejected after retaliation pitchAustin Peay's pitcher, catcher and coach were all ejected after a pitch was thrown at Liberty's Nathan Keeter, who earlier in the game hit a home run and celebrated while running down the third-base line. Men's Tournament ChallengeIllustration by ESPNMen's Tournament ChallengeCheck your bracket(s) in the 2023 Men's Tournament Challenge, which you can follow throughout the Big Dance. Women's Tournament ChallengeIllustration by ESPNWomen's Tournament ChallengeCheck your bracket(s) in the 2023 Women's Tournament Challenge, which you can follow throughout the Big Dance. Best of ESPN+AP Photo/Lynne SladkyFantasy Baseball ESPN+ Cheat Sheet: Sleepers, busts, rookies and closersYou've read their names all preseason long, it'd be a shame to forget them on draft day. The ESPN+ Cheat Sheet is one way to make sure that doesn't happen.Steph Chambers/Getty ImagesPassan's 2023 MLB season preview: Bold predictions and moreOpening Day is just over a week away -- and Jeff Passan has everything you need to know covered from every possible angle.Photo by Bob Kupbens/Icon Sportswire2023 NFL free agency: Best team fits for unsigned playersWhere could Ezekiel Elliott land? Let's match remaining free agents to teams and find fits for two trade candidates.Illustration by ESPN2023 NFL mock draft: Mel Kiper's first-round pick predictionsMel Kiper Jr. makes his predictions for Round 1 of the NFL draft, including projecting a trade in the top five. Trending NowAnne-Marie Sorvin-USA TODAY SBoston Bruins record tracker: Wins, points, milestonesThe B's are on pace for NHL records in wins and points, along with some individual superlatives as well. Follow along here with our updated tracker.Mandatory Credit: William Purnell-USA TODAY Sports2023 NFL full draft order: AFC, NFC team picks for all roundsStarting with the Carolina Panthers at No. 1 overall, here's the entire 2023 NFL draft broken down round by round. How to Watch on ESPN+Gregory Fisher/Icon Sportswire2023 NCAA men's hockey: Results, bracket, how to watchThe matchups in Tampa promise to be thrillers, featuring plenty of star power, high-octane offense and stellar defense.(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+.Hailie Lynch/XFLHow to watch the XFL: 2023 schedule, teams, players, news, moreEvery XFL game will be streamed on ESPN+. Find out when and where else you can watch the eight teams compete. Sign up to play the #1 Fantasy Baseball GameReactivate A LeagueCreate A LeagueJoin a Public LeaguePractice With a Mock DraftSports BettingAP Photo/Mike KropfMarch Madness betting 2023: Bracket odds, lines, tips, moreThe 2023 NCAA tournament brackets have finally been released, and we have everything you need to know to make a bet on all of the March Madness games. Sign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft Now\\n\\nESPN+\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: \u00a9 ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\", lookup_str='', metadata={'source': 'https://www.espn.com/'}, lookup_index=0),\n", + " Document(page_content='GoogleSearch Images Maps Play YouTube News Gmail Drive More \u00bbWeb History | Settings | Sign in\\xa0Advanced searchAdvertisingBusiness SolutionsAbout Google\u00a9 2023 - Privacy - Terms ', lookup_str='', metadata={'source': 'https://google.com'}, lookup_index=0)]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "loader = WebBaseLoader([\"https://www.espn.com/\", \"https://google.com\"])\n", + "docs = loader.load()\n", + "docs" + ] + }, + { + "cell_type": "markdown", + "id": "641be294", + "metadata": {}, + "source": [ + "### Load multiple urls concurrently\n", + "\n", + "You can speed up the scraping process by scraping and parsing multiple urls concurrently.\n", + "\n", + "There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the server you are scraping and don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but may cause the server to block you. Be careful!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "9f9cf30f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: nest_asyncio in /Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages (1.5.6)\n" + ] + } + ], + "source": [ + "!pip install nest_asyncio\n", + "\n", + "# fixes a bug with asyncio and jupyter\n", + "import nest_asyncio\n", + "\n", + "nest_asyncio.apply()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "49586eac", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(page_content=\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANCAAMNCAAWNHLSoccer\u2026MLBNCAAFGolfTennisSports BettingBoxingCFLNCAACricketF1HorseLLWSMMANASCARNBA G LeagueOlympic SportsRacingRN BBRN FBRugbyWNBAWorld Baseball ClassicWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nAre you ready for Opening Day? Here's your guide to MLB's offseason chaosWait, Jacob deGrom is on the Rangers now? Xander Bogaerts and Trea Turner signed where? And what about Carlos Correa? Yeah, you're going to need to read up before Opening Day.12hESPNIllustration by ESPNEverything you missed in the MLB offseason3h2:33World Series odds, win totals, props for every teamPlay fantasy baseball for free!TOP HEADLINESQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersLAMAR WANTS OUT OF BALTIMOREMarcus Spears identifies the two teams that need Lamar Jackson the most7h2:00Would Lamar sit out? Will Ravens draft a QB? Jackson trade request insightsLamar Jackson has asked Baltimore to trade him, but Ravens coach John Harbaugh hopes the QB will be back.3hJamison HensleyBallard, Colts will consider trading for QB JacksonJackson to Indy? Washington? Barnwell ranks the QB's trade fitsSNYDER'S TUMULTUOUS 24-YEAR RUNHow Washington\u2019s NFL franchise sank on and off the field under owner Dan SnyderSnyder purchased one of the NFL's marquee franchises in 1999. Twenty-four years later, and with the team up for sale, he leaves a legacy of on-field futility and off-field scandal.13hJohn KeimESPNIOWA STAR STEPS UP AGAINJ-Will: Caitlin Clark is the biggest brand in college sports right now8h0:47'The better the opponent, the better she plays': Clark draws comparisons to TaurasiCaitlin Clark's performance on Sunday had longtime observers going back decades to find comparisons.16hKevin PeltonWOMEN'S ELITE EIGHT SCOREBOARDMONDAY'S GAMESCheck your bracket!NBA DRAFTHow top prospects fared on the road to the Final FourThe 2023 NCAA tournament is down to four teams, and ESPN's Jonathan Givony recaps the players who saw their NBA draft stock change.11hJonathan GivonyAndy Lyons/Getty ImagesTALKING BASKETBALLWhy AD needs to be more assertive with LeBron on the court9h1:33Why Perk won't blame Kyrie for Mavs' woes8h1:48WHERE EVERY TEAM STANDSNew NFL Power Rankings: Post-free-agency 1-32 poll, plus underrated offseason movesThe free agent frenzy has come and gone. Which teams have improved their 2023 outlook, and which teams have taken a hit?12hNFL Nation reportersIllustration by ESPNTHE BUCK STOPS WITH BELICHICKBruschi: Fair to criticize Bill Belichick for Patriots' struggles10h1:27 Top HeadlinesQB Jackson has requested trade from RavensSources: Texas hiring Terry as full-time coachJets GM: No rush on Rodgers; Lamar not optionLove to leave North Carolina, enter transfer portalBelichick to angsty Pats fans: See last 25 yearsEmbiid out, Harden due back vs. Jokic, NuggetsLynch: Purdy 'earned the right' to start for NinersMan Utd, Wrexham plan July friendly in San DiegoOn paper, Padres overtake DodgersFavorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InMarch Madness LiveESPNMarch Madness LiveWatch every men's NCAA tournament game live! ICYMI1:42Austin Peay's coach, pitcher and catcher all ejected after retaliation pitchAustin Peay's pitcher, catcher and coach were all ejected after a pitch was thrown at Liberty's Nathan Keeter, who earlier in the game hit a home run and celebrated while running down the third-base line. Men's Tournament ChallengeIllustration by ESPNMen's Tournament ChallengeCheck your bracket(s) in the 2023 Men's Tournament Challenge, which you can follow throughout the Big Dance. Women's Tournament ChallengeIllustration by ESPNWomen's Tournament ChallengeCheck your bracket(s) in the 2023 Women's Tournament Challenge, which you can follow throughout the Big Dance. Best of ESPN+AP Photo/Lynne SladkyFantasy Baseball ESPN+ Cheat Sheet: Sleepers, busts, rookies and closersYou've read their names all preseason long, it'd be a shame to forget them on draft day. The ESPN+ Cheat Sheet is one way to make sure that doesn't happen.Steph Chambers/Getty ImagesPassan's 2023 MLB season preview: Bold predictions and moreOpening Day is just over a week away -- and Jeff Passan has everything you need to know covered from every possible angle.Photo by Bob Kupbens/Icon Sportswire2023 NFL free agency: Best team fits for unsigned playersWhere could Ezekiel Elliott land? Let's match remaining free agents to teams and find fits for two trade candidates.Illustration by ESPN2023 NFL mock draft: Mel Kiper's first-round pick predictionsMel Kiper Jr. makes his predictions for Round 1 of the NFL draft, including projecting a trade in the top five. Trending NowAnne-Marie Sorvin-USA TODAY SBoston Bruins record tracker: Wins, points, milestonesThe B's are on pace for NHL records in wins and points, along with some individual superlatives as well. Follow along here with our updated tracker.Mandatory Credit: William Purnell-USA TODAY Sports2023 NFL full draft order: AFC, NFC team picks for all roundsStarting with the Carolina Panthers at No. 1 overall, here's the entire 2023 NFL draft broken down round by round. How to Watch on ESPN+Gregory Fisher/Icon Sportswire2023 NCAA men's hockey: Results, bracket, how to watchThe matchups in Tampa promise to be thrillers, featuring plenty of star power, high-octane offense and stellar defense.(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+.Hailie Lynch/XFLHow to watch the XFL: 2023 schedule, teams, players, news, moreEvery XFL game will be streamed on ESPN+. Find out when and where else you can watch the eight teams compete. Sign up to play the #1 Fantasy Baseball GameReactivate A LeagueCreate A LeagueJoin a Public LeaguePractice With a Mock DraftSports BettingAP Photo/Mike KropfMarch Madness betting 2023: Bracket odds, lines, tips, moreThe 2023 NCAA tournament brackets have finally been released, and we have everything you need to know to make a bet on all of the March Madness games. Sign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft Now\\n\\nESPN+\\n\\n\\n\\n\\nNHL: Select Games\\n\\n\\n\\n\\n\\n\\n\\nXFL\\n\\n\\n\\n\\n\\n\\n\\nMLB: Select Games\\n\\n\\n\\n\\n\\n\\n\\nNCAA Baseball\\n\\n\\n\\n\\n\\n\\n\\nNCAA Softball\\n\\n\\n\\n\\n\\n\\n\\nCricket: Select Matches\\n\\n\\n\\n\\n\\n\\n\\nMel Kiper's NFL Mock Draft 3.0\\n\\n\\nQuick Links\\n\\n\\n\\n\\nMen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nWomen's Tournament Challenge\\n\\n\\n\\n\\n\\n\\n\\nNFL Draft Order\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch NHL Games\\n\\n\\n\\n\\n\\n\\n\\nFantasy Baseball: Sign Up\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: \u00a9 ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\", lookup_str='', metadata={'source': 'https://www.espn.com/'}, lookup_index=0),\n", + " Document(page_content='GoogleSearch Images Maps Play YouTube News Gmail Drive More \u00bbWeb History | Settings | Sign in\\xa0Advanced searchAdvertisingBusiness SolutionsAbout Google\u00a9 2023 - Privacy - Terms ', lookup_str='', metadata={'source': 'https://google.com'}, lookup_index=0)]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "loader = WebBaseLoader([\"https://www.espn.com/\", \"https://google.com\"])\n", + "loader.requests_per_second = 1\n", + "docs = loader.aload()\n", + "docs" + ] + }, + { + "cell_type": "markdown", + "id": "e337b130", + "metadata": {}, + "source": [ + "## Loading a xml file, or using a different BeautifulSoup parser\n", + "\n", + "You can also look at `SitemapLoader` for an example of how to load a sitemap file, which is an example of using this feature." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "16530c50", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(page_content='\\n\\n10\\nEnergy\\n3\\n2018-01-01\\n2018-01-01\\nfalse\\nUniform test method for the measurement of energy efficiency of commercial packaged boilers.\\n\u00c2\u00a7 431.86\\nSection \u00c2\u00a7 431.86\\n\\nEnergy\\nDEPARTMENT OF ENERGY\\nENERGY CONSERVATION\\nENERGY EFFICIENCY PROGRAM FOR CERTAIN COMMERCIAL AND INDUSTRIAL EQUIPMENT\\nCommercial Packaged Boilers\\nTest Procedures\\n\\n\\n\\n\\n\u00a7\\u2009431.86\\nUniform test method for the measurement of energy efficiency of commercial packaged boilers.\\n(a) Scope. This section provides test procedures, pursuant to the Energy Policy and Conservation Act (EPCA), as amended, which must be followed for measuring the combustion efficiency and/or thermal efficiency of a gas- or oil-fired commercial packaged boiler.\\n(b) Testing and Calculations. Determine the thermal efficiency or combustion efficiency of commercial packaged boilers by conducting the appropriate test procedure(s) indicated in Table 1 of this section.\\n\\nTable 1\u2014Test Requirements for Commercial Packaged Boiler Equipment Classes\\n\\nEquipment category\\nSubcategory\\nCertified rated inputBtu/h\\n\\nStandards efficiency metric(\u00a7\\u2009431.87)\\n\\nTest procedure(corresponding to\\nstandards efficiency\\nmetric required\\nby \u00a7\\u2009431.87)\\n\\n\\n\\nHot Water\\nGas-fired\\n\u2265300,000 and \u22642,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nHot Water\\nGas-fired\\n>2,500,000\\nCombustion Efficiency\\nAppendix A, Section 3.\\n\\n\\nHot Water\\nOil-fired\\n\u2265300,000 and \u22642,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nHot Water\\nOil-fired\\n>2,500,000\\nCombustion Efficiency\\nAppendix A, Section 3.\\n\\n\\nSteam\\nGas-fired (all*)\\n\u2265300,000 and \u22642,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nSteam\\nGas-fired (all*)\\n>2,500,000 and \u22645,000,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\n\\u2003\\n\\n>5,000,000\\nThermal Efficiency\\nAppendix A, Section 2.OR\\nAppendix A, Section 3 with Section 2.4.3.2.\\n\\n\\n\\nSteam\\nOil-fired\\n\u2265300,000 and \u22642,500,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\nSteam\\nOil-fired\\n>2,500,000 and \u22645,000,000\\nThermal Efficiency\\nAppendix A, Section 2.\\n\\n\\n\\u2003\\n\\n>5,000,000\\nThermal Efficiency\\nAppendix A, Section 2.OR\\nAppendix A, Section 3. with Section 2.4.3.2.\\n\\n\\n\\n*\\u2009Equipment classes for commercial packaged boilers as of July 22, 2009 (74 FR 36355) distinguish between gas-fired natural draft and all other gas-fired (except natural draft).\\n\\n(c) Field Tests. The field test provisions of appendix A may be used only to test a unit of commercial packaged boiler with rated input greater than 5,000,000 Btu/h.\\n[81 FR 89305, Dec. 9, 2016]\\n\\n\\nEnergy Efficiency Standards\\n\\n', lookup_str='', metadata={'source': 'https://www.govinfo.gov/content/pkg/CFR-2018-title10-vol3/xml/CFR-2018-title10-vol3-sec431-86.xml'}, lookup_index=0)]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "loader = WebBaseLoader(\n", + " \"https://www.govinfo.gov/content/pkg/CFR-2018-title10-vol3/xml/CFR-2018-title10-vol3-sec431-86.xml\"\n", + ")\n", + "loader.default_parser = \"xml\"\n", + "docs = loader.load()\n", + "docs" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Using proxies\n", + "\n", + "Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and `requests` underneath) to use them." + ], + "metadata": { + "collapsed": false + }, + "id": "672264ad" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "loader = WebBaseLoader(\n", + " \"https://www.walmart.com/search?q=parrots\", proxies={\n", + " \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n", + " \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n", + " }\n", + ")\n", + "docs = loader.load()\n" + ], + "metadata": { + "collapsed": false + }, + "id": "9caf0310" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" } - ], - "source": [ - "loader = WebBaseLoader(\n", - " \"https://www.govinfo.gov/content/pkg/CFR-2018-title10-vol3/xml/CFR-2018-title10-vol3-sec431-86.xml\"\n", - ")\n", - "loader.default_parser = \"xml\"\n", - "docs = loader.load()\n", - "docs" - ] }, - { - "cell_type": "markdown", - "source": [ - "## Using proxies\n", - "\n", - "Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and `requests` underneath) to use them." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "loader = WebBaseLoader(\n", - " \"https://www.walmart.com/search?q=parrots\", proxies={\n", - " \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n", - " \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n", - " }\n", - ")\n", - "docs = loader.load()\n" - ], - "metadata": { - "collapsed": false - } - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/docs/extras/modules/data_connection/vectorstores/integrations/qdrant.ipynb b/docs/extras/modules/data_connection/vectorstores/integrations/qdrant.ipynb index 7d3e64ffc..1b981466e 100644 --- a/docs/extras/modules/data_connection/vectorstores/integrations/qdrant.ipynb +++ b/docs/extras/modules/data_connection/vectorstores/integrations/qdrant.ipynb @@ -1,749 +1,752 @@ { - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "id": "683953b3", - "metadata": {}, - "source": [ - "# Qdrant\n", - "\n", - ">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant ) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. `Qdrant` is tailored to extended filtering support. It makes it useful for all sorts of neural network or semantic-based matching, faceted search, and other applications.\n", - "\n", - "\n", - "This notebook shows how to use functionality related to the `Qdrant` vector database. \n", - "\n", - "There are various modes of how to run `Qdrant`, and depending on the chosen one, there will be some subtle differences. The options include:\n", - "- Local mode, no server required\n", - "- On-premise server deployment\n", - "- Qdrant Cloud\n", - "\n", - "See the [installation instructions](https://qdrant.tech/documentation/install/)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e03e8460-8f32-4d1f-bb93-4f7636a476fa", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "!pip install qdrant-client" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5", - "metadata": {}, - "source": [ - "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "082e7e8b-ac52-430c-98d6-8f0924457642", - "metadata": { - "tags": [] - }, - "outputs": [ + "cells": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "OpenAI API Key: ········\n" - ] - } - ], - "source": [ - "import os\n", - "import getpass\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "aac9563e", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:22.282884Z", - "start_time": "2023-04-04T10:51:21.408077Z" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "from langchain.embeddings.openai import OpenAIEmbeddings\n", - "from langchain.text_splitter import CharacterTextSplitter\n", - "from langchain.vectorstores import Qdrant\n", - "from langchain.document_loaders import TextLoader" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "a3c3999a", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:22.520144Z", - "start_time": "2023-04-04T10:51:22.285826Z" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "loader = TextLoader(\"../../../state_of_the_union.txt\")\n", - "documents = loader.load()\n", - "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", - "docs = text_splitter.split_documents(documents)\n", - "\n", - "embeddings = OpenAIEmbeddings()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "eeead681", - "metadata": {}, - "source": [ - "## Connecting to Qdrant from LangChain\n", - "\n", - "### Local mode\n", - "\n", - "Python client allows you to run the same code in local mode without running the Qdrant server. That's great for testing things out and debugging or if you plan to store just a small amount of vectors. The embeddings might be fully kepy in memory or persisted on disk.\n", - "\n", - "#### In-memory\n", - "\n", - "For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it gets lost when the client is destroyed - usually at the end of your script/notebook." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "8429667e", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:22.525091Z", - "start_time": "2023-04-04T10:51:22.522015Z" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "qdrant = Qdrant.from_documents(\n", - " docs,\n", - " embeddings,\n", - " location=\":memory:\", # Local mode with in-memory storage only\n", - " collection_name=\"my_documents\",\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "59f0b954", - "metadata": {}, - "source": [ - "#### On-disk storage\n", - "\n", - "Local mode, without using the Qdrant server, may also store your vectors on disk so they're persisted between runs." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "24b370e2", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:24.827567Z", - "start_time": "2023-04-04T10:51:22.529080Z" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "qdrant = Qdrant.from_documents(\n", - " docs,\n", - " embeddings,\n", - " path=\"/tmp/local_qdrant\",\n", - " collection_name=\"my_documents\",\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "749658ce", - "metadata": {}, - "source": [ - "### On-premise server deployment\n", - "\n", - "No matter if you choose to launch Qdrant locally with [a Docker container](https://qdrant.tech/documentation/install/), or select a Kubernetes deployment with [the official Helm chart](https://github.com/qdrant/qdrant-helm), the way you're going to connect to such an instance will be identical. You'll need to provide a URL pointing to the service." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "91e7f5ce", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:24.832708Z", - "start_time": "2023-04-04T10:51:24.829905Z" - } - }, - "outputs": [], - "source": [ - "url = \"<---qdrant url here --->\"\n", - "qdrant = Qdrant.from_documents(\n", - " docs,\n", - " embeddings,\n", - " url,\n", - " prefer_grpc=True,\n", - " collection_name=\"my_documents\",\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "c9e21ce9", - "metadata": {}, - "source": [ - "### Qdrant Cloud\n", - "\n", - "If you prefer not to keep yourself busy with managing the infrastructure, you can choose to set up a fully-managed Qdrant cluster on [Qdrant Cloud](https://cloud.qdrant.io/). There is a free forever 1GB cluster included for trying out. The main difference with using a managed version of Qdrant is that you'll need to provide an API key to secure your deployment from being accessed publicly." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "dcf88bdf", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:24.837599Z", - "start_time": "2023-04-04T10:51:24.834690Z" - } - }, - "outputs": [], - "source": [ - "url = \"<---qdrant cloud cluster url here --->\"\n", - "api_key = \"<---api key here--->\"\n", - "qdrant = Qdrant.from_documents(\n", - " docs,\n", - " embeddings,\n", - " url,\n", - " prefer_grpc=True,\n", - " api_key=api_key,\n", - " collection_name=\"my_documents\",\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "93540013", - "metadata": {}, - "source": [ - "## Reusing the same collection\n", - "\n", - "Both `Qdrant.from_texts` and `Qdrant.from_documents` methods are great to start using Qdrant with LangChain, but **they are going to destroy the collection and create it from scratch**! If you want to reuse the existing collection, you can always create an instance of `Qdrant` on your own and pass the `QdrantClient` instance with the connection details." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "b7b432d7", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:24.843090Z", - "start_time": "2023-04-04T10:51:24.840041Z" - } - }, - "outputs": [], - "source": [ - "del qdrant" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "30a87570", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:24.854117Z", - "start_time": "2023-04-04T10:51:24.845385Z" - } - }, - "outputs": [], - "source": [ - "import qdrant_client\n", - "\n", - "client = qdrant_client.QdrantClient(path=\"/tmp/local_qdrant\", prefer_grpc=True)\n", - "qdrant = Qdrant(client=client, collection_name=\"my_documents\", embeddings=embeddings)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "1f9215c8", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T09:27:29.920258Z", - "start_time": "2023-04-04T09:27:29.913714Z" - } - }, - "source": [ - "## Similarity search\n", - "\n", - "The simplest scenario for using Qdrant vector store is to perform a similarity search. Under the hood, our query will be encoded with the `embedding_function` and used to find similar documents in Qdrant collection." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "a8c513ab", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:25.204469Z", - "start_time": "2023-04-04T10:51:24.855618Z" - }, - "tags": [] - }, - "outputs": [], - "source": [ - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "found_docs = qdrant.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "fc516993", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:25.220984Z", - "start_time": "2023-04-04T10:51:25.213943Z" - }, - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", - "\n", - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" - ] - } - ], - "source": [ - "print(found_docs[0].page_content)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "1bda9bf5", - "metadata": {}, - "source": [ - "## Similarity search with score\n", - "\n", - "Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result. \n", - "The returned distance score is cosine distance. Therefore, a lower score is better." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "8804a21d", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:25.631585Z", - "start_time": "2023-04-04T10:51:25.227384Z" - } - }, - "outputs": [], - "source": [ - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "found_docs = qdrant.similarity_search_with_score(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "756a6887", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:25.642282Z", - "start_time": "2023-04-04T10:51:25.635947Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", - "\n", - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n", - "\n", - "Score: 0.8153784913324512\n" - ] - } - ], - "source": [ - "document, score = found_docs[0]\n", - "print(document.page_content)\n", - "print(f\"\\nScore: {score}\")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "525e3582", - "metadata": {}, - "source": [ - "### Metadata filtering\n", - "\n", - "Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. It is also possible to use the filters in Langchain, by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "1c2c58dc", - "metadata": {}, - "source": [ - "```python\n", - "from qdrant_client.http import models as rest\n", - "\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))\n", - "```" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "c58c30bf", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:39:53.032744Z", - "start_time": "2023-04-04T10:39:53.028673Z" - } - }, - "source": [ - "## Maximum marginal relevance search (MMR)\n", - "\n", - "If you'd like to look up for some similar documents, but you'd also like to receive diverse results, MMR is method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "76810fb6", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:26.010947Z", - "start_time": "2023-04-04T10:51:25.647687Z" - } - }, - "outputs": [], - "source": [ - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "found_docs = qdrant.max_marginal_relevance_search(query, k=2, fetch_k=10)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "80c6db11", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:26.016979Z", - "start_time": "2023-04-04T10:51:26.013329Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", - "\n", - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n", - "\n", - "2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n", - "\n", - "I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n", - "\n", - "They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n", - "\n", - "Officer Mora was 27 years old. \n", - "\n", - "Officer Rivera was 22. \n", - "\n", - "Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n", - "\n", - "I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n", - "\n", - "I’ve worked on these issues a long time. \n", - "\n", - "I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n", - "\n" - ] - } - ], - "source": [ - "for i, doc in enumerate(found_docs):\n", - " print(f\"{i + 1}.\", doc.page_content, \"\\n\")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "691a82d6", - "metadata": {}, - "source": [ - "## Qdrant as a Retriever\n", - "\n", - "Qdrant, as all the other vector stores, is a LangChain Retriever, by using cosine similarity. " - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "9427195f", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:26.031451Z", - "start_time": "2023-04-04T10:51:26.018763Z" - } - }, - "outputs": [ - { - "data": { - "text/plain": [ - "VectorStoreRetriever(vectorstore=, search_type='similarity', search_kwargs={})" + "attachments": {}, + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# Qdrant\n", + "\n", + ">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant ) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. `Qdrant` is tailored to extended filtering support. It makes it useful for all sorts of neural network or semantic-based matching, faceted search, and other applications.\n", + "\n", + "\n", + "This notebook shows how to use functionality related to the `Qdrant` vector database. \n", + "\n", + "There are various modes of how to run `Qdrant`, and depending on the chosen one, there will be some subtle differences. The options include:\n", + "- Local mode, no server required\n", + "- On-premise server deployment\n", + "- Qdrant Cloud\n", + "\n", + "See the [installation instructions](https://qdrant.tech/documentation/install/)." ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "retriever = qdrant.as_retriever()\n", - "retriever" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "0c851b4f", - "metadata": {}, - "source": [ - "It might be also specified to use MMR as a search strategy, instead of similarity." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "64348f1b", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:26.043909Z", - "start_time": "2023-04-04T10:51:26.034284Z" - } - }, - "outputs": [ + }, { - "data": { - "text/plain": [ - "VectorStoreRetriever(vectorstore=, search_type='mmr', search_kwargs={})" + "cell_type": "code", + "execution_count": null, + "id": "e03e8460-8f32-4d1f-bb93-4f7636a476fa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!pip install qdrant-client" ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "retriever = qdrant.as_retriever(search_type=\"mmr\")\n", - "retriever" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "f3c70c31", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T10:51:26.495652Z", - "start_time": "2023-04-04T10:51:26.046407Z" - } - }, - "outputs": [ + }, { - "data": { - "text/plain": [ - "Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})" + "attachments": {}, + "cell_type": "markdown", + "id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5", + "metadata": {}, + "source": [ + "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key." ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "retriever.get_relevant_documents(query)[0]" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "0358ecde", - "metadata": {}, - "source": [ - "## Customizing Qdrant\n", - "\n", - "There are some options to use an existing Qdrant collection within your Langchain application. In such cases you may need to define how to map Qdrant point into the Langchain `Document`.\n", - "\n", - "### Named vectors\n", - "\n", - "Qdrant supports [multiple vectors per point](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) by named vectors. Langchain requires just a single embedding per document and, by default, uses a single vector. However, if you work with a collection created externally or want to have the named vector used, you can configure it by providing its name.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "Qdrant.from_documents(\n", - " docs,\n", - " embeddings,\n", - " location=\":memory:\",\n", - " collection_name=\"my_documents_2\",\n", - " vector_name=\"custom_vector\",\n", - ")" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "As a Langchain user, you won't see any difference whether you use named vectors or not. Qdrant integration will handle the conversion under the hood." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "### Metadata\n", - "\n", - "Qdrant stores your vector embeddings along with the optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from the documents, we keep the context data, so you can extract the original texts as well.\n", - "\n", - "By default, your document is going to be stored in the following payload structure:\n", - "\n", - "```json\n", - "{\n", - " \"page_content\": \"Lorem ipsum dolor sit amet\",\n", - " \"metadata\": {\n", - " \"foo\": \"bar\"\n", - " }\n", - "}\n", - "```\n", - "\n", - "You can, however, decide to use different keys for the page content and metadata. That's useful if you already have a collection that you'd like to reuse." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "e4d6baf9", - "metadata": { - "ExecuteTime": { - "end_time": "2023-04-04T11:08:31.739141Z", - "start_time": "2023-04-04T11:08:30.229748Z" - } - }, - "outputs": [ + }, { - "data": { - "text/plain": [ - "" + "cell_type": "code", + "execution_count": 2, + "id": "082e7e8b-ac52-430c-98d6-8f0924457642", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OpenAI API Key: \u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\n" + ] + } + ], + "source": [ + "import os\n", + "import getpass\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "aac9563e", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:22.282884Z", + "start_time": "2023-04-04T10:51:21.408077Z" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Qdrant\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "a3c3999a", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:22.520144Z", + "start_time": "2023-04-04T10:51:22.285826Z" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "loader = TextLoader(\"../../../state_of_the_union.txt\")\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "eeead681", + "metadata": {}, + "source": [ + "## Connecting to Qdrant from LangChain\n", + "\n", + "### Local mode\n", + "\n", + "Python client allows you to run the same code in local mode without running the Qdrant server. That's great for testing things out and debugging or if you plan to store just a small amount of vectors. The embeddings might be fully kepy in memory or persisted on disk.\n", + "\n", + "#### In-memory\n", + "\n", + "For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it gets lost when the client is destroyed - usually at the end of your script/notebook." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8429667e", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:22.525091Z", + "start_time": "2023-04-04T10:51:22.522015Z" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "qdrant = Qdrant.from_documents(\n", + " docs,\n", + " embeddings,\n", + " location=\":memory:\", # Local mode with in-memory storage only\n", + " collection_name=\"my_documents\",\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "59f0b954", + "metadata": {}, + "source": [ + "#### On-disk storage\n", + "\n", + "Local mode, without using the Qdrant server, may also store your vectors on disk so they're persisted between runs." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "24b370e2", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:24.827567Z", + "start_time": "2023-04-04T10:51:22.529080Z" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "qdrant = Qdrant.from_documents(\n", + " docs,\n", + " embeddings,\n", + " path=\"/tmp/local_qdrant\",\n", + " collection_name=\"my_documents\",\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "749658ce", + "metadata": {}, + "source": [ + "### On-premise server deployment\n", + "\n", + "No matter if you choose to launch Qdrant locally with [a Docker container](https://qdrant.tech/documentation/install/), or select a Kubernetes deployment with [the official Helm chart](https://github.com/qdrant/qdrant-helm), the way you're going to connect to such an instance will be identical. You'll need to provide a URL pointing to the service." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "91e7f5ce", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:24.832708Z", + "start_time": "2023-04-04T10:51:24.829905Z" + } + }, + "outputs": [], + "source": [ + "url = \"<---qdrant url here --->\"\n", + "qdrant = Qdrant.from_documents(\n", + " docs,\n", + " embeddings,\n", + " url,\n", + " prefer_grpc=True,\n", + " collection_name=\"my_documents\",\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "c9e21ce9", + "metadata": {}, + "source": [ + "### Qdrant Cloud\n", + "\n", + "If you prefer not to keep yourself busy with managing the infrastructure, you can choose to set up a fully-managed Qdrant cluster on [Qdrant Cloud](https://cloud.qdrant.io/). There is a free forever 1GB cluster included for trying out. The main difference with using a managed version of Qdrant is that you'll need to provide an API key to secure your deployment from being accessed publicly." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "dcf88bdf", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:24.837599Z", + "start_time": "2023-04-04T10:51:24.834690Z" + } + }, + "outputs": [], + "source": [ + "url = \"<---qdrant cloud cluster url here --->\"\n", + "api_key = \"<---api key here--->\"\n", + "qdrant = Qdrant.from_documents(\n", + " docs,\n", + " embeddings,\n", + " url,\n", + " prefer_grpc=True,\n", + " api_key=api_key,\n", + " collection_name=\"my_documents\",\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "93540013", + "metadata": {}, + "source": [ + "## Reusing the same collection\n", + "\n", + "Both `Qdrant.from_texts` and `Qdrant.from_documents` methods are great to start using Qdrant with LangChain, but **they are going to destroy the collection and create it from scratch**! If you want to reuse the existing collection, you can always create an instance of `Qdrant` on your own and pass the `QdrantClient` instance with the connection details." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "b7b432d7", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:24.843090Z", + "start_time": "2023-04-04T10:51:24.840041Z" + } + }, + "outputs": [], + "source": [ + "del qdrant" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "30a87570", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:24.854117Z", + "start_time": "2023-04-04T10:51:24.845385Z" + } + }, + "outputs": [], + "source": [ + "import qdrant_client\n", + "\n", + "client = qdrant_client.QdrantClient(path=\"/tmp/local_qdrant\", prefer_grpc=True)\n", + "qdrant = Qdrant(client=client, collection_name=\"my_documents\", embeddings=embeddings)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1f9215c8", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T09:27:29.920258Z", + "start_time": "2023-04-04T09:27:29.913714Z" + } + }, + "source": [ + "## Similarity search\n", + "\n", + "The simplest scenario for using Qdrant vector store is to perform a similarity search. Under the hood, our query will be encoded with the `embedding_function` and used to find similar documents in Qdrant collection." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a8c513ab", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:25.204469Z", + "start_time": "2023-04-04T10:51:24.855618Z" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "found_docs = qdrant.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fc516993", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:25.220984Z", + "start_time": "2023-04-04T10:51:25.213943Z" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you\u2019re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I\u2019d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer\u2014an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation\u2019s top legal minds, who will continue Justice Breyer\u2019s legacy of excellence.\n" + ] + } + ], + "source": [ + "print(found_docs[0].page_content)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1bda9bf5", + "metadata": {}, + "source": [ + "## Similarity search with score\n", + "\n", + "Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result. \n", + "The returned distance score is cosine distance. Therefore, a lower score is better." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "8804a21d", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:25.631585Z", + "start_time": "2023-04-04T10:51:25.227384Z" + } + }, + "outputs": [], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "found_docs = qdrant.similarity_search_with_score(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "756a6887", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:25.642282Z", + "start_time": "2023-04-04T10:51:25.635947Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you\u2019re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I\u2019d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer\u2014an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation\u2019s top legal minds, who will continue Justice Breyer\u2019s legacy of excellence.\n", + "\n", + "Score: 0.8153784913324512\n" + ] + } + ], + "source": [ + "document, score = found_docs[0]\n", + "print(document.page_content)\n", + "print(f\"\\nScore: {score}\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "525e3582", + "metadata": {}, + "source": [ + "### Metadata filtering\n", + "\n", + "Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. It is also possible to use the filters in Langchain, by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1c2c58dc", + "metadata": {}, + "source": [ + "```python\n", + "from qdrant_client.http import models as rest\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))\n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "c58c30bf", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:39:53.032744Z", + "start_time": "2023-04-04T10:39:53.028673Z" + } + }, + "source": [ + "## Maximum marginal relevance search (MMR)\n", + "\n", + "If you'd like to look up for some similar documents, but you'd also like to receive diverse results, MMR is method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "76810fb6", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:26.010947Z", + "start_time": "2023-04-04T10:51:25.647687Z" + } + }, + "outputs": [], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "found_docs = qdrant.max_marginal_relevance_search(query, k=2, fetch_k=10)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "80c6db11", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:26.016979Z", + "start_time": "2023-04-04T10:51:26.013329Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you\u2019re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I\u2019d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer\u2014an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation\u2019s top legal minds, who will continue Justice Breyer\u2019s legacy of excellence. \n", + "\n", + "2. We can\u2019t change how divided we\u2019ve been. But we can change how we move forward\u2014on COVID-19 and other issues we must face together. \n", + "\n", + "I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n", + "\n", + "They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n", + "\n", + "Officer Mora was 27 years old. \n", + "\n", + "Officer Rivera was 22. \n", + "\n", + "Both Dominican Americans who\u2019d grown up on the same streets they later chose to patrol as police officers. \n", + "\n", + "I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n", + "\n", + "I\u2019ve worked on these issues a long time. \n", + "\n", + "I know what works: Investing in crime preventionand community police officers who\u2019ll walk the beat, who\u2019ll know the neighborhood, and who can restore trust and safety. \n", + "\n" + ] + } + ], + "source": [ + "for i, doc in enumerate(found_docs):\n", + " print(f\"{i + 1}.\", doc.page_content, \"\\n\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "691a82d6", + "metadata": {}, + "source": [ + "## Qdrant as a Retriever\n", + "\n", + "Qdrant, as all the other vector stores, is a LangChain Retriever, by using cosine similarity. " + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "9427195f", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:26.031451Z", + "start_time": "2023-04-04T10:51:26.018763Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "VectorStoreRetriever(vectorstore=, search_type='similarity', search_kwargs={})" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "retriever = qdrant.as_retriever()\n", + "retriever" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "0c851b4f", + "metadata": {}, + "source": [ + "It might be also specified to use MMR as a search strategy, instead of similarity." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "64348f1b", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:26.043909Z", + "start_time": "2023-04-04T10:51:26.034284Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "VectorStoreRetriever(vectorstore=, search_type='mmr', search_kwargs={})" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "retriever = qdrant.as_retriever(search_type=\"mmr\")\n", + "retriever" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "f3c70c31", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T10:51:26.495652Z", + "start_time": "2023-04-04T10:51:26.046407Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you\u2019re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I\u2019d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer\u2014an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation\u2019s top legal minds, who will continue Justice Breyer\u2019s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "retriever.get_relevant_documents(query)[0]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "0358ecde", + "metadata": {}, + "source": [ + "## Customizing Qdrant\n", + "\n", + "There are some options to use an existing Qdrant collection within your Langchain application. In such cases you may need to define how to map Qdrant point into the Langchain `Document`.\n", + "\n", + "### Named vectors\n", + "\n", + "Qdrant supports [multiple vectors per point](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) by named vectors. Langchain requires just a single embedding per document and, by default, uses a single vector. However, if you work with a collection created externally or want to have the named vector used, you can configure it by providing its name.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "Qdrant.from_documents(\n", + " docs,\n", + " embeddings,\n", + " location=\":memory:\",\n", + " collection_name=\"my_documents_2\",\n", + " vector_name=\"custom_vector\",\n", + ")" + ], + "metadata": { + "collapsed": false + }, + "id": "3166ff99" + }, + { + "cell_type": "markdown", + "source": [ + "As a Langchain user, you won't see any difference whether you use named vectors or not. Qdrant integration will handle the conversion under the hood." + ], + "metadata": { + "collapsed": false + }, + "id": "79848e80" + }, + { + "cell_type": "markdown", + "source": [ + "### Metadata\n", + "\n", + "Qdrant stores your vector embeddings along with the optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from the documents, we keep the context data, so you can extract the original texts as well.\n", + "\n", + "By default, your document is going to be stored in the following payload structure:\n", + "\n", + "```json\n", + "{\n", + " \"page_content\": \"Lorem ipsum dolor sit amet\",\n", + " \"metadata\": {\n", + " \"foo\": \"bar\"\n", + " }\n", + "}\n", + "```\n", + "\n", + "You can, however, decide to use different keys for the page content and metadata. That's useful if you already have a collection that you'd like to reuse." + ], + "metadata": { + "collapsed": false + }, + "id": "daeafd6d" + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e4d6baf9", + "metadata": { + "ExecuteTime": { + "end_time": "2023-04-04T11:08:31.739141Z", + "start_time": "2023-04-04T11:08:30.229748Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Qdrant.from_documents(\n", + " docs,\n", + " embeddings,\n", + " location=\":memory:\",\n", + " collection_name=\"my_documents_2\",\n", + " content_payload_key=\"my_page_content_key\",\n", + " metadata_payload_key=\"my_meta\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2300e785", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" } - ], - "source": [ - "Qdrant.from_documents(\n", - " docs,\n", - " embeddings,\n", - " location=\":memory:\",\n", - " collection_name=\"my_documents_2\",\n", - " content_payload_key=\"my_page_content_key\",\n", - " metadata_payload_key=\"my_meta\",\n", - ")" - ] }, - { - "cell_type": "code", - "execution_count": null, - "id": "2300e785", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/docs/extras/use_cases/question_answering/index.mdx b/docs/extras/use_cases/question_answering/index.mdx index 4b3baebc3..8b3f4f1dc 100644 --- a/docs/extras/use_cases/question_answering/index.mdx +++ b/docs/extras/use_cases/question_answering/index.mdx @@ -1,85 +1,446 @@ -# Question answering over documents +# QA and Chat over Documents -Question answering in this context refers to question answering over your document data. -For question answering over other types of data, please see other sources documentation like [SQL database Question Answering](/docs/use_cases/tabular.html) or [Interacting with APIs](/docs/use_cases/apis.html). +Chat and Question-Answering (QA) over `data` are popular LLM use-cases. -For question answering over many documents, you almost always want to create an index over the data. -This can be used to smartly access the most relevant documents for a given question, allowing you to avoid having to pass all the documents to the LLM (saving you time and money). +`data` can include many things, including: + +* `Unstructured data` (e.g., PDFs) +* `Structured data` (e.g., SQL) +* `Code` (e.g., Python) + +LangChain supports Chat and QA on various `data` types: + +* See [here](https://python.langchain.com/docs/use_cases/code/) and [here](https://twitter.com/cristobal_dev/status/1675745314592915456?s=20) for `Code` +* See [here](https://python.langchain.com/docs/use_cases/tabular) for `Structured data` + +Below we will review Chat and QA on `Unstructured data`. + +![intro.png](/img/qa_intro.png) + +`Unstructured data` can be loaded from many sources. + +Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders. + +Each loader returns data as a LangChain [`Document`](https://docs.langchain.com/docs/components/schema/document). + +`Documents` are turned into a Chat or QA app following the general steps below: + +* `Splitting`: [Text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) break `Documents` into splits of specified size +* `Storage`: Storage (e.g., often a [vectorstore](https://python.langchain.com/docs/modules/data_connection/vectorstores/)) will house [and often embed](https://www.pinecone.io/learn/vector-embeddings/) the splits +* `Retrieval`: The app retrieves splits from storage (e.g., often [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question) +* `Output`: An [LLM](https://python.langchain.com/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved splits + +![flow.jpeg](/img/qa_flow.jpeg) + +## Quickstart + +The above pipeline can be wrapped with a `VectorstoreIndexCreator`. + +In particular: + +* Specify a `Document` loader +* The `splitting`, `storage`, `retrieval`, and `output` generation stages are wrapped + +Let's load this [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/) on agents as an example `Document`. + +We have a QA app in a few lines of code. -**Load Your Documents** - -```python -from langchain.document_loaders import TextLoader -loader = TextLoader('../../modules/state_of_the_union.txt') -``` - -See [here](/docs/modules/data_connection/document_loaders/) for more information on how to get started with document loading. - -**Create Your Index** ```python +from langchain.document_loaders import WebBaseLoader from langchain.indexes import VectorstoreIndexCreator +# Document loader +loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/") +# Index that wraps above steps index = VectorstoreIndexCreator().from_loaders([loader]) +# Question-answering +question = "What is Task Decomposition?" +index.query(question) ``` -The best and most popular index by far at the moment is the VectorStore index. -**Query Your Index** + + + ' Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done using LLM with simple prompting, task-specific instructions, or human inputs. Tree of Thoughts (Yao et al. 2023) is an example of a task decomposition technique that explores multiple reasoning possibilities at each step and generates multiple thoughts per step, creating a tree structure.' + + + +Of course, some users do not wnat this level of abstraction. + +Below, we will discuss each stage in more detail. + +## 1. Loading, Splitting, Storage + + + +### 1.1 Getting started + +Specify a `Document` loader. + ```python -query = "What did the president say about Ketanji Brown Jackson" -index.query(query) +# Document loader +from langchain.document_loaders import WebBaseLoader +loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/") +data = loader.load() ``` -Alternatively, use `query_with_sources` to also get back the sources involved +Split the `Document` into chunks for embedding and vector storage. + ```python -query = "What did the president say about Ketanji Brown Jackson" -index.query_with_sources(query) +# Split +from langchain.text_splitter import RecursiveCharacterTextSplitter +text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0) +all_splits = text_splitter.split_documents(data) ``` -Again, these high level interfaces obfuscate a lot of what is going on under the hood, so please see [this notebook](/docs/modules/data_connection/) for a more thorough introduction to data modules. +Embed and store the splits in a vector database ([Chroma](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/chroma)). -## Document Question Answering -Question answering involves fetching multiple documents, and then asking a question of them. -The LLM response will contain the answer to your question, based on the content of the documents. +```python +# Store +from langchain.vectorstores import Chroma +from langchain.embeddings import OpenAIEmbeddings +vectorstore = Chroma.from_documents(documents=all_splits,embedding=OpenAIEmbeddings()) +``` + +Here are the three pieces together: + +![lc.png](/img/qa_data_load.png) + +### 1.2 Going Deeper + +#### 1.2.1 Integrations + +`Data Loaders` + +* Browse the > 120 data loader integrations [here](https://integrations.langchain.com/). + +* See further documentation on loaders [here](https://python.langchain.com/docs/modules/data_connection/document_loaders/). + +`Data Transformers` + +* All can ingest loaded `Documents` and process them (e.g., split). + +* See further documentation on transformers [here](https://python.langchain.com/docs/modules/data_connection/document_transformers/). + +`Vectorstores` + +* Browse the > 35 vectorstores integrations [here](https://integrations.langchain.com/). + +* See further documentation on vectorstores [here](https://python.langchain.com/docs/modules/data_connection/vectorstores/). + +#### 1.2.2 Retaining metadata + +`Context-aware splitters` keep the location or "context" of each split in the origional `Document`: + +* [Markdown files](https://python.langchain.com/docs/use_cases/question_answering/document-context-aware-QA) +* [Code (py or js)](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/source_code) +* [Documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/grobid) + +## 2. Retrieval + +### 2.1 Getting started + +Retrieve [relevant splits](https://www.pinecone.io/learn/what-is-similarity-search/) for any question using `similarity_search`. + + +```python +question = "What are the approaches to Task Decomposition?" +docs = vectorstore.similarity_search(question) +len(docs) +``` + + + + + 4 + + + +### 2.2 Going Deeper + +#### 2.2.1 Retrieval + +Vectorstores are commonly used for retrieval. + +But, they are not the only option. + +For example, SVMs (see thread [here](https://twitter.com/karpathy/status/1647025230546886658?s=20)) can also be used. + +LangChain [has many retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/) including, but not limited to, vectorstores. + +All retrievers implement some common, useful methods, such as `get_relevant_documents()`. + + +```python +from langchain.retrievers import SVMRetriever +svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings()) +docs_svm=svm_retriever.get_relevant_documents(question) +len(docs) +``` + + + + + 4 + + + +#### 2.2.2 Advanced retrieval + +Improve on `similarity_search`: + +* `MultiQueryRetriever` [generates variants of the input question](https://python.langchain.com/docs/modules/data_connection/retrievers/how_to/MultiQueryRetriever) to improve retrieval. + +* `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents. + +* Documents can be filtered during retrieval using [`metadata` filters](https://python.langchain.com/docs/use_cases/question_answering/document-context-aware-QA). + + +```python +# MultiQueryRetriever +import logging +from langchain.chat_models import ChatOpenAI +from langchain.retrievers.multi_query import MultiQueryRetriever +logging.basicConfig() +logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO) +retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(), + llm=ChatOpenAI(temperature=0)) +unique_docs = retriever_from_llm.get_relevant_documents(query=question) +len(unique_docs) +``` + + INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?'] + 5 + + + + +## 3. QA + +### 3.1 Getting started + +Distill the retried documents into an answer using an LLM (e.g., `gpt-3.5-turbo`) with `RetrievalQA` chain. + + +```python +from langchain.chat_models import ChatOpenAI +llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0) +from langchain.chains import RetrievalQA +qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever()) +qa_chain({"query": question}) +``` + + + + + {'query': 'What are the approaches to Task Decomposition?', + 'result': 'The approaches to task decomposition include:\n\n1. Simple prompting: This approach involves using simple prompts or questions to guide the agent in breaking down a task into smaller subgoals. For example, the agent can be prompted with "Steps for XYZ" and asked to list the subgoals for achieving XYZ.\n\n2. Task-specific instructions: In this approach, task-specific instructions are provided to the agent to guide the decomposition process. For example, if the task is to write a novel, the agent can be instructed to "Write a story outline" as a subgoal.\n\n3. Human inputs: This approach involves incorporating human inputs in the task decomposition process. Humans can provide guidance, feedback, and suggestions to help the agent break down complex tasks into manageable subgoals.\n\nThese approaches aim to enable efficient handling of complex tasks by breaking them down into smaller, more manageable parts.'} + + + +### 3.2 Going Deeper + +#### 3.2.1 Integrations + +`LLMs` + +* Browse the > 55 model integrations [here](https://integrations.langchain.com/). + +* See further documentation on vectorstores [here](https://python.langchain.com/docs/modules/model_io/models/). + +#### 3.2.2 Running LLMs locally + +The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally. + +LangChain has integrations with many open source LLMs that can be run locally. + +Using `GPT4All` is as simple as [downloading the binary]((https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all)) and then: + + +```python +from langchain.llms import GPT4All +from langchain.chains import RetrievalQA +llm = GPT4All(model="/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin",max_tokens=2048) +qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever()) +``` + + +```python +qa_chain({"query": question}) +``` + + + + + {'query': 'What are the approaches to Task Decomposition?', + 'result': ' There are three main approaches to task decomposition: (1) using language models like GPT-3 for simple prompting such as "Steps for XYZ.\\n1.", (2) using task-specific instructions, and (3) with human inputs.'} + + + +#### 3.2.2 Customizing the prompt + +The prompt in `RetrievalQA` chain can be easily customized. + + +```python +# Build prompt +from langchain.prompts import PromptTemplate +template = """Use the following pieces of context to answer the question at the end. +If you don't know the answer, just say that you don't know, don't try to make up an answer. +Use three sentences maximum and keep the answer as concise as possible. +Always say "thanks for asking!" at the end of the answer. +{context} +Question: {question} +Helpful Answer:""" +QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,) + +# Run chain +from langchain.chains import RetrievalQA +llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0) +qa_chain = RetrievalQA.from_chain_type(llm, + retriever=vectorstore.as_retriever(), + chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}) + +result = qa_chain({"query": question}) +result["result"] +``` + + + + + 'The approaches to Task Decomposition are (1) using simple prompting by LLM, (2) using task-specific instructions, and (3) with human inputs. Thanks for asking!' + + + +#### 3.2.3 Returning source documents + +The full set of retrieved documents used for answer distillation can be returned using `return_source_documents=True`. + + +```python +from langchain.chains import RetrievalQA +qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(), + return_source_documents=True) +result = qa_chain({"query": question}) +print(len(result['source_documents'])) +result['source_documents'][0] +``` + + 4 + Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}) + + + +#### 3.2.4 Citations + +Answer citations can be returned using `RetrievalQAWithSourcesChain`. + + +```python +from langchain.chains import RetrievalQAWithSourcesChain +qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=vectorstore.as_retriever()) +result = qa_chain({"question": question}) +result +``` + + + + + {'question': 'What are the approaches to Task Decomposition?', + 'answer': 'The approaches to Task Decomposition include (1) using LLM with simple prompting, (2) using task-specific instructions, and (3) incorporating human inputs.\n', + 'sources': 'https://lilianweng.github.io/posts/2023-06-23-agent/'} + + + +#### 3.2.5 Customizing how pass retrieved documents to the LLM + +Retrieved documents can be fed to an LLM for answer distillation in a few different ways. + +`stuff`, `refine`, `map-reduce`, and `map-rerank` chains for passing documents to an LLM prompt are well summarized [here](https://python.langchain.com/docs/modules/chains/document/). + +`stuff` is commonly used because it simply "stuffs" all retrieved documents into the prompt. + +The [load_qa_chain](https://python.langchain.com/docs/modules/chains/additional/question_answering.html) is an easy way to pass documents to an LLM using these various approaches (e.g., see `chain_type`). -The recommended way to get started using a question answering chain is: ```python from langchain.chains.question_answering import load_qa_chain chain = load_qa_chain(llm, chain_type="stuff") -chain.run(input_documents=docs, question=query) +chain({"input_documents": unique_docs, "question": question},return_only_outputs=True) ``` -The following resources exist: -- [Question Answering Notebook](/docs/modules/chains/additional/question_answering.html): A notebook walking through how to accomplish this task. -- [VectorDB Question Answering Notebook](/docs/modules/chains/popular/vector_db_qa.html): A notebook walking through how to do question answering over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings. -## Adding in sources -There is also a variant of this, where in addition to responding with the answer the language model will also cite its sources (eg which of the documents passed in it used). + {'output_text': 'The approaches to task decomposition include (1) using simple prompting to break down tasks into subgoals, (2) providing task-specific instructions to guide the decomposition process, and (3) incorporating human inputs for task decomposition.'} + + + +We can also pass the `chain_type` to `RetrievalQA`. -The recommended way to get started using a question answering with sources chain is: ```python -from langchain.chains.qa_with_sources import load_qa_with_sources_chain -chain = load_qa_with_sources_chain(llm, chain_type="stuff") -chain({"input_documents": docs, "question": query}, return_only_outputs=True) +qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(), + chain_type="stuff") +result = qa_chain({"query": question}) ``` -## Additional Related Resources +In summary, the user can choose the desired level of abstraction for QA: -Additional related resources include: +![summary_chains.png](/img/summary_chains.png) -- [Building blocks for working with Documents](/docs/modules/data_connection/): Guides on how to use several of the utilities which will prove helpful for this task, including Text Splitters (for splitting up long documents) and Embeddings & Vectorstores (useful for the above Vector DB example). -- [CombineDocuments Chains](/docs/modules/chains/document/): A conceptual overview of specific types of chains by which you can accomplish this task. +## 4. Chat -## End-to-end examples +### 4.1 Getting started -For examples to this done in an end-to-end manner, please see the following resources: +To keep chat history, first specify a `Memory buffer` to track the conversation inputs / outputs. -- [Semantic search over a group chat with Sources Notebook](/docs/use_cases/question_answering/semantic-search-over-chat.html): A notebook that semantically searches over a group chat conversation. -- [Document context aware text splitting and QA](/docs/use_cases/question_answering/document-context-aware-QA.html): A notebook that shows context aware splitting on markdown files and SelfQueryRetriever for QA using the resulting metadata. + +```python +from langchain.memory import ConversationBufferMemory +memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True) +``` + +The `ConversationalRetrievalChain` uses chat in the `Memory buffer`. + + +```python +from langchain.chains import ConversationalRetrievalChain +retriever=vectorstore.as_retriever() +chat = ConversationalRetrievalChain.from_llm(llm,retriever=retriever,memory=memory) +``` + + +```python +result = chat({"question": "What are some of the main ideas in self-reflection?"}) +result['answer'] +``` + + + + + "Some of the main ideas in self-reflection include:\n1. Iterative improvement: Self-reflection allows autonomous agents to improve by refining past action decisions and correcting mistakes.\n2. Trial and error: Self-reflection is crucial in real-world tasks where trial and error are inevitable.\n3. Two-shot examples: Self-reflection is created by showing pairs of failed trajectories and ideal reflections for guiding future changes in the plan.\n4. Working memory: Reflections are added to the agent's working memory, up to three, to be used as context for querying.\n5. Performance evaluation: Self-reflection involves continuously reviewing and analyzing actions, self-criticizing behavior, and reflecting on past decisions and strategies to refine approaches.\n6. Efficiency: Self-reflection encourages being smart and efficient, aiming to complete tasks in the least number of steps." + + + +The `Memory buffer` has context to resolve `"it"` ("self-reflection") in the below question. + + +```python +result = chat({"question": "How does the Reflexion paper handle it?"}) +result['answer'] +``` + + + + + "The Reflexion paper handles self-reflection by showing two-shot examples to the Learning Language Model (LLM). Each example consists of a failed trajectory and an ideal reflection that guides future changes in the agent's plan. These reflections are then added to the agent's working memory, up to a maximum of three, to be used as context for querying the LLM. This allows the agent to iteratively improve its reasoning skills by refining past action decisions and correcting previous mistakes." + + + +### 4.2 Going deeper + +The [documentation](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) on `ConversationalRetrievalChain` offers a few extensions, such as streaming and source documents. diff --git a/langchain/retrievers/svm.py b/langchain/retrievers/svm.py index 95792a853..f2e6a141d 100644 --- a/langchain/retrievers/svm.py +++ b/langchain/retrievers/svm.py @@ -5,7 +5,7 @@ https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb""" from __future__ import annotations import concurrent.futures -from typing import Any, List, Optional +from typing import Any, Iterable, List, Optional import numpy as np @@ -53,10 +53,26 @@ class SVMRetriever(BaseRetriever): index = create_index(texts, embeddings) return cls(embeddings=embeddings, index=index, texts=texts, **kwargs) + @classmethod + def from_documents( + cls, + documents: Iterable[Document], + embeddings: Embeddings, + **kwargs: Any, + ) -> SVMRetriever: + texts, metadatas = zip(*((d.page_content, d.metadata) for d in documents)) + return cls.from_texts(texts=texts, embeddings=embeddings, **kwargs) + def _get_relevant_documents( self, query: str, *, run_manager: CallbackManagerForRetrieverRun ) -> List[Document]: - from sklearn import svm + try: + from sklearn import svm + except ImportError: + raise ImportError( + "Could not import scikit-learn, please install with `pip install " + "scikit-learn`." + ) query_embeds = np.array(self.embeddings.embed_query(query)) x = np.concatenate([query_embeds[None, ...], self.index])