Commit Graph

4545 Commits

Author SHA1 Message Date
Bagatur 81163e3c0c parent retriever nit (#9570)
If ids are nullable, they should default to None.
This also mirrors the VectorStore interface. cc @mcantillon21 @jacoblee93
2023-08-22 14:58:16 -04:00
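As an illustration of the nit in 81163e3c0c above, a nullable parameter is usually given `None` as its default so that callers can omit it. This is only a sketch: the function name `add_documents` and its body are hypothetical stand-ins, not the retriever's actual code.

```python
import uuid
from typing import List, Optional


def add_documents(documents: List[str], ids: Optional[List[str]] = None) -> List[str]:
    """Hypothetical stand-in: return the ids associated with the documents."""
    if ids is None:
        # No ids were supplied, so generate one per document.
        ids = [str(uuid.uuid4()) for _ in documents]
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    return ids
```

Callers that do not care about ids can then write `add_documents(docs)`, while callers that do can still pass `ids=[...]` explicitly.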
seamusp f3ba9ce7f4 Remove -E all from installation instructions (#9573)
Update installation instructions to only install test dependencies rather than all dependencies.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-08-22 14:57:58 -04:00
Myeongseop Kim f1e602996a import tqdm.auto instead of tqdm for OpenAIEmbeddings (#9584)
- Description: the current code does not work well in Jupyter notebooks,
so it now imports `tqdm.auto` instead.
  - Issue: #9582 
  - Dependencies: N/A
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: N/A

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-08-22 14:54:07 -04:00
Predrag Gruevski 35812d0096 Set up concurrency groups and workflow cancelation in CI. (#9564)
If another push to the same PR or branch happens while its CI is still
running, cancel the earlier run in favor of the next run.

There's no point in testing an outdated version of the code. GitHub only
allows a limited number of job runners to be active at the same time, so
it's better to cancel pointless jobs early so that more useful jobs can
run sooner.
2023-08-22 14:21:26 -04:00
Predrag Gruevski d564ec944c poetry lock the experimental package. (#9478) 2023-08-22 14:09:35 -04:00
Predrag Gruevski 65e893b9cd poetry lock on langchain. (#9476) 2023-08-22 14:09:23 -04:00
Predrag Gruevski 64a54d8ad8 poetry lock the top-level environment. (#9477) 2023-08-22 14:09:11 -04:00
Predrag Gruevski 3c7cc4d440 Test experimental package with langchain on master branch. (#9621)
It's possible that langchain-experimental works fine with the latest
*published* langchain, but is broken with the langchain on `master`.
Unfortunately, you can see this is currently the case — this is why this
PR also includes a minor fix for the `langchain` package itself.

We want to catch situations like that *before* releasing a new
langchain, hence this test.
2023-08-22 13:35:21 -04:00
Eugene Yurtsev 3408810748 Add batch util (#9620)
Add `batch` utility to langchain
2023-08-22 12:31:18 -04:00
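The commit above (3408810748) does not show the utility itself. A minimal sketch of what a `batch` helper typically does, assuming it splits an iterable into lists of at most `size` items, might look like this; the signature is an assumption, not the one merged in the PR.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch(size: int, iterable: Iterable[T]) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        # Pull the next `size` items; the final chunk may be shorter.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

For example, `list(batch(2, [1, 2, 3, 4, 5]))` produces three chunks, the last holding the single leftover item.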
Predrag Gruevski acb54d8b9d Reduce cache timeouts to ensure faster builds on timeout. (#9619)
The current timeouts are too long, and mean that if the GitHub cache
decides to act up, jobs get bogged down for 15min at a time. This has
happened 2-3 times already this week -- a tiny fraction of our total
workflows but really annoying when it happens to you. We can do better.

Installing deps on cache miss takes about 4min, so it's not worth
waiting more than 4min for the deps cache. The black and mypy caches
save 1 and 2min, respectively, so wait only up to that long to download
them.
2023-08-22 12:11:38 -04:00
Predrag Gruevski a1e89aa8d5 Explicitly add the contents: write permission for publishing releases. (#9617) 2023-08-22 08:38:18 -07:00
Predrag Gruevski c75e1aa5ed Eliminate special-casing from test CI workflows. (#9562)
The previous approach was relying on `_test.yml` taking an input
parameter, and then doing almost completely orthogonal things for each
parameter value. I've separated out each of those test situations as its
own job or workflow file, which eliminated all the special-casing and,
in my opinion, improved maintainability by making it much more obvious
what code runs when.
2023-08-22 11:36:52 -04:00
Bagatur 2b663089b5 bump 271 (#9615) 2023-08-22 08:10:22 -07:00
klae01 b868ef23bc Add AINetwork blockchain toolkit integration (#9527)
# Description
This PR introduces a new toolkit for interacting with the AINetwork
blockchain. The toolkit provides a set of tools for performing various
operations on the AINetwork blockchain, such as transferring AIN,
reading and writing values to the blockchain database, managing apps,
setting rules and owners.

# Dependencies
[ain-py](https://github.com/ainblockchain/ain-py) >= 1.0.2

# Misc
The example notebook
(langchain/docs/extras/integrations/toolkits/ainetwork.ipynb) is in the
PR

---------

Co-authored-by: kriii <kriii@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-22 08:03:33 -07:00
Bagatur e99ef12cb1 Bagatur/litellm model name (#9613)
Co-authored-by: ishaan-jaff <ishaanjaffer0324@gmail.com>
2023-08-22 07:44:00 -07:00
Harrison Chase 1720e99397 add variables for field names (#9563) 2023-08-22 07:43:21 -07:00
Anthony Mahanna dfb9ff1079 bugfix: ArangoDB Empty Schema Case (#9574)
- Introduces a conditional in `ArangoGraph.generate_schema()` to exclude
empty ArangoDB Collections from the schema
- Add empty collection test case

Issue: N/A
Dependencies: None
2023-08-22 07:41:06 -07:00
Vanessa Arndorfer 1ea2f9adf4 Document AzureML Deployment Example (#9571)
Description: Link an example of deploying a Langchain app to an AzureML
online endpoint to the deployments documentation page.

Co-authored-by: Vanessa Arndorfer <vaarndor@microsoft.com>
2023-08-22 07:36:47 -07:00
Philippe PRADOS d4c49b16e4 Fix ChatMessageHistory (#9594)
The initialization of the array of ChatMessageHistory is buggy.
The list is shared with all instances.
2023-08-22 07:36:36 -07:00
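The bug described in d4c49b16e4 above is the usual shared-mutable-default problem. The sketch below reproduces it with plain Python classes; `SharedHistory` and `IsolatedHistory` are hypothetical names, and the real `ChatMessageHistory` may have been fixed by a different mechanism.

```python
from dataclasses import dataclass, field
from typing import List


class SharedHistory:
    # Bug: this list is created once, when the class body runs,
    # so every instance appends to the same object.
    messages: List[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)


@dataclass
class IsolatedHistory:
    # Fix: default_factory builds a fresh list for each instance.
    messages: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)


a, b = SharedHistory(), SharedHistory()
a.add("hi")
shared_leak = b.messages  # b sees the message that was added to a

c, d = IsolatedHistory(), IsolatedHistory()
c.add("hi")  # d.messages stays empty
```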
toddkim95 fba29f203a Add support for polars (#9610)
### Description
Polars is a DataFrame interface on top of an OLAP Query Engine
implemented in Rust.
Polars is faster to read than pandas, so I'm looking forward to seeing
it added to the document loader.

### Dependencies
polars (https://pola-rs.github.io/polars-book/user-guide/)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-22 07:36:24 -07:00
Aashish Saini 3c4f32c8b8 Replacing Exception type from ValueError to ImportError (#9588)
I have restructured the code to ensure uniform handling of ImportError.
In place of previously used ValueError, I've adopted the standard
practice of raising ImportError with explanatory messages. This
modification enhances code readability and clarifies that any problems
stem from module importation.

@eyurtsev , @baskaryan 

Thanks
2023-08-22 07:34:05 -07:00
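The pattern described in 3c4f32c8b8 above can be sketched as follows. The helper name `import_optional` and the wording of the message are assumptions made for illustration; the commit itself changes the exception type at existing call sites.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with install advice."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise as ImportError (not ValueError) so callers can tell
        # a missing package apart from a bad argument.
        raise ImportError(
            f"Could not import {module_name}. "
            f"Please install it with `pip install {pip_name}`."
        ) from exc
```

Chaining with `from exc` keeps the original traceback available for debugging.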
Julien Salinas 4d0b7bb8e1 Remove Dolphin and GPT-J from the embeddings docs.
These models are not proposed anymore.
2023-08-22 09:28:22 +02:00
Julien Salinas 033b874701 Remove some deprecated text generation parameters. 2023-08-22 09:26:37 +02:00
Bagatur 4e7e6bfe0a revert 2023-08-21 18:01:49 -07:00
Bagatur a9bf409a09 param 2023-08-21 17:37:07 -07:00
Bagatur fa478638a9 Merge branch 'master' into bagatur/locals_in_config 2023-08-21 17:31:39 -07:00
Bagatur 182b059bf4 param 2023-08-21 17:31:38 -07:00
Jeremy Suriel 0fa4516ce4 Fix typo (#9565)
Corrected a minor documentation typo here:
https://python.langchain.com/docs/modules/model_io/models/llms/#generate-batch-calls-richer-outputs
2023-08-21 15:54:38 -07:00
Bagatur 04f2d69b83 improve confluence doc loader param validation (#9568) 2023-08-21 15:02:36 -07:00
Jacob Lee 0fea987dd2 Add missing param to parent document retriever notebook (#9569) 2023-08-21 15:02:12 -07:00
Zizhong Zhang 00eff8c4a7 feat: Add PromptGuard integration (#9481)
Add PromptGuard integration
-------
There are two approaches to integrate PromptGuard with a LangChain
application.

1. PromptGuardLLMWrapper
2. functions that can be used in LangChain expression.

-----
- Dependencies
`promptguard` Python package, which is a runtime requirement if you want
to try out the demo.

- @baskaryan @hwchase17 Thanks for the ideas and suggestions along the
development process.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-21 14:59:36 -07:00
Predrag Gruevski 6c308aabae Use the GitHub-suggested safer pattern for shell interpolation. (#9567)
Using `${{ }}` to construct shell commands is risky, since the `${{ }}`
interpolation runs first and ignores shell quoting rules. This means
that shell commands that look safely quoted, like `echo "${{
github.event.issue.title }}"`, are actually vulnerable to shell
injection.

More details here:
https://github.blog/2023-08-09-four-tips-to-keep-your-github-actions-workflows-secure/
2023-08-21 17:59:10 -04:00
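The risk described in 6c308aabae above can be reproduced outside of GitHub Actions. The sketch below simulates `${{ }}` interpolation with a plain string replacement, and assumes the safer pattern is to pass the value through an environment variable. It needs a POSIX `sh` on the PATH; the `TITLE` variable name is made up for the example.

```python
import os
import subprocess

# Attacker-controlled text, standing in for github.event.issue.title.
title = 'x"; echo INJECTED; echo "'

# Unsafe: the value is pasted into the script text before the shell parses it,
# which is what `${{ }}` interpolation does. The quotes in the value close the
# quoted string and smuggle in an extra command.
unsafe_script = 'echo "TITLE"'.replace("TITLE", title)
unsafe_out = subprocess.run(
    ["sh", "-c", unsafe_script], capture_output=True, text=True, check=True
).stdout

# Safer: the script text is fixed, and the value arrives through an
# environment variable that the shell expands as data, not as code.
safe_env = {**os.environ, "TITLE": title}
safe_out = subprocess.run(
    ["sh", "-c", "printf '%s\\n' \"$TITLE\""],
    env=safe_env,
    capture_output=True,
    text=True,
    check=True,
).stdout
```

In the unsafe run the injected `echo INJECTED` executes as its own command; in the safe run the whole title is printed back verbatim.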
Oleksandr Ichenskyi 8bc1a3dca8 docs: Add memgraph notebook (#9448)
- Description: added graph_memgraph_qa.ipynb which shows how to use LLMs
to provide a natural language interface to a Memgraph database using
[MemgraphGraph](https://github.com/langchain-ai/langchain/pull/8591)
class.
- Dependencies: given that the notebook utilizes the MemgraphGraph
class, it relies on both this class and several Python packages that are
installed in the notebook using pip (langchain, openai, neo4j,
gqlalchemy). The notebook is dependent on having a functional Memgraph
instance running, as it requires this instance to establish a
connection.
2023-08-21 13:45:04 -07:00
Sathindu 652c542b2f fix: Imports for the ConfluenceLoader:process_page (#9432)
### Description
Loading documents with `ConfluenceLoader.load` fails when both
`include_comments=True` and `keep_markdown_format=True` are set, raising
`NameError: free variable 'BeautifulSoup' referenced before assignment in
enclosing scope`.
    
    loader = ConfluenceLoader(url="URI", token="TOKEN")
    documents = loader.load(
        space_key="SPACE", 
        include_comments=True, 
        keep_markdown_format=True, 
    )

This happens because the previous imports only considered the
`keep_markdown_format` parameter, but including comments also requires
`BeautifulSoup`.

The imports now handle all four combinations of `include_comments` and
`keep_markdown_format`.

### Twitter
`@SathinduGA`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-21 13:44:52 -07:00
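The failure mode in 652c542b2f above can be reproduced with only the standard library. In this sketch `html.unescape` stands in for `BeautifulSoup`, and the function names `buggy` and `fixed` are hypothetical; it shows the shape of the bug rather than the loader's real code.

```python
def buggy(include_comments: bool, keep_markdown_format: bool) -> str:
    # The import only looks at keep_markdown_format...
    if not keep_markdown_format:
        from html import unescape  # stand-in for the BeautifulSoup import

    def render_comment(text: str) -> str:
        # ...but this nested function needs the name whenever comments are included.
        return unescape(text)

    return render_comment("a &amp; b") if include_comments else ""


def fixed(include_comments: bool, keep_markdown_format: bool) -> str:
    # Import whenever either code path is going to use the name.
    if include_comments or not keep_markdown_format:
        from html import unescape

    def render_comment(text: str) -> str:
        return unescape(text)

    return render_comment("a &amp; b") if include_comments else ""
```

Calling `buggy(True, True)` raises `NameError` because the nested function refers to a name that was never bound in the enclosing scope, while `fixed` works for all four flag combinations.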
Mike Salvatore 7c0b1b8171 Add session to ConfluenceLoader.__init__() (#9437)
- Description: Allows the user of `ConfluenceLoader` to pass a
`requests.Session` object in lieu of an authentication mechanism
- Issue: None
- Dependencies: None
- Tag maintainer: @hwchase17
2023-08-21 13:18:35 -07:00
Bagatur d09cdb4880 update data connection -> retrieval (#9561) 2023-08-21 13:03:29 -07:00
Kim Minjong 3d1095218c Update ChatOpenAI._astream to respect finish_reason (#9431)
Currently, `ChatOpenAI._astream` does not include `finish_reason` in
`generation_info`. This change makes it do so.
2023-08-21 12:56:42 -07:00
Matthew Zeiler 949b2cf177 Improvements to the Clarifai integration (#9290)
- Improved docs
- Improved performance in multiple ways through batching, threading,
etc.
- Fixed error message
- Added support for metadata filtering during similarity search.

@baskaryan PTAL
2023-08-21 12:53:36 -07:00
ricki-epsilla 66a47d9a61 add Epsilla vectorstore (#9239)
[Epsilla](https://github.com/epsilla-cloud/vectordb) vectordb is an
open-source vector database that uses advanced parallel graph traversal
techniques from academic research for vector indexing.
This PR adds basic integration with
[pyepsilla](https://github.com/epsilla-cloud/epsilla-python-client) (the
Epsilla vectordb Python client) as a vectorstore.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-21 12:51:15 -07:00
Predrag Gruevski 2a3758a98e Reminder to not report security issues as "bug" type issues. (#9554)
Updated the issue template that pops up when users open a new issue.
2023-08-21 15:48:33 -04:00
Bagatur dda5b1e370 Bagatur/doc loader confluence (#9524)
Co-authored-by: chanjetsdp <chanjetsdp@chanjet.com>
2023-08-21 12:40:44 -07:00
Predrag Gruevski de1f63505b Add py.typed file to langchain-experimental. (#9557)
The package is linted with mypy, so its type hints are correct and
should be exposed publicly. Without this file, the type hints remain
private and cannot be used by downstream users of the package.
2023-08-21 15:37:16 -04:00
Bagatur 4999e8af7e pin pydantic api ref build (#9556) 2023-08-21 12:11:49 -07:00
Predrag Gruevski 0565d81dc5 Update SECURITY.md email address. (#9558) 2023-08-21 14:52:21 -04:00
Predrag Gruevski 9f08d29bc8 Use PyPI Trusted Publishing to publish langchain packages. (#9467)
Trusted Publishing is the current best practice for publishing Python
packages. Rather than long-lived secret keys, it uses OpenID Connect
(OIDC) to allow our GitHub runner to directly authenticate itself to
PyPI and get a short-lived publishing token. This locks down publishing
quite a bit:
- There's no long-lived publish key to steal anymore.
- Publishing is *only* allowed via the *specifically designated* GitHub
workflow in the designated repo.

It also is operationally easier: no keys means there's nothing that
needs to be periodically rotated, nothing to worry about leaking, and
nobody can accidentally publish a release from their laptop because they
happened to have PyPI keys set up.

After this gets merged, we'll need to configure PyPI to start expecting
trusted publishing. It's only a few clicks and should only take a
minute; instructions are here:
https://docs.pypi.org/trusted-publishers/adding-a-publisher/

More info:
- https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
- https://github.com/pypa/gh-action-pypi-publish
2023-08-21 14:44:29 -04:00
Predrag Gruevski 249752e8ee Require manually triggering release workflows. (#9552) 2023-08-21 13:54:44 -04:00
Raynor Chavez 973866c894 fix: Updated marqo integration for marqo version 1.0.0+ (#9521)
- Description: Updated marqo integration to use tensor_fields instead of
non_tensor_fields. Upgraded marqo version to 1.2.4
  - Dependencies: marqo 1.2.4

---------

Co-authored-by: Raynor Kirkson E. Chavez <raynor.chavez@192.168.254.171>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-21 10:43:15 -07:00
Predrag Gruevski b2e6d01e8f Add SECURITY.md file to the repo. (#9551) 2023-08-21 13:39:59 -04:00
Predrag Gruevski 875ea4b4c6 Fix conditional that erroneously always runs. (#9543)
The input it means to test for is `"libs/langchain"` and not
`"langchain"`.
2023-08-21 13:24:33 -04:00
Bagatur c7a5bb6031 bump 270 (#9549) 2023-08-21 10:18:46 -07:00