Commit Graph

235 Commits

Author SHA1 Message Date
Harrison Chase a2bbe3dda4 Harrison/mmr support for opensearch (#6349)
Co-authored-by: Mehmet Öner Yalçın <oneryalcin@gmail.com>
2023-06-17 12:22:37 -07:00
Harrison Chase 61e4a1adf9 Harrison/faiss score (#6341)
Co-authored-by: Frank Stein <16441059+simonfromla@users.noreply.github.com>
Co-authored-by: Sims Juju <sims@Ju.lan>
2023-06-17 11:00:47 -07:00
Slawomir Gonet eef62bf4e9 qdrant: search by vector (#6043)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Added support to `search_by_vector` to Qdrant Vector store.

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->


### Who can review
VectorStores / Retrievers / Memory
- @dev2049
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17



 -->
2023-06-17 09:44:28 -07:00
Richy Wang 444ca3f669 Improve AnalyticDB Vector Store implementation without affecting user (#6086)
Hi there:

As I implement the AnalyticDB VectorStore use two table to store the
document before. It seems just use one table is a better way. So this
commit is try to improve AnalyticDB VectorStore implementation without
affecting user behavior:

**1. Streamline the `post_init `behavior by creating a single table with
vector indexing.
2. Update the `add_texts` API for document insertion.
3. Optimize `similarity_search_with_score_by_vector` to retrieve results
directly from the table.
4. Implement `_similarity_search_with_relevance_scores`.
5. Add `embedding_dimension` parameter to support different dimension
embedding functions.**

Users can continue using the API as before. 
Test cases added before is enough to meet this commit.
2023-06-17 09:36:31 -07:00
Harrison Chase af18413d97 Harrison/deeplake new features (#6263)
Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-16 17:53:55 -07:00
ljeagle ad324a39ae Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316)
1. Changed the implementation of add_texts interface for the AwaDB
vector store in order to improve the performance
2. Upgrade the AwaDB from 0.3.2 to 0.3.3

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-16 16:50:01 -07:00
Wenchen Li f9edf76e7c Implement max_marginal_relevance_search in VectorStore of Pinecone (#6056)
This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
2023-06-13 10:46:45 -07:00
Harrison Chase d1561b74eb Harrison/cognitive search (#6011)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
2023-06-11 21:15:42 -07:00
Harrison Chase e05997c25e Harrison/hologres (#6012)
Co-authored-by: Changgeng Zhao <changgeng@nyu.edu>
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
2023-06-11 20:56:51 -07:00
ljeagle c5bce4a465 add from_documents interface in awadb vector store (#6023)
added new interface from_documents in awadb vector store
  @dev2049

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-11 19:35:03 -07:00
Akhil Vempali d7d629911b feat: Added filtering option to FAISS vectorstore (#5966)
Inspired by the filtering capability available in ChromaDB, added the
same functionality to the FAISS vectorestore as well. Since FAISS does
not have an inbuilt method of filtering used the approach suggested in
this [thread](https://github.com/facebookresearch/faiss/issues/1079)
Langchain Issue inspiration:
https://github.com/hwchase17/langchain/issues/4572

- [x] Added filtering capability to semantic similarly and MMR
- [x] Added test cases for filtering in
`tests/integration_tests/vectorstores/test_faiss.py`

#### Who can review?

Tag maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
  - @hwchase17
2023-06-11 13:20:03 -07:00
Ofer Mendelevitch f8cf09a230 Update to Vectara integration (#5950)
This PR updates the Vectara integration (@hwchase17 ):
* Adds reuse of requests.session to imrpove efficiency and speed.
* Utilizes Vectara's low-level API (instead of standard API) to better
match user's specific chunking with LangChain
* Now add_texts puts all the texts into a single Vectara document so
indexing is much faster.
* updated variables names from alpha to lambda_val (to be consistent
with Vectara docs) and added n_context_sentence so it's available to use
if needed.
* Updates to documentation and tests

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-10 16:27:01 -07:00
Harrison Chase 9218684759 Add a new vector store - AwaDB (#5971) (#5992)
Added AwaDB vector store, which is a wrapper over the AwaDB, that can be
used as a vector storage and has an efficient similarity search. Added
integration tests for the vector store
Added jupyter notebook with the example

Delete a unneeded empty file and resolve the
conflict(https://github.com/hwchase17/langchain/pull/5886)

Please check, Thanks!

@dev2049
@hwchase17

---------

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: ljeagle <vincent_jieli@yeah.net>
Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-10 15:42:32 -07:00
Kacper Łukawski 7cc200766e Expose full params in Qdrant (#5947)
# Expose full params in Qdrant

There were many questions regarding supporting some additional
parameters in Qdrant integration. Qdrant supports many vector search
optimizations that were impossible to use directly in Qdrant before.
That includes:

1. Possibility to manipulate collection params while using
`Qdrant.from_texts`. The PR allows setting things such as quantization,
HNWS config, optimizers config, etc. That makes it consistent with raw
`QdrantClient`.
2. Extended options while searching. It includes HNSW options, exact
search, score threshold filtering, and read consistency in distributed
mode.

After merging that PR, #4858 might also be closed.

## Who can review?

VectorStores / Retrievers / Memory

@dev2049 @hwchase17
2023-06-09 08:56:32 -07:00
volodymyr-memsql a1549901ce Added SingleStoreDB Vector Store (#5619)
- Added `SingleStoreDB` vector store, which is a wrapper over the
SingleStore DB database, that can be used as a vector storage and has an
efficient similarity search.
- Added integration tests for the vector store
- Added jupyter notebook with the example

@dev2049

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 20:45:33 -07:00
Jeff Vestal 3294774148 Add knn and query search field options to ElasticKnnSearch (#5641)
in the `ElasticKnnSearch` class added 2 arguments that were not exposed
properly

`knn_search` added:
- `vector_query_field: Optional[str] = 'vector'`
-- vector_query_field: Field name to use in knn search if not default
'vector'

`knn_hybrid_search` added:
- `vector_query_field: Optional[str] = 'vector'`
-- vector_query_field: Field name to use in knn search if not default
'vector'
- `query_field: Optional[str] = 'text'`
-- query_field: Field name to use in search if not default 'text'



Fixes # https://github.com/hwchase17/langchain/issues/5633


cc: @dev2049 @hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 20:19:14 -07:00
Mark Marryatt cef79ca579 Fix exporting GCP Vertex Matching Engine from vectorstores (#5793)
The Vertex Matching Engine docs include [the
line](https://github.com/hwchase17/langchain/blob/b177a29d3f942eeccf85814f0f628c32509b9b6a/docs/modules/indexes/vectorstores/examples/matchingengine.ipynb?short_path=54ebfde#L32)
`from langchain.vectorstores import MatchingEngine` which doesn't work
as it wasn't added to the vectorestores module exports.



  - @dev2049
2023-06-07 19:45:33 -07:00
bnassivet 9355e3f5f5 qdrant vector store - search with relevancy scores (#5781)
Implementation of similarity_search_with_relevance_scores for quadrant
vector store.
As implemented the method is also compatible with other capacities such
as filtering.

Integration tests updated.


#### Who can review?

Tag maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
2023-06-07 19:26:40 -07:00
Paul-Emile Brotons daf3e99b96 fixing from_documents method of the MongoDB Atlas vector store (#5794)
FIxed a bug in from_documents method --> Collection objects do not
implement truth value testing or bool().
@dev2049
2023-06-06 14:22:23 -07:00
berkedilekoglu f907b62526 Scores are explained in vectorestore docs (#5613)
# Scores in Vectorestores' Docs Are Explained

Following vectorestores can return scores with similar documents by
using `similarity_search_with_score`:
- chroma
- docarray_hnsw
- docarray_in_memory
- faiss
- myscale
- qdrant
- supabase
- vectara
- weaviate

However, in documents, these scores were either not explained at all or
explained in a way that could lead to misunderstandings (e.g., FAISS).
For instance in FAISS document: if we consider the score returned by the
function as a similarity score, we understand that a document returning
a higher score is more similar to the source document. However, since
the scores returned by the function are distance scores, we should
understand that smaller scores correspond to more similar documents.

For the libraries other than Vectara, I wrote the scores they use by
investigating from the source libraries. Since I couldn't be certain
about the score metric used by Vectara, I didn't make any changes in its
documentation. The links mentioned in Vectara's documentation became
broken due to updates, so I replaced them with working ones.

VectorStores / Retrievers / Memory
  - @dev2049

my twitter: [berkedilekoglu](https://twitter.com/berkedilekoglu)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-05 20:39:49 -07:00
Adil Ansari 233b52735e feat: Support for Tigris Vector Database for vector search (#5703)
### Changes
- New vector store integration - [Tigris](https://tigrisdata.com)
- Adds [tigrisdb](https://pypi.org/project/tigrisdb/) optional
dependency
- Example notebook demonstrating usage

Fixes #5535 
Closes tigrisdata/tigris-client-python#40

#### Twitter handles
We'd love a shoutout on our
[@TigrisData](https://twitter.com/TigrisData) and
[@adilansari](https://twitter.com/adilansari) twitter handles

#### Who can review?
@dev2049

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-05 20:39:16 -07:00
Hao Chen a4c9053d40 Integrate Clickhouse as Vector Store (#5650)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

#### Description

This PR is mainly to integrate open source version of ClickHouse as
Vector Store as it is easy for both local development and adoption of
LangChain for enterprises who already have large scale clickhouse
deployment.

ClickHouse is a open source real-time OLAP database with full SQL
support and a wide range of functions to assist users in writing
analytical queries. Some of these functions and data structures perform
distance operations between vectors, [enabling ClickHouse to be used as
a vector
database](https://clickhouse.com/blog/vector-search-clickhouse-p1).
Recently added ClickHouse capabilities like [Approximate Nearest
Neighbour (ANN)
indices](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes)
support faster approximate matching of vectors and provide a promising
development aimed to further enhance the vector matching capabilities of
ClickHouse.

In LangChain, some ClickHouse based commercial variant vector stores
like
[Chroma](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py)
and
[MyScale](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/myscale.py),
etc are already integrated, but for some enterprises with large scale
Clickhouse clusters deployment, it will be more straightforward to
upgrade existing clickhouse infra instead of moving to another similar
vector store solution, so we believe it's a valid requirement to
integrate open source version of ClickHouse as vector store.

As `clickhouse-connect` is already included by other integrations, this
PR won't include any new dependencies.

#### Before submitting

<!-- If you're adding a new integration, please include:

1. Added a test for the integration:
https://github.com/haoch/langchain/blob/clickhouse/tests/integration_tests/vectorstores/test_clickhouse.py
2. Added an example notebook and document showing its use: 
* Notebook:
https://github.com/haoch/langchain/blob/clickhouse/docs/modules/indexes/vectorstores/examples/clickhouse.ipynb
* Doc:
https://github.com/haoch/langchain/blob/clickhouse/docs/integrations/clickhouse.md

See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

1. Added a test for the integration:
https://github.com/haoch/langchain/blob/clickhouse/tests/integration_tests/vectorstores/test_clickhouse.py
2. Added an example notebook and document showing its use: 
* Notebook:
https://github.com/haoch/langchain/blob/clickhouse/docs/modules/indexes/vectorstores/examples/clickhouse.ipynb
* Doc:
https://github.com/haoch/langchain/blob/clickhouse/docs/integrations/clickhouse.md


#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
 
@hwchase17 @dev2049 Could you please help review?

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-05 13:32:04 -07:00
Gustavo Brian 2f2d27fd82 Error in documentation: Chroma constructor (#5731)
Chroma("langchain_store", embeddings.embed_query) must be
Chroma("langchain_store", embeddings)
2023-06-05 13:30:58 -07:00
Tobias Herbold 3fb0e4872a sqlalchemy MovedIn20Warning declarative_base DEPRICATION fix (#5676)
fix for the sqlalchemy deprecated declarative_base import :

```
MovedIn20Warning: The ``declarative_base()`` function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
  Base = declarative_base()  # type: Any
```

Import is wrapped in an try catch Block to fallback to the old import if
needed.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-04 16:52:52 -07:00
Jiayao Yu 6a3ceaa377 Support similarity_score_threshold retrieval with Chroma (#5655)
Fixes https://github.com/hwchase17/langchain/issues/5067

Verified the following code now works correctly:
```
db = Chroma(persist_directory=index_directory(index_name), embedding_function=embeddings)
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.4})
docs = retriever.get_relevant_documents(query)
```
2023-06-03 16:57:00 -07:00
rajib 1c51d3db0f Created fix for 5475 (#5659)
Created fix for 5475
Currently in PGvector, we do not have any function that returns the
instance of an existing store. The from_documents always adds embeddings
and then returns the store. This fix is to add a function that will
return the instance of an existing store

Also changed the jupyter example for PGVector to show the example of
using the function

<!-- Remove if not applicable -->

Fixes # 5475

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
@dev2049
@hwchase17 

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: rajib76 <rajib76@yahoo.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-03 16:47:52 -07:00
Paul-Emile Brotons 92f218207b removing client+namespace in favor of collection (#5610)
removing client+namespace in favor of collection for an easier
instantiation and to be similar to the typescript library

@dev2049
2023-06-03 16:27:31 -07:00
Caleb Ellington c5a7a85a4e fix chroma update_document to embed entire documents, fixes a characer-wise embedding bug (#5584)
# Chroma update_document full document embeddings bugfix

Chroma update_document takes a single document, but treats the
page_content sting of that document as a list when getting the new
document embedding.

This is a two-fold problem, where the resulting embedding for the
updated document is incorrect (it's only an embedding of the first
character in the new page_content) and it calls the embedding function
for every character in the new page_content string, using many tokens in
the process.

Fixes #5582


Co-authored-by: Caleb Ellington <calebellington@Calebs-MBP.hsd1.ca.comcast.net>
2023-06-02 11:12:48 -07:00
Kacper Łukawski 71a7c16ee0 Fix: Qdrant ids (#5515)
# Fix Qdrant ids creation

There has been a bug in how the ids were created in the Qdrant vector
store. They were previously calculated based on the texts. However,
there are some scenarios in which two documents may have the same piece
of text but different metadata, and that's a valid case. Deduplication
should be done outside of insertion.

It has been fixed and covered with the integration tests.
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-02 08:57:34 -07:00
Jeff Vestal d1f65d8dc1 Es knn index search 5346 (#5569)
# Create elastic_vector_search.ElasticKnnSearch class

This extends `langchain/vectorstores/elastic_vector_search.py` by adding
a new class `ElasticKnnSearch`

Features:
- Allow creating an index with the `dense_vector` mapping compataible
with kNN search
- Store embeddings in index for use with kNN search (correct mapping
creates HNSW data structure)
- Perform approximate kNN search
- Perform hybrid BM25 (`query{}`) + kNN (`knn{}`) search
- perform knn search by either providing a `query_vector` or passing a
hosted `model_id` to use query_vector_builder to automatically generate
a query_vector at search time

Connection options
- Using `cloud_id` from Elastic Cloud
- Passing elasticsearch client object

search options
- query
- k
- query_vector
- model_id
- size
- source
- knn_boost (hybrid search)
- query_boost (hybrid search)
- fields


This also adds examples to
`docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb`


Fixes # [5346](https://github.com/hwchase17/langchain/issues/5346)

cc: @dev2049

 -->

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-02 08:40:35 -07:00
sseide 001b147450 Documentation fixes (linting and broken links) (#5563)
# Lint sphinx documentation and fix broken links

This PR lints multiple warnings shown in generation of the project
documentation (using "make docs_linkcheck" and "make docs_build").
Additionally documentation internal links to (now?) non-existent files
are modified to point to existing documents as it seemed the new correct
target.

The documentation is not updated content wise.
There are no source code changes.

Fixes # (issue)

- broken documentation links to other files within the project
- sphinx formatting (linting)

## Before submitting

No source code changes, so no new tests added.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-01 13:06:17 -07:00
Blithe 80b3fdf2f7 make the elasticsearch api support version which below 8.x (#5495)
the api which create index or search in the elasticsearch below 8.x is
different with 8.x. When use the es which below 8.x , it will throw
error. I fix the problem


Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>
2023-06-01 10:58:20 -07:00
Davis Chase 6afb463e9b Qdrant self query (#5567)
Add self query abilities to qdrant vectorstore
2023-06-01 08:40:31 -07:00
Bharat Ramanathan 22603d19e0 feat(integrations): Add WandbTracer (#4521)
# WandbTracer
This PR adds the `WandbTracer` and deprecates the existing
`WandbCallbackHandler`.

Added an example notebook under the docs section alongside the
`LangchainTracer`
Here's an example
[colab](https://colab.research.google.com/drive/1pY13ym8ENEZ8Fh7nA99ILk2GcdUQu0jR?usp=sharing)
with the same notebook and the
[trace](https://wandb.ai/parambharat/langchain-tracing/runs/8i45cst6)
generated from the colab run


Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-01 00:01:19 -07:00
Sheng Han Lim 3bae595182 Add texts with embeddings to PGVector wrapper (#5500)
Similar to #1813 for faiss, this PR is to extend functionality to pass
text and its vector pair to initialize and add embeddings to the
PGVector wrapper.

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
  - @dev2049
2023-05-31 17:31:52 -07:00
Harrison Chase 470b2822a3 Add matching engine vectorstore (#3350)
Co-authored-by: Tom Piaggio <tomaspiaggio@google.com>
Co-authored-by: scafati98 <jupyter@matchingengine.us-central1-a.c.scafati-joonix.internal>
Co-authored-by: scafati98 <scafatieugenio@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 02:28:02 -07:00
Kacper Łukawski 8bcaca435a Feature: Qdrant filters supports (#5446)
# Support Qdrant filters

Qdrant has an [extensive filtering
system](https://qdrant.tech/documentation/concepts/filtering/) with rich
type support. This PR makes it possible to use the filters in Langchain
by passing an additional param to both the
`similarity_search_with_score` and `similarity_search` methods.

## Who can review?

@dev2049 @hwchase17

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 02:26:16 -07:00
Janos Tolgyesi 1111f18eb4 Add maximal relevance search to SKLearnVectorStore (#5430)
# Add maximal relevance search to SKLearnVectorStore

This PR implements the maximum relevance search in SKLearnVectorStore. 

Twitter handle: jtolgyesi (I submitted also the original implementation
of SKLearnVectorStore)

## Before submitting

Unit tests are included.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 16:13:33 -07:00
Kacper Łukawski f93d256190 Feat: Add batching to Qdrant (#5443)
# Add batching to Qdrant

Several people requested a batching mechanism while uploading data to
Qdrant. It is important, as there are some limits for the maximum size
of the request payload, and without batching implemented in Langchain,
users need to implement it on their own. This PR exposes a new optional
`batch_size` parameter, so all the documents/texts are loaded in batches
of the expected size (64, by default).

The integration tests of Qdrant are extended to cover two cases:
1. Documents are sent in separate batches.
2. All the documents are sent in a single request.
2023-05-30 15:33:54 -07:00
Jan Brinkmann 0d3a9d481f Fixed docstring in faiss.py for load_local (#5440)
# Fix for docstring in faiss.py vectorstore (load_local)

The doctring should reflect that load_local loads something FROM the
disk.
2023-05-30 11:41:00 -07:00
Paul-Emile Brotons a61b7f7e7c adding MongoDBAtlasVectorSearch (#5338)
# Add MongoDBAtlasVectorSearch for the python library

Fixes #5337
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 07:59:01 -07:00
Martin Holecek 44b48d9518 Fix update_document function, add test and documentation. (#5359)
# Fix for `update_document` Function in Chroma

## Summary
This pull request addresses an issue with the `update_document` function
in the Chroma class, as described in
[#5031](https://github.com/hwchase17/langchain/issues/5031#issuecomment-1562577947).
The issue was identified as an `AttributeError` raised when calling
`update_document` due to a missing corresponding method in the
`Collection` object. This fix refactors the `update_document` method in
`Chroma` to correctly interact with the `Collection` object.

## Changes
1. Fixed the `update_document` method in the `Chroma` class to correctly
call methods on the `Collection` object.
2. Added the corresponding test `test_chroma_update_document` in
`tests/integration_tests/vectorstores/test_chroma.py` to reflect the
updated method call.
3. Added an example and explanation of how to use the `update_document`
function in the Jupyter notebook tutorial for Chroma.

## Test Plan
All existing tests pass after this change. In addition, the
`test_chroma_update_document` test case now correctly checks the
functionality of `update_document`, ensuring that the function works as
expected and updates the content of documents correctly.

## Reviewers
@dev2049

This fix will ensure that users are able to use the `update_document`
function as expected, without encountering the previous
`AttributeError`. This will enhance the usability and reliability of the
Chroma class for all users.

Thank you for considering this pull request. I look forward to your
feedback and suggestions.
2023-05-29 06:39:25 -07:00
Matt Wells 9a5c9df809 Fixes iter error in FAISS add_embeddings call (#5367)
# Remove re-use of iter within add_embeddings causing error

As reported in https://github.com/hwchase17/langchain/issues/5336 there
is an issue currently involving the atempted re-use of an iterator
within the FAISS vectorstore adapter

Fixes # https://github.com/hwchase17/langchain/issues/5336

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
2023-05-28 09:59:30 -07:00
Janos Tolgyesi 5f4552391f Add SKLearnVectorStore (#5305)
# Add SKLearnVectorStore

This PR adds SKLearnVectorStore, a simply vector store based on
NearestNeighbors implementations in the scikit-learn package. This
provides a simple drop-in vector store implementation with minimal
dependencies (scikit-learn is typically installed in a data scientist /
ml engineer environment). The vector store can be persisted and loaded
from json, bson and parquet format.

SKLearnVectorStore has soft (dynamic) dependency on the scikit-learn,
numpy and pandas packages. Persisting to bson requires the bson package,
persisting to parquet requires the pyarrow package.

## Before submitting

Integration tests are provided under
`tests/integration_tests/vectorstores/test_sklearn.py`

Sample usage notebook is provided under
`docs/modules/indexes/vectorstores/examples/sklear.ipynb`

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-28 08:17:42 -07:00
Davis Chase 3be9ba14f3 OpenSearch top k parameter fix (#5216)
For most queries it's the `size` parameter that determines final number
of documents to return. Since our abstractions refer to this as `k`, set
this to be `k` everywhere instead of expecting a separate param. Would
be great to have someone more familiar with OpenSearch validate that
this is reasonable (e.g. that having `size` and what OpenSearch calls
`k` be the same won't lead to any strange behavior). cc @naveentatikonda

Closes #5212
2023-05-25 09:51:23 -07:00
Ati Sharma 40b086d6e8 Allow to specify ID when adding to the FAISS vectorstore. (#5190)
# Allow to specify ID when adding to the FAISS vectorstore

This change allows unique IDs to be specified when adding documents /
embeddings to a faiss vectorstore.

- This reflects the current approach with the chroma vectorstore.
- It allows rejection of inserts on duplicate IDs
- will allow deletion / update by searching on deterministic ID (such as
a hash).
- If not specified, a random UUID is generated (as per previous
behaviour, so non-breaking).

This commit fixes #5065 and #3896 and should fix #2699 indirectly. I've
tested adding and merging.

Kindly tagging @Xmaster6y @dev2049 for review.

---------

Co-authored-by: Ati Sharma <ati@agalmic.ltd>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-24 22:26:46 -07:00
Matt Wells c173bf1c62 Fixes scope of query Session in PGVector (#5194)
`vectorstore.PGVector`: The transactional boundary should be increased
to cover the query itself

Currently, within the `similarity_search_with_score_by_vector` the
transactional boundary (created via the `Session` call) does not include
the select query being made.

This can result in un-intended consequences when interacting with the
PGVector instance methods directly


---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 10:37:45 -07:00
Shukri b00c77dc62 Improve weaviate vectorstore docs (#5201)
# Improve weaviate vectorstore docs
2023-05-24 09:31:48 -07:00
Saba Sturua 47e4ee4370 adjust docarray docstrings (#5185)
Follow up of https://github.com/hwchase17/langchain/pull/5015

Thanks for catching this! 

Just a small PR to adjust couple of strings to these changes

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
2023-05-24 07:50:35 -07:00
Ofer Mendelevitch c81fb88035 Vectara (#5069)
# Vectara Integration

This PR provides integration with Vectara. Implemented here are:
* langchain/vectorstore/vectara.py
* tests/integration_tests/vectorstores/test_vectara.py
* langchain/retrievers/vectara_retriever.py
And two IPYNB notebooks to do more testing:
* docs/modules/chains/index_examples/vectara_text_generation.ipynb
* docs/modules/indexes/vectorstores/examples/vectara.ipynb

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 01:24:58 -07:00