Commit Graph

915 Commits

Author SHA1 Message Date
Xin Qiu ea142f6a32 feat: add drop index in redis and fix prefix generate logic (#1857)
# Description

Add `drop_index` for redis

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redis import Redis

Redis.drop_index(index_name="doc",delete_documents=False)
```
2023-03-22 19:44:42 -07:00
Eli 12f868b292 Propagate "filter" arg in Chroma similarity_search (#1869)
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
2023-03-22 19:40:10 -07:00
Memento Mori 31f9ecfc19 Fix tiktoken version (#1882)
Fix https://github.com/hwchase17/langchain/issues/1881
This issue occurs when using `'gpt-3.5-turbo'` with
`VectorDBQAWithSourcesChain`
2023-03-22 19:39:57 -07:00
Eric Zhu 273e9bf296 Simplify AzureChatOpenAI implementation. (#1902)
Change AzureChatOpenAI class implementation as Azure just added support
for chat completion API. See:
https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions.
This should make the code much simpler.
2023-03-22 19:36:51 -07:00
Maurício Maia f155d9d3ec Add metadata filter to PGVector search (#1872)
Add ability to filter pgvector documents by metadata.
2023-03-22 15:21:40 -07:00
Klein Tahiraj d3d4503ce2 Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
2023-03-22 15:19:42 -07:00
Harrison Chase 1f93c5cf69 extraction docs (#1898) 2023-03-22 15:00:44 -07:00
Sean Zheng 15b5a08f4b Update how_to_guides.rst (#1893)
Adding OpenSearch examples
2023-03-22 14:30:43 -07:00
Kushal Chordiya ff4a25b841 Fix minor bug in opensearch vector store add_texts function (#1878)
In the langchain.vectorstores.opensearch_vector_search.py, in the
add_texts function, around line 247, we have the following code

```python
embeddings = [
     self.embedding_function.embed_documents(list(text))[0] for text in texts
]
```

the goal of the `list(text)` part I believe is to pass a list to the
embed_documents list instead of a a str. However, `list(text)` is a
subtle bug

`list(text)` would convert the string text into an array, where each
element of the array is a character of the string

<img width="937" alt="Screenshot 2023-03-22 at 1 27 18 PM"
src="https://user-images.githubusercontent.com/88190553/226836470-384665a1-2f13-46bc-acfc-9a37417cd918.png">

The correct way should be to change the code to 

```python
embeddings = [
      self.embedding_function.embed_documents([text])[0] for text in texts
]
```
Which wraps the string inside a list.
2023-03-22 11:27:32 -07:00
Maurício Maia 2212520a6c Add PGVector collection metadata (#1887)
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
2023-03-22 11:27:07 -07:00
Harrison Chase d08f940336 principles list (#1888) 2023-03-22 10:48:38 -07:00
Harrison Chase 2280a2cb2f bump version to 119 (#1886) 2023-03-22 08:36:09 -07:00
Harrison Chase ce5d97bcb3 Harrison/guarded output parser (#1804)
Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>
2023-03-21 22:07:23 -07:00
DeadBranch 8fa1764c60 docs: update gpt index references to LlamaIndex (#1856)
The GPT Index project is transitioning to the new project name,
LlamaIndex.

I've updated a few files referencing the old project name and repository
URL to the current ones.

From the [LlamaIndex repo](https://github.com/jerryjliu/llama_index):
> NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out
this transition gradually.
>
> 2/25/2023: By default, our docs/notebooks/instructions now reference
"LlamaIndex" instead of "GPT Index".
>
> 2/19/2023: By default, our docs/notebooks/instructions now use the
llama-index package. However the gpt-index package still exists as a
duplicate!
>
> 2/16/2023: We have a duplicate llama-index pip package. Simply replace
all imports of gpt_index with llama_index if you choose to pip install
llama-index.

I'm not associated with LlamaIndex in any way. I just noticed the
discrepancy when studying the lanchain documentation.
2023-03-21 22:01:05 -07:00
Harrison Chase f299bd1416 clean up sagemaker nb (#1875) 2023-03-21 22:00:08 -07:00
Philipp Schmid 064be93edf [Embeddings] Add SageMaker Endpoint Embedding class (#1859)
# What does this PR do? 

This PR adds similar to `llms` a SageMaker-powered `embeddings` class.
This is helpful if you want to leverage Hugging Face models on SageMaker
for creating your indexes.

I added a example into the
[docs/modules/indexes/examples/embeddings.ipynb](https://github.com/hwchase17/langchain/compare/master...philschmid:add-sm-embeddings?expand=1#diff-e82629e2894974ec87856aedd769d4bdfe400314b03734f32bee5990bc7e8062)
document. The example currently includes some `_### TEMPORARY: Showing
how to deploy a SageMaker Endpoint from a Hugging Face model ###_ ` code
showing how you can deploy a sentence-transformers to SageMaker and then
run the methods of the embeddings class.

@hwchase17 please let me know if/when i should remove the `_###
TEMPORARY: Showing how to deploy a SageMaker Endpoint from a Hugging
Face model ###_` in the description i linked to a detail blog on how to
deploy a Sentence Transformers so i think we don't need to include those
steps here.

I also reused the `ContentHandlerBase` from
`langchain.llms.sagemaker_endpoint` and changed the output type to `any`
since it is depending on the implementation.
2023-03-21 21:51:48 -07:00
anupam-tiwari 86822d1cc2 Fixes the import typo in the vector db text generator notebook (#1874)
Fixes the import typo in the vector db text generator notebook for the
chroma library

Co-authored-by: Anupam <anupam@10-16-252-145.dynapool.wireless.nyu.edu>
2023-03-21 21:48:26 -07:00
Harrison Chase a581bce379 remove key (#1863) 2023-03-21 12:43:41 -07:00
Harrison Chase 2ffc643086 add listen api docs (#1855) 2023-03-21 09:29:34 -07:00
Harrison Chase 2136dc94bb bump version to 118 (#1854) 2023-03-21 09:15:52 -07:00
Matt Tucker a92344f476 Use regex match for bash process error output test assertion. (#1837)
I was getting the same issue reported in #1339 by
[MacYang555](https://github.com/MacYang555) when running the test suite
on my Mac. I implemented the fix they suggested to use a regex match in
the output assertion for the scenario under test.

Resolves #1339
2023-03-21 09:06:52 -07:00
Tomoko Uchida b706966ebc Add setup instruction in Getting Started for Indexing (#1847)
`VectorstoreIndexCreator` [uses Chroma as the vectorstore by
default](https://github.com/hwchase17/langchain/blob/1c22657256a69ecf739134da7d9cec5e9365a75f/langchain/indexes/vectorstore.py#L49).
It may be helpful to add a short note for the setup.

You can see how the notebook looks here.

https://github.com/mocobeta/langchain/blob/feat/add-setup-instruction-to-index-getting-started/docs/modules/indexes/getting_started.ipynb
2023-03-21 09:06:35 -07:00
Harrison Chase 1c22657256 Harrison/faiss merge (#1843)
Co-authored-by: Ting Su <ting.su.1995@outlook.com>
2023-03-20 22:54:08 -07:00
Harrison Chase 6f02286805 Harrison/subtitles (#1842)
Co-authored-by: David Ruan <ruanwz@gmail.com>
Co-authored-by: David Ruan <david.ruan@analyticservice.net>
2023-03-20 22:53:52 -07:00
Simon Zhou 3674074eb0 Add Qdrant to ecosystem page (#1830)
Add [Qdrant](https://qdrant.tech/) to [LangChain
ecosystem](https://langchain.readthedocs.io/en/latest/ecosystem.html)
page.
2023-03-20 22:06:40 -07:00
Wenbin Fang a7e09d46c5 Add podcast api tool to use NLP to search all podcasts or episodes. (#1833)
Use the following code to test:

```python
import os
from langchain.llms import OpenAI
from langchain.chains.api import podcast_docs
from langchain.chains import APIChain

# Get api key here: https://openai.com/pricing
os.environ["OPENAI_API_KEY"] = "sk-xxxxx"

# Get api key here: https://www.listennotes.com/api/pricing/
listen_api_key = 'xxx'

llm = OpenAI(temperature=0)
headers = {"X-ListenAPI-Key": listen_api_key}
chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)
chain.run("Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results")
```

Known issues: the api response data might be too big, and we'll get such
error:
`openai.error.InvalidRequestError: This model's maximum context length
is 4097 tokens, however you requested 6733 tokens (6477 in your prompt;
256 for the completion). Please reduce your prompt; or completion
length.`
2023-03-20 22:04:17 -07:00
Matt Tucker fa2e546b76 Add workaround for debugpy install issue to contrib docs. (#1835)
When following the Quick Start instructions in the contributing docs, I
was getting a "WheelFileValidationError" on installation of debugpy
which was blocking the installation of a number of other deps. Google
turned up this [GitHub
issue](https://github.com/microsoft/debugpy/issues/1246) indicating a
regression in Poetry 1.4.1 and workarounds.

This PR updates the contrib docs noting the issue and the workarounds.
2023-03-20 22:03:19 -07:00
Daniel Dror (Dubovski) c592b12043 Allow passing in encoding to csv_loader (#1836) 2023-03-20 22:03:00 -07:00
Ikko Eltociear Ashimine 9555bbd5bb Fix typo in sqlite.ipynb (#1828)
overriden -> overridden
2023-03-20 16:47:19 -07:00
Harrison Chase 0ca1641b14 release 0.0.117 (#1819) 2023-03-20 08:04:04 -07:00
Harrison Chase d5b4393bb2 Harrison/llm math (#1808)
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2023-03-20 07:53:26 -07:00
Bryan Helmig 7b6ff7fe00 Follow up to #1803 to remove dynamic docs route. (#1818)
The base docs are going to be more stable and familiar for folks.
Dynamic route is currently in flux.
2023-03-20 07:52:41 -07:00
Harrison Chase 76c7b1f677 Harrison/wandb (#1764)
Co-authored-by: Anish Shah <93145909+ash0ts@users.noreply.github.com>
2023-03-20 07:52:27 -07:00
Paul 5aa8ece211 Corrected small typo in error message. (#1791) 2023-03-20 07:51:35 -07:00
Harrison Chase f6d24d5740 fix bug with openai token count (#1806) 2023-03-20 07:51:18 -07:00
Harrison Chase b1c4480d7c fix typing (#1807) 2023-03-20 07:50:49 -07:00
Daniel Chalef b6ba989f2f Add request timeout to ChatOpenAI (#1798)
Add request_timeout field to ChatOpenAI. Defaults to 60s.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-03-19 20:19:42 -07:00
Ankush Gola 04acda55ec Don't use dynamic api endpoint for Zapier NLA (#1803)
From Robert "Right now the dynamic/ route for specifically the above
endpoints is acting on all providers a user has set up, not just the
provider for the supplied API key."
2023-03-19 20:12:33 -07:00
Harrison Chase 8e5c4ac867 bump version to 0.0.116 (#1788) 2023-03-19 11:01:16 -07:00
Aratako df8702fead Small fix: Remove unused variable summary_message_role (#1789)
After the changes in #1783, `summary_message_role` is no longer used in
`ConversationSummaryBufferMemory`, so this PR removes it.
2023-03-19 11:01:03 -07:00
Harrison Chase d5d50c39e6 Harrison/azure embeddings (#1787)
Co-authored-by: Hemant <4627288+ghaccount@users.noreply.github.com>
2023-03-19 10:42:33 -07:00
Harrison Chase 1f18698b2a Harrison/token buffer memory (#1786)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-19 10:42:24 -07:00
Harrison Chase ef4945af6b Harrison/chat token usage (#1785) 2023-03-19 10:32:31 -07:00
Harrison Chase 7de2ada3ea Harrison/add source column (#1784)
Co-authored-by: Brian Graham <46691715+briangrahamww@users.noreply.github.com>
Co-authored-by: briangrahamww <brian.graham@ww.com>
2023-03-19 10:32:13 -07:00
Bernat Felip i Díaz 262d4cb9a8 Use embedding instead of embedding function in ElasticVectorStore (#1692)
While it might be a bit more restrictive, I find that using the
Embedding interface as an input for the vector store creation is better
than an embedding function because we can use bulk requests and possibly
the retry logic if needed.

I have seen that some vector store implementations use Embedding while
others use embedding function so I don't know what is the criteria to
have one or the other, in my opinion they should all just be Embedding
or have a way more complex embedding function that accepts multiple
texts instead of one by one.

---------

Co-authored-by: Bernat Felip <bernat.felip@rea.ch>
2023-03-19 10:23:38 -07:00
Harrison Chase 951c158106 Harrison/summary message rol (#1783)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-19 10:09:18 -07:00
Bao Nguyen 85e4dd7fc3 Fix wrong prompt in refine chain (#1770)
I got this during testing 

```
ValueError: Missing some input keys: {'existing_answer'}
```

Upon review, the initial prompt should be `QUESTION_PROMPT_SELECTOR`.

Co-authored-by: Bao Nguyen <bnguyen@roku.com>
2023-03-19 10:03:45 -07:00
Harrison Chase b1b4a4065a change chat default (#1782)
Resolves https://github.com/hwchase17/langchain/issues/1532, resolves
https://github.com/hwchase17/langchain/issues/1652.
2023-03-19 10:01:59 -07:00
Huang Chongdi 08f23c95d9 add encoding parameter to ObsidianLoader (#1752) 2023-03-19 09:48:31 -07:00
hitoshi44 3cf493b089 Fix Document & Expose StringPromptTemplate as a custom-prompt-template. (#1753)
Regarding [this
issue](https://github.com/hwchase17/langchain/issues/1754), the code in
the document [Creating a custom prompt
template](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/custom_prompt_template.html)
is no longer functional and outdated.

To address this, I have made the following changes:

1. Updated the guide in the document to use `StringPromptTemplate`
instead of `BasePromptTemplate`.
2. Exposed `StringPromptTemplate` in `prompts/__init__.py` for easier
importing.
2023-03-19 09:47:56 -07:00