Commit Graph

1707 Commits

Author SHA1 Message Date
lonestriker 6f36f0f930 Add oobabooga/text-generation-webui support as a llm (#5997)
Add oobabooga/text-generation-webui support as an LLM. Currently,
supports using text-generation-webui's non-streaming API interface.
Allows users who already have text-gen running to use the same models
with langchain.

#### Before submitting

Simple usage, similar to existing LLM supported:

```
from langchain.llms import TextGen
llm = TextGen(model_url = "http://localhost:5000")
```
#### Who can review?

 @hwchase17 - project lead

---------

Co-authored-by: Hien Ngo <Hien.Ngo@adia.ae>
2023-06-17 09:42:15 -07:00
Richy Wang 444ca3f669 Improve AnalyticDB Vector Store implementation without affecting user (#6086)
Hi there:

As I implement the AnalyticDB VectorStore use two table to store the
document before. It seems just use one table is a better way. So this
commit is try to improve AnalyticDB VectorStore implementation without
affecting user behavior:

**1. Streamline the `post_init `behavior by creating a single table with
vector indexing.
2. Update the `add_texts` API for document insertion.
3. Optimize `similarity_search_with_score_by_vector` to retrieve results
directly from the table.
4. Implement `_similarity_search_with_relevance_scores`.
5. Add `embedding_dimension` parameter to support different dimension
embedding functions.**

Users can continue using the API as before. 
Test cases added before is enough to meet this commit.
2023-06-17 09:36:31 -07:00
Ja-sonYun cdd1d78bf2 make modelname_to_contextsize as a staticmethod (#6040)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes ##6039

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17 @agola11
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-17 09:13:08 -07:00
Saba Sturua 427551eabf DocArray as a Retriever (#6031)
## DocArray as a Retriever

[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing your multi-modal data. It offers flexibility to store and
search through your data using various document index backends. This PR
introduces `DocArrayRetriever` - which works with any available backend
and serves as a retriever for Langchain apps.

Also, I added 2 notebooks:
DocArray Backends - intro to all 5 currently supported backends, how to
initialize, index, and use them as a retriever
DocArray Usage - showcasing what additional search parameters you can
pass to create versatile retrievers

Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```

#### Who can review?

@dev2049

---------

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
2023-06-17 09:09:33 -07:00
James O'Dwyer 0475d015fe Handle Managed Motorhead Data Key (#6169)
# Handle Managed Motorhead Data Key
Managed motorhead will return a payload with a `data` key. we need to
handle this to properly access messages from the server.
2023-06-16 20:36:18 -07:00
Luke Stanley 364f8e7b5d Better Entity Memory code documentation (#6318)
Just adds some comments and docstring improvements.

There was some behaviour that was quite unclear to me at first like:
- "when do things get updated?"
- "why are there only entity names and no summaries?"
- "why do the entity names disappear?" 

Now it can be much more obvious to many.

I am lukestanley on Twitter.
2023-06-16 18:08:44 -07:00
Harrison Chase af18413d97 Harrison/deeplake new features (#6263)
Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-16 17:53:55 -07:00
ljeagle ad324a39ae Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316)
1. Changed the implementation of add_texts interface for the AwaDB
vector store in order to improve the performance
2. Upgrade the AwaDB from 0.3.2 to 0.3.3

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-16 16:50:01 -07:00
Pierre Alexandre SCHEMBRI 9ca11c06b7 Fixes #6282 (#6283)
Fixes #6282 

1 liner to fix default http headers not passed by `LLMRequestsChain`
2023-06-16 16:21:01 -07:00
Davis Chase 87e502c6bc Doc refactor (#6300)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-16 11:52:56 -07:00
hp0404 b01cf0dd54 ArxivAPIWrapper - doc_content_chars_max (#6063)
This PR refactors the ArxivAPIWrapper class making
`doc_content_chars_max` parameter optional. Additionally, tests have
been added to ensure the functionality of the doc_content_chars_max
parameter.

Fixes #6027 (issue)
2023-06-15 22:16:42 -07:00
Daniel King a9b97aa6f4 Update output format of MosaicML endpoint to be more flexible (#6060)
There will likely be another change or two coming over the next couple
weeks as we stabilize the API, but putting this one in now which just
makes the integration a bit more flexible with the response output
format.

```
(langchain) danielking@MML-1B940F4333E2 langchain % pytest tests/integration_tests/llms/test_mosaicml.py tests/integration_tests/embeddings/test_mosaicml.py 
=================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0
rootdir: /Users/danielking/github/langchain
configfile: pyproject.toml
plugins: asyncio-0.20.3, mock-3.10.0, dotenv-0.5.2, cov-4.0.0, anyio-3.6.2
asyncio: mode=strict
collected 12 items                                                                                                                                                                        

tests/integration_tests/llms/test_mosaicml.py ......                                                                                                                                [ 50%]
tests/integration_tests/embeddings/test_mosaicml.py ......                                                                                                                          [100%]

=================================================================================== slowest 5 durations ===================================================================================
4.76s call     tests/integration_tests/llms/test_mosaicml.py::test_retry_logic
4.74s call     tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_llm_call
4.13s call     tests/integration_tests/llms/test_mosaicml.py::test_instruct_prompt
0.91s call     tests/integration_tests/llms/test_mosaicml.py::test_short_retry_does_not_loop
0.66s call     tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_extra_kwargs
=================================================================================== 12 passed in 19.70s ===================================================================================
```

#### Who can review?

  @hwchase17
  @dev2049
2023-06-15 22:15:39 -07:00
JaysonAlbert 50d9c7d5a4 Fix: change the chatgpt plugin retriever metadata format (#5920)
the current implement put the doc itself as the metadata, but the
document chatgpt plugin retriever returned already has a `metadata`
field, it's better to use that instead.

the original code will throw the following exception when using
`RetrievalQAWithSourcesChain`, becuse it can not find the field
`metadata`:

```python
Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Document prompt requires documents to have metadata variables: ['source']. Received document with missing metadata: ['source'].
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 27, in format_document
    raise ValueError(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in <listcomp>
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in _get_inputs
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 85, in combine_docs
    inputs = self._get_inputs(docs, **kwargs)
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call
    output, extra_return_dict = self.combine_docs(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e
```

Additionally, the `metadata` filed in the `chatgpt plugin retriever`
have these fileds by default:
```json
{
    "source":  "file",   //email, file or chat
    "source_id": "filename.docx", // the filename
    "url": "", 
    ...
}
```
so, we should set `source_id` to `source` in the langchain metadata.

```python
metadata = d.pop("metadata", d)
if(metadata.get("source_id")):
    metadata["source"] = metadata.pop("source_id")
```

#### Who can review?
@dev2049

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: wangjie <wangjie@htffund.com>
2023-06-15 22:04:45 -07:00
Harrison Chase e67b26eee9 Harrison/openai functions (#6261)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 21:54:39 -07:00
Harrison Chase 6aafb46807 Harrison/openai functions (#6223)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 21:43:33 -07:00
Zander Chase bc9b8c8239 Improve Error Message for failed callback (#6247)
Include the handler class name in the warning
2023-06-15 19:18:37 -07:00
Alon Roth 0013256e81 Support chat history persistence in AutoGPT (#5716)
**Short Description**
Added a new argument to AutoGPT class which allows to persist the chat
history to a file.

**Changes**
1. Removed the `self.full_message_history: List[BaseMessage] = []`
2. Replaced it with `chat_history_memory` which can take any subclasses
of `BaseChatMessageHistory`

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-15 17:49:03 -07:00
Martin Antos 1913320cbe Feature/add acreom loader (#5780)
adding new loader for [acreom](https://acreom.com) vaults. It's based on
the Obsidian loader with some additional text processing for acreom
specific markdown elements.

 @eyurtsev please take a look!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-15 11:53:00 -07:00
Zander Chase ae76e473e1 Add Tags for LLMs (#6229)
- [x] Add tracing tags to LLMs + Chat Models (both inheritable and
local)
- [x] Add tags for the run_on_dataset helper function(s)
2023-06-15 11:24:11 -07:00
Harrison Chase e82687ddf4 Harrison/use functions agent (#6185)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 08:18:50 -07:00
Kyle Roth c7db9febb0 count tokens for new OpenAI model versions (#6195)
Trying to call `ChatOpenAI.get_num_tokens_from_messages` returns the
following error for the newly announced models `gpt-3.5-turbo-0613` and
`gpt-4-0613`:

```
NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-3.5-turbo-0613.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.
```

This adds support for counting tokens for those models, by counting
tokens the same way they're counted for the previous versions of
`gpt-3.5-turbo` and `gpt-4`.

#### reviewers

  - @hwchase17
  - @agola11
2023-06-15 06:16:03 -07:00
xu0o0 7ad13cdbdb feat: add content_format param to ConfluenceLoader.load() (#5922)
Confluence API supports difference format of page content. The storage
format is the raw XML representation for storage. The view format is the
HTML representation for viewing with macros rendered as though it is
viewed by users.

Add the `content_format` parameter to `ConfluenceLoader.load()` to
specify the content format, this is
set to `ContentFormat.STORAGE` by default.

#### Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-14 16:56:28 -07:00
0xJordan c5a46e7435 feat: Add support for the Solidity language (#6054)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

## Add Solidity programming language support for code splitter.

Twitter: @0xjord4n_

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:25:02 -07:00
Zander Chase e0e3ef1c57 Update Name (#6136) 2023-06-13 22:25:36 -07:00
Zander Chase 4555ad5d1f Add Run Collector Callback (#6133)
Add a callback handler that can collect nested run objects. Useful for
evaluation.
2023-06-13 22:17:37 -07:00
Harrison Chase 6ac120f299 bump ver to 200 (#6130) 2023-06-13 19:33:51 -07:00
Harrison Chase e41f0b341c add functions agent (#6113) 2023-06-13 18:51:01 -07:00
Zander Chase b3b155d488 Return session name in runner response (#6112)
Makes it easier to then run evals w/o thinking about specifying a
session
2023-06-13 16:59:43 -07:00
Harrison Chase e74733ab9e support streaming for functions (#6115) 2023-06-13 15:26:26 -07:00
Nuno Campos 11ab0be11a Add support for tags (#5898)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-13 12:30:59 -07:00
Wenchen Li f9edf76e7c Implement max_marginal_relevance_search in VectorStore of Pinecone (#6056)
This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
2023-06-13 10:46:45 -07:00
Harrison Chase 970b2f9d38 convert tools to openai (#6100) 2023-06-13 10:40:49 -07:00
Harrison Chase 292accde2b support functions (#6099) 2023-06-13 10:32:58 -07:00
Keshav Kumar 8fdf88b8e3 Fix for ModuleNotFoundError while running langchain-server. Issue #5833 (#6077)
This PR fixes the error
`ModuleNotFoundError: No module named 'langchain.cli'`
Fixes https://github.com/hwchase17/langchain/issues/5833 (issue)
2023-06-13 08:37:07 -07:00
Zander Chase 0c52275bdb Use Run object from SDK (#6067)
Update the Run object in the tracer to extend that in the SDK to include
the parameters necessary for tracking/tracing
2023-06-13 07:14:11 -07:00
Harrison Chase cde1e8739a turn off repr (#6078) 2023-06-12 22:45:24 -07:00
Nuno Campos a9b3b2e327 Enable serialization for anthropic (#6049) 2023-06-12 22:39:10 -07:00
Harrison Chase 6ac5d80286 propogate kwargs fully (#6076) 2023-06-12 22:37:55 -07:00
Harrison Chase ec1a2adf9c improve tools (#6062) 2023-06-12 22:19:03 -07:00
Julius Lipp 5b6bbf4ab2 Add embaas document extraction api endpoints (#6048)
# Introduces embaas document extraction api endpoints

In this PR, we add support for embaas document extraction endpoints to
Text Embedding Models (with LLMs, in different PRs coming). We currently
offer the MTEB leaderboard top performers, will continue to add top
embedding models and soon add support for customers to deploy thier own
models. Additional Documentation + Infomation can be found
[here](https://embaas.io).

While developing this integration, I closely followed the patterns
established by other langchain integrations. Nonetheless, if there are
any aspects that require adjustments or if there's a better way to
present a new integration, let me know! :)

Additionally, I fixed some docs in the embeddings integration.

Related PR: #5976 

#### Who can review?
  DataLoaders
  - @eyurtsev
2023-06-12 19:13:52 -07:00
Zander Chase 2f0088039d Log tracer errors (#6066)
Example (would log several times if not for the helper fn. Would emit no
logs due to mulithreading previously)

![image](https://github.com/hwchase17/langchain/assets/130414180/070d25ae-1f06-4487-9617-0a6f66f3f01e)
2023-06-12 17:13:49 -07:00
Lance Martin b023f0c0f2 Text splitter for Markdown files by header (#5860)
This creates a new kind of text splitter for markdown files.

The user can supply a set of headers that they want to split the file
on.

We define a new text splitter class, `MarkdownHeaderTextSplitter`, that
does a few things:

(1) For each line, it determines the associated set of user-specified
headers
(2) It groups lines with common headers into splits

See notebook for example usage and test cases.
2023-06-12 15:46:42 -07:00
Harrison Chase d1561b74eb Harrison/cognitive search (#6011)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
2023-06-11 21:15:42 -07:00
wenmeng zhou bb7ac9edb5 add dashscope text embedding (#5929)
#### What I do
Adding embedding api for
[DashScope](https://help.aliyun.com/product/610100.html), which is the
DAMO Academy's multilingual text unified vector model based on the LLM
base. It caters to multiple mainstream languages worldwide and offers
high-quality vector services, helping developers quickly transform text
data into high-quality vector data. Currently supported languages
include Chinese, English, Spanish, French, Portuguese, Indonesian, and
more.

#### Who can review?

  Models
  - @hwchase17
  - @agola11

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 21:14:20 -07:00
Harrison Chase e05997c25e Harrison/hologres (#6012)
Co-authored-by: Changgeng Zhao <changgeng@nyu.edu>
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
2023-06-11 20:56:51 -07:00
ljeagle c5bce4a465 add from_documents interface in awadb vector store (#6023)
added new interface from_documents in awadb vector store
  @dev2049

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-11 19:35:03 -07:00
Zander Chase a197acfcd3 Update check (#6020)
We were assigning the name as None in on_chat_model_start then not
updating, resulting in a validation error.
2023-06-11 17:59:09 -07:00
Nuno Campos 18af149e91 nc/load (#5733)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 15:51:28 -07:00
Zander Chase 614cff89bc I before E (#6015) 2023-06-11 15:45:12 -07:00
Harrison Chase a7227ee01b Harrison/embaas (#6010)
Co-authored-by: Julius Lipp <43986145+juliuslipp@users.noreply.github.com>
2023-06-11 13:35:14 -07:00