Commit Graph

1862 Commits

Author SHA1 Message Date
Davis Chase 0f93de0a59 Release 0.0.166 (#4510) 2023-05-11 08:53:48 -07:00
Sunish Sheth 812e5f43f5 Add _type for all parsers (#4189)
Used for serialization. Also add test that recurses through
our subclasses to check they have them implemented

Would fix https://github.com/hwchase17/langchain/issues/3217
Blocking: https://github.com/mlflow/mlflow/pull/8297

---------

Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 01:27:58 -07:00
Akshaya Annavajhala b21d7c138c Callback Handler for MLflow (#4150)
Rebased Mahmedk's PR with the callback refactor and added the example
requested by hwchase plus a couple minor fixes

---------

Co-authored-by: Ahmed K <77802633+mahmedk@users.noreply.github.com>
Co-authored-by: Ahmed K <mda3k27@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 01:10:40 -07:00
kYLe 0d51a1f12b Add LLMs support for Anyscale Service (#4350)
Add Anyscale service integration under LLM

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 00:39:59 -07:00
Kristóf Dombi 99b2400048 [Docs]: Add Kinsta to the list of deployment providers (#4445)
We're fans of the LangChain framework thus we wanted to make sure we
provide an easy way for our customers to be able to utilize this
framework for their LLM-powered applications at our platform.
2023-05-11 00:29:48 -07:00
Evan Jones f668251948 parameterized distance metrics; lint; format; tests (#4375)
# Parameterize Redis vectorstore index

Redis vectorstore allows for three different distance metrics: `L2`
(flat L2), `COSINE`, and `IP` (inner product). Currently, the
`Redis._create_index` method hard codes the distance metric to COSINE.

I've parameterized this as an argument in the `Redis.from_texts` method
-- pretty simple.
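
As a rough illustration of the idea (not the code from this PR), the metric can be validated against a literal type and passed into the index options instead of being hard-coded. The helper name `build_index_options` and the option keys are assumptions made for this sketch; only the three metric names and the `REDIS_DISTANCE_METRICS` literal come from the description.

```python
from typing import Literal, get_args

# The three metrics named in the commit message.
REDIS_DISTANCE_METRICS = Literal["COSINE", "IP", "L2"]


def build_index_options(
    dim: int, distance_metric: REDIS_DISTANCE_METRICS = "COSINE"
) -> dict:
    """Hypothetical helper: return vector-field options with a caller-chosen metric."""
    if distance_metric not in get_args(REDIS_DISTANCE_METRICS):
        raise ValueError(f"Unsupported distance metric: {distance_metric!r}")
    # COSINE stays the default, so existing callers see no change.
    return {"TYPE": "FLOAT32", "DIM": dim, "DISTANCE_METRIC": distance_metric}
```

Keeping `COSINE` as the default preserves the earlier behavior for callers that do not pass the argument.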

Fixes #4368 

## Before submitting

I've added an integration test showing indexes can be instantiated with
all three values in the `REDIS_DISTANCE_METRICS` literal. An example
notebook seemed overkill here. Normal API documentation would be more
appropriate, but no standards are in place for that yet.

## Who can review?

Not sure who's responsible for the vectorstore module... Maybe @eyurtsev
/ @hwchase17 / @agola11 ?
2023-05-11 00:20:01 -07:00
Nick Omeyer f46710d408 Fix minor issues in self-query retriever prompt formatting (#4450)
# Fix minor issues in self-query retriever prompt formatting

I noticed a few minor issues with the self-query retriever's prompt
while using it, so here's a PR to fix them 😇

2023-05-11 00:10:41 -07:00
Zander Chase d969f43ed8 Load HuggingFace Tool (#4475)
# Add option to `load_huggingface_tool`

Expose a method to load a huggingface Tool from the HF hub

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 00:07:36 -07:00
Davis Chase cd01de49cf Update contribution guidelines (#4431)
Provide more guidance on PRs
2023-05-11 00:05:25 -07:00
Eugene Yurtsev 146616aa5d Test workflow, fix minor typos (#4495)
# Fix 2 minor typos in test workflow.

This PR does not result in any functional changes.
2023-05-10 22:36:50 -04:00
Eugene Yurtsev f373883c1a Refactor test workflow (#4457)
# Refactor the test workflow

This PR refactors the tests to run using a single test workflow. This
makes it easier to relaunch failing tests and see in the UI which test
failed since the jobs are grouped together.

2023-05-10 21:57:39 -04:00
Davis Chase b77e103ca6 Add aleph alpha api key attribute (#4489)
@tugot17 applied your change to master
2023-05-10 17:29:57 -07:00
Harrison Chase 3ce29cb4a6 Harrison/new search (#4359)
Co-authored-by: Jiaping(JP) Zhang <vincentzhangv@gmail.com>
2023-05-10 17:09:16 -07:00
Jakob Heyder 545ae8b756 Fix: Add run_manager on all AgentFinish returns in AgentExecutor (#4466) 2023-05-10 16:25:23 -07:00
Ankush Gola ae8d6d5a89 Add docs for tracing environment variable (#4477) 2023-05-10 16:07:02 -07:00
Davis Chase 9ec60ad832 Add azure cognitive search retriever (#4467)
All credit to @UmerHA, made a couple small changes

---------

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
2023-05-10 15:27:27 -07:00
Davis Chase 46b100ea63 Add DocArray vector stores (#4483)
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made
a few small changes to get it across the finish line

---------

Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
2023-05-10 15:22:16 -07:00
Davis Chase f2a536b445 release 165 (#4486)
bump version
2023-05-10 15:20:43 -07:00
Harrison Chase b2f920e891 add tracing v2 env var (#4465)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-05-10 11:08:29 -07:00
Zander Chase 9231143f91 Fix Duplicate trust_remote_code in pipeline (#4369)
### Fix issue with duplicate specification of `trust_remote_code` in
HuggingFacePipeline

Fixes #4351
2023-05-10 10:21:54 -07:00
Davis Chase 6fbdb9ce51 Release 0.0.164 (#4454) 2023-05-10 08:44:14 -07:00
Davis Chase 04475bea7d Mv plan and execute to experimental (#4459) 2023-05-10 08:31:53 -07:00
netseye 1ad180f6de Add request timeout to openai embedding (#4144)
Add request_timeout field to openai embedding. Defaults to None

---------

Co-authored-by: Jeakin <Jeakin@botu.cc>
2023-05-10 08:11:32 -07:00
zvrr 274dc4bc53 add clickhouse prompt (#4456)
# Add clickhouse prompt

Add clickhouse database sql prompt
2023-05-10 10:22:42 -04:00
Paresh Mathur 05e749d9fe make running specific unit tests easier (#4336)
I find it's easier to do TDD if I can run specific unit tests. I know
watch is there, but some people prefer running their tests manually.
2023-05-10 09:39:22 -04:00
Eugene Yurtsev 80558b5b27 Add workflow for testing with all deps (#4410)
# Add action to test with all dependencies installed

PR adds a custom action for setting up poetry that allows specifying a
cache key:
https://github.com/actions/setup-python/issues/505#issuecomment-1273013236

This makes it possible to run 2 types of unit tests: 

(1) unit tests with only core dependencies
(2) unit tests with extended dependencies (e.g., those that rely on an
optional pdf parsing library)


As part of this PR, we're moving some pdf parsing tests into the
unit-tests section and making sure that these unit tests get executed
when running with extended dependencies.
2023-05-10 09:35:07 -04:00
Matt Robinson 3637d6da6e feat: add loader for open office odt files (#4405)
# ODF File Loader

Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.

### Testing

The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).

```python
from langchain.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()

loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
2023-05-10 01:37:17 -07:00
Zander Chase 65f85af242 Improve math chain error msg (#4415) 2023-05-10 01:08:01 -07:00
Davis Chase f6c97e6af4 Fix Lark import error (#4421)
Any import that touches langchain.retrievers currently requires Lark.
Here's one attempt at a fix. Not very pretty, very open to other ideas.
Alternatives I thought of are 1) make Lark a requirement, 2) put
everything in parser.py in the try/except. Neither sounds much better
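
One common way to keep an optional dependency from breaking unrelated imports is to defer the failure until the feature is actually used. The sketch below is a generic illustration of that pattern, not the change made in this commit; `optional_import` and `require_lark` are hypothetical names.

```python
import importlib


def optional_import(module_name: str):
    """Return the module if it is installed, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Importing this file never fails, whether or not `lark` is installed.
lark = optional_import("lark")


def require_lark():
    """Raise a clear error only when the Lark-backed feature is requested."""
    if lark is None:
        raise ImportError("This feature requires the optional `lark` package.")
    return lark
```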

Related to #4316, #4275
2023-05-10 01:07:34 -07:00
Harrison Chase f0cfed636f change nb name 2023-05-09 21:22:35 -07:00
Harrison Chase 6b8d144ccc Harrison/plan and solve (#4422) 2023-05-09 21:07:56 -07:00
StephaneBereux d383c0cb43 fixed the filtering error in chromadb (#1621)
Fixed two small bugs (as reported in issue #1619) in filtering by
metadata for `chroma` databases:
- `langchain.vectorstores.chroma.similarity_search` takes a
`filter` input parameter but does not forward it to
`langchain.vectorstores.chroma.similarity_search_with_score`.
- `langchain.vectorstores.chroma.similarity_search_by_vector`
does not accept this parameter, although supporting it would be useful,
add no complexity, and make it consistent with the signatures of the
other two functions.
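
To make the first bug concrete, here is a minimal sketch using a toy in-memory store rather than the real Chroma wrapper. `Doc`, `ToyStore`, and the word-overlap score are assumptions for illustration; the point is only that the outer method passes `filter` through to the inner one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Doc:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyStore:
    """Toy in-memory store; not the real Chroma wrapper."""

    def __init__(self, docs: List[Doc]) -> None:
        self.docs = docs

    def similarity_search_with_score(
        self, query: str, k: int = 4, filter: Optional[Dict[str, str]] = None
    ) -> List[Tuple[Doc, int]]:
        wanted = filter or {}
        # Keep only documents whose metadata matches every filter entry.
        kept = [
            d for d in self.docs
            if all(d.metadata.get(key) == value for key, value in wanted.items())
        ]
        # Toy score: number of words shared with the query (higher is better).
        words = set(query.lower().split())
        scored = [(d, len(words & set(d.text.lower().split()))) for d in kept]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

    def similarity_search(
        self, query: str, k: int = 4, filter: Optional[Dict[str, str]] = None
    ) -> List[Doc]:
        # Forward `filter` instead of silently dropping it.
        pairs = self.similarity_search_with_score(query, k, filter=filter)
        return [doc for doc, _ in pairs]
```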

Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
2023-05-09 16:43:00 -07:00
jrhe 28091c2101 Use passed LLM for default chain in MultiPromptChain (#4418)
Currently, MultiPromptChain instantiates a ChatOpenAI LLM instance for
the default chain to use if none of the prompts passed match. This seems
like an error as it means that you can't use your choice of LLM, or
configure how to instantiate the default LLM (e.g. passing in an API key
that isn't in the usual env variable).
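
The intent can be sketched without any LangChain classes: the fallback should be the LLM the caller passed in, not a freshly constructed one. Everything here (`build_router`, the `LLM` callable type) is a hypothetical stand-in for illustration, not the `MultiPromptChain` implementation.

```python
from typing import Callable, Dict, Optional

# For this sketch an "LLM" is just a function from prompt to completion.
LLM = Callable[[str], str]


def build_router(
    llm: LLM, destinations: Dict[str, LLM], default: Optional[LLM] = None
) -> Callable[[str, str], str]:
    # Fall back to the caller's LLM rather than constructing a hard-coded one.
    fallback = default or llm

    def route(name: str, prompt: str) -> str:
        return destinations.get(name, fallback)(prompt)

    return route
```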
2023-05-09 16:15:25 -07:00
Davis Chase 5c8e12558d Dev2049/pinecone try except (#4424)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bernie G <bernie.gandin2@gmail.com>
2023-05-09 16:03:19 -07:00
Rukmani 2b14036126 Update WhatsAppChatLoader to include the character ~ in the sender name (#4420)
Fixes #4153

If the sender of a message in a group chat isn't in your contact list,
they will appear with a ~ prefix in the exported chat. This PR adds
support for parsing such lines.
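
A small sketch of the parsing idea, assuming a line shape of `<date>, <time> - <sender>: <message>`. Real exports vary by platform and locale, and this pattern is not the loader's actual regular expression; it only shows a sender character class that admits `~`.

```python
import re

# Assumed line shape; the sender class includes "~" so non-contact names match.
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}) - "
    r"(?P<sender>[~\w\s]+): (?P<text>.*)$"
)


def parse_line(line: str):
    """Return (sender, text) for a chat line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("sender"), match.group("text")
```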
2023-05-09 15:00:04 -07:00
Zander Chase f2150285a4 Fix nested runs example ID (#4413)
#### Only reference example ID on the parent run

Previously, I was assigning the example ID to every child run. 
Adds a test.
2023-05-09 12:21:53 -07:00
Davis Chase e4ca511ec8 Delete comment (#4412) 2023-05-09 10:38:44 -07:00
mbchang 9fafe7b2b9 fix: remove unnecessary line of code (#4408)
Removes unnecessary line of code in
https://python.langchain.com/en/latest/use_cases/agent_simulations/two_agent_debate_tools.html
2023-05-09 10:35:09 -07:00
Aivin V. Solatorio 6335cb5b3a Add support for Qdrant nested filter (#4354)
# Add support for Qdrant nested filter

This extends the filter functionality for the Qdrant vectorstore. The
current filter implementation is limited to a single-level metadata
structure; however, Qdrant supports nested metadata filtering. This
extends the functionality for users to maximize the filter functionality
when using Qdrant as the vectorstore.
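
One way to picture nested filtering is to flatten a nested filter dict into dotted key paths, the notation the Qdrant reference below uses for nested keys. This is an illustrative sketch only; `flatten_filter` is a hypothetical helper and not necessarily how the PR builds its conditions.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten_filter(
    filter_: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_key, value) pairs for a possibly nested filter dict."""
    for key, value in filter_.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so every leaf gets its full dotted path.
            yield from flatten_filter(value, path)
        else:
            yield path, value
```

Each resulting pair could then be turned into one match condition on the corresponding payload key.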

Reference: https://qdrant.tech/documentation/filtering/#nested-key

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
2023-05-09 10:34:11 -07:00
Martin Holzhauer 872605a5c5 Add an option to extract more metadata from crawled websites (#4347)
This PR makes it possible to extract more metadata from websites for
later use.

My use case:
parsing ld+json or microdata from sites and storing it as structured data
in the metadata field
2023-05-09 10:18:33 -07:00
Leonid Ganeline ce15ffae6a added Wikipedia retriever (#4302)
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wraps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
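
The wrapping described in the first bullet is a plain adapter. The sketch below uses a fake wrapper that returns canned strings so it runs without network access; `FakeWikipediaWrapper` and `WrapperRetriever` are hypothetical names, and the real classes return document objects rather than strings.

```python
from typing import List


class FakeWikipediaWrapper:
    """Stand-in for an API wrapper exposing load(); returns canned text."""

    def load(self, query: str) -> List[str]:
        return [f"Article about {query}"]


class WrapperRetriever:
    """Adapter that exposes the wrapper's load() under the retriever method name."""

    def __init__(self, wrapper: FakeWikipediaWrapper) -> None:
        self.wrapper = wrapper

    def get_relevant_documents(self, query: str) -> List[str]:
        return self.wrapper.load(query)
```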
2023-05-09 10:08:39 -07:00
Davis Chase ea83eed9ba Bump to version 0.0.163 (#4382) 2023-05-09 07:51:51 -07:00
Prayson Wilfred Daniel 2b4ba203f7 query correction from when to what (#4383)
# Minor Wording Documentation Change 

```python
agent_chain.run("When's my friend Eric's surname?")
# Answer with 'Zhu'
```

is changed to

```python
agent_chain.run("What's my friend Eric's surname?")
# Answer with 'Zhu'
```

I think "when" is a leftover from the old query, which was "When's my
friend Eric's birthday?".
2023-05-09 07:42:47 -07:00
Eugene Yurtsev 2ceb807da2 Add PDF parser implementations (#4356)
# Add PDF parser implementations

This PR separates the data loading from the parsing for a number of
existing PDF loaders.

Parser tests have been designed to help encourage developers to create a
consistent interface for parsing PDFs.

This interface can be made more consistent in the future by adding
information into the initializer on desired behavior with respect to splitting by
page etc.

This code is expected to be backwards compatible -- with the exception
of a bug fix with pymupdf parser which was returning `bytes` in the page
content rather than strings.

Also changing the lazy parser method of document loader to return an
Iterator rather than Iterable over documents.
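
A minimal sketch of the separation described above, assuming simplified `Blob` and `Document` types and a toy parser that treats form feeds as page breaks. None of these definitions are the PR's classes; the sketch only shows a parser that receives already-loaded bytes, yields documents lazily, and returns `str` page content rather than `bytes`.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Blob:
    """Raw bytes plus where they came from; loading happens elsewhere."""
    data: bytes
    source: str


@dataclass
class Document:
    page_content: str
    metadata: dict


class FormFeedPageParser:
    """Toy parser: treats form-feed characters as page breaks."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        # Decode once so page_content is str, not bytes.
        text = blob.data.decode("utf-8")
        for page_number, page in enumerate(text.split("\f")):
            yield Document(
                page_content=page,
                metadata={"source": blob.source, "page": page_number},
            )
```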

2023-05-09 10:24:17 -04:00
Eugene Yurtsev ae0c3382dd Add MimeType based parser (#4376)
# Add MimeType Based Parser

This PR adds a MimeType Based Parser. The parser inspects the mime-type
of the blob it is parsing and delegates to the sub-parser registered
for that mime-type.
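
The delegation can be sketched as a lookup table from mime-type to handler. The names below (`MimeTypeDispatcher`, `Handler`) and the error raised for unknown types are assumptions for illustration, not the PR's interface.

```python
from typing import Callable, Dict, List

# For this sketch a sub-parser is a function from raw bytes to text chunks.
Handler = Callable[[bytes], List[str]]


class MimeTypeDispatcher:
    """Pick a sub-parser based on the mime-type of the incoming data."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self.handlers = handlers

    def parse(self, data: bytes, mimetype: str) -> List[str]:
        try:
            handler = self.handlers[mimetype]
        except KeyError:
            raise ValueError(f"Unsupported mime type: {mimetype}") from None
        return handler(data)
```

A caller would register one handler per supported type, for example `{"text/plain": lambda b: [b.decode("utf-8")]}`.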

## Before submitting

Waiting on adding notebooks until more implementations are landed. 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


@hwchase17
@vowelparrot
2023-05-09 10:22:56 -04:00
Leonid Ganeline c485e7ab59 added GitHub star number (#4214)
added GitHub star number with a link to the `GitHub star history chart`
This is an interesting chart https://star-history.com/#hwchase17/langchain :)
2023-05-09 09:39:53 -04:00
Heath 0d568daacb Update writer integration (#4363)
# Update Writer LLM integration

Changes the parameters and base URL to be in line with Writer's current
API.
Based on the documentation on this page:
https://dev.writer.com/reference/completions-1
2023-05-08 21:59:46 -07:00
BioErrorLog 04f765b838 Fix grammar in Text Splitters docs (#4373)
# Fix grammar in Text Splitters docs

Just a small fix of grammar in the documentation:

"That means there two different axes" -> "That means there are two
different axes"
2023-05-08 22:38:40 -04:00
Zander Chase c73cec5ac1 Add Example Notebook for LCP Client (#4207)
Add a notebook in the `experimental/` directory detailing:
- How to capture traces with the v2 endpoint
- How to create datasets
- How to run traces over the dataset
2023-05-08 18:33:19 -07:00
mbchang f1401a6dff new example: two agent debate with tools (#4024) 2023-05-08 17:10:44 -07:00