Commit Graph

595 Commits

Author SHA1 Message Date
Kacper Łukawski ab1a3cccac Hotfix: Qdrant content retrieval (revert: #1088) (#1093)
The #1088 introduced a bug in Qdrant integration. That PR reverts those
changes and provides class attributes to ensure consistent payload keys.
In addition to that, an exception will be thrown if any of texts is None
(that could have been an issue reported in #1087)
2023-02-16 12:46:06 -08:00
Harrison Chase 6322b6f657 bump version 0.0.88 (#1090) 2023-02-16 07:32:32 -08:00
Francisco Ingham 3462130e2d Modify number of types of chains (#1089)
Changed number of types of chains to make it consistent with the rest of
the docs
2023-02-16 07:06:30 -08:00
Rishabh Raizada 5d11e5da40 Update qdrant.py (#1088)
Fixes #1087
2023-02-16 07:06:02 -08:00
Harrison Chase 7745505482 chat qa with sources (#1084) 2023-02-16 00:29:47 -08:00
Harrison Chase badeeb37b0 fix stuff count (#1083) 2023-02-15 23:57:13 -08:00
Harrison Chase 971458c5de docs for batch size (#1082) 2023-02-15 23:53:56 -08:00
Harrison Chase 5e10e19bfe Harrison/align table (#1081)
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-02-15 23:53:37 -08:00
Harrison Chase c60954d0f8 Harrison/telegram loader (#1080)
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
2023-02-15 23:24:32 -08:00
Dennis Antela Martinez a1c296bc3c docs: increase width (#1049)
This addresses #948.

I set the documentation max width to 2560px, but can be adjusted - see
screenshot below.

<img width="1741" alt="Screenshot 2023-02-14 at 13 05 57"
src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">
2023-02-15 23:07:01 -08:00
Harrison Chase c96ac3e591 Harrison/semantic subset (#1079)
Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu>
2023-02-15 23:06:48 -08:00
Harrison Chase 19c2797bed add anthropic example (#1041)
Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>
Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com>
2023-02-15 23:04:28 -08:00
blob42 3ecdea8be4 SearxNG meta search api helper (#854)
This is a work in progress PR to track my progres.

## TODO:

- [x]  Get results using the specifed searx host
- [x]  Prioritize returning an  `answer`  or results otherwise
    - [ ] expose the field `infobox` when available
    - [ ] expose `score` of result to help agent's decision
- [ ] expose the `suggestions` field to agents so they could try new
queries if no results are found with the orignial query ?

- [ ] Dynamic tool description for agents ?
- Searx offers many engines and a search syntax that agents can take
advantage of. It would be nice to generate a dynamic Tool description so
that it can be used many times as a tool but for different purposes.

- [x]  Limit number of results
- [ ]   Implement paging
- [x]  Miror the usage of the Google Search tool
- [x] easy selection of search engines
- [x]  Documentation
    - [ ] update HowTo guide notebook on Search Tools
- [ ] Handle async 
- [ ]  Tests

###  Add examples / documentation on possible uses with
 - [ ]  getting factual answers with `!wiki` option and `infoboxes`
 - [ ]  getting `suggestions`
 - [ ]  getting `corrections`

---------

Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-02-15 23:03:57 -08:00
Hasegawa Yuya e08961ab25 Fixed openai embeddings to be safe by batching them based on token size calculation. (#991)
I modified the logic of the batch calculation for embedding according to
this cookbook

https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
2023-02-15 23:02:32 -08:00
seanaedmiston f0a258555b Support similarity search by vector (in FAISS) (#961)
Alternate implementation to PR #960 Again - only FAISS is implemented.
If accepted can add this to other vectorstores or leave as
NotImplemented? Suggestions welcome...
2023-02-15 22:50:00 -08:00
Jonathan Pedoeem 05ad399abe Update PromptLayerOpenAI LLM to include support for ASYNC API (#1066)
This PR updates `PromptLayerOpenAI` to now support requests using the
[Async
API](https://langchain.readthedocs.io/en/latest/modules/llms/async_llm.html)
It also updates the documentation on Async API to let users know that
PromptLayerOpenAI also supports this.

`PromptLayerOpenAI` now redefines `_agenerate` a similar was to how it
redefines `_generate`
2023-02-15 22:48:09 -08:00
Harrison Chase 98186ef180 Harrison/evernote nb (#1078)
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
2023-02-15 22:47:30 -08:00
rogerserper e46cd3b7db Google Search API integration with serper.dev (wrapper, tests, docs, … (#909)
Adds Google Search integration with [Serper](https://serper.dev) a
low-cost alternative to SerpAPI (10x cheaper + generous free tier).
Includes documentation, tests and examples. Hopefully I am not missing
anything.

Developers can sign up for a free account at
[serper.dev](https://serper.dev) and obtain an api key.

## Usage

```python
from langchain.utilities import GoogleSerperAPIWrapper
from langchain.llms.openai import OpenAI
from langchain.agents import initialize_agent, Tool

import os
os.environ["SERPER_API_KEY"] = ""
os.environ['OPENAI_API_KEY'] = ""

llm = OpenAI(temperature=0)
search = GoogleSerperAPIWrapper()
tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run
    )
]

self_ask_with_search = initialize_agent(tools, llm, agent="self-ask-with-search", verbose=True)
self_ask_with_search.run("What is the hometown of the reigning men's U.S. Open champion?")
```

### Output
```
Entering new AgentExecutor chain...
 Yes.
Follow up: Who is the reigning men's U.S. Open champion?
Intermediate answer: Current champions Carlos Alcaraz, 2022 men's singles champion.
Follow up: Where is Carlos Alcaraz from?
Intermediate answer: El Palmar, Spain
So the final answer is: El Palmar, Spain

> Finished chain.

'El Palmar, Spain'
```
2023-02-15 22:47:17 -08:00
Harrison Chase 52753066ef Harrison/handle stop tokens ai21 (#1077)
Co-authored-by: Andrew Huang <jhuang16888@gmail.com>
2023-02-15 22:44:55 -08:00
Akshay d8ed286200 Update and rename everynote.py to evernote.py (#1060)
Updating this base file as well as the .ipynb file of the example on the
website:

https://github.com/hwchase17/langchain/compare/master...akshayvkt:langchain:patch-1

https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/everynote.html
2023-02-15 22:41:42 -08:00
Jeff Huber 34cba2da32 Fix typo in integration with Chroma (#1070)
We introduced a breaking change but missed this call. This PR fixes
`langchain` to work with upstream `chroma`.
2023-02-15 22:37:58 -08:00
Jonathan Pedoeem 05df480376 Update PromptLayerOpenAI LLM usage instructions in documentation (#1053)
This PR updates the usage instructions for PromptLayerOpenAI in
Langchain's documentation. The updated instructions provide more detail
and conform better to the style of other LLM integration documentation
pages.

No code changes were made in this PR, only improvements to the
documentation. This update will make it easier for users to understand
how to use `PromptLayerOpenAI`
2023-02-15 22:37:48 -08:00
Matt Robinson 3ea1e5af1e feat: added element metadata to unstructured loader (#1068)
### Summary

Adds tracked metadata from `unstructured` elements to the document
metadata when `UnstructuredFileLoader` is used in `"elements"` mode.
Tracked metadata is available in `unstructured>=0.4.9`, but the code is
written for backward compatibility with older `unstructured` versions.

### Testing

Before running, make sure to upgrade to `unstructured==0.4.9`. In the
code snippet below, you should see `page_number`, `filename`, and
`category` in the metadata for each document. `doc[0]` should have
`page_number: 1` and `doc[-1]` should have `page_number: 2`. The example
document is `layout-parser-paper-fast.pdf` from the [`unstructured`
sample
docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders import UnstructuredFileLoader
loader = UnstructuredFileLoader(file_path=f"layout-parser-paper-fast.pdf", mode="elements")
docs = loader.load()
```
2023-02-15 22:36:18 -08:00
Harrison Chase bac676c8e7 bump version (#1057) 2023-02-15 07:09:10 -08:00
Ankush Gola d8ac274fc2 add to async chain notebook (#1056) 2023-02-14 18:20:38 -08:00
Ankush Gola caa8e4742e Enable streaming for OpenAI LLM (#986)
* Support a callback `on_llm_new_token` that users can implement when
`OpenAI.streaming` is set to `True`
2023-02-14 15:06:14 -08:00
Harrison Chase f05f025e41 bump version to 0086 (#1050) 2023-02-14 07:14:40 -08:00
Sasmitha Manathunga c67c5383fd docs: fix typo in notebook (#1046) 2023-02-14 07:06:08 -08:00
Harrison Chase 88bebb4caa Harrison/llm integrations (#1039)
Co-authored-by: jped <jonathanped@gmail.com>
Co-authored-by: Justin Torre <justintorre75@gmail.com>
Co-authored-by: Ivan Vendrov <ivan@anthropic.com>
2023-02-13 22:06:25 -08:00
Harrison Chase ec727bf166 Align table info (#999) (#1034)
Currently the chain is getting the column names and types on the one
side and the example rows on the other. It is easier for the llm to read
the table information if the column name and examples are shown together
so that it can easily understand to which columns do the examples refer
to. For an instantiation of this, please refer to the changes in the
`sqlite.ipynb` notebook.

Also changed `eval` for `ast.literal_eval` when interpreting the results
from the sample row query since it is a better practice.

---------

Co-authored-by: Francisco Ingham <>

---------

Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-02-13 21:48:41 -08:00
Harrison Chase 8c45f06d58 Harrison/standarize prompt loading (#1036)
Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>
2023-02-13 21:48:09 -08:00
Enrico Shippole f30dcc6359 Add GooseAI, CerebriumAI, Petals, ForefrontAI (#981)
Add GooseAI, CerebriumAI, Petals, ForefrontAI
2023-02-13 21:20:19 -08:00
Anton Troynikov d43d430d86 Chroma persistence (#1028)
This PR adds persistence to the Chroma vector store.

Users can supply a `persist_directory` with any of the `Chroma` creation
methods. If supplied, the store will be automatically persisted at that
directory.

If a user creates a new `Chroma` instance with the same persistence
directory, it will get loaded up automatically. If they use `from_texts`
or `from_documents` in this way, the documents will be loaded into the
existing store.

There is the chance of some funky behavior if the user passes a
different embedding function from the one used to create the collection
- we will make this easier in future updates. For now, we log a warning.
2023-02-13 21:09:06 -08:00
Harrison Chase 012a6dfb16 Harrison/makefile (#1033)
Co-authored-by: blob42 <contact@blob42.xyz>
Co-authored-by: blob42 <spike@w530>
2023-02-13 21:08:47 -08:00
Harrison Chase 6a31a59400 add links (#1027) 2023-02-13 16:33:30 -08:00
Oliver Klingefjord 20889205e8 Added retry for openai.error.ServiceUnavailableError (#1022)
Imho retries should be performed for ServiceUnavailableError (which
tends to happen to me quite often).
2023-02-13 13:30:06 -08:00
Harrison Chase fc2502cd81 bump version to 0085 (#1017) 2023-02-13 07:32:36 -08:00
Harrison Chase 0f0e69adce agent refactors (#997) 2023-02-12 23:02:13 -08:00
Harrison Chase 7fb33fca47 chroma docs (#1012) 2023-02-12 23:02:01 -08:00
Harrison Chase 0c553d2064 Harrion/kg (#1016)
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
2023-02-12 23:01:26 -08:00
Anton Troynikov 78abd277ff Chroma in LangChain (#1010)
Chroma is a simple to use, open-source, zero-config, zero setup
vectorstore.

Simply `pip install chromadb`, and you're good to go. 

Out-of-the-box Chroma is suitable for most LangChain workloads, but is
highly flexible. I tested to 1M embs on my M1 mac, with out issues and
reasonably fast query times.

Look out for future releases as we integrate more Chroma features with
LangChain!
2023-02-12 17:43:48 -08:00
cragwolfe 05d8969c79 Unstructured example notebook: add a pdf, related deps (#1011)
Updates the Unstructured example notebook with a PDF example. Includes
additional dependencies for PDF processing (and images, etc).
2023-02-12 14:56:48 -08:00
Dhruv Anand 03e5794978 typo fix on chat vector db docs (#1007)
simple typo fix: because --> between
2023-02-12 12:09:21 -08:00
Harrison Chase 6d44a2285c bump version to 0084 (#1005) 2023-02-12 07:47:10 -08:00
Harrison Chase 0998577dfe Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
Harrison Chase bbb06ca4cf pdfminer (#1003) 2023-02-12 07:29:26 -08:00
Francisco Ingham 0b6aa6a024 Added initial capital letter to bullet points that had it missing (#1000)
Co-authored-by: Francisco Ingham <>
2023-02-11 20:31:34 -08:00
Harrison Chase 10e7297306 Harrison/fake llm (#990)
Co-authored-by: Stefan Keselj <skeselj@princeton.edu>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-11 15:12:35 -08:00
Harrison Chase e51fad1488 Harrison/0083 (#996)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-11 08:29:28 -08:00
Shahriar Tajbakhsh b7747017d7 Import of declarative_base when SQLAlchemy <1.4 (#883)
In
[pyproject.toml](https://github.com/hwchase17/langchain/blob/master/pyproject.toml),
the expectation is `SQLAlchemy = "^1"`. But, the way `declarative_base`
is imported in
[cache.py](https://github.com/hwchase17/langchain/blob/master/langchain/cache.py)
will only work with SQLAlchemy >=1.4. This PR makes sure Langchain can
be run in environments with SQLAlchemy <1.4
2023-02-10 18:33:47 -08:00