Commit Graph

27 Commits

Author SHA1 Message Date
Harrison Chase d90a287d8f Harrison/updating docs (#1196) 2023-02-20 22:54:26 -08:00
Dennis Antela Martinez 23243ae69c add gitbook document loader (#1180)
Added a GitBook document loader. It lets you either (1) fetch text from
a single GitBook page or (2) fetch all relative paths and return their
content as Documents.

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but I'm happy to remove that change and move
the logic into the `GitbookLoader` itself.
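
As a rough illustration of the two modes described above (not the code
from this PR), the sketch below picks which URLs would be fetched. The
names `resolve_paths`, `pages_to_load`, and `load_all_paths`, and the
example.com URL, are hypothetical and chosen only for this example.

```python
from urllib.parse import urljoin


def resolve_paths(base_url, relative_paths):
    """Turn relative paths into absolute URLs under base_url."""
    # Ensure the base ends with "/" so urljoin appends instead of replacing.
    base = base_url.rstrip("/") + "/"
    return [urljoin(base, path.lstrip("/")) for path in relative_paths]


def pages_to_load(base_url, relative_paths, load_all_paths=False):
    """Return the URLs to fetch: one page, or every discovered path."""
    if not load_all_paths:
        # Mode (1): only the single page that was requested.
        return [base_url]
    # Mode (2): every relative path, resolved against the base URL.
    return resolve_paths(base_url, relative_paths)


# Hypothetical paths, as they might be discovered on a documentation site.
paths = ["/getting-started", "api/reference"]
single = pages_to_load("https://docs.example.com", paths)
everything = pages_to_load("https://docs.example.com", paths, load_all_paths=True)
```

Each resulting URL would then be scraped and wrapped in its own Document.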
2023-02-20 20:05:04 -08:00
Harrison Chase 65cc81c479 directory loader improvements (#1162) 2023-02-19 20:47:08 -08:00
Harrison Chase fb3c73d194 add srt loader (#1140) 2023-02-18 10:58:39 -08:00
Harrison Chase 483821ea3b fix docs (#1133) 2023-02-18 08:13:54 -08:00
Harrison Chase d5f3dfa1e1 Harrison/hn loader (#1130)
Co-authored-by: William X <william.y.xuan@gmail.com>
2023-02-17 15:15:02 -08:00
Harrison Chase c60954d0f8 Harrison/telegram loader (#1080)
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
2023-02-15 23:24:32 -08:00
Harrison Chase 98186ef180 Harrison/evernote nb (#1078)
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
2023-02-15 22:47:30 -08:00
Harrison Chase 7fb33fca47 chroma docs (#1012) 2023-02-12 23:02:01 -08:00
cragwolfe 05d8969c79 Unstructured example notebook: add a pdf, related deps (#1011)
Updates the Unstructured example notebook with a PDF example. Adds the
extra dependencies needed for PDF processing (and for images, etc.).
2023-02-12 14:56:48 -08:00
Harrison Chase 0998577dfe Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
Harrison Chase bbb06ca4cf pdfminer (#1003) 2023-02-12 07:29:26 -08:00
Francisco Ingham 0b6aa6a024 Added initial capital letter to bullet points that had it missing (#1000)
Co-authored-by: Francisco Ingham <>
2023-02-11 20:31:34 -08:00
Harrison Chase 2e96704d59 Harrison/airbyte (#989)
Co-authored-by: zanderchase <zanderchase@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
2023-02-10 18:08:00 -08:00
zanderchase c2d1d903fa Zander/online pdf loader (#984) 2023-02-10 15:42:30 -08:00
Matt Robinson 07a407d89a feat: adds UnstructuredURLLoader for loading data from urls (#979)
### Summary

Adds an `UnstructuredURLLoader` that supports loading data from a list of
URLs.


### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

# URLs to load.
urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023",
]
# Fetch each URL and return the parsed content as Documents.
loader = UnstructuredURLLoader(urls=urls)
raw_documents = loader.load()
```
2023-02-10 10:18:38 -08:00
Harrison Chase c64f98e2bb Harrison/format agent instructions (#973)
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
2023-02-10 10:07:26 -08:00
Harrison Chase 5469d898a9 Harrison/everynote (#974)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-10 08:02:35 -08:00
Harrison Chase 01fa2d8117 Harrison/youtube fixes (#955)
Co-authored-by: Ji <jizhang.work@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-09 08:12:22 -08:00
zanderchase 8e126bc9bd adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
Harrison Chase 3e1901e1aa gutenberg books (#946)
Co-authored-by: zanderchase <zander@unfold.ag>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-08 12:00:47 -08:00
Harrison Chase 44ecec3896 Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
Harrison Chase 637c0d6508 Harrison/obsidian (#920) 2023-02-06 22:21:16 -08:00
Ankush Gola 6bd1529cb7 add GoogleDriveLoader (#914)
Only handles Google Docs files for now.
2023-02-06 21:44:35 -08:00
Harrison Chase 2ec25ddd4c add unstructured examples (#913) 2023-02-06 18:13:46 -08:00
Harrison Chase 71e662e88d update docs (#905) 2023-02-06 00:26:20 -08:00
Harrison Chase 53d56d7650 Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00