mirror of
https://github.com/kennethreitz/langchain.git
synced 2026-06-05 23:00:18 +00:00
3c489be773
### Summary
Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.
### Testing
```python
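from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

# Each callable in post_processors is applied to the extracted elements.
loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()

# Inspect the first few documents to confirm the extra whitespace is gone.
docs[:5]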
```
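
For reviewers who want to try a custom cleaner rather than one shipped with `unstructured`, the sketch below illustrates the general idea under one assumption: a post-processor is a plain callable that takes a string and returns a string, as `clean_extra_whitespace` does. The names `collapse_whitespace`, `strip_bullets`, and `apply_post_processors` are hypothetical and are not part of this PR; the helper only imitates applying each callable in order and is not the loader's actual implementation.

```python
import re
from typing import Callable, Iterable, List


def collapse_whitespace(text: str) -> str:
    """Hypothetical cleaner: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def strip_bullets(text: str) -> str:
    """Hypothetical cleaner: drop a leading bullet marker such as '-' or '*'."""
    return re.sub(r"^\s*[-*\u2022]\s+", "", text)


def apply_post_processors(
    texts: Iterable[str],
    post_processors: List[Callable[[str], str]],
) -> List[str]:
    """Illustrative stand-in: run every processor, in order, on each text."""
    cleaned = []
    for text in texts:
        for post_processor in post_processors:
            text = post_processor(text)
        cleaned.append(text)
    return cleaned


raw = ["  - Layout   Parser \n", "A  unified toolkit"]
print(apply_post_processors(raw, [strip_bullets, collapse_whitespace]))
```

Under the same assumption, a function like `collapse_whitespace` could be passed in the `post_processors` list next to `clean_extra_whitespace`. Order matters when one cleaner depends on the output of another.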
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17