mirror of
https://github.com/kennethreitz/langchain.git
synced 2026-06-05 23:00:18 +00:00
0c7f1d8b21
- **Description:** The Textract PDF loader now generates linearized output, meaning it replicates the structure of the source document as closely as possible based on the features passed into the call (e.g. LAYOUT, FORMS, TABLES). LAYOUT supports reading order for multi-column documents and identification of lists and figures. TABLES also generates the table structure. FORMS marks key-value pairs as "key: value", using a colon as the separator.
- **Issue:** fixes #12068
- **Dependencies:** adds amazon-textract-textractor, which provides the linearization
- **Tag maintainer:** @3coins

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
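
To make "linearized output" concrete, here is a minimal, self-contained sketch of the idea. It does not call Textract or amazon-textract-textractor; the function names (`linearize_form`, `linearize_table`, `linearize_columns`) and the tab separator for table cells are assumptions made for illustration only, and the real library's output format may differ.

```python
from typing import Dict, List


def linearize_form(fields: Dict[str, str]) -> str:
    """Render form fields one per line as "key: value" (hypothetical helper)."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


def linearize_table(rows: List[List[str]]) -> str:
    """Render one table row per line; tab-separated cells are an assumption."""
    return "\n".join("\t".join(row) for row in rows)


def linearize_columns(columns: List[List[str]]) -> str:
    """Emit a multi-column page in reading order: all of the first column,
    then all of the second, and so on (hypothetical helper)."""
    return "\n".join(line for column in columns for line in column)


if __name__ == "__main__":
    print(linearize_form({"Name": "Jane Doe", "Date": "2023-10-01"}))
    print(linearize_table([["Item", "Qty"], ["Widget", "2"]]))
    print(linearize_columns([["left 1", "left 2"], ["right 1"]]))
```

The point of the sketch is only that each requested feature contributes its own text rendering, so the resulting string follows the document's structure rather than the raw order of detected words.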