Commit Graph

5 Commits

Author SHA1 Message Date
Eugene Yurtsev 09587a3201 Clean up tests for pdf parsers (#4595)
# Organize tests for pdf parsers

Clean up tests for pdf parsers, remove duplicate tests, convert to unit
tests.
2023-05-15 14:21:05 -04:00
Lester Yang cd3f9865f3 Feature: pdfplumber PDF loader with BaseBlobParser (#4552)
# Feature: pdfplumber PDF loader with BaseBlobParser

* Adds pdfplumber as a PDF loader
* Adds pdfplumber as a blob parser.
2023-05-15 09:47:02 -04:00
Eugene Yurtsev 08ed927c32 Turn on extended tests (#4588)
# Turn on strict extended tests

This PR turns on strict testing for extended tests.
2023-05-12 14:50:08 -04:00
Eugene Yurtsev 80558b5b27 Add workflow for testing with all deps (#4410)
# Add action to test with all dependencies installed

PR adds a custom action for setting up poetry that allows specifying a
cache key:
https://github.com/actions/setup-python/issues/505#issuecomment-1273013236

This makes it possible to run 2 types of unit tests: 

(1) unit tests with only core dependencies
(2) unit tests with extended dependencies (e.g., those that rely on an
optional pdf parsing library)


As part of this PR, we're moving some pdf parsing tests into the
unit-tests section and making sure that these unit tests get executed
when running with extended dependencies.
2023-05-10 09:35:07 -04:00
Eugene Yurtsev ae0c3382dd Add MimeType based parser (#4376)
# Add MimeType Based Parser

This PR adds a MimeType Based Parser. The parser inspects the mime-type
of the blob it is parsing and based on the mime-type can delegate to the sub
parser.

## Before submitting

Waiting on adding notebooks until more implementations are landed. 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


@hwchase17
@vowelparrot
2023-05-09 10:22:56 -04:00