mirror of
https://github.com/kennethreitz/heroku-buildpack-python.git
synced 2026-06-05 23:10:16 +00:00
4212e06309
* NLTK support: Update test to use multiple corpora

This means the incorrect handling of multiple IDs seen in #444 would have been caught. It also switches to some of the smaller corpora, to reduce the time spent downloading during tests (see sizes on http://www.nltk.org/nltk_data/).

* NLTK support: Fix passing of multiple corpora identifiers

As part of fixing the shellcheck warnings in #438, double quotes had been placed around `$nltk_packages` when it was passed to `nltk.downloader`, which caused multiple identifiers to be treated as a single identifier containing spaces.

The docs for the shellcheck warning in question recommend using arrays if the intended behaviour really is to split on spaces: https://github.com/koalaman/shellcheck/wiki/SC2086#exceptions

As such, `readarray`, which is available in bash >= 4, is now used. The `[*]` array form is used in the log message to avoid shellcheck warning SC2145, whereas `[@]` is used when passing the array to `nltk.downloader` so that each element is passed as a separate argument.

Note: both before and after this fix, using anything other than Unix line endings in `nltk.txt` will also cause breakage.
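A minimal, self-contained sketch of the quoting problem described in this commit message. It is an illustration under stated assumptions, not the buildpack's code: `count_args` is a hypothetical stand-in for `python -m nltk.downloader` that only reports how many arguments it received, the corpus names are examples, and the "old" form assumes the package list had been collapsed into one space-separated string. The last part shows the line-ending caveat from the note; the carriage-return stripping there is one possible workaround and is not something the buildpack does.

```bash
#!/usr/bin/env bash
# Requires bash >= 4 for readarray.
set -euo pipefail

# Hypothetical stand-in for nltk.downloader: print the number of arguments.
count_args() {
    echo "$#"
}

# Throwaway temp file standing in for nltk.txt (removed on exit).
tmp_file="$(mktemp)"
trap 'rm -f "$tmp_file"' EXIT

# Two corpus identifiers, one per line, with Unix line endings.
printf 'wordnet\nstopwords\n' > "$tmp_file"

# Assumed old form: one space-separated string, quoted to silence SC2086.
nltk_packages="$(tr '\n' ' ' < "$tmp_file")"
count_args "$nltk_packages"          # prints 1: both IDs arrive as one argument

# Fixed form: one array element per line, expanded with [@].
readarray -t nltk_array < "$tmp_file"
count_args "${nltk_array[@]}"        # prints 2: one argument per identifier
echo "Packages: ${nltk_array[*]}"    # [*] joins the elements into one string for logging

# Windows line endings: readarray -t strips only "\n", so each element
# keeps a trailing "\r" and no longer matches a real corpus identifier.
printf 'wordnet\r\nstopwords\r\n' > "$tmp_file"
readarray -t crlf_array < "$tmp_file"
if [[ "${crlf_array[0]}" == $'wordnet\r' ]]; then
    echo "first element ends in a carriage return"
fi

# Possible workaround (not part of the buildpack): strip a trailing CR
# from every element.
crlf_array=("${crlf_array[@]%$'\r'}")
echo "${crlf_array[0]}"              # prints wordnet
```

Run with bash 4 or later, the script prints `1`, `2`, `Packages: wordnet stopwords`, `first element ends in a carriage return` and `wordnet`. The array keeps element boundaries that no amount of quoting a single string can preserve, which is why SC2086's documentation points to arrays when splitting is actually intended.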
35 lines
1.1 KiB
Bash
Executable File
#!/usr/bin/env bash

# This script serves as the NLTK build step of the
# [**Python Buildpack**](https://github.com/heroku/heroku-buildpack-python)
# compiler.
#
# A [buildpack](https://devcenter.heroku.com/articles/buildpacks) is an
# adapter between a Python application and Heroku's runtime.
#
# This script is invoked by [`bin/compile`](/).

# Syntax sugar.
# shellcheck source=bin/utils
source "$BIN_DIR/utils"

# Only download corpora if pip installed nltk; otherwise this step is not needed.
if sp-grep -s nltk; then
    puts-step "Downloading NLTK corpora..."

    nltk_packages_definition="$BUILD_DIR/nltk.txt"

    if [ -f "$nltk_packages_definition" ]; then

        readarray -t nltk_packages < "$nltk_packages_definition"
        puts-step "Downloading NLTK packages: ${nltk_packages[*]}"

        python -m nltk.downloader -d "$BUILD_DIR/.heroku/python/nltk_data" "${nltk_packages[@]}" | indent
        set_env NLTK_DATA "/app/.heroku/python/nltk_data"

    else
        puts-warn "'nltk.txt' not found, not downloading any corpora"
        puts-warn "Learn more: https://devcenter.heroku.com/articles/python-nltk"
    fi
fi