Cache pipfile parsing

On a large Pipfile (390+ lines), parsing the whole thing takes ~5s.
Pipenv re-parses the Pipfile repeatedly, all over the place, so
caching the parsed contents speeds things up dramatically (at least in
this case).

This PR adds a small cache keyed on the Pipfile's location + the md5sum
of its contents (the hashing is pretty fast here). Because the key is
derived from the file contents, any edit to the Pipfile produces a new
key, so a stale parse is never returned and this should be completely
safe. The only possible issue is a caller mutating parsed_pipfile,
which is why I've added a defensive deepcopy.

Without the deepcopy, cache hit takes ~0.09ms.
With the deepcopy, cache hit takes 1.19ms.

If we can confirm the deepcopy is unnecessary, dropping it would shave
a second or two off runs with really big Pipfiles.
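
For reference, a minimal standalone sketch of the caching idea described
above (not the code in this commit). The names `cached_parse`,
`_thread_cache` and the `parse` argument are made up for illustration.
One assumption differs from the diff: the per-thread dict is created
lazily, since an attribute set on a `threading.local()` at import time
is only visible in the importing thread.

```python
import copy
import hashlib
import threading

_local = threading.local()


def _thread_cache():
    # Attributes on a threading.local are per-thread, so create the dict
    # lazily in whichever thread asks for it.
    cache = getattr(_local, "pipfile_cache", None)
    if cache is None:
        cache = _local.pipfile_cache = {}
    return cache


def cached_parse(location, contents, parse):
    """Return parse(contents), reusing the result while contents are unchanged.

    `parse` is a hypothetical stand-in for the real Pipfile parser.
    """
    key = (location, hashlib.md5(contents.encode("utf8")).hexdigest())
    cache = _thread_cache()
    if key not in cache:
        cache[key] = parse(contents)
    # Hand out a copy so a caller that mutates the result cannot corrupt
    # the cached value.
    return copy.deepcopy(cache[key])
```

Returning a copy costs more per hit but keeps the cache correct even if
some caller mutates what it gets back; returning the cached object
directly is only safe once every caller is known to treat it as
read-only.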
Jeff Tratner
2018-03-15 03:12:01 -07:00
parent 0760cf0a9c
commit 511561122b
+15
@@ -1,4 +1,5 @@
# -*- coding: utf-8 -*-
import copy
import json
import os
import re
@@ -10,6 +11,7 @@ import hashlib
import contoml
import delegator
import pipfile
import threading
import toml
from pip9 import ConfigOptionParser
@@ -47,6 +49,10 @@ if PIPENV_PIPFILE:
PIPENV_PIPFILE = normalize_drive(os.path.abspath(PIPENV_PIPFILE))


_cache = threading.local()
_cache.pipfile_cache = {}


class Project(object):
    """docstring for Project"""
@@ -292,6 +298,15 @@ class Project(object):
        # Open the pipfile, read it into memory.
        with open(self.pipfile_location) as f:
            contents = f.read()
        # this should be pretty fast (ish) and we need this pipfile a lot
        cache_key = (self.pipfile_location, hashlib.md5(contents.encode('utf8')).hexdigest())
        if cache_key not in _cache.pipfile_cache:
            parsed = self._parse_pipfile(contents)
            _cache.pipfile_cache[cache_key] = parsed
        # deepcopy likely unnecessary but why not avoid bugs?
        return copy.deepcopy(_cache.pipfile_cache[cache_key])

    def _parse_pipfile(self, contents):
        # If any outline tables are present...
        if ('[packages.' in contents) or ('[dev-packages.' in contents):
            data = toml.loads(contents)