You are here: Home ‣ Dive Into Python 3 ‣
Difficulty level: ♦♦♦♦♢
❝ FIXME ❞
— FIXME
FIXME
Open the Python Shell and define the following variable:
>>> shell = 1
Keep that window open. Now open another Python Shell and define the following variable:
>>> shell = 2
Throughout this chapter, I will use the shell variable to indicate which Python Shell is being used in each example.
⁂
FIXME - introduction to pickle module, concepts, what datatypes can be pickled w/o additional work
The pickle module works with data structures. Let’s build one.
>>> shell ① 1 >>> entry = {} ② >>> entry['title'] = 'Dive into history, 2009 edition' >>> entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition' >>> entry['comments_link'] = None >>> entry['internal_id'] = b'\xde\xd5\xb4\xf8' >>> entry['tags'] = ('diveintopython', 'docbook', 'html') >>> entry['published'] = True >>> import time >>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009') ③ >>> entry['published_date'] time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)
pickle module. Don’t read too much into these values.
time module contains a data structure (time_struct) to represent a point in time (accurate to one millisecond) and functions to manipulate time structs. The strptime() function takes a formatted string an converts it to a time_struct. This string is in the default format, but you can control that with format codes. See the time module for more details.
That’s a handsome-looking Python dictionary. Let’s save it to a file.
>>> shell ① 1 >>> import pickle >>> with open('entry.pickle', 'wb') as f: ② ... pickle.dump(entry, f) ③ ...
open() function to open a file. Set the file mode to 'wb' to open the file for writing in binary mode. Wrap it in a with statement to ensure the file is closed automatically when you’re done with it.
dump() function in the pickle module takes a serializable Python data structure, serializes it into a binary, Python-specific format using the latest version of the pickle protocol, and saves it to an open file.
That last sentence was pretty important.
pickle module takes a Python data structure and saves it to a file.
pickle protocol.”
pickle protocol is Python-specific; there is no guarantee of cross-language compatibility. You probably couldn’t take the entry.pickle file you just created and do anything useful with it in Perl, PHP, Java, or any other language.
pickle module. The pickle protocol has changed several times as new data types have been added to the Python language, but there are still limitations.
pickle module will use the latest version of the pickle protocol. This ensures that you have maximum flexibility in the types of data you can serialize, but it also means that the resulting file will not be readable by older versions of Python that do not support the latest version of the pickle protocol.
pickle protocol is a binary protocol. Be sure to open your pickle files in binary mode, or the data will get corrupted during writing.
Now switch to your second Python Shell — i.e. not the one where you created the entry dictionary.
>>> shell ① 2 >>> entry ② Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'entry' is not defined >>> import pickle >>> with open('entry.pickle', 'rb') as f: ③ ... entry = pickle.load(f) ④ ... >>> entry ⑤ {'comments_link': None, 'internal_id': b'\xde\xd5\xb4\xf8', 'title': 'Dive into history, 2009 edition', 'tags': ('diveintopython', 'docbook', 'html'), 'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition', 'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1), 'published': True}
FIXME
>>> shell
1
>>> with open('entry.pickle', 'rb') as f: ①
... entry2 = pickle.load(f) ②
...
>>> entry2 == entry ③
True
>>> entry2['tags'] ④
('diveintopython', 'docbook', 'html')
>>> entry2['internal_id'] ⑤
b'\xde\xd5\xb4\xf8'
FIXME
FIXME - discussion of pickle protocol versions, backward incompatibility of protocol version 3 due to bytes/strings separation in Python 3, link to http://docs.python.org/3.1/library/pickle.html#data-stream-format
What does the pickle protocol look like? Let’s jump out of the Python Shell for a moment and take a look at that entry.pickle file we created.
you@localhost:~/diveintopython3/examples$ ls -l entry.pickle -rw-r--r-- 1 you you 324 Aug 3 13:34 entry.pickle you@localhost:~/diveintopython3/examples$ cat entry.pickle comments_linkqNXtagsqXdiveintopythonqXdocbookqXhtmlq?qX publishedq? XlinkXJhttp://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition q Xpublished_dateq ctime struct_time ?qRqXtitleqXDive into history, 2009 editionqu.
That wasn’t terribly helpful. You can see the strings, but other datatypes end up as unprintable (or at least unreadable) characters. Fields are not obviously delimited by tabs or spaces. This is not a format you would want to debug by yourself.
>>> shell
1
>>> import pickletools
>>> with open('entry.pickle', 'rb') as f:
... pickletools.dis(f)
0: \x80 PROTO 3
2: } EMPTY_DICT
3: q BINPUT 0
5: ( MARK
6: X BINUNICODE 'published_date'
25: q BINPUT 1
27: c GLOBAL 'time struct_time'
45: q BINPUT 2
47: ( MARK
48: M BININT2 2009
51: K BININT1 3
53: K BININT1 27
55: K BININT1 22
57: K BININT1 20
59: K BININT1 42
61: K BININT1 4
63: K BININT1 86
65: J BININT -1
70: t TUPLE (MARK at 47)
71: q BINPUT 3
73: } EMPTY_DICT
74: q BINPUT 4
76: \x86 TUPLE2
77: q BINPUT 5
79: R REDUCE
80: q BINPUT 6
82: X BINUNICODE 'comments_link'
100: q BINPUT 7
102: N NONE
103: X BINUNICODE 'internal_id'
119: q BINPUT 8
121: C SHORT_BINBYTES 'ÞÕ´ø'
127: q BINPUT 9
129: X BINUNICODE 'tags'
138: q BINPUT 10
140: X BINUNICODE 'diveintopython'
159: q BINPUT 11
161: X BINUNICODE 'docbook'
173: q BINPUT 12
175: X BINUNICODE 'html'
184: q BINPUT 13
186: \x87 TUPLE3
187: q BINPUT 14
189: X BINUNICODE 'title'
199: q BINPUT 15
201: X BINUNICODE 'Dive into history, 2009 edition'
237: q BINPUT 16
239: X BINUNICODE 'article_link'
256: q BINPUT 17
258: X BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
337: q BINPUT 18
339: X BINUNICODE 'published'
353: q BINPUT 19
355: \x88 NEWTRUE
356: u SETITEMS (MARK at 5)
357: . STOP
highest protocol among opcodes = 3
FIXME more here about fix_imports and such?
⁂
FIXME - discussion of pickling class instances, stateful objects, __getstate__ and __setstate__, links to http://docs.python.org/3.1/library/pickle.html#pickle-inst and http://docs.python.org/3.1/library/pickle.html#pickle-state
FIXME - pickled objects can be modified in memory, in transit, or on disk; no checksums; no built-in guarantee that the pickle you're loading is the pickle you dumped; never unpickle untrusted input; xref to "eval() is evil" discussion in advanced-iterators chapter
The data format used by the pickle module is Python-specific. It makes no attempt to be compatible with other programming languages. If cross-language compatibility is one of your requirements, you need to look at other serialization formats.
One format that is designed to be used by multiple programming languages is JSON.
FIXME - pickle format is python-specific; JSON format is designed to be cross-language (in fact, it was originally designed for JavaScript, hence the name); differences with pickle format (table or list); json module implements dumping and loading JSON-formatted data structures; JSON format is string-based (and always encoded as UTF-8 where bytes are required); compact vs. pretty-printing; JSONEncoder; JSONDecoder; iterencode
FIXME
| Notes | JSON | Python 3 | ||||||
|---|---|---|---|---|---|---|---|---|
| object | dictionary | |||||||
| array | list | |||||||
| string | string | |||||||
| integer | integer | |||||||
| real number | float | |||||||
true
| True
|
FIXME
>>> shell
1
>>> entry
FIXME
>>> import json
>>> with open('entry.json', 'w', encoding='utf-8') as f:
... json.dump(entry, f)
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "C:\Python31\lib\json\__init__.py", line 178, in dump
for chunk in iterable:
File "C:\Python31\lib\json\encoder.py", line 408, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python31\lib\json\encoder.py", line 382, in _iterencode_dict
for chunk in chunks:
File "C:\Python31\lib\json\encoder.py", line 416, in _iterencode
o = _default(o)
File "C:\Python31\lib\json\encoder.py", line 170, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'\xde\xd5\xb4\xf8' is not JSON serializable
FIXME
# customserializer.py
def to_json(python_object):
if isinstance(python_object, bytes):
return {'__class__': 'bytes',
'__value__': list(python_object)}
raise TypeError(repr(python_object) + ' is not JSON serializable')
FIXME
>>> shell
1
>>> import customserializer
>>> with open('entry.json', 'w', encoding='utf-8') as f:
... json.dump(entry, default = customserializer.to_json)
...
Traceback (most recent call last):
File "<stdin>", line 9, in <module>
json.dump(entry, f, default=customserializer.to_json)
File "C:\Python31\lib\json\__init__.py", line 178, in dump
for chunk in iterable:
File "C:\Python31\lib\json\encoder.py", line 408, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python31\lib\json\encoder.py", line 382, in _iterencode_dict
for chunk in chunks:
File "C:\Python31\lib\json\encoder.py", line 416, in _iterencode
o = _default(o)
File "/Users/pilgrim/diveintopython3/examples/customserializer.py", line 12, in to_json
raise TypeError(repr(python_object) + ' is not JSON serializable')
TypeError: time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1) is not JSON serializable
FIXME
# customserializer.py
def to_json(python_object):
if isinstance(python_object, time.struct_time):
return {'__class__': 'time.asctime',
'__value__': time.asctime(python_object)}
if isinstance(python_object, bytes):
return {'__class__': 'bytes',
'__value__': list(python_object)}
raise TypeError(repr(python_object) + ' is not JSON serializable')
FIXME
>>> shell
1
>>> with open('entry.json', 'w', encoding='utf-8') as f:
... json.dump(entry, default = customserializer.to_json)
...
FIXME
you@localhost:~/diveintopython3/examples$ ls -l example.json
-rw-r--r-- 1 you you 391 Aug 3 13:34 entry.json
you@localhost:~/diveintopython3/examples$ cat example.json
{"published_date": {"__class__": "time.asctime", "__value__": "Fri Mar 27 22:20:42 2009"},
"comments_link": null, "internal_id": {"__class__": "bytes", "__value__": [222, 213, 180, 248]},
"tags": ["diveintopython", "docbook", "html"], "title": "Dive into history, 2009 edition",
"article_link": "http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition",
"published": true}
FIXME
>>> shell
2
>>> del entry
>>> entry
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'entry' is not defined
>>> import json
>>> with open('entry.json', 'r', encoding='utf-8') as f:
... entry = json.load(f)
...
>>> entry
{'comments_link': None,
'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]},
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': {'__class__': 'time.asctime', '__value__': 'Fri Mar 27 22:20:42 2009'},
'published': True}
FIXME
# customserializer.py
def from_json(json_object):
if '__class__' in json_object:
if json_object['__class__'] == 'time.asctime':
return time.strptime(json_object['__value__'])
if json_object['__class__'] == 'bytes':
return bytes(json_object['__value__'])
return json_object
>>> shell
2
>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
... entry = json.load(f, object_hook = customserializer.from_json)
...
>>> entry
{'comments_link': None,
'internal_id': b'\xde\xd5\xb4\xf8',
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1),
'published': True}
FIXME
>>> shell
1
>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
... entry2 = json.load(f, object_hook = customserializer.from_json)
...
>>> entry2 == entry
False
>>> entry['tags']
('diveintopython', 'docbook', 'html')
>>> entry2['tags']
['diveintopython', 'docbook', 'html']
FIXME
☞Many articles about the
picklemodule make references tocPickle. In Python 2, there were two implementations of thepicklemodule, one written in pure Python and another written in C (but still callable from Python). In Python 3, these two modules have been consolidated, so you should always justimport pickle. You may find these articles useful, but you should ignore the now-obsolete information aboutcPickle.
pickle module
pickle and cPickle — Python object serialization
pickle
json — JavaScript Object Notation Serializer
© 2001–9 Mark Pilgrim