<!DOCTYPE html>
<head>
<meta charset=utf-8>
<title>Serializing Python Objects - Dive into Python 3</title>
<!--[if IE]><script src=j/html5.js></script><![endif]-->
<link rel=stylesheet href=dip3.css>
<style>
body{counter-reset:h1 13}
</style>
<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
<link rel=stylesheet media=print href=print.css>
<meta name=viewport content='initial-scale=1.0'>
</head>
<form action=http://www.google.com/cse><div><input type=hidden name=cx value=014021643941856155761:l5eihuescdw><input type=hidden name=ie value=UTF-8>&nbsp;<input name=q size=25>&nbsp;<input type=submit name=root value=Search></div></form>
<p>You are here: <a href=index.html>Home</a> <span class=u>&#8227;</span> <a href=table-of-contents.html#serializing>Dive Into Python 3</a> <span class=u>&#8227;</span>
<p id=level>Difficulty level: <span class=u title=advanced>&#x2666;&#x2666;&#x2666;&#x2666;&#x2662;</span>
<h1>Serializing Python Objects</h1>
<blockquote class=q>
<p><span class=u>&#x275D;</span> FIXME <span class=u>&#x275E;</span><br>&mdash; FIXME
</blockquote>
<p id=toc>&nbsp;
<h2 id=divingin>Diving In</h2>
<p class=f>FIXME
<p>Open the Python Shell and define the following variable:
<pre class='nd screen'>
<samp class=p>>>> </samp><kbd class=pp>shell = 1</kbd></pre>
<p>Keep that window open. Now open another Python Shell and define the following variable:
<pre class='nd screen'>
<samp class=p>>>> </samp><kbd class=pp>shell = 2</kbd></pre>
<p>Throughout this chapter, I will use the <code>shell</code> variable to indicate which Python Shell is being used in each example.
<p class=a>&#x2042;
<h2 id=pickle-simple>Serializing Simple Python Objects</h2>
<p>The concept of <dfn>serialization</dfn> is simple. You have a data structure in memory that you want to save, reuse, or send to someone else. How would you do that? Well, that depends on how you want to save it, how you want to reuse it, and to whom you want to send it. Many games allow you to save your progress when you quit the game and pick up where you left off when you relaunch the game. (Actually, many non-gaming applications do this as well.) In this case, a data structure that captures &#8220;your progress so far&#8221; needs to be stored on disk when you quit, then loaded from disk when you relaunch. The data is only meant to be used by the same program that created it, never sent over a network, and never read by anything other than the program that created it. Therefore, the interoperability issues are limited to ensuring that later versions of the program can read data written by earlier versions.
<p>For cases like this, the <code>pickle</code> module is ideal. It&#8217;s part of the Python standard library, so it&#8217;s always available. It&#8217;s fast; the bulk of it is written in C, like the Python interpreter itself. It can store arbitrarily complex Python data structures.
<p>What can the <code>pickle</code> module store?
<ul>
<li>All the <a href=native-datatypes.html>native datatypes</a> that Python supports: booleans, integers, floating point numbers, complex numbers, strings, <code>bytes</code> objects, byte arrays, and <code>None</code>.
<li>Lists, tuples, dictionaries, and sets containing any combination of native datatypes.
<li>Lists, tuples, dictionaries, and sets containing any combination of lists, tuples, dictionaries, and sets containing any combination of native datatypes (and so on, to <a title='sys.getrecursionlimit()' href=http://docs.python.org/3.1/library/sys.html#sys.getrecursionlimit>the maximum nesting level that Python supports</a>).
<li>Functions, classes, and instances of classes (with caveats that I&#8217;ll explain shortly).
</ul>
<p>If this isn&#8217;t enough for you, the <code>pickle</code> module is also extensible, as you&#8217;ll see later in this chapter.
<h3 id=dump>Saving to a File</h3>
<p>The <code>pickle</code> module works with data structures. Let&#8217;s build one.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>shell</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>1</samp>
<a><samp class=p>>>> </samp><kbd class=pp>entry = {}</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>>>> </samp><kbd class=pp>entry['title'] = 'Dive into history, 2009 edition'</kbd>
<samp class=p>>>> </samp><kbd class=pp>entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'</kbd>
<samp class=p>>>> </samp><kbd class=pp>entry['comments_link'] = None</kbd>
<samp class=p>>>> </samp><kbd class=pp>entry['internal_id'] = b'\xde\xd5\xb4\xf8'</kbd>
<samp class=p>>>> </samp><kbd class=pp>entry['tags'] = ('diveintopython', 'docbook', 'html')</kbd>
<samp class=p>>>> </samp><kbd class=pp>entry['published'] = True</kbd>
<samp class=p>>>> </samp><kbd class=pp>import time</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>>>> </samp><kbd class=pp>entry['published_date']</kbd>
<samp class=pp>time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)</samp></pre>
<ol>
<li>Follow along in Python Shell #1.
<li>The idea here is to build a Python dictionary that could represent something useful, like an <a href=xml.html#xml-structure>entry in an Atom feed</a>. But I also want to ensure that it contains several different types of data, to show off the <code>pickle</code> module. Don&#8217;t read too much into these values.
<li>The <code>time</code> module contains a data structure (<code>struct_time</code>) to represent a point in time (accurate to one second) and functions to manipulate time structs. The <code>strptime()</code> function takes a formatted string and converts it to a <code>struct_time</code>. This string is in the default format, but you can control that with format codes. See the <a href=http://docs.python.org/3.1/library/time.html><code>time</code> module</a> for more details.
</ol>
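<p>As a small illustration of those format codes, here is a sketch that parses a timestamp in a different layout. The input string and variable names are made up for this example; the format codes themselves (<code>%Y</code>, <code>%m</code>, and so on) are the standard ones described in the <code>time</code> module documentation.

```python
import time

# A hypothetical timestamp in a non-default layout.
timestamp = '2009-03-27 22:20:42'

# The second argument tells strptime() how to read the string.
parsed = time.strptime(timestamp, '%Y-%m-%d %H:%M:%S')

print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)
print(parsed.tm_hour, parsed.tm_min, parsed.tm_sec)

# strftime() goes the other way, from a struct_time back to a string.
print(time.strftime('%Y-%m-%d %H:%M:%S', parsed))
```

<p>Because this layout is purely numeric, it does not depend on locale-specific day or month names.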
<p>That&#8217;s a handsome-looking Python dictionary. Let&#8217;s save it to a file.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>shell</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>1</samp>
<samp class=p>>>> </samp><kbd class=pp>import pickle</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>with open('entry.pickle', 'wb') as f:</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>... </samp><kbd class=pp> pickle.dump(entry, f)</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>... </samp></pre>
<ol>
<li>This is still in Python Shell #1.
<li>Use the <code>open()</code> function to open a file. Set the file mode to <code>'wb'</code> to open the file for writing <a href=files.html#binary>in binary mode</a>. Wrap it in a <a href=files.html#with><code>with</code> statement</a> to ensure the file is closed automatically when you&#8217;re done with it.
<li>The <code>dump()</code> function in the <code>pickle</code> module takes a serializable Python data structure, serializes it into a binary, Python-specific format using the latest version of the pickle protocol, and saves it to an open file.
</ol>
<p>That last sentence was pretty important.
<ul>
<li>The <code>pickle</code> module takes a Python data structure and saves it to a file.
<li>To do this, it <i>serializes</i> the data structure using a data format called &#8220;the pickle protocol.&#8221;
<li>The pickle protocol is Python-specific; there is no guarantee of cross-language compatibility. You probably couldn&#8217;t take the <code>entry.pickle</code> file you just created and do anything useful with it in Perl, <abbr>PHP</abbr>, Java, or any other language.
<li>Not every Python data structure can be serialized by the <code>pickle</code> module. The pickle protocol has changed several times as new data types have been added to the Python language, but there are still limitations.
<li>As a result of these changes, there is no guarantee of compatibility between different versions of Python itself. Newer versions of Python support the older serialization formats, but older versions of Python do not support newer formats (since they don&#8217;t support the newer data types).
<li>Unless you specify otherwise, the functions in the <code>pickle</code> module will use the latest version of the pickle protocol. This ensures that you have maximum flexibility in the types of data you can serialize, but it also means that the resulting file will not be readable by older versions of Python that do not support the latest version of the pickle protocol.
<li>The latest version of the pickle protocol is a binary format. Be sure to open your pickle files <a href=files.html#binary>in binary mode</a>, or the data will get corrupted during writing.
</ul>
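<p>To &#8220;specify otherwise,&#8221; you can pass the optional <code>protocol</code> argument to <code>pickle.dump()</code>. The following is a minimal sketch, not part of the running example: the file name <code>entry-v2.pickle</code>, the temporary directory, and the trimmed-down dictionary are assumptions made to keep it self-contained. Depending on your Python version, the default protocol (<code>pickle.DEFAULT_PROTOCOL</code>) may be lower than the highest one available (<code>pickle.HIGHEST_PROTOCOL</code>), so it can be worth printing both.

```python
import os
import pickle
import tempfile

# Stand-in for the entry dictionary built earlier in this chapter.
entry = {'title': 'Dive into history, 2009 edition',
         'tags': ('diveintopython', 'docbook', 'html'),
         'published': True}

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'entry-v2.pickle')

# Ask for protocol version 2 explicitly instead of the default.
with open(path, 'wb') as f:
    pickle.dump(entry, f, protocol=2)

# Loading needs no protocol argument; pickle works it out from the data.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == entry)
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```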
<h3 id=load>Loading from a File</h3>
<p>Now switch to your second Python Shell&nbsp;&mdash;&nbsp;<i>i.e.</i> not the one where you created the <code>entry</code> dictionary.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>shell</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>2</samp>
<a><samp class=p>>>> </samp><kbd class=pp>entry</kbd> <span class=u>&#x2461;</span></a>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
NameError: name 'entry' is not defined</samp>
<samp class=p>>>> </samp><kbd class=pp>import pickle</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>with open('entry.pickle', 'rb') as f:</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>... </samp><kbd class=pp> entry = pickle.load(f)</kbd> <span class=u>&#x2463;</span></a>
<samp class=p>... </samp>
<a><samp class=p>>>> </samp><kbd class=pp>entry</kbd> <span class=u>&#x2464;</span></a>
<samp class=pp>{'comments_link': None,
'internal_id': b'\xde\xd5\xb4\xf8',
'title': 'Dive into history, 2009 edition',
'tags': ('diveintopython', 'docbook', 'html'),
'article_link':
'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1),
'published': True}</samp></pre>
<ol>
<li>This is Python Shell #2.
<li>There is no <var>entry</var> variable defined here. You defined an <var>entry</var> variable in Python Shell #1, but that&#8217;s a completely different environment with its own state.
<li>Open the <code>entry.pickle</code> file you created in Python Shell #1. The <code>pickle</code> module uses a binary data format, so you should always open pickle files in binary mode.
<li>The <code>pickle.load()</code> function takes a <a href=files.html#file-objects>stream object</a>, reads the serialized data from the stream, creates a new Python object, recreates the serialized data in the new Python object, and returns the new Python object.
<li>Now the <var>entry</var> variable is a dictionary with familiar-looking keys and values.
</ol>
<p>The <code>pickle.dump() / pickle.load()</code> cycle results in an identical copy of the original data structure.
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>shell</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>1</samp>
<a><samp class=p>>>> </samp><kbd class=pp>with open('entry.pickle', 'rb') as f:</kbd> <span class=u>&#x2461;</span></a>
<a><samp class=p>... </samp><kbd class=pp> entry2 = pickle.load(f)</kbd> <span class=u>&#x2462;</span></a>
<samp class=p>... </samp>
<a><samp class=p>>>> </samp><kbd class=pp>entry2 == entry</kbd> <span class=u>&#x2463;</span></a>
<samp class=pp>True</samp>
<a><samp class=p>>>> </samp><kbd class=pp>entry2['tags']</kbd> <span class=u>&#x2464;</span></a>
<samp class=pp>('diveintopython', 'docbook', 'html')</samp>
<samp class=p>>>> </samp><kbd class=pp>entry2['internal_id']</kbd>
<samp class=pp>b'\xde\xd5\xb4\xf8'</samp></pre>
<ol>
<li>Switch back to Python Shell #1.
<li>Open the <code>entry.pickle</code> file.
<li>Load the serialized data into a new variable, <var>entry2</var>.
<li>Python confirms that the two dictionaries, <var>entry</var> and <var>entry2</var>, are identical. In this shell, you built <var>entry</var> from the ground up, starting with an empty dictionary and manually assigning values to specific keys. You serialized this dictionary and stored it in the <code>entry.pickle</code> file. Now you&#8217;ve read the serialized data from that file and created a perfect replica of the original data structure.
<li>For reasons that will become clear later in this chapter, I want to point out that the value of the <code>'tags'</code> key is a tuple, and the value of the <code>'internal_id'</code> key is a <code>bytes</code> object.
</ol>
<h3 id=dumps>Pickling Without a File</h3>
<p>The examples in the previous section showed how to serialize a Python object directly to a file on disk. But what if you don&#8217;t want or need a file? You can also serialize to a <code>bytes</code> object in memory.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>1</samp>
<a><samp class=p>>>> </samp><kbd class=pp>b = pickle.dumps(entry)</kbd> <span class=u>&#x2460;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>type(b)</kbd> <span class=u>&#x2461;</span></a>
<samp class=pp>&lt;class 'bytes'></samp>
<a><samp class=p>>>> </samp><kbd class=pp>entry3 = pickle.loads(b)</kbd> <span class=u>&#x2462;</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>entry3 == entry</kbd> <span class=u>&#x2463;</span></a>
<samp class=pp>True</samp></pre>
<ol>
<li>The <code>pickle.dumps()</code> function (note the <code>'s'</code> at the end of the function name) performs the same serialization as the <code>pickle.dump()</code> function. Instead of taking a stream object and writing the serialized data to a file on disk, it simply returns the serialized data.
<li>Since the pickle protocol uses a binary data format, the <code>pickle.dumps()</code> function returns a <code>bytes</code> object.
<li>The <code>pickle.loads()</code> function (again, note the <code>'s'</code> at the end of the function name) performs the same deserialization as the <code>pickle.load()</code> function. Instead of taking a stream object and reading the serialized data from a file, it takes a <code>bytes</code> object containing serialized data, such as the one returned by the <code>pickle.dumps()</code> function.
<li>The end result is the same: a perfect replica of the original dictionary.
</ol>
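<p>One consequence worth seeing for yourself: the replica is a separate object all the way down, so changing it leaves the original untouched. Here is a short sketch of that idea; the dictionary and variable names are illustrative only.

```python
import pickle

original = {'title': 'Dive into history, 2009 edition',
            'tags': ['diveintopython', 'docbook', 'html']}

# Serialize to bytes in memory, then rebuild a new object from those bytes.
data = pickle.dumps(original)
replica = pickle.loads(data)

print(type(data))
print(replica == original)   # compares contents
print(replica is original)   # compares identity

# The nested list was rebuilt too, so this does not affect the original.
replica['tags'].append('pickle')
print(original['tags'])
print(replica['tags'])
```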
<h3 id=protocol-versions>Bytes and Strings Rear Their Ugly Heads Again</h3>
<p>The pickle protocol has been around for many years, and it has matured as Python itself has matured. There are now <a href=http://docs.python.org/3.1/library/pickle.html#data-stream-format>four different versions</a> of the pickle protocol.
<ul>
<li>Python 1.x had two pickle protocols, a text-based format (&#8220;version 0&#8221;) and a binary format (&#8220;version 1&#8221;).
<li>Python 2.3 introduced a new pickle protocol (&#8220;version 2&#8221;) to handle new functionality in Python class objects. It is a binary format.
<li>Python 3.0 introduced another pickle protocol (&#8220;version 3&#8221;) with explicit support for <code>bytes</code> objects and byte arrays. It is a binary format.
</ul>
<p>Oh look, <a href=strings.html#byte-arrays>the difference between bytes and strings</a> rears its ugly head again. (If you&#8217;re surprised, you haven&#8217;t been paying attention.) What this means in practice is that, while Python 3 can read data pickled with protocol version 2, Python 2 cannot read data pickled with protocol version 3.
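<p>To get a feel for how the protocol versions differ, you can pickle the same object with each of them and look at the first few bytes. This is only a sketch with a made-up two-key dictionary; it relies on the <code>protocol</code> argument of <code>pickle.dumps()</code> and on the fact that protocol versions 2 and later begin with a <code>PROTO</code> opcode (byte <code>0x80</code>) followed by the version number.

```python
import pickle

sample = {'title': 'Dive into history, 2009 edition', 'comments_link': None}

for version in range(4):
    data = pickle.dumps(sample, protocol=version)
    # Protocols 2 and later start with byte 0x80 plus the protocol number.
    has_header = data[:2] == b'\x80' + bytes([version])
    print(version, len(data), has_header, data[:12])

# Protocol 0 is text-based: for this simple dictionary every byte is ASCII.
text_pickle = pickle.dumps(sample, protocol=0)
print(text_pickle.decode('ascii'))
```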
<h3 id=debugging>Debugging Pickle Files</h3>
<p>What does the pickle protocol look like? Let&#8217;s jump out of the Python Shell for a moment and take a look at that <code>entry.pickle</code> file we created.
<pre class=screen>
<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>ls -l entry.pickle</kbd>
<samp>-rw-r--r-- 1 you you 324 Aug 3 13:34 entry.pickle</samp>
<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>cat entry.pickle</kbd>
<samp>comments_linkqNXtagsqXdiveintopythonqXdocbookqXhtmlq?qX publishedq?
XlinkXJhttp://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition
q Xpublished_dateq
ctime
struct_time
?qRqXtitleqXDive into history, 2009 editionqu.</samp></pre>
<p>That wasn&#8217;t terribly helpful. You can see the strings, but other datatypes end up as unprintable (or at least unreadable) characters. Fields are not obviously delimited by tabs or spaces. This is not a format you would want to debug by yourself.
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>1</samp>
<samp class=p>>>> </samp><kbd class=pp>import pickletools</kbd>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.pickle', 'rb') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> pickletools.dis(f)</kbd>
<samp> 0: \x80 PROTO 3
2: } EMPTY_DICT
3: q BINPUT 0
5: ( MARK
6: X BINUNICODE 'published_date'
25: q BINPUT 1
27: c GLOBAL 'time struct_time'
45: q BINPUT 2
47: ( MARK
48: M BININT2 2009
51: K BININT1 3
53: K BININT1 27
55: K BININT1 22
57: K BININT1 20
59: K BININT1 42
61: K BININT1 4
63: K BININT1 86
65: J BININT -1
70: t TUPLE (MARK at 47)
71: q BINPUT 3
73: } EMPTY_DICT
74: q BINPUT 4
76: \x86 TUPLE2
77: q BINPUT 5
79: R REDUCE
80: q BINPUT 6
82: X BINUNICODE 'comments_link'
100: q BINPUT 7
102: N NONE
103: X BINUNICODE 'internal_id'
119: q BINPUT 8
121: C SHORT_BINBYTES 'ÞÕ´ø'
127: q BINPUT 9
129: X BINUNICODE 'tags'
138: q BINPUT 10
140: X BINUNICODE 'diveintopython'
159: q BINPUT 11
161: X BINUNICODE 'docbook'
173: q BINPUT 12
175: X BINUNICODE 'html'
184: q BINPUT 13
186: \x87 TUPLE3
187: q BINPUT 14
189: X BINUNICODE 'title'
199: q BINPUT 15
201: X BINUNICODE 'Dive into history, 2009 edition'
237: q BINPUT 16
239: X BINUNICODE 'article_link'
256: q BINPUT 17
258: X BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
337: q BINPUT 18
339: X BINUNICODE 'published'
353: q BINPUT 19
355: \x88 NEWTRUE
356: u SETITEMS (MARK at 5)
357: . STOP
<mark>highest protocol among opcodes = 3</mark></samp></pre>
<p>The most interesting piece of information in that disassembly is on the last line, because it includes the version of the pickle protocol with which this file was saved. The older versions of the pickle protocol carry no explicit version marker, so the general way to determine which protocol version a pickle file requires is to look at the markers (&#8220;opcodes&#8221;) within the pickled data and use hard-coded knowledge of which opcodes were introduced with each version of the pickle protocol. The <code>pickletools.dis()</code> function does exactly that, and it prints the result in the last line of the disassembly output. Here is a function that returns just the version number, without printing anything:
<p class=d>[<a href=examples/pickleversion.py>download <code>pickleversion.py</code></a>]
<pre class=pp><code>import pickletools

def protocol_version(file_object):
    maxproto = -1
    for opcode, arg, pos in pickletools.genops(file_object):
        maxproto = max(maxproto, opcode.proto)
    return maxproto</code></pre>
<p>And here it is in action:</p>
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>import pickleversion</kbd>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.pickle', 'rb') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> v = pickleversion.protocol_version(f)</kbd>
<samp class=p>>>> </samp><kbd class=pp>v</kbd>
<samp class=pp>3</samp></pre>
<p class=a>&#x2042;
<h2 id=pickle-advanced>Serializing Complex Python Objects</h2>
<p>FIXME - discussion of pickling class instances, stateful objects, __getstate__ and __setstate__, links to http://docs.python.org/3.1/library/pickle.html#pickle-inst and http://docs.python.org/3.1/library/pickle.html#pickle-state
<h2 id=pickle-security>Security Concerns with Pickled Objects</h2>
<p>FIXME - pickled objects can be modified in memory, in transit, or on disk; no checksums; no built-in guarantee that the pickle you're loading is the pickle you dumped; never unpickle untrusted input; xref to "eval() is evil" discussion in advanced-iterators chapter
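<p>One common way to notice tampering is to store a keyed hash (an <abbr>HMAC</abbr>) next to the pickled bytes and check it before unpickling. The following is only a sketch under stated assumptions: <code>SECRET_KEY</code>, <code>sign_pickle()</code>, and <code>load_signed()</code> are hypothetical names invented for this example, and the check only helps as long as the key stays secret. It does not make it safe to unpickle data from someone you don&#8217;t trust.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; a real program would load this from safe storage.
SECRET_KEY = b'not-a-real-key'

def sign_pickle(obj):
    """Return (signature, data) for obj, using HMAC-SHA256 over the pickle."""
    data = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return signature, data

def load_signed(signature, data):
    """Unpickle data only if its signature matches; raise ValueError otherwise."""
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError('pickle data failed its integrity check')
    return pickle.loads(data)

signature, data = sign_pickle({'published': True})
print(load_signed(signature, data))

# Changing a byte is detected before pickle.loads() ever runs.
tampered = data[:-1] + b'!'
try:
    load_signed(signature, tampered)
except ValueError as error:
    print('rejected:', error)
```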
<h2 id=json>Serializing Python Objects to be Read by Other Languages</h2>
<p>The data format used by the <code>pickle</code> module is Python-specific. It makes no attempt to be compatible with other programming languages. If cross-language compatibility is one of your requirements, you need to look at other serialization formats.
<p>One format that <em>is</em> designed to be used by multiple programming languages is <a href=http://json.org/><abbr>JSON</abbr></a>.
<p>FIXME - pickle format is python-specific; JSON format is designed to be cross-language (in fact, it was originally designed for JavaScript, hence the name); differences with pickle format (table or list); json module implements dumping and loading JSON-formatted data structures; JSON format is string-based (and always encoded as UTF-8 where bytes are required); compact vs. pretty-printing; JSONEncoder; JSONDecoder; iterencode
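<p>As a rough sketch of the compact versus pretty-printed distinction, the <code>json.dumps()</code> function accepts <code>separators</code>, <code>indent</code>, and <code>sort_keys</code> arguments. The dictionary below is a trimmed-down stand-in for the <code>entry</code> dictionary, containing only datatypes that <abbr>JSON</abbr> supports directly.

```python
import json

entry = {'title': 'Dive into history, 2009 edition',
         'tags': ['diveintopython', 'docbook', 'html'],
         'published': True,
         'comments_link': None}

# Compact: no whitespace after the item and key separators.
compact = json.dumps(entry, separators=(',', ':'))

# Pretty-printed: one item per line, indented two spaces, keys sorted.
pretty = json.dumps(entry, indent=2, sort_keys=True)

print(compact)
print(pretty)

# Both forms decode to the same Python data structure.
print(json.loads(compact) == json.loads(pretty) == entry)
```

<p>The compact form is smaller on disk or on the wire; the pretty-printed form is easier for humans to read and to compare between versions.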
<h3 id=json-types>Mapping of Python Datatypes to <abbr>JSON</abbr></h3>
<p>FIXME
<table>
<tr><th>Notes
<th>JSON
<th>Python 3
<tr><th>
<td>object
<td><a href=native-datatypes.html#dictionaries>dictionary</a>
<tr><th>
<td>array
<td><a href=native-datatypes.html#lists>list</a>
<tr><th>
<td>string
<td><a href=strings.html#divingin>string</a>
<tr><th>
<td>integer
<td><a href=native-datatypes.html#numbers>integer</a>
<tr><th>
<td>real number
<td><a href=native-datatypes.html#numbers>float</a>
<tr><th>
<td><code>true</code>
<td><code><a href=native-datatypes.html#booleans>True</a></code>
<tr><th>
<td><code>false</code>
<td><code><a href=native-datatypes.html#booleans>False</a></code>
<tr><th>
<td><code>null</code>
<td><code><a href=native-datatypes.html#none>None</a></code>
</table>
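<p>You can check the mapping in this table yourself by decoding a small <abbr>JSON</abbr> document and printing the Python type of each value. This is a quick sketch; the document and its key names are invented for the example.

```python
import json

# One value of each JSON datatype, keyed by its JSON name.
document = '''{
    "object": {"k": "v"},
    "array": [1, 2],
    "string": "s",
    "integer": 42,
    "real number": 1.5,
    "true": true,
    "false": false,
    "null": null
}'''

decoded = json.loads(document)
for name, value in decoded.items():
    print(name, type(value).__name__, repr(value))
```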
<p>FIXME
<h3 id=json-unknown-types>Serializing Datatypes Unsupported by <abbr>JSON</abbr></h3>
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>shell</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>1</samp>
<samp class=p>>>> </samp><kbd class=pp>entry</kbd>
<samp class=pp>FIXME</samp>
<samp class=p>>>> </samp><kbd class=pp>import json</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>with open('entry.json', 'w', encoding='utf-8') as f:</kbd> <span class=u>&#x2461;</span></a>
<samp class=p>... </samp><kbd class=pp> json.dump(entry, f)</kbd>
<samp class=p>... </samp>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 5, in &lt;module>
File "C:\Python31\lib\json\__init__.py", line 178, in dump
for chunk in iterable:
File "C:\Python31\lib\json\encoder.py", line 408, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python31\lib\json\encoder.py", line 382, in _iterencode_dict
for chunk in chunks:
File "C:\Python31\lib\json\encoder.py", line 416, in _iterencode
o = _default(o)
File "C:\Python31\lib\json\encoder.py", line 170, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'\xde\xd5\xb4\xf8' is not JSON serializable</samp></pre>
<ol>
<li>FIXME
<li>FIXME
</ol>
<p>FIXME
<pre class=pp><code># customserializer.py
def to_json(python_object):
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')</code></pre>
<ol>
<li>FIXME
</ol>
<p>FIXME
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>1</samp>
<samp class=p>>>> </samp><kbd class=pp>import customserializer</kbd>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.json', 'w', encoding='utf-8') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> json.dump(entry, f, default=customserializer.to_json)</kbd>
<samp class=p>... </samp>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 9, in &lt;module>
json.dump(entry, f, default=customserializer.to_json)
File "C:\Python31\lib\json\__init__.py", line 178, in dump
for chunk in iterable:
File "C:\Python31\lib\json\encoder.py", line 408, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python31\lib\json\encoder.py", line 382, in _iterencode_dict
for chunk in chunks:
File "C:\Python31\lib\json\encoder.py", line 416, in _iterencode
o = _default(o)
File "/Users/pilgrim/diveintopython3/examples/customserializer.py", line 12, in to_json
raise TypeError(repr(python_object) + ' is not JSON serializable')
TypeError: time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1) is not JSON serializable</samp></pre>
<ol>
<li>FIXME
</ol>
<p>FIXME
<pre class=pp><code># customserializer.py
import time

def to_json(python_object):
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')</code></pre>
<ol>
<li>FIXME
</ol>
<p>FIXME
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>1</samp>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.json', 'w', encoding='utf-8') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> json.dump(entry, f, default=customserializer.to_json)</kbd>
<samp class=p>... </samp></pre>
<ol>
<li>FIXME
</ol>
<p>FIXME
<pre class=screen>
<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>ls -l entry.json</kbd>
<samp>-rw-r--r-- 1 you you 391 Aug 3 13:34 entry.json</samp>
<samp class=p>you@localhost:~/diveintopython3/examples$ </samp><kbd>cat entry.json</kbd>
<samp>{"published_date": {"__class__": "time.asctime", "__value__": "Fri Mar 27 22:20:42 2009"},
"comments_link": null, "internal_id": {"__class__": "bytes", "__value__": [222, 213, 180, 248]},
"tags": ["diveintopython", "docbook", "html"], "title": "Dive into history, 2009 edition",
"article_link": "http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition",
"published": true}</samp></pre>
<ol>
<li>FIXME
</ol>
<p>FIXME
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>2</samp>
<samp class=p>>>> </samp><kbd class=pp>del entry</kbd>
<samp class=p>>>> </samp><kbd class=pp>entry</kbd>
<samp class=traceback>Traceback (most recent call last):
File "&lt;stdin>", line 1, in &lt;module>
NameError: name 'entry' is not defined</samp>
<samp class=p>>>> </samp><kbd class=pp>import json</kbd>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.json', 'r', encoding='utf-8') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> entry = json.load(f)</kbd>
<samp class=p>... </samp>
<samp class=p>>>> </samp><kbd class=pp>entry</kbd>
<samp class=pp>{'comments_link': None,
'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]},
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': {'__class__': 'time.asctime', '__value__': 'Fri Mar 27 22:20:42 2009'},
'published': True}</samp></pre>
<ol>
<li>FIXME
</ol>
<p>FIXME
<pre class=pp><code># customserializer.py
import time

def from_json(json_object):
    if '__class__' in json_object:
        if json_object['__class__'] == 'time.asctime':
            return time.strptime(json_object['__value__'])
        if json_object['__class__'] == 'bytes':
            return bytes(json_object['__value__'])
    return json_object</code></pre>
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>2</samp>
<samp class=p>>>> </samp><kbd class=pp>import customserializer</kbd>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.json', 'r', encoding='utf-8') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> entry = json.load(f, object_hook = customserializer.from_json)</kbd>
<samp class=p>... </samp>
<samp class=p>>>> </samp><kbd class=pp>entry</kbd>
<samp class=pp>{'comments_link': None,
'internal_id': b'\xde\xd5\xb4\xf8',
'title': 'Dive into history, 2009 edition',
'tags': ['diveintopython', 'docbook', 'html'],
'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1),
'published': True}</samp></pre>
<ol>
<li>FIXME
</ol>
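<p>Here is a self-contained sketch of the same round trip that you can run without any files. It works on strings in memory with <code>json.dumps()</code> and <code>json.loads()</code>, and it is deliberately restricted to the <code>bytes</code> case to keep it short. The two functions mirror the ones in <code>customserializer.py</code>; the one-key dictionary is a stand-in for the full <code>entry</code> dictionary.

```python
import json

def to_json(python_object):
    # Called by the json module for objects it cannot serialize itself.
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

def from_json(json_object):
    # Called by the json module for every JSON object (dictionary) it decodes.
    if json_object.get('__class__') == 'bytes':
        return bytes(json_object['__value__'])
    return json_object

entry = {'title': 'Dive into history, 2009 edition',
         'internal_id': b'\xde\xd5\xb4\xf8'}

# default= handles the unsupported type on the way out...
text = json.dumps(entry, default=to_json)
print(text)

# ...and object_hook= turns the marker dictionary back into bytes.
restored = json.loads(text, object_hook=from_json)
print(restored)
print(restored == entry)
```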
<p>FIXME
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>shell</kbd>
<samp class=pp>1</samp>
<samp class=p>>>> </samp><kbd class=pp>import customserializer</kbd>
<samp class=p>>>> </samp><kbd class=pp>with open('entry.json', 'r', encoding='utf-8') as f:</kbd>
<samp class=p>... </samp><kbd class=pp> entry2 = json.load(f, object_hook = customserializer.from_json)</kbd>
<samp class=p>... </samp>
<samp class=p>>>> </samp><kbd class=pp>entry2 == entry</kbd>
<samp class=pp>False</samp>
<samp class=p>>>> </samp><kbd class=pp>entry['tags']</kbd>
<samp class=pp>('diveintopython', 'docbook', 'html')</samp>
<samp class=p>>>> </samp><kbd class=pp>entry2['tags']</kbd>
<samp class=pp>['diveintopython', 'docbook', 'html']</samp></pre>
<ol>
<li>FIXME
</ol>
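<p>The mismatch comes from the <code>'tags'</code> value: <abbr>JSON</abbr> has arrays but no separate tuple type, so a tuple goes out as an array and comes back as a list. The sketch below isolates that behavior with a one-key dictionary. Converting the value back by hand, as shown at the end, is just one possible workaround, not the only one.

```python
import json

entry = {'tags': ('diveintopython', 'docbook', 'html')}

# Round-trip through a JSON string in memory.
restored = json.loads(json.dumps(entry))

print(type(entry['tags']).__name__)
print(type(restored['tags']).__name__)
print(restored == entry)

# Convert the list back into a tuple when the distinction matters.
restored['tags'] = tuple(restored['tags'])
print(restored == entry)
```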
<p>FIXME
<h2 id=furtherreading>Further Reading</h2>
<blockquote class=note>
<p><span class=u>&#x261E;</span>Many articles about the <code>pickle</code> module make references to <code>cPickle</code>. In Python 2, there were two implementations of the <code>pickle</code> module, one written in pure Python and another written in C (but still callable from Python). In Python 3, <a href=porting-code-to-python-3-with-2to3.html#othermodules>these two modules have been consolidated</a>, so you should always just <code>import pickle</code>. You may find these articles useful, but you should ignore the now-obsolete information about <code>cPickle</code>.
</blockquote>
<ul>
<li><a href=http://docs.python.org/3.1/library/pickle.html><code>pickle</code> module</a>
<li><a href=http://www.doughellmann.com/PyMOTW/pickle/><code>pickle</code> and <code>cPickle</code>&nbsp;&mdash;&nbsp;Python object serialization</a>
<li><a href=http://wiki.python.org/moin/UsingPickle>Using <code>pickle</code></a>
<li><a href=http://www.ibm.com/developerworks/library/l-pypers.html>Python persistence management</a>
<li><a href=http://www.doughellmann.com/PyMOTW/json/><code>json</code>&nbsp;&mdash;&nbsp;JavaScript Object Notation Serializer</a>
<li><a href=http://blog.quaternio.net/2009/07/16/json-encoding-and-decoding-with-custom-objects-in-python/>JSON encoding and decoding with custom objects in Python</a>
</ul>
<p class=v><a rel=prev href=xml.html title='back to &#8220;XML&#8221;'><span class=u>&#x261C;</span></a> <a rel=next href=http-web-services.html title='onward to &#8220;HTTP Web Services&#8221;'><span class=u>&#x261E;</span></a>
<p class=c>&copy; 2001&ndash;9 <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/prettify.js></script>
<script src=j/dip3.js></script>