From 126caa7ce80ee0794d98110fdb9f3d755247c56c Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Sun, 9 Aug 2009 20:45:09 -0700
Subject: [PATCH] wrote #dump section

---
 serializing.html | 265 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 191 insertions(+), 74 deletions(-)

diff --git a/serializing.html b/serializing.html
index a63d1bb..f6c1a43 100644
--- a/serializing.html
+++ b/serializing.html
@@ -22,28 +22,32 @@ body{counter-reset:h1 13}

Diving In

FIXME -

+

Open the Python Shell and define the following variable: + +

 >>> shell = 1
-

FIXME +

Keep that window open. Now open another Python Shell and define the following variable: -

+
 >>> shell = 2
+

Throughout this chapter, I will use the shell variable to indicate which Python Shell is being used in each example. +

Serializing Simple Python Objects

FIXME - introduction to pickle module, concepts, what datatypes can be pickled w/o additional work -

Saving to (and Loading from) a File

+

Saving to a File

The pickle module works with data structures. Let’s build one.

->>> shell
+>>> shell                                                                                              
 1
->>> entry = {}
+>>> entry = {}                                                                                         
 >>> entry['title'] = 'Dive into history, 2009 edition'
 >>> entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
 >>> entry['comments_link'] = None
@@ -51,23 +55,108 @@ body{counter-reset:h1 13}
 >>> entry['tags'] = ('diveintopython', 'docbook', 'html')
 >>> entry['published'] = True
 >>> import time
->>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')
+>>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')                                
 >>> entry['published_date']
 time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)
    +
  1. Follow along in Python Shell #1. +
  2. The idea here is to build a Python dictionary that could represent something useful, like an entry in an Atom feed. But I also want to ensure that it contains several different types of data, to show off the pickle module. Don’t read too much into these values. +
  3. The time module contains a data structure (struct_time) to represent a point in time (accurate to one second) and functions to manipulate time structs. The strptime() function takes a formatted string and converts it to a struct_time. This string is in the default format, but you can control that with format codes. See the time module for more details. +
+ +
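The note about format codes can be sketched as follows. This is only an illustration: the ISO-style timestamp string and the variable names are assumptions that do not appear in the chapter, and the default-format parse relies on English day and month names being recognized in the current locale.

```python
import time

# With no format argument, strptime() expects '%a %b %d %H:%M:%S %Y'.
default = time.strptime('Fri Mar 27 22:20:42 2009')

# The same moment written differently (a hypothetical string), parsed
# with explicit format codes.
custom = time.strptime('2009-03-27 22:20:42', '%Y-%m-%d %H:%M:%S')

# A struct_time behaves like a tuple; the first six fields are
# year, month, day, hour, minute, second.
print(default.tm_year, default.tm_mon, default.tm_mday)
print(default[:6] == custom[:6])
```

Both calls produce a struct_time, so either string could feed the entry['published_date'] field.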

That’s a handsome-looking Python dictionary. Let’s save it to a file. + +

+>>> shell                                    
+1
+>>> import pickle
+>>> with open('entry.pickle', 'wb') as f:    
+...     pickle.dump(entry, f)                
+... 
+
    +
  1. This is still in Python Shell #1. +
  2. Use the open() function to open a file. Set the file mode to 'wb' to open the file for writing in binary mode. Wrap it in a with statement to ensure the file is closed automatically when you’re done with it. +
  3. The dump() function in the pickle module takes a serializable Python data structure, serializes it into a binary, Python-specific format using the latest version of the pickle protocol, and saves it to an open file. +
+ +

That last sentence was pretty important. + +

    +
  • The pickle module takes a Python data structure and saves it to a file. +
  • To do this, it serializes the data structure using a data format called “the pickle protocol.” +
  • The pickle protocol is Python-specific; there is no guarantee of cross-language compatibility. You probably couldn’t take the entry.pickle file you just created and do anything useful with it in Perl, PHP, Java, or any other language. +
  • Not every Python data structure can be serialized by the pickle module. The pickle protocol has changed several times as new data types have been added to the Python language, but there are still limitations. +
  • As a result of these changes, there is no guarantee of compatibility between different versions of Python itself. Newer versions of Python support the older serialization formats, but older versions of Python do not support newer formats (since they don’t support the newer data types). +
  • Unless you specify otherwise, the functions in the pickle module will use the latest version of the pickle protocol. This ensures that you have maximum flexibility in the types of data you can serialize, but it also means that the resulting file will not be readable by older versions of Python that do not support the latest version of the pickle protocol. +
  • The latest version of the pickle protocol is a binary protocol. Be sure to open your pickle files in binary mode, or the data will get corrupted during writing. +
+ +
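The points above about protocol versions and unpicklable objects can be sketched like this. It uses pickle.dumps(), which returns bytes instead of writing to a file, purely to keep the example self-contained. The trimmed-down entry dictionary and the countdown() generator are illustrative assumptions, not part of the chapter.

```python
import pickle

# A smaller stand-in for the chapter's entry dictionary.
entry = {'title': 'Dive into history, 2009 edition', 'published': True}

# Request an older protocol explicitly when older Pythons must read the data.
old_style = pickle.dumps(entry, protocol=2)

# Or ask for the newest protocol this interpreter supports.
newest = pickle.dumps(entry, protocol=pickle.HIGHEST_PROTOCOL)

# Binary pickles (protocol 2 and later) start with a PROTO opcode (0x80)
# followed by the protocol number.
print(old_style[:2])
print(newest[1] == pickle.HIGHEST_PROTOCOL)

# Not every object can be pickled; a generator is one example.
def countdown():
    yield 1

unpicklable_error = None
try:
    pickle.dumps(countdown())
except (pickle.PicklingError, TypeError) as exc:
    unpicklable_error = exc
    print('cannot pickle:', exc)
```

Which protocol a given Python version picks by default varies by release, so passing protocol= explicitly is the safer choice when the reader's version is known.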

Loading from a File

+ +

Now switch to your second Python Shell — i.e. not the one where you created the entry dictionary. + +

+>>> shell                                    
+2
+>>> entry                                    
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+NameError: name 'entry' is not defined
+>>> import pickle
+>>> with open('entry.pickle', 'rb') as f:    
+...     entry = pickle.load(f)               
+... 
+>>> entry                                    
+{'comments_link': None,
+ 'internal_id': b'\xde\xd5\xb4\xf8',
+ 'title': 'Dive into history, 2009 edition',
+ 'tags': ('diveintopython', 'docbook', 'html'),
+ 'article_link':
+ 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
+ 'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1),
+ 'published': True}
+
    +
  1. FIXME +
  2. FIXME +
  3. FIXME +
  4. FIXME
  5. FIXME
+
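The save-then-load cycle from the two shell sessions can be reproduced in a single script. This is a sketch, not the chapter's own listing: it writes to a temporary directory (an assumption, to avoid leaving files behind) and uses a shortened copy of the entry dictionary.

```python
import os
import pickle
import tempfile
import time

entry = {
    'title': 'Dive into history, 2009 edition',
    'comments_link': None,
    'internal_id': b'\xde\xd5\xb4\xf8',
    'tags': ('diveintopython', 'docbook', 'html'),
    'published': True,
    'published_date': time.strptime('Fri Mar 27 22:20:42 2009'),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'entry.pickle')
    # Write in binary mode ('wb') ...
    with open(path, 'wb') as f:
        pickle.dump(entry, f)
    # ... and read back in binary mode ('rb').
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == entry)                # equal in value
print(restored is entry)                # but a distinct object
print(type(restored['tags']).__name__)  # the tuple is still a tuple
```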

FIXME +

 >>> shell
 1
->>> with open('entry.pickle', 'wb') as f:
-...     pickle.dump(entry, f)
-... 
+>>> with open('entry.pickle', 'rb') as f:
+...     entry2 = pickle.load(f)
+... 
+>>> entry2 == entry
+True
+>>> entry2['tags']
+('diveintopython', 'docbook', 'html')
+>>> entry2['internal_id']
+b'\xde\xd5\xb4\xf8'
  1. FIXME +
  2. +
  3. +
  4. +
+

Saving to (and Loading from) an Object in Memory

+ +

FIXME + +
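Until this section is written, here is a minimal sketch of what its title suggests, assuming it refers to pickle.dumps() and pickle.loads(), which serialize to and from a bytes object rather than a file. The variable names are illustrative.

```python
import pickle

entry = {
    'title': 'Dive into history, 2009 edition',
    'tags': ('diveintopython', 'docbook', 'html'),
    'published': True,
}

# dumps() (note the trailing "s") returns the serialized data as bytes.
data = pickle.dumps(entry)
print(type(data).__name__)

# loads() takes those bytes and rebuilds an equal, independent object.
entry3 = pickle.loads(data)
print(entry3 == entry)
print(entry3 is entry)
```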

Bytes and Strings Rear Their Ugly Heads (Again!)

+ +

FIXME - discussion of pickle protocol versions, backward incompatibility of protocol version 3 due to bytes/strings separation in Python 3, link to http://docs.python.org/3.1/library/pickle.html#data-stream-format + +

Debugging Pickle Files

+ +

What does the pickle protocol look like? Let’s jump out of the Python Shell for a moment and take a look at that entry.pickle file we created. +

 you@localhost:~/diveintopython3/examples$ ls -l entry.pickle
 -rw-r--r-- 1 you  you  324 Aug  3 13:34 entry.pickle
@@ -78,54 +167,72 @@ q   Xpublished_dateq
 ctime
 struct_time
 ?qRqXtitleqXDive into history, 2009 editionqu.
-
    -
  1. FIXME -
-

FIXME now switch to your second Python Shell - -

->>> shell
-2
->>> entry
-Traceback (most recent call last):
-  File "<stdin>", line 1, in <module>
-NameError: name 'entry' is not defined
->>> import pickle
->>> with open('entry.pickle', 'rb') as f:
-...     entry = pickle.load(f)
-... 
->>> entry
-FIXME
-
    -
  1. FIXME -
- -

FIXME +

That wasn’t terribly helpful. You can see the strings, but other datatypes end up as unprintable (or at least unreadable) characters. Fields are not obviously delimited by tabs or spaces. This is not a format you would want to debug by yourself.

 >>> shell
 1
+>>> import pickletools
 >>> with open('entry.pickle', 'rb') as f:
-...     entry2 = pickle.load(f)
-... 
->>> entry2 == entry
-True
->>> entry2['tags']
-('diveintopython', 'docbook', 'html')
->>> entry2['internal_id']
-b'\xde\xd5\xb4\xf8'
-
    -
  1. FIXME -
- -

Saving to (and Loading from) an Object in Memory

- -

FIXME - -

Bytes and Strings Rear Their Ugly Heads (Again!)

- -

FIXME - discussion of pickle protocol versions, backward incompatibility of protocol version 3 due to bytes/strings separation in Python 3, link to http://docs.python.org/3.1/library/pickle.html#data-stream-format
+...     pickletools.dis(f)
+    0: \x80 PROTO      3
+    2: }    EMPTY_DICT
+    3: q    BINPUT     0
+    5: (    MARK
+    6: X        BINUNICODE 'published_date'
+   25: q        BINPUT     1
+   27: c        GLOBAL     'time struct_time'
+   45: q        BINPUT     2
+   47: (        MARK
+   48: M            BININT2    2009
+   51: K            BININT1    3
+   53: K            BININT1    27
+   55: K            BININT1    22
+   57: K            BININT1    20
+   59: K            BININT1    42
+   61: K            BININT1    4
+   63: K            BININT1    86
+   65: J            BININT     -1
+   70: t            TUPLE      (MARK at 47)
+   71: q        BINPUT     3
+   73: }        EMPTY_DICT
+   74: q        BINPUT     4
+   76: \x86     TUPLE2
+   77: q        BINPUT     5
+   79: R        REDUCE
+   80: q        BINPUT     6
+   82: X        BINUNICODE 'comments_link'
+  100: q        BINPUT     7
+  102: N        NONE
+  103: X        BINUNICODE 'internal_id'
+  119: q        BINPUT     8
+  121: C        SHORT_BINBYTES 'ÞÕ´ø'
+  127: q        BINPUT     9
+  129: X        BINUNICODE 'tags'
+  138: q        BINPUT     10
+  140: X        BINUNICODE 'diveintopython'
+  159: q        BINPUT     11
+  161: X        BINUNICODE 'docbook'
+  173: q        BINPUT     12
+  175: X        BINUNICODE 'html'
+  184: q        BINPUT     13
+  186: \x87     TUPLE3
+  187: q        BINPUT     14
+  189: X        BINUNICODE 'title'
+  199: q        BINPUT     15
+  201: X        BINUNICODE 'Dive into history, 2009 edition'
+  237: q        BINPUT     16
+  239: X        BINUNICODE 'article_link'
+  256: q        BINPUT     17
+  258: X        BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
+  337: q        BINPUT     18
+  339: X        BINUNICODE 'published'
+  353: q        BINPUT     19
+  355: \x88     NEWTRUE
+  356: u    SETITEMS   (MARK at 5)
+  357: .    STOP
+highest protocol among opcodes = 3
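The same kind of disassembly can be produced without a file on disk. The sketch below assumes a small sample dictionary (not the chapter's full entry) pickled with protocol 3. It captures the listing in a string and also walks the opcodes with pickletools.genops().

```python
import io
import pickle
import pickletools

# A small sample (hypothetical) pickled explicitly with protocol 3.
sample = {'published': True, 'internal_id': b'\xde\xd5\xb4\xf8'}
data = pickle.dumps(sample, protocol=3)

# dis() accepts bytes as well as a file object; out= redirects the listing.
buffer = io.StringIO()
pickletools.dis(data, out=buffer)
listing = buffer.getvalue()
print(listing)

# genops() yields (opcode, argument, position) for programmatic inspection.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcode_names[0], '...', opcode_names[-1])
```

Capturing the listing in a string makes it possible to search or compare disassemblies in a test, rather than reading them by eye.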

⁂
@@ -147,29 +254,39 @@ NameError: name 'entry' is not defined

Mapping of Python Datatypes to JSON

-
-[source: help(json)]
+

FIXME
-+---------------+-------------------+
-| JSON          | Python            |
-+===============+===================+
-| object        | dict              |
-+---------------+-------------------+
-| array         | list              |
-+---------------+-------------------+
-| string        | unicode           |
-+---------------+-------------------+
-| number (int)  | int, long         |
-+---------------+-------------------+
-| number (real) | float             |
-+---------------+-------------------+
-| true          | True              |
-+---------------+-------------------+
-| false         | False             |
-+---------------+-------------------+
-| null          | None              |
-+---------------+-------------------+
-

+ +
Notes +JSON +Python 3 +
+object +dictionary +
+array +list +
+string +string +
+integer +integer +
+real number +float +
+true +True +
+false +False +
+null +None +
+ +
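The mapping in the table can be checked with a quick round trip through the json module. This is a sketch only: the sample dictionary and its key names are assumptions chosen to mirror the table rows.

```python
import json

# One value per row of the table (keys are illustrative labels).
sample = {
    'object': {'key': 'value'},   # dictionary
    'array': [1, 2, 3],           # list
    'string': 'text',             # string
    'integer': 42,                # integer
    'real number': 3.14,          # float
    'true': True,
    'false': False,
    'null': None,
}

text = json.dumps(sample)
restored = json.loads(text)
print(text)
print(restored == sample)

# JSON has no tuple type: a tuple is written as an array and comes back as a list.
tags = json.loads(json.dumps(('diveintopython', 'docbook', 'html')))
print(type(tags).__name__)
```

The tuple case is the kind of mismatch that the following section on unsupported datatypes has to deal with.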

FIXME

Serializing Datatypes Unsupported by JSON