diff --git a/serializing.html b/serializing.html
index a63d1bb..f6c1a43 100644
--- a/serializing.html
+++ b/serializing.html
@@ -22,28 +22,32 @@ body{counter-reset:h1 13}

Diving In

FIXME -

+

Open the Python Shell and define the following variable: + +

 >>> shell = 1
-

FIXME +

Keep that window open. Now open another Python Shell and define the following variable: -

+
 >>> shell = 2
+

Throughout this chapter, I will use the shell variable to indicate which Python Shell is being used in each example. +

Serializing Simple Python Objects

FIXME - introduction to pickle module, concepts, what datatypes can be pickled w/o additional work -

Saving to (and Loading from) a File

+

Saving to a File

The pickle module works with data structures. Let’s build one.

->>> shell
+>>> shell                                                                                              
 1
->>> entry = {}
+>>> entry = {}                                                                                         
 >>> entry['title'] = 'Dive into history, 2009 edition'
 >>> entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
 >>> entry['comments_link'] = None
@@ -51,23 +55,108 @@ body{counter-reset:h1 13}
 >>> entry['tags'] = ('diveintopython', 'docbook', 'html')
 >>> entry['published'] = True
 >>> import time
->>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')
+>>> entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')                                
 >>> entry['published_date']
 time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)
    +
  1. Follow along in Python Shell #1. +
  2. The idea here is to build a Python dictionary that could represent something useful, like an entry in an Atom feed. But I also want to ensure that it contains several different types of data, to show off the pickle module. Don’t read too much into these values. +
  3. The time module contains a data structure (struct_time) to represent a point in time (accurate to one second) and functions to manipulate time structs. The strptime() function takes a formatted string and converts it to a struct_time. This string is in the default format, but you can control that with format codes. See the time module for more details. +
+ +
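The note about format codes can be made concrete with a minimal sketch. It assumes the default locale (the %a and %b codes are locale-dependent); the second date layout and the variable names are hypothetical, chosen only for illustration.

```python
import time

# The format strptime() falls back to when no format string is given.
DEFAULT_FORMAT = '%a %b %d %H:%M:%S %Y'

# Same string as in the chapter, parsed with and without an explicit format.
implicit = time.strptime('Fri Mar 27 22:20:42 2009')
explicit = time.strptime('Fri Mar 27 22:20:42 2009', DEFAULT_FORMAT)

# A hypothetical alternative layout of the same moment, parsed with
# custom format codes.
iso_like = time.strptime('2009-03-27 22:20:42', '%Y-%m-%d %H:%M:%S')

print(implicit == explicit)
print(iso_like.tm_year, iso_like.tm_mon, iso_like.tm_mday)
```

Note that the smallest field in a struct_time is tm_sec, which is why the structure cannot carry sub-second precision.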

That’s a handsome-looking Python dictionary. Let’s save it to a file. + +

+>>> shell                                    
+1
+>>> import pickle
+>>> with open('entry.pickle', 'wb') as f:    
+...     pickle.dump(entry, f)                
+... 
+
    +
  1. This is still in Python Shell #1. +
  2. Use the open() function to open a file. Set the file mode to 'wb' to open the file for writing in binary mode. Wrap it in a with statement to ensure the file is closed automatically when you’re done with it. +
  3. The dump() function in the pickle module takes a serializable Python data structure, serializes it into a binary, Python-specific format using the default version of the pickle protocol, and saves it to an open file. +
+ +
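The steps above can be sketched as a self-contained script. This is an illustration, not the chapter's own example: it writes to a hypothetical scratch file (entry-sketch.pickle in a temporary directory) instead of entry.pickle, uses a trimmed-down entry dictionary, and pins protocol=3 explicitly as an assumption so the on-disk format does not depend on the interpreter's default.

```python
import os
import pickle
import tempfile

# A trimmed-down version of the chapter's entry dictionary.
entry = {
    'title': 'Dive into history, 2009 edition',
    'tags': ('diveintopython', 'docbook', 'html'),
    'comments_link': None,
    'published': True,
    'internal_id': b'\xde\xd5\xb4\xf8',
}

# Hypothetical scratch path, so the sketch never overwrites entry.pickle.
path = os.path.join(tempfile.mkdtemp(), 'entry-sketch.pickle')

# 'wb': pickle data is binary, so the file must be opened in binary mode.
with open(path, 'wb') as f:
    pickle.dump(entry, f, protocol=3)

# 'rb': read the same bytes back and rebuild an equal, but separate, object.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == entry)
print(restored is entry)
print(type(restored['tags']).__name__)
```

Unlike some text formats, the round trip keeps the tuple a tuple and the bytes object a bytes object.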

That last sentence was pretty important. + +

+ +

Loading from a File

+ +

Now switch to your second Python Shell — i.e. not the one where you created the entry dictionary. + +

+>>> shell                                    
+2
+>>> entry                                    
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+NameError: name 'entry' is not defined
+>>> import pickle
+>>> with open('entry.pickle', 'rb') as f:    
+...     entry = pickle.load(f)               
+... 
+>>> entry                                    
+{'comments_link': None,
+ 'internal_id': b'\xde\xd5\xb4\xf8',
+ 'title': 'Dive into history, 2009 edition',
+ 'tags': ('diveintopython', 'docbook', 'html'),
+ 'article_link':
+ 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
+ 'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1),
+ 'published': True}
+
    +
  1. FIXME +
  2. FIXME +
  3. FIXME +
  4. FIXME
  5. FIXME
+

FIXME +

 >>> shell
 1
->>> with open('entry.pickle', 'wb') as f:
-...     pickle.dump(entry, f)
-... 
+>>> with open('entry.pickle', 'rb') as f:
+...     entry2 = pickle.load(f)
+...
+>>> entry2 == entry
+True
+>>> entry2['tags']
+('diveintopython', 'docbook', 'html')
+>>> entry2['internal_id']
+b'\xde\xd5\xb4\xf8'
  1. FIXME +
  2. +
  3. +
  4. +
+

Saving to (and Loading from) an Object in Memory

+ +

FIXME + +
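Until this section is written, here is a minimal sketch of what serializing to memory might look like. It assumes the section is meant to cover pickle.dumps() and pickle.loads(), which work on a bytes object instead of a file; the small entry dictionary and the variable names are illustrative only.

```python
import io
import pickle

entry = {
    'title': 'Dive into history, 2009 edition',
    'tags': ('diveintopython', 'docbook', 'html'),
    'published': True,
}

# dumps() returns the pickled data as a bytes object instead of writing it
# to a file; loads() rebuilds an object from such bytes.
data = pickle.dumps(entry)
clone = pickle.loads(data)

# The file-based functions also accept any binary file-like object,
# such as an in-memory buffer.
buffer = io.BytesIO()
pickle.dump(entry, buffer)
buffer.seek(0)
from_buffer = pickle.load(buffer)

print(type(data).__name__)
print(clone == entry, from_buffer == entry)
```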

Bytes and Strings Rear Their Ugly Heads (Again!)

+ +

FIXME - discussion of pickle protocol versions, backward incompatibility of protocol version 3 due to bytes/strings separation in Python 3, link to http://docs.python.org/3.1/library/pickle.html#data-stream-format + +
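As a rough illustration of the protocol-version point, the sketch below pickles the same bytes value with protocols 2 and 3 and inspects the result. The helper name opcode_names and the one-key payload are hypothetical; the sketch relies only on pickle.dumps(), pickle.loads(), and pickletools.genops().

```python
import pickle
import pickletools

payload = {'internal_id': b'\xde\xd5\xb4\xf8'}


def opcode_names(data):
    """Return the names of the opcodes found in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


results = {}
for protocol in (2, 3):
    data = pickle.dumps(payload, protocol=protocol)
    results[protocol] = {
        # Protocol 2 and later start with the PROTO opcode (\x80)
        # followed by the protocol number.
        'header': data[:2],
        # A dedicated bytes opcode only exists from protocol 3 onward.
        'has_bytes_opcode': 'SHORT_BINBYTES' in opcode_names(data),
        'round_trips': pickle.loads(data) == payload,
    }
    print(protocol, results[protocol])

# The protocol used when none is given, and the newest one available;
# both depend on the Python version in use.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

Both pickles load correctly in Python 3, but only protocol 3 stores the bytes object natively; protocol 2 has no bytes opcode and has to encode it indirectly.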

Debugging Pickle Files

+ +

What does the pickle protocol look like? Let’s jump out of the Python Shell for a moment and take a look at that entry.pickle file we created. +

 you@localhost:~/diveintopython3/examples$ ls -l entry.pickle
 -rw-r--r-- 1 you  you  324 Aug  3 13:34 entry.pickle
@@ -78,54 +167,72 @@ q   Xpublished_dateq
 ctime
 struct_time
 ?qRqXtitleqXDive into history, 2009 editionqu.
-
    -
  1. FIXME -
-

FIXME now switch to your second Python Shell - -

->>> shell
-2
->>> entry
-Traceback (most recent call last):
-  File "<stdin>", line 1, in <module>
-NameError: name 'entry' is not defined
->>> import pickle
->>> with open('entry.pickle', 'rb') as f:
-...     entry = pickle.load(f)
-... 
->>> entry
-FIXME
-
    -
  1. FIXME -
- -

FIXME +

That wasn’t terribly helpful. You can see the strings, but other datatypes end up as unprintable (or at least unreadable) characters. Fields are not obviously delimited by tabs or spaces. This is not a format you would want to debug by yourself.

 >>> shell
 1
+>>> import pickletools
 >>> with open('entry.pickle', 'rb') as f:
-...     entry2 = pickle.load(f)
-... 
->>> entry2 == entry
-True
->>> entry2['tags']
-('diveintopython', 'docbook', 'html')
->>> entry2['internal_id']
-b'\xde\xd5\xb4\xf8'
-
    -
  1. FIXME -
- -

Saving to (and Loading from) an Object in Memory

- -

FIXME - -

Bytes and Strings Rear Their Ugly Heads (Again!)

- -

FIXME - discussion of pickle protocol versions, backward incompatibility of protocol version 3 due to bytes/strings separation in Python 3, link to http://docs.python.org/3.1/library/pickle.html#data-stream-format
+...     pickletools.dis(f)
+    0: \x80 PROTO      3
+    2: }    EMPTY_DICT
+    3: q    BINPUT     0
+    5: (    MARK
+    6: X        BINUNICODE 'published_date'
+   25: q        BINPUT     1
+   27: c        GLOBAL     'time struct_time'
+   45: q        BINPUT     2
+   47: (        MARK
+   48: M            BININT2    2009
+   51: K            BININT1    3
+   53: K            BININT1    27
+   55: K            BININT1    22
+   57: K            BININT1    20
+   59: K            BININT1    42
+   61: K            BININT1    4
+   63: K            BININT1    86
+   65: J            BININT     -1
+   70: t            TUPLE      (MARK at 47)
+   71: q        BINPUT     3
+   73: }        EMPTY_DICT
+   74: q        BINPUT     4
+   76: \x86     TUPLE2
+   77: q        BINPUT     5
+   79: R        REDUCE
+   80: q        BINPUT     6
+   82: X        BINUNICODE 'comments_link'
+  100: q        BINPUT     7
+  102: N        NONE
+  103: X        BINUNICODE 'internal_id'
+  119: q        BINPUT     8
+  121: C        SHORT_BINBYTES 'ÞÕ´ø'
+  127: q        BINPUT     9
+  129: X        BINUNICODE 'tags'
+  138: q        BINPUT     10
+  140: X        BINUNICODE 'diveintopython'
+  159: q        BINPUT     11
+  161: X        BINUNICODE 'docbook'
+  173: q        BINPUT     12
+  175: X        BINUNICODE 'html'
+  184: q        BINPUT     13
+  186: \x87     TUPLE3
+  187: q        BINPUT     14
+  189: X        BINUNICODE 'title'
+  199: q        BINPUT     15
+  201: X        BINUNICODE 'Dive into history, 2009 edition'
+  237: q        BINPUT     16
+  239: X        BINUNICODE 'article_link'
+  256: q        BINPUT     17
+  258: X        BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
+  337: q        BINPUT     18
+  339: X        BINUNICODE 'published'
+  353: q        BINPUT     19
+  355: \x88     NEWTRUE
+  356: u    SETITEMS   (MARK at 5)
+  357: .    STOP
+highest protocol among opcodes = 3
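The same disassembly can be tried without a file on disk. The sketch below is an illustration under a few assumptions: it pickles a small two-key dictionary in memory with protocol=3, captures the listing in a string through the out parameter of pickletools.dis(), and uses pickletools.genops() to walk the opcodes programmatically. The variable names are hypothetical.

```python
import io
import pickle
import pickletools

# A small stand-in for the chapter's entry dictionary.
data = pickle.dumps(
    {'title': 'Dive into history, 2009 edition', 'published': True},
    protocol=3,
)

# dis() accepts pickled bytes as well as an open binary file, and writes
# its listing to the object passed as out (sys.stdout by default).
out = io.StringIO()
pickletools.dis(data, out=out)
listing = out.getvalue()
print(listing)

# genops() yields (opcode, argument, position) triples, which is handy
# when you want to inspect a pickle from code instead of reading a listing.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcode_names[0], opcode_names[-1])
```

Disassembling a pickle only reads the opcodes; it does not execute them, so it is a reasonable way to look inside a pickle file before deciding whether to load it.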

⁂
@@ -147,29 +254,39 @@ NameError: name 'entry' is not defined

Mapping of Python Datatypes to JSON

-
-[source: help(json)]
+

FIXME
-+---------------+-------------------+
-| JSON          | Python            |
-+===============+===================+
-| object        | dict              |
-+---------------+-------------------+
-| array         | list              |
-+---------------+-------------------+
-| string        | unicode           |
-+---------------+-------------------+
-| number (int)  | int, long         |
-+---------------+-------------------+
-| number (real) | float             |
-+---------------+-------------------+
-| true          | True              |
-+---------------+-------------------+
-| false         | False             |
-+---------------+-------------------+
-| null          | None              |
-+---------------+-------------------+
-

+ +
Notes +JSON +Python 3 +
+object +dictionary +
+array +list +
+string +string +
+integer +integer +
+real number +float +
+true +True +
+false +False +
+null +None +
+ +
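The mapping in the table can be checked with a short round trip through the json module. This is a minimal sketch: the entry dictionary is a cut-down version of the one used earlier in the chapter, and the word_count and rating keys are hypothetical additions so that the integer and float rows are exercised too.

```python
import json

entry = {
    'title': 'Dive into history, 2009 edition',   # string
    'tags': ('diveintopython', 'docbook', 'html'),  # tuple, written as an array
    'comments_link': None,                          # null
    'published': True,                              # true
    'word_count': 1200,                             # integer (hypothetical key)
    'rating': 4.5,                                  # real number (hypothetical key)
}

text = json.dumps(entry)
restored = json.loads(text)

print(text)
print(type(restored['tags']).__name__)
print(restored == entry)
```

JSON has only one sequence type, so the tuple comes back as a list and the restored dictionary no longer compares equal to the original.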

FIXME

Serializing Datatypes Unsupported by JSON