marshal
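A Python-specific serialization format. PyCodeObject instances are serialized with it and stored in pyc binary files. The format is independent of machine architecture but may change between Python versions, so it is generally not recommended for storing your own data.
Supported types: None, bool, int, long, float, complex, str, unicode, tuple, list, set, frozenset, dict, code objects, StopIteration. Container elements must themselves be supported types, and containers must not reference themselves recursively.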
>>> from marshal import dump, load, dumps, loads
>>> s = dumps(range(10))
>>> s
'[\n\x00\x00\x00i\x00\...\x00\x00'
>>> loads(s)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Save the serialized result to a file. The data is binary, so open the file in binary mode.
>>> with open("test.dat", "wb") as f:
...     dump(range(10), f)
>>> with open("test.dat", "rb") as f:
...     print load(f)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
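The type restriction described above can be checked directly. The following is a minimal sketch in Python 3 syntax (the session above is Python 2); the `Point` class and the `try_marshal` helper are hypothetical names introduced only for illustration.

```python
import marshal


class Point(object):
    """Hypothetical user-defined type, used only to trigger the failure."""

    def __init__(self, x, y):
        self.x, self.y = x, y


def try_marshal(value):
    """Return the round-tripped value, or None if marshal rejects it."""
    try:
        return marshal.loads(marshal.dumps(value))
    except ValueError:
        # marshal raises ValueError for objects it cannot serialize.
        return None


# Containers of built-in types survive the round trip.
builtin_ok = try_marshal({"nums": [1, 2.5, 3j], "tags": ("a", None, True)})

# A container holding a user-defined instance is rejected.
custom_rejected = try_marshal([Point(1, 2)])
```

This is one reason the notes steer custom data toward pickle, which handles user-defined classes.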
pickle
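Use cPickle instead of pickle: according to the official documentation it can be up to 1000 times faster, and the two modules are interchangeable. Pickle supports user-defined types and three protocol versions: 0 (ASCII text), 1 (the older binary format), and 2 (added in Python 2.3, more efficient for new-style classes).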
>>> import pickle, cPickle
>>> s = "Hello, World"
>>> d = cPickle.dumps(s, 2)
>>> d
'\x80\x02U\rHello, Worldq\x01.'
>>> cPickle.loads(d)
'Hello, World'
>>> pickle.loads(d) # Exactly the same format as pickle.
'Hello, World'
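To compare the three protocol versions mentioned above, here is a small sketch in Python 3 syntax. It assumes the Python 3 `pickle` module, which still accepts protocols 0 to 2 alongside newer ones; the variable names are illustrative.

```python
import pickle

message = "Hello, World"

# Serialize the same value with each of the three protocols the text refers to.
payloads = {protocol: pickle.dumps(message, protocol) for protocol in (0, 1, 2)}

# Protocol 0 is a printable ASCII format; protocol 2 starts with the
# PROTO opcode (0x80) followed by the version number.
is_ascii = all(byte < 128 for byte in payloads[0])
has_proto_header = payloads[2][:2] == b"\x80\x02"

# Every protocol restores the original value.
restored = [pickle.loads(data) for data in payloads.values()]
```

`loads` needs no protocol argument because the format is detected from the data itself.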
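There are likewise dump and load functions for reading and writing files. Supported data types include None, booleans, numbers, strings, and tuples, lists, sets, and dicts of picklable objects, plus functions and classes defined at the top level of a module and instances of such classes.
Now a test with a user-defined type.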
>>> class Data(object):
...     def __init__(self, x, y):
...         print "__init__"
...         self._x = x
...         self._y = y
>>> d = Data(100, 200)
__init__
>>> s = cPickle.dumps(d, 2)
>>> d2 = cPickle.loads(s) # Deserialization does not call __init__
>>> d2.__dict__
{'_x': 100, '_y': 200}
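The file-based `dump` and `load` functions and the observation that unpickling skips `__init__` can be combined in one sketch. It is written in Python 3 syntax; the `init_calls` counter and the temporary file name are assumptions added here purely to make the behavior observable.

```python
import os
import pickle
import tempfile


class Data(object):
    init_calls = 0  # class-level counter, added only to observe __init__

    def __init__(self, x, y):
        Data.init_calls += 1
        self._x = x
        self._y = y


path = os.path.join(tempfile.mkdtemp(), "data.pkl")  # hypothetical file name

# Pickle data is binary, so the file is opened with "wb" / "rb".
with open(path, "wb") as f:
    pickle.dump(Data(100, 200), f, 2)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

After loading, `Data.init_calls` is still 1: the instance is rebuilt from its saved `__dict__` rather than by calling the constructor again.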
The pickle protocol hooks let you control the details of serialization. In the example below, we do not want to keep the _y field.
>>> class Data(object):
...     def __init__(self, x, y):
...         self._x = x
...         self._y = y
...
...     def __getstate__(self):
...         d = self.__dict__.copy()
...         del d["_y"]
...         return d
...
...     def __setstate__(self, state):
...         self.__dict__.update(state)
>>> d = Data(10, 20)
>>> s = cPickle.dumps(d, 2)
>>> d2 = cPickle.loads(s)
>>> d2.__dict__
{'_x': 10}
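One consequence of dropping a field is that the restored object no longer has that attribute at all. A possible variation, sketched here in Python 3 syntax, has `__setstate__` put a placeholder back; resetting `_y` to `None` is an assumption of this sketch, not something the example above does.

```python
import pickle


class Data(object):
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __getstate__(self):
        # Copy so that the live object keeps its _y attribute.
        state = self.__dict__.copy()
        del state["_y"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Assumption for this sketch: restore the dropped field as None
        # so attribute access does not fail after loading.
        self._y = None


original = Data(10, 20)
clone = pickle.loads(pickle.dumps(original, 2))
```

The original object is untouched because `__getstate__` works on a copy of `__dict__`.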
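shelve pickles objects and stores them in an anydbm-format file. anydbm is a key-value database, so a single file can hold many serialized objects. You can of course also choose dbm, gdbm, or bdb as the backend.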
>>> import shelve
>>> from contextlib import closing
>>> with closing(shelve.open("test", protocol = 2)) as f:
...     f["a"] = dict(name = "Tom", age = 34, sex = "male")
...     f["b"] = (1, ["a", 3], "abcdefg")
>>> !xxd -g 1 -l 100 test.db
0000000: 00 06 15 61 00 00 00 02 00 00 04 d2 00 00 10 00 ...a............
0000010: 00 00 00 0c 00 00 01 00 00 00 01 00 00 00 00 08 ................
0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000060: 00 00 00 00 ....
>>> with closing(shelve.open("test", protocol = 2)) as f:
...     print f["a"]
...     print f["b"]
...     print ["c"] # a plain list literal, not a lookup of f["c"]
{'age': 34, 'name': 'Tom', 'sex': 'male'}
(1, ['a', 3], 'abcdefg')
['c']
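For comparison, here is a sketch of the same shelf in Python 3 syntax, where the shelf object can be used directly in a `with` statement. The temporary directory is an assumption made so the example leaves no files behind; looking up a key that was never stored is shown with `get` and `in`, since `db["c"]` would raise `KeyError`.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test")  # hypothetical location

# Store two pickled values under string keys.
with shelve.open(path, protocol=2) as db:
    db["a"] = dict(name="Tom", age=34, sex="male")
    db["b"] = (1, ["a", 3], "abcdefg")

# Reopen the shelf and read the values back.
with shelve.open(path, protocol=2) as db:
    keys = sorted(db.keys())
    record = db["a"]
    missing = db.get("c")  # None instead of a KeyError
    has_c = "c" in db
```

Keys must be strings; the values can be any picklable object.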