
Technical Note on Persistency

Short explanation of persistency in ERP5.
  • Last Update:2016-06-28
  • Version:001
  • Language:en

Explanation of how persistency can be achieved inside ERP5.

Persistency

A persistent object is an object which, once instantiated and set as a property of another persistent object, is stored in the Zope object database (ZODB) with its own OID, a unique persistent object identifier.

Simple example

from persistent import Persistent

class Foo(Persistent):
  def __init__(self):
    # Plain int attribute: not persistent itself, stored inside Foo's record.
    self.my_property = 1

An instance of this class, once set as a property on an existing persistent object (on an "empty" ZODB this will be the special root object, on an empty Zope site it will probably be the Application object), will be stored in the ZODB under a newly assigned OID.

But a non-persistent object has been stored along with it: the int value held in the my_property attribute. Another one is the very name of that property, the "my_property" string.

If the int value is changed, what gets stored in the ZODB is a new, complete serialization of the Foo instance under the same OID. The change is detected because assigning the attribute calls __setattr__, which is defined on Persistent. This also explains why modifying a mutable property value in place does not cause the container object to be saved in the ZODB: __setattr__ is never called, so the change is not detected.
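The detection mechanism described above can be sketched with a toy model. This is not the real persistent.Persistent class: ToyPersistent and its _changed flag are hypothetical stand-ins for illustration only (the real class exposes a similar flag named _p_changed). The sketch only shows that assignment goes through __setattr__ while in-place mutation does not.

```python
class ToyPersistent:
    """Toy stand-in for persistent.Persistent (hypothetical, for illustration)."""

    def __init__(self):
        # Bypass our own __setattr__ so that creating the flag does not set it.
        object.__setattr__(self, '_changed', False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Any attribute assignment flags the object as needing to be saved.
        object.__setattr__(self, '_changed', True)


class Foo(ToyPersistent):
    def __init__(self):
        super().__init__()
        self.my_property = 1
        self.my_list = []
        # Pretend the freshly created object has just been committed.
        object.__setattr__(self, '_changed', False)


foo = Foo()

foo.my_property = 2                         # goes through __setattr__
changed_after_assignment = foo._changed     # True: change detected

object.__setattr__(foo, '_changed', False)  # pretend another commit
foo.my_list.append('x')                     # in-place mutation, no __setattr__
changed_after_mutation = foo._changed       # False: change NOT detected

foo.my_list = foo.my_list + ['y']           # replacing the value is detected
changed_after_replacement = foo._changed    # True
```

With the real Persistent class, the usual workarounds for the undetected case are to reassign the attribute, as in the last step, or to set _p_changed = True explicitly on the container.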

Complex example

based on workflow history handling

Consider the following line, which traverses several objects.

foo_document.workflow_history['edit_workflow'][-1]['date']

Dissection of classes traversed by that line:

foo_document      -> Document
workflow_history  -> PersistentMapping
['edit_workflow'] -> tuple
[-1]              -> dict
['date']          -> DateTime

Document is any document type, which inherits from Persistent.

PersistentMapping is a standard Zope type inheriting from Persistent.

tuple and dict are standard Python types which do not inherit from Persistent.

DateTime is a standard Zope type which does not inherit from Persistent.

This means that:

  • Modifying foo_document properties (which involves calling foo_document.__setattr__ at some point) will cause foo_document to be saved. This does not include serializing workflow_history, because workflow_history is also persistent and is saved separately from foo_document.
  • Modifying workflow_history (same remark as above) will cause workflow_history to be saved. This does not modify foo_document in any way, so foo_document is not serialized again. But it does include serializing all the non-persistent subobjects of workflow_history.
  • Modifying the tuple is impossible because it is an immutable type, but replacing it with another tuple is possible, so we fall into the "modifying workflow_history" case. As the ZODB keeps the history of previous object versions until it is packed, growing the tuple one item at a time makes the ZODB size increase quadratically: 1 + 2 + 3 + ... + n = n(n+1)/2 stored items after n additions.
  • Modifying the dict in place will cause nothing to happen at storage level, because it is mutable and not persistent: the change is not detected.
  • Modifying the DateTime is just the same as modifying the dict: nothing will happen at storage level.
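The record boundaries listed above can be illustrated with the standard pickle module, whose persistent_id hook lets a pickler store a reference instead of inlining a subobject. The sketch below is a toy model, not ERP5 or ZODB code: ToyPersistent, RecordPickler, record_size and the oid values are hypothetical names, and a datetime stands in for DateTime.

```python
import datetime
import io
import pickle


class ToyPersistent:
    """Toy stand-in for a persistent object; `oid` is a hypothetical identifier."""

    def __init__(self, oid):
        self.oid = oid


class RecordPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Persistent subobjects are stored as a reference, not inlined.
        if isinstance(obj, ToyPersistent):
            return obj.oid
        return None  # everything else is pickled inline


def record_size(obj):
    """Size in bytes of the record holding obj's own state."""
    buffer = io.BytesIO()
    RecordPickler(buffer, protocol=2).dump(vars(obj))
    return len(buffer.getvalue())


foo_document = ToyPersistent(oid=1)
foo_document.title = 'Foo'
foo_document.workflow_history = ToyPersistent(oid=2)

workflow_history = foo_document.workflow_history
workflow_history.data = {'edit_workflow': ()}

document_size_before = record_size(foo_document)
history_size_before = record_size(workflow_history)

# Replace the tuple with a longer one: only workflow_history's record grows.
entry = {'date': datetime.datetime(2016, 6, 28), 'action': 'edit'}
workflow_history.data = {
    'edit_workflow': workflow_history.data['edit_workflow'] + (entry,),
}

document_size_after = record_size(foo_document)
history_size_after = record_size(workflow_history)
```

In this model the document record keeps the same size, since it only holds a reference to workflow_history, while the workflow_history record grows because the tuple, the dict and the date are all serialized inside it.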

When to inherit from Persistent

When implementing a class whose instances will often be modified (like the tuple in the example above), you should make it persistent, to avoid impacting the container at each change.
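To get a feeling for the cost, here is a rough estimate of the bytes written while a history grows one entry at a time, comparing a history inlined in its container with one stored as its own record. It is a sketch under simplifying assumptions: record sizes are approximated by pickle sizes, record and transaction headers are ignored, and bytes_written, record_size and the payload are hypothetical.

```python
import pickle


def record_size(state):
    """Bytes needed to pickle one record's state (record headers ignored)."""
    return len(pickle.dumps(state, protocol=2))


def bytes_written(change_count, other_payload, history_is_separate):
    """Total bytes appended to storage while a history grows entry by entry."""
    total = 0
    history = ()
    for serial in range(change_count):
        # Tuples are immutable: each change builds a new, longer tuple.
        history = history + ({'action': 'edit', 'serial': serial},)
        if history_is_separate:
            # Only the history's own record is rewritten.
            total += record_size(history)
        else:
            # The whole container, unrelated payload included, is rewritten.
            total += record_size({'other': other_payload, 'history': history})
    return total


payload = 'x' * 5000
inline_total = bytes_written(100, payload, history_is_separate=False)
separate_total = bytes_written(100, payload, history_is_separate=True)

# Doubling the number of changes roughly quadruples the bytes written,
# which is the 1 + 2 + 3 + ... growth pattern.
small_run = bytes_written(50, '', history_is_separate=False)
large_run = bytes_written(100, '', history_is_separate=False)
growth_ratio = large_run / small_run
```

Storing the history separately does not remove the quadratic growth of the history itself, but it keeps the rest of the container from being rewritten at every change.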

Note that some containers handle this impact better than others. An example is BTreeFolder2, because it avoids being rewritten entirely when a subobject is added, deleted or modified (if that modification implies calling __setattr__ on the BTreeFolder2).

When not to inherit from Persistent

When implementing a class whose instances are and will stay small compared to the size of the ZODB object header (which is basically the class name). Only reading the pickle from the ZODB can tell you whether an object is small. Making such a class persistent hurts information density: the ZODB ends up containing more object header data than actual object payload.
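A rough way to reason about information density is to compare the pickled size of a class reference with the pickled size of the state. The sketch below assumes the per-object overhead is dominated by the class reference and approximates it as a pickled (module, class name) pair; the exact record layout is not reproduced, and the module and class names are hypothetical.

```python
import pickle


def pickled_size(value):
    return len(pickle.dumps(value, protocol=2))


def payload_density(class_path, state):
    """Fraction of the approximated record that is actual payload."""
    header_size = pickled_size(class_path)
    state_size = pickled_size(state)
    return state_size / (header_size + state_size)


# Hypothetical class reference, used only to approximate the header size.
class_path = ('my.hypothetical.module', 'SmallCounter')

small_density = payload_density(class_path, {'value': 1})
large_density = payload_density(class_path, {'text': 'x' * 1000})
```

With a tiny state most of the record is header, so keeping such a value inline in its container is usually the denser choice; with a large state the header becomes negligible.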

Tools

You should first take a look at Zope's standard tools related to persistency (in the bin directory of your Zope installation). Here is what you can get from each tool:

Note: accuracy has not been checked. Feel free to comment on the entries.
  • analyze.py
    transaction overhead (Data.fs size minus record size); record overhead (record size minus object size); old object size impact (pack gain estimation); list of used classes with size distribution.
  • fsdump.py (simply uses ZODB product tool with same name)
    list of transactions; list of objects in each transaction, with class, OID and size.
  • fstail.py
    user, description, and size of the last few transactions
  • netspace.py
    display the size of objects including their subobjects
  • space.py
    number of instances of each used class with total size

You might also want to check treenalyzer.py, which we developed (based on netspace.py described above). It displays the size of objects including their subobjects, a hexadecimal dump of persistent objects, and a hexadecimal dump of individual non-persistent properties, with statistics similar to those of space.py described above.

z3c.zodbbrowser is a Zope 3 product which is actually a stand-alone GTK ZODB browser. It will show you more objects than the ZMI, but with few low-level details (no file offsets or sizes, for instance). Definitely worth trying for a quick object lookup, but maybe not enough for deeper analysis.
