Distributed storage
This article aims to invent and explain distributed storage. It is the main article of Category:distributed storage.
Goals
- Storage of the data must be decentralized, i.e. distributed among many different storage nodes that are located far apart from each other and are not subject to one central power.
- Data must be stored redundantly (synchronous replication) in several (say, 5 to 10) different places on this planet, ideally far apart from each other, so that neither a government nor an earthquake can harm the availability of the data.
Some definitions
Term | Short definition |
---|---|
Node | A machine running a database that is accessible via a storage server through the server-storage interface |
Storage | The store for objects and relations (classes and instances), ultimately distributed across a network |
Table join problem | Performance suffers when joining tables that are stored on different nodes |
Theory
Distribution of data among nodes
We want to distribute data among several nodes. The following should be taken into account:
- the limitations of speed and capacity
- making use of the fundamental k-coordinates
- taking the computing limitations into account as early as when a class is created
We use one database per node, i.e. network nodes are at the same time database nodes.
Node list
Each node must be able to locate every other node in the network. This requires an up-to-date list of all nodes to be kept on each node. Since these lists cannot be synchronized immediately across the network when a node joins or leaves, we need a robust update mechanism.
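One possible shape for such an update mechanism is sketched below: every node keeps a versioned entry per known node and, whenever it hears from a peer, merges the peer's list into its own by keeping the entry with the higher version. This is only an illustration under assumptions not made by this article: the names `NodeEntry`, `merge_node_lists` and `live_nodes` are hypothetical, addresses are plain "host:port" strings, and each node is assumed to increase its own version counter whenever it joins or leaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEntry:
    """One row of the node list (hypothetical layout)."""
    address: str   # assumed "host:port" identifier of the node
    version: int   # counter the node increases on every status change
    alive: bool    # False marks a node that has left the network


def merge_node_lists(local, remote):
    """Merge two node lists (dicts keyed by address).

    For every address the entry with the higher version wins, so lists
    exchanged in any order converge to the same content.
    """
    merged = dict(local)
    for address, entry in remote.items():
        current = merged.get(address)
        if current is None or entry.version > current.version:
            merged[address] = entry
    return merged


def live_nodes(node_list):
    """Return the sorted addresses of all nodes currently marked alive."""
    return sorted(addr for addr, entry in node_list.items() if entry.alive)


if __name__ == "__main__":
    mine = {
        "n1:9000": NodeEntry("n1:9000", 1, True),
        "n2:9000": NodeEntry("n2:9000", 1, True),
    }
    # A peer reports that n2 has left (version 2) and that n3 has joined.
    theirs = {
        "n2:9000": NodeEntry("n2:9000", 2, False),
        "n3:9000": NodeEntry("n3:9000", 1, True),
    }
    print(live_nodes(merge_node_lists(mine, theirs)))
```

Departed nodes are kept as entries with `alive=False` rather than deleted, so that an outdated list received later cannot reintroduce them.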
On which nodes is a k-object stored?
This is a hard problem, especially since tables might become too big to be stored on a single node.
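One well-known technique that could be tried here is rendezvous (highest-random-weight) hashing; it is not something this article has settled on. Every node computes the same score for each (node, object) pair from the shared node list, and the object is stored on the nodes with the highest scores. The sketch below assumes that an object has a string identifier and that node addresses are strings; `nodes_for_object` and its `replicas` parameter are hypothetical names, and geographic distance between replicas is not considered.

```python
import hashlib


def _score(node, object_id):
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}|{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def nodes_for_object(object_id, nodes, replicas=5):
    """Return the `replicas` nodes that should store `object_id`.

    Every node that knows the same node list computes the same answer
    without any coordination.
    """
    ranked = sorted(nodes, key=lambda node: _score(node, object_id), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    all_nodes = [f"node{i}:9000" for i in range(20)]
    print(nodes_for_object("person/4711", all_nodes))
```

When a node leaves, only the objects it held need a new home; the remaining replicas of every object stay where they are. A table that is too big for one node could be split by using an identifier per row or per key range (for example `"tablename:rowkey"`, an assumed naming scheme) instead of one identifier per table, at the cost of making the table join problem more visible.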
Networking
See distributed transactions for some general theory.
Possibilities for communication between nodes:
- The spread toolkit
- Twisted Spread, which facilitates communication between objects in distinct locations
- Twisted Web2
- Python libraries for distributed programming