Distributed storage

From cosmopool meta

Invent and explain distributed storage. This is the main article of Category:distributed storage.

Goals

  • Storage of the data must be decentralized, i.e. distributed among many different storage nodes located far apart from each other and not subject to one central power.
  • Data must be stored redundantly (synchronous replication) in some (5..10, say) different places on this planet, ideally far apart from each other, so that neither a government nor an earthquake can harm the availability of the data.

Some definitions

Notion | Short definition
Node | A machine running a database that is accessible via a storage server through the server-storage interface
Storage | The store for objects and relations (classes and instances), ultimately distributed across a network
Table join problem | Performance suffers when joining tables that are stored on different nodes

Theory

Distribution of data among nodes

We want to distribute data among several nodes. The following should be taken into account:

  • Nodes are limited in speed and capacity.
  • Make use of the fundamental k-coordinates.
  • Try to take the computing limitations into account already when creating a class.

We use one database per node, i.e. network nodes are at the same time database nodes.

Node list

Each node must be able to locate every other node in the network. This requires an up-to-date list of all nodes kept on each node. Since these lists cannot be synchronized immediately across the network when a node joins or leaves, we need a robust update mechanism.
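One possible shape for such an update mechanism is a versioned node list that any two nodes can merge whenever they talk to each other. The following is only a minimal sketch, not a design decided on this page. It assumes that each node is the only writer of its own entry, that it increments a version counter on every change, and that it announces its own departure by publishing an entry with alive set to False. The names NodeEntry, merge_node_lists and live_nodes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEntry:
    """One row of the node list (hypothetical layout)."""
    node_id: str
    address: str
    version: int        # assumed to be incremented only by the node itself
    alive: bool = True  # False marks a node that has left the network


def merge_node_lists(local, remote):
    """Merge two node lists (dicts keyed by node_id).

    For every node the entry with the higher version wins, so the order
    in which lists are exchanged does not matter. On equal versions the
    local entry is kept.
    """
    merged = dict(local)
    for node_id, entry in remote.items():
        current = merged.get(node_id)
        if current is None or entry.version > current.version:
            merged[node_id] = entry
    return merged


def live_nodes(node_list):
    """Return the sorted ids of all nodes not marked as departed."""
    return sorted(e.node_id for e in node_list.values() if e.alive)


# Example: node A has not yet heard that n2 left and that n3 joined.
list_a = {
    "n1": NodeEntry("n1", "10.0.0.1", 1),
    "n2": NodeEntry("n2", "10.0.0.2", 1),
}
list_b = {
    "n2": NodeEntry("n2", "10.0.0.2", 2, alive=False),
    "n3": NodeEntry("n3", "10.0.0.3", 1),
}
merged = merge_node_lists(list_a, list_b)
print(live_nodes(merged))
```

Departed nodes are kept as entries with alive set to False rather than deleted, so that an older list arriving later cannot bring them back. How crashed nodes (which cannot announce their own departure) are detected is left open here.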

On which nodes is a k-object stored?

This is tough, especially since tables might become too big to be stored on a single node.
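One candidate answer, given here only as an illustration, is rendezvous (highest-random-weight) hashing: every node can compute from the node list alone which nodes hold a given object, without asking a central directory. The sketch below assumes that a k-object can be identified by a string key and that the replica count is chosen from the 5..10 range named in the goals. The function names and the key format are hypothetical, and the sketch ignores geographic separation and the problem of oversized tables.

```python
import hashlib


def _score(object_key, node_id):
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_key}|{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replica_nodes(object_key, node_ids, replicas=5):
    """Pick the nodes that should store object_key.

    Every node is scored against the key and the highest-scoring
    `replicas` nodes are chosen. Any node that knows the same node list
    computes the same answer.
    """
    if replicas > len(node_ids):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(node_ids, key=lambda n: _score(object_key, n), reverse=True)
    return ranked[:replicas]


# Example with 20 hypothetical nodes and one hypothetical object key.
nodes = [f"node-{i}" for i in range(20)]
chosen = replica_nodes("k-object:example", nodes, replicas=5)
print(chosen)
```

A useful property of this scheme is that removing a node which does not hold the object leaves the placement unchanged, and removing a node which does hold it only requires one new replica; the remaining copies stay where they are.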

Networking

See distributed transactions for some general theory.

Possibilities for communication between nodes:

Routing

Protocols

Synchronous replication

Links