Re: Improving real world performance by moving files closer to their target workloads

Gordan Bobic wrote:
Derek Price wrote:

If all nodes do attempt to stay up to date with this information, then if the node accepting a write goes incommunicado, the quorum can simply and effectively roll back the transaction by revoking the node's lock and rolling back its idea of the current version number of the affected file or directory.

And with that we're back to the journalling idea. If we have a per-file journal (a write-ahead log, if you will), then if the writing node fails during the write, the other nodes can just roll back the transaction once its lock has expired.

Just to further define the ideas of "timeout" and "expire", I think it would be ideal if timeouts are only handled lazily. That is, no rollback or automatic lock release happens unless a new node requests a lock and the node that holds the current lock is determined to be incommunicado (a state determined, I think, by an expire time on the lock the node holds, reset each time new data is received from that node).
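To make the lazy handling concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class name LazyLock, its methods, and the injectable clock are all hypothetical and do not correspond to any existing code. The point it shows is that nothing happens when the expire time passes; the stale lock is only revoked when another node asks for it.

```python
import time


class LazyLock:
    """Hypothetical per-file write lock whose timeout is checked only on request."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence before a holder counts as incommunicado
        self.clock = clock          # injectable so the behaviour can be tested
        self.holder = None
        self.expires_at = 0.0

    def data_received(self, node):
        """Reset the expire time each time new data arrives from the holder."""
        if node == self.holder:
            self.expires_at = self.clock() + self.timeout

    def request(self, node):
        """Return (granted, revoked_node).

        The lock is granted if it is free, already held by the requester,
        or the current holder's expire time has passed.
        """
        now = self.clock()
        if self.holder is None or self.holder == node or now >= self.expires_at:
            revoked = self.holder if self.holder not in (None, node) else None
            self.holder = node
            self.expires_at = now + self.timeout
            return True, revoked
        return False, None
```

A caller that gets back a revoked node would be the one to trigger the rollback of that node's transaction.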

I'm not even sure how complex the "journal" needs to be here. Except in the case of O_APPEND, a "rollback" could just mean granting a new node a higher transaction number than the timed-out node held. When the downed node comes back up, then the file on the node with the highest transaction number wins.
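A rough Python sketch of that rule, with the caveat that the function names and the (transaction number, content) tuples are hypothetical and only there for illustration:

```python
def grant_after_timeout(timed_out_txn):
    """Hand the new lock holder a higher transaction number than the timed-out node held."""
    return timed_out_txn + 1


def reconcile(copies):
    """Pick the winning copy of a file.

    copies maps node name -> (transaction number, content).
    The copy with the highest transaction number wins.
    """
    return max(copies.values(), key=lambda copy: copy[0])


# Node A timed out while holding transaction 7 and left a partial write behind.
new_txn = grant_after_timeout(7)
copies = {
    "A": (7, "partial write"),
    "B": (new_txn, "complete write"),
}
winner = reconcile(copies)
```

When node A comes back up, its copy at transaction 7 simply loses to node B's copy, so no explicit undo record is needed.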

O_APPEND isn't much different. Until a file gets mirrored beyond the minimum threshold, the "latest good" transaction number (the newest file version that exists on the minimum number of mirrors) will need to be remembered. If an O_APPEND lock is needed and a write lock expires, then the "previous good" transaction number is used to find a version of the file to roll back to.

So, I think this means our "journal" is simply the "latest good" transaction number and a sort of atomic property, where nodes will not dispose of the "latest good" content of a given file beyond the minimum threshold until a newer version has been confirmed to be mirrored across the minimum number of nodes and becomes the new "latest good".
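As a sketch of how small that "journal" could be, here is a hypothetical Python tracker. The class FileVersions, its fields, and the confirm() call are assumptions made for illustration, not a proposed interface. It remembers the "latest good" transaction number and only lets older versions go once a newer one has been confirmed on the minimum number of nodes.

```python
class FileVersions:
    """Hypothetical per-file record of which versions are safely mirrored."""

    def __init__(self, min_mirrors):
        self.min_mirrors = min_mirrors
        self.latest_good = None   # newest txn confirmed on at least min_mirrors nodes
        self.mirrors = {}         # txn -> set of nodes confirmed to hold that version

    def confirm(self, txn, node):
        """Record that node holds version txn; promote it to latest good if it qualifies."""
        if self.latest_good is not None and txn < self.latest_good:
            return  # stale confirmation for a version that is already superseded
        self.mirrors.setdefault(txn, set()).add(node)
        enough = len(self.mirrors[txn]) >= self.min_mirrors
        newer = self.latest_good is None or txn > self.latest_good
        if enough and newer:
            self.latest_good = txn
            # Only now may older versions be disposed of.
            for old in [t for t in self.mirrors if t < txn]:
                del self.mirrors[old]

    def rollback_target(self):
        """Version to roll back to when a write lock expires."""
        return self.latest_good
```

Until the second confirmation of a newer transaction arrives, the older version stays on record and remains the rollback target.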

Derek
--
Derek R. Price
Solutions Architect
Ximbiot, LLC <http://ximbiot.com>
Get CVS and Subversion Support from Ximbiot!

v: +1 248.835.1260
f: +1 248.246.1176



