On 24/09/2010 09:42, Gordan Bobic wrote:
> This sounds remarkably similar to how DLM in GFS works. It caches file
> locks, so performance is reasonable where a set of files is only
> accessed from one of the nodes. Might it be easier to interface with
> DLM for locking control instead of implementing such a thing from
> scratch?
>
> The bulk of the performance hit comes from ping times, rather than
> bandwidth issues / writeback caching. Latencies on a LAN are typically
> 100us on Gb Ethernet, vs. a typical RAM latency of 50ns, so call it a
> 2000x difference. If this overhead on file lock acquisition can be
> avoided, it'll make a lot more difference than data caching.

You have hit the nub of the issue.
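
To put rough numbers on that, here is a back-of-envelope sketch in Python. It uses only the two latency figures quoted above; the cost model (one LAN round trip per uncached lock acquisition, a RAM-speed lookup once the lock is cached locally) is my own simplifying assumption, and the function names are made up for illustration.

```python
# Figures taken from the quoted message; everything else is an assumption.
LAN_ROUND_TRIP_S = 100e-6  # ~100us on Gb Ethernet
RAM_LATENCY_S = 50e-9      # ~50ns typical RAM latency


def latency_ratio(network_s=LAN_ROUND_TRIP_S, local_s=RAM_LATENCY_S):
    """How many times slower a network round trip is than a local lookup."""
    return network_s / local_s


def locking_overhead_s(n_ops, cached):
    """Total time spent acquiring locks for n_ops operations on one file.

    Assumed model: without caching, every acquisition costs one LAN round
    trip. With caching, only the first one does, and the rest are served
    from local memory.
    """
    if n_ops <= 0:
        return 0.0
    if cached:
        return LAN_ROUND_TRIP_S + (n_ops - 1) * RAM_LATENCY_S
    return n_ops * LAN_ROUND_TRIP_S


if __name__ == "__main__":
    print(f"ratio: about {latency_ratio():.0f}x")
    print(f"10000 ops, uncached: {locking_overhead_s(10_000, False):.4f} s")
    print(f"10000 ops, cached:   {locking_overhead_s(10_000, True):.6f} s")
```

Under that model, 10000 lock acquisitions on a file touched by a single node cost about a second of pure waiting without caching, and well under a millisecond with it.
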
Actually there are two issues here:
1) It would be excellent to see some kind of optimistic locking (which I
think is a good generic description of this kind of optimisation)
implemented in Gluster.
2) There are some subtleties in implementing this algorithm if
you relax the restriction that all nodes are reliable, e.g. what happens
if the node holding the locking information dies? What happens if the
server holding the lock dies (and a request turns up at the secondary
server)? How long do you wait before you decide to do something to
break the deadlock and continue?
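
One common answer to the "how long do you wait" question is to hand out locks as time-limited leases, so a dead holder stops blocking everyone once its lease runs out. Below is a minimal sketch of that idea. It is not how Gluster, DLM or Paxos do it; the LeaseTable class, its method names and the 10 second default are all invented for illustration. It also assumes a single, reliable lock server, so it only covers the case where the lock holder dies, not the case where the node keeping the lock table dies.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str        # node that currently owns the lock
    expires_at: float  # time after which the lease may be taken over


class LeaseTable:
    """Hypothetical single-server lock table using time-limited leases."""

    def __init__(self, lease_seconds=10.0):
        self.lease_seconds = lease_seconds
        self._leases = {}

    def acquire(self, path, holder, now):
        """Grant or renew a lease on path; return True on success.

        Another node's lease blocks us only while it is unexpired, so a
        crashed holder delays others by at most lease_seconds.
        """
        current = self._leases.get(path)
        if (current is not None
                and current.holder != holder
                and now < current.expires_at):
            return False
        self._leases[path] = Lease(holder, now + self.lease_seconds)
        return True

    def release(self, path, holder):
        """Drop the lease early, but only if holder actually owns it."""
        current = self._leases.get(path)
        if current is not None and current.holder == holder:
            del self._leases[path]


if __name__ == "__main__":
    table = LeaseTable(lease_seconds=10.0)
    print(table.acquire("/data/file", "node-a", now=0.0))   # node-a gets it
    print(table.acquire("/data/file", "node-b", now=5.0))   # still held
    print(table.acquire("/data/file", "node-b", now=10.0))  # lease expired
```

The time is passed in explicitly to keep the sketch deterministic; a real implementation would also have to cope with clock skew between nodes, and with a holder that is merely slow rather than dead and carries on writing after its lease has lapsed.
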
Paxos/FaTLease are clever implementations of a locking strategy in the
face of non-robust "bricks". That said, Google used a similar strategy
up until recently, and MogileFS and most other cluster systems still use
a non-redundant locking strategy, so Gluster going down the simple route
would still largely be "state of the art". A robust, shared-nothing,
redundant lock strategy would just be icing on the cake.
Anyone interested in getting some sponsorship to implement something?
Thanks
Ed W