Re: Replace cluster wide gluster locks with volume wide locks

Avra Sengupta <asengupt@xxxxxxxxxx> · Fri, 13 Sep 2013 12:14:22 +0530

Hi,

Please see comments inline >>>

On 09/13/2013 10:58 AM, Vijay Bellur wrote:
On 09/13/2013 12:30 AM, Avra Sengupta wrote:
Hi,

After having further discussions, we revisited the requirements and it
looks possible to further improve them, as well
as the design.

1. We classify all gluster operations into three different classes:
Create volume, Delete volume, and volume specific
    operations.
2. At any given point of time, we should allow two simultaneous
operations (create, delete or volume specific), as long
    as the two operations are not happening on the same volume.
3. If two simultaneous operations are performed on the same volume, the
operation which manages to acquire the volume
    lock will succeed, while the other will fail.

In order to achieve this, we propose a locking engine, which will
receive lock requests from these three types of
operations.

How is the locking engine proposed to be implemented? Is it part of 
glusterd or a separate process?
>>>The locking engine will be part of glusterd. Today glusterd on every
node holds a global lock (global to that node), for which
every gluster command running on that node contests. We propose to use
the same infra that is in place today (adding a new
rpc to accommodate the volume name in the new lock, instead of using the
old rpc) and, instead of a single global lock, to maintain
multiple volume locks (volume name and node-uuid), for which the
respective volume operations will contest.

Each such request for a particular volume will contest for
the same volume lock (based on the volume name
and the node-uuid). For example, a delete volume command for volume1 and
a volume status command for volume1 will
contest for the same lock (comprising the volume name and the uuid
of the node winning the lock), in which case
one of these commands will succeed and the other one will not, failing
to acquire the lock.
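
To make the contention rule concrete, here is a minimal sketch in Python
(not glusterd code). The class name VolumeLockTable, its methods, and the
uuid strings are all made up for illustration. It assumes a lock is keyed
by volume name, records the uuid of the node that won it, and that a
request fails immediately instead of waiting when the lock is already held.

```python
import threading


class VolumeLockTable:
    """Hypothetical per-node table of volume locks, keyed by volume name."""

    def __init__(self):
        self._owners = {}               # volume name -> uuid of the holding node
        self._guard = threading.Lock()  # protects the table itself

    def acquire(self, volname, node_uuid):
        """Try to take the volume lock without blocking; True on success."""
        with self._guard:
            if volname in self._owners:
                return False            # someone already holds this volume
            self._owners[volname] = node_uuid
            return True

    def release(self, volname, node_uuid):
        """Release the lock only if node_uuid is the current holder."""
        with self._guard:
            if self._owners.get(volname) != node_uuid:
                return False
            del self._owners[volname]
            return True


locks = VolumeLockTable()
first = locks.acquire("volume1", "uuid-node-a")   # e.g. a volume delete
second = locks.acquire("volume1", "uuid-node-b")  # contends for the same lock
other = locks.acquire("volume2", "uuid-node-b")   # a different volume
```

In this sketch the first request on volume1 wins, the second one on
volume1 is refused, and the request on volume2 is unaffected by either.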

Will volume status need to hold a lock?
>>>Commands like volume status, which don't need to hold a lock, will be
lockless.

Whereas, if two operations are simultaneously performed on different
volumes, they should happen smoothly, as both
these operations would request the locking engine for two different
locks, and will succeed in locking them in parallel.

How do you propose to manage the op state machine? Right now it is 
global in scope - how does that fit into this model?
>>>Although the op state machine is different from the syncop
framework that runs on the originator glusterd, it still goes through
the same states, and also uses the same locking infra today. We propose
to use the new locking engine for both the state machine
and the syncop framework. For heterogeneous clusters with nodes running
older versions, we will use op-versioning to ensure that they use the
cluster-wide lock.
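
As a rough illustration of the op-versioning fallback, here is a small
Python sketch. Both the function name lock_scope and the threshold value
are hypothetical, and it assumes the cluster op-version is taken as the
lowest op-version among the peers; the actual number at which volume
locks would be introduced is not decided here.

```python
# Hypothetical op-version at which volume-wide locks become available.
VOLUME_LOCK_MIN_OP_VERSION = 3


def lock_scope(peer_op_versions):
    """Return "volume" if every peer supports volume locks, else "cluster".

    peer_op_versions is assumed to be non-empty, since it always
    includes the local node.
    """
    cluster_op_version = min(peer_op_versions)
    if cluster_op_version >= VOLUME_LOCK_MIN_OP_VERSION:
        return "volume"
    return "cluster"


mixed = lock_scope([3, 2, 3])    # one older peer in the cluster
uniform = lock_scope([3, 3, 3])  # all peers support volume locks
```

With one older peer present the whole cluster falls back to the
cluster-wide lock; only when every peer is new enough are the volume
locks used.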

Regards,
Avra

-Vijay