-Atin
On 02-Mar-2016 3:41 pm, "Avra Sengupta" <asengupt@xxxxxxxxxx> wrote:
>
> On 03/02/2016 02:55 PM, Venky Shankar wrote:
>>
>> On Wed, Mar 02, 2016 at 02:29:26PM +0530, Avra Sengupta wrote:
>>>
>>> On 03/02/2016 02:02 PM, Venky Shankar wrote:
>>>>
>>>> On Wed, Mar 02, 2016 at 01:40:08PM +0530, Avra Sengupta wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> All fops in NSR follow a specific workflow, as described in this UML
>>>>> (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
>>>>> However, all locking fops will follow a slightly different workflow, as
>>>>> described below. This is a first proposed draft for handling locks, and
>>>>> we would like to hear your concerns and queries regarding the same.
>>>>>
>>>>> 1. On receiving the lock, the leader will journal the lock itself, and
>>>>> then try to actually acquire the lock. At this point in time, if it
>>>>> fails to acquire the lock, then it will invalidate the journal entry,
>>>>> and return a -ve ack back to the client. However, if it is successful
>>>>> in acquiring the lock, it will mark the journal entry as complete, and
>>>>> forward the fop to the followers.
>>>>
>>>> So, does a contending non-blocking lock operation check only on the
>>>> leader since the followers might have not yet ack'd an earlier lock
>>>> operation?
>>>
>>> A non-blocking lock follows the same workflow, and thereby checks on the
>>> leader first. In this case, it would be blocked on the leader, till the
>>> leader releases the lock. Then it will follow the same workflow.
>>
>> A non-blocking lock should ideally return EAGAIN if the region is already
>> locked. Checking just on the leader (posix/locks on the leader server
>> stack) and returning EAGAIN is kind of incomplete as the earlier lock
>> request might not have been granted (due to failure on followers).
>>
>> Or does it even matter if we return EAGAIN during the transient state?
>>
>> We could block the lock on the leader until an earlier lock request is
>> satisfied (in which case return EAGAIN) or in case of failure try to
>> satisfy the lock request.
>
> That is what I said, it will be blocked on the leader till the leader
> releases the already held lock.
>
>>
>>>>> 2. The followers, on receiving the fop, will journal it, and then try
>>>>> to actually acquire the lock. If a follower fails to acquire the lock,
>>>>> then it will invalidate the journal entry, and return a -ve ack back to
>>>>> the leader. If it is successful in acquiring the lock, it will mark the
>>>>> journal entry as complete, and send a +ve ack to the leader.
>>>>>
>>>>> 3. The leader, on receiving all acks, will perform a quorum check. If
>>>>> quorum is met, it will send a +ve ack to the client. If the quorum
>>>>> check fails, it will send a rollback to the followers.
>>>>>
>>>>> 4. The followers, on receiving the rollback, will journal it first, and
>>>>> then release the acquired lock. Each follower will update the rollback
>>>>> entry in the journal as complete and send an ack to the leader.
>>>>
>>>> What happens if the rollback fails for whatever reason?
>>>
>>> The leader receives a -ve rollback ack, but there's little it can do
>>> about it. Depending on the failure, it will be resolved during
>>> reconciliation.
>>>>>
>>>>> 5. The leader, on receiving the rollback acks, will journal its own
>>>>> rollback, and then release the acquired lock. It will update the
>>>>> rollback entry in the journal, and send a -ve ack to the client.
>>>>>
>>>>> A few things to be noted in the above workflow are:
>>>>> 1. It will be a synchronous operation, across the replica volume.
>
> Atin, I am not sure how AFR handles it.
If AFR/EC handle them asynchronously, do you see any performance
bottleneck with NSR for this case?
Well, it's not synchronous to the point that the followers would
perform it one after the other. AFR/EC clients would also have to
wait for acks from a quorum of servers before they can ack the client.
The same is true with the NSR leader, which will have to wait till it
gets acks from a quorum of followers.
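
To make the five steps of the proposed lock workflow easier to discuss, here is a rough Python sketch of them as a single-process simulation. It is not NSR code. The names `Node`, `leader_lock`, and the `fail_lock` knob are hypothetical, the journal is just an in-memory list, and it makes two assumptions the draft does not spell out: quorum is treated as a strict majority of the replica set with the leader counted, and the rollback is sent only to followers that returned a +ve ack.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a leader or follower brick."""
    name: str
    fail_lock: bool = False                      # illustrative knob: force lock failure
    journal: list = field(default_factory=list)  # in-memory journal (assumption)
    held: set = field(default_factory=set)       # regions currently locked

    def _journal(self, op, region):
        entry = {"op": op, "region": region, "state": "pending"}
        self.journal.append(entry)
        return entry

    def lock(self, region):
        # Journal first, then try to actually acquire the lock.
        entry = self._journal("lock", region)
        if self.fail_lock or region in self.held:
            entry["state"] = "invalid"           # invalidate entry, -ve ack
            return False
        self.held.add(region)
        entry["state"] = "complete"              # +ve ack
        return True

    def rollback(self, region):
        # Journal the rollback first, then release the lock.
        entry = self._journal("rollback", region)
        self.held.discard(region)
        entry["state"] = "complete"
        return True


def leader_lock(leader, followers, region):
    """Return True for a +ve ack to the client, False for a -ve ack."""
    # Step 1: leader journals and acquires; on failure, -ve ack right away.
    if not leader.lock(region):
        return False
    # Step 2: forward the fop; each follower journals, acquires, and acks.
    acks = [f.lock(region) for f in followers]
    # Step 3: quorum check (assumed: strict majority, leader included).
    total = len(followers) + 1
    if 1 + sum(acks) > total // 2:
        return True
    # Step 4: roll back followers that had acquired the lock (assumption).
    for follower, ok in zip(followers, acks):
        if ok:
            follower.rollback(region)
    # Step 5: leader journals its own rollback, releases, and sends -ve ack.
    leader.rollback(region)
    return False
```

With one leader and two followers, a single failing follower still gives a +ve ack (2 of 3), while two failing followers cause the leader to roll back and return a -ve ack. A failed rollback is not modelled here, since the thread leaves that to reconciliation.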