Re: clients fail to reclaim locks after server reboot or manual sm-notify

Pavel A <free.lan.c2.718r@xxxxxxxxx> · Wed, 16 Nov 2011 23:56:05 +0200

2011/11/16 Bryan Schumaker <bjschuma@xxxxxxxxxx>:
> On 11/16/2011 03:08 PM, J. Bruce Fields wrote:
>> On Wed, Nov 16, 2011 at 09:09:07PM +0200, Pavel A wrote:
>>> I've read about this issue here:
>>> http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
>>>
>>> /*-----
>>> In the event of server failure (e.g. server reboot or lock daemon
>>> restart), all client locks are lost. However, the clients are not
>>> informed of this, and because the other operations (read, write, and
>>> so on) are not visibly interrupted, they have no reliable way to
>>> prevent other clients from obtaining a lock on a file they think they
>>> have locked.
>>> -----*/
>>
>> That's incorrect.  Perhaps the article is out of date, I don't know.
>
> Looks like it was written about 11 years ago, so I'll believe that it's out of date.

Yes, I should have watched out for that.

>
> - Bryan
>
>>
>>> I don't get this. If there is a grace period after reboot and clients
>>> can successfully reclaim locks, then how can other clients obtain
>>> locks?
>>
>> That's right, in the absence of bugs, if a client successfully reclaims a
>> lock, then it knows that no other client can have acquired that lock in
>> the interim: since the reclaim succeeded, that means the server is still
>> in the grace period, which means the only other locks that it has
>> allowed are also reclaims.  If some reclaim conflicts with this lock,
>> then the other client must have reclaimed a lock that it didn't actually
>> hold before (hence must be buggy).
>>
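
The grace-period argument above can be sketched as a toy model. This is
only an illustration, not lockd or nfsd code: the LockServer class, its
method names, the client names and the paths are all made up, and locks
are simplified to one exclusive whole-file lock per path.

```python
from dataclasses import dataclass, field


@dataclass
class LockServer:
    """Toy model of a lock server that has just rebooted (hypothetical, not nfsd)."""
    in_grace: bool = True
    locks: dict = field(default_factory=dict)  # path -> owning client

    def request(self, client: str, path: str, reclaim: bool) -> bool:
        # During the grace period only reclaims are accepted;
        # once it has ended, only new (non-reclaim) requests are.
        if reclaim != self.in_grace:
            return False
        # Refuse a request that conflicts with a lock held by another client.
        if path in self.locks and self.locks[path] != client:
            return False
        self.locks[path] = client
        return True

    def end_grace(self) -> None:
        self.in_grace = False


server = LockServer()
# A new lock request during grace is refused, so nobody can sneak in.
new_during_grace = server.request("client-b", "/export/f", reclaim=False)
# The previous holder reclaims its lock while the server is still in grace.
reclaimed = server.request("client-a", "/export/f", reclaim=True)
server.end_grace()
# Reclaims that arrive after grace has ended are refused.
late_reclaim = server.request("client-c", "/export/g", reclaim=True)
# New requests are now allowed, but this one conflicts with the reclaimed lock.
new_after_grace = server.request("client-b", "/export/f", reclaim=False)
print(new_during_grace, reclaimed, late_reclaim, new_after_grace)
```

In this model a successful reclaim implies the server was in grace at
that moment, so the only locks granted before it were other reclaims.
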
>>>> You need to restart nfsd on the node that is taking over.  That means
>>>> that clients using both filesystems (A and B) will have to do lock
>>>> recovery, when in theory only those using volume B should have to, and
>>>> that is suboptimal.  But it is also correct.
>>>>
>>>
>>> Seems to work. As for a more optimal solution: what do you think of the
>>> contents of /proc/locks? Might it be possible to use this info to then
>>> perform locking locally on the other node (after failover)?
>>
>> No, I don't think so.  And I'd be careful about using /proc/locks for
>> anything but debugging.
>>
>> --b.
>
>
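
For the debugging use mentioned above, here is a rough sketch of reading
one /proc/locks line. It assumes the field layout described in proc(5)
(ordinal, lock class, mode, access, PID, major:minor:inode, start, end);
the LockEntry type, the parse_lock_line helper and the sample line are
made up for illustration, and nothing here is meant for failover logic.

```python
from typing import NamedTuple, Optional


class LockEntry(NamedTuple):
    lock_class: str  # e.g. POSIX or FLOCK
    mode: str        # e.g. ADVISORY
    access: str      # e.g. READ or WRITE
    pid: int
    file_id: str     # major:minor:inode, kept as an opaque string
    start: str
    end: str         # a byte offset or the literal EOF
    blocked: bool    # True for "->" lines, which describe waiters


def parse_lock_line(line: str) -> Optional[LockEntry]:
    """Parse one line of /proc/locks; return None if it does not fit the layout."""
    fields = line.split()[1:]  # drop the leading "N:" ordinal
    blocked = bool(fields) and fields[0] == "->"
    if blocked:
        fields = fields[1:]
    if len(fields) != 7:
        return None
    cls, mode, access, pid, file_id, start, end = fields
    return LockEntry(cls, mode, access, int(pid), file_id, start, end, blocked)


# Hypothetical sample line, not taken from a real system.
sample = "1: POSIX  ADVISORY  WRITE 2345 08:02:131074 0 EOF"
entry = parse_lock_line(sample)
waiter = parse_lock_line("1: -> POSIX  ADVISORY  WRITE 2400 08:02:131074 0 EOF")
print(entry)
print(waiter)
```

On a Linux box the same helper could be fed the lines of
open("/proc/locks") to get a quick view of who holds what.
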
Well, looks like this is it.
Thank you very much, Bruce, Bryan - you really helped me to keep this going :)