Re: [PATCH v2 0/5] nfsd: support for lifting grace period early


On Fri, Sep 26, 2014 at 5:47 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> On Fri, Sep 26, 2014 at 04:58:47PM -0400, Trond Myklebust wrote:
>> On Fri, Sep 26, 2014 at 4:45 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> > On Fri, Sep 26, 2014 at 04:37:23PM -0400, Trond Myklebust wrote:
>> >> On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> >> >
>> >> > As I understand it, the rule for the client is: you're allowed to
>> >> > reclaim only the set of locks that you held previously, where "the set of
>> >> > locks you held previously" is "the set of locks held by the clientid
>> >> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
>> >> > example once client1 sends that unrelated OPEN reclaim it's giving up on
>> >> > anything else it doesn't manage to reclaim this time around.
>> >>
>> >> The rule for the client is very simple: "You may attempt to reclaim
>> >> any locks that were held immediately prior to the reboot of the
>> >> server."
>> >> It doesn't matter how those locks were established (ordinary OPEN,
>> >> delegated open, reclaim open, LOCK, reclaim lock...).
>> >>
>> >> However if the server reboots and the client did not manage to
>> >> re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>> >> EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
>> >> server's responsibility to block that client from reclaiming any
>> >> locks, since the client has no way to know how many times the server
>> >> has rebooted.
>> >> Ditto, of course, if the client tries to reclaim any locks outside the
>> >> grace period and the server isn't tracking whether or not those locks
>> >> have been handed out to another client.
>> >
>> > Agreed with everything except:
>> >
>> >         (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>> >         EXCHANGE_ID+CREATE_SESSION)
>> >
>> > If I remember correctly: RFC 5661 says the point where this happens is
>> > actually RECLAIM_COMPLETE.  RFC 3530 was more vague but suggested first
>> > OPEN reclaim or OPEN_CONFIRM, and 3530bis makes that explicit.
>> >
>> > But the client can choose an earlier point without violating the
>> > protocol--it means it will decline reclaiming some things it could have,
>> > but that's safer than the reverse mistake.
>> >
>>
>> Where is this documented? I'm not seeing it.
>
> It's more vague than I remembered:
>
> http://tools.ietf.org/html/rfc5661#section-8.4.3
>
>         The server will set this for any client record in stable
>         storage where the client has not done a suitable
>         RECLAIM_COMPLETE (global or file system-specific depending on
>         the target of the lock request) before it grants any new (i.e.,
>         not reclaimed) lock to any client.

Yes, I read that. Then I read this:

   For the second edge condition, after the server restarts for a second
   time, the indication that the client had not completed its reclaims
   at the time at which the grace period ended means that the server
   must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.

This text is just plain wrong, and we should fix it in an erratum.
There is absolutely NO edge case condition for those locks that were
successfully reclaimed after the first reboot. There is absolutely no
reason why the client shouldn't be able to reclaim those locks after
reboot number 2.

All the absence of the RECLAIM_COMPLETE tells the server is that the
client MAY have held more locks; what is the server supposed to do
with that information? It's not the server that is responsible for
reclaiming locks.
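
For reference, the server-side bookkeeping that the RFC 5661 text
describes can be sketched roughly as follows. This is only an
illustration: struct client_record, its fields, and both function names
are hypothetical and are not taken from nfsd or from the RFC. Only the
two status codes are the protocol's own values.

```c
#include <stdbool.h>

/* Status codes with the values assigned by the NFSv4 protocol. */
#define NFS4_OK          0
#define NFS4ERR_NO_GRACE 10033

/* Hypothetical per-client record kept in stable storage. */
struct client_record {
    bool reclaim_complete_seen; /* RECLAIM_COMPLETE received this boot */
    bool reclaim_incomplete;    /* set once a new lock is granted first */
};

/* Call just before the server grants its first non-reclaim lock. */
void mark_if_incomplete(struct client_record *rec)
{
    if (!rec->reclaim_complete_seen)
        rec->reclaim_incomplete = true;
}

/* Call for a reclaim request that arrives after the next restart. */
int check_reclaim(const struct client_record *rec, bool in_grace)
{
    if (!in_grace || rec->reclaim_incomplete)
        return NFS4ERR_NO_GRACE;
    return NFS4_OK;
}
```

Note that the flag is per client, not per lock, which is why a client
that reclaimed only some of its locks is refused wholesale under this
reading of the text.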

> And the corresponding language in 8.6.3 of RFC 3530 is:
>
>         a timestamp that is updated the first time after a server boot
>         or reboot the client acquires record locking, share reservation,
>         or delegation state on the server.  The timestamp need not be
>         updated on subsequent lock requests until the server reboots.
>
> I thought there was something referring specifically to OPEN reclaim or
> OPEN_CONFIRM as the point where "the client acquires record locking" but
> can't find it on a quick skim.
>
> I also say this is "vague" because, unfortunately, in both cases, this
> language is part of a description of an example server implementation,
> no actual protocol requirement is made explicit.
>
> Which is weird given that noticing the partial-reclaim case was actually
> Dave Noveck's original motivation for introducing RECLAIM_COMPLETE (then
> RECOVERY_COMPLETE), with the grace-period shortening an extra benefit:
>
>         http://osdir.com/ml/ietf.nfsv4/2006-01/msg00020.html
>
>         Adding the RECOVERY_COMPLETE op allows this situation to be
>         dealt with fairly simply. If a client has not recovered all of
>         its locks and we have the possibility of having given out a lock
>         inconsistent with one of those (the normal realization of this
>         would be that once we declare grace over with some client's
>         reclaims not complete) we mark that client as essentially having
>         had a lock effectively revoked and thus it would not be allowed to
>         reclaim locks after a subsequent reboot since it could no longer
>         vouch for all the locks it thinks it had.

Sigh. Another example of one of Dave's proposals going through without
adequate review because the reader died of exhaustion while wrestling
with the preamble text.
So, I agree that it would be a problem if the client were to try to
reclaim a lock that it didn't own in the previous boot instance
(because it didn't manage to reclaim it). However, WHY would a sane
client do this?

> In the 3530 case we decided that the only safe point to choose was the
> one described in the sample server implementation, so 3530bis says:
>
>         A server may consider a client's lease "successfully
>         established" once it has received an open operation from that
>         client.
>
> (And "open operation" probably is still too vague.)
>
> Sorry for the length.
>
> Anyway, if the client's currently doing this at SETCLIENTID_CONFIRM and
> CREATE_SESSION then I think that's correct but more conservative than
> necessary.  Which may be a good idea given that I think the chance of a
> random server implementor making their way through all this is small.
>

The client is pruning all those locks that it did not manage to
reclaim before the grace period expired and/or the second reboot
occurred. That is the correct behaviour.
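
As a rough illustration of that pruning, assuming a hypothetical
in-memory table of locks (struct held_lock and prune_unreclaimed() are
made-up names, not the Linux client's actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client-side record of one lock held before the reboot. */
struct held_lock {
    int  id;        /* illustrative lock identifier */
    bool reclaimed; /* true if the reclaim succeeded in this boot instance */
};

/*
 * Drop every lock that was not reclaimed before the grace period ended
 * or the server rebooted again. Compacts the array in place, preserves
 * order, and returns the number of locks still held.
 */
size_t prune_unreclaimed(struct held_lock *locks, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (locks[i].reclaimed)
            locks[kept++] = locks[i];
    }
    return kept;
}
```

After this runs, the only locks the client would attempt to reclaim
following a further reboot are ones it actually held in the previous
boot instance.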

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@xxxxxxxxxxxxxxx