Re: [PATCH v3 5/7] nfsdcltrack: update schema to v2

Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> · Fri, 12 Sep 2014 12:29:01 -0400

On Fri, Sep 12, 2014 at 12:07 PM, Jeff Layton
<jeff.layton@xxxxxxxxxxxxxxx> wrote:
> On Fri, 12 Sep 2014 11:54:17 -0400
> Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>
>> On Fri, Sep 12, 2014 at 11:21 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> > On Fri, Sep 12, 2014 at 10:36:21AM -0400, J. Bruce Fields wrote:
>> >> On Fri, Sep 12, 2014 at 10:21:53AM -0400, Jeff Layton wrote:
>> >> > Grace period
>> >> > eventually ends, and its record is purged from the DB.
>> >> >
>> >> > Now we have a client that has reclaimed some files but that has no
>> >> > record on stable storage.
>> >> >
>> >> > One possibility is to prematurely expire v4.1+ clients that have not
>> >> > sent a RECLAIM_COMPLETE when the grace period ends.
>> >> >
>> >> > That seems problematic though -- what about clients that just happen to
>> >> > do an EXCHANGE_ID just before the grace period is going to end, and
>> >> > that get expired before they can issue their RECLAIM_COMPLETE. Will
>> >> > that be a problem for them?
>> >>
>> >> In that case a client will send a reclaim, get back a NO_GRACE error,
>> >> mark the rest of its state as unrecoverable, send the RECLAIM_COMPLETE,
>> >> and continue normally.  (To the extent it can--signalling affected
>> >> processes or EIOing further attempts to use the unreclaimed state, or
>> >> whatever.)
>> >
>> > The one thing the server *could* do in this sort of case is extend the
>> > grace period by a little--I seem to recall the spec giving some leeway
>> > for this kind of thing.
>>
>>
>> Section 8.4.2.1.
>>
>> > So for example the server could have a heuristic like: extend the grace
>> > period by another second each time we notice there's been an EXCHANGE_ID
>> > or reclaim in the previous second, up to some maximum.  And I suppose it
>> > could also delay the end of the grace period until someone actually
>> > attempts a non-reclaim open.
>> >
>> > In isolation a single client slipping in at the end like that sounds like a
>> > freak event, but if there's a ton of state to reclaim perhaps it could
>> > become more likely.
>> >
>> > I don't think that's a priority, we might just want to make sure we know
>> > how to do that in the future.
>> >
>> > But now that I think about it I don't see the existing or proposed
>> > nfsdcltrack stuff tying our hands in any way here.  It just gives the
>> > kernel some extra information, and the kernel still has discretion about
>> > when exactly it wants to end the grace period.
>> >
>>
>> It is even allowed to grant reclaim lock attempts after the grace
>> period has ended _if_ and only if it can guarantee that no conflicting
>> locks were issued.
>>
>> However note that the NFSv4.1 client is not actually allowed to issue
>> non-reclaim lock requests before it has issued a RECLAIM_COMPLETE. I
>> dunno how religiously we stick to that in Linux (I think we do), but
>> the point is that the server can and should rely on the client
>> _always_ sending a RECLAIM_COMPLETE if it is going to establish new
>> locks.
>
> Yeah, I'm pretty sure that bit is enforced. The problem situation that
> I think Bruce was referring to is this:
>
> Server reboots. Client1 reclaims some of its locks (but not all) and
> never sends a RECLAIM_COMPLETE. Grace period ends and then server
> hands out a lock to client2 that was previously held by client1 but
> that didn't get reclaimed.
>
> Server reboots again, prior to client1 expiring (so its record is
> still in the DB). Now client1 comes back and starts reclaiming again.
> This time it reclaims all of its locks and we have a conflict between
> it and client2.
>
> It's a solvable problem, but I'll need to work through how best to do
> so.
>

That's the first edge condition described in section 8.4.3.
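
To make the sequence quoted above concrete, here is a rough toy model of
it. This is only a sketch under stated assumptions: the Server class, the
per-client reclaim_complete flag, and the purge_incomplete switch are all
hypothetical names made up for illustration, and none of this reflects the
actual nfsdcltrack schema or the nfsd code. It also ignores what happens to
the locks client1 did manage to reclaim once its record is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    # Hypothetical per-client row on stable storage (not the real schema).
    reclaim_complete: bool = False


@dataclass
class Server:
    db: dict = field(default_factory=dict)     # stable storage, survives reboots
    locks: dict = field(default_factory=dict)  # lock name -> set of owners, lost on reboot
    in_grace: bool = False
    may_reclaim: set = field(default_factory=set)

    def reboot(self):
        # In-memory lock state is lost; every client with a record may reclaim.
        self.locks = {}
        self.in_grace = True
        self.may_reclaim = set(self.db)
        for rec in self.db.values():
            rec.reclaim_complete = False

    def reclaim(self, client, lock):
        if not self.in_grace or client not in self.may_reclaim:
            return "NO_GRACE"
        # During grace the server has no way to detect a conflict here.
        self.locks.setdefault(lock, set()).add(client)
        return "OK"

    def reclaim_complete(self, client):
        self.db.setdefault(client, ClientRecord()).reclaim_complete = True

    def end_grace(self, purge_incomplete):
        self.in_grace = False
        if purge_incomplete:
            # Assumed policy: forget clients that never sent RECLAIM_COMPLETE.
            self.db = {c: r for c, r in self.db.items() if r.reclaim_complete}

    def lock(self, client, lock):
        rec = self.db.get(client)
        # Non-reclaim locks require a prior RECLAIM_COMPLETE and no current owner.
        if self.in_grace or rec is None or not rec.reclaim_complete:
            return "DENIED"
        if self.locks.get(lock):
            return "DENIED"
        self.locks[lock] = {client}
        return "OK"


def run_scenario(purge_incomplete):
    srv = Server()
    srv.db["client1"] = ClientRecord()  # client1 held locks A and B before the first reboot
    srv.reboot()
    srv.reclaim("client1", "A")         # reclaims A only, never sends RECLAIM_COMPLETE
    srv.end_grace(purge_incomplete)
    srv.reclaim_complete("client2")     # new client, nothing to reclaim
    srv.lock("client2", "B")            # B was never reclaimed, so it is free
    srv.reboot()                        # second reboot, before client1 expires
    return [srv.reclaim("client2", "B"), srv.reclaim("client1", "B")]
```

With purge_incomplete=False both reclaims of B succeed after the second
reboot, which is the conflict. With purge_incomplete=True client1 gets
NO_GRACE instead, at the cost of the stable-storage problem discussed
earlier in the thread.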

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@xxxxxxxxxxxxxxx