Re: [PATCH v2 0/5] nfsd: support for lifting grace period early

On Sat, Sep 27, 2014 at 09:04:58AM -0400, Jeff Layton wrote:
> On Fri, 26 Sep 2014 15:46:17 -0400
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> 
> > On Fri, Sep 26, 2014 at 02:54:46PM -0400, Jeff Layton wrote:
> > > On Fri, 26 Sep 2014 14:39:49 -0400
> > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > > 
> > > > By the way, I've seen the following *before* your patches, but in case
> > > > you're still looking at reboot recovery problems:
> > > > 
> > > > I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
> > > > succeeds after a previous boot (with full grace period) during which the
> > > > client had failed to reclaim.
> > > > 
> > > > I managed to catch one trace, the relevant parts looked like:
> > > > 
> > > > 	SETCLIENTID client1
> > > > 	OPEN
> > > > 	LOCK
> > > > 
> > > > 	(server restart here)
> > > > 
> > > > 	SETCLIENTID client2
> > > > 	OPEN
> > > > 	LOCK (lock that conflicts with client1's)
> > > > 
> > > > 	(server restart here)
> > > > 
> > > > 	SETCLIENTID client1
> > > > 	OPEN CLAIM_PREVIOUS
> > > > 
> > > > And all those ops (including the last reclaim open) succeeded.
> > > > 
> > > > So I didn't have a chance to review it more carefully, but it certainly
> > > > looks like a server bug, not a test bug.  (Well, technically the server
> > > > behavior above is correct since it's not required to refuse anything
> > > > till we actually attempt to reclaim the original lock, but we know our
> > > > server's not that smart.)
> > > > 
> > > > But I haven't gotten any further than that....
> > > > 
> > > > --b.
> > > > 
> > > 
> > > Ewww...v4.0... ;)
> > > 
> > > Well, I guess that could happen if, after the first reboot, client1 also
> > > did a SETCLIENTID *and* reclaimed something that didn't conflict with
> > > the lock that client2 grabs... or did an OPEN/OPEN_CONFIRM after the
> > > grace period without reclaiming its lock previously.
> > > 
> > > If it didn't do one or the other, then its record should have been
> > > cleaned out of the DB after the grace period ended between the reboots
> > > and it wouldn't have been able to reclaim after the second reboot.
> > 
> > Yeah.  Is there an easy way to tell nfsdcltrack to log everything?  I'm
> > only seeing this occasionally.
> > 
> 
> If you can make sure that it's run with the '-d' flag then that'll make
> it do debug level logging. Unfortunately, the kernel doesn't have a
> handy switch to enable that so you'll need to patch the kernel (or
> maybe wrap nfsdcltrack) to make that happen.
> 
> Maybe we should add a module parm to nfsd.ko that makes it run
> nfsdcltrack with -d?

Would userland configuration (e.g. an optional /etc/ file) be more
flexible?
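
In the meantime, if anyone wants the "wrap nfsdcltrack" route you
mention, a trivial exec wrapper would do it.  Untested sketch; the
.real path is made up, you'd move the real binary aside first:

	/* drop in at /sbin/nfsdcltrack; re-execs the real binary with
	 * -d prepended so every upcall logs at debug level */
	#include <stdio.h>
	#include <unistd.h>

	#define REAL_CLTRACK "/sbin/nfsdcltrack.real"

	int main(int argc, char **argv)
	{
		char *args[argc + 2];	/* binary + "-d" + args + NULL */
		int i;

		args[0] = REAL_CLTRACK;
		args[1] = "-d";
		for (i = 1; i < argc; i++)
			args[i + 1] = argv[i];
		args[argc + 1] = NULL;

		execv(REAL_CLTRACK, args);
		perror("execv");	/* only reached if exec fails */
		return 1;
	}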

> 
> > > It's a bit of a pathological case, and I don't see a way to fix that in
> > > the context of v4.0. The fact that there's no RECLAIM_COMPLETE is a
> > > pretty nasty protocol bug, IMO. Yet another reason to start really
> > > moving people toward v4.1+...
> > 
> > I don't believe there's a protocol bug here.
> > 
> > A correct NFSv4.0 client wouldn't send the open reclaim in the case you
> > describe above.
> > 
> 
> It would in the partial reclaim case.

My (and I think Trond's) understanding is that client1 would not send
the reclaim in that case.  In the case of a partial reclaim, a correct
4.0 client will stop trying to reclaim the state that it failed to
reclaim previously.  My understanding is that this is how the Linux
client in fact behaves.
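
Expressed as code, the rule I have in mind amounts to something like
this (purely illustrative, not the actual client implementation):

	#include <stdbool.h>

	struct nfs40_lock_state {
		/* held under the client instance that last got a reclaim
		 * OPEN or OPEN_CONFIRM through to the server */
		bool held_by_last_confirmed_instance;
		/* we already failed to reclaim it after an earlier reboot */
		bool reclaim_failed_previously;
	};

	/* may a correct 4.0 client reclaim this lock after a reboot? */
	static bool may_reclaim(const struct nfs40_lock_state *ls)
	{
		return ls->held_by_last_confirmed_instance &&
		       !ls->reclaim_failed_previously;
	}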

So the bug would be in the client, not the protocol.  That said,
admittedly:

	- the "protocol" here seems to be largely in our heads; the spec
	  does a poor job of explaining all this.
	- we (not surprisingly) disagree on some of the details.

Nevertheless, I believe that despite our disagreement there's no actual
bug at least between the existing Linux client and server.  And that
3530bis and 5661 were at least *supposed* to deal with this case....

Oh, look, Trond and you have written a pile more text on this on the
ietf list.  Should I even try to catch up, or would I be better off
just spending the rest of my day back in bed?

--b.

> 
> Suppose we start reclaiming things but don't get everything before
> there's a network partition. The server ends the grace period and then
> hands out the conflicting lock to client2. It then reboots again and the
> network partition heals. At that point, client1 could try to reclaim
> everything it had before (including the lock that conflicts with the
> one that client2 had).
> 
> I'd still argue that this is a protocol bug. Without RECLAIM_COMPLETE,
> the server simply has no way to know whether the reclaims done by
> client1 are complete or not.
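
For contrast, the positive signal RECLAIM_COMPLETE gives a 4.1 server
amounts to roughly this (simplified sketch, not nfsd's actual code);
in 4.0 the grace timer expiring is the only "signal" there is:

	#include <stdbool.h>

	struct nfs_client {
		bool sent_reclaim_complete;	/* v4.1+ only */
		struct nfs_client *next;
	};

	/* with RECLAIM_COMPLETE the server knows positively when every
	 * surviving client is done reclaiming and can end grace early */
	static bool grace_can_end_early(const struct nfs_client *clients)
	{
		for (; clients; clients = clients->next)
			if (!clients->sent_reclaim_complete)
				return false;
		return true;
	}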
> 
> > As I understand it, the rule for the client is: you're allowed to
> > reclaim only the set of locks that you held previously, where "the set of
> > locks you held previously" is "the set of locks held by the clientid
> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
> > example once client1 sends that unrelated OPEN reclaim it's giving up
> > on anything else it doesn't manage to reclaim this time around.
> > 
> > Any server that loses client state on reboot has no choice but to
> > trust clients to get this sort of thing right.
> > 
> 
> 
> -- 
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>