On Wed, Aug 26, 2015 at 09:54:22PM +0200, Ulrich Gemkow wrote:
> Hello Bruce,
>
> On Tuesday 25 August 2015 23:54:56 J. Bruce Fields wrote:
> > The SERVERFAULT is on SETCLIENTID_CONFIRM.
> >
> > In nfsd4_setclientid_confirm():
> >
> >     conf = find_confirmed_client(clid, false, nn);
> >     unconf = find_unconfirmed_client(clid, false, nn);
> >     /*
> >      * We try hard to give out unique clientid's, so if we get an
> >      * attempt to confirm the same clientid with a different cred,
> >      * there's a bug somewhere.  Let's charitably assume it's our
> >      * bug.
> >      */
> >     status = nfserr_serverfault;
> >     if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
> >             goto out;
> >     if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
> >             goto out;
> >
> > The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
> > auth_unix creds.
> >
> > The clientid that we're looking up there was returned from the previous
> > SETCLIENTID, generated by this logic:
> >
> >     if (conf && same_verf(&conf->cl_verifier, &clverifier))
> >             /* case 1: probable callback update */
> >             copy_clid(new, conf);
> >     else /* case 4 (new client) or cases 2, 3 (client reboot): */
> >             gen_clid(new, nn);
> >
> > So it should be a brand new clientid, unless the client was reusing the old
> > verifier.
> >
> > So perhaps the client is sending the SETCLIENTID with a verifier set to what it
> > used on the previous boot?  That sounds like a client bug.  The Linux
> > client uses a timestamp for the verifier, and it looks like the Solaris
> > client might too.  Is there some reason the clock on this client isn't
> > advancing on reboot?
>
> Thank you for the analysis. But the clock of the client advances
> regularly and as one would expect.

OK, thanks for checking that.

> The client is SPARC Solaris 10 with the latest patches
> applied - I cannot believe that this client has such a
> basic NFS bug.
To confirm or deny my hypothesis, I think what we want is a longer
capture that gets the failing SETCLIENTID_CONFIRM (as seen in the
previous capture) but also shows what clientid the client was using
before the reboot.  So the ideal capture might go something like:

    - start the capture
    - mount
    - create a file (I just want to make sure the client does at
      least one open)
    - reboot the client
    - mount again, see the failure
    - stop the capture

> Can you think of any kind of server configuration bug
> (as said, vanilla Linux 4.1.6) that causes this error?
> The NFS server startup system is "self-made"...

I can't think of any offhand; if there's a server-side problem here,
I'd suspect the code before the configuration.

--b.