On Mon, Aug 31, 2015 at 02:08:08PM +0200, Ulrich Gemkow wrote:
> Hello Bruce,
>
> On Wednesday 26 August 2015 22:09:40 you wrote:
> > On Wed, Aug 26, 2015 at 09:54:22PM +0200, Ulrich Gemkow wrote:
> > > Hello Bruce,
> > >
> > > On Tuesday 25 August 2015 23:54:56 J. Bruce Fields wrote:
> > > > The SERVERFAULT is on SETCLIENTID_CONFIRM.
> > > >
> > > > In nfsd4_setclientid_confirm():
> > > >
> > > >     conf = find_confirmed_client(clid, false, nn);
> > > >     unconf = find_unconfirmed_client(clid, false, nn);
> > > >     /*
> > > >      * We try hard to give out unique clientid's, so if we get an
> > > >      * attempt to confirm the same clientid with a different cred,
> > > >      * there's a bug somewhere. Let's charitably assume it's our
> > > >      * bug.
> > > >      */
> > > >     status = nfserr_serverfault;
> > > >     if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred))
> > > >         goto out;
> > > >     if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred))
> > > >         goto out;
> > > >
> > > > The SETCLIENTID and SETCLIENTID_CONFIRM are done with identical
> > > > auth_unix creds.
> > > >
> > > > The clientid that we're looking up there was returned from the previous
> > > > SETCLIENTID, generated by this logic:
> > > >
> > > >     if (conf && same_verf(&conf->cl_verifier, &clverifier))
> > > >         /* case 1: probable callback update */
> > > >         copy_clid(new, conf);
> > > >     else /* case 4 (new client) or cases 2, 3 (client reboot): */
> > > >         gen_clid(new, nn);
> > > >
> > > > So it should be a brand new clientid, unless the client was reusing the old
> > > > verifier.
> > > >
> > > > So perhaps the client is sending the SETCLIENTID with a verifier set to what it
> > > > used on the previous boot? That sounds like a client bug. The Linux
> > > > client uses a timestamp for the verifier, looks like the Solaris client
> > > > might too. Is there some reason the clock on this client isn't
> > > > advancing on reboot?
> > >
> > > Thank you for the analysis. But the clock of the client advances
> > > regularly and as one would expect.
> >
> > OK, thanks for checking that.
> >
> > > The client is SPARC Solaris 10 with the latest patches
> > > applied - I cannot believe that this client has such a
> > > basic NFS bug.
> >
> > To confirm or deny my hypothesis, I think what we want is a longer
> > capture that gets the failing SETCLIENTID_CONFIRM (as seen in the
> > previous capture) but also shows what clientid the client was using
> > before the reboot. So ideal might be something like:
> >
> > - start the capture
> > - mount
> > - create a file (I just want to make sure the client does at
> >   least one open)
> > - reboot the client
> > - mount again, see the failure
> > - stop the capture
>
> I tried but probably made a mistake: To be sure to have a
> defined state for the test, I rebooted the server while clearing
> all its NFS state, and I reinstalled the client - both with the
> exact same configuration as before.
>
> And now the bug unfortunately does not happen again; the mount
> always succeeds. I had also reinstalled the client before
> my first mail, to be sure, so it seems that the server may have
> reached an invalid state before - whatever may have caused this.

That's interesting!

> I can only wait until the bug happens again (hoping not :-).
>
> Maybe you are able to find a reason from the information
> given before. I regret to be of no more help. If I can do
> something, please tell me.

I'm not coming up with any ideas right now. Do let us know if you get
into that state again.

--b.
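
For anyone following along, the two kernel fragments quoted above can be
modelled in a few lines of standalone C. This is only a rough sketch under
stated assumptions, not the nfsd code: struct client, setclientid(),
setclientid_confirm(), the single conf/unconf slots, the uid field standing
in for the auth_unix cred and the CONFIRM_* return values are all
hypothetical names made up for illustration, and everything the real
functions do beyond the two quoted fragments is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a client record (not a kernel type). */
struct client {
    bool     in_use;
    uint64_t clientid;
    uint64_t verifier;  /* boot verifier sent by the client */
    uint32_t uid;       /* stand-in for the auth_unix credential */
};

/* Hypothetical result codes; CONFIRM_SERVERFAULT models nfserr_serverfault. */
enum { CONFIRM_OK = 0, CONFIRM_SERVERFAULT = -1, CONFIRM_UNKNOWN = -2 };

static struct client conf;    /* the one confirmed record in this model */
static struct client unconf;  /* the one unconfirmed record in this model */
static uint64_t next_clientid = 1;

/* Models the quoted SETCLIENTID fragment: the confirmed clientid is reused
 * only when the verifier matches; otherwise a fresh one is generated. */
static uint64_t setclientid(uint64_t verifier, uint32_t uid)
{
    uint64_t id;

    if (conf.in_use && conf.verifier == verifier)
        id = conf.clientid;    /* case 1: probable callback update */
    else
        id = next_clientid++;  /* new client, or client reboot */

    unconf = (struct client){ true, id, verifier, uid };
    return id;
}

/* Models the quoted SETCLIENTID_CONFIRM fragment: any record found under
 * this clientid must carry the same credential as the request. */
static int setclientid_confirm(uint64_t clientid, uint32_t uid)
{
    bool have_unconf = unconf.in_use && unconf.clientid == clientid;
    bool have_conf = conf.in_use && conf.clientid == clientid;

    if (have_unconf && unconf.uid != uid)
        return CONFIRM_SERVERFAULT;
    if (have_conf && conf.uid != uid)
        return CONFIRM_SERVERFAULT;
    if (!have_unconf && !have_conf)
        return CONFIRM_UNKNOWN;  /* assumption: not covered by the quoted code */

    if (have_unconf) {
        conf = unconf;           /* promote the unconfirmed record */
        unconf.in_use = false;
    }
    return CONFIRM_OK;
}
```

In this model a SETCLIENTID with a new verifier always gets a clientid that
no confirmed record shares, so only the unconfirmed record's cred is checked
on confirm. Only a SETCLIENTID that repeats the confirmed record's verifier
gets the old clientid back, and only then does the confirm also compare
against the cred stored in the older confirmed record.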