On Fri, 2012-03-16 at 15:46 +0000, Sachin Prabhu wrote:
> We have a user report that they see the following messages
> in /var/log/messages and the NFS share hangs when a user's kerberos
> credentials expire.
>
> kernel: Error: state manager encountered RPCSEC_GSS session expired
> against NFSv4 server vm140-31.
>
> The reproducer is as follows
>
> 1. Configure NFS4 + Kerberos, mount nfs4 share on the client side using
> sec=krb5.
>
> 2. Create 2 nfsusers, login as user1, obtain a kerberos ticket with a
> short duration and open a file on the nfs share. Leave this file open
> # su - user1
> $ kinit -l 5m
> $ cd /home/user1
> $ touch file1.txt
> $ sleep 100000 < file1.txt &
>
> 3. After 300 seconds, on a different terminal, login as user2, obtain a
> kerberos ticket and attempt to open a file.
> # su - user2
> $ kinit
> $ cd /home/user2
> $ touch myfile1.txt
> .
> .
> At this point, the process hangs and /var/log/messages is filled up
> with the following messages.
> kernel: Error: state manager encountered RPCSEC_GSS session expired
> against NFSv4 server $(hostname)
>
> On further debugging, we found the cause to be that the state
> manager uses the credentials of the first stateowner with open files it
> finds. These are returned by nfs4_get_renew_cred_locked() ->
> nfs4_get_renew_cred_server_locked() to call the RENEW.
>
> 1) The client, before it opens a file, needs to set a client id. It does
> this by calling the SET_CLIENTID call. The server in response returns a
> client id.
> Since kernel 2.6.29 (commit a7b721037f898b29a8083da59b1dccd3da385b07) the
> SET_CLIENTID call is made using the machine credentials.
>
> 2) However, for all subsequent RENEW calls for that clientid, the client
> uses the first credential it finds which is used by an open file on that
> machine. In our test case, it is the user with the expired ticket.
> When the ticket expires, the call to refresh the credentials, made at
> call_refresh -> rpcauth_refreshcred -> gss_refresh()
> returns EKEYEXPIRED.
> This means that the RENEW call fails before it could be sent over the
> wire.
> The clientid on the server eventually expires.
>
> 3) When the user with the valid ticket then attempts to open a file, the
> server returns an NFS4ERR_EXPIRED which indicates that the clientid at the
> server is no longer valid. A warning message is printed out at this
> time. To fix this, the client attempts to RENEW. This hits the problem
> in step 2.
>
> Steps 2 and 3 now run continuously and no RENEW calls are sent over the
> wire.
>
> The SET_CLIENTID calls are made using the machine creds. Why don't we
> simply use the machine creds to renew the clientid?

The problem is that if the client doesn't have a machine cred, then you
end up taking a random user credential that may not currently be holding
any OPEN files. In that case too the RENEW will fail.

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
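
For illustration only, the credential selection discussed in this thread
can be sketched as a toy user-space model. This is not kernel code: Cred,
OpenOwner, pick_renew_cred() and try_renew() are hypothetical names, the
prefer_machine flag is an assumption standing in for the proposal to renew
with the machine creds, and ticket expiry is reduced to a plain timestamp
comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cred:
    """Hypothetical stand-in for an RPCSEC_GSS credential."""
    name: str
    expires_at: float  # toy ticket end time, in seconds

    def refresh(self, now: float) -> str:
        # Toy analogue of the refresh step: an expired ticket fails
        # locally, so nothing would be sent over the wire.
        return "OK" if now < self.expires_at else "EKEYEXPIRED"


@dataclass
class OpenOwner:
    """Hypothetical stand-in for a state owner and its open-file count."""
    cred: Cred
    open_files: int


def pick_renew_cred(owners: List[OpenOwner],
                    machine_cred: Optional[Cred] = None,
                    prefer_machine: bool = False) -> Optional[Cred]:
    # Assumed variant of the proposal: use the machine cred when one exists.
    if prefer_machine and machine_cred is not None:
        return machine_cred
    # Behaviour described in the report: the first state owner that has
    # open files supplies the credential used for RENEW.
    for owner in owners:
        if owner.open_files > 0:
            return owner.cred
    return None


def try_renew(cred: Optional[Cred], now: float) -> str:
    # A RENEW can only go out if a credential exists and can be refreshed.
    if cred is None:
        return "NO_CRED"
    return cred.refresh(now)
```

With user1's ticket expiring at t=300 and user1 listed before user2, a
renew attempted at t=400 with the default selection yields "EKEYEXPIRED",
while the same call with prefer_machine=True and an unexpired machine cred
yields "OK". When no machine cred is available, the model falls back to
the first open owner's credential, so the failure mode from the report
remains possible in that case.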