On Mon, 2012-11-12 at 15:49 -0500, Andy Adamson wrote: > On Mon, Nov 12, 2012 at 3:29 PM, Myklebust, Trond > <Trond.Myklebust@xxxxxxxxxx> wrote: > > On Mon, 2012-11-12 at 15:22 -0500, Andy Adamson wrote: > >> On Mon, Nov 12, 2012 at 1:35 PM, <bjschuma@xxxxxxxxxx> wrote: > >> > From: Bryan Schumaker <bjschuma@xxxxxxxxxx> > >> > > >> > During recovery the NFS4_SESSION_DRAINING flag might be set on the > >> > client structure. This can cause lease renewal to abort when > > > > Not lease renewal. It is lease verification (i.e. checking that we have > > a valid lease and session) that will deadlock. > > > >> > nfs41_setup_sequence sees that we are doing recovery. As a result, the > >> > client never recovers and all activity with the NFS server halts. > >> > >> > >> When does this happen? Say the session is draining, and an RPC returns > >> from one of the compounds such as nfs_open, nfs_locku etc whose > >> rpc_call_done routine calls renew_lease after freeing it's slot. Like > >> all calls on a draining session, the renew_lease sleeps on the session > >> slot_tbl_waitq - is this what you mean by "causes lease renewal to > >> abort"? How does this cause the client to never recover? > >> > >> The only other call to renew_lease is from the state manager itself > >> which runs one function at a time, and will not call renew_lease until > >> the recovery is over.... > >> > >> What am I missing....? > > > > nfs4_check_lease() is run exclusively by the state manager thread in > > order to check that the clientid (and session id on NFSv4.1) are valid. > > It will deadlock the state manager thread if NFS4_SESSION_DRAINING is > > already set. > > OK. NFS4_SESSION_DRAINING is also set exclusively by the state manager > thread. Why is the state manager told to check the lease when it's > also draining a session? NFS4_SESSION_DRAINING does not just mean that the session is being drained; it remains set until open and lock state recovery is completed. IOW: if something happens during state recovery that requires us to check the lease (e.g. something returns NFS4ERR_EXPIRED) then the current code will deadlock. > No matter what the answer, please update the patch description to > better explain the problem being solved. ACK. I agree that the changelog entry can be improved. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥