On Tue, 2017-01-24 at 15:48 -0500, Olga Kornievskaia wrote: > On Tue, Jan 24, 2017 at 3:35 PM, Trond Myklebust > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > On Jan 24, 2017, at 14:50, Olga Kornievskaia <aglo@xxxxxxxxx> > > > wrote: > > > > > > On Tue, Jan 24, 2017 at 2:44 PM, Trond Myklebust > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > On Jan 24, 2017, at 14:40, Olga Kornievskaia <aglo@xxxxxxxxx> > > > > > wrote: > > > > > > > > > > On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > > > > > On Jan 24, 2017, at 14:06, Olga Kornievskaia <aglo@umich. > > > > > > > edu> wrote: > > > > > > > > > > > > > > Hi Trond, > > > > > > > > > > > > > > Is there a reason that nfs4_proc_reclaim_complete() > > > > > > > isn't wrapped > > > > > > > with a do while() to handle errors? > > > > > > > > > > > > What do we not already handle correctly in > > > > > > nfs4_reclaim_complete_done()? > > > > > > > > > > Could this be because when an error occurs rpc_done isn't > > > > > called > > > > > (rpc_release is called)? What I see is that if > > > > > RECLAIM_COMPLETE gets > > > > > an error (BAD_SESSION) the client just ignores it. > > > > > > > > > > > > > That’s correct. Why do we need to handle BAD_SESSION there? > > > > We’re done with state recovery, so if the server rebooted, we > > > > can catch that later. > > > > > > (1) don't we want to handle session errors as soon as possible? > > > (2) I ran into a problem (not sure yet if reproducible) where I > > > had a > > > client stuck in an infinite loop of RECLAIM_COMPLETE being sent > > > with > > > reply of BAD_SESSION. > > > > > > yes I don't know why the client is looping but it made me look > > > into > > > the fact that we are not handling session errors on reclaim > > > complete > > > which I simulated by having the server return BAD_SESSION to > > > RECLAIM_COMPLETE and I see that client simply ignores it. > > > > It doesn’t ignore it: > > > > static int nfs41_reclaim_complete_handle_errors(struct rpc_task > > *task, struct nfs_client *clp) > > { > > switch(task->tk_status) { > > case 0: > > case -NFS4ERR_COMPLETE_ALREADY: > > case -NFS4ERR_WRONG_CRED: /* What to do here? */ > > break; > > case -NFS4ERR_DELAY: > > rpc_delay(task, NFS4_POLL_RETRY_MAX); > > /* fall through */ > > case -NFS4ERR_RETRY_UNCACHED_REP: > > return -EAGAIN; > > default: > > nfs4_schedule_lease_recovery(clp); > > } > > return 0; > > } > > > > IOW: what the code does is schedule another round of lease > > recovery. > > We already agreed that this doesn't get called because rpc_call_done > isn't called on the error. What am I missing? Why wouldn't it get called? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥