On Tue, Jan 24, 2017 at 3:35 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > >> On Jan 24, 2017, at 14:50, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >> >> On Tue, Jan 24, 2017 at 2:44 PM, Trond Myklebust >> <trondmy@xxxxxxxxxxxxxxx> wrote: >>> >>>> On Jan 24, 2017, at 14:40, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>> >>>> On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust >>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>> >>>>>> On Jan 24, 2017, at 14:06, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>>> >>>>>> Hi Trond, >>>>>> >>>>>> Is there a reason that nfs4_proc_reclaim_complete() isn't wrapped >>>>>> with a do while() to handle errors? >>>>> >>>>> What do we not already handle correctly in nfs4_reclaim_complete_done()? >>>> >>>> Could this be because when an error occurs rpc_done isn't called >>>> (rpc_release is called)? What I see is that if RECLAIM_COMPLETE gets >>>> an error (BAD_SESSION) the client just ignores it. >>>> >>> >>> That’s correct. Why do we need to handle BAD_SESSION there? We’re done with state recovery, so if the server rebooted, we can catch that later. >> >> (1) don't we want to handle session errors as soon as possible? >> (2) I ran into a problem (not sure yet if reproducible) where I had a >> client stuck in an infinite loop of RECLAIM_COMPLETE being sent with >> reply of BAD_SESSION. >> >> yes I don't know why the client is looping but it made me look into >> the fact that we are not handling session errors on reclaim complete >> which I simulated by having the server return BAD_SESSION to >> RECLAIM_COMPLETE and I see that client simply ignores it. > > It doesn’t ignore it: > > static int nfs41_reclaim_complete_handle_errors(struct rpc_task *task, struct nfs_client *clp) > { > switch(task->tk_status) { > case 0: > case -NFS4ERR_COMPLETE_ALREADY: > case -NFS4ERR_WRONG_CRED: /* What to do here? */ > break; > case -NFS4ERR_DELAY: > rpc_delay(task, NFS4_POLL_RETRY_MAX); > /* fall through */ > case -NFS4ERR_RETRY_UNCACHED_REP: > return -EAGAIN; > default: > nfs4_schedule_lease_recovery(clp); > } > return 0; > } > > IOW: what the code does is schedule another round of lease recovery. We already agreed that this doesn't get called because rpc_call_done isn't called on the error. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html