> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote: > > On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@xxxxxxxxxx> wrote: >> >> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote: >>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote: >>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust < >>>> trondmy@xxxxxxxxxxxxxxx> wrote: >>>>> >>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote: >>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia >>>>>> wrote: >>>>>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx> >>>>>>> >>>>>>> When the server tries to do a callback and a client fails it >>>>>>> due to >>>>>>> authentication problems, we need the server to set callback >>>>>>> down >>>>>>> flag in RENEW so that client can recover. >>>>>> >>>>>> I was looking at this. It looks to me like this should really be >>>>>> just: >>>>>> >>>>>> case 1: >>>>>> if (task->tk_status) >>>>>> nfsd4_mark_cb_down(clp, task->tk_status); >>>>>> >>>>>> If tk_status showed an error, and the ->done method doesn't >>>>>> return 0 >>>>>> to >>>>>> tell us it something worth retrying, then the callback failed >>>>>> permanently, so we should mark the callback path down, regardless >>>>>> of >>>>>> the >>>>>> exact error. >>>>> >>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see >>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the >>>>> process of returning the delegation being recalled. Why should that >>>>> result in the callback channel being marked as down? >>>>> >>>> >>>> Are you talking about say the connection going down and server should >>>> just reconnect instead of recovering the callback channel. I assumed >>>> that connection break is something that's not recoverable by the >>>> callback but perhaps I'm wrong. >>> >>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1' >>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm >>> not seeing why either of those errors should be handled by marking the >>> callback channel as being down. >>> >>> Looking further, it seems that the same function will also return '1' >>> without checking the value of task->tk_status if the delegation has >>> been revoked or returned. So that would mean that even NFS4ERR_DELAY >>> could trigger the call to nfsd4_mark_cb_down() with the above change. >> >> Yeah, OK, that's wrong, apologies. >> >> I'm just a little worried about the attempt to enumerate transport level >> errors in nfsd4_cb_done(). Are we sure that EIO, ETIMEDOUT, EACCESS is >> the right list? > > Looking at call_transmit_status error handling, I don't think > connection errors are returned. Instead the code tries to fix the > connection by retrying unless the rpc_timeout is reached and then only > EIO,TIMEDOUT is returned. > > Can then my original patch be considered without resubmission? Bruce has authorized v1 of this patch, but that one has the uncorrected patch description. Post a v4? >> --b. >> >>> >>>> >>>>>> >>>>>> --b. >>>>>> >>>>>>> >>>>>>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx> >>>>>>> --- >>>>>>> fs/nfsd/nfs4callback.c | 1 + >>>>>>> 1 file changed, 1 insertion(+) >>>>>>> >>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c >>>>>>> index 052be5bf9ef5..7325592b456e 100644 >>>>>>> --- a/fs/nfsd/nfs4callback.c >>>>>>> +++ b/fs/nfsd/nfs4callback.c >>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task >>>>>>> *task, void *calldata) >>>>>>> switch (task->tk_status) { >>>>>>> case -EIO: >>>>>>> case -ETIMEDOUT: >>>>>>> + case -EACCES: >>>>>>> nfsd4_mark_cb_down(clp, task- >>>>>>>> tk_status); >>>>>>> } >>>>>>> break; >>>>>>> -- >>>>>>> 2.27.0 >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Trond Myklebust >>>>> Linux NFS client maintainer, Hammerspace >>>>> trond.myklebust@xxxxxxxxxxxxxxx >>>>> >>>>> >>> >>> -- >>> Trond Myklebust >>> Linux NFS client maintainer, Hammerspace >>> trond.myklebust@xxxxxxxxxxxxxxx -- Chuck Lever