Re: [PATCH 8/8] sunrpc: Drop the connection when the server drops a request

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 2 Mar 2020 20:12:51 +0000

On Mon, 2020-03-02 at 11:27 -0500, David Wysochanski wrote:
> On Sun, Mar 1, 2020 at 6:25 PM Trond Myklebust <trondmy@xxxxxxxxx>
> wrote:
> > If a server wants to drop a request, then it should also drop the
> > connection, in order to let the client know.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > ---
> >  net/sunrpc/svc_xprt.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index de3c077733a7..83a527e56c87 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -873,6 +873,13 @@ int svc_recv(struct svc_rqst *rqstp, long
> > timeout)
> >  }
> >  EXPORT_SYMBOL_GPL(svc_recv);
> > 
> > +static void svc_drop_connection(struct svc_xprt *xprt)
> > +{
> > +       if (test_bit(XPT_TEMP, &xprt->xpt_flags) &&
> > +           !test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
> > +               svc_xprt_enqueue(xprt);
> > +}
> > +
> >  /*
> >   * Drop request
> >   */
> > @@ -880,6 +887,8 @@ void svc_drop(struct svc_rqst *rqstp)
> >  {
> >         trace_svc_drop(rqstp);
> >         dprintk("svc: xprt %p dropped request\n", rqstp->rq_xprt);
> > +       /* Close the connection when dropping a request */
> > +       svc_drop_connection(rqstp->rq_xprt);
> >         svc_xprt_release(rqstp);
> >  }
> >  EXPORT_SYMBOL_GPL(svc_drop);
> > @@ -1148,6 +1157,7 @@ static void svc_revisit(struct
> > cache_deferred_req *dreq, int too_many)
> >         if (too_many || test_bit(XPT_DEAD, &xprt->xpt_flags)) {
> >                 spin_unlock(&xprt->xpt_lock);
> >                 dprintk("revisit canceled\n");
> > +               svc_drop_connection(xprt);
> >                 svc_xprt_put(xprt);
> >                 trace_svc_drop_deferred(dr);
> >                 kfree(dr);
> > --
> > 2.24.1
> > 
> 
> Trond, back in 2014 you had this NFSv4 only patch that took a more
> surgical approach:
> https://marc.info/?l=linux-nfs&m=141414531832768&w=2
> 
> It looks like discussion died out on it after it was ineffective to
> solve a different problem.
> Is there a reason why you don't want to do that approach now?
> 

Let me resend this patch with a better proposal. I think the main 2
problems here are really

   1. the svc_revisit() case, where we cancel the revisit. That case
      affects all versions of NFS, and can lead to performance issues.
   2. the NFSv2,v3,v4.0 replay cache, where dropping the replay (e.g.
      after a connection breakage) can cause a performance hit, and for
      something like TCP, which has long (usually 60 second) timeouts it
      could cause the replay to be delayed until after the reply gets
      kicked out of the cache. This is the case where NFSv4.0 can probably
      end up hanging, since the replay won't be forthcoming until a new
      connection breakage occurs.

I think (1) is best served by a patch like this one.
Perhaps (2) is better served by adopting the svc_defer() mechanism?

Hmm... Perhaps 2 patches are in order...

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx