Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Sun, 28 Jun 2020 21:16:38 +0000

On Sun, 2020-06-28 at 14:03 -0400, Olga Kornievskaia wrote:
> Trond/Anna,
> 
> Any comments on this patch?
> 
> On Tue, Jun 23, 2020 at 11:22 AM Olga Kornievskaia
> <olga.kornievskaia@xxxxxxxxx> wrote:
> > Current behaviour: every time a v3 operation is re-sent to the server
> > we update (double) the timeout. There is no distinction between whether
> > or not the previous timer had expired before the re-send happened.
> > 
> > Here's the scenario:
> > 1. Client sends a v3 operation
> > 2. Server RST-s the connection prior to the timeout (e.g., the
> >    connection is immediately reset)
> > 3. Client re-sends a v3 operation but the timeout is now 120sec.

Ah... The problem here is clearly '3.' incrementing the timeout value
before we've actually hit a minor or major timeout...

So I think we want to look carefully at xprt_adjust_timeout(). The
first rule there should be that if we're below the threshold for a
minor timeout, we just want to exit without changing anything.

The second rule is then that if we're below the threshold for a major
timeout, then we adjust the timeout value by doubling it (if
to->to_exponential) or adding the value to->to_increment (if
!to->to_exponential) and then exit.

Finally, if this is a major timeout, we reset req->rq_timeout to
to->to_initval, reset req->rq_retries, call xprt_reset_majortimeo(),
reset the RTT counters and return ETIMEDOUT.
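
Roughly, the three rules above could be sketched as the userspace
model below. This is only an illustration, not the kernel code: the
model_* structs, the minor_deadline/major_deadline fields and the
explicit 'now' argument are all hypothetical stand-ins, and the plain
integer comparisons ignore jiffies wraparound.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for illustration; not the kernel structures. */
struct model_timeout {
	unsigned long to_initval;	/* initial per-try timeout */
	unsigned long to_increment;	/* linear backoff step */
	bool to_exponential;		/* double instead of adding */
};

struct model_req {
	unsigned long rq_timeout;	/* current per-try timeout */
	unsigned int rq_retries;	/* retries since the last major timeout */
	unsigned long minor_deadline;	/* hypothetical: minor timeout threshold */
	unsigned long major_deadline;	/* hypothetical: major timeout threshold */
};

/* Returns 0, or -ETIMEDOUT once the major timeout has been reached. */
static int model_adjust_timeout(struct model_req *req,
				const struct model_timeout *to,
				unsigned long now)
{
	assert(req && to);

	/* Rule 1: below the minor threshold, change nothing. */
	if (now < req->minor_deadline)
		return 0;

	/* Rule 2: minor timeout but not yet major, so back off. */
	if (now < req->major_deadline) {
		if (to->to_exponential)
			req->rq_timeout <<= 1;
		else
			req->rq_timeout += to->to_increment;
		req->rq_retries++;	/* assumption: the retry is counted here */
		return 0;
	}

	/* Rule 3: major timeout, start over and report it. */
	req->rq_timeout = to->to_initval;
	req->rq_retries = 0;
	/*
	 * Real code would also call xprt_reset_majortimeo() and reset
	 * the RTT counters here; neither is modelled in this sketch.
	 */
	return -ETIMEDOUT;
}
```

With to_initval = 60 and to_exponential set, a call made before
minor_deadline (the early connection reset case) leaves rq_timeout at
60, and only a call at or after minor_deadline doubles it to 120.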

None of this should be specific to your connection reset case. This is
how we want timeouts to work in the generic case, so we need to fix
that.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx