On Sun, 2020-06-28 at 14:03 -0400, Olga Kornievskaia wrote:
> Trond/Anna,
>
> Any comments on this patch?
>
> On Tue, Jun 23, 2020 at 11:22 AM Olga Kornievskaia
> <olga.kornievskaia@xxxxxxxxx> wrote:
> > Current behaviour: every time a v3 operation is re-sent to the
> > server
> > we update (double) the timeout. There is no distinction between
> > whether
> > or not the previous timer had expired before the re-sent happened.
> >
> > Here's the scenario:
> > 1. Client sends a v3 operation
> > 2. Server RST-s the connection (prior to the timeout) (eg.,
> > connection
> > is immediately reset)
> > 3. Client re-sends a v3 operation but the timeout is now 120sec.

Ah... The problem here is clearly '3.' incrementing the timeout value
before we've actually hit a minor or major timeout...

So I think we want to look carefully at xprt_adjust_timeout().

The first rule there should be that if we're below the threshold for a
minor timeout, we just want to exit without changing anything.

The second rule is then that if we're below the threshold for a major
timeout, then we adjust the timeout value by doubling it (if
to->to_exponential) or adding the value to->to_increment (if
!to->to_exponential) and then exit.

Finally, if this is a major timeout, we reset req->rq_timeout to
to->to_initval, reset req->rq_retries, call xprt_reset_majortimeo(),
reset the RTT counters and return ETIMEDOUT.

None of this should be specific to your connection reset case. This is
how we want timeouts to work in the generic case, so we need to fix
that.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
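
For reference, the three rules described above could be sketched in C
roughly as follows. This is only an illustration, not the kernel's
xprt_adjust_timeout() and not the patch under discussion. The struct
types, the minor_deadline/major_deadline/major_span fields, the explicit
"now" argument and the sketch_* function names are all assumptions made
up for the example; only rq_timeout, rq_retries, to_initval,
to_increment and to_exponential are names taken from the message. The
RTT counter reset is left out, and plain integer comparison stands in
for wraparound-safe jiffies comparison.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the transport timeout policy. */
struct sketch_timeout {
	unsigned long to_initval;    /* initial per-transmission timeout */
	unsigned long to_increment;  /* step used for linear backoff */
	bool to_exponential;         /* true: double; false: add to_increment */
	unsigned long major_span;    /* assumption: length of a major timeout window */
};

/* Hypothetical, simplified stand-in for a request. */
struct sketch_req {
	unsigned long rq_timeout;     /* current timeout value */
	unsigned int rq_retries;      /* retransmissions so far */
	unsigned long minor_deadline; /* assumption: when the minor timeout expires */
	unsigned long major_deadline; /* assumption: when the major timeout expires */
};

/* Stand-in for xprt_reset_majortimeo(): start a new major timeout window. */
static void sketch_reset_majortimeo(struct sketch_req *req,
				    const struct sketch_timeout *to,
				    unsigned long now)
{
	req->major_deadline = now + to->major_span;
}

static int sketch_adjust_timeout(struct sketch_req *req,
				 const struct sketch_timeout *to,
				 unsigned long now)
{
	/* Rule 1: minor timeout not reached yet, so change nothing. */
	if (now < req->minor_deadline)
		return 0;

	/* Rule 2: minor timeout hit, major not yet: back off and exit. */
	if (now < req->major_deadline) {
		if (to->to_exponential)
			req->rq_timeout <<= 1;
		else
			req->rq_timeout += to->to_increment;
		return 0;
	}

	/* Rule 3: major timeout: reset the request state and report it. */
	req->rq_timeout = to->to_initval;
	req->rq_retries = 0;
	sketch_reset_majortimeo(req, to, now);
	/* Resetting the RTT counters would go here; omitted in this sketch. */
	return -ETIMEDOUT;
}
```

With this shape, a re-send triggered by a connection reset before the
minor deadline falls under rule 1 and leaves rq_timeout untouched, which
is the behaviour asked for in step 3 of the scenario. The sketch does
not cap the backed-off value or advance minor_deadline; a real
implementation would need to decide both.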