Re: interrupted rpcs problem

Olga Kornievskaia <aglo@xxxxxxxxx> · Mon, 13 Jan 2020 16:05:07 -0500

On Mon, Jan 13, 2020 at 1:24 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 2020-01-13 at 13:09 -0500, Olga Kornievskaia wrote:
> > On Mon, Jan 13, 2020 at 11:49 AM Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > On Mon, 2020-01-13 at 11:08 -0500, Olga Kornievskaia wrote:
> > > > > On Fri, Jan 10, 2020 at 4:03 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > On Fri, 2020-01-10 at 14:29 -0500, Olga Kornievskaia wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > > We are having an issue with interrupted RPCs again. Here's
> > > > > > > what I see when xfstests were ctrl-c-ed.
> > > > > > >
> > > > > > > frame 332 SETATTR call slot=0 seqid=0x000013ca
> > > > > > >   (I'm assuming this is interrupted and released)
> > > > > > > frame 333 CLOSE call slot=0 seqid=0x000013cb
> > > > > > >   (only way the slot could be free before the reply is if it
> > > > > > >   was interrupted, right? Otherwise we should never have the
> > > > > > >   slot used by more than one outstanding RPC)
> > > > > > > frame 334 reply to 333 with SEQ_MIS_ORDERED
> > > > > > >   (I'm assuming server received frame 333 before 332)
> > > > > > > frame 336 CLOSE call slot=0 seqid=0x000013ca
> > > > > > >   (??? why did we decrement it? I mean I know why it's in the
> > > > > > >   current code :-/ )
> > > > > > > frame 337 reply to 336 SEQUENCE with ERR_DELAY
> > > > > > > frame 339 reply to 332 SETATTR which nobody is waiting for
> > > > > > > frame 543 CLOSE call slot=0 seqid=0x000013ca
> > > > > > >   (retry after waiting for err_delay)
> > > > > > > frame 544 reply to 543 with SETATTR (out of the cache).
> > > > > > >
> > > > > > > What this leads to is: file is never closed on the server.
> > > > > > > Can't remove it. Unmount fails with CLID_BUSY.
> > > > > >
> > > > > > > I believe that's the result of commit
> > > > > > > 3453d5708b33efe76f40eca1c0ed60923094b971.
> > > > > > > We used to have code that bumped the sequence up when the
> > > > > > > slot was interrupted, but that changed with the commit
> > > > > > > "NFSv4.1: Avoid false retries when RPC calls are interrupted".
> > > > > >
> > > > > > > The commit message has this:
> > > > > > >
> > > > > > >     "The obvious fix is to bump the sequence number
> > > > > > >     pre-emptively if an RPC call is interrupted, but in order
> > > > > > >     to deal with the corner cases where the interrupted call
> > > > > > >     is not actually received and processed by the server, we
> > > > > > >     need to interpret the error NFS4ERR_SEQ_MISORDERED as a
> > > > > > >     sign that we need to either wait or locate a correct
> > > > > > >     sequence number that lies between the value we sent, and
> > > > > > >     the last value that was acked by a SEQUENCE call on that
> > > > > > >     slot."
> > > > > >
> > > > > > > If we can no longer just bump the sequence up, I don't
> > > > > > > think the correct action is to automatically bump it down
> > > > > > > (as per the example here). The commit doesn't describe the
> > > > > > > corner case where it was necessary to bump the sequence up.
> > > > > > > I wonder if we can return the knowledge of the interrupted
> > > > > > > slot and make a decision based on that as well as whatever
> > > > > > > the other corner case is.
> > > > > > >
> > > > > > > I guess what I'm getting at is, can somebody (Trond) provide
> > > > > > > the info for the corner case that patch was created for? I
> > > > > > > can see if I can fix the "common" case which is now broken
> > > > > > > and not break the corner case....
> > > > > >
> > > > >
> > > > > There is no pure client side solution for this problem.
> > > > >
> > > > > > The change was made because if you have multiple interruptions
> > > > > > of the RPC call, then the client has to somehow figure out what
> > > > > > the correct slot number is. If it starts low, and then goes
> > > > > > high, and the server is not caching the arguments for the RPC
> > > > > > call that is in the session cache, then we will _always_ hit
> > > > > > this bug because we will always hit the replay of the last
> > > > > > entry.
> > > > > >
> > > > > > At least if we start high, and iterate by low, then we reduce
> > > > > > the problem to being a race with the processing of the
> > > > > > interrupted request as it is in this case.
> > > > > >
> > > > > > However, as I said, the real solution here has to involve the
> > > > > > server.
> > > >
> > > > > Ok I see your point that if the server cached the arguments,
> > > > > then the server would tell that the 2nd rpc using the same
> > > > > slot+seqid has different args and would not use the replay cache.
> > > > >
> > > > > However, I wonder if the client can do better. Can't we be more
> > > > > aware of when we are interrupting the rpc? For instance, if we
> > > > > are interrupted after we started to wait on the RPC, doesn't it
> > > > > mean the rpc is sent on the network and, since the network is
> > > > > reliable, the server must have consumed the seqid for that slot
> > > > > (in this case increment seqid)? That's the case that's failing
> > > > > now.
> > > >
> > >
> > > "Reliable transport" does not mean that a client knows what got
> > > received and processed by the server and what didn't. All the
> > > client
> > > knows is that if the connection is still up, then the TCP layer
> > > will
> > > keep retrying transmission of the request. There are plenty of
> > > error
> > > scenarios where the client gets no information back as to whether
> > > or
> > > not the data was received by the server (e.g. due to lost ACKs).
> > >
> > > Furthermore, if a RPC call is interrupted on the client, either due
> > > to
> > > a timeout or a signal,
> >
> > > What timeout are you referring to here, since a 4.1 rpc can't time
> > > out? I think that only leaves a signal.
>
> If you use 'soft' or 'softerr' mount options, then NFSv4.1 will time
> out when the server is being unresponsive. That behaviour is different
> to the behaviour under a signal, but has the same effect of
> interrupting the RPC call without us being able to know if the server
> received the data.
>
> > > > then it almost always ends up breaking the connection in order to
> > > > avoid corruption of the data stream (by interrupting the
> > > > transmission before the entire RPC call has been sent). You
> > > > generally have to be lucky to see the timeout/signal occur only
> > > > when all the RPC calls being cancelled have exactly fit into the
> > > > socket buffer.
> >
> > > Wouldn't a retransmission (due to a connection reset for whatever
> > > reason) be different and not involve reprocessing of the slot?
>
> I'm not talking about retransmissions here. I'm talking only about
> NFSv4.x RPC calls that suffer a fatal interruption (i.e. no
> retransmission).
>
> > > > Finally, just because the server's TCP layer ACKed receipt of the
> > > > RPC call data, that does not mean that it will process that call.
> > > > The connection could break before the call is read out of the
> > > > receiving socket, or the server may later decide to drop it on the
> > > > floor and break the connection.
> > > >
> > > > IOW: the RPC protocol here is not that "reliable transport implies
> > > > processing is guaranteed". It is rather that "connection is still
> > > > up implies processing may eventually occur".
> >
> > "eventually occur" means that its process of the rpc is guaranteed
> > "in
> > time". Again unless the client is broken, we can't have more than an
> > interrupted rpc (that has nothing waiting) and the next rpc (both of
> > which will be re-transmitted if connection is dropped) going to the
> > server.
> >
> > Can we distinguish between interrupted due to re-transmission and
> > interrupted due to ctrl-c of the thread? If we can't, then I'll stop
> > arguing that client can do better.
>
> There is no "interrupted due to re-transmission" case. We only
> retransmit NFSv4 requests if the TCP connection breaks.
>
> As far as I'm concerned, this discussion is only about interruptions
> that cause the RPC call to be abandoned (i.e. fatal timeouts and
> signals).
>
> > > But right now we are left in a bad state. The client leaves open
> > > state on the server, which will not allow files to be deleted. I
> > > think that if the "next rpc" is a write that will never be
> > > completed, it would leave the machine in a hung state. I just don't
> > > see how you can justify that having the current code is any better
> > > than having the solution that was there before.
>
> That's a general problem with allowing interruptions that is largely
> orthogonal to the question of which strategy we choose when
> resynchronising the slot numbers after an interruption has occurred.
>

I'm re-reading the spec and in section 2.10.6.2 we have "A requester
MUST wait for a reply to a request before using the slot for another
request". Are we even allowed to reuse the slot after an RPC on it has
been interrupted?
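
To make the trace quoted above concrete, here is a toy model of how a
server might handle slot 0 when it does not compare request arguments on
a replay. This is only a sketch: the Slot class, its methods, and the
starting seqid of 0x13c9 are assumptions made up for illustration, not
code from any real server or from the Linux client.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Toy server-side slot (hypothetical; not taken from any real server)."""
    last_seqid: int                     # seqid of the last request accepted
    in_progress: bool = False           # accepted but not yet answered
    cached_reply: Optional[str] = None  # reply kept for replay detection

    def receive(self, seqid: int) -> Optional[str]:
        """Return an immediate reply, or None if the request starts executing."""
        if seqid == self.last_seqid + 1:
            # Next seqid in order: accept it as a new request.
            self.last_seqid = seqid
            self.in_progress = True
            self.cached_reply = None
            return None
        if seqid == self.last_seqid:
            # Same seqid again: treated as a retry. The arguments are NOT
            # compared, which is the assumption that makes the bug visible.
            if self.in_progress:
                return "NFS4ERR_DELAY"
            return self.cached_reply
        return "NFS4ERR_SEQ_MISORDERED"

    def complete(self, op: str) -> str:
        """Finish the in-progress request and cache its reply."""
        self.in_progress = False
        self.cached_reply = f"{op} reply"
        return self.cached_reply


def replay_trace() -> List[Optional[str]]:
    """Feed the frames from the trace to the toy slot, in arrival order."""
    slot = Slot(last_seqid=0x13c9)        # assumed state before frame 332
    replies = []
    replies.append(slot.receive(0x13cb))  # frame 333: CLOSE arrives first
    slot.receive(0x13ca)                  # frame 332: SETATTR starts executing
    replies.append(slot.receive(0x13ca))  # frame 336: CLOSE, decremented seqid
    slot.complete("SETATTR")              # frame 339: SETATTR reply, unclaimed
    replies.append(slot.receive(0x13ca))  # frame 543: CLOSE retried
    return replies


if __name__ == "__main__":
    for reply in replay_trace():
        print(reply)
```

In this model the retried CLOSE is answered with the cached SETATTR
reply, so the CLOSE itself is never executed, which matches frame 544.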

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>