On Fri, Jun 5, 2020 at 8:06 AM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 6/4/2020 5:21 PM, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > There is a problem with interrupted slots (yet again).
> >
> > We send an operation to the server and it gets interrupted by a signal.
> >
> > We used to send a sole SEQUENCE to remove the problem of having real
> > operation get an out of the cache reply and failing. Now we are not
> > doing it again (since 3453d5708 "NFSv4.1: Avoid false retries when RPC
> > calls are interrupted"). So the problem is:
> >
> > We bump the sequence on the next use of the slot, and get SEQ_MISORDERED.
>
> Misordered? It sounds like the client isn't managing the sequence
> number, or perhaps the server never saw the original request, and
> is being overly strict.

Well, both the client and the server are acting appropriately. I'm not
arguing against bumping the sequence. The client sent, say, REMOVE with
slot=1 seq=5, which got interrupted, so the client doesn't know what
state the slot was left in. It then sends the next operation, say READ,
with slot=1 seq=6. The server acts appropriately too: since its version
of the slot has seq=4, this request with seq=6 gets SEQ_MISORDERED.

> > We decrement the number back to the interrupted operation. This gets
> > us a reply out of the cache. We again fail with REMOTE EIO error.
>
> Ew. The client *decrements* the sequence?

Yes, because the client then decides that the server never received the
seq=5 operation, so it re-sends with seq=5. But in reality the seq=5
operation also reached the server, so the server has two requests,
REMOVE and READ, both with seq=5 for slot=1. This leads to the READ
failing with some error.

We used to send a sole SEQUENCE when we had an interrupted slot, to sync
up the seq numbers. But commit 3453d5708 changed that and I would like
to understand why, as I think we need to go back to sending a sole
SEQUENCE.

> Tom.
>
> > Going back to the commit's message. I don't see the logic that the
> > server can't tell if this is a new call or the old one. We used to
> > send a lone SEQUENCE as a way to protect reuse of slot by a normal
> > operation. An interrupted slot couldn't have been another SEQUENCE. So
> > I don't see how the server can't tell a difference between SEQUENCE
> > and any other operations.
> >
> >
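
To make the sequence of events described above concrete, here is a
minimal sketch of the per-slot check as I understand it. It is only an
illustration, not the Linux client or any server's code: the names
Slot, server_handle and interrupted_slot_race are hypothetical, and the
arrival order (READ seq=6 reaching the server before the interrupted
REMOVE seq=5) is an assumption chosen to match the example. A real
server may also compare the retry against the cached request and return
a false-retry error instead of the cached reply.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    """Hypothetical server-side view of one session slot."""
    seq: int                          # last sequence id executed on this slot
    cached_op: Optional[str] = None   # operation whose reply sits in the reply cache


def server_handle(slot: Slot, seq: int, op: str) -> Tuple[str, Optional[str]]:
    """Return (status, operation whose reply is sent back)."""
    if seq == slot.seq + 1:
        # Next expected sequence id: execute and cache the reply.
        slot.seq = seq
        slot.cached_op = op
        return ("OK", op)
    if seq == slot.seq:
        # Same sequence id: treated as a retry, answered from the cache
        # without looking at which operation the client actually sent.
        return ("REPLAY", slot.cached_op)
    return ("SEQ_MISORDERED", None)


def interrupted_slot_race() -> List[Tuple[str, Optional[str]]]:
    slot = Slot(seq=4)
    events = []
    # REMOVE seq=5 was sent and interrupted; assume it is still in flight.
    # The client bumps the sequence and READ seq=6 arrives first.
    events.append(server_handle(slot, 6, "READ"))
    # The delayed REMOVE seq=5 now reaches the server and is executed.
    events.append(server_handle(slot, 5, "REMOVE"))
    # The client decrements and re-sends READ with seq=5.
    events.append(server_handle(slot, 5, "READ"))
    return events


if __name__ == "__main__":
    for status, replied_op in interrupted_slot_race():
        print(status, replied_op)
```

In this sketch the last READ is answered with the cached REMOVE reply,
which is the mismatch that ends up as an error on the client side.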