On Wed, 2020-07-08 at 11:50 -0400, schumaker.anna@xxxxxxxxx wrote: > From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > We used to do this before 3453d5708b33, but this was changed to > better > handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the > slot > re-use case when the server doesn't receive the interrupted > operation, > but if the server does receive the operation then it could still end > up > replying to the client with mis-matched operations from the reply > cache. > > We can fix this by sending a SEQUENCE to the server while recovering > from > a SEQ_MISORDERED error when we detect that we are in an interrupted > slot > situation. > > Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are > interrupted) > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > --- > fs/nfs/nfs4proc.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index e32717fd1169..5de41a5772f0 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -774,6 +774,14 @@ static void nfs4_slot_sequence_acked(struct > nfs4_slot *slot, > slot->seq_nr_last_acked = seqnr; > } > > +static void nfs4_probe_sequence(struct nfs_client *client, const > struct cred *cred, > + struct nfs4_slot *slot) > +{ > + struct rpc_task *task = _nfs41_proc_sequence(client, cred, > slot, true); > + if (!IS_ERR(task)) > + rpc_wait_for_completion_task(task); Hmm... I am a little concerned about the wait here, since we don't know what kind of thread this is. Any chance we could kick off a _nfs41_proc_sequence asynchronously, and then perhaps requeue the original task to wait for the next free slot? I suppose one issue there would be if the 'original task is an earlier call to _nfs41_proc_sequence, but perhaps that can be worked around? > +} > + > static int nfs41_sequence_process(struct rpc_task *task, > struct nfs4_sequence_res *res) > { > @@ -790,6 +798,7 @@ static int nfs41_sequence_process(struct rpc_task > *task, > goto out; > > session = slot->table->session; > + clp = session->clp; > > trace_nfs4_sequence_done(session, res); > > @@ -804,7 +813,6 @@ static int nfs41_sequence_process(struct rpc_task > *task, > nfs4_slot_sequence_acked(slot, slot->seq_nr); > /* Update the slot's sequence and clientid lease timer > */ > slot->seq_done = 1; > - clp = session->clp; > do_renew_lease(clp, res->sr_timestamp); > /* Check sequence flags */ > nfs41_handle_sequence_flag_errors(clp, res- > >sr_status_flags, > @@ -852,10 +860,15 @@ static int nfs41_sequence_process(struct > rpc_task *task, > /* > * Were one or more calls using this slot interrupted? > * If the server never received the request, then our > - * transmitted slot sequence number may be too high. > + * transmitted slot sequence number may be too high. > However, > + * if the server did receive the request then it might > + * accidentally give us a reply with a mismatched > operation. > + * We can sort this out by sending a lone sequence > operation > + * to the server on the same slot. > */ > if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) > { > slot->seq_nr--; > + nfs4_probe_sequence(clp, task->tk_msg.rpc_cred, > slot); > goto retry_nowait; > } > /* -- Trond Myklebust CTO, Hammerspace Inc 4984 El Camino Real, Suite 208 Los Altos, CA 94022 www.hammer.space