Re: [PATCH v1 1/1] NFS: Fix interrupted slots by sending a solo SEQUENCE operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 8, 2020 at 12:00 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2020-07-08 at 11:50 -0400, schumaker.anna@xxxxxxxxx wrote:
> > From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> >
> > We used to do this before 3453d5708b33, but this was changed to
> > better
> > handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the
> > slot
> > re-use case when the server doesn't receive the interrupted
> > operation,
> > but if the server does receive the operation then it could still end
> > up
> > replying to the client with mis-matched operations from the reply
> > cache.
> >
> > We can fix this by sending a SEQUENCE to the server while recovering
> > from
> > a SEQ_MISORDERED error when we detect that we are in an interrupted
> > slot
> > situation.
> >
> > Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are
> > interrupted)
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> > ---
> >  fs/nfs/nfs4proc.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index e32717fd1169..5de41a5772f0 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -774,6 +774,14 @@ static void nfs4_slot_sequence_acked(struct
> > nfs4_slot *slot,
> >       slot->seq_nr_last_acked = seqnr;
> >  }
> >
> > +static void nfs4_probe_sequence(struct nfs_client *client, const
> > struct cred *cred,
> > +                             struct nfs4_slot *slot)
> > +{
> > +     struct rpc_task *task = _nfs41_proc_sequence(client, cred,
> > slot, true);
> > +     if (!IS_ERR(task))
> > +             rpc_wait_for_completion_task(task);
>
> Hmm... I am a little concerned about the wait here, since we don't know
> what kind of thread this is.
>
> Any chance we could kick off a _nfs41_proc_sequence asynchronously, and
> then perhaps requeue the original task to wait for the next free slot?
> I suppose one issue there would be if the 'original task is an earlier
> call to _nfs41_proc_sequence, but perhaps that can be worked around?

I'll try it and see what happens. Thanks for the feedback!
Anna

>
> > +}
> > +
> >  static int nfs41_sequence_process(struct rpc_task *task,
> >               struct nfs4_sequence_res *res)
> >  {
> > @@ -790,6 +798,7 @@ static int nfs41_sequence_process(struct rpc_task
> > *task,
> >               goto out;
> >
> >       session = slot->table->session;
> > +     clp = session->clp;
> >
> >       trace_nfs4_sequence_done(session, res);
> >
> > @@ -804,7 +813,6 @@ static int nfs41_sequence_process(struct rpc_task
> > *task,
> >               nfs4_slot_sequence_acked(slot, slot->seq_nr);
> >               /* Update the slot's sequence and clientid lease timer
> > */
> >               slot->seq_done = 1;
> > -             clp = session->clp;
> >               do_renew_lease(clp, res->sr_timestamp);
> >               /* Check sequence flags */
> >               nfs41_handle_sequence_flag_errors(clp, res-
> > >sr_status_flags,
> > @@ -852,10 +860,15 @@ static int nfs41_sequence_process(struct
> > rpc_task *task,
> >               /*
> >                * Were one or more calls using this slot interrupted?
> >                * If the server never received the request, then our
> > -              * transmitted slot sequence number may be too high.
> > +              * transmitted slot sequence number may be too high.
> > However,
> > +              * if the server did receive the request then it might
> > +              * accidentally give us a reply with a mismatched
> > operation.
> > +              * We can sort this out by sending a lone sequence
> > operation
> > +              * to the server on the same slot.
> >                */
> >               if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1)
> > {
> >                       slot->seq_nr--;
> > +                     nfs4_probe_sequence(clp, task->tk_msg.rpc_cred,
> > slot);
> >                       goto retry_nowait;
> >               }
> >               /*
> --
> Trond Myklebust
> CTO, Hammerspace Inc
> 4984 El Camino Real, Suite 208
> Los Altos, CA 94022
> www.hammer.space
>
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux