Re: question about pnfs logic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 1, 2019 at 7:10 AM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> Hi Olga,
>
> On Tue, 2019-04-30 at 17:25 -0400, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > I'm trying to figure existing code in order to fix a problem. Can you
> > please help?
> >
> > Say there was an IO to the DS and it timeouts. Pnfs code in the
> > filelayout_async_handle_error() marks the device unavailable and
> > marks
> > the layout to be returned. Timed out IO is retried against the MDS.
> > The new IO tries to do the pnfs and because the device is marked
> > unavailable (returning EINVAL) it propagates back all the way to the
> > pg_init setup and fails the IO with EINVAL. But IO shouldn't fail.
> >
> > My question is why are we returning the status of the
> > filelayout_check_deviceid(), I propose that instead we at that point
> > set lseg=NULL and then the IO would be redirected to the MDS.
> >
> > Or maybe I'm missing something else?
> >
> > My proposed fix would be:
> > diff --git a/fs/nfs/filelayout/filelayout.c
> > b/fs/nfs/filelayout/filelayout.c
> > index 61f46facb39c..b3e8ba3bd654 100644
> > --- a/fs/nfs/filelayout/filelayout.c
> > +++ b/fs/nfs/filelayout/filelayout.c
> > @@ -904,7 +904,7 @@ fl_pnfs_update_layout(struct inode *ino,
> >         status = filelayout_check_deviceid(lo, fl, gfp_flags);
> >         if (status) {
> >                 pnfs_put_lseg(lseg);
> > -               lseg = ERR_PTR(status);
> > +               lseg = NULL;
> >         }
> >  out:
> >         return lseg;
>
> I fully agree with your assessment. The objective of the code there
> should indeed be to fall back to doing I/O through the MDS rather than
> failing it altogether.
>
> Given that the 5.2 merge window is likely to open next week, and that
> we haven't seen any urgent reports of this bug before now, I suggest
> that you add the signed-off-by line to the above patch and send it to
> Anna for inclusion in that tree.
> Please also add a 'Fixes:' line so that the patch can be automatically
> backported to older kernel.

Thanks Trond for the confirmation. I'll do this as soon as I'll chase
down one more element that I can't explain in this failing scenario.

>
> Cheers
>   Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux