RE: [PATCH] NFS41: Drop lseg ref before fallthru to MDS

> -----Original Message-----
> From: Peng Tao [mailto:bergwolf@xxxxxxxxx]
> Sent: Tuesday, July 26, 2011 1:33 PM
> To: Myklebust, Trond
> Cc: tao.peng@xxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; bhalevy@xxxxxxxxxx
> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
> 
> On Tue, Jul 26, 2011 at 11:50 PM, Myklebust, Trond
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> -----Original Message-----
> >> From: Peng Tao [mailto:bergwolf@xxxxxxxxx]
> >> Sent: Tuesday, July 26, 2011 11:37 AM
> >> To: Myklebust, Trond
> >> Cc: tao.peng@xxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; bhalevy@xxxxxxxxxx
> >> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
> >>
> >> Hi, Trond,
> >>
> >> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust
> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@xxxxxxx wrote:
> >> >> Hi, Trond,
> >> >>
> >> >> Any comments on this patch? I still get a kernel crash when a pnfs
> >> >> write is attempted but fails and calls pnfs_ld_write_done(). It seems
> >> >> the object layout uses the same code path as well. But I don't find
> >> >> the patch in either your tree or Benny's tree. Are there any concerns?
> >> >>
> >> >> Thanks,
> >> >> Tao
> >> >
> >> > The whole pnfs_ld_write_done thing is bogus and needs to be replaced
> >> > with something sane. It is trying to initiate a WRITE RPC call with the
> >> > wrong block size, and is calling the MDS rpc_call_done() and
> >> > rpc_release() with an uninitialised rpc task pointer.
> >> >
> >> > Ditto for pnfs_ld_read_done.
> >> Thanks for your explanation. Is there any plan on how to fix
> >> pnfs_ld_read/write_done? Basically, we would need an interface that
> >> can redirect the IO to MDS if pnfs_error is set or do all necessary
> >> cleanup work to end read/write if pnfs_error is 0. IMHO, the
> >> recoalesce logic needs to access nfs_pageio_descriptor but we do not
> >> have that information at pnfs_ld_read/write_done.
> >
> > As far as I can see, the right thing to do is to mark the layout as
> > invalid and then redirty the page. It should be easy to have fsync()
> > re-send the pages in this case. These should be extremely rare events,
> > since we expect to catch most of the pNFS failures when we do the
> > actual LAYOUTGET in the ->pg_init().
> Agreed. This should be easier than re-coalescing and sending to MDS at
> read/write_done.
> 
> >
> > My main worry is for aio/dio where there is no good mechanism for
> > retrying. I'm still working on that...
> For dio, we may have to send the failed pages to MDS instead of
> relying on the next fsync() to retry.

The problem isn't what to do, it is more one of _who_ does it. The rpciod/nfsiod queues aren't the ideal place to set up a resend since it involves allocating memory.

Cheers
  Trond
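
For readers following the thread, here is a minimal userspace sketch of the
idea discussed above: on a pNFS write error, mark the layout invalid and
redirty the page, then let a later fsync-like pass re-send it through the
MDS. It is a toy model only, not kernel code and not the patch under
discussion. Every name in it (sketch_layout, sketch_page, sketch_write_done,
sketch_fsync, the io_path enum) is hypothetical, and it assumes MDS writes
always succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
enum io_path { PATH_NONE, PATH_DS, PATH_MDS };

struct sketch_layout {
    bool valid;              /* cleared after a data-server I/O error */
};

struct sketch_page {
    bool dirty;              /* page still needs to be written */
    enum io_path last_path;  /* path used by the most recent attempt */
};

/* Use the data server while the layout is valid, otherwise the MDS. */
static enum io_path sketch_choose_path(const struct sketch_layout *lo)
{
    return lo->valid ? PATH_DS : PATH_MDS;
}

/*
 * Completion handler, imagined as running in an I/O completion context.
 * It only flips flags: it allocates nothing and sets up no new request.
 */
static void sketch_write_done(struct sketch_layout *lo,
                              struct sketch_page *pg, int pnfs_error)
{
    if (pnfs_error) {
        lo->valid = false;   /* mark the layout as invalid */
        pg->dirty = true;    /* redirty the page for a later pass */
        return;
    }
    pg->dirty = false;       /* write reached stable storage */
}

/*
 * fsync-like pass, imagined as running in process context.
 * ds_error simulates the result of every data-server write; MDS writes
 * are assumed to succeed. Returns the number of writes issued.
 */
static size_t sketch_fsync(struct sketch_layout *lo,
                           struct sketch_page *pages, size_t n,
                           int ds_error)
{
    size_t sent = 0;

    assert(lo != NULL);
    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty)
            continue;
        enum io_path path = sketch_choose_path(lo);
        pages[i].last_path = path;
        sent++;
        sketch_write_done(lo, &pages[i], path == PATH_DS ? ds_error : 0);
    }
    return sent;
}
```

In this model, a first sketch_fsync() pass that hits a data-server error
leaves the failed page dirty and the layout invalid, and a second pass sends
that page over the MDS path. The completion handler deliberately does no
setup work, which is one way to picture the "who does the resend" concern:
the retry is left to a caller that is allowed to allocate memory.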