> -----Original Message-----
> From: Peng Tao [mailto:bergwolf@xxxxxxxxx]
> Sent: Tuesday, July 26, 2011 11:37 AM
> To: Myklebust, Trond
> Cc: tao.peng@xxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; bhalevy@xxxxxxxxxx
> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS
>
> Hi, Trond,
>
> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@xxxxxxx wrote:
> >> Hi, Trond,
> >>
> >> Any comments on this patch? I still get a kernel crash when a pnfs
> >> write is attempted but fails and calls pnfs_ld_write_done(). It seems
> >> the object layout uses the same code path as well. But I don't find
> >> the patch in either your tree or Benny's tree. Are there any concerns?
> >>
> >> Thanks,
> >> Tao
> >
> > The whole pnfs_ld_write_done thing is bogus and needs to be replaced
> > with something sane. It is trying to initiate a WRITE RPC call with the
> > wrong block size, and is calling the MDS rpc_call_done() and
> > rpc_release() with an uninitialised rpc task pointer.
> >
> > Ditto for pnfs_ld_read_done.
>
> Thanks for your explanation. Is there any plan on how to fix
> pnfs_ld_read/write_done? Basically, we would need an interface that
> can redirect the IO to the MDS if pnfs_error is set, or do all the
> necessary cleanup work to end the read/write if pnfs_error is 0. IMHO,
> the recoalesce logic needs to access nfs_pageio_descriptor, but we do
> not have that information in pnfs_ld_read/write_done.

As far as I can see, the right thing to do is to mark the layout as
invalid and then redirty the page. It should be easy to have fsync()
re-send the pages in this case.

These should be extremely rare events, since we expect to catch most of
the pNFS failures when we do the actual LAYOUTGET in ->pg_init().

My main worry is aio/dio, where there is no good mechanism for
retrying. I'm still working on that...

Cheers
  Trond