Re: [PATCH 1/5] pNFS: recoalesce when ld write pagelist fails

Peng Tao <bergwolf@xxxxxxxxx> · Fri, 12 Aug 2011 07:53:21 +0800



On Fri, Aug 12, 2011 at 2:53 AM, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
> On 08/10/2011 05:03 PM, Peng Tao wrote:
>> On Thu, Aug 11, 2011 at 1:52 AM, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>>> On 08/06/2011 07:53 PM, Peng Tao wrote:
>>>> For pnfs pagelist write failure, we need to pg_recoalesce and resend
>>>> IO to mds.
>>>>
>>>
>>> I have not given this subject any thought or investigation, so I don't
>>> know what we should do, but the gut feeling is that I have seen all this
>>> code elsewhere and we could be having a bigger re-use of existing code.
>>>
>>> What if we dig into:
>>>        data->mds_ops->rpc_call_done(&data->task, data);
>>>        data->mds_ops->rpc_release(data);
>>>
>>> And do all the pages tear-down and unlocks but if there is an error
>>> not set them as clean. That is keep them dirty. Then mark the layout
>>> as error and let the normal code choose an MDS write_out. (Just a wild
>>> thought)
>> This may work only for write failures. But for read, we will have to
>> recoalesce and send to MDS. So I prefer to let read and write have
>> similar retry code path like this.
>>
>
> I disagree. Look, even now the read path is very different than the write
> path. (See your two patches: write-patch is 3 times bigger than the read-patch)
I mean their logic is the same: if pnfs_error is set, recoalesce the
pages and re-send to MDS :)

>
> You should see if what I say is possible for write. And then maybe
> something will come up also for read. They do not necessarily need to be
> the same. (I think)
I agree that it is possible for write. We can re-dirty the pages and
rely on the next flush to write them out to the MDS. Trond mentioned
this before. However, that method won't work for read failures. I don't
see how we can queue failed read pages and let someone else re-send
them later.
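
To illustrate the asymmetry, here is a rough userspace sketch. It is not
the patch code: every name in it (sketch_page, sketch_handle_ld_failure,
the two flags) is a hypothetical stand-in, and it only models the
decision, assuming a failed write can be left dirty for a later flush
while a failed read has to be re-sent right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
enum io_dir { IO_READ, IO_WRITE };

struct sketch_page {
	bool dirty;         /* set: a later flush will write this page out */
	bool resent_to_mds; /* set: the I/O was re-issued through the MDS now */
};

/*
 * Model what happens to each page after the layout driver I/O fails.
 * Returns the number of pages that had to be re-sent immediately.
 */
size_t sketch_handle_ld_failure(enum io_dir dir,
				struct sketch_page *pages, size_t npages)
{
	size_t resent = 0;

	assert(pages != NULL || npages == 0);

	for (size_t i = 0; i < npages; i++) {
		if (dir == IO_WRITE) {
			/* Write: keep the page dirty, the next flush retries. */
			pages[i].dirty = true;
		} else {
			/* Read: no later flush exists to pick the page up,
			 * so it is marked as re-sent to the MDS right here. */
			pages[i].resent_to_mds = true;
			resent++;
		}
	}
	return resent;
}
```

With this model, a failed write of N pages returns 0 and leaves all N
pages dirty, while a failed read of N pages returns N. That is why I
would rather keep one retry path that covers both cases.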

-- 
Thanks,
Tao