On Fri, Aug 12, 2011 at 2:53 AM, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
> On 08/10/2011 05:03 PM, Peng Tao wrote:
>> On Thu, Aug 11, 2011 at 1:52 AM, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>>> On 08/06/2011 07:53 PM, Peng Tao wrote:
>>>> For pnfs pagelist write failure, we need to pg_recoalesce and resend
>>>> IO to mds.
>>>>
>>>
>>> I have not given this subject any thought or investigation, so I don't
>>> know what we should do, but the gut feeling is that I have seen all this
>>> code elsewhere and we could be having a bigger re-use of existing code.
>>>
>>> What if we dig into:
>>>     data->mds_ops->rpc_call_done(&data->task, data);
>>>     data->mds_ops->rpc_release(data);
>>>
>>> And do all the pages tear-down and unlocks but if there is an error
>>> not set them as clean. That is keep them dirty. Then mark the layout
>>> as error and let the normal code choose an MDS write_out. (Just a wild
>>> thought)
>> This may work only for write failures. But for read, we will have to
>> recoalesce and send to MDS. So I prefer to let read and write have
>> similar retry code path like this.
>>
>
> I disagree. Look, even now the read path is very different than the write
> path. (See your two patches: write-patch is 3 times bigger than the read-patch)
I mean their logic is the same: if pnfs_error is set, recoalesce the
pages and re-send them to the MDS :)
>
> You should see if what I say is possible for write. And then maybe some
> thing will come up also for read. They do not necessarily need to be the
> same. (I think)
I agree that it is possible for write. We can re-dirty the pages and
rely on the next flush to write them out to the MDS. Trond mentioned
this before. However, that method won't work for read failures. I don't
see how we can queue failed read pages and let someone else re-send
them later.
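A toy userspace model may make the read/write asymmetry clearer. It is only a sketch under stated assumptions: every type and function below (toy_page, toy_write_pagelist, toy_flush_to_mds, toy_read_pagelist) is hypothetical and is not the kernel's nfs_page or pNFS interface, and the data-server path is hard-wired to fail. A failed write can simply stay dirty until a later flush sends it through the MDS, while a failed read has no such later visitor, so it has to be resent right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy userspace model of the two recovery strategies in this thread.
 * All names are hypothetical; this is NOT the kernel nfs_page/pnfs API.
 * The data server (DS) path always fails, the MDS path always succeeds.
 */
enum toy_state { TOY_DIRTY, TOY_CLEAN, TOY_NOT_UPTODATE, TOY_UPTODATE };

struct toy_page {
	int index;
	enum toy_state state;
};

static bool toy_ds_io(struct toy_page *pages, size_t n)
{
	(void)pages;
	(void)n;
	return false;		/* simulate a pNFS (DS) I/O error */
}

static bool toy_mds_io(struct toy_page *pages, size_t n)
{
	(void)pages;
	(void)n;
	return true;		/* the MDS always succeeds in this model */
}

/* Write: on DS error, finish normally but do NOT mark the pages clean. */
static void toy_write_pagelist(struct toy_page *pages, size_t n)
{
	assert(pages != NULL);
	if (!toy_ds_io(pages, n))
		return;		/* pages stay TOY_DIRTY for the next flush */
	for (size_t i = 0; i < n; i++)
		pages[i].state = TOY_CLEAN;
}

/* The "next flush": picks up whatever is still dirty and uses the MDS. */
static size_t toy_flush_to_mds(struct toy_page *pages, size_t n)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].state != TOY_DIRTY)
			continue;
		if (toy_mds_io(&pages[i], 1)) {
			pages[i].state = TOY_CLEAN;
			written++;
		}
	}
	return written;
}

/*
 * Read: a page that failed to read is just "not uptodate"; nothing will
 * come back for it later, so the same page list is resent to the MDS
 * immediately (the recoalesce-and-resend step).
 */
static void toy_read_pagelist(struct toy_page *pages, size_t n)
{
	assert(pages != NULL);
	if (!toy_ds_io(pages, n) && !toy_mds_io(pages, n))
		return;		/* both failed: pages stay not uptodate */
	for (size_t i = 0; i < n; i++)
		pages[i].state = TOY_UPTODATE;
}
```

In this model a write that fails on the DS costs nothing until the next flush runs, whereas a read has to pay for the MDS retry inside its own completion path, which is why the read side still needs its own recoalesce-and-resend step.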
--
Thanks,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html