2012/4/17 NeilBrown <neilb@xxxxxxx>: > On Tue, 17 Apr 2012 13:09:38 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote: > >> >> REQ_SYNC is ignored in current raid5 code. Block layer does use it to do >> policy, >> for example ioscheduler. This patch adds it. >> >> Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx> >> --- >> drivers/md/raid5.c | 12 ++++++++++-- >> drivers/md/raid5.h | 1 + >> 2 files changed, 11 insertions(+), 2 deletions(-) >> >> Index: linux/drivers/md/raid5.c >> =================================================================== >> --- linux.orig/drivers/md/raid5.c 2012-04-17 11:59:24.556008869 +0800 >> +++ linux/drivers/md/raid5.c 2012-04-17 12:06:06.306007114 +0800 >> @@ -518,6 +518,8 @@ static void ops_run_io(struct stripe_hea >> replace_only = 1; >> } else >> continue; >> + if (test_and_clear_bit(R5_SyncIO, &sh->dev[i].flags)) >> + rw |= REQ_SYNC; >> >> bi = &sh->dev[i].req; >> rbi = &sh->dev[i].rreq; /* For writing to replacement */ >> @@ -1114,6 +1116,8 @@ ops_run_biodrain(struct stripe_head *sh, >> dev->sector + STRIPE_SECTORS) { >> if (wbi->bi_rw & REQ_FUA) >> set_bit(R5_WantFUA, &dev->flags); >> + if (wbi->bi_rw & REQ_SYNC) >> + set_bit(R5_SyncIO, &dev->flags); >> tx = async_copy_data(1, wbi, dev->page, >> dev->sector, tx); >> wbi = r5_next_bio(wbi, dev->sector); >> @@ -1131,13 +1135,15 @@ static void ops_complete_reconstruct(voi >> int pd_idx = sh->pd_idx; >> int qd_idx = sh->qd_idx; >> int i; >> - bool fua = false; >> + bool fua = false, sync = false; >> >> pr_debug("%s: stripe %llu\n", __func__, >> (unsigned long long)sh->sector); >> >> - for (i = disks; i--; ) >> + for (i = disks; i--; ) { >> fua |= test_bit(R5_WantFUA, &sh->dev[i].flags); >> + sync |= test_bit(R5_SyncIO, &sh->dev[i].flags); >> + } >> >> for (i = disks; i--; ) { >> struct r5dev *dev = &sh->dev[i]; >> @@ -1146,6 +1152,8 @@ static void ops_complete_reconstruct(voi >> set_bit(R5_UPTODATE, &dev->flags); >> if (fua) >> set_bit(R5_WantFUA, &dev->flags); >> + if (sync) >> + set_bit(R5_SyncIO, &dev->flags); >> } >> } >> >> Index: linux/drivers/md/raid5.h >> =================================================================== >> --- linux.orig/drivers/md/raid5.h 2012-04-17 11:59:26.866008859 +0800 >> +++ linux/drivers/md/raid5.h 2012-04-17 12:00:29.196008586 +0800 >> @@ -285,6 +285,7 @@ enum r5dev_flags { >> */ >> R5_Wantdrain, /* dev->towrite needs to be drained */ >> R5_WantFUA, /* Write should be FUA */ >> + R5_SyncIO, /* The IO is sync */ >> R5_WriteError, /* got a write error - need to record it */ >> R5_MadeGood, /* A bad block has been fixed by writing to it */ >> R5_ReadRepl, /* Will/did read from replacement rather than orig */ > > Thanks. > There was some interesting white-space damage in this patch, but I sorted it > out and applied it. Thanks and sorry, will double check my email client. > However I'm not convinced that it is correct. It is probably better than > nothing so I'll allow it for now, but you might want to fix it up to make it > better. > > The point of REQ_SYNC is to remove any delays in handling the write, and get > it to disk as soon as possible. > However with RAID5, the write isn't complete until all the blocks in the > stripe have been written, so just setting REQ_SYNC on the write for one of > the blocks may not achieve the goal. > You at least need to use REQ_SYNC on the parity block, and you probably want > it on all the blocks in the stripe. > > So rather than setting R5_SyncIO on the dev, you should be setting in on the > whole stripe_head. > > Similarly WantFUA should affect all device writes in the strip as they all > need to hit the media for any of the data to be considered 'safe'. > > If you would like to fix both of those issues at once, that would be great :-) In current code, all written disks and parity disks will have FUA/SYNC set, please see ops_complete_reconstruct. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html