On Wed, 26 Sep 2012 13:25:21 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote: > On 2012-09-26 09:20 NeilBrown <neilb@xxxxxxx> Wrote: > >On Sat, 22 Sep 2012 11:14:47 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote: > > > >> Because raid5 had implemented badsector feature,some failed-stripe which because the > >> badsector can try to correct instead of only failing. > >> > >> For example: > >> 1:Create a raid5 by four disks missing/sdb/sdc/sdd. > >> 2:If readed data from sdb and met error,so the stripe is failed > >> 3:If later write-stripe contains sdb which overwrite.The stripe has a > >> chance to correct. > > > >Your patch looks much more complicated that should be necessary, and you > >don't provide any explanation for how it works, so it is very hard for me to > >review. > > > Before implementing badsector log, failed-stripes were becasue the faulty or removed disks. > For those, it's unnecessary to do something for those failed-stripe. > But after implementing badsector log, failed-stripe can by some badsector. > One example, RAID5 has four disk: one removed and one contained badsector, so the stripe is failed; > Another example is recovery, when recovery a stripe and fail,so all the disks set badsector. > But for above example, if full-write on this stripe,the badsector will be correct and failed-stripe become normal. > > This patch is do this to try recorrect some failed-stripe because badsectors. > The workflow is: > 1:in func analyse_stripe, we count the overwrite-on-failed-disks(contained badsector or faulty or removed), noworkeddisk(faulty or remove) > 2:in func handle_stripe, it determines whether to be processed.There will be three cases: > A: noworkdisk > max_degraded. It indicates there are many removed or faulty disks.It only failed > B: All faild-disks are overwrited,which dosen't need read old data.So it can read other goog disks or computer parity data using RCW. > C: Some failed-disks are overwrited.If stripe didn't set STRIPE_PREREAD_ACTIVE, stripe can delay for waitting other failed-disks write-data. > Otherwise, it only failed the stripe. > 3;in func handle_stripe_dirtying do two things: > A: delay for failed-disks write-data > B: set RCW to control this situation > 4:After successly writed, one thing must to do.It will be rewrite and reread for check. > But for this case, it's unnecessary to rewrite,it only reread the failed-disks(only contains badsector). Thanks - I think I know what you are trying to do now. Please resend the patch with *only* the changes that you actually need and I'll try to review it. i.e. > >> @@ -548,9 +549,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) > >> else > >> rw = WRITE; > >> } else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) > >> - rw = READ; > >> + rw = READ; > >> else if (test_and_clear_bit(R5_WantReplace, > >> - &sh->dev[i].flags)) { > >> + &sh->dev[i].flags)) { > >> rw = WRITE; > >> replace_only = 1; > >> } else This is purely cosmetic so shouldn't be in the patch. > >> @@ -1737,6 +1738,7 @@ static void raid5_end_read_request(struct bio * bi, int error) > >> s = sh->sector + rdev->new_data_offset; > >> else > >> s = sh->sector + rdev->data_offset; > >> + > >> if (uptodate) { > >> set_bit(R5_UPTODATE, &sh->dev[i].flags); > >> if (test_bit(R5_ReadError, &sh->dev[i].flags)) { So is this. > >> @@ -1852,8 +1856,8 @@ static void raid5_end_write_request(struct bio *bi, int error) > >> } > >> } > >> pr_debug("end_write_request %llu/%d, count %d, uptodate: %d.\n", > >> - (unsigned long long)sh->sector, i, atomic_read(&sh->count), > >> - uptodate); > >> + (unsigned long long)sh->sector, i, > >> + atomic_read(&sh->count), uptodate); > >> if (i == disks) { > >> BUG(); > >> return; So is this. etc. NeilBrown
Attachment:
signature.asc
Description: PGP signature