Re: [RFC PATCH] raid5: Correct some failed-stripe which because the badsector.

NeilBrown <neilb@xxxxxxx> · Tue, 8 Jan 2013 09:17:41 +1100

On Wed, 26 Sep 2012 13:25:21 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote:

> On 2012-09-26 09:20 NeilBrown <neilb@xxxxxxx> Wrote:
> >On Sat, 22 Sep 2012 11:14:47 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote:
> >
> >> Because raid5 had implemented badsector feature,some failed-stripe which because the
> >> badsector can try to correct instead of only failing.
> >> 
> >> For example:
> >> 1:Create a raid5 by four disks missing/sdb/sdc/sdd.
> >> 2:If readed  data from sdb and met error,so the stripe is failed
> >> 3:If later write-stripe contains sdb which overwrite.The stripe has a
> >> chance to correct.
> >
> >Your patch looks much more complicated that should be necessary, and you
> >don't provide any explanation for how it works, so it is very hard for me to
> >review.
> >
> Before implementing badsector log, failed-stripes were becasue the faulty or removed disks.
> For those, it's unnecessary to do something for those failed-stripe.
> But after implementing badsector log, failed-stripe can by some badsector.
> One example, RAID5 has four disk: one removed and one contained badsector, so the stripe is failed;
> Another example is recovery, when recovery a stripe and fail,so all the disks set badsector.
> But for above example, if full-write on this stripe,the badsector will be correct and failed-stripe become normal.
> 
> This patch is do this to try recorrect some failed-stripe because badsectors.
> The workflow is:
> 1:in func analyse_stripe, we count the overwrite-on-failed-disks(contained badsector or faulty or removed), noworkeddisk(faulty or remove)
> 2:in func handle_stripe, it determines whether to be processed.There will be three cases:
> A: noworkdisk > max_degraded. It indicates there are many removed or faulty disks.It only failed
> B: All faild-disks are overwrited,which dosen't need read old data.So it can read other goog disks or computer parity data using RCW.
> C: Some failed-disks are overwrited.If stripe didn't set STRIPE_PREREAD_ACTIVE, stripe can delay for waitting other failed-disks write-data.
> Otherwise, it only failed the stripe.
> 3;in func handle_stripe_dirtying do two things:
> 	A: delay for failed-disks write-data
> 	B: set RCW to control this situation
> 4:After successly writed, one thing must to do.It will be rewrite and reread for check.
> But for this case, it's unnecessary to rewrite,it only reread the failed-disks(only contains badsector).

Thanks - I think I know what you are trying to do now.

Please resend the patch with *only* the changes that you actually need and
I'll try to review it.

i.e.

> >> @@ -548,9 +549,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
> >>  			else
> >>  				rw = WRITE;
> >>  		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
> >> -			rw = READ;
> >> +				rw = READ;
> >>  		else if (test_and_clear_bit(R5_WantReplace,
> >> -					    &sh->dev[i].flags)) {
> >> +				&sh->dev[i].flags)) {
> >>  			rw = WRITE;
> >>  			replace_only = 1;
> >>  		} else

This is purely cosmetic so shouldn't be in the patch.

> >> @@ -1737,6 +1738,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
> >>  		s = sh->sector + rdev->new_data_offset;
> >>  	else
> >>  		s = sh->sector + rdev->data_offset;
> >> +
> >>  	if (uptodate) {
> >>  		set_bit(R5_UPTODATE, &sh->dev[i].flags);
> >>  		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {

So is this.

> >> @@ -1852,8 +1856,8 @@ static void raid5_end_write_request(struct bio *bi, int error)
> >>  		}
> >>  	}
> >>  	pr_debug("end_write_request %llu/%d, count %d, uptodate: %d.\n",
> >> -		(unsigned long long)sh->sector, i, atomic_read(&sh->count),
> >> -		uptodate);
> >> +		(unsigned long long)sh->sector, i,
> >> +		atomic_read(&sh->count), uptodate);
> >>  	if (i == disks) {
> >>  		BUG();
> >>  		return;

So is this.
etc.

NeilBrown

Attachment:
signature.asc

Description: PGP signature