RE: [md PATCH 4/5] md: Fix: BIO I/O Error during reshape for external metadata

> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Wednesday, June 16, 2010 7:03 AM
> To: Kwolek, Adam
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: [md PATCH 4/5] md: Fix: BIO I/O Error during reshape for
> external metadata
> 
> On Wed, 9 Jun 2010 15:22:27 +0100
> "Kwolek, Adam" <adam.kwolek@xxxxxxxxx> wrote:
> 
> > (md: Online Capacity Expansion for IMSM)
> > When the sum of added disks and degraded disks is greater than
> > max_degraded, reshape decides that the stripe is broken, so a bio
> > I/O error is the result.
> > Added disks without data have no impact on volume degradation (they
> > contain no data so far), so we have to be sure that all disks used
> > in the reshape have the In_sync flag set.
> > We have to do this for disks without data.
> 
> Again, I'm not really following you.
> I agree that devices that are added to make up numbers for a reshape
> should be marked In_sync, but that is already happening, roughly in
> the middle of raid5_start_reshape.
> 
> Again, can you give me a specific situation where the current code
> does the wrong thing?
> 
> Thanks,
> NeilBrown
> 
> 
> > ---
> >
> >  drivers/md/raid5.c |   17 ++++++++++++++++-
> >  1 files changed, 16 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index dc25a32..cb74045 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5468,7 +5468,7 @@ static int raid5_start_reshape(mddev_t *mddev)
> >  	/* Add some new drives, as many as will fit.
> >  	 * We know there are enough to make the newly sized array work.
> >  	 */
> > -	list_for_each_entry(rdev, &mddev->disks, same_set)
> > +	list_for_each_entry(rdev, &mddev->disks, same_set) {
> >  		if (rdev->raid_disk < 0 &&
> >  		    !test_bit(Faulty, &rdev->flags)) {
> >  			if (raid5_add_disk(mddev, rdev) == 0) {
> > @@ -5488,6 +5488,21 @@ static int raid5_start_reshape(mddev_t *mddev)
> >  			} else
> >  				break;
> >  		}
> > +		/* if there is Online Capacity Expansion
> > +		 * on degraded array for external meta
> > +		 */
> > +		if (mddev->external &&
> > +		    (conf->raid_disks <= (disk_count + conf->max_degraded))) {
> > +			/* check if not spare */
> > +			if (!(rdev->raid_disk < 0 &&
> > +			      !test_bit(Faulty, &rdev->flags)))
> > +				/* make sure that all disks,
> > +				 * even added previously have
> > +				 * in sync flag set
> > +				 */
> > +				set_bit(In_sync, &rdev->flags);
> > +		}
> > +	}
> > 
> >  	/* When a reshape changes the number of devices, ->degraded
> >  	 * is measured against the large of the pre and post number of


When disks are added to md for a reshape with external metadata, their
In_sync flag is cleared (they are not in use so far).
If the number of added disks is smaller than the maximum allowed number
of degraded disks (max_degraded), there is no problem.
But if, for example, two or more disks are added to a raid5 during
reshape, the stripes look degraded and a bio error occurs.
For a taken-over raid0 (which is converted to a degraded raid5), even
adding a single disk causes the problem, because the allowed degradation
factor is already exceeded; the stripe degradation check during reshape
then generates a bio error as well.
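
To illustrate (a minimal sketch of the arithmetic only, not the actual
kernel check; the function name and parameters are made up for this
example):

/* Sketch only: why the stripe appears broken during such a reshape.
 * A raid5/raid6 stripe cannot be served once the number of
 * out-of-sync devices exceeds max_degraded (1 for raid5, 2 for raid6).
 */
static int stripe_looks_failed(int degraded_disks, int added_disks,
			       int max_degraded)
{
	/* Freshly added disks still have In_sync clear, so they are
	 * counted as degraded even though they hold no data yet.
	 */
	int failed = degraded_disks + added_disks;

	return failed > max_degraded;	/* true => bio I/O error */
}

So for raid5 (max_degraded == 1) two added disks already trip the check,
and a raid0 takeover starts with one degraded slot, so a single added
disk is enough.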

To avoid this, the In_sync flag has to be set for all disks that are
taken into the reshape/grow before the process is executed.

For native metadata, the In_sync flag is already maintained during
metadata writing/validation.

BR
Adam




