Re: [RFC PATCH v1 0/7] Block/XFS: Support alternative mirror device retry

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Sun, 9 Dec 2018 20:30:15 -0800

On Sat, Dec 08, 2018 at 10:49:44PM +0800, Bob Liu wrote:
> On 11/28/18 3:45 PM, Christoph Hellwig wrote:
> > On Wed, Nov 28, 2018 at 04:33:03PM +1100, Dave Chinner wrote:
> >> 	- how does propagation through stacked layers work?
> > 
> > The only way it works is by each layer driving it.  Thus my
> > recommendation above, building on your earlier one, to use an index
> > that is filled in by the driver at I/O completion time.
> > 
> > E.g.
> > 
> > 	bio_init:		bi_leg = -1
> > 
> > 	raid1:			submit bio to lower driver
> > 	raid1 completion:	set bi_leg to 0 or 1
> > 
> > Now if we want to allow stacking we need to save/restore bi_leg
> > before submitting to the underlying device.  Which is possible,
> > but quite a bit of work in the drivers.
> > 
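A minimal userspace sketch of the bi_leg scheme and the save/restore step for stacking. struct sketch_bio, struct sketch_ctx and all function names are hypothetical stand-ins, not kernel APIs. It also assumes, which the thread does not spell out, that a caller may preset bi_leg to request a specific leg on retry.

```c
#include <assert.h>

/* Hypothetical reduction of struct bio to the proposed bi_leg field:
 * -1 means "no leg", >= 0 is a mirror index. */
struct sketch_bio {
	int bi_leg;
};

/* Hypothetical per-I/O state kept by a stacking raid1 driver. */
struct sketch_ctx {
	int my_leg;	/* our own leg, saved across the lower submission */
};

/* bio_init: bi_leg = -1 */
static void sketch_bio_init(struct sketch_bio *bio)
{
	bio->bi_leg = -1;
}

/* Stacked raid1 submit.  Assumption: a caller may preset bi_leg to
 * request a specific leg (e.g. on retry); otherwise use leg 0.  Save
 * our choice and hand the lower device a clean bi_leg. */
static void stacked_submit(struct sketch_bio *bio, struct sketch_ctx *ctx)
{
	ctx->my_leg = bio->bi_leg >= 0 ? bio->bi_leg : 0;
	bio->bi_leg = -1;
}

/* Bottom raid1 completion: record which leg served the I/O. */
static void leaf_complete(struct sketch_bio *bio, int leaf_leg)
{
	bio->bi_leg = leaf_leg;
}

/* Stacked raid1 completion: the lower driver's leg is visible only
 * here; restoring bi_leg to our own leg hides it from the caller.
 * Returns the lower leg so the driver could use it internally. */
static int stacked_complete(struct sketch_bio *bio, struct sketch_ctx *ctx)
{
	int lower_leg = bio->bi_leg;

	assert(lower_leg >= 0);	/* lower device must have set it */
	bio->bi_leg = ctx->my_leg;
	return lower_leg;
}
```

In this model a caller that asks the top device for leg 1 gets bi_leg == 1 back, even though the lower device read from its own leg 0. Which leaf served the read is only known inside the stacked driver's completion handler.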
> 
> I found it still very challenging while writing the code.
> Saving/restoring bi_leg may not be enough, because the drivers don't know how to verify fs metadata.
>
> E.g. two-layer raid1 stacking:
> 
> fs:                  md0(copies:2)
>                      /          \
> layer1/raid1   md1(copies:2)    md2(copies:2)
>                   /    \          /     \
> layer2/raid1   dev0   dev1      dev2    dev3
> 
> Assume dev2 is corrupted:
>  => md2: doesn't know how to verify fs metadata.
>    => md0: fs verification fails, retry md1 (preserve md2).
> Then md2 will never be retried, even though dev3 may also have the right copy.
> Unless the upper layer device (md0) can know that the number of copies is 4 instead of 2?
> And we need a way to handle the mapping.
> Did I miss something? Thanks!

<shrug> It seems reasonable to me that the raid1 layer should set the
number of retries to (number of raid1 mirrors) * min(retry count of all
mirrors) so that the upper layer device (md0) would advertise 4 retry
possibilities instead of 2.
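A minimal sketch of that counting rule, plus one possible way to map a flat retry index back onto (mirror, lower index). struct sketch_dev and both functions are hypothetical. It assumes a leaf device counts as a single copy, and the div/mod mapping is an illustrative choice, not something the thread settled on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device description: a leaf has no mirrors; a raid1
 * device has nr_mirrors children. */
struct sketch_dev {
	int nr_mirrors;				/* 0 for a leaf device */
	const struct sketch_dev *const *mirrors;
};

/* Retry possibilities to advertise:
 * nr_mirrors * min(retry count of all mirrors); a leaf counts as 1. */
static int sketch_retries(const struct sketch_dev *dev)
{
	int i, min_child;

	if (dev->nr_mirrors == 0)
		return 1;
	min_child = sketch_retries(dev->mirrors[0]);
	for (i = 1; i < dev->nr_mirrors; i++) {
		int r = sketch_retries(dev->mirrors[i]);

		if (r < min_child)
			min_child = r;
	}
	return dev->nr_mirrors * min_child;
}

/* Split a retry index at this level into (which of our mirrors,
 * retry index to pass down to it).  Hypothetical mapping. */
static void sketch_map(const struct sketch_dev *dev, int idx,
		       int *mirror, int *child_idx)
{
	int per_child;

	assert(dev->nr_mirrors > 0);
	assert(idx >= 0 && idx < sketch_retries(dev));
	per_child = sketch_retries(dev) / dev->nr_mirrors;
	*mirror = idx / per_child;
	*child_idx = idx % per_child;
}
```

For the md0/md1/md2 example this advertises 4 retries at md0, and index 3 maps to md2, leg 1, i.e. dev3. Taking the minimum keeps the index space dense. The cost is that extra copies on a wider mirror are not reachable.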

--D

> -Bob
> 
> >> 	- is it generic/abstract enough to be able to work with
> >> 	  RAID5/6 to trigger verification/recovery from the parity
> >> 	  information in the stripe?
> > 
> > If we get a non -1 bi_leg for parity raid, this is an indicator
> > that a parity rebuild needs to happen.  For multi-parity setups we could
> > also use different levels there.
> > 
>