On Sat, Dec 08, 2018 at 10:49:44PM +0800, Bob Liu wrote:
> On 11/28/18 3:45 PM, Christoph Hellwig wrote:
> > On Wed, Nov 28, 2018 at 04:33:03PM +1100, Dave Chinner wrote:
> >> - how does propagation through stacked layers work?
> >
> > The only way it works is by each layer driving it.  Thus my
> > recommendation above, building on your earlier one, to use an index
> > that is filled by the driver at I/O completion time.
> >
> > E.g.
> >
> > bio_init: bi_leg = -1
> >
> > raid1: submit bio to lower driver
> > raid1 completion: set bi_leg to 0 or 1
> >
> > Now if we want to allow stacking we need to save/restore bi_leg
> > before submitting to the underlying device.  Which is possible,
> > but quite a bit of work in the drivers.
> >
>
> I found it's still very challenging while writing the code.
> Save/restore of bi_leg may not be enough because the drivers don't
> know how to do fs-metadata verification.
>
> E.g. two-layer raid1 stacking
>
> fs:                      md0(copies:2)
>                          /          \
> layer1/raid1   md1(copies:2)    md2(copies:2)
>                  /     \          /     \
> layer2/raid1   dev0   dev1      dev2   dev3
>
> Assume dev2 is corrupted
> => md2: doesn't know how to do fs-metadata verification.
> => md0: fs verify fails, retry md1(preserve md2).
> Then md2 will never be retried even though dev3 may also have the right copy.
> Unless the upper layer device (md0) can know the number of copies is 4 instead of 2?
> And we need a way to handle the mapping.
> Did I miss something? Thanks!

<shrug> It seems reasonable to me that the raid1 layer should set the
number of retries to (number of raid1 mirrors) * min(retry count of all
mirrors) so that the upper layer device (md0) would advertise 4 retry
possibilities instead of 2.

--D

> -Bob
>
> >> - is it generic/abstract enough to be able to work with
> >>   RAID5/6 to trigger verification/recovery from the parity
> >>   information in the stripe?
> >
> > If we get a non -1 bi_leg for parity raid this is an indicator
> > that parity rebuild needs to happen.  For multi-parity setups we could
> > also use different levels there.
> >
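
A rough sketch of the retry-count rule suggested above, applied to Bob's
two-layer example. It is written as a small Python model purely for
brevity, not as kernel code. The Dev class, nr_retries() and resolve() are
hypothetical names invented for illustration and do not correspond to any
existing block-layer interface. The index mapping in resolve() is only one
possible answer to the "need a way to handle the mapping" question; the
thread does not settle on it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dev:
    """Hypothetical model of one device in a stack (not a kernel structure)."""
    name: str
    mirrors: List["Dev"] = field(default_factory=list)  # empty for a leaf


def nr_retries(dev: Dev) -> int:
    """Retry possibilities a device advertises to the layer above it."""
    if not dev.mirrors:
        return 1  # assumption: a plain device holds exactly one copy
    # (number of raid1 mirrors) * min(retry count of all mirrors)
    return len(dev.mirrors) * min(nr_retries(m) for m in dev.mirrors)


def resolve(dev: Dev, idx: int) -> str:
    """Map a flat retry index in [0, nr_retries(dev)) to a leaf device name."""
    if not dev.mirrors:
        return dev.name
    # Each mirror leg gets an equal share of the advertised retries.
    per_leg = nr_retries(dev) // len(dev.mirrors)
    leg, sub = divmod(idx, per_leg)
    return resolve(dev.mirrors[leg], sub)


# Bob's topology: md0 mirrors md1 and md2, each of which mirrors two devices.
md1 = Dev("md1", [Dev("dev0"), Dev("dev1")])
md2 = Dev("md2", [Dev("dev2"), Dev("dev3")])
md0 = Dev("md0", [md1, md2])

if __name__ == "__main__":
    print(nr_retries(md0))
    print([resolve(md0, i) for i in range(nr_retries(md0))])
```

With this rule md0 advertises four possibilities, so a filesystem that
walks all of them after a verification failure would reach dev3 as well.
Taking the minimum across mirrors keeps every index valid when the legs
are unbalanced, at the cost of not reaching the extra copies on the
deeper leg.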