On Thu, Aug 16 2018, Xiao Ni wrote:

> Hi Shaohua
>
> I encounter one panic recent in rhel7.6. The test is:
> 1. Create some VDO devices
> 2. Create raid10 device on 4 vdo devices
> 3. Reshape raid10 device to 6 vdo devices.
>
> When sector_nr <= last it needs to goto read_more. If the r10_bio containing the
> read_bio which is submitted before goto has freed and lower_barrier has called.
> It'll panic at BUG_ON(force && !conf->barrier)
>
> The possibility of this is decreased by c85ba1 (md: raid1/raid10: don't handle failure of bio_add_page())
> In the test case bio_add_page fails after adding one page. It usually calls goto read_more. So the
> problem happens easily.
>
> But in upstream it still has the possibility to hit the BUG_ON. Because the max_sectors return from
> read_balance can let sector_nr <= last.
>
> Do you think it's the right way to fix this?

No.  The right way to fix it is:

1/ before the read_more: label, put raise_barrier(conf, 0);
2/ in place of the existing raise_barrier() put raise_barrier(conf, 1);
3/ after the "goto read_more" put lower_barrier(conf);

NeilBrown

>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 35bd3a6..f6de031 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4535,7 +4535,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>  	/* Now schedule reads for blocks from sector_nr to last */
>  	r10_bio = raid10_alloc_init_r10buf(conf);
>  	r10_bio->state = 0;
> -	raise_barrier(conf, sectors_done != 0);
> +	raise_barrier(conf, 0);
>  	atomic_set(&r10_bio->remaining, 0);
>  	r10_bio->mddev = mddev;
>  	r10_bio->sector = sector_nr;
>
> Best Regards
> Xiao
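
To illustrate why the three steps in the reply avoid the BUG_ON, here is a rough user-space sketch. It is not kernel code: every toy_* name is hypothetical, the waiting that the real raise_barrier() does is left out, and each chunk's I/O is assumed to complete (and drop its barrier reference) before the next pass through read_more, which is the case described in the report. The only behaviour taken from the thread is the check "force && !conf->barrier".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the raid10 conf: only the barrier count is modelled. */
struct toy_conf {
	int barrier;
};

/*
 * Returns false where the kernel would hit
 * BUG_ON(force && !conf->barrier); otherwise takes one reference.
 */
static bool toy_raise_barrier(struct toy_conf *conf, int force)
{
	if (force && conf->barrier == 0)
		return false;
	conf->barrier++;
	return true;
}

static void toy_lower_barrier(struct toy_conf *conf)
{
	assert(conf->barrier > 0);
	conf->barrier--;
}

/*
 * Existing scheme: one raise per chunk, forced on every pass after
 * the first. Assumes the previous chunk has already completed.
 */
static bool toy_reshape_original(int chunks)
{
	struct toy_conf conf = { 0 };

	for (int done = 0; done < chunks; done++) {
		if (!toy_raise_barrier(&conf, done != 0))
			return false;		/* would be the BUG_ON */
		toy_lower_barrier(&conf);	/* chunk I/O completes */
	}
	return true;
}

/* Suggested scheme: hold an outer reference across the whole loop. */
static bool toy_reshape_suggested(int chunks)
{
	struct toy_conf conf = { 0 };
	bool ok = true;

	if (!toy_raise_barrier(&conf, 0))	/* 1/ before read_more: */
		return false;
	for (int done = 0; done < chunks && ok; done++) {
		ok = toy_raise_barrier(&conf, 1);	/* 2/ forced raise per chunk */
		if (ok)
			toy_lower_barrier(&conf);	/* chunk I/O completes */
	}
	toy_lower_barrier(&conf);		/* 3/ after "goto read_more" */
	return ok && conf.barrier == 0;
}
```

In this model toy_reshape_original(2) fails on the second pass because the count has already dropped to zero, while toy_reshape_suggested() succeeds for any number of chunks because the outer reference keeps the count non-zero until the loop is finished.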