On Fri, Apr 14, 2023 at 10:30 AM Yu Kuai <yukuai3@xxxxxxxxxx> wrote: > > Hello, > > rai10 support that add or remove conf->mirrors[].rdev/replacement can > concurrent with io, and this is handled by something like following > code: > > raid10_write_request > if (rrdev) > // decide to write replacement > r10_bio->devs[i].repl_bio = bio > > raid10_write_one_disk > if (replacement) { > rdev = conf->mirrors[devnum].replacement; > if (rdev == NULL) { > /* Replacement just got moved to main 'rdev' */ > smp_mb(); > rdev = conf->mirrors[devnum].rdev; > } > } else > rdev = conf->mirrors[devnum].rdev; > > And this scheme is not complete and can cause kernel panic or data loss > in multiple places. > > for example, null-ptr-dereference: > > t1: t2: > raid10_write_request: > > // read rdev > rdev = conf->mirros[].rdev; > raid10_remove_disk > p = conf->mirros + number; > rdevp = &p->rdev; > // reset rdev > *rdevp = NULL > raid10_write_one_disk > // reread rdev got NULL > rdev = conf->mirrors[devnum].rdev > // null-ptr-dereference > mbio = bio_alloc_clone(rdev->bdev...) > synchronize_rcu() Hi Yu kuai raid10_write_request adds the rdev->nr_pending with rcu lock protection. Can this case happen? After adding ->nr_pending, the rdev can't be removed. > > for example, data loss: > > t1: > // assum that rdev is NULL, and replacement is not NULL How can trigger this? Could you give the detailed commands? Best Regards Xiao Ni > raid10_write_request > // read replacement > rrdev = conf->mirros[].replacement > > t2: replacement moved to rdev > raid10_remove_disk > p->rdev = p->replacement > p->replacement = NULL > > t3: add a new replacement > raid10_add_disk > p->replacement = rdev > raid10_write_one_disk > // read again, and got wrong replacement > // should write to rdev in this case > rrdev = conf->mirros[].replacement > > In order to fix these problems, I come up with two different solutions: > > 1) based on current solution, make sure rdev is only accessd once while > handling io, this can be done by changing r10bio: > diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h > index 63e48b11b552..c132cba1953c 100644 > --- a/drivers/md/raid10.h > +++ b/drivers/md/raid10.h > @@ -145,12 +145,12 @@ struct r10bio { > */ > struct r10dev { > struct bio *bio; > - union { > - struct bio *repl_bio; /* used for resync and > - * writes */ > - struct md_rdev *rdev; /* used for reads > - * (read_slot >= 0) */ > - }; > + struct md_rdev *rdev; > + > + /* used for resync and writes */ > + struct bio *repl_bio; > + struct md_rdev *replacement; > + > sector_t addr; > int devnum; > } devs[]; > > And only read conf once, then record rdev/replacement. This requires to > change all the context to issue io and complete io. > > 2) add a new synchronization that configuration can't concurrent with > any io, however, I think this can work but I'm not 100% percent sure. > Code changes should be less, and this will allow some cleanups and > simplify logic. > > Any suggestions? > > Thanks, > Kuai >