On Tue, Oct 11, 2022 at 9:39 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote: > > I finally reproduced it with 6.0 kernel, and It is a NULL dereference. That's cool. Could you share the steps? > >>> > >>> The reason is that raid0 doesn't do anything in raid0_quiesce. During > >>> the level change in level_store, it needs > >>> to make sure all inflight bios to finish. We can add a count to do > >>> this in raid0. Is it a good way? > >>> > >>> There is a count mddev->active_io in mddev. But it's decreased in > >>> md_handle_request rather than in the callback > >>> endio function. Is it right to decrease active_io in callbcak endio function? > >>> > >> I think it is a race between ios and takeover action, since > >> mddev_suspend called by > >> level_store should ensure no io is submitted to array at that time by below. > >> > >> wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0) > >> > >> However it can't guarantee new io comes after mddev_suspend because the > >> empty raid0 quiesce. Maybe we can do something like this. > > I was wrong given md_end_io_acct could happen after raid0_make_request, > so it could > be the previous io not finished before mddev_suspend, which means we > need to find a > way to drain member disk somehow but I doubt it is reasonable. It's what I wanted to mention in the original email. Sorry for not describing clearly. The inflight ios which mentioned in the first email are the bios that submit to the member disks. Because we introduced md_end_io_acct now, we need to ensure all the inflight ios come back and then free the memory. I put a patch in last email that tried to resolve it. > > Also change personality from raid0 seems doesn't obey the second rule. > > /* request to change the personality. Need to ensure: > * - array is not engaged in resync/recovery/reshape > * - *old personality can be suspended* > * - new personality will access other array. > */ > > And I don't like the idea to add another clone during io path for the > special case which That is what my last patch does. I didn't figure out a better way. > hurts performance. I would just warn user don't change personality from > raid0 unless > QUEUE_FLAG_IO_STAT flag is cleared for a while. Or do something like. > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 729be2c5296c..55e975233f66 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -3979,6 +3979,12 @@ level_store(struct mddev *mddev, const char *buf, > size_t len) > goto out_unlock; > } > > + if (blk_queue_io_stat(mddev->queue)) { > + blk_queue_flag_clear(QUEUE_FLAG_IO_STAT, mddev->queue); > + /* We want the previous bio is finished */ > + msleep(1000); > + } > + > /* Looks like we have a winner */ > mddev_suspend(mddev); > mddev_detach(mddev); > @@ -4067,6 +4073,7 @@ level_store(struct mddev *mddev, const char *buf, > size_t len) > pers->run(mddev); > set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); > mddev_resume(mddev); > + blk_queue_flag_set(QUEUE_FLAG_IO_STAT, mddev->queue); > > If you really want to fix it, maybe we have to ensure all inflight IOs > is finished by increase > inflight num in md_account_bio, then decrease it in md_end_io_acct after > add a new > inflight_num in mddev and also add "struct mddev *mddev" in md_io_acct, > just FYI. I thought about this method too. But then I thought it's only raid0 problem. So I didn't choose this method. Thanks for all the suggestions. Best Regards Xiao