On Mon, Nov 27, 2023 at 2:16 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Fri, Nov 24, 2023 at 10:54 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >
> > From: Yu Kuai <yukuai3@xxxxxxxxxx>
> >
> > Currently rcu is used to protect iterating rdev from submit_flushes():
> >
> > submit_flushes                  remove_and_add_spares
> >                                 synchronize_rcu
> >                                 pers->hot_remove_disk()
> >  rcu_read_lock()
> >  rdev_for_each_rcu
> >   if (rdev->raid_disk >= 0)
> >                                 rdev->radi_disk = -1;
> >    atomic_inc(&rdev->nr_pending)
> >    rcu_read_unlock()
> >    bi = bio_alloc_bioset()
> >    bi->bi_end_io = md_end_flush
> >    bi->private = rdev
> >    submit_bio
> >    // issue io for removed rdev
> >
> > Fix this problem by grabbing 'acive_io' before iterating rdev, make sure
> > that remove_and_add_spares() won't concurrent with submit_flushes().
> >
> > Fixes: a2826aa92e2e ("md: support barrier requests on all personalities.")
> > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> > ---
> > Changes v2:
> >  - Add WARN_ON in case md_flush_request() is not called from
> >  md_handle_request() in future.
> >
> >  drivers/md/md.c | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 86efc9c2ae56..2ffedc39edd6 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -538,6 +538,9 @@ static void md_end_flush(struct bio *bio)
> >         rdev_dec_pending(rdev, mddev);
> >
> >         if (atomic_dec_and_test(&mddev->flush_pending)) {
> > +               /* The pair is percpu_ref_tryget() from md_flush_request() */
> > +               percpu_ref_put(&mddev->active_io);
> > +
> >                 /* The pre-request flush has finished */
> >                 queue_work(md_wq, &mddev->flush_work);
> >         }
> > @@ -557,12 +560,8 @@ static void submit_flushes(struct work_struct *ws)
> >         rdev_for_each_rcu(rdev, mddev)
> >                 if (rdev->raid_disk >= 0 &&
> >                     !test_bit(Faulty, &rdev->flags)) {
> > -                       /* Take two references, one is dropped
> > -                        * when request finishes, one after
> > -                        * we reclaim rcu_read_lock
> > -                        */
> >                         struct bio *bi;
> > -                       atomic_inc(&rdev->nr_pending);
> > +
> >                         atomic_inc(&rdev->nr_pending);
> >                         rcu_read_unlock();
> >                         bi = bio_alloc_bioset(rdev->bdev, 0,
> > @@ -573,7 +572,6 @@ static void submit_flushes(struct work_struct *ws)
> >                         atomic_inc(&mddev->flush_pending);
> >                         submit_bio(bi);
> >                         rcu_read_lock();
> > -                       rdev_dec_pending(rdev, mddev);
> >                 }
> >         rcu_read_unlock();
> >         if (atomic_dec_and_test(&mddev->flush_pending))
> > @@ -626,6 +624,18 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
> >         /* new request after previous flush is completed */
> >         if (ktime_after(req_start, mddev->prev_flush_start)) {
> >                 WARN_ON(mddev->flush_bio);
> > +               /*
> > +                * Grab a reference to make sure mddev_suspend() will wait for
> > +                * this flush to be done.
> > +                *
> > +                * md_flush_reqeust() is called under md_handle_request() and
> > +                * 'active_io' is already grabbed, hence percpu_ref_tryget()
> > +                * won't fail, percpu_ref_tryget_live() can't be used because
> > +                * percpu_ref_kill() can be called by mddev_suspend()
> > +                * concurrently.
> > +                */
> > +               if (WARN_ON(percpu_ref_tryget(&mddev->active_io)))
>
> This should be "if (!WARN_ON(..))", right?
>
> Song
>
> > +                       percpu_ref_get(&mddev->active_io);

Actually, we can just use percpu_ref_get(), no?

Thanks,
Song
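
To make the return-value question concrete, here is a rough userspace sketch, not kernel code. It assumes only the documented semantics that percpu_ref_tryget() returns true on success and that WARN_ON() returns the value of its condition. All mock_* names and the two grab_* helpers are hypothetical stand-ins invented for illustration, and the plain counter ignores all concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a percpu_ref: a plain counter, no concurrency. */
struct mock_ref {
	long count;
};

/* Number of times MOCK_WARN_ON() has fired. */
static int mock_warnings;

/* Like WARN_ON(): complains when cond is true and returns cond. */
static bool mock_warn_on(bool cond, const char *expr)
{
	if (cond) {
		mock_warnings++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define MOCK_WARN_ON(cond) mock_warn_on(!!(cond), #cond)

/* Like percpu_ref_tryget(): true on success, false if the count is zero. */
static bool mock_tryget(struct mock_ref *ref)
{
	if (ref->count == 0)
		return false;
	ref->count++;
	return true;
}

/* Like percpu_ref_get(): the caller must already hold a reference. */
static void mock_get(struct mock_ref *ref)
{
	assert(ref->count > 0);
	ref->count++;
}

/* Shape of the v2 hunk: warns on success, then takes a second reference. */
static void grab_as_posted(struct mock_ref *ref)
{
	if (MOCK_WARN_ON(mock_tryget(ref)))
		mock_get(ref);
}

/* Plain get, relying on the reference the caller is assumed to hold. */
static void grab_plain(struct mock_ref *ref)
{
	mock_get(ref);
}
```

In this model, calling grab_as_posted() on a ref the caller already holds fires the warning and bumps the count twice, while grab_plain() takes exactly one reference and stays silent.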