Hi Shaohua, On Fri, May 11, 2018 at 6:23 AM, Shaohua Li <shli@xxxxxxxxxx> wrote: > From: Shaohua Li <shli@xxxxxx> > > The recent flush request handling seems unncessary complicated. The main > issue is in rdev_end_flush we can either get rdev of the bio or the > flush_info, not both, or we need extra memory to for the other. With the > extra memory, we need reallocate the memory in disk hotadd/remove. > Actually the original patch forgets one case of add_new_disk for memory > allocation, and we have kernel crash. > > The idea is always to increase all rdev reference in md_flush_request > and decrease the references after bio finish. In this way, > rdev_end_flush doesn't need to know rdev, so we don't need to allocate > extra memory. > > Cc: Xiao Ni <xni@xxxxxxxxxx> > Signed-off-by: Shaohua Li <shli@xxxxxx> > --- > drivers/md/md.c | 89 ++++++++++++++------------------------------------------- > drivers/md/md.h | 3 +- > 2 files changed, 23 insertions(+), 69 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 0bb1e2f..d9474f8 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -435,16 +435,12 @@ static void rdev_end_flush(struct bio *bi) > struct bio *fbio = fi->fbio; > struct md_rdev *rdev; > > - rcu_read_lock(); > - rdev_for_each_rcu(rdev, mddev) > - if (fi->bios[rdev->raid_disk] == bi) { > - fi->bios[rdev->raid_disk] = NULL; > + if (atomic_dec_and_test(&fi->flush_pending)) { > + rcu_read_lock(); > + rdev_for_each_rcu(rdev, mddev) > rdev_dec_pending(rdev, mddev); > - break; > - } > - rcu_read_unlock(); > + rcu_read_unlock(); > > - if (atomic_dec_and_test(&fi->flush_pending)) { > if (fbio->bi_iter.bi_size == 0) { > /* an empty barrier - all done */ > bio_endio(fbio); > @@ -465,14 +461,12 @@ void md_flush_request(struct mddev *mddev, struct bio *fbio) > { > struct md_rdev *rdev; > struct flush_info *fi; > - char *p = (char*)mddev->flush_info; > int index; > > atomic_inc(&mddev->flush_io); > > index = jhash((void*)fbio, sizeof(fbio), 0) % NR_FLUSHS; > - fi = (struct flush_info *)(p + index * (sizeof(struct flush_info) > - + mddev->raid_disks * sizeof(struct bio*))); > + fi = &mddev->flush_info[index]; > > spin_lock_irq(&fi->flush_lock); > wait_event_lock_irq(fi->flush_queue, This way uses jhash for allocating flush_info, if two bio maps to same jhash value, then extra(often unnecessary) waiting is introduced for the latter bio since there can be other flush_info available. Thanks, Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html