On Fri, May 11, 2018 at 08:47:53AM +0800, Ming Lei wrote:
> Hi Shaohua,
>
> On Fri, May 11, 2018 at 6:23 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> > From: Shaohua Li <shli@xxxxxx>
> >
> > The recent flush request handling seems unnecessarily complicated. The
> > main issue is that in rdev_end_flush we can get either the rdev of the
> > bio or the flush_info, not both, or we need extra memory for the other.
> > With the extra memory, we need to reallocate the memory on disk
> > hotadd/remove. Actually the original patch forgets one case of
> > add_new_disk for memory allocation, and we get a kernel crash.
> >
> > The idea is to always take a reference on every rdev in md_flush_request
> > and drop the references after the bio finishes. This way,
> > rdev_end_flush doesn't need to know the rdev, so we don't need to
> > allocate extra memory.
> >
> > Cc: Xiao Ni <xni@xxxxxxxxxx>
> > Signed-off-by: Shaohua Li <shli@xxxxxx>
> > ---
> >  drivers/md/md.c | 89 ++++++++++++++-------------------------------------
> >  drivers/md/md.h |  3 +-
> >  2 files changed, 23 insertions(+), 69 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 0bb1e2f..d9474f8 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -435,16 +435,12 @@ static void rdev_end_flush(struct bio *bi)
> >  	struct bio *fbio = fi->fbio;
> >  	struct md_rdev *rdev;
> >
> > -	rcu_read_lock();
> > -	rdev_for_each_rcu(rdev, mddev)
> > -		if (fi->bios[rdev->raid_disk] == bi) {
> > -			fi->bios[rdev->raid_disk] = NULL;
> > +	if (atomic_dec_and_test(&fi->flush_pending)) {
> > +		rcu_read_lock();
> > +		rdev_for_each_rcu(rdev, mddev)
> >  			rdev_dec_pending(rdev, mddev);
> > -			break;
> > -		}
> > -	rcu_read_unlock();
> > +		rcu_read_unlock();
> >
> > -	if (atomic_dec_and_test(&fi->flush_pending)) {
> >  		if (fbio->bi_iter.bi_size == 0) {
> >  			/* an empty barrier - all done */
> >  			bio_endio(fbio);
> > @@ -465,14 +461,12 @@ void md_flush_request(struct mddev *mddev, struct bio *fbio)
> >  {
> >  	struct md_rdev *rdev;
> >  	struct flush_info *fi;
> > -	char *p = (char*)mddev->flush_info;
> >  	int index;
> >
> >  	atomic_inc(&mddev->flush_io);
> >
> >  	index = jhash((void*)fbio, sizeof(fbio), 0) % NR_FLUSHS;
> > -	fi = (struct flush_info *)(p + index * (sizeof(struct flush_info)
> > -		+ mddev->raid_disks * sizeof(struct bio*)));
> > +	fi = &mddev->flush_info[index];
> >
> >  	spin_lock_irq(&fi->flush_lock);
> >  	wait_event_lock_irq(fi->flush_queue,
>
> This way uses jhash to pick a flush_info: if two bios map to the same
> jhash value, extra (often unnecessary) waiting is introduced for the
> latter bio, since another flush_info slot may be available.

That's possible, but it doesn't really matter. Our goal is not to
completely avoid lock contention; reducing the contention is good enough.

Thanks,
Shaohua