On Tue, May 15, 2018 at 12:46 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> On Fri, May 11, 2018 at 08:47:53AM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Fri, May 11, 2018 at 6:23 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>> > From: Shaohua Li <shli@xxxxxx>
>> >
>> > The recent flush request handling seems unnecessarily complicated. The
>> > main issue is that in rdev_end_flush we can get either the rdev of the
>> > bio or the flush_info, not both, unless we use extra memory to track
>> > the other. With the extra memory, we need to reallocate it on disk
>> > hot-add/remove. In fact the original patch missed the add_new_disk
>> > case for the memory allocation, and we got a kernel crash.
>> >
>> > The idea is to always take a reference on every rdev in
>> > md_flush_request and drop the references after the bio finishes. This
>> > way rdev_end_flush doesn't need to know the rdev, so we don't need to
>> > allocate extra memory.
>> >
>> > Cc: Xiao Ni <xni@xxxxxxxxxx>
>> > Signed-off-by: Shaohua Li <shli@xxxxxx>
>> > ---
>> >  drivers/md/md.c | 89 ++++++++++++++-------------------------------------------
>> >  drivers/md/md.h |  3 +-
>> >  2 files changed, 23 insertions(+), 69 deletions(-)
>> >
>> > diff --git a/drivers/md/md.c b/drivers/md/md.c
>> > index 0bb1e2f..d9474f8 100644
>> > --- a/drivers/md/md.c
>> > +++ b/drivers/md/md.c
>> > @@ -435,16 +435,12 @@ static void rdev_end_flush(struct bio *bi)
>> >  	struct bio *fbio = fi->fbio;
>> >  	struct md_rdev *rdev;
>> >
>> > -	rcu_read_lock();
>> > -	rdev_for_each_rcu(rdev, mddev)
>> > -		if (fi->bios[rdev->raid_disk] == bi) {
>> > -			fi->bios[rdev->raid_disk] = NULL;
>> > +	if (atomic_dec_and_test(&fi->flush_pending)) {
>> > +		rcu_read_lock();
>> > +		rdev_for_each_rcu(rdev, mddev)
>> >  			rdev_dec_pending(rdev, mddev);
>> > -			break;
>> > -		}
>> > -	rcu_read_unlock();
>> > +		rcu_read_unlock();
>> >
>> > -	if (atomic_dec_and_test(&fi->flush_pending)) {
>> >  		if (fbio->bi_iter.bi_size == 0) {
>> >  			/* an empty barrier - all done */
>> >  			bio_endio(fbio);
>> > @@ -465,14 +461,12 @@ void md_flush_request(struct mddev *mddev, struct bio *fbio)
>> >  {
>> >  	struct md_rdev *rdev;
>> >  	struct flush_info *fi;
>> > -	char *p = (char*)mddev->flush_info;
>> >  	int index;
>> >
>> >  	atomic_inc(&mddev->flush_io);
>> >
>> >  	index = jhash((void*)fbio, sizeof(fbio), 0) % NR_FLUSHS;
>> > -	fi = (struct flush_info *)(p + index * (sizeof(struct flush_info)
>> > -			+ mddev->raid_disks * sizeof(struct bio*)));
>> > +	fi = &mddev->flush_info[index];
>> >
>> >  	spin_lock_irq(&fi->flush_lock);
>> >  	wait_event_lock_irq(fi->flush_queue,
>>
>> This approach uses jhash to pick a flush_info. If two bios map to the
>> same jhash value, extra (often unnecessary) waiting is introduced for
>> the latter bio, even though other flush_info slots may be available.
>
> That's possible, but it doesn't really matter. Our goal is not to
> completely avoid lock contention; reducing the contention is good enough.

Given that NR_FLUSHS is small, it is quite likely that two bios hash to
the same jhash() bucket. This approach will then degrade sync IO
performance because of an unnecessary wait that could easily have been
avoided, IMO.

Thanks,
Ming Lei
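
P.S. To put a rough number on the collision rate: here is a quick
userspace sketch, not kernel code. It models the slot choice
"jhash((void *)fbio, sizeof(fbio), 0) % NR_FLUSHS" as a uniform random
bucket, and NR_FLUSHS = 16 is only an assumed value for illustration;
the real constant lives in the patched md.c.

	/*
	 * Birthday-style bound: probability that at least two of k
	 * in-flight flush bios land in the same flush_info slot,
	 * assuming each bio picks one of NR_FLUSHS slots uniformly.
	 * NR_FLUSHS = 16 is an assumed value for illustration only.
	 */
	#include <stdio.h>

	#define NR_FLUSHS 16

	int main(void)
	{
		for (int k = 2; k <= 8; k++) {
			/* P(all k slots distinct) = prod_{i=0..k-1} (N - i) / N */
			double p_distinct = 1.0;

			for (int i = 0; i < k; i++)
				p_distinct *= (double)(NR_FLUSHS - i) / NR_FLUSHS;
			printf("%d in-flight flushes -> P(shared slot) = %4.1f%%\n",
			       k, 100.0 * (1.0 - p_distinct));
		}
		return 0;
	}

With only four flushes in flight the chance of an avoidable wait is
already about one in three, which is why picking any idle slot rather
than a fixed hashed one looks worthwhile to me.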