Re: [PATCH] md: simplify flush request handling

Ming Lei <tom.leiming@xxxxxxxxx> · Fri, 11 May 2018 08:47:53 +0800

Hi Shaohua,

On Fri, May 11, 2018 at 6:23 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> From: Shaohua Li <shli@xxxxxx>
>
> The recent flush request handling seems unnecessarily complicated. The
> main issue is that in rdev_end_flush we can get either the rdev of the
> bio or the flush_info, not both, unless we use extra memory for the
> other. With the extra memory, we need to reallocate it on disk
> hotadd/remove. Actually the original patch forgets one case of
> add_new_disk for memory allocation, and we get a kernel crash.
>
> The idea is to always take a reference on every rdev in
> md_flush_request and drop the references after the bios finish. This
> way, rdev_end_flush doesn't need to know the rdev, so we don't need to
> allocate extra memory.
>
> Cc: Xiao Ni <xni@xxxxxxxxxx>
> Signed-off-by: Shaohua Li <shli@xxxxxx>
> ---
>  drivers/md/md.c | 89 ++++++++++++++-------------------------------------------
>  drivers/md/md.h |  3 +-
>  2 files changed, 23 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0bb1e2f..d9474f8 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -435,16 +435,12 @@ static void rdev_end_flush(struct bio *bi)
>         struct bio *fbio = fi->fbio;
>         struct md_rdev *rdev;
>
> -       rcu_read_lock();
> -       rdev_for_each_rcu(rdev, mddev)
> -               if (fi->bios[rdev->raid_disk] == bi) {
> -                       fi->bios[rdev->raid_disk] = NULL;
> +       if (atomic_dec_and_test(&fi->flush_pending)) {
> +               rcu_read_lock();
> +               rdev_for_each_rcu(rdev, mddev)
>                         rdev_dec_pending(rdev, mddev);
> -                       break;
> -               }
> -       rcu_read_unlock();
> +               rcu_read_unlock();
>
> -       if (atomic_dec_and_test(&fi->flush_pending)) {
>                 if (fbio->bi_iter.bi_size == 0) {
>                         /* an empty barrier - all done */
>                         bio_endio(fbio);
> @@ -465,14 +461,12 @@ void md_flush_request(struct mddev *mddev, struct bio *fbio)
>  {
>         struct md_rdev *rdev;
>         struct flush_info *fi;
> -       char *p = (char*)mddev->flush_info;
>         int index;
>
>         atomic_inc(&mddev->flush_io);
>
>         index = jhash((void*)fbio, sizeof(fbio), 0) % NR_FLUSHS;
> -       fi = (struct flush_info *)(p + index * (sizeof(struct flush_info)
> -                       + mddev->raid_disks * sizeof(struct bio*)));
> +       fi = &mddev->flush_info[index];
>
>         spin_lock_irq(&fi->flush_lock);
>         wait_event_lock_irq(fi->flush_queue,

This approach uses jhash to pick the flush_info slot. If two bios hash to
the same index, the later bio has to wait for that slot, often
unnecessarily, since other flush_info slots may still be free.
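
To illustrate the concern, here is a rough userspace sketch, not kernel
code. NR_SLOTS, struct slot, hash_slot(), pick_hashed() and
pick_first_free() are all hypothetical names made up for this example;
hash_slot() is only a stand-in for jhash(), and returning -1 stands in
for the point where the caller would sleep in wait_event_lock_irq().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 8	/* hypothetical; plays the role of NR_FLUSHS */

struct slot {
	bool busy;	/* true while a flush owns this slot */
};

/* Hypothetical stand-in for jhash(): derive an index from the pointer. */
static unsigned int hash_slot(const void *key)
{
	uintptr_t v = (uintptr_t)key;

	return (unsigned int)((v >> 4) % NR_SLOTS);
}

/*
 * Hashed selection: only the hashed slot is considered.
 * Returns the slot index, or -1 if that slot is busy (the caller
 * would have to wait, even if every other slot is free).
 */
static int pick_hashed(struct slot *slots, const void *key)
{
	unsigned int i = hash_slot(key);

	if (slots[i].busy)
		return -1;
	slots[i].busy = true;
	return (int)i;
}

/*
 * First-free selection: scan for any idle slot.
 * Returns the slot index, or -1 only when all slots are busy.
 */
static int pick_first_free(struct slot *slots)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!slots[i].busy) {
			slots[i].busy = true;
			return i;
		}
	}
	return -1;
}
```

With two keys that hash to the same index, pick_hashed() succeeds for the
first and fails for the second, while pick_first_free() still finds an
idle slot. The scan costs a walk over the slots under some lock, so this
only shows the trade-off; it is not a proposal for the actual
implementation.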

Thanks,
Ming Lei