Re: [PATCH RFC v6 08/10] megaraid_sas: switch fusion adapters to MQ

John Garry <john.garry@xxxxxxxxxx> · Mon, 27 Apr 2020 18:06:32 +0100

Hi Kashyap,

hmmmm...

I did some more experiments. It looks like the issue occurs with both the
<none> and <mq-deadline> schedulers.  Let me simplify what happens with
ioscheduler = <none>.

I know it's good to compare like-for-like, but, as I understand it, "none"
is more suited to an MQ host, while deadline is more suited to an SQ host.

With the old driver, which has nr_hw_queue = 1, I issue IOs from <fio> with
queue depth = 128. We get 3.1M IOPS in this config. This eventually exhausts
host can_queue.

So I think I need to find a point where we start to get throttled.

Note - Very low contention in sbitmap_get()

-   23.58%     0.25%  fio              [kernel.vmlinux]            [k] blk_mq_make_request
    - 23.33% blk_mq_make_request
       - 21.68% blk_mq_get_request
          - 20.19% blk_mq_get_tag
             + 10.08% prepare_to_wait_exclusive
             + 4.51% io_schedule
             - 3.59% __sbitmap_queue_get
                - 2.82% sbitmap_get
                     0.86% __sbitmap_get_word
                     0.75% _raw_spin_lock_irqsave
                     0.55% _raw_spin_unlock_irqrestore

With the RFC driver, which has nr_hw_queue = N, I issue IOs from <fio> with
queue depth = 128. We get 2.3M IOPS in this config. This eventually exhausts
host can_queue.
Note - Very high contention in sbitmap_get()

-   42.39%     0.12%  fio              [kernel.vmlinux]            [k] generic_make_request
    - 42.27% generic_make_request
       - 41.00% blk_mq_make_request
          - 38.28% blk_mq_get_request
             - 33.76% blk_mq_get_tag
                - 30.25% __sbitmap_queue_get
                   - 29.90% sbitmap_get
                      + 9.06% _raw_spin_lock_irqsave
                      + 7.94% _raw_spin_unlock_irqrestore
                      + 3.86% __sbitmap_get_word
                      + 1.78% call_function_single_interrupt
                      + 0.67% ret_from_intr
                + 1.69% io_schedule
                  0.59% prepare_to_wait_exclusive
                  0.55% __blk_mq_get_tag

In this particular case, I observed zeros in alloc_hint, which means
sbitmap_get() was not able to find free tags from the hint. That may lead to
contention.
This condition does not happen with the nr_hw_queue = 1 (without RFC) driver.

alloc_hint=
{663, 2425, 3060, 54, 3149, 4319, 4175, 4867, 543, 2481, 0, 4779, 377,
***0***, 2010, 0, 909, 3350, 1546, 2179, 2875, 659, 3902, 2224, 3212, 836,
1892, 1669, 2420,
3415, 1904, 512, 3027, 4810, 2845, 4690, 712, 3105, 0, 0, 0, 3268, 4915,
3897, 1349, 547, 4, 733, 1765, 2068, 979, 51, 880, 0, 370, 3520, 2877, 4097,
418, 4501, 3717,
2893, 604, 508, 759, 3329, 4038, 4829, 715, 842, 1443, 556}

With the RFC driver, which has nr_hw_queue = N, I issue IOs from <fio> with
queue depth = 32. We get 3.1M IOPS in this config. This workload does *not*
exhaust host can_queue.

Please ensure .host_tagset is set whenever nr_hw_queue = N. This is as per
the RFC, and I don't think you modified it from the RFC for your test.
But I just wanted to mention that to be crystal clear.

-    5.07%     0.14%  fio              [kernel.vmlinux]  [k] generic_make_request
    - 4.93% generic_make_request
       - 3.61% blk_mq_make_request
          - 2.04% blk_mq_get_request
             - 1.08% blk_mq_get_tag
                - 0.70% __sbitmap_queue_get
                     0.67% sbitmap_get

In summary, the RFC has a performance bottleneck in sbitmap_get() when the
outstanding commands per shost are about to exhaust can_queue.  Without this
RFC the driver also works with nr_hw_queue = 1, but that case is managed
very well.
I am not sure why it happens only with the shared host tag. Theoretically
all the hctxs share the same tag bitmap, which is the same as
nr_hw_queue = 1, so why is contention only visible in the shared host tag
case?

Let me check this.

If you want to reproduce this issue, maybe you have to reduce the can_queue
in the hisi_sas driver.

Thanks,
John