On 22/04/2020 19:59, Kashyap Desai wrote:
Hi Kashyap,
>> So I tested this on hisi_sas with x12 SAS SSDs, and performance with
>> "mq-deadline" is comparable with "none" @ ~2M IOPS. But after a while
>> performance drops a lot, to maybe 700K IOPS. Do you have a similar
>> experience?
> I am using mq-deadline only for HDDs. I have not tried it on SSDs, since
> it is not a useful scheduler for SSDs.
I ask as I only have SAS SSDs to test.
> I noticed that when I use mq-deadline, the performance drop starts once
> I have a larger number of drives.
> I am running an fio script with 64 drives and 64 threads, all threads
> bound to the local NUMA node, which has 36 logical cores.
> I noticed that the lock contention is in "dd_dispatch_request". I am not
> sure why there is no penalty from the same lock in nr_hw_queue = 1 mode.
So this could just be a pre-existing issue of exposing multiple queues for
SCSI HBAs combined with the mq-deadline iosched. I mean, that's really the
only significant change in this series, apart from the shared sbitmap, and,
at this point, I don't think that is the issue.
> static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> {
>         struct deadline_data *dd = hctx->queue->elevator->elevator_data;
>         struct request *rq;
>
>         spin_lock(&dd->lock);
So if multiple hctx's are accessing this lock, then much contention is
possible.
>         rq = __dd_dispatch_request(dd);
>         spin_unlock(&dd->lock);
>
>         return rq;
> }
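To illustrate the point, here is a minimal userspace sketch (not kernel code,
and not part of this series): N threads stand in for the hctx dispatch
contexts and hammer either one shared pthread spinlock, playing the role of
dd->lock in the per-request_queue elevator_data, or a private lock each as an
uncontended baseline. The thread and iteration counts are arbitrary
illustration values.

/*
 * Minimal userspace sketch: many "dispatch contexts" sharing one lock
 * versus one lock per context. Build with: gcc -O2 -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NR_CTX   16          /* pretend hctx count (illustrative) */
#define NR_ITERS 2000000     /* lock/unlock cycles per context (illustrative) */

static pthread_spinlock_t shared_lock;          /* one lock for all contexts */
static pthread_spinlock_t per_ctx_lock[NR_CTX]; /* one lock per context */

struct ctx_arg {
	int idx;
	int use_shared;
};

static void *dispatch_worker(void *p)
{
	struct ctx_arg *arg = p;
	pthread_spinlock_t *lock =
		arg->use_shared ? &shared_lock : &per_ctx_lock[arg->idx];
	volatile unsigned long work = 0;

	for (int i = 0; i < NR_ITERS; i++) {
		pthread_spin_lock(lock);
		work++;          /* stand-in for the real dispatch work */
		pthread_spin_unlock(lock);
	}
	return NULL;
}

static double run(int use_shared)
{
	pthread_t threads[NR_CTX];
	struct ctx_arg args[NR_CTX];
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NR_CTX; i++) {
		args[i] = (struct ctx_arg){ .idx = i, .use_shared = use_shared };
		pthread_create(&threads[i], NULL, dispatch_worker, &args[i]);
	}
	for (int i = 0; i < NR_CTX; i++)
		pthread_join(threads[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
	pthread_spin_init(&shared_lock, PTHREAD_PROCESS_PRIVATE);
	for (int i = 0; i < NR_CTX; i++)
		pthread_spin_init(&per_ctx_lock[i], PTHREAD_PROCESS_PRIVATE);

	printf("shared lock : %.3f s\n", run(1)); /* ~ many hctxs on one dd->lock */
	printf("per-ctx lock: %.3f s\n", run(0)); /* uncontended baseline */
	return 0;
}

On a box like yours, the shared-lock run should take noticeably longer and
spend its extra time in the spinlock slow path, which looks like essentially
what the perf report below is showing for dd->lock.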
> Here is the perf report -
>
> -    1.04%     0.99%  kworker/18:1H+k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>         0.99% ret_from_fork
>         - kthread
>            - worker_thread
>               - 0.98% process_one_work
>                  - 0.98% __blk_mq_run_hw_queue
>                     - blk_mq_sched_dispatch_requests
>                        - 0.98% blk_mq_do_dispatch_sched
>                           - 0.97% dd_dispatch_request
>                              + 0.97% queued_spin_lock_slowpath
> +    1.04%     0.00%  kworker/18:1H+k  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    1.03%     0.95%  kworker/19:1H-k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
> +    1.03%     0.00%  kworker/19:1H-k  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    1.02%     0.97%  kworker/20:1H+k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
> +    1.02%     0.00%  kworker/20:1H+k  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    1.01%     0.96%  kworker/21:1H+k  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
I'll try to capture a perf report on my setup and compare it with yours.
Thanks very much,
john