So I tested this on hisi_sas with 12x SAS SSDs, and performance with
"mq-deadline" is comparable with "none" at ~2M IOPS. But after a while
performance drops a lot, to maybe 700K IOPS. Do you have a similar
experience?
I am using mq-deadline only for HDDs. I have not tried it on SSDs since
it is not a useful scheduler for SSDs.
I ask as I only have SAS SSDs to test.
I noticed that when I used mq-deadline, the performance drop starts once
I have a larger number of drives.
I am running an fio script which has 64 drives and 64 threads, and all
threads are bound to the local NUMA node, which has 36 logical cores.
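For reference, a job of roughly that shape would look something like the
sketch below. This is illustrative only, not the actual script; the
device names, iodepth and runtime are placeholders:

[global]
; all jobs pinned to the local NUMA node's 36 logical CPUs
cpus_allowed=0-35
rw=randread
bs=4k
direct=1
ioengine=libaio
iodepth=32
runtime=60
time_based
group_reporting

; one such section per drive, 64 in total
[drive00]
filename=/dev/sda
numjobs=1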
I noticed that the lock contention is in dd_dispatch_request(). I am not
sure why there is no penalty from the same lock in nr_hw_queues = 1
mode.
So this could be just a pre-existing issue of exposing multiple queues
for SCSI HBAs combined with the mq-deadline iosched. I mean, that's
really the only significant change in this series, apart from the shared
sbitmap, and, at this point, I don't think that is the issue.
As an experiment, I modified the mainline hisi_sas driver to expose hw
queues and manage tags itself, and I see the same issue I mentioned:
Jobs: 12 (f=12): [R(12)] [14.8% done] [7592MB/0KB/0KB /s] [1943K/0/0 iops] [eta
Jobs: 12 (f=12): [R(12)] [16.4% done] [7949MB/0KB/0KB /s] [2035K/0/0 iops] [eta
Jobs: 12 (f=12): [R(12)] [18.0% done] [7940MB/0KB/0KB /s] [2033K/0/0 iops] [eta
Jobs: 12 (f=12): [R(12)] [19.7% done] [7984MB/0KB/0KB /s] [2044K/0/0 iops] [eta
Jobs: 12 (f=12): [R(12)] [21.3% done] [7984MB/0KB/0KB /s] [2044K/0/0 iops] [eta
Jobs: 12 (f=12): [R(12)] [23.0% done] [2964MB/0KB/0KB /s] [759K/0/0 iops] [eta 0
Jobs: 12 (f=12): [R(12)] [24.6% done] [2417MB/0KB/0KB /s] [619K/0/0 iops] [eta 0
Jobs: 12 (f=12): [R(12)] [26.2% done] [2909MB/0KB/0KB /s] [745K/0/0 iops] [eta 0
Jobs: 12 (f=12): [R(12)] [27.9% done] [2366MB/0KB/0KB /s] [606K/0/0 iops] [eta 0
The odd time I see "sched: RT throttling activated" around the time the
throughput falls. I think the issue is the per-queue threaded irq
handlers consuming too many cycles. With the "none" io scheduler, IOPS
is flat at around 2M.
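For context on why RT throttling shows up at all here: handlers
registered via request_threaded_irq() run in per-interrupt kernel
threads scheduled as SCHED_FIFO, and it is those threads that the RT
throttle (kernel.sched_rt_runtime_us, 950ms per 1s period by default)
clamps once they eat too much CPU. A minimal sketch of that registration
pattern follows; the cq_* names and structure are illustrative only, not
the actual hisi_sas code:

#include <linux/interrupt.h>

/* Illustrative per-completion-queue context; stands in for whatever
 * the driver actually passes to its handlers.
 */
struct cq_ctx {
	int irq_nr;
};

static irqreturn_t cq_hard_irq(int irq, void *p)
{
	/* quick ack in hard-irq context, defer the real work */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t cq_thread_fn(int irq, void *p)
{
	/* completion processing runs here, in an "irq/<nr>-..." kthread
	 * with SCHED_FIFO policy; this is what RT throttling acts on
	 */
	return IRQ_HANDLED;
}

static int cq_setup_irq(struct cq_ctx *cq)
{
	return request_threaded_irq(cq->irq_nr, cq_hard_irq, cq_thread_fn,
				    IRQF_ONESHOT, "percq-compl", cq);
}

If that is what is happening, the per-irq kthreads should show up near
the top of the profile, and raising sched_rt_runtime_us would only move
the cliff rather than remove it.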
static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
{
	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
	struct request *rq;

	spin_lock(&dd->lock);

So if multiple hctx's are accessing this lock, then much contention is
possible.

	rq = __dd_dispatch_request(dd);
	spin_unlock(&dd->lock);

	return rq;
}
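Note that dd->lock is per request_queue (it lives in the deadline_data
allocated when the elevator is initialised), not per hctx, which would
explain the asymmetry: with nr_hw_queues = 1 only one dispatch context
ever takes it, while with multiple hw queues every hctx's dispatch work
funnels into the same spinlock. A deliberately simplified sketch of that
fan-in, with made-up names, just to illustrate the shape (this is not
block-layer code):

#include <linux/spinlock.h>

/* Stand-in for deadline_data: one lock per request_queue. */
struct toy_elevator_data {
	spinlock_t lock;
};

/* What each hctx's dispatch effectively does. */
static void toy_dispatch_one_hctx(struct toy_elevator_data *dd)
{
	spin_lock(&dd->lock);
	/* pick the next request off the shared sort/fifo lists */
	spin_unlock(&dd->lock);
}

/* With nr_hw_queues == 1 the loop body only ever runs from a single
 * context; with many hw queues each iteration is typically a different
 * kworker, so they all contend on the one dd->lock instead of
 * dispatching in parallel.
 */
static void toy_run_all_hw_queues(struct toy_elevator_data *dd,
				  int nr_hw_queues)
{
	int i;

	for (i = 0; i < nr_hw_queues; i++)
		toy_dispatch_one_hctx(dd);
}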
Here is the perf report:
- 1.04% 0.99% kworker/18:1H+k [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
   0.99% ret_from_fork
   - kthread
     - worker_thread
       - 0.98% process_one_work
         - 0.98% __blk_mq_run_hw_queue
           - blk_mq_sched_dispatch_requests
             - 0.98% blk_mq_do_dispatch_sched
               - 0.97% dd_dispatch_request
                 + 0.97% queued_spin_lock_slowpath
+ 1.04% 0.00% kworker/18:1H+k [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 1.03% 0.95% kworker/19:1H-k [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
+ 1.03% 0.00% kworker/19:1H-k [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 1.02% 0.97% kworker/20:1H+k [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
+ 1.02% 0.00% kworker/20:1H+k [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 1.01% 0.96% kworker/21:1H+k [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
I'll try to capture a perf report and compare to mine.
Mine is spending a huge amount of time (circa 33% on a CPU servicing
completion irqs) in mod_delayed_work_on():
--79.89%--sas_scsi_task_done
|--76.72%--scsi_mq_done
| |
| --76.53%--blk_mq_complete_request
| |
| |--74.81%--scsi_softirq_done
| | |
| | --73.91%--scsi_finish_command
| | |
| | |--72.11%--scsi_io_completion
| | | |
| | | --71.89%--scsi_end_request
| | | |
| | | |--40.82%--blk_mq_run_hw_queues
| | | | |
| | | | |--35.86%--blk_mq_run_hw_queue
| | | | | |
| | | | | --33.59%--__blk_mq_delay_run_hw_queue
| | | | | |
| | | | | --33.38%--kblockd_mod_delayed_work_on
| | | | | |
| | | | | --33.31%--mod_delayed_work_on
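For reference, the tail of that path is just the block layer punting the
hctx run to the kblockd workqueue: kblockd_mod_delayed_work_on() is a
thin wrapper that re-arms hctx->run_work via mod_delayed_work_on().
Roughly what the wrapper does (paraphrased from memory of
block/blk-core.c, so it may not match the exact tree here):

/* Queue (or re-arm) a delayed work item on the block layer's internal
 * kblockd workqueue. blk_mq_run_hw_queues() on the completion path ends
 * up here once per hctx, which is why mod_delayed_work_on() dominates
 * the profile when a device exposes many hw queues.
 */
int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
				unsigned long delay)
{
	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
}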
hmmmm...
Thanks,
John