> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -373,24 +373,24 @@ megasas_get_msix_index(struct megasas_instance *instance,
>  {
>  	int sdev_busy;
> 
> -	/* nr_hw_queue = 1 for MegaRAID */
> -	struct blk_mq_hw_ctx *hctx =
> -		scmd->device->request_queue->queue_hw_ctx[0];
> +	struct blk_mq_hw_ctx *hctx = scmd->request->mq_hctx;

Hi John,

There is one outstanding patch which will eventually remove device_busy
from the sdev. To fix this interface, we may have to track the per-device
outstanding count within the driver. For my testing I used the line below,
since that interface is still available:

	sdev_busy = atomic_read(&scmd->device->device_busy);

We have done some level of testing to measure the performance impact on
SAS SSD and HDD setups. Here are my findings.

My test setup: two-socket Intel Skylake/Lewisburg/Purley.

Output of numactl --hardware:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39
40 41 42 43 44 45 46 47 48 49 50 51 52 53
node 0 size: 31820 MB
node 0 free: 21958 MB
node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 1 size: 32247 MB
node 1 free: 21068 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

64-HDD setup -

With higher QD and IO scheduler = mq-deadline, the shared host tag set
does not scale well. If I use ioscheduler = none, I see a consistent
2.0M IOPS. This issue is seen only with the RFC; without the RFC,
mq-deadline scales up to 2.0M IOPS.
Perf top result with the RFC (IOPS = 1.4M):

    78.20%  [kernel]        [k] native_queued_spin_lock_slowpath
     1.46%  [kernel]        [k] sbitmap_any_bit_set
     1.14%  [kernel]        [k] blk_mq_run_hw_queue
     0.90%  [kernel]        [k] _mix_pool_bytes
     0.63%  [kernel]        [k] _raw_spin_lock
     0.57%  [kernel]        [k] blk_mq_run_hw_queues
     0.56%  [megaraid_sas]  [k] complete_cmd_fusion
     0.54%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
     0.50%  [kernel]        [k] dd_has_work
     0.38%  [kernel]        [k] _raw_spin_lock_irqsave
     0.36%  [kernel]        [k] gup_pgd_range
     0.35%  [megaraid_sas]  [k] megasas_build_ldio_fusion
     0.31%  [kernel]        [k] io_submit_one
     0.29%  [kernel]        [k] hctx_lock
     0.26%  [kernel]        [k] try_to_grab_pending
     0.24%  [kernel]        [k] scsi_queue_rq
     0.22%  fio             [.] __fio_gettime
     0.22%  [kernel]        [k] insert_work
     0.20%  [kernel]        [k] native_irq_return_iret

Perf top result without the RFC (IOPS = 2.0M):

    58.40%  [kernel]        [k] native_queued_spin_lock_slowpath
     2.06%  [kernel]        [k] _mix_pool_bytes
     1.38%  [kernel]        [k] _raw_spin_lock_irqsave
     0.97%  [kernel]        [k] _raw_spin_lock
     0.91%  [kernel]        [k] scsi_queue_rq
     0.82%  [kernel]        [k] __sbq_wake_up
     0.77%  [kernel]        [k] _raw_spin_unlock_irqrestore
     0.74%  [kernel]        [k] scsi_mq_get_budget
     0.61%  [kernel]        [k] gup_pgd_range
     0.58%  [kernel]        [k] aio_complete_rw
     0.52%  [kernel]        [k] elv_rb_add
     0.50%  [kernel]        [k] llist_add_batch
     0.50%  [kernel]        [k] native_irq_return_iret
     0.48%  [kernel]        [k] blk_rq_map_sg
     0.48%  fio             [.] __fio_gettime
     0.47%  [kernel]        [k] blk_mq_get_tag
     0.44%  [kernel]        [k] blk_mq_dispatch_rq_list
     0.40%  fio             [.] io_u_queued_complete
     0.39%  fio             [.] get_io_u

If you want me to test any top-up patch, please let me know.

BTW, we also want to provide a module parameter so the user can switch
back to the older nr_hw_queue = 1 mode. I will work on that part.

24-SSD setup -

Performance with and without the RFC is almost the same. There is one
specific drop, but that is a generic kernel issue, not related to the
RFC. We can discuss that issue separately.
- The 5.6 kernel does not scale very well when the application keeps a
heavy outstanding count. Example: on the 24-SSD setup with BS = 8K,
QD = 128 gives 1.73M IOPS, which is the h/w max, but QD = 256 gives
1.4M IOPS. It looks like there is some overhead in finding free tags at
the sdev or shost level which leads to the drop in IOPS.

Kashyap