RE: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

> > -----Original Message-----
> > From: John Garry [mailto:john.garry@xxxxxxxxxx]
> > Sent: Tuesday, October 5, 2021 7:05 PM
> > To: Jens Axboe <axboe@xxxxxxxxx>; kashyap.desai@xxxxxxxxxxxx
> > Cc: linux-block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > ming.lei@xxxxxxxxxx; hare@xxxxxxx; linux-scsi@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory
> > footprint for shared sbitmap
> >
> > On 05/10/2021 13:35, Jens Axboe wrote:
> > >> Baseline is 1b2d1439fc25 (block/for-next) Merge branch
> > >> 'for-5.16/io_uring' into for-next
> > > Let's get this queued up for testing, thanks John.
> >
> > Cheers, appreciated
> >
> > @Kashyap, You mentioned that when testing you saw a performance
> > regression from v5.11 -> v5.12 - any idea on that yet? Can you
> > describe the scenario, like IO scheduler and how many disks and the
> > type? Does disabling host_tagset_enable restore performance?
>
> John - I am still working on this. System was not available due to some
> other debugging.

John -

I tested this patchset on 5.15-rc4 (master) -
https://github.com/torvalds/linux.git

#1 I noticed a performance regression with the mq-deadline scheduler which is
not related to this series. I will bisect and share more detail on this issue
separately.
#2 With this patchset, I noticed one issue: CPU usage is high in certain
cases.

I ran the tests on the same setup using the same h/w, an Aero MegaRAID
controller.

Test #1: Total of 24 SAS SSDs in JBOD mode.
(numactl -N 1 fio 24.fio --rw=randread --bs=4k --iodepth=256 --numjobs=1
--ioscheduler=none/mq-deadline)
No performance regression is noticed with this patchset; I can get 3.1M IOPs
(the max IOPs on this setup). However, I noticed a CPU hogging issue when the
iodepth from the application is high.

CPU usage data (from top) -
%Node1 :  6.4 us, 57.5 sy,  0.0 ni, 23.7 id,  0.0 wa,  0.0 hi, 12.4 si,  0.0 st

Perf top data -
    19.11%  [kernel]        [k] native_queued_spin_lock_slowpath
     4.72%  [megaraid_sas]  [k] complete_cmd_fusion
     3.70%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
     2.76%  [megaraid_sas]  [k] megasas_build_ldio_fusion
     2.16%  [kernel]        [k] syscall_return_via_sysret
     2.16%  [kernel]        [k] entry_SYSCALL_64
     1.87%  [megaraid_sas]  [k] megasas_queue_command
     1.58%  [kernel]        [k] io_submit_one
     1.53%  [kernel]        [k] llist_add_batch
     1.51%  [kernel]        [k] blk_mq_find_and_get_req
     1.43%  [kernel]        [k] llist_reverse_order
     1.42%  [kernel]        [k] scsi_complete
     1.18%  [kernel]        [k] blk_mq_rq_ctx_init.isra.51
     1.17%  [kernel]        [k] _raw_spin_lock_irqsave
     1.15%  [kernel]        [k] blk_mq_get_driver_tag
     1.09%  [kernel]        [k] read_tsc
     0.97%  [kernel]        [k] native_irq_return_iret
     0.91%  [kernel]        [k] scsi_queue_rq
     0.89%  [kernel]        [k] blk_complete_reqs

Perf top data indicates lock contention around the "blk_mq_find_and_get_req"
call; the call chain and a rough sketch of that path follow below.

1.31%     1.31%  kworker/57:1H-k  [kernel.vmlinux]
     native_queued_spin_lock_slowpath
     ret_from_fork
     kthread
     worker_thread
     process_one_work
     blk_mq_timeout_work
     blk_mq_queue_tag_busy_iter
     bt_iter
     blk_mq_find_and_get_req
     _raw_spin_lock_irqsave
     native_queued_spin_lock_slowpath
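
For context, below is a simplified sketch of what I believe that lookup does,
based on my reading of block/blk-mq-tag.c around v5.15 (names and details are
approximate, not verbatim kernel code). The point is that every per-tag lookup
from the iterator takes tags->lock, and with a shared sbitmap all hw queues
point at the same tags structure, hence the same spinlock:

/*
 * Simplified sketch (not verbatim kernel code) of the per-tag lookup
 * done by bt_iter(): each lookup takes tags->lock, and with a shared
 * sbitmap every hw queue shares the same tags, so every iteration
 * pass contends on the same spinlock.
 */
static struct request *find_and_get_req_sketch(struct blk_mq_tags *tags,
					       unsigned int bitnr)
{
	struct request *rq;
	unsigned long flags;

	/* the lock showing up under native_queued_spin_lock_slowpath */
	spin_lock_irqsave(&tags->lock, flags);
	rq = tags->rqs[bitnr];
	/* only return a request that still owns this tag and is still live */
	if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
		rq = NULL;
	spin_unlock_irqrestore(&tags->lock, flags);
	return rq;
}

If blk_mq_timeout_work() ends up walking the shared tags once per hw queue,
the same lock is taken for every tag on every hw queue per pass, which would
line up with the spin-lock time in the profile above - but that is only my
reading of the profile, not a confirmed root cause.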


Kernel v5.14 Data -

%Node1 :  8.4 us, 31.2 sy,  0.0 ni, 43.7 id,  0.0 wa,  0.0 hi, 16.8 si,  0.0 st
     4.46%  [kernel]       [k] complete_cmd_fusion
     3.69%  [kernel]       [k] megasas_build_and_issue_cmd_fusion
     2.97%  [kernel]       [k] blk_mq_find_and_get_req
     2.81%  [kernel]       [k] megasas_build_ldio_fusion
     2.62%  [kernel]       [k] syscall_return_via_sysret
     2.17%  [kernel]       [k] __entry_text_start
     2.01%  [kernel]       [k] io_submit_one
     1.87%  [kernel]       [k] scsi_queue_rq
     1.77%  [kernel]       [k] native_queued_spin_lock_slowpath
     1.76%  [kernel]       [k] scsi_complete
     1.66%  [kernel]       [k] llist_reverse_order
     1.63%  [kernel]       [k] _raw_spin_lock_irqsave
     1.61%  [kernel]       [k] llist_add_batch
     1.39%  [kernel]       [k] aio_complete_rw
     1.37%  [kernel]       [k] read_tsc
     1.07%  [kernel]       [k] blk_complete_reqs
     1.07%  [kernel]       [k] native_irq_return_iret
     1.04%  [kernel]       [k] __x86_indirect_thunk_rax
     1.03%  fio            [.] __fio_gettime
     1.00%  [kernel]       [k] flush_smp_call_function_queue


Test #2: Three VDs (each VD consists of 8 SAS SSDs).
(numactl -N 1 fio
3vd.fio --rw=randread --bs=4k --iodepth=32 --numjobs=8
--ioscheduler=none/mq-deadline)

There is a performance regression, but it is not due to this patchset.
Kernel v5.11 gives 2.1M IOPs on mq-deadline, but 5.15 (without this patchset)
gives 1.8M IOPs.
In this test I did not notice the CPU issue mentioned in Test #1.

In general, I noticed that host_busy is incorrect once I apply this patchset.
It should never exceed can_queue, but the sysfs host_busy value is very high
while IOs are running. This issue appears only after applying this patchset.
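
In case it helps narrow this down: as far as I can tell, the sysfs host_busy
value is not a stored counter but is recomputed on every read by walking the
busy tags of the host-wide tag_set and counting in-flight commands. A rough
sketch of that accounting, from my reading of the v5.15 SCSI code (simplified,
names approximate, not verbatim kernel code):

/* Count commands marked in-flight while walking the busy tags of the
 * host's tag set; each in-flight command should be visited once. */
static bool sketch_check_in_flight(struct request *rq, void *data, bool reserved)
{
	int *count = data;
	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);

	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
		(*count)++;
	return true;	/* keep iterating */
}

static int sketch_host_busy(struct Scsi_Host *shost)
{
	int count = 0;

	/* a single walk over the host-wide tag set */
	blk_mq_tagset_busy_iter(&shost->tag_set, sketch_check_in_flight, &count);
	return count;
}

If, with this patchset, the shared tags end up being walked once per hw queue
instead of once per tag set, the same command would be counted nr_hw_queues
times, which would match host_busy reading far above can_queue. That is only
a guess from the symptom, though.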

Does this patchset only change the behavior of drivers with <shared_host_tag>
enabled? Will there be any impact on the mpi3mr driver? I can test that as
well.

Kashyap

>
> >
> > From checking differences between those kernels, I don't see anything
> > directly relevant in sbitmap support or in the megaraid sas driver.
> >
> > Thanks,
> > John
