On 1/13/20 6:42 PM, John Garry wrote:
> On 10/01/2020 04:00, Sumit Saxena wrote:
>> On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@xxxxxxx> wrote:
>>>
>>> On 12/9/19 11:10 AM, Sumit Saxena wrote:
>>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@xxxxxxx> wrote:
>>>>>
>>>>> Fusion adapters can steer completions to individual queues, and
>>>>> we now have support for shared host-wide tags.
>>>>> So we can enable multiqueue support for fusion adapters and
>>>>> drop the hand-crafted interrupt affinity settings.
>>>>
>>>> Hi Hannes,
>>>>
>>>> Ming Lei also proposed similar changes in the megaraid_sas driver some
>>>> time back, and they resulted in a performance drop:
>>>> https://patchwork.kernel.org/patch/10969511/
>>>>
>>>> So we will run some performance tests with this patch and update you.
>>>> Thank you.
>>>
>>> I'm aware of the results of Ming Lei's work, but I do hope this patchset
>>> performs better.
>>>
>>> And when you do the performance measurements, can you please run with
>>> both the 'none' I/O scheduler and the 'mq-deadline' I/O scheduler?
>>> I've measured quite a performance improvement when using mq-deadline,
>>> up to the point where I've gotten on-par performance with the original,
>>> non-mq, implementation.
>>> (As a data point, on my setup I've measured about 270k IOPS and 1092
>>> MB/s throughput, running on just 2 SSDs.)
>>>
>>> But thanks for doing a performance test here.
>>
>> Hi Hannes,
>>
>> Sorry for the delay in replying. I observed a few issues with this
>> patchset:
>>
>> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
>> which the IO submitter CPU is affined. Because of this, the IO
>> submission and completion CPUs are different, which causes a
>> performance drop for low-latency workloads.
>
> Hi Sumit,
>
> So the new code has:
>
> megasas_build_ldio_fusion()
> {
>
>   cmd->request_desc->SCSIIO.MSIxIndex =
>       blk_mq_unique_tag_to_hwq(tag);
>
> }
>
> So the value here is the hw queue index from the blk-mq point of view,
> and not the megaraid_sas msix index, as you alluded to.
>
> So we get 80 msix vectors, 8 of which are reserved for
> low_latency_index_start (that's how it seems to me), and we report the
> other 72 as #hw queues = 72 to the SCSI midlayer.
>
> So I think that this should be:
>
>   cmd->request_desc->SCSIIO.MSIxIndex =
>       blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
>

Indeed, that sounds reasonable.
(The whole queue mapping stuff isn't exactly well documented :-( )

I'll be updating the patch.
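Roughly, the hunk would then look like this (a sketch only; how 'tag' is
derived via blk_mq_unique_tag() and the surrounding context are assumed
here rather than taken from the actual patch):

	/* The unique tag encodes both the hw queue index and the per-queue tag. */
	u32 tag = blk_mq_unique_tag(scmd->request);

	/*
	 * blk_mq_unique_tag_to_hwq() returns the blk-mq hw queue index
	 * (0 .. nr_hw_queues - 1). The first low_latency_index_start
	 * MSI-x vectors are reserved and not exposed as hw queues, so
	 * add that offset back to get the adapter's real MSI-x index.
	 */
	cmd->request_desc->SCSIIO.MSIxIndex =
		blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;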
>>
>> lscpu:
>>
>> # lscpu
>> Architecture:        x86_64
>> CPU op-mode(s):      32-bit, 64-bit
>> Byte Order:          Little Endian
>> CPU(s):              72
>> On-line CPU(s) list: 0-71
>> Thread(s) per core:  2
>> Core(s) per socket:  18
>> Socket(s):           2
>> NUMA node(s):        2
>> Vendor ID:           GenuineIntel
>> CPU family:          6
>> Model:               85
>> Model name:          Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
>> Stepping:            4
>> CPU MHz:             3204.246
>> CPU max MHz:         3700.0000
>> CPU min MHz:         1200.0000
>> BogoMIPS:            5400.00
>> Virtualization:      VT-x
>> L1d cache:           32K
>> L1i cache:           32K
>> L2 cache:            1024K
>> L3 cache:            25344K
>> NUMA node0 CPU(s):   0-17,36-53
>> NUMA node1 CPU(s):   18-35,54-71
>> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
>> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
>> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
>> pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave
>> avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3
>> cdp_l3 invpcid_single intel_ppin mba tpr_shadow vnmi flexpriority ept
>> vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid
>> rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb
>> intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
>> cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
>>
>
> [snip]
>
>> 4. This patch removes the below code from the driver, so whatever this
>> piece of code does is broken:
>>
>> -               if (instance->adapter_type >= INVADER_SERIES &&
>> -                   !instance->msix_combined) {
>> -                       instance->msix_load_balance = true;
>> -                       instance->smp_affinity_enable = false;
>> -               }
>
> Does this code need to be re-added? Would this have affected your test?
>

Primarily this patch was required to enable interrupt affinity on my
machine (Lenovo RAID 930-8i). Can you give me some information on why
this code is present in the first place? Some hardware limitation, maybe?

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                               +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer