On 3/3/20 12:53 PM, John Garry wrote:
>>> And for these low-latency queues, I figure that the issue is not just
>>> polling vs interrupt, but indeed how fast the HW queue can consume SQEs.
>>> A HW queue may only be able to consume at a limited rate - that's why we
>>> segregate.
>> Yes, there is no polling in any of the HW queues. High IOPs queues have
>> interrupt coalescing enabled, whereas low latency queues do not have
>> interrupt coalescing. The megaraid_sas driver chooses which of these two
>> sets of queues to use depending on the workload. For a latency-oriented
>> workload, the driver uses the low latency queues, and for an IOPs
>> profile, it uses the high IOPs queues.
>>>
>>> As an aside, that is actually an issue for blk-mq. For 1 to many HW
>>> queue-to-CPU mapping, limiting many CPUs to a single queue can limit IOPs
>>> since HW queues can only consume at a limited rate.
>> We were able to achieve the performance target for the latest-generation
>> MegaRAID controller with this model: a small set of HW queues mapped to
>> local NUMA CPUs, and low latency queues with a one-to-one mapping to
>> CPUs. This is the default queue segregation behavior of the megaraid_sas
>> driver, to satisfy our IOPs and latency requirements together.
>> However, we provide a module parameter, "perf_mode", to tune the queue
>> behavior, i.e. turning interrupt coalescing on or off on all HW queues,
>> in which case this one-to-many queue-to-CPU mapping does not happen.
>
> Hi Sumit,
>
> OK, I have a rough idea of the concept. And again I'd say megaraid sas
> may not be a good candidate to expose > 1 HW queues, as we hide HW
> queues and don't maintain the symmetry with the blk-mq layer.
>
> Indeed, I do not even expect a performance increase in exposing > 1 HW
> queues since the driver already uses the reply map + managed interrupts.
>
> The main reason for that change in some drivers - apart from losing the
> duplicated ugliness of the reply map - is to leverage the blk-mq feature
> to drain a hctx for CPU hotplug [0] - is this something which megaraid
> sas is vulnerable to and would benefit from?
>
I would guess so.
Megaraid_sas (much like mpt3sas) has a mailbox interface, requiring you
to just write the address of the command into it.
The command itself carries the information about which MSI-X interrupt
the firmware should post completions on, so if the CPU serving that
interrupt is gone we're in trouble, as we'll never see the completion.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Kernel Storage Architect
hare@xxxxxxx                   +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
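A small model may help make the hazard Hannes describes concrete. The Python sketch below is a toy simulation, not megaraid_sas code. The CPU/NUMA layout, the `HIGH_IOPS_DEPTH` threshold, and the names `pick_vector`, `cpus_for_vector`, `Cmd` and `offline_cpu` are all hypothetical. It assumes the high IOPs queues are shared by the CPUs of one NUMA node and the low latency queues map one-to-one to CPUs, as Sumit describes. It also assumes the workload is judged by an outstanding-I/O count, which is only one plausible heuristic. Each command records the MSI-X vector chosen at submission, mirroring how that vector travels inside the command written to the mailbox. Offlining the last online CPU of that vector strands the completion unless in-flight commands are drained first, in the spirit of the blk-mq hctx drain John mentions.

```python
# Toy model only: hypothetical layout and names, not megaraid_sas code.
from __future__ import annotations

from dataclasses import dataclass

NR_CPUS = 8          # hypothetical machine: 8 CPUs ...
CPUS_PER_NODE = 4    # ... split across 2 NUMA nodes
NR_NODES = NR_CPUS // CPUS_PER_NODE
HIGH_IOPS_DEPTH = 8  # hypothetical outstanding-I/O threshold


def pick_vector(cpu: int, outstanding: int) -> int:
    """Choose the MSI-X vector the firmware should complete on.

    Vectors 0..NR_NODES-1: high IOPs queues (coalescing on), shared per node.
    Vectors NR_NODES..NR_NODES+NR_CPUS-1: low latency queues, one per CPU.
    """
    if outstanding > HIGH_IOPS_DEPTH:
        return cpu // CPUS_PER_NODE
    return NR_NODES + cpu


def cpus_for_vector(vec: int) -> range:
    """CPUs in the (managed) affinity mask of vector `vec`."""
    if vec < NR_NODES:
        return range(vec * CPUS_PER_NODE, (vec + 1) * CPUS_PER_NODE)
    return range(vec - NR_NODES, vec - NR_NODES + 1)


@dataclass
class Cmd:
    tag: int
    vector: int        # fixed at submission, carried inside the command
    done: bool = False


def offline_cpu(cpu: int, online: set[int], inflight: list[Cmd],
                drain: bool) -> list[int]:
    """Take `cpu` offline and return tags whose completion would never be seen.

    With drain=True, commands whose vector is about to lose its last online
    CPU are marked complete first, modelling a hctx drain before hotplug.
    """
    remaining = online - {cpu}
    stranded = []
    for cmd in inflight:
        if cmd.done or any(c in remaining for c in cpus_for_vector(cmd.vector)):
            continue  # already completed, or another online CPU still serves it
        if drain:
            cmd.done = True  # reaped while `cpu` was still online
        else:
            stranded.append(cmd.tag)
    online.discard(cpu)
    return stranded


if __name__ == "__main__":
    for drain in (False, True):
        online = set(range(NR_CPUS))
        cmds = [Cmd(0, pick_vector(3, outstanding=1)),    # low latency vector 5
                Cmd(1, pick_vector(3, outstanding=32))]   # high IOPs vector 0
        print(f"drain={drain}: stranded={offline_cpu(3, online, cmds, drain)}")
    # drain=False: stranded=[0]
    # drain=True: stranded=[]
```

In this model, a one-to-one low latency vector loses its only CPU as soon as that CPU goes offline, whereas a shared high IOPs vector keeps working until every CPU of its node is offline. That asymmetry is why the check that matters is "is this the last online CPU in the queue's affinity mask".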