On 2/17/20 12:55 PM, Kashyap Desai wrote:
> High-performance HBAs under the SCSI layer can reach more than 3.0M IOPS;
> the MegaRAID Aero controller can achieve 3.3M IOPS, and future requirements
> may push past 6.0M IOPS.
> One of the key bottlenecks is serving an interrupt for each IO completion.
> The block layer has the blk_poll interface, which can be used as a
> zero-interrupt poll queue. Extending blk_poll to the SCSI midlayer helps;
> with it I was able to reach the same max IOPS as the nvme <poll_queues>
> interface.
>
> blk_poll is currently tied to the io_uring interface, and adopting it
> requires application changes.
>
> This RFC covers the logic of handling irq polling in the driver using the
> threaded ISR interface. The changes in this RFC are described below:
>
> - Use the threaded ISR interface.
>   - The primary ISR handler runs from hard-irq context.
>   - The secondary ISR handler runs from thread context.
> - The driver drains the reply queue from the primary ISR handler for every
>   interrupt it receives.
> - The primary handler decides whether to call the secondary handler.
>   This interface can be optimized later if the driver or the block layer
>   keeps submission and completion stats per h/w queue. The current
>   megaraid_sas driver is single-h/w-queue based, so I picked the following
>   heuristic: if a SCSI device has more than 8 outstanding commands, mark
>   that msix index as "attempt_irq_poll".
> - Every time the secondary ISR handler runs, the primary handler disables
>   the IRQ; once the secondary handler completes its work, it re-enables
>   the IRQ. If there is no completion, wait for some time and retry
>   polling, since enabling/disabling an irq is an expensive operation.
>   Without this wait in threaded IRQ polling, the submitter would get no
>   CPU time to pump more IO.
>
> The NVMe driver is also trying something similar to reduce ISR overhead;
> discussion started in Dec-2019:
> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@xxxxxxxxxx/

I would actually like to have something more generic; threaded irq polling
looks like something most high-performance drivers would benefit from. So I
think it might be worthwhile to post a topic for LSF/MM to have a broader
discussion.

Thing is, I wonder if it wouldn't be more efficient for high-performance
devices to first try for completions in-line, ie start with polling _first_,
then enable the interrupt handler, and then shift back to polling for more
completions. But this would involve timeouts which would probably need to be
tweaked per hardware/driver; one could even look into disabling individual
stages completely (if you disable the first and the last stage you are back
to the original implementation; if you disable only the first, it is the
algorithm you proposed).

But as I said, that probably warrants a wider discussion.

Cheers,

Hannes
--
Dr. Hannes Reinecke            Kernel Storage Architect
hare@xxxxxxx                   +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
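
For reference, a minimal sketch of the primary/secondary handler split the
RFC describes, assuming a hypothetical driver context: the mydrv_* names,
the stubbed reply processing, and the 50-100us backoff window are
illustrative assumptions, not the actual megaraid_sas code.

#include <linux/interrupt.h>
#include <linux/delay.h>
#include <linux/atomic.h>

/* Hypothetical per-vector context; names are illustrative only. */
struct mydrv_queue {
	int irq;
	atomic_t outstanding;	/* commands in flight on this vector */
	/* reply ring pointers etc. elided */
};

#define MYDRV_POLL_THRESHOLD	8	/* outstanding-command heuristic from the RFC */

/* Stub: walk the reply ring, complete commands, decrement q->outstanding.
 * Hardware-specific; returns the number of completions reaped.
 */
static int mydrv_process_replies(struct mydrv_queue *q)
{
	return 0;
}

static bool mydrv_should_poll(struct mydrv_queue *q)
{
	return atomic_read(&q->outstanding) > MYDRV_POLL_THRESHOLD;
}

/* Primary handler, hard-irq context: always drain the reply queue once,
 * then decide whether to hand off to the polling thread.
 */
static irqreturn_t mydrv_isr(int irq, void *data)
{
	struct mydrv_queue *q = data;

	mydrv_process_replies(q);

	if (!mydrv_should_poll(q))
		return IRQ_HANDLED;

	/* Mask the vector while the thread polls; _nosync because we are
	 * running inside the handler for this very irq line.
	 */
	disable_irq_nosync(irq);
	return IRQ_WAKE_THREAD;
}

/* Secondary handler, thread context: poll until the backlog drains,
 * sleeping briefly on empty runs instead of bouncing enable/disable_irq,
 * which is expensive and would starve the submitting CPU.
 */
static irqreturn_t mydrv_isr_thread(int irq, void *data)
{
	struct mydrv_queue *q = data;

	while (mydrv_should_poll(q)) {
		if (!mydrv_process_replies(q))
			usleep_range(50, 100);	/* backoff window: assumed */
	}

	enable_irq(irq);
	return IRQ_HANDLED;
}

static int mydrv_setup_irq(struct mydrv_queue *q)
{
	/* No IRQF_ONESHOT: the handlers mask and unmask the vector
	 * themselves, as the RFC describes, so the line stays disabled
	 * for a whole polling run rather than per hard-irq invocation.
	 */
	return request_threaded_irq(q->irq, mydrv_isr, mydrv_isr_thread,
				    0, "mydrv", q);
}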