High-performance HBAs under the SCSI layer can reach more than 3.0M IOPS.
The MegaRAID Aero controller can achieve 3.3M IOPS. In the future there may
be a requirement to reach 6.0M+ IOPS. One of the key bottlenecks is
servicing an interrupt for each IO completion.

The block layer has the blk_poll interface, which can be used as a
zero-interrupt poll queue. Extending blk_poll to the SCSI mid layer helps,
and I was able to get the same max IOPS as the nvme <poll_queues> interface.

blk_poll is currently exposed through the io_uring interface, and adopting
it requires application changes.

This RFC covers the logic for handling IRQ polling in the driver using the
threaded ISR interface.

The changes in this RFC are described below:

- Use the threaded ISR interface.
- The primary ISR handler runs in h/w interrupt context.
- The secondary ISR handler runs in thread context.
- The driver drains the reply queue from the primary ISR handler for every
  interrupt it receives.
- The primary handler decides whether or not to call the secondary handler.
  This interface can be optimized later if the driver or the block layer
  keeps submission and completion stats per h/w queue. The current
  megaraid_sas driver is single h/w queue based, so I have picked the
  decision maker below:
  if a SCSI device has more than 8 outstanding commands, mark that MSI-x
  index as "attempt_irq_poll".
- Every time the secondary ISR handler is about to run, the primary handler
  disables the IRQ. Once the secondary handler completes its task, it
  re-enables the IRQ. If there is no completion, wait for some time and
  retry polling, since enabling/disabling the IRQ is an expensive operation.
  Without this wait in threaded IRQ polling, the submitter would not get
  the CPU to pump more IO.

The NVMe driver is also trying something similar to reduce ISR overhead.
Discussion started in Dec-2019:
https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@xxxxxxxxxx/

Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
---
 drivers/scsi/megaraid/megaraid_sas.h        |  3 ++
 drivers/scsi/megaraid/megaraid_sas_base.c   | 11 +++--
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 73 +++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index 83d8c4c..f4f898a 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -2212,6 +2212,7 @@ struct megasas_irq_context {
 	struct irq_poll irqpoll;
 	bool irq_poll_scheduled;
 	bool irq_line_enable;
+	bool attempt_irq_poll;
 };
 
 struct MR_DRV_SYSTEM_INFO {
@@ -2709,4 +2710,6 @@ int megasas_adp_reset_wait_for_ready(struct megasas_instance *instance,
 				     int ocr_context);
 int megasas_irqpoll(struct irq_poll *irqpoll, int budget);
 void megasas_dump_fusion_io(struct scsi_cmnd *scmd);
+irqreturn_t megasas_irq_check_fusion(int irq, void *devp);
+irqreturn_t megasas_irq_fusion_thread(int irq, void *devp);
 #endif /*LSI_MEGARAID_SAS_H */
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index fd4b5ac..6120bd0 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5585,7 +5585,7 @@ megasas_setup_irqs_ioapic(struct megasas_instance *instance)
 static int
 megasas_setup_irqs_msix(struct megasas_instance *instance, u8 is_probe)
 {
-	int i, j;
+	int i, j, ret;
 	struct pci_dev *pdev;
 
 	pdev = instance->pdev;
@@ -5596,9 +5596,12 @@ megasas_setup_irqs_msix(struct megasas_instance *instance, u8 is_probe)
 		instance->irq_context[i].MSIxIndex = i;
 		snprintf(instance->irq_context[i].name, MEGASAS_MSIX_NAME_LEN, "%s%u-msix%u",
 			"megasas", instance->host->host_no, i);
-		if (request_irq(pci_irq_vector(pdev, i),
-			instance->instancet->service_isr, 0, instance->irq_context[i].name,
-			&instance->irq_context[i])) {
+		ret = request_threaded_irq(pci_irq_vector(pdev, i),
+				megasas_irq_check_fusion,
+				megasas_irq_fusion_thread, IRQF_ONESHOT ,
+				instance->irq_context[i].name,
+				&instance->irq_context[i]);
+		if (ret) {
 			dev_err(&instance->pdev->dev,
 				"Failed to register IRQ for vector %d.\n", i);
 			for (j = 0; j < i; j++)
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index f3b36fd..5000c36 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -371,6 +371,7 @@ megasas_get_msix_index(struct megasas_instance *instance,
 		       struct megasas_cmd_fusion *cmd,
 		       u8 data_arms)
 {
+	struct megasas_irq_context *irq_ctx;
 	int sdev_busy;
 
 	/* nr_hw_queue = 1 for MegaRAID */
@@ -391,6 +392,12 @@ megasas_get_msix_index(struct megasas_instance *instance,
 	else
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			instance->reply_map[raw_smp_processor_id()];
+
+	irq_ctx = &instance->irq_context[cmd->request_desc->SCSIIO.MSIxIndex];
+
+	/* More outstanding IOs, so let's attempt polling on this reply queue.*/
+	if (sdev_busy > data_arms * MR_DEVICE_HIGH_IOPS_DEPTH)
+		irq_ctx->attempt_irq_poll = true;
 }
 
 /**
@@ -2754,6 +2761,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 	u16 ld;
 	u32 start_lba_lo, start_lba_hi, device_id, datalength = 0;
 	u32 scsi_buff_len;
+	struct megasas_irq_context *irq_ctx;
 	struct MPI2_RAID_SCSI_IO_REQUEST *io_request;
 	struct IO_REQUEST_INFO io_info;
 	struct fusion_context *fusion;
@@ -3101,6 +3109,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	u16 pd_index = 0;
 	u16 os_timeout_value;
 	u16 timeout_limit;
+	struct megasas_irq_context *irq_ctx;
 	struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
 	struct RAID_CONTEXT *pRAID_Context;
 	struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
@@ -3817,6 +3826,70 @@ static irqreturn_t megasas_isr_fusion(int irq, void *devp)
 			? IRQ_HANDLED : IRQ_NONE;
 }
 
+/*
+ * megasas_irq_fusion_thread:
+ */
+irqreturn_t megasas_irq_fusion_thread(int irq, void *devp)
+{
+	int total_count = 0, num_completed = 0;
+	struct megasas_irq_context *irq_context = devp;
+	struct megasas_instance *instance = irq_context->instance;
+
+	do {
+		num_completed = complete_cmd_fusion(instance, irq_context->MSIxIndex, irq_context);
+
+		/* If there is no completion, let's sleep and poll once again
+		 * since enable/disable irq is expensive operation.
+		 * It will not help polling without any sleep since submission and
+		 * completion happens on the same cpu.
+		 * Polling in tight loop blocks activity on submission.
+		 */
+		if (!num_completed) {
+			usleep_range(2, 20);
+			num_completed = complete_cmd_fusion(instance, irq_context->MSIxIndex, irq_context);
+		}
+
+		total_count += num_completed;
+	} while (num_completed && total_count < instance->cur_can_queue);
+
+	irq_context->attempt_irq_poll = false;
+	enable_irq(irq_context->os_irq);
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * megasas_irq_check_fusion:
+ *
+ * For threaded interrupts, this handler will be called and its job is to
+ * complete command in first attempt before it calls threaded isr handler.
+ *
+ * Threaded ISR handler will be called if there is a prediction of more
+ * completion pending.
+ */
+irqreturn_t megasas_irq_check_fusion(int irq, void *devp)
+{
+	irqreturn_t ret;
+	struct megasas_irq_context *irq_context = devp;
+	struct megasas_instance *instance = irq_context->instance;
+
+	if (instance->mask_interrupts)
+		return IRQ_NONE;
+
+	/* First attempt from primary handler */
+	ret = megasas_isr_fusion(irq, devp);
+
+	/* Primary handler predict more IO in completion queue,
+	 * so let's use threaded irq poll.
+	 */
+	if (!irq_context->attempt_irq_poll)
+		return IRQ_HANDLED;
+
+	disable_irq_nosync(irq_context->os_irq);
+	return IRQ_WAKE_THREAD;
+}
+
+
 /**
  * build_mpt_mfi_pass_thru - builds a cmd fo MFI Pass thru
  * @instance:			Adapter soft state
-- 
2.9.5
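
For readers who want to reason about the polling loop outside the kernel, the following is a minimal user-space sketch of the control flow in megasas_irq_fusion_thread(). It is not driver code. The names mock_queue, mock_drain, mock_sleep and poll_thread are hypothetical stand-ins: mock_drain plays the role of complete_cmd_fusion(), mock_sleep replaces usleep_range(2, 20) with a counter, and the irq_enabled flag stands in for enable_irq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one reply queue (not a driver structure). */
struct mock_queue {
	const int *arrivals;	/* completions visible on each drain call */
	int n_arrivals;		/* number of entries in arrivals[] */
	int next;		/* index of the next drain call */
	int sleeps;		/* how many times the poller backed off */
	bool irq_enabled;	/* stands in for the IRQ line state */
};

/* Stand-in for complete_cmd_fusion(): returns completions reaped now. */
static int mock_drain(struct mock_queue *q)
{
	if (q->next >= q->n_arrivals)
		return 0;
	return q->arrivals[q->next++];
}

/* Stand-in for usleep_range(2, 20): only counts the back-off. */
static void mock_sleep(struct mock_queue *q)
{
	q->sleeps++;
}

/*
 * Same loop shape as the threaded handler in the patch: drain, and if
 * nothing completed, back off once and drain again. Stop when a retry
 * still finds nothing, or when the budget (can_queue) is reached, then
 * re-enable the interrupt.
 */
static int poll_thread(struct mock_queue *q, int can_queue)
{
	int total = 0, done;

	assert(can_queue > 0);

	do {
		done = mock_drain(q);
		if (!done) {
			mock_sleep(q);
			done = mock_drain(q);
		}
		total += done;
	} while (done && total < can_queue);

	q->irq_enabled = true;
	return total;
}
```

With an arrival pattern of {4, 0, 3, 0, 0} and a large budget, the loop reaps 4, backs off once and reaps 3, then backs off a second time, finds nothing, and exits with the interrupt re-enabled. The budget check happens only after each pass, so the total can overshoot can_queue by up to one drain's worth of completions.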