________________________________________ From: John Meneghini Sent: Wednesday, January 29, 2025 9:54 AM To: Sagar Biradar - C34249; jejb@xxxxxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx Cc: Tomas Henzl; Marco Patalano; Scott Benesh - C33703; Don Brace - C33706; Tom White - C33503; Abhinav Kuchibhotla - C70322 Subject: Re: [PATCH] aacraid: Fix reply queue mapping to CPUs based on IRQ affinity EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe Sagar, I'm having trouble applying this patch to scsi/6.14/staging. Please re-base your patch and submit a v2. You might want to fix the broken "--cc=" argument in your email. /John Hi John, Thanks for the efforts and pointing out the merge issues. I have submitted a v2 of this patch and also added a comment before the fucntion, as you rightly pointed to. Could you please take a look and provide your thoughts? Thanks in advance Applying: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity Patch failed at 0001 aacraid: Fix reply queue mapping to CPUs based on IRQ affinity error: patch failed: drivers/scsi/aacraid/linit.c:1488 error: drivers/scsi/aacraid/linit.c: patch does not apply hint: Use 'git am --show-current-patch=diff' to see the failed patch hint: When you have resolved this problem, run "git am --continue". hint: If you prefer to skip this patch, run "git am --skip" instead. hint: To restore the original branch and stop patching, run "git am --abort". hint: Disable this message with "git config set advice.mergeConflict false" John A. Meneghini Senior Principal Platform Storage Engineer RHEL SST - Platform Storage Group jmeneghi@xxxxxxxxxx On 1/27/25 4:32 PM, Sagar Biradar wrote: > Fixes: "(c5becf57dd56 Revert "scsi: aacraid: Reply queue mapping to CPUs > based on IRQ affinity)" > Original patch: "(9dc704dcc09e scsi: aacraid: Reply queue mapping to > CPUs based on IRQ affinity)" > > Fix a rare I/O hang that arises because of an MSIx vector not having a > mapped online CPU upon receiving completion. > > A new modparam "aac_cpu_offline_feature" to control CPU offlining. > By default, it's disabled (0), but can be enabled during driver load > with: > insmod ./aacraid.ko aac_cpu_offline_feature=1 > Enabling this feature allows CPU offlining but may cause some IO > performance drop. It is recommended to enable it during driver load > as the relevant changes are part of the initialization routine. > > SCSI cmds use the mq_map to get the vector_no via blk_mq_unique_tag() > and blk_mq_unique_tag_to_hwq() - which are setup during the blk_mq init. > For reserved cmds, or the ones before the blk_mq init, use the vector_no > 0, which is the norm since don't yet have a proper mapping to the queues. > > Reviewed-by: Gilbert Wu <gilbert.wu@xxxxxxxxxxxxx> > Reviewed-by: John Meneghini <jmeneghi@xxxxxxxxxx> > Reviewed-by: Tomas Henzl <thenzl@xxxxxxxxxx> > Tested-by: Marco Patalano <mpatalan@xxxxxxxxxx> > Signed-off-by: Sagar Biradar <Sagar.Biradar@xxxxxxxxxxxxx> > --- > drivers/scsi/aacraid/aachba.c | 6 ++++++ > drivers/scsi/aacraid/aacraid.h | 2 ++ > drivers/scsi/aacraid/commsup.c | 10 +++++++++- > drivers/scsi/aacraid/linit.c | 16 ++++++++++++++++ > drivers/scsi/aacraid/src.c | 28 ++++++++++++++++++++++++++-- > 5 files changed, 59 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c > index abf6a82b74af..f325e79a1a01 100644 > --- a/drivers/scsi/aacraid/aachba.c > +++ b/drivers/scsi/aacraid/aachba.c > @@ -328,6 +328,12 @@ MODULE_PARM_DESC(wwn, "Select a WWN type for the arrays:\n" > "\t1 - Array Meta Data Signature (default)\n" > "\t2 - Adapter Serial Number"); > > +int aac_cpu_offline_feature; > +module_param_named(aac_cpu_offline_feature, aac_cpu_offline_feature, int, 0644); > +MODULE_PARM_DESC(aac_cpu_offline_feature, > + "This enables CPU offline feature and may result in IO performance drop in some cases:\n" > + "\t0 - Disable (default)\n" > + "\t1 - Enable"); > > static inline int aac_valid_context(struct scsi_cmnd *scsicmd, > struct fib *fibptr) { > diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h > index 8c384c25dca1..dba7ffc6d543 100644 > --- a/drivers/scsi/aacraid/aacraid.h > +++ b/drivers/scsi/aacraid/aacraid.h > @@ -1673,6 +1673,7 @@ struct aac_dev > u32 handle_pci_error; > bool init_reset; > u8 soft_reset_support; > + u8 use_map_queue; > }; > > #define aac_adapter_interrupt(dev) \ > @@ -2777,4 +2778,5 @@ extern int update_interval; > extern int check_interval; > extern int aac_check_reset; > extern int aac_fib_dump; > +extern int aac_cpu_offline_feature; > #endif > diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c > index ffef61c4aa01..5e12899823ac 100644 > --- a/drivers/scsi/aacraid/commsup.c > +++ b/drivers/scsi/aacraid/commsup.c > @@ -223,8 +223,16 @@ int aac_fib_setup(struct aac_dev * dev) > struct fib *aac_fib_alloc_tag(struct aac_dev *dev, struct scsi_cmnd *scmd) > { > struct fib *fibptr; > + u32 blk_tag; > + int i; > + > + if (aac_cpu_offline_feature == 1) { > + blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd)); > + i = blk_mq_unique_tag_to_tag(blk_tag); > + fibptr = &dev->fibs[i]; > + } else > + fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag]; > > - fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag]; > /* > * Null out fields that depend on being zero at the start of > * each I/O > diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c > index 68f4dbcfff49..56c5ce10555a 100644 > --- a/drivers/scsi/aacraid/linit.c > +++ b/drivers/scsi/aacraid/linit.c > @@ -504,6 +504,15 @@ static int aac_slave_configure(struct scsi_device *sdev) > return 0; > } > > +static void aac_map_queues(struct Scsi_Host *shost) > +{ > + struct aac_dev *aac = (struct aac_dev *)shost->hostdata; > + > + blk_mq_map_hw_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT], > + &aac->pdev->dev, 0); > + aac->use_map_queue = true; > +} > + > /** > * aac_change_queue_depth - alter queue depths > * @sdev: SCSI device we are considering > @@ -1488,6 +1497,7 @@ static const struct scsi_host_template aac_driver_template = { > .bios_param = aac_biosparm, > .shost_groups = aac_host_groups, > .slave_configure = aac_slave_configure, > + .map_queues = aac_map_queues, > .change_queue_depth = aac_change_queue_depth, > .sdev_groups = aac_dev_groups, > .eh_abort_handler = aac_eh_abort, > @@ -1775,6 +1785,11 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) > shost->max_lun = AAC_MAX_LUN; > > pci_set_drvdata(pdev, shost); > + if (aac_cpu_offline_feature == 1) { > + shost->nr_hw_queues = aac->max_msix; > + shost->can_queue = aac->vector_cap; > + shost->host_tagset = 1; > + } > > error = scsi_add_host(shost, &pdev->dev); > if (error) > @@ -1906,6 +1921,7 @@ static void aac_remove_one(struct pci_dev *pdev) > struct aac_dev *aac = (struct aac_dev *)shost->hostdata; > > aac_cancel_rescan_worker(aac); > + aac->use_map_queue = false; > scsi_remove_host(shost); > > __aac_shutdown(aac); > diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c > index 28115ed637e8..befc32353b84 100644 > --- a/drivers/scsi/aacraid/src.c > +++ b/drivers/scsi/aacraid/src.c > @@ -493,6 +493,10 @@ static int aac_src_deliver_message(struct fib *fib) > #endif > > u16 vector_no; > + struct scsi_cmnd *scmd; > + u32 blk_tag; > + struct Scsi_Host *shost = dev->scsi_host_ptr; > + struct blk_mq_queue_map *qmap; > > atomic_inc(&q->numpending); > > @@ -505,8 +509,28 @@ static int aac_src_deliver_message(struct fib *fib) > if ((dev->comm_interface == AAC_COMM_MESSAGE_TYPE3) > && dev->sa_firmware) > vector_no = aac_get_vector(dev); > - else > - vector_no = fib->vector_no; > + else { > + if (aac_cpu_offline_feature == 1) { > + if (!fib->vector_no || !fib->callback_data) { > + if (shost && dev->use_map_queue) { > + qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT]; > + vector_no = qmap->mq_map[raw_smp_processor_id()]; > + } > + /* > + * We hardcode the vector_no for > + * reserved commands as a valid shost is > + * absent during the init > + */ > + else > + vector_no = 0; > + } else { > + scmd = (struct scsi_cmnd *)fib->callback_data; > + blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd)); > + vector_no = blk_mq_unique_tag_to_hwq(blk_tag); > + } > + } else > + vector_no = fib->vector_no; > + } > > if (native_hba) { > if (fib->flags & FIB_CONTEXT_FLAG_NATIVE_HBA_TMF) {