RE: [PATCH V3 15/25] smartpqi: fix host qdepth limit

<Don.Brace@xxxxxxxxxxxxx> · Tue, 15 Dec 2020 20:23:33 +0000

Please see answers below. Hope this helps.

-----Original Message-----
From: Paul Menzel [mailto:pmenzel@xxxxxxxxxxxxx] 
Sent: Monday, December 14, 2020 11:54 AM
To: Don Brace - C33706 <Don.Brace@xxxxxxxxxxxxx>; Kevin Barnett - C33748 <Kevin.Barnett@xxxxxxxxxxxxx>; Scott Teel - C33730 <Scott.Teel@xxxxxxxxxxxxx>; Justin Lindley - C33718 <Justin.Lindley@xxxxxxxxxxxxx>; Scott Benesh - C33703 <Scott.Benesh@xxxxxxxxxxxxx>; Gerry Morong - C33720 <Gerry.Morong@xxxxxxxxxxxxx>; Mahesh Rajashekhara - I30583 <Mahesh.Rajashekhara@xxxxxxxxxxxxx>; hch@xxxxxxxxxxxxx; joseph.szczypek@xxxxxxx; POSWALD@xxxxxxxx; James E. J. Bottomley <jejb@xxxxxxxxxxxxx>; Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Cc: linux-scsi@xxxxxxxxxxxxxxx; it+linux-scsi@xxxxxxxxxxxxx; Donald Buczek <buczek@xxxxxxxxxxxxx>; Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH V3 15/25] smartpqi: fix host qdepth limit

EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe

Dear Don, dear Mahesh,

Am 10.12.20 um 21:35 schrieb Don Brace:
> From: Mahesh Rajashekhara <mahesh.rajashekhara@xxxxxxxxxxxxx>
>
> * Correct scsi-mid-layer sending more requests than
>    exposed host Q depth causing firmware ASSERT issue.
>    * Add host Qdepth counter.

This supposedly fixes the regression between Linux 5.4 and 5.9, which we reported in [1].

     kernel: smartpqi 0000:89:00.0: controller is offline: status code 0x6100c
     kernel: smartpqi 0000:89:00.0: controller offline

Thank you for looking into this issue and fixing it. We are going to test this.

For easily finding these things in the git history or the WWW, it would be great if these log messages could be included (in the future).
DON> Thanks for your suggestion. Well add them in the next time.

Also, that means, that the regression is still present in Linux 5.10, released yesterday, and this commit does not apply to these versions.

DON> They have started 5.10-RC7 now. So possibly 5.11 or 5.12 depending when all of the patches are applied. The patch in question is among 28 other patches.

Mahesh, do you have any idea, what commit caused the regression and why the issue started to show up?
DON> The smartpqi driver sets two scsi_host_template member fields: .can_queue and .nr_hw_queues. But we have not yet converted to host_tagset. So the queue_depth becomes nr_hw_queues * can_queue, which is more than the hw can support. That can be verified by looking at scsi_host.h.
        /*
         * In scsi-mq mode, the number of hardware queues supported by the LLD.
         *
         * Note: it is assumed that each hardware queue has a queue depth of
         * can_queue. In other words, the total queue depth per host
         * is nr_hw_queues * can_queue. However, for when host_tagset is set,
         * the total queue depth is can_queue.
         */

So, until we make this change, the queue_depth change prevents the above issue from happening.
Note: you will see better performance and more evenly distributed performance with this patch applied.

James, Martin, how are regressions handled for the SCSI subsystem?

Regarding the diff, personally, I find the commit message much too terse. `pqi_scsi_queue_command()` will return `SCSI_MLQUEUE_HOST_BUSY` for the case of too many requests. Will that be logged by Linux in some log level? In my opinion it points to a performance problem, and should be at least logged as a notice or warning.
DON> We could add a ratelimited print, but we did not want to interrupt the CPU for logging these messages.
Also, you should see better and more even performance.

Can `ctrl_info->scsi_ml_can_queue` be queried somehow maybe in the logs?
`sudo find /sys -name queue` did not display something interesting.
All I find is /sys/class/scsi_host/host<X>/{cmd_per_lun, can_queue}, but not nr_hw_queues, but there is one queue for each CPU.

[1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
      "Linux 5.9: smartpqi: controller is offline: status code 0x6100c"

> Reviewed-by: Scott Benesh <scott.benesh@xxxxxxxxxxxxx>
> Reviewed-by: Scott Teel <scott.teel@xxxxxxxxxxxxx>
> Reviewed-by: Kevin Barnett <kevin.barnett@xxxxxxxxxxxxx>
> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@xxxxxxxxxxxxx>
> Signed-off-by: Don Brace <don.brace@xxxxxxxxxxxxx>
> ---
>   drivers/scsi/smartpqi/smartpqi.h      |    2 ++
>   drivers/scsi/smartpqi/smartpqi_init.c |   19 ++++++++++++++++---
>   2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/smartpqi/smartpqi.h 
> b/drivers/scsi/smartpqi/smartpqi.h
> index 0b94c755a74c..c3b103b15924 100644
> --- a/drivers/scsi/smartpqi/smartpqi.h
> +++ b/drivers/scsi/smartpqi/smartpqi.h
> @@ -1345,6 +1345,8 @@ struct pqi_ctrl_info {
>       struct work_struct ofa_quiesce_work;
>       u32             ofa_bytes_requested;
>       u16             ofa_cancel_reason;
> +
> +     atomic_t        total_scmds_outstanding;
>   };

What is the difference between the already existing

     atomic_t scsi_cmds_outstanding;

and the new counter?

     atomic_t   total_scmds_outstanding;

The names are quite similar, so different names or a comment might be useful.
DON> total_scmds_outstanding tracks the queue_depth for the entire driver instance.

>
>   enum pqi_ctrl_mode {
> diff --git a/drivers/scsi/smartpqi/smartpqi_init.c 
> b/drivers/scsi/smartpqi/smartpqi_init.c
> index 082b17e9bd80..4e088f47d95f 100644
> --- a/drivers/scsi/smartpqi/smartpqi_init.c
> +++ b/drivers/scsi/smartpqi/smartpqi_init.c
> @@ -5578,6 +5578,8 @@ static inline bool pqi_is_bypass_eligible_request(struct scsi_cmnd *scmd)
>   void pqi_prep_for_scsi_done(struct scsi_cmnd *scmd)
>   {
>       struct pqi_scsi_dev *device;
> +     struct pqi_ctrl_info *ctrl_info;
> +     struct Scsi_Host *shost;
>
>       if (!scmd->device) {
>               set_host_byte(scmd, DID_NO_CONNECT); @@ -5590,7 +5592,11 
> @@ void pqi_prep_for_scsi_done(struct scsi_cmnd *scmd)
>               return;
>       }
>
> +     shost = scmd->device->host;

The function already has a variable `device`, which is assigned “hostdata” though:

     device = scmd->device->hostdata;

This confuses me. Maybe this should be cleaned up in a followup commit, and the variable device be reused above in the `shost` assignment.
DON> host points back to the driver instance of our HW.
DON> hostdata is a driver usable field that points back to our internal device pointer <LUN or HBA>.

> +     ctrl_info = shost_to_hba(shost);
> +
>       atomic_dec(&device->scsi_cmds_outstanding);
> +     atomic_dec(&ctrl_info->total_scmds_outstanding);
>   }
>
>   static bool pqi_is_parity_write_stream(struct pqi_ctrl_info 
> *ctrl_info, @@ -5678,6 +5684,7 @@ static int pqi_scsi_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scm
>       bool raid_bypassed;
>
>       device = scmd->device->hostdata;
> +     ctrl_info = shost_to_hba(shost);
>
>       if (!device) {
>               set_host_byte(scmd, DID_NO_CONNECT); @@ -5686,8 +5693,11 
> @@ static int pqi_scsi_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scm
>       }
>
>       atomic_inc(&device->scsi_cmds_outstanding);
> -
> -     ctrl_info = shost_to_hba(shost);

I believe, style changes (re-ordering) in commits fixing regressions make it harder to backport it.

> +     if (atomic_inc_return(&ctrl_info->total_scmds_outstanding) >
> +             ctrl_info->scsi_ml_can_queue) {
> +             rc = SCSI_MLQUEUE_HOST_BUSY;
> +             goto out;
> +     }
>
>       if (pqi_ctrl_offline(ctrl_info) || pqi_device_in_remove(device)) {
>               set_host_byte(scmd, DID_NO_CONNECT); @@ -5730,8 +5740,10 
> @@ static int pqi_scsi_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scm
>       }
>
>   out:
> -     if (rc)
> +     if (rc) {
>               atomic_dec(&device->scsi_cmds_outstanding);
> +             atomic_dec(&ctrl_info->total_scmds_outstanding);
> +     }
>
>       return rc;
>   }
> @@ -8054,6 +8066,7 @@ static struct pqi_ctrl_info 
> *pqi_alloc_ctrl_info(int numa_node)
>
>       INIT_WORK(&ctrl_info->event_work, pqi_event_worker);
>       atomic_set(&ctrl_info->num_interrupts, 0);
> +     atomic_set(&ctrl_info->total_scmds_outstanding, 0);
>
>       INIT_DELAYED_WORK(&ctrl_info->rescan_work, pqi_rescan_worker);
>       INIT_DELAYED_WORK(&ctrl_info->update_time_work, 
> pqi_update_time_worker);

Kind regards,

Paul