On 22.01.21 00:32, Martin Wilck wrote:
> On Thu, 2021-01-21 at 13:05 +0000, John Garry wrote:
> > Confirmed my suspicions - it looks like the host is sent more commands
> > than it can handle. We would need many disks to see this issue though,
> > which you have.
> >
> > So for stable kernels, 6eb045e092ef is not in 5.4. Next is 5.10, and I
> > suppose it could be simply fixed by setting .host_tagset in scsi host
> > template there.
>
> If it's really just that, it should be easy enough to verify.
> @Donald, you'd need to test with a 5.10 kernel, and after reproducing
> the issue, add
>
>         .host_tagset = 1,
>
> to the definition of pqi_driver_template in
> drivers/scsi/smartpqi/smartpqi_init.c.
>
> You don't need a patch to test that, I believe. Would you be able to do
> this test?

Sorry, I had overlooked this request. I am reviewing this thread now because I want to switch our production systems to 5.10 LTS.
I could reproduce the problem with Linux 5.10.22. With `.host_tagset = 1` set, the problem disappeared. Additionally, 5.10.22 with the patch has been running on two previously affected production systems for over 24 hours now. Statistics suggest that these systems would very likely have triggered the problem in that time frame if the patch didn't work.
So I think this is a working fix which should go to 5.10 stable.
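
As a rough illustration of why the flag matters, here is a toy userspace model, not kernel code. It assumes that without `.host_tagset` every hardware queue hands out tags from its own space of `can_queue` entries, while with `.host_tagset = 1` all queues draw from one shared space. The `toy_*` names, `NR_HW_QUEUES` and `CAN_QUEUE` are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_HW_QUEUES 4	/* hypothetical number of hardware queues */
#define CAN_QUEUE    8	/* hypothetical limit the controller can handle */

/* Toy host: only tracks how many commands each queue has in flight. */
struct toy_host {
	bool host_tagset;
	int in_flight[NR_HW_QUEUES];
};

static int toy_total_in_flight(const struct toy_host *h)
{
	int total = 0;

	for (int q = 0; q < NR_HW_QUEUES; q++)
		total += h->in_flight[q];
	return total;
}

/* Try to take a tag on hardware queue q; returns true on success. */
static bool toy_get_tag(struct toy_host *h, int q)
{
	assert(q >= 0 && q < NR_HW_QUEUES);

	/* Shared tag space: count all queues. Per-queue space: only this one. */
	int used = h->host_tagset ? toy_total_in_flight(h) : h->in_flight[q];

	if (used >= CAN_QUEUE)
		return false;
	h->in_flight[q]++;
	return true;
}

/* Saturate every queue and report how many commands ended up in flight. */
static int toy_saturate(bool host_tagset)
{
	struct toy_host h = { .host_tagset = host_tagset };

	for (int q = 0; q < NR_HW_QUEUES; q++)
		while (toy_get_tag(&h, q))
			;
	return toy_total_in_flight(&h);
}
```

In this model `toy_saturate(false)` ends with `NR_HW_QUEUES * CAN_QUEUE` commands in flight, which is more than the host limit, while `toy_saturate(true)` stops at `CAN_QUEUE`. That matches the symptom described above only under the stated assumption; the real tag allocation in blk-mq is more involved.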
Best
Donald
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 9d0229656681f..be429a7cb1512 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -6571,6 +6571,7 @@ static struct scsi_host_template pqi_driver_template = {
 	.map_queues = pqi_map_queues,
 	.sdev_attrs = pqi_sdev_attrs,
 	.shost_attrs = pqi_shost_attrs,
+	.host_tagset = 1,
 };
 
 static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)