On 28/03/2022 09:42, Ajish Koshy wrote:
When running the driver on servers with more than 32 CPUs, we faced command
timeouts. This is because we were not getting completions for commands
submitted on IQ32 - IQ63.
Set the E64Q bit in the MPI main configuration table to enable the upper
inbound and outbound queues, 32 to 63.
Add a 500ms delay after successful MPI initialization, as mentioned in the
controller datasheet.
Signed-off-by: Ajish Koshy <Ajish.Koshy@xxxxxxxxxxxxx>
Signed-off-by: Viswas G <Viswas.G@xxxxxxxxxxxxx>
---
I'm not sure if this change was meant to help, but I still see timeouts
on my 96-core arm64 machine with this change; see the log at the bottom.
I still get the feeling that this issue I mentioned is timing-related, as
it goes away when I enable lots of heavy debug options (like kasan,
kmemleak, prove_locking, etc.).
drivers/scsi/pm8001/pm80xx_hwi.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index b92e82a576e3..f04c6c589615 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -766,6 +766,10 @@ static void init_default_table_values(struct pm8001_hba_info *pm8001_ha)
pm8001_ha->main_cfg_tbl.pm80xx_tbl.pcs_event_log_severity = 0x01;
pm8001_ha->main_cfg_tbl.pm80xx_tbl.fatal_err_interrupt = 0x01;
+ /* Enable higher IQs and OQs, 32 to 63, bit 16*/
+ if (pm8001_ha->max_q_num > 32)
+ pm8001_ha->main_cfg_tbl.pm80xx_tbl.fatal_err_interrupt |=
+ (1 << 16);
/* Disable end to end CRC checking */
pm8001_ha->main_cfg_tbl.pm80xx_tbl.crc_core_dump = (0x1 << 16);
@@ -1027,6 +1031,9 @@ static int mpi_init_check(struct pm8001_hba_info *pm8001_ha)
if (0x0000 != gst_len_mpistate)
return -EBUSY;
+ /* Wait for 500ms after successful MPI initialization*/
+ msleep(500);
+
return 0;
}
[ 126.037932] EXT4-fs (sda1): recovery complete
[ 126.042297] EXT4-fs (sda1): mounted filesystem with ordered data mode. Quota mode: none.
[ 159.939179] sas: Enter sas_scsi_recover_host busy: 256 failed: 256
[ 159.945390] sas: sas_scsi_find_task: aborting task 0x(____ptrval____)
[ 181.862870] sas: TMF task timeout for 5000c500a7b95a49 and not done
[ 193.436187] pm80xx0:: pm8001_abort_task 1126: rc= 5
[ 193.436188] pm80xx0:: mpi_ssp_completion 1937: sas IO status 0x1
[ 193.441064] sas: sas_scsi_find_task: querying task 0x(____ptrval____)
[ 193.447048] pm80xx0:: mpi_ssp_completion 1948: SAS Address of IO Failure Drive:5000c500a7b95a49
[ 193.453528] pm80xx0:: mpi_ssp_completion 1937: sas IO status 0x3b
[ 193.462158] pm80xx0:: mpi_ssp_completion 2185: task 0x(____ptrval____) done with io_status 0x1 resp 0x0 stat 0x8c but aborted by upper layer!
[ 193.468237] pm80xx0:: mpi_ssp_completion 1948: SAS Address of IO Failure Drive:5000c500a7b95a49
[ 193.489658] sas: TMF task open reject failed 5000c500a7b95a49
[ 193.495538] pm80xx0:: mpi_ssp_completion 1937: sas IO status 0x3b
[ 193.501619] pm80xx0:: mpi_ssp_completion 1948: SAS Address of IO Failure Drive:5000c500a7b95a49
[ 193.510371] sas: TMF task open reject failed 5000c500a7b95a49
[ 193.516252] pm80xx0:: mpi_ssp_completion 1937: sas IO status 0x3b
[ 193.522333] pm80xx0:: mpi_ssp_completion 1948: SAS Address of IO Failure Drive:5000c500a7b95a49
[ 193.531075] sas: TMF task open reject failed 5000c500a7b95a49
[ 193.536899] sas: executing TMF for 5000c500a7b95a49 failed after 3 attempts!
[ 193.543937] pm80xx: rc= -5
[ 193.546631] sas: sas_scsi_find_task: task 0x(____ptrval____) result code -5 not handled
[ 193.554622] sas: sas_scsi_find_task: aborting task 0x(____ptrval____)
[ 193.561052] sas: sas_eh_handle_sas_errors: task 0x(____ptrval____) is done
[ 193.567917] sas: sas_scsi_find_task: aborting task 0x(____ptrval____)
ls'ing 0
[ 214.630859] sas: TMF task timeout for 5000c500a7b95a49 and not done
[ 226.241090] pm80xx0:: mpi_ssp_completion 1937: sas IO status 0x1
[ 226.241093] pm80xx0:: pm8001_abort_task 1126: rc= 5
[ 226.247084] pm80xx0:: mpi_ssp_completion 1948: SAS Address of IO Failure Drive:5000c500a7b95a49
[ 226.247087] pm80xx0:: mpi_ssp_completion 2185: task 0x(____ptrval____) done with io_status 0x1 resp 0x0 stat 0x8c but aborted by upper layer!
[ 226.273324] sas: sas_eh_handle_sas_errors: task 0x(____ptrval____) is done
[ 226.280188] sas: sas_scsi_find_task: aborting task 0x(____ptrval____)
[ 247.398856] sas: TMF task timeout for 5000c500a7b95a49 and not done