On 2021/12/09 21:19, John Garry wrote:
> On 09/12/2021 12:04, Ajish.Koshy@xxxxxxxxxxxxx wrote:
>> Was testing the patch on arm server. Didn't see crash there but observing timeouts and error
>> handling getting triggered for drives. But the same code works fine on x86.
>>
>> At your end do you still face similar situation on arm server ?
>
> Yeah, I see that as well even without enabling the IOMMU.
>
> root@(none)$ [ 163.974907] sas: Enter sas_scsi_recover_host busy: 222 failed: 222
> [ 163.981108] sas: sas_scsi_find_task: aborting task 0x000000005c703676
> root@(none)$
> root@(none)$ [ 185.963714] pm80xx0:: pm8001_exec_internal_tmf_task 757:TMF task[1]timeout.
>
> I figured that it was a card FW issue as I have been using what I got
> out the box, and I have no tool to update the firmware on an arm host.
>
> It seems that SSP and STP commands are not completing for some reason,
> from the "busy: 222" line.

I have this HBA:

c1:00.0 Serial Attached SCSI controller: ATTO Technology, Inc. ExpressSAS 12Gb/s SAS/SATA HBA (rev 06)
	Subsystem: ATTO Technology, Inc. ExpressSAS H120F

Which uses the pm80xx driver and I do not see such error.
E.g.:

[335568.262395] pm80xx 0000:c1:00.0: pm80xx: driver version 0.1.40
[335568.268931] :: pm8001_pci_alloc 529:Setting link rate to default value
[335569.489392] scsi host18: pm80xx
[335570.801031] sas: phy-18:4 added to port-18:0, phy_mask:0x10 (50010860002f5644)
[335570.801225] sas: DOING DISCOVERY on port 0, pid:58830
[335570.801310] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[335570.807638] sas: ata22: end_device-18:0: dev error handler
[335570.964864] ata22.00: ATA-11: WDC WUH721818ALN604, PCGNW232, max UDMA/133
[335579.062526] ata22.00: 4394582016 sectors, multi 0: LBA48 NCQ (depth 32)
[335579.070487] ata22.00: Features: NCQ-sndrcv NCQ-prio
[335579.307260] ata22.00: configured for UDMA/133
[335579.313018] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[335579.323512] scsi 18:0:0:0: Direct-Access ATA WDC WUH721818AL W232 PQ: 0 ANSI: 5
[335579.333243] sas: DONE DISCOVERY on port 0, pid:58830, result:0
[335579.333338] sas: phy-18:5 added to port-18:1, phy_mask:0x20 (50010860002f5645)
[335579.333453] sas: DOING DISCOVERY on port 1, pid:58830
[335579.333596] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[335579.341596] sas: ata23: end_device-18:1: dev error handler
[335579.341640] sas: ata22: end_device-18:0: dev error handler
[335579.500374] ata23.00: ATA-11: WDC WUH721818ALN604, PCGNWTW2, max UDMA/133
[335588.427115] ata23.00: 4394582016 sectors, multi 0: LBA48 NCQ (depth 32)
[335588.435158] ata23.00: Features: NCQ-sndrcv NCQ-prio
[335588.513212] ata23.00: configured for UDMA/133
[335588.519027] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[335588.537683] scsi 18:0:1:0: Direct-Access ATA WDC WUH721818AL WTW2 PQ: 0 ANSI: 5
[335588.565288] sas: DONE DISCOVERY on port 1, pid:58830, result:0
[335588.565543] sas: phy-18:7 added to port-18:2, phy_mask:0x80 (50010860002f5647)
[335588.566917] sas: DOING DISCOVERY on port 2, pid:58830
[335588.567515] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[335588.574948] sas: ata22: end_device-18:0: dev error handler
[335588.574971] sas: ata23: end_device-18:1: dev error handler
[335588.574979] sas: ata24: end_device-18:2: dev error handler
[335588.732190] ata24.00: ATA-11: WDC WSH722020ALN604, PCGMW803, max UDMA/133
[335597.778187] ata24.00: 4882956288 sectors, multi 0: LBA48 NCQ (depth 32)
[335597.788081] ata24.00: Features: NCQ-sndrcv NCQ-prio
[335597.850404] ata24.00: configured for UDMA/133
[335597.856225] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[335597.866680] scsi 18:0:2:0: Direct-Access-ZBC ATA WDC WSH722020AL W803 PQ: 0 ANSI: 7
[335597.876485] sas: DONE DISCOVERY on port 2, pid:58830, result:0
[335597.879720] sd 18:0:0:0: [sdd] 4394582016 4096-byte logical blocks: (18.0 TB/16.4 TiB)
[335597.881483] sd 18:0:0:0: Attached scsi generic sg3 type 0
[335597.888827] sd 18:0:0:0: [sdd] Write Protect is off
[335597.888830] sd 18:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[335597.888839] sd 18:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[335597.968683] sd 18:0:1:0: [sde] 4394582016 4096-byte logical blocks: (18.0 TB/16.4 TiB)
[335597.969489] sd 18:0:1:0: Attached scsi generic sg4 type 0
[335597.978210] sd 18:0:1:0: [sde] Write Protect is off
[335597.978214] sd 18:0:1:0: [sde] Mode Sense: 00 3a 00 00
[335597.978228] sd 18:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[335598.053869] sd 18:0:2:0: [sdf] Host-managed zoned block device
[335598.054476] sd 18:0:2:0: Attached scsi generic sg5 type 20
[335598.066428] sd 18:0:2:0: [sdf] 4882956288 4096-byte logical blocks: (20.0 TB/18.2 TiB)
[335598.093762] sd 18:0:2:0: [sdf] Write Protect is off
[335598.100101] sd 18:0:2:0: [sdf] Mode Sense: 00 3a 00 00
[335598.100119] sd 18:0:2:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[335598.158832] sd 18:0:1:0: [sde] Attached SCSI disk
[335598.158870] sd 18:0:0:0: [sdd] Attached SCSI disk
[335600.015402] sd 18:0:2:0: [sdf] 74508 zones of 65536 logical blocks
[335600.099235] sd 18:0:2:0: [sdf] Attached SCSI disk

The driver is uselessly verbose (for some reason, the dbg messages show up), but
no errors.

-- 
Damien Le Moal
Western Digital Research