Re: [PATCH v3 1/3] scsi: hisi_sas: Enable force phy when SATA disk directly connected

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2025/2/25 1:34, John Garry wrote:
On 24/02/2025 13:12, yangxingui wrote:
Hi, John

On 2025/2/24 20:21, John Garry wrote:
On 24/02/2025 09:36, yangxingui wrote:


So do you mean that all IO to this disk will error? If yes, then this is good.
Yes, IO error or IO result does not meet expectations. As shown in the log below, due to an abnormal port ID, the SNs of the two disks read are the same.

Do you mean that this is mainline kernel behaviour, below:
Yes


[448000.504979] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy1 link_rate=10(sata) [448000.505070] sas: phy-10:1 added to port-10:1, phy_mask:0x2 (5000000000000a01)
[448000.505247] sas: DOING DISCOVERY on port 1, pid:2239187
[448000.505255] hisi_sas_v3_hw 0000:d4:02.0: dev[2:5] found
[448000.505274] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[448000.505295] sas: ata31: end_device-10:0: dev error handler
[448000.505299] sas: ata32: end_device-10:1: dev error handler
[448001.300517] hisi_sas_v3_hw 0000:d4:02.0: phydown: phy1 phy_state=0x1   // phy1's hw port id released
[448001.300522] hisi_sas_v3_hw 0000:d4:02.0: ignore flutter phy1 down
[448001.436187] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy2 link_rate=10(sata) // phy2 occupies the hardware port ID of phy1 [448001.608766] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy1 link_rate=10(sata) // phy1 was assigned a new hardware port ID [448001.775605] ata32.00: ATA-11: WUH721816ALE6L4, PCGAW660, max UDMA/133 [448002.159364] sas: phy-10:2 added to port-10:2, phy_mask:0x4 (5000000000000a02)
[448002.159575] sas: DOING DISCOVERY on port 2, pid:2239187
[448002.159581] hisi_sas_v3_hw 0000:d4:02.0: dev[3:5] found
[448002.159602] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[448002.159623] sas: ata31: end_device-10:0: dev error handler
[448002.159633] sas: ata32: end_device-10:1: dev error handler
[448002.159636] sas: ata33: end_device-10:2: dev error handler
[448002.393349] hisi_sas_v3_hw 0000:d4:02.0: phydown: phy2 phy_state=0x3
[448002.393354] hisi_sas_v3_hw 0000:d4:02.0: ignore flutter phy2 down
[448002.684937] hisi_sas_v3_hw 0000:d4:02.0: phyup: phy2 link_rate=10(sata) [448002.851639] ata33.00: ATA-11: WUH721816ALE6L4, PCGAW660, max UDMA/133 [448002.851644] ata33.00: 31251759104 sectors, multi 0: LBA48 NCQ (depth 32)


But I still don't like the handling in this patch. If we get a phy up, then the directly-attached disk ideally should be gone already, so should not have to do this handling.
There is no problem when the disk is removed. The current problem is that multiple phy up at the same time. When one of the phys up and enters error handler to execute hardreset, the phy will down and then up. other phy up will probably occupy the hw port id of the previous phy which do hardreset in EH.

Could you do this work (itct update) in lldd_ata_check_ready CB?

It's a good idea only for sata disks, but the current problem is not only the scenario of connecting the sata disk. This phenomenon occasionally occurs when the SAS disk is connected after the controller is reset. The following is the log of the stress test recurrence after incorporating the current repair patch. Although we called hisi_sas_refresh_port_id() on controller reset.

[ 5387.235015] hisi_sas_v3_hw 0000:74:02.0: I_T nexus reset: internal abort (-5)
[ 5387.242126] sas: clear nexus ha
[ 5387.245283] hisi_sas_v3_hw 0000:74:02.0: controller resetting...
[ 5388.908489] hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=10(sata) [ 5388.915090] hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=10(sata)
[ 5388.934505] hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
[ 5388.941009] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=9(sata)
[ 5388.950976] hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
[ 5388.957048] hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=11
[ 5388.980097] hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=11
[ 5388.986169] hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=11 // phy3 attached a sas disk. [ 5389.065103] hisi_sas_v3_hw 0000:74:02.0: task prep: SAS port1 not attach device
[ 5389.072409] sas: executing TMF task failed 5000c500ae49c8f1 (-70)
[ 5389.078492] hisi_sas_v3_hw 0000:74:02.0: task prep: SAS port1 not attach device
[ 5389.085780] sas: executing TMF task failed 5000c500ae49c8f1 (-70)
[ 5389.091861] hisi_sas_v3_hw 0000:74:02.0: task prep: SAS port1 not attach device
[ 5389.099146] sas: executing TMF task failed 5000c500ae49c8f1 (-70)
[ 5389.107419] hisi_sas_v3_hw 0000:74:02.0: controller reset complete // controller reset finished
[ 5389.113686] hisi_sas_v3_hw 0000:74:02.0: phydown: phy0 phy_state=0xfe
[ 5389.120099] hisi_sas_v3_hw 0000:74:02.0: ignore flutter phy0 down
[ 5389.136399] hisi_sas_v3_hw 0000:74:02.0: phy3's hw port id changed from 1 to 7
[ 5389.308114] hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)



pm8001 sends sas_notify_port_event(sas_phy, PORTE_LINK_RESET_ERR,) link reset errors - can you consider doing that in hisi_sas_update_port_id() when you find an inconstant port id?
Currently during phyup, the hw port id may change, and the corresponding hisi_sas_port.id and the port id in itct are not updated synchronously. The problem caused is not a link error, so we don't need deform port, just update the port id when phyup.

Thanks.
Xingui





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux