Niklas Cassel <cassel@xxxxxxxxxx> 於 2024年2月6日 週二 下午9:12寫道: > > On Tue, Feb 06, 2024 at 04:39:02PM +0800, Jian-Hong Pan wrote: > > Niklas Cassel <cassel@xxxxxxxxxx> 於 2024年2月5日 週一 下午7:33寫道: > > > > Have the comparison: > > > > * Bind LPM policy with the patch "ata: ahci: Add force LPM policy > > quirk for ASUS B1400CEAE" based on kernel v6.8-rc2: > > > > $ dmesg | grep -E "(SATA|ata1|ahci)" > > [ 0.791497] ahci 10000:e0:17.0: version 3.0 > > [ 0.791499] ahci 10000:e0:17.0: force controller follow LPM policy > > [ 0.791517] ahci 10000:e0:17.0: can't derive routing for PCI INT A > > [ 0.791518] ahci 10000:e0:17.0: PCI INT A: no GSI > > [ 0.791637] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3 > > [ 0.791652] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: not Intel, > > the vendor is 0xffffffff > > [ 0.791662] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 > > Gbps 0x1 impl SATA mode > > [ 0.791663] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only > > pio slum part deso sadm sds > > [ 0.791771] scsi host0: ahci > > [ 0.791806] ata1: SATA max UDMA/133 abar m2048@0x76102000 port > > 0x76102100 irq 145 lpm-pol 3 > > [ 0.791808] ahci 10000:e0:17.0: ahci_init_one: probed > > [ 1.109393] ata1: sata_link_resume: rc=0 > > [ 1.109415] ata1: BUSY ? 0 (status: 0x50) SStatus.DET: 0x3 > > [ 1.109418] ata1: sata_link_hardreset: is 0 > > [ 1.109420] ata1: sata_link_hardreset: is on line, returns 0 > > [ 1.109444] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > > [ 1.110161] ata1.00: ATA-10: WDC WD10SPZX-80Z10T2, 04.01A04, max UDMA/133 > > [ 1.112047] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA > > [ 1.112054] ata1.00: Features: NCQ-prio > > [ 1.114814] ata1.00: configured for UDMA/133 > > [ 1.114821] ata1: ahci_set_lpm: policy=3 > > [ 1.114837] ata1: sata_link_scr_lpm: policy is 3 and original > > scontrol 0x00000300 > > [ 1.114840] ata1: sata_link_scr_lpm: write scontrol 0x00000000 > > > > The SATA link is up and SATA storage shows up. > > Full dmesg as the attachment of > > https://bugzilla.kernel.org/show_bug.cgi?id=217114#c28 > > > > * Bind LPM policy with PCI IDs like commit 104ff59af73a ("ata: ahci: > > Add Tiger Lake UP{3,4} AHCI controller"): > > > > $ dmesg | grep -E "(SATA|ata1|ahci)" > > [ 0.783125] ahci 10000:e0:17.0: version 3.0 > > [ 0.783143] ahci 10000:e0:17.0: can't derive routing for PCI INT A > > [ 0.783145] ahci 10000:e0:17.0: PCI INT A: no GSI > > [ 0.783257] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3 > > [ 0.783280] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: PCS_6 is 0x0000 > > [ 0.783281] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: write PCS_6 with 0x0001 > > [ 0.783296] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 > > Gbps 0x1 impl SATA mode > > [ 0.783298] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only > > pio slum part deso sadm sds > > [ 0.783402] scsi host0: ahci > > [ 0.783440] ata1: SATA max UDMA/133 abar m2048@0x76102000 port > > 0x76102100 irq 144 lpm-pol 3 > > [ 0.783442] ahci 10000:e0:17.0: ahci_init_one: probed > > [ 1.096930] ata1: sata_link_resume: rc=0 > > [ 1.096960] ata1: sata_link_hardreset: ata_phys_link_offline is True > > [ 1.096962] ata1: sata_link_hardreset: is off line, returns 0 > > [ 1.097000] ata1: SATA link down (SStatus 4 SControl 300) > > [ 1.097025] ata1: ahci_set_lpm: policy=3 > > [ 1.097051] ata1: sata_link_scr_lpm: policy is 3 and original > > scontrol 0x00000300 > > [ 1.097054] ata1: sata_link_scr_lpm: write scontrol 0x00000304 > > > > The SATA link is down and SATA storage disappears. > > Full dmesg as the attachment of > > https://bugzilla.kernel.org/show_bug.cgi?id=217114#c29 > > > > So in summary: > When Intel VMD is on, and the ahci_intel_pcs_quirk is applied => NOT OK > When Intel VMD is on, and the ahci_intel_pcs_quirk is not applied => OK > > When Intel VMD is off, and the ahci_intel_pcs_quirk is applied => OK > When Intel VMD is off, and the ahci_intel_pcs_quirk is not applied => ? When Intel VMD is off, and the ahci_intel_pcs_quirk is not applied => OK > Excellent find! > > > > In the bad case: > > sata_link_hardreset() sets SControl.DET to 1, to establish the interface > communication. Then sleeps for 1 ms. > > Then it calls sata_link_resume(), which clears SControl.DET to 0. > (This matches the AHCI spec which says that SControl.DET should be set > to 1 for at least 1 ms.) > > sata_link_hardreset() then calls ata_phys_link_offline(), > which is essentially defined as: return !(SStatus.DET == 0x3) > ata_phys_link_offline() returns true, since SStatus.DET == 0x4. > > SStatus.DET == 0x4 means: Phy in offline mode as a result of the > interface being disabled or running in a BIST loopback mode. > > If the physical link is not established, there is no point to call > ata_wait_ready() (which waits for the device to become ready on the > protocol level), as the physical link could not even be established. > > After that, we write SControl.DET to set bit 4 to disable the port, > in order to save power. This is only done because sata_link_hardreset() > failed to establish a link after toggling SControl.DET == 1. > > So the problem is that SStatus.DET never changed to 0x3 after toggling > SControl.DET == 1. > > > > > > However, I notice more interesting thing: > > "drivers/ata/ahci.c:ahci_intel_pcs_quirk()"! > > If bind LPM policy with PCI IDs matching, then it does the PCS quirk. > > But, binding with the patch "ata: ahci: Add force LPM policy quirk for > > ASUS B1400CEAE" does not, because the vendor is ANY vendor, not Intel. > > > > So, I did following test: > > > > If I modify the PCI vendor check condition with the pdev, not the PCI > > ID's vendor: > > > > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c > > index 7ecd56c8262a..ece709ac20d6 100644 > > --- a/drivers/ata/ahci.c > > +++ b/drivers/ata/ahci.c > > @@ -1706,12 +1709,16 @@ static void ahci_intel_pcs_quirk(struct > > pci_dev *pdev, struct ahci_host_priv *hp > > /* > > * Only apply the 6-port PCS quirk for known legacy platforms. > > */ > > - if (!id || id->vendor != PCI_VENDOR_ID_INTEL) > > + if (!id || pdev->vendor != PCI_VENDOR_ID_INTEL) { > > + dev_info(&pdev->dev, "%s: not Intel, the vendor is > > 0x%08x\n", __func__, id->vendor); > > return; > > + } > > The reason why you are seeing this is because Tiger Lake does not have > an entry in the ahci_pci_tbl in mainline, so it uses the generic entry > which matches on the AHCI class code: > https://github.com/torvalds/linux/blob/v6.8-rc3/drivers/ata/ahci.c#L636 > > If you revert 6210038aeaf4 ("ata: ahci: Revert "ata: ahci: Add Tiger Lake > UP{3,4} AHCI controller""), you will get an explicit entry in the > ahci_pci_tbl. > > But to clarify, I think that it would make sense to add: > > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c > index d2460fa985b7..e462509a45e8 100644 > --- a/drivers/ata/ahci.c > +++ b/drivers/ata/ahci.c > @@ -1672,12 +1672,18 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap, > > static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv) > { > - const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev); > + const struct pci_device_id *id; > u16 tmp16; > > + /* If the detected PCI device is not an Intel device, skip. */ > + if (pdev->vendor != PCI_VENDOR_ID_INTEL) > + return; > + > /* > - * Only apply the 6-port PCS quirk for known legacy platforms. > + * See if there is an explicit entry for this PCI device in > + * ahci_pci_tbl, if there is not, do not apply the quirk. > */ > + id = pci_match_id(ahci_pci_tbl, pdev); > if (!id || id->vendor != PCI_VENDOR_ID_INTEL) > return; > > > > > > > Then, the SATA HDD always disappears like binding the LPM policy with > > PCI IDs matching, even with the patch "ata: ahci: Add force LPM policy > > quirk for ASUS B1400CEAE". > > So, I think ahci_intel_pcs_quirk() is the key point. > > I agree. > > > Can you verify that things work as expected when doing a: > $ git revert 6210038aeaf49c395c2da57572246d93ec67f6d4 > to re-add the explicit entry, if you also do a: > > --- a/drivers/ata/ahci.c > +++ b/drivers/ata/ahci.c > @@ -1672,6 +1672,7 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap, > > static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv) > { > +#if 0 > const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev); > u16 tmp16; > > @@ -1698,6 +1699,7 @@ static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hp > tmp16 |= hpriv->port_map; > pci_write_config_word(pdev, PCS_6, tmp16); > } > +#endif > } > > To make the quirk a no-op? Here is the test result: Both enabled & disabled VMD with no-op ahci_intel_pcs_quirk() shows the SATA storage on ASUS B1400CEAE. => OK $ cat /tmp/dmesg.log [ 0.799439] ahci 10000:e0:17.0: version 3.0 [ 0.799459] ahci 10000:e0:17.0: can't derive routing for PCI INT A [ 0.799460] ahci 10000:e0:17.0: PCI INT A: no GSI [ 0.799582] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3 [ 0.799615] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode [ 0.799617] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only pio slum part deso sadm sds [ 0.799722] scsi host0: ahci [ 0.799760] ata1: SATA max UDMA/133 abar m2048@0x76102000 port 0x76102100 irq 144 lpm-pol 3 [ 0.799761] ahci 10000:e0:17.0: ahci_init_one: probed [ 1.112519] ata1: sata_link_resume: rc=0 [ 1.112541] ata1: BUSY ? 0 (status: 0x50) SStatus.DET: 0x3 [ 1.112545] ata1: sata_link_hardreset: is 0 [ 1.112547] ata1: sata_link_hardreset: is on line, returns 0 [ 1.112571] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 1.113280] ata1.00: ATA-10: WDC WD10SPZX-80Z10T2, 04.01A04, max UDMA/133 [ 1.114880] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA [ 1.114887] ata1.00: Features: NCQ-prio [ 1.117148] ata1.00: configured for UDMA/133 [ 1.117154] ata1: ahci_set_lpm: policy=3 [ 1.117169] ata1: sata_link_scr_lpm: policy is 3 and original scontrol 0x00000300 [ 1.117172] ata1: sata_link_scr_lpm: write scontrol 0x00000000 Jian-Hong Pan > To be honest, this quirk looks horrible. > > Looking at the original commit: > c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond") > > It claims that: > > Rather than try to fix the PCS quirk to consider the DNV register layout > instead require explicit opt-in. The assumption is that the OS driver > need not touch this register, and platforms can be added with a new > boad_ahci_pcs7 board-id when / if problematic platforms are found in the > future. > > However, it does NOT require an explicit opt-in! > > If we were to add an entry with board type "board_ahci" or > "board_ahci_low_power" for Tiger Lake, the quirk gets applied... > > See also: > 09d6ac8dc51a ("libata/ahci: Fix PCS quirk application") > > So basically, what ahci_intel_pcs_quirk() does is that it checks > if there is an explicit entry in ahci_pci_tbl. > If there is not, the quirk is not applied. > > If there is an entry, and the enum for that board has a value that > is less than board_ahci_pcs7, the quirk is applied... > > But that will be *ALL* other board types since board_ahci_pcs7 is > defined last in the enum: > https://github.com/torvalds/linux/blob/v6.8-rc3/drivers/ata/ahci.c#L75 > > Not only that but the comment for that enum is wrong: > https://github.com/torvalds/linux/blob/v6.8-rc3/drivers/ata/ahci.c#L71-L74 > > /* > * board IDs for Intel chipsets that support more than 6 ports > * *and* end up needing the PCS quirk. > */ > > Is is the opposite... board IDs that do NOT need the PCS quirk... > > But this is not the way we add quirks. > We add a flag and a new board_id and mark the PCI device and vendor ids > that are affected to use that board, see e.g. > 20730e9b2778 ("ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers") > > We don't add a quirk and apply it for everything (board_ahci, > board_ahci_low_power) except for a specific entry (board_ahci_pcs7). > > It seems that at least Intel AHCI controllers that also have Intel VMD > enabled break when this quirk is applied. > > I guess one way would be to do a: > git show c312ef176399:drivers/ata/ahci.c | grep "PCI_VDEVICE(INTEL" > and replace everything that is not: board_ahci_pcs7 > with a board_ahci_pcs_quirk, board_ahci_low_power_pcs_quirk, and > board_ahci_avn_pcs_quirk, and after that change all board_ahci_pcs7 > entries to board_ahci, and assume that entries added since c312ef176399 > do not need the quirk. > > But it would be nice if someone from Intel could clean this up. > > > Kind regards, > Niklas