Niklas Cassel <cassel@xxxxxxxxxx> 於 2024年2月5日 週一 下午7:33寫道: > > On Fri, Feb 02, 2024 at 04:49:00PM +0800, Jian-Hong Pan wrote: > > Niklas Cassel <cassel@xxxxxxxxxx> 於 2024年2月1日 週四 下午11:01寫道: > > > > > > On Wed, Jan 31, 2024 at 11:43:59PM +0100, Niklas Cassel wrote: > > > > On Wed, Jan 31, 2024 at 07:08:12AM -0400, Daniel Drake wrote: > > > > > > (snip) > > > > > > > In libata we perform a reset of the port at boot, see: > > > > libata-sata.c:sata_link_hardreset() > > > > after writing to SControl, we call > > > > libata-core.c:ata_wait_ready() that will poll for the port being ready > > > > by calling the check_ready callback. > > > > For AHCI, this callback funcion is set to: > > > > libahci.c:ahci_check_ready(). > > > > > > > > A reset should take the device out of deep power state and should be > > > > sufficient to establish a connection (and that also seems to be the > > > > case when not using Intel VMD). > > > > > > > > However, if you want to debug, I would start by adding prints to > > > > libata-sata.c:sata_link_hardreset() > > > > libata-core.c:ata_wait_ready() > > > > libahci.c:ahci_check_ready(). > > > > > > FWIW, this will dump SStatus.DET every time the check_ready callback function > > > (ahci_check_ready()) is called: > > > > > > > > > diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c > > > index 1a63200ea437..0467e150601e 100644 > > > --- a/drivers/ata/libahci.c > > > +++ b/drivers/ata/libahci.c > > > @@ -1533,6 +1533,12 @@ int ahci_check_ready(struct ata_link *link) > > > { > > > void __iomem *port_mmio = ahci_port_base(link->ap); > > > u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF; > > > + u32 cur = 0; > > > + > > > + sata_scr_read(link, SCR_STATUS, &cur); > > > + > > > + ata_link_info(link, "BUSY ? %d (status: %#x) SStatus.DET: %#x\n", > > > + status & ATA_BUSY, status, cur & 0xf); > > > > > > return ata_check_ready(status); > > > } > > > > I think I can join the test based on kernel v6.8-rc2, too. > > > > The original ASUS B1400CEAE has only one NVMe SSD. I prepare the > > patch ("ata: ahci: Add force LPM policy quirk for ASUS B1400CEAE") to > > fix the power consumption issue for s2idle with enabled VMD. > > > > The patch is a quirk limiting ASUS B1400CEAE only, not generic for the > > SATA controller [8086:a0d3]. Then, I install another SATA HDD into > > ASUS B1400CEAE for test. The SATA HDD shows up and works. > > > > $ dmesg | grep SATA > > [ 0.785120] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 > > Gbps 0x1 impl SATA mode > > [ 0.785269] ata1: SATA max UDMA/133 abar m2048@0x76102000 port > > 0x76102100 irq 144 lpm-pol 3 > > [ 1.096684] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > > > > However, if I simply revert the commit 6210038aeaf4 ("ata: ahci: > > Revert "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"") (fix the > > conflict, of course), then the SATA HDD disappears!!? Both > > CONFIG_SATA_MOBILE_LPM_POLICY=3 and 0 can reproduce the issue. > > > > $ dmesg | grep SATA > > [ 0.783211] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 > > Gbps 0x1 impl SATA mode > > [ 0.783399] ata1: SATA max UDMA/133 abar m2048@0x76102000 port > > 0x76102100 irq 144 lpm-pol 3 > > [ 1.096685] ata1: SATA link down (SStatus 4 SControl 300) > > > > Here is the dmesg of reverting ("ata: ahci: Revert "ata: ahci: Add > > Tiger Lake UP{3,4} AHCI controller"") > > https://bugzilla.kernel.org/show_bug.cgi?id=217114#c27 > > The code already includes the debug message in ahci_check_ready() from > > Niklas. However, the dmesg does not show the "BUSY ? ..." from > > ahci_check_ready(). > > > > From these scenarios mentioned above, they all apply LPM policy to the > > SATA controller [8086:a0d3]. But, they apply LPM policy at different > > time: > > * The patch ("ata: ahci: Add force LPM policy quirk for ASUS > > B1400CEAE") applies LPM policy in early ahci_init_one(), which is the > > probe callback. > > * Reverting 6210038aeaf4 ("ata: ahci: Revert "ata: ahci: Add Tiger > > Lake UP{3,4} AHCI controller"") applies LPM policy via "ahci_pci_tbl" > > table. > > I don't see why it should matter if we set the AHCI_HFLAG_USE_LPM_POLICY > flag using ahci_pci_tbl, or by your suggested quirk in ahci_init_one(), > as in both cases the flag will be set before ahci_init_one() calls > ahci_update_initial_lpm_policy(). > > > Could it perhaps be that in order for libata to be able to detect your > drive, when VMD is enabled, we also need your patch > "PCI: vmd: enable PCI PM's L1 substates of remapped PCIe port and NVMe" ? I only apply the patch ("ata: ahci: Add force LPM policy quirk for ASUS B1400CEAE") for this test. No "PCI: vmd: enable PCI PM's L1 substates of remapped PCIe port and NVMe". :) > If that is not the case, and there actually is a difference between using > ahci_pci_tbl and your suggested quirk, then my next suggestion would be to > add prints to libata-sata.c:sata_link_scr_lpm(). That way you can dump the > exact SCR writes that are being done for the working case vs. the > non-working case. (Since I assume that there must be some difference.) I prepared debug messages as: diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 7ecd56c8262a..b910c7856d08 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1677,8 +1676,10 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap, /* Ignore processing for chipsets that don't use policy */ - if (!(hpriv->flags & AHCI_HFLAG_USE_LPM_POLICY)) + if (!(hpriv->flags & AHCI_HFLAG_USE_LPM_POLICY)) { + dev_info(ap->dev, "%s: do not use LPM policy\n", __func__); return; + } /* user modified policy via module param */ if (mobile_lpm_policy != -1) { @@ -1696,6 +1697,7 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap, update_policy: if (policy >= ATA_LPM_UNKNOWN && policy <= ATA_LPM_MIN_POWER) ap->target_lpm_policy = policy; + dev_info(ap->dev, "%s: policy %d\n", __func__, policy); } static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv) @@ -1706,12 +1708,16 @@ static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hp /* * Only apply the 6-port PCS quirk for known legacy platforms. */ - if (!id || id->vendor != PCI_VENDOR_ID_INTEL) + if (!id || id->vendor != PCI_VENDOR_ID_INTEL) { + dev_info(&pdev->dev, "%s: not Intel, the vendor is 0x%08x\n", __func__, id->vendor); return; + } /* Skip applying the quirk on Denverton and beyond */ - if (((enum board_ids) id->driver_data) >= board_ahci_pcs7) + if (((enum board_ids) id->driver_data) >= board_ahci_pcs7) { + dev_info(&pdev->dev, "%s: skip\n", __func__); return; + } /* * port_map is determined from PORTS_IMPL PCI register which is @@ -1722,8 +1728,10 @@ static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hp * before the OS boots. */ pci_read_config_word(pdev, PCS_6, &tmp16); + dev_info(&pdev->dev, "%s: PCS_6 is 0x%04x", __func__, tmp16); if ((tmp16 & hpriv->port_map) != hpriv->port_map) { tmp16 |= hpriv->port_map; + dev_info(&pdev->dev, "%s: write PCS_6 with 0x%04x", __func__, tmp16); pci_write_config_word(pdev, PCS_6, tmp16); } } @@ -1998,6 +2006,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (rc) return rc; + dev_info(&pdev->dev, "%s: probed\n", __func__); pm_runtime_put_noidle(&pdev->dev); return 0; } diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c index 1a63200ea437..7e4f349554eb 100644 --- a/drivers/ata/libahci.c +++ b/drivers/ata/libahci.c @@ -812,6 +812,7 @@ static int ahci_set_lpm(struct ata_link *link, enum ata_lpm_policy policy, struct ahci_port_priv *pp = ap->private_data; void __iomem *port_mmio = ahci_port_base(ap); + ata_link_info(link, "%s: policy=%d\n", __func__, policy); if (policy != ATA_LPM_MAX_POWER) { /* wakeup flag only applies to the max power policy */ hints &= ~ATA_LPM_WAKE_ONLY; @@ -1533,6 +1534,12 @@ int ahci_check_ready(struct ata_link *link) { void __iomem *port_mmio = ahci_port_base(link->ap); u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF; + u32 cur = 0; + + sata_scr_read(link, SCR_STATUS, &cur); + + ata_link_info(link, "BUSY ? %d (status: %#x) SStatus.DET: %#x\n", + status & ATA_BUSY, status, cur & 0xf); return ata_check_ready(status); } diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c index 0fb1934875f2..4bcedd46bcfa 100644 --- a/drivers/ata/libata-sata.c +++ b/drivers/ata/libata-sata.c @@ -344,6 +344,7 @@ int sata_link_resume(struct ata_link *link, const unsigned int *params, if (!(rc = sata_scr_read(link, SCR_ERROR, &serror))) rc = sata_scr_write(link, SCR_ERROR, serror); + ata_link_info(link, "%s: rc=%d", __func__, rc); return rc != -EINVAL ? rc : 0; } EXPORT_SYMBOL_GPL(sata_link_resume); @@ -378,6 +379,7 @@ int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy, if (rc) return rc; + ata_link_info(link, "%s: policy is %d and original scontrol 0x%08x\n", __func__, policy, scontrol); switch (policy) { case ATA_LPM_MAX_POWER: /* disable all LPM transitions */ @@ -422,6 +424,7 @@ int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy, WARN_ON(1); } + ata_link_info(link, "%s: write scontrol 0x%08x\n", __func__, scontrol); rc = sata_scr_write(link, SCR_CONTROL, scontrol); if (rc) return rc; @@ -586,9 +589,12 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing, rc = sata_link_resume(link, timing, deadline); if (rc) goto out; + /* if link is offline nothing more to do */ - if (ata_phys_link_offline(link)) + if (ata_phys_link_offline(link)) { + ata_link_info(link, "%s: ata_phys_link_offline is True\n", __func__); goto out; + } /* Link is online. From this point, -ENODEV too is an error. */ if (online) @@ -616,12 +622,15 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing, rc = 0; if (check_ready) rc = ata_wait_ready(link, deadline, check_ready); + + ata_link_info(link, "%s: is %d\n", __func__, rc); out: if (rc && rc != -EAGAIN) { /* online is set iff link is online && reset succeeded */ if (online) *online = false; } + ata_link_info(link, "%s: is %s line, returns %d\n", __func__, *online? "on":"off", rc); return rc; } EXPORT_SYMBOL_GPL(sata_link_hardreset); Have the comparison: * Bind LPM policy with the patch "ata: ahci: Add force LPM policy quirk for ASUS B1400CEAE" based on kernel v6.8-rc2: $ dmesg | grep -E "(SATA|ata1|ahci)" [ 0.791497] ahci 10000:e0:17.0: version 3.0 [ 0.791499] ahci 10000:e0:17.0: force controller follow LPM policy [ 0.791517] ahci 10000:e0:17.0: can't derive routing for PCI INT A [ 0.791518] ahci 10000:e0:17.0: PCI INT A: no GSI [ 0.791637] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3 [ 0.791652] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: not Intel, the vendor is 0xffffffff [ 0.791662] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode [ 0.791663] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only pio slum part deso sadm sds [ 0.791771] scsi host0: ahci [ 0.791806] ata1: SATA max UDMA/133 abar m2048@0x76102000 port 0x76102100 irq 145 lpm-pol 3 [ 0.791808] ahci 10000:e0:17.0: ahci_init_one: probed [ 1.109393] ata1: sata_link_resume: rc=0 [ 1.109415] ata1: BUSY ? 0 (status: 0x50) SStatus.DET: 0x3 [ 1.109418] ata1: sata_link_hardreset: is 0 [ 1.109420] ata1: sata_link_hardreset: is on line, returns 0 [ 1.109444] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 1.110161] ata1.00: ATA-10: WDC WD10SPZX-80Z10T2, 04.01A04, max UDMA/133 [ 1.112047] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA [ 1.112054] ata1.00: Features: NCQ-prio [ 1.114814] ata1.00: configured for UDMA/133 [ 1.114821] ata1: ahci_set_lpm: policy=3 [ 1.114837] ata1: sata_link_scr_lpm: policy is 3 and original scontrol 0x00000300 [ 1.114840] ata1: sata_link_scr_lpm: write scontrol 0x00000000 The SATA link is up and SATA storage shows up. Full dmesg as the attachment of https://bugzilla.kernel.org/show_bug.cgi?id=217114#c28 * Bind LPM policy with PCI IDs like commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"): $ dmesg | grep -E "(SATA|ata1|ahci)" [ 0.783125] ahci 10000:e0:17.0: version 3.0 [ 0.783143] ahci 10000:e0:17.0: can't derive routing for PCI INT A [ 0.783145] ahci 10000:e0:17.0: PCI INT A: no GSI [ 0.783257] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3 [ 0.783280] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: PCS_6 is 0x0000 [ 0.783281] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: write PCS_6 with 0x0001 [ 0.783296] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode [ 0.783298] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only pio slum part deso sadm sds [ 0.783402] scsi host0: ahci [ 0.783440] ata1: SATA max UDMA/133 abar m2048@0x76102000 port 0x76102100 irq 144 lpm-pol 3 [ 0.783442] ahci 10000:e0:17.0: ahci_init_one: probed [ 1.096930] ata1: sata_link_resume: rc=0 [ 1.096960] ata1: sata_link_hardreset: ata_phys_link_offline is True [ 1.096962] ata1: sata_link_hardreset: is off line, returns 0 [ 1.097000] ata1: SATA link down (SStatus 4 SControl 300) [ 1.097025] ata1: ahci_set_lpm: policy=3 [ 1.097051] ata1: sata_link_scr_lpm: policy is 3 and original scontrol 0x00000300 [ 1.097054] ata1: sata_link_scr_lpm: write scontrol 0x00000304 The SATA link is down and SATA storage disappears. Full dmesg as the attachment of https://bugzilla.kernel.org/show_bug.cgi?id=217114#c29 The SCR writes different values with these two conditions. However, I notice more interesting thing: "drivers/ata/ahci.c:ahci_intel_pcs_quirk()"! If bind LPM policy with PCI IDs matching, then it does the PCS quirk. But, binding with the patch "ata: ahci: Add force LPM policy quirk for ASUS B1400CEAE" does not, because the vendor is ANY vendor, not Intel. So, I did following test: If I modify the PCI vendor check condition with the pdev, not the PCI ID's vendor: diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 7ecd56c8262a..ece709ac20d6 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1706,12 +1709,16 @@ static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hp /* * Only apply the 6-port PCS quirk for known legacy platforms. */ - if (!id || id->vendor != PCI_VENDOR_ID_INTEL) + if (!id || pdev->vendor != PCI_VENDOR_ID_INTEL) { + dev_info(&pdev->dev, "%s: not Intel, the vendor is 0x%08x\n", __func__, id->vendor); return; + } Then, the SATA HDD always disappears like binding the LPM policy with PCI IDs matching, even with the patch "ata: ahci: Add force LPM policy quirk for ASUS B1400CEAE". So, I think ahci_intel_pcs_quirk() is the key point. BR, Jian-Hong Pan