Hi Paul, On Fri, Jun 2, 2023 at 4:43 PM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote: > > [Cc: linux-pci@xxxxxxxxxxxxxxx] > > Dear Kai, > > > Thank you for your patch. > > Am 02.06.23 um 03:46 schrieb Kai-Heng Feng: > > On Fri, Jun 2, 2023 at 4:24 AM Alexander H Duyck wrote: > >> > >> On Fri, 2023-06-02 at 00:25 +0800, Kai-Heng Feng wrote: > >>> On some I219 devices, ethernet cable plugging detection only works once > >>> from PCI D3 state. Subsequent cable plugging does set PME bit correctly, > >>> but device still doesn't get woken up. > > Could you please add the list of all the devices with the firmware > version, you know this problem exists on? Please also add the URLs of > the bug reports at the end of the commit message. Firmware do you mean the firmware on I219 device, or BIOS? > > Is that problem logged somehow? Could a log message be added first? There's nothing gets logged. When this happens the ACPI GPE is dead silent. > > >> Do we have a root cause on why things don't get woken up? This seems > >> like an issue where something isn't getting reset after the first > >> wakeup and so future ones are blocked. > > > > No we don't know the root cause. > > I guess the D3 wake isn't really tested under Windows because I219 > > doesn't use runtime D3 on Windows. > > How do you know? Where you able to look at the Microsoft Windows driver > source code? Device Manager shows the current PCI state. > > >>> Since I219 connects to the root complex directly, it relies on platform > >>> firmware (ACPI) to wake it up. In this case, the GPE from _PRW only > >>> works for first cable plugging but fails to notify the driver for > >>> subsequent plugging events. > >>> > >>> The issue was originally found on CNP, but the same issue can be found > >>> on ADL too. So workaround the issue by continuing use PME poll after > > The verb is spelled with a space: work around. Sure, will change it. > > >>> first ACPI wake. As PME poll is always used, the runtime suspend > >>> restriction for CNP can also be removed. > > When was that restriction for CNP added? The restriction for CNP+ was introduced by commit 459d69c407f9 ("e1000e: Disable runtime PM on CNP+") and modified by 3335369bad99 ("e1000e: Remove the runtime suspend restriction on CNP+"). > > >>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> > >>> --- > >>> drivers/net/ethernet/intel/e1000e/netdev.c | 4 +++- > >>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > >>> index bd7ef59b1f2e..f0e48f2bc3a2 100644 > >>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c > >>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > >>> @@ -7021,6 +7021,8 @@ static __maybe_unused int e1000e_pm_runtime_resume(struct device *dev) > >>> struct e1000_adapter *adapter = netdev_priv(netdev); > >>> int rc; > >>> > >>> + pdev->pme_poll = true; > >>> + > >>> rc = __e1000_resume(pdev); > >>> if (rc) > >>> return rc; > >> > >> Doesn't this enable this too broadly. I know there are a number of > >> devices that run under the e1000e and I would imagine that we don't > >> want them all running with "pme_poll = true" do we? > > > > Whack a mole isn't scaling, either. > > The generation between CNP and ADL are probably affected too. > > > >> It seems like at a minimum we should only be setting this for specific > >> platofrms or devices instead of on all of them. > >> > >> Also this seems like something we should be setting on the suspend side > >> since it seems to be cleared in the wakeup calls. > > > > pme_poll gets cleared on wakeup, and once it's cleared the device will > > be removed from pci_pme_list. > > > > To prevent that, reset pme_poll to true immediately on runtime resume. > > > >> Lastly I am not sure the first one is necessarily succeeding. You might > >> want to check the status of pme_poll before you run your first test. > >> From what I can tell it looks like the initial state is true in > >> pci_pm_init. If so it might be getting cleared after the first wakeup > >> which is what causes your issues. > > > > That's by design. pme_poll gets cleared when the hardware is capable > > to signal wakeup via PME# or ACPI GPE. For detected hardwares, the > > pme_poll will never be cleared. > > So this becomes tricky for the issue, since the ACPI GPE works for > > just one time, but never again. > > > >>> @@ -7682,7 +7684,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > >>> > >>> dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_PREPARE); > >>> > >>> - if (pci_dev_run_wake(pdev) && hw->mac.type != e1000_pch_cnp) > >>> + if (pci_dev_run_wake(pdev)) > >>> pm_runtime_put_noidle(&pdev->dev); > >>> > >>> return 0; > >> > >> I assume this is the original workaround that was put in to address > >> this issue. Perhaps you should add a Fixes tag to this to identify > >> which workaround this patch is meant to be replacing. > > > > Another possibility is to remove runtime power management completely. > > I wonder why Windows keep the device at D0 all the time? > > Who knows how to contact Intel’s driver developers for Microsoft Windows? Probably this mailing list? > > > Can Linux align with Windows? > > Before deciding this, the power usage in the different states should be > measured. The power usage doesn't matter if the device can't function properly. Kai-Heng > > > Kind regards, > > Paul