On Fri, Jun 2, 2023 at 11:57 PM Limonciello, Mario <mario.limonciello@xxxxxxx> wrote: > > > On 6/2/2023 3:21 PM, Bjorn Helgaas wrote: > > [+cc Rafael, Len, linux-acpi] > > > > Hi Mario, > > > > On Thu, Jun 01, 2023 at 10:11:22PM -0500, Mario Limonciello wrote: > >> ASMedia PCIe GPIO controllers connected to AMD SOC fail functional tests > >> after returning from s2idle. This is because the BIOS checks whether the > >> OSPM has called the _REG method to determine whether it can interact with > >> the OperationRegion assigned to the device. > > "s2idle" is a Linux term; I'd prefer something that we can relate to > > the ACPI spec. > It's important for the symptoms of this issue though, this > problem doesn't trigger "just" by moving D-states. > > It happens as a result of system suspend. As I said in my response to Bjorn, s2idle is D0 from the ACPI standpoint. It is not a system sleep and it has no special meaning in ACPI. The problem seems to be related to the low-power S0 idle _DSM calls to me. > > > > Maybe a pointer to the specific function in the driver that has a > > problem? Based on the patch, I assume the driver uses some control > > method that looks at PCI config space? > > The issue isn't in anything Linux code "does"; it's in the "lack" > of Linux code doing what it needs to IE using _REG. So the argument seems to be that under certain conditions the PCI config space becomes unavailable and so _REG(dev, 0) needs to be called when this is about to happen and _REG(dev, 1) needs to be called when the config space becomes available again. Fair enough, but I'm not sure why this is limited to system suspend and resume. Moreover, "PCI_Config operation regions on a PCI root bus containing a _BBN object" are specifically mentioned as one of the cases when _REG need not be evaluated at all. I guess the operation region in question doesn't fall into that category? > At least for this issue _REG is treated like a lock mechanism. > In the spec it says specifically: > > "When an operation region handler is unavailable, AML cannot access > data fields in that region". > > That is it's to ensure that OSPM and AML don't both simultaneously > access the same region. > > What happens is that AML normally wants to access this region during > suspend, but without the sequence of calling _REG it can't. Is this about being unable to access the opregion or racing with concurrent accesses on the OS side? > > > >> To fix this issue, call acpi_evaluate_reg() when saving and restoring the > >> state of PCI devices. > > Please include the spec citation: ACPI r6.5, sec 6.5.4. The URL has > > changed in the past and may change in the future, but the name/section > > number will not. > Sure. > > > >> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/06_Device_Configuration/Device_Configuration.html#reg-region > >> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx> > >> --- > >> drivers/pci/pci.c | 12 ++++++++++++ > >> 1 file changed, 12 insertions(+) > >> > >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > >> index e38c2f6eebd4..071ecba548b0 100644 > >> --- a/drivers/pci/pci.c > >> +++ b/drivers/pci/pci.c > >> @@ -1068,6 +1068,12 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev) > >> return acpi_pci_bridge_d3(dev); > >> } > >> > >> +static inline int platform_toggle_reg(struct pci_dev *dev, int c) > >> +{ > >> + return acpi_evaluate_reg(ACPI_HANDLE(&dev->dev), > >> + ACPI_ADR_SPACE_PCI_CONFIG, c); > >> +} > > You never check the return value, so why return it? > > _REG isn't mandatory for any of these uses, and I wanted to make > sure that if it does end up being mandatory in a future use that > the return code wasn't thrown away. If you think it's better to > just throw it away now, I have no qualms making it a void instead. I don't think it can reasonably become mandatory without adding a specific _OSC bit for that. > > > > The function actually doesn't *toggle*; it connects or disconnects > > based on "c". > Can you suggest a better function name? > > > > This looks like it only builds when CONFIG_ACPI=y? > > The prototype for acpi_evaluate_reg isn't guarded by CONFIG_ACPI > so I figured it worked both ways. > > But looking again I don't see a dummy implementation version for > the lack of CONFIG_ACPI, so I'll double check it. > > > > >> /** > >> * pci_update_current_state - Read power state of given device and cache it > >> * @dev: PCI device to handle. > >> @@ -1645,6 +1651,9 @@ static void pci_restore_ltr_state(struct pci_dev *dev) > >> int pci_save_state(struct pci_dev *dev) > >> { > >> int i; > >> + > >> + platform_toggle_reg(dev, ACPI_REG_DISCONNECT); > > I would expect these to be in the PM code near the power state > > transitions, not in the state save/restore code. These functions > > *are* used during suspend/resume, but are used in other places as > > well, where we probably don't want _REG executed. > > > > Cc'd Rafael and PM folks, who can give much better feedback. > My knee jerk reaction when we found the root cause for this issue > was to put the code right around the D-state transitions, but I > decided against this. > > I put it in save/restore intentionally because > like I mentioned above it's treated like a locking mechanism between > OSPM and AML and it's not functionally tied to a D-state transition. > > When the state is saved it's like Linux says > "I'm done using this region, go ahead and touch it firmware". > When it's restored it's like Linux says > "Don't use that region, I'm claiming it for now". So it looks like you want to use _REG for protecting PCI config space against concurrent accesses from AML and the OS. > Think about that other patch I wrote recently that controls D3 > availability [1]. If it was only run in the D-state transitions and > the root port stays in D0 but has a _REG method it would never get > called. And why should it be evaluated in that case?