On Tue, Jun 21, 2016 at 8:01 PM, <Mario_Limonciello@xxxxxxxx> wrote: >> -----Original Message----- >> From: Peter Jones [mailto:pjones@xxxxxxxxxx] >> Sent: Tuesday, June 21, 2016 10:19 AM >> To: Rafael J. Wysocki <rafael@xxxxxxxxxx> >> Cc: ACPI Devel Maling List <linux-acpi@xxxxxxxxxxxxxxx>; Limonciello, Mario >> <Mario_Limonciello@xxxxxxxx>; Linux Kernel Mailing List <linux- >> kernel@xxxxxxxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; Rafael J . Wysocki >> <rjw@xxxxxxxxxxxxx>; Andy Lutomirski <luto@xxxxxxxxxxxxxx> >> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of >> PCIe hotplug. >> >> (Sorry for the slow response - it's deadline time over here.) >> >> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: >> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@xxxxxxxxxx> >> wrote: >> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@xxxxxxxxxx> >> wrote: >> > >> Right now when booting, on many laptops the firmware manages the >> PCIe >> > >> bus. As a result, when we call the _OSC ACPI method, it returns an >> > >> error code. Unfortunately the errors are not very articulate. >> > > >> > > What exactly do you mean here? >> > > >> > >> As a result, we show: >> > >> >> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) >> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM >> Segments MSI] >> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid >> UUID >> > >> _OSC request data: 1 1f 0 >> > > >> > > So _OSC told us that the UUID was invalid, didn't it? >> > >> > BTW, the above messages are KERN_DEBUG, so at least in theory they >> > shouldn't be visible in production runs. >> > >> > Maybe the bug to fix is that they show up when they aren't supposed to? >> >> No - the workflow that I am really trying to remedy is this: >> >> 1) S3 resume sometimes isn't working on some laptop you've got. >> 2) start looking at debug messages >> 3) this shows an error, so it looks like it's probably the problem >> 4) go fishing for red herring >> 5) if you happen to know who maintains the DSDT for the platform in >> question, eventually work out that this is working as intended and >> the bug is someplace else. >> 5b) if you don't know that person, eventually work out that it /might/ >> be someplace else... >> >> So the idea was to make it look more like an indication of status, and >> less like an error that's causing unrelated problems. >> >> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that >> there's a way to distinguish the between the UUID being >> invalid/malformed, being merely unsupported, or being supported in some >> configurations but not the current one. In this particular DSDT, the >> machine doesn't support the OS controlling any of this if USB-C / >> thunderbolt are enabled. The DSDT is clearly written with the belief >> that you have to completely disable the handling for that UUID in this >> case, and googling for this looks like it's not the only one written >> with that belief. >> >> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems >> plausible that you can express this instead by handling the UUID but >> choosing each individual query/status bit in the way that accomplishes >> the OS doing nothing with the response. So it may well be that that's >> just more code that vendors have thought wasn't necessary (or wasn't >> correct for some reason.) >> >> Mario, want to jump in on your thinking here? >> >> -- >> Peter > > After talking to the team, I was told this particular implementation to not let > OS take control when acting on that specific UUID based upon a variable > (NEXP in this case) came from Intel RC code. > > That's probably why this is all across a lot of platforms, including non-Dell. > > At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) > NEXP is set in GNVS based upon Thunderbolt capability. > > As for why they return unrecognized UUID instead of just masking all the > capabilities bits? It's the same net functional result. If the vendor provided > RC code doesn't caused WCHK problems or functional problems it's hard to > make a case for why it needs to be changed by the OEM. > > I think that Peter's patch is appropriate to message this is specifically > what's going on. No, it may hide real (ie. non-intentional) bugs in _OSC, so it is not appropriate. Debug-level messages really should not hurt anyone (and should never show up in production anyway). We can slightly tone down the "_OSC failed (%s); disabling ASPM\n" message in negotiate_os_control() in drivers/acpi/pci_root.c if you think it's too strong and that's it. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html