On Tue, Dec 18, 2018 at 10:55:18AM +0200, Mika Westerberg wrote:
> On Mon, Dec 17, 2018 at 02:28:27PM -0600, Bjorn Helgaas wrote:
> > On Tue, Dec 04, 2018 at 02:20:48PM +0300, Mika Westerberg wrote:
> > > Gigabyte X299 DESIGNARE EX motherboard has one PCIe root port that is
> > > connected to an Alpine Ridge Thunderbolt controller. This port has slot
> > > implemented bit set in the config space but other than that it is not
> > > hotplug capable in the sense we are expecting in Linux (it has
> > > dev->is_hotplug_bridge set to 0):
> > >
> > > 00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5
> > >         Bus: primary=00, secondary=05, subordinate=46, sec-latency=0
> > >         Memory behind bridge: 78000000-8fffffff [size=384M]
> > >         Prefetchable memory behind bridge: 00003800f8000000-00003800ffffffff [size=128M]
> > >         ...
> > >         Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
> > >                 ...
> > >                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> > >                         Slot #8, PowerLimit 25.000W; Interlock- NoCompl+
> > >                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> > >                         Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> > >                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
> > >                         Changed: MRL- PresDet+ LinkState+
> > >
> > > This system is using ACPI based hotplug to notify the OS that it needs
> > > to rescan the PCI bus (ACPI hotplug).
> > >
> > > If there is nothing connected in any of the Thunderbolt ports the root
> > > port will not have any runtime PM active children and is thus
> > > automatically runtime suspended pretty soon after boot by PCI PM core.
> > > Now, when a device is connected the BIOS SMI handler responsible for
> > > enumerating newly added devices is not able to find anything because the
> > > port is in D3.
> >
> > Ugh. I don't see how this is a maintainable solution. Are we going
> > to have to just update this blacklist empirically as we get reports of
> > systems that are "broken"?
>
> I was hoping not but for that we would need to have some means to
> identify these. What you suggest below might be one way to avoid adding
> the blacklist.
>
> > I say "broken" because I don't think we can point to anything here
> > that doesn't conform to the specs, so maybe we tripped over something
> > that *should* be covered in the spec, or maybe we're just not
> > interpreting something correctly.
>
> That is indeed possible.
>
> > For example, it looks like PCI_EXP_FLAGS_SLOT is set, but Linux
> > basically ignores it. Maybe if PCI_EXP_FLAGS_SLOT is set but we
> > aren't using pciehp, we should assume any hotplug would be handled via
> > acpiphp? And in that case, we should avoid doing anything that would
> > prevent platform firmware from enumerating things below the bridge?
>
> I don't see why that would not work. This could cause "power regression"
> on some systems but I think that's better than systems that do not work
> at all.

Yeah, I think that would be better, assuming it wouldn't cause a flood
of power regressions. I'd even rather have a whitelist of systems
where we use acpiphp and it's safe to do power management.

> > Is there a bugzilla or any other URL we could include here to help with
> > future changes in this area?
>
> No, this was reported internally.
>
> I can file one if you think it is helpful.

I think a kernel.org bugzilla that archived the "lspci -vv", a dmesg
log, and an acpidump might be helpful.

Bjorn