On Thu, Jun 20, 2019 at 04:37:10PM +0300, Mika Westerberg wrote: > On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote: > > On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote: > > > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote: > > > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote: > > > > > Intel Ice Lake has an integrated Thunderbolt controller which > > > > > means that the PCIe topology is extended directly from the two > > > > > root ports (RP0 and RP1). > > > > > > > > A PCIe topology is always extended directly from root ports, > > > > regardless of whether a Thunderbolt controller is integrated, so I > > > > guess I'm missing the point you're making. It doesn't sound like > > > > this is anything specific to Thunderbolt? > > > > > > The point I'm trying to make here is to explain why this is problem > > > now and not with the previous discrete controllers. With the > > > previous there was only a single ACPI power resource for the root > > > port and the Thunderbolt host router was connected to that root > > > port. PCIe hierarchy was extended through downstream ports (not root > > > ports) of that controller (which includes PCIe switch). > > > > Sounds like you're using "PCIe topology extension" to mean > > specifically something below a Thunderbolt controller, excluding a > > subtree below a root port. I don't think the PCI core is aware of > > that distinction. > > Right it is not. > > > > Now the thing is part of the SoC so power management is different > > > and causes problems in Linux. > > > > The SoC is a physical packaging issue that really doesn't enter into > > the specs directly. I'm trying to get at the logical topology > > questions in terms of the PCIe and ACPI specs. > > > > I assume we could dream up a non-Thunderbolt topology that would show > > the same problem? > > Yes. > > > > > > Power management is handled by ACPI power resources that are > > > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI > > > > > controller. > > > > > > > > > > The topology with the power resources (marked with []) looks like: > > > > > > > > > > Host bridge > > > > > | > > > > > +- RP0 ---\ > > > > > +- RP1 ---|--+--> [TBT] > > > > > +- NHI --/ | > > > > > | | > > > > > | v > > > > > +- xHCI --> [D3C] > > > > > > > > > > Here TBT and D3C are the shared ACPI power resources. ACPI > > > > > _PR3() method returns either TBT or D3C or both. > > > > I'm not very familiar with _PR3. I guess this is under an ACPI object > > representing a PCI device, e.g., \_SB.PCI0.RP0._PR3? > > Correct. > > > > > > Say we runtime suspend first the root ports RP0 and RP1, then > > > > > NHI. Now since the TBT power resource is still on when the root > > > > > ports are runtime suspended their dev->current_state is set to > > > > > D3hot. When NHI is runtime suspended TBT is finally turned off > > > > > but state of the root ports remain to be D3hot. > > > > So in this example we might have: > > > > _SB.PCI0.RP0._PR3: TBT > > _SB.PCI0.RP1._PR3: TBT > > _SB.PCI0.NHI._PR3: TBT > > and also D3C. > > > And when Linux figures out that everything depending on TBT is in > > D3hot, it evaluates TBT._OFF, which puts them all in D3cold? And part > > of the problem is that they're now in D3cold (where config access > > doesn't work) but Linux still thinks they're in D3hot (where config > > access would work)? > > Exactly. > > > I feel like I'm missing something because I don't know how D3C is > > involved, since you didn't mention suspending xHCI. > > That's another power resource so we will also have D3C turned off when > xHCI gets suspended but I did not want to complicate things too much in > the changelog. If D3C isn't essential to seeing this problem, you could just omit it altogether. I think stripping out anything that's not essential will make it easier to think about the underlying issues. > > And I can't mentally match up the patch with the D3hot/D3cold state > > change (if indeed that's the problem). If we were updating the path > > that evaluates _OFF so it changed the power state of all dependent > > devices, *that* would make a lot of sense to me because it sounds like > > that's where the physical change happens that makes things out of > > sync. > > I did that in the first version [1] but Rafael pointed out that it is > racy one way or another [2]. > > [1] https://www.spinics.net/lists/linux-pci/msg83583.html > [2] https://www.spinics.net/lists/linux-pci/msg83600.html Yeah, interesting. It was definitely a much larger patch. I don't know enough to comment on the races. I would wonder whether there's a way to get rid of the caches that become stale, but that's just an idle thought, not a suggestion. Bjorn