Re: [RFC 4/4] thunderbolt: Support runtime pm

Lukas Wunner <lukas@xxxxxxxxx> · Sun, 24 Apr 2016 17:23:08 +0200

Hi Andreas,

thank you for your valuable feedback.

On Sun, Mar 20, 2016 at 02:53:10PM +0100, Andreas Noever wrote:
> - My firmware does not provide the TRPE ACPI method, only XRPE. So
> either TRPE is only post CactusRidge or it is only present in newer
> MBPs. In any case the OS X driver looks for TRPE first and uses XRPE
> only if TRPE does not exists. I suggest we do the same (but see below
> for TRPE).

I only had the acpidump of an MBA6 (2013) available when I implemented
this and it uses TRPE. I have since been able to obtain the acpidump of
an MBP10 (2012) and you're right, it uses XRPE. Both have the same
controller, Cactus Ridge 4C. It looks like they changed this on machines
introduced 2013. It's just a rename of the method, there are no machines
which have both methods.

> - The XRIN GPE fired immediately after the power was cut. The problem
> seems to be that the controller takes a bit to shut down. The solution
> is to poll until XRIL returns 1 before activating the GPE. On "Type 2"
> devices the OS X driver polls up to 300 times with a 1ms sleep in
> between (for me 1 or 2 iterations were always enough). Afaik no
> polling is done on "Type 1" devices.

Hm, this means that the semantics of XRIN and XRIL changed on Cactus
Ridge. I have changed the behaviour to be exactly as you've specified
above, this works fine on Light Ridge and should hopefully also work
on Cactus Ridge, no distinction between Type 1 and Type 2 necessary.

> Also the OS X interrupt handler checks XRIL
> and only wakes up the device if it returns 0. This was not necessary
> to do on my model - but maybe spurious interrupts can happen with
> newer controllers?

They're doing lots of stuff which seems superfluous or needlessly
complicated, e.g. they also reset the controller upon driver load using
the XRST method (which exists only on some models). I don't think we
have to do everything exactly as they do as long as it works.
FWIW I haven't seen any spurious XRIN interrupts on Light Ridge.

> Concerning TRPE style hardware: It seems that pm is more complicated
> here. I see a bunch of references to SX* ACPI methods (SXFP, SXLV,
> SXIO) and have not yet figured out what they do. Maybe we should not
> enable PM if XRPE is not present until we find someone to test it.

But you do have the SX* methods on your machine even though it uses
XRPE, right? I've mostly figured out now what these methods are there
for and have documented them extensively in upstream.c. However I cannot
verify if my documentation is accurate as they are not present on my
machine, but perhaps you can if your machine has them.

SXLV, SXIO and SXIL exist only on Cactus Ridge machines and utilize
the Go2Sx and Ok2Go2Sx pins. Judging by the PCI quirk you've added,
it seems that a Go2Sx dance is necessary on this controller before
power is cut (either by going to S3 / S4 / S5 or by using the Force
Power pin, which is what XRPE / TRPE / SXFP do).

> As you have noted the "correct" place to but this logic would be at
> the upstream bridge. Ideally the downstream bridges should go into
> D3hot by themselves if no devices are attached. The NHI as well

In v2 it works exactly like this now:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v2

The trick is to allocate a Thunderbolt port service for the upstream
bridge which we can bind to. In fact I'm allocating such a port service
for *any* PCIe port on Thunderbolt devices, this could be useful for all
sorts of other stuff.

Binding to the upstream bridge also allows us to replace the PCI quirk
which delays resume_noirq on the downstream bridges, as demonstrated by
this experimental commit (works fine on Light Ridge but YMMV):
https://github.com/l1k/linux/commit/79e0b8b8fb5da50b63836939f75212f824d8cba7

> (did you by chance check whether the NHI can be put into D3hot without
> killing the thunderbolt tunnels?).

Amazingly this works. However the NHI does not act on hotplug events
after thunderbolt_suspend() has been called. Even without calling
thunderbolt_suspend(), it seems that the control channel is down
when the NHI is in D3hot, I'm getting RX timeouts. Also, I cannot
see any reduction in power consumption when putting the NHI in D3,
same for the downstream bridges.

You can test this for yourself by commenting out the two calls to
pm_runtime_get() and pm_runtime_put_autosuspend() in switch.c.
Plug in a Thunderbolt device, wait 10 sec for the NHI to autosuspend,
try accessing the Thunderbolt device. Works for me.

If the NHI suspended before you had a chance to plug in the device,
invoke "echo on > /sys/bus/pci/devices/0000:06:00.0/power/control".
Plug in the device and use "echo auto" to let the NHI autosuspend.

> (1) should be possible to fix? For (2): D3Cold always requires a
> platform specific mechanism and the pci subsystem only supports ACPI.
> Would it be possible to add an API to tell the pci subsystem that we
> know how to put a specific device(tree) into D3Cold from a platform
> driver [+CC Bjorn]? Then this whole thing would become a normal pci
> suspend operation.

I simply go to D3cold in the driver's ->runtime_suspend callback.
There's just one small fix necessary in pci_raw_set_power_state()
for this to work. Plus some changes in portdrv to call down to the
port service drivers on each pm transition. (It already does this
for ->suspend and ->resume, we just need the same functionality for
additional pm callbacks).

> Bridges in Hotplugged TB devices might have the same PCI ids as the
> "root" bridges (if they use the same TB chip). You probably should
> check that dev is a bridge of the builtin controller (for example by
> checking for the presence of ACPI methods, see the comment in the
> other tb quirks).

For the upstream bridge I'm checking if its parent is a root port now
to determine if it's a host controller built into the machine.

I think the only chance for a false positive is if two machines are
connected with Thunderbolt and one of them has multiple Thunderbolt
controllers built in. Might look like this:

RP - UPSB - DSB - UPSB - DSB - RP - RP - UPSB - DSB
^^^^^^^^^^^^^^^   ^^^^                   ^^^^^^^^^^
local machine     remote machine         secondary controller on remote

If the topology indeed looks like this (which I'm not sure of, I lack
the hardware to test it), a thunderbolt_upstream driver will try to
attach to UPSB on the secondary controller of the remote machine but
should bail out because it can't find an ACPI handle for its NHI.
So we should even have this corner case covered.

Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html