Hi Rafael,

On 12/02/2019 12:08, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> If a stateless device link to a certain supplier with
> DL_FLAG_PM_RUNTIME set in the flags is added and then removed by the
> consumer driver's probe callback, the supplier's PM-runtime usage
> counter will be nonzero after that which effectively causes the
> supplier to remain "always on" going forward.
>
> Namely, device_link_add() called to add the link invokes
> device_link_rpm_prepare() which notices that the consumer driver is
> probing, so it increments the supplier's PM-runtime usage counter
> with the assumption that the link will stay around until
> pm_runtime_put_suppliers() is called by driver_probe_device(),
> but if the link goes away before that point, the supplier's
> PM-runtime usage counter will remain nonzero.
>
> To prevent that from happening, first rework pm_runtime_get_suppliers()
> and pm_runtime_put_suppliers() to use the rpm_active refcounts of device
> links and make the latter only drop rpm_active and the supplier's
> PM-runtime usage counter for each link by one, unless rpm_active is
> one already for it. Next, modify device_link_add() to bump up the
> new link's rpm_active refcount and the supplier's PM-runtime usage
> counter by two, to prevent pm_runtime_put_suppliers(), if it is
> called subsequently, from suspending the supplier prematurely (in
> case its PM-runtime usage counter goes down to 0 in there).
>
> Due to the way rpm_put_suppliers() works, this change does not
> affect runtime suspend of the consumer ends of new device links (or,
> generally, device links for which DL_FLAG_PM_RUNTIME has just been
> set).
>
> Fixes: e2f3cd831a28 ("driver core: Fix handling of runtime PM flags in device_link_add()")
> Reported-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> ---
>
> Note that the issue had been there before commit e2f3cd831a28, but it was
> overlooked by that commit and this change is a fix on top of it, so make
> the Fixes: tag point to commit e2f3cd831a28 (instead of an earlier one
> that the patch will not be applicable to).

I noticed that yesterday's and today's -next no longer boot on one of
our Tegra boards (Tegra210 Jetson TX2) because networking is failing.
The Ethernet chip is a USB device, and looking at the boot logs I can
see that the Tegra XHCI driver is failing ...

  tegra-xusb 70090000.usb: xHCI host controller not responding, assume dead
  tegra-xusb 70090000.usb: HC died; cleaning up

The Tegra XHCI driver uses multiple power-domains and uses
device_link_add() to attach them. So now I am wondering if there is
something that we have got wrong in our implementation. However, I
don't see the device's probe being deferred on boot or anything like
that.

The driver in question is drivers/usb/host/xhci-tegra.c and we add the
links in the function tegra_xusb_powerdomain_init(), which is called
before RPM is enabled.

Let me know if you have any thoughts.

Cheers
Jon

-- 
nvpublic
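
For reference, the counting described in the quoted changelog can be walked through with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: the toy_* types and functions are hypothetical stand-ins, rpm_active is modelled as a plain int whose value 1 means "no PM-runtime references held through this link", and the link-removal step that releases the references tracked in rpm_active is an assumption that the changelog does not spell out.

```c
#include <assert.h>

/* Toy stand-ins; NOT the kernel's struct device / struct device_link. */
struct toy_supplier {
	int usage_count;	/* models the supplier's PM-runtime usage counter */
};

struct toy_link {
	struct toy_supplier *supplier;
	int rpm_active;		/* 1 == no references held through this link */
	int present;		/* 0 once the link has been removed */
};

/* Old scheme per the changelog: one untracked increment while probing. */
static void toy_old_link_add(struct toy_link *l, struct toy_supplier *s)
{
	l->supplier = s;
	l->rpm_active = 1;
	l->present = 1;
	s->usage_count += 1;
}

static void toy_old_link_del(struct toy_link *l)
{
	l->present = 0;		/* nothing gives the increment back */
}

static void toy_old_put_suppliers(struct toy_link *l)
{
	if (l->present)		/* only links that still exist are visited */
		l->supplier->usage_count -= 1;
}

/* New scheme per the changelog: bump rpm_active and the counter by two. */
static void toy_new_link_add(struct toy_link *l, struct toy_supplier *s)
{
	l->supplier = s;
	l->rpm_active = 1 + 2;
	l->present = 1;
	s->usage_count += 2;
}

/* Drop one reference per link, unless rpm_active is one already. */
static void toy_new_put_suppliers(struct toy_link *l)
{
	if (!l->present)
		return;
	assert(l->rpm_active >= 1);
	if (l->rpm_active > 1) {
		l->rpm_active--;
		l->supplier->usage_count--;
	}
}

/* ASSUMPTION: removal releases every reference tracked in rpm_active. */
static void toy_new_link_del(struct toy_link *l)
{
	while (l->rpm_active > 1) {
		l->rpm_active--;
		l->supplier->usage_count--;
	}
	l->present = 0;
}

/* Probe adds and removes the link, then the core does its end-of-probe put. */
int toy_old_add_then_remove(void)
{
	struct toy_supplier s = { 0 };
	struct toy_link l;

	toy_old_link_add(&l, &s);
	toy_old_link_del(&l);
	toy_old_put_suppliers(&l);
	return s.usage_count;
}

int toy_new_add_then_remove(void)
{
	struct toy_supplier s = { 0 };
	struct toy_link l;

	toy_new_link_add(&l, &s);
	toy_new_link_del(&l);
	toy_new_put_suppliers(&l);
	return s.usage_count;
}

/* Probe adds the link and keeps it; the end-of-probe put drops only one. */
int toy_new_add_and_keep(void)
{
	struct toy_supplier s = { 0 };
	struct toy_link l;

	toy_new_link_add(&l, &s);
	toy_new_put_suppliers(&l);
	return s.usage_count;
}
```

In this model toy_old_add_then_remove() returns 1 (the supplier stays "always on"), toy_new_add_then_remove() returns 0, and toy_new_add_and_keep() returns 1, so the end-of-probe put cannot take the supplier's counter to zero while the link is still in place.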