On Sun, Feb 25, 2018 at 1:29 AM, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > On Sat, Feb 24, 2018 at 2:59 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote: >> On Sat, Feb 24, 2018 at 10:47:26AM +0100, Rafael J. Wysocki wrote: >>> On Sat, Feb 24, 2018 at 10:35 AM, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: >>> > On Sat, Feb 24, 2018 at 1:46 AM, Feng Kan <fkan@xxxxxxx> wrote: >>> >> This is not a patch, but rather a question regarding the deferred >>> >> probe's effect on PCIe PM ordering. This happens on our system >>> >> which defer the probing of root bridge due to the IOMMU not being >>> >> ready. Because of the deferred action, the bridge is moved to the >>> >> end of the dpm_list which results in incorrect suspend and resume >>> >> sequence. >>> >> >>> >> In the cases I have seen, the bridge is always reordered because of >>> >> startup sequence. They are always place after the endpoint. If that >>> >> is the case the following code should be able to prevent such cases. >>> >> However, is there some cases here that would violate such situation? >>> > >>> > The code in dd.c assumes that the device being probed has no children >>> > (or consumers, for that matter) and so it is safe to move it to the >>> > end of the list. >>> > >>> > If the device has children (or consumers), moving it to the end of the >>> > list by itself doesn't work, which is the case for you. >>> > >>> > You can try to replace the device_pm_move_last(dev) in >>> > deferred_probe_work_func(struct() with device_reorder_to_tail(), but >>> > that has to be called under device_links_read_lock/unloc () and >>> > device_pm_lock/unloc() (in the right order). >>> >>> Alternatively, you can replace your !dev_is_pci(dev) check with a >>> check for the presence of children or consumers and only move the >>> device to the end of the PM list if there are not any. That would >>> kind of make sense, but it may break other assumptions in the deferred >>> probing mechanism which I don't recall ATM. >>> >>> Or avoid deferred probing of the host bridge driver entirely. I guess >>> you can use it with limited functionality until the IOMMU driver is >>> ready and switch over to the fully functional operation mode when that >>> happens, but that would need to hook into the IOMMU code somehow. >> >> Or model the root bridge's dependency on the iommu using a device link: >> https://www.kernel.org/doc/html/latest/driver-api/device_link.html > > Apparently, there are children registered under the bridge device > before the driver for it is probed. Yes, so the order seems to be like this: 1. root port and endporint is enumerated and added to dpm_list 2. root port probed and deferred 3. iommu probed and successful 4. root port probed again and successful -> moved to last in dpm_list 5. endpoint probed and successful I guess another approach would be to prevent 1 from happening by not adding to dpm_list until the probe is successful. > > That is kind of unusual and I'm not really sure why it happens at all, > because (in theory) the bridge driver should be responsible for > assigning resources to the devices on the bus, among other things. > Depending on what the reason is, your suggestion may or may not work.