On Wed, Jan 16, 2019 at 11:44 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote: > > On Wednesday, January 16, 2019 7:42:45 PM CET Daniel Vetter wrote: > > On Tue, Jan 15, 2019 at 11:47 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote: > > > > > > [CC Greg] > > > > > > On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote: > > > > [CC Lukas and linux-pm] > > > > > > > > On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > > > > > > > > > On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux > > > > > <linux@xxxxxxxxxxxxxxx> wrote: > > > > > > > > [cut] > > > > > > > > > > > > > > > > This thread is discussing how to deal with Armada DRM, its use of the > > > > > > component helper, TDA998x's hybrid use of the component helper and > > > > > > DRM bridges. > > > > > > > > > > > > Currently, DRM bridges register into the DRM system and are added to > > > > > > a global list of bridges. When a DRM driver binds, it looks up the > > > > > > DRM bridge and attaches to it. There is no way in generic DRM code > > > > > > for the DRM driver to know if the DRM bridge is unbound from DRM, > > > > > > consequently a DRM driver may continue trying to call functions in > > > > > > the DRM bridge driver using memory that has already been freed. > > > > > > > > > > > > We had similar issues with imx-drm, and a number of years ago at a > > > > > > kernel summit, this was discussed, and I proposed a system which is > > > > > > now known as the component helper. This handles the problem of a > > > > > > multi-component system. > > > > > > > > > > > > However, DRM bridge was already established, and there appears to be > > > > > > no desire to convert DRM bridges and DRM drivers to use the component > > > > > > helper. > > > > > > > > > > > > We are presently in the situation where Armada DRM is incompatible > > > > > > with the DRM bridges way of doing things, and making it compatible > > > > > > essentially means introducing potential use-after-free bugs into the > > > > > > code. > > > > > > > > > > > > Device links in their stateful form appear to provide an alternative > > > > > > to the component helper, in that a stateful device link will remove > > > > > > consumers of a resource when the supplier is going away - which is > > > > > > exactly the problem which the component helper is solving. The > > > > > > difference is that device links look like being a cleaner solution. > > > > > > > > > > > > Just like the component helper, a stateful link would unbind the > > > > > > consumer of a resource when the supplier goes away - which is exactly > > > > > > the behaviour we're wanting. > > > > > > > > > > > > The problem is that the connection between various drivers is only > > > > > > really known when the drivers obtain their resources, and the > > > > > > following can happen: > > > > > > > > > > > > supplier consumer > > > > > > > > > > > > probe() > > > > > > alloc > > > > > > probe() > > > > > > publish > > > > > > obtain supplier's resource > > > > > > return > > > > > > > > > > > > Where things go wrong is if a stateful link is created when the > > > > > > consumer obtains the suppliers resource before the supplier has > > > > > > finished probing - which from what's been said is illegal. > > > > > > > > > > It just doesn't work (which means "invalid" rather than "illegal" I > > > > > guess, but whatever :-)). > > > > > > > > > > Admittedly, the original design didn't take this particular case into > > > > > account and I'm not actually sure how it can be taken into account > > > > > either. > > > > > > > > > > If the link is created by the consumer in the scenario above, its > > > > > status will be updated twice in a row after the consumer probe returns > > > > > and after the supplier probe returns. It looks like this update would > > > > > need to work regardless of the order it which this happened which > > > > > sounds somewhat challenging. I would need to think about that. > > > > > > > > So if I'm not mistaken it can be made work with the help of the > > > > (completely untested) attached patch (of course, the documentation > > > > would need to be updated too). > > > > > > > > The key observation here is that it should be fine to create a link > > > > from the consumer driver's probe while the supplier is still probing > > > > if the consumer has some way to find out that the supplier is > > > > functional at that point (like when it has published itself already in > > > > your example above). The initial state of the link can be "consumer > > > > probe" in that case and I don't see a reason why that might not work. > > > > > > > > > > Below is a more complete patch that should take all of the corner cases > > > into account unless I have missed anything. Testing it would be > > > appreciated. :-) > > > > > > --- > > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > > > Subject: [PATCH] driver core: Fix adding device links to probing suppliers > > > > > > Currently, it is not valid to add a device link from a consumer > > > driver ->probe() callback to a supplier that is still probing too, but > > > generally this is a valid use case. For example, if the consumer has > > > just acquired a resource that can only be available when the supplier > > > is functional, adding a device link to that supplier right away > > > should be safe (and even desirable arguably), but device_link_add() > > > doesn't handle that case correctly and the initial state of the link > > > created by it is wrong then. > > > > > > To address this problem, change the initial state of device links > > > added between a probing supplier and a probing consumer to > > > DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to > > > skip such links on the supplier side. > > > > > > With this change, if the supplier probe completes first, > > > device_links_driver_bound() called for it will skip the link state > > > update and when it is called for the consumer, the link state will > > > be updated to "active". In turn, if the consumer probe completes > > > first, device_links_driver_bound() called for it will change the > > > state of the link to "active" and when it is called for the > > > supplier, the link status update will be skipped. > > > > > > However, in principle the supplier or consumer probe may still fail > > > after the link has been added, so modify device_links_no_driver() to > > > change device links in the "active" or "consumer probe" state to > > > "dormant" on the supplier side and update __device_links_no_driver() > > > to change the link state to "available" only if it is "consumer > > > probe" or "active". > > > > > > Then, if the supplier probe fails first, the leftover link to the > > > probing consumer will become "dormant" and device_links_no_driver() > > > called for the consumer (when its probe fails) will clean it up. > > > In turn, if the consumer probe fails first, it will either drop the > > > link, or change its state to "available" and, in the latter case, > > > when device_links_no_driver() is called for the supplier, it will > > > update the link state to "dormant". [If the supplier probe fails, > > > but the consumer probe succeeds, which should not happen as long as > > > the consumer driver is correct, the link still will be around, but > > > it will be "dormant" until the supplier is probed again.] > > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > > > > Pulling in a bunch of drm_bridge and drm_panel people. See huge > > discussion upthread, this here is Rafael's idea for making device > > links more useful. Would be great if someone could test this out with > > panel/bridge dependencies. > > > > I guess this here isn't yet solving the reprobe issue where the > > provider was unloaded and reloaded (which should cause the consumer to > > reprobe too, with the EPROBE_DEFERRED logic). I think that was the > > other issue we've hit. > > I don't think it is addressed here. > > Can anyone please explain to me what happens in more detail? My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong: - componentized driver (both producer and consumer) fully loaded&working - you unbind the producer/one of the components - component framework or driver core through device_link also unbinds the master/consumer - you reload/rebind the component - with the component framework this results in the master being rebound, and the overall driver working again. With device_links nothing happens. I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still? Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel