On Mon, Sep 02, 2024 at 03:31:28PM +0300, Tomi Valkeinen wrote: > Hi, > > On 02/09/2024 13:50, Daniel Vetter wrote: > > On Mon, Sep 02, 2024 at 11:26:11AM +0200, Maxime Ripard wrote: > > > Hi, > > > > > > On Wed, Aug 07, 2024 at 03:19:23PM GMT, Tomi Valkeinen wrote: > > > > On 25/07/2024 14:28, Maxime Ripard wrote: > > > > > On Mon, Jul 15, 2024 at 11:32:34AM GMT, Tomi Valkeinen wrote: > > > > > > On 02/07/2024 14:43, Maxime Ripard wrote: > > > > > > > Hi Tomi, > > > > > > > > > > > > > > On Wed, Jun 26, 2024 at 06:53:40PM GMT, Tomi Valkeinen wrote: > > > > > > > > On 26/06/2024 18:07, Maxime Ripard wrote: > > > > > > > > > On Wed, Jun 26, 2024 at 12:55:39PM GMT, Tomi Valkeinen wrote: > > > > > > > > > > On 26/06/2024 11:49, Maxime Ripard wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > On Wed, Jun 19, 2024 at 12:07:48PM GMT, Tomi Valkeinen wrote: > > > > > > > > > > > > From: Tomi Valkeinen <tomi.valkeinen+renesas@xxxxxxxxxxxxxxxx> > > > > > > > > > > > > > > > > > > > > > > > > When a bridge driver uses devm_mipi_dsi_device_register_full() or > > > > > > > > > > > > devm_mipi_dsi_attach(), the resource management is moved to devres, > > > > > > > > > > > > which releases the resource automatically when the bridge driver is > > > > > > > > > > > > unbound. > > > > > > > > > > > > > > > > > > > > > > > > However, if the DSI host goes away first, the host unregistration code > > > > > > > > > > > > will automatically detach and unregister any DSI peripherals, without > > > > > > > > > > > > notifying the devres about it. So when the bridge driver later is > > > > > > > > > > > > unbound, the resources are released a second time, leading to crash. > > > > > > > > > > > > > > > > > > > > > > That's super surprising. mipi_dsi_device_unregister calls > > > > > > > > > > > device_unregister, which calls device_del, which in turn calls > > > > > > > > > > > devres_release_all. > > > > > > > > > > > > > > > > > > > > Hmm, right. > > > > > > > > > > > > > > > > > > > > > If that doesn't work like that, then it's what needs to be fixed, and > > > > > > > > > > > not worked around in the MIPI-DSI bus. > > > > > > > > > > > > > > > > > > > > Well, something causes a crash for both the device register/unregister case > > > > > > > > > > and the attach/detach case, and the call stacks and debug prints showed a > > > > > > > > > > double unregister/detach... > > > > > > > > > > > > > > > > > > > > I need to dig up the board and check again why the devres_release_all() in > > > > > > > > > > device_del() doesn't solve this. But I can probably only get back to this in > > > > > > > > > > August, so it's perhaps best to ignore this patch for now. > > > > > > > > > > > > > > > > > > > > However, the attach/detach case is still valid? I see no devres calls in the > > > > > > > > > > detach paths. > > > > > > > > > > > > > > > > > > I'm not sure what you mean by the attach/detach case. Do you expect > > > > > > > > > device resources allocated in attach to be freed when detach run? > > > > > > > > > > > > > > > > Ah, never mind, the devres_release_all() would of course deal with that too. > > > > > > > > > > > > > > > > However, I just realized/remembered why it crashes. > > > > > > > > > > > > > > > > devm_mipi_dsi_device_register_full() and devm_mipi_dsi_attach() are given a > > > > > > > > device which is used for the devres. This device is probably always the > > > > > > > > bridge device. So when the bridge device goes away, so do those resources. > > > > > > > > > > > > > > > > The mipi_dsi_device_unregister() call deals with a DSI device, which was > > > > > > > > created in devm_mipi_dsi_device_register_full(). Unregistering that DSI > > > > > > > > device, which does happen when the DSI host is removed, does not affect the > > > > > > > > devres of the bridge. > > > > > > > > > > > > > > > > So, unloading the DSI host driver causes mipi_dsi_device_unregister() and > > > > > > > > mipi_dsi_detach() to be called (as part of mipi_dsi_host_unregister()), and > > > > > > > > unloading the bridge driver causes them to be called again via devres. > > > > > > > > > > > > > > Sorry, that's one of the things I don't quite get. Both functions are > > > > > > > exclusively(?) called from I2C bridges, so the device passed there > > > > > > > should be a i2c_client instance, and thus the MIPI-DSI host going away > > > > > > > will not remove those i2c devices, only the MIPI-DSI ones, right? > > > > > > > > > > > > Yes. > > > > > > > > > > > > > So if we remove the host, the MIPI-DSI device will be detached and > > > > > > > removed through the path you were explaing with the i2c client lingering > > > > > > > around. And if we remove the I2C device, then devm will kick in and will > > > > > > > detach and remove the MIPI-DSI device. > > > > > > > > > > > > Right. > > > > > > > > > > > > > Or is it the other way around? That if you remove the host, the device > > > > > > > is properly detached and removed, but there's still the devm actions > > > > > > > lingering around in the i2c device with pointers to the mipi_dsi_device > > > > > > > that was first created, but since destroyed? > > > > > > > > > > > > > > And thus, if the i2c device ever goes away, we get a use-after-free? > > > > > > > > > > > > Hmm, I'm not sure I understand what you mean here... Aren't you describing > > > > > > the same thing in both of these cases? > > > > > > > > > > > > In any case, to expand the description a bit, module unloading is quite > > > > > > fragile. I do get a crash if I first unload the i2c bridge module, and only > > > > > > then go and unload the other ones in the DRM pipeline. But I think module > > > > > > unloading will very easily crash, whatever the DRM drivers being used are, > > > > > > so it's not related to this particular issue. > > > > > > > > > > > > In my view, the unload sequence that should be supported (for development > > > > > > purposes, not for production) is to start the unload from the display > > > > > > controller module, which tears down the DRM pipeline, and going from there > > > > > > towards the panels/connectors. > > > > > > > > > > > > Of course, it would be very nice if the module unloading worked perfectly, > > > > > > but afaics fixing all that's related to module unloading would be a > > > > > > multi-year project... So, I just want to keep the sequence I described above > > > > > > working, which allows using modules while doing driver development. > > > > > > > > > > FTR, I'm all for supporting module unloading. The discussion above was > > > > > about what is broken exactly, so we can come up with a good solution. > > > > > > > > Does that mean that you're ok with the patch, or that something should be > > > > improved? > > > > > > No, I meant that at the very least the commit log needs to be updated to > > > reflect what is actually going on, because at least my understanding of > > > it doesn't match what actually happens. > > > > > > We want a solution to the problem you're facing, but it's not clear to > > > me what the problem is exactly at this point, so it's hard to review a > > > solution. > > > > So I haven't looked at the full thing, but I think the proper fix is to > > make both detach and unregister cope with being called multiple times. I > > think devm_ here is a red herring, the underlying issues is that we can > > unregister/detach from two sides: > > > > - when the host dsi goes away > > - when individual dsi devices on a given host go away > > > > So there needs to be book-keeping and locking to make sure no matter which > > order things disappear, we don't try to unregister/detach a dsi device > > twice. > > I think that is what my patch does (for devm_). Yep, except I think you should just do it for everyone, not just for the special case where one of the calls is done through devm. > Some vocabulary first: > > dsi peripheral device - The device that represents the DSI peripheral. It is > a bridge or a panel, and (usually) an i2c or platform device. > > dsi peripheral driver - The driver handling the dsi peripheral device. > > dsi device - Runtime created device instance that represents the DSI > peripheral. So in my case we have the i2c bridge device, and a dsi device is > created for it in the setup code. > > dsi controller device - A device that has a DSI bus (usually a platform or > i2c device, I would guess). > > dsi controller driver - A driver for the dsi controller device. Creates the > dsi host. > > dsi host - represents the DSI host side, owned by the dsi controller driver. > > When a dsi peripheral driver uses devm_mipi_dsi_device_register_full() or > devm_mipi_dsi_attach(), the dsi device is created and attached to the dsi > host. When the dsi peripheral device-driver is unbound, devres will call > unregister and detach are called automatically. This works fine. > > But when the device-driver for the dsi controller is unbound, the dsi > controller driver will unregister the dsi host, and the unregistration will > also unregister and detach the dsi device. But the devres is not told about > that. So when the dsi peripheral is later unbound, its devres will again > unregister and detach. > > To fix that this patch uses devm_remove_action() to remove the devres action > when the host side goes away first. > > Now, after writing the above, I realized that all this won't help with the > non-devm versions: the host side has unregistered and detached the dsi > device, but if the dsi peripheral driver calls mipi_dsi_detach() or > mipi_dsi_device_unregister(), it will again crash. > > Handling the attach/detach should be quite easy, and in fact the code > already handles it, but it uses WARN_ON() there so that has to go. But > attach/detach will crash anyway if the dsi device has already been freed, > which happens when the dsi controller driver calls > mipi_dsi_device_unregister(). Hm I thought we have a full struct device, so refcounted, and also with struct device unregister should be separate from the final kfree when the last reference drops away. Hence I thought this should just work. We might need to grab a reference in attach/detach to sort this out? > So... The dsi peripheral driver should keep a reference to the dsi device, > with get_device()? And then do a put_device() after calling > mipi_dsi_device_unregister()? Yeah I think so. > But we don't free the dsi device, it has essentially been disabled without > telling the dsi peripheral driver about it, which might cause problems. > > I don't know... This doesn't sound correct to me. Probably my patch is just > new wrong on top of old wrong. Or maybe I don't quite grasp how this works. _unregister calls ->detach, so I think we're fine. Essentially what we're doing here is hand-rolling a device/driver bind model, instead of using the one from the core driver model. But all the pieces are there: - register/unregister: Adding/removing a device, after unregister the physical device cannot be accessed anymore (the host driver might be gone), but the struct device will stick around. - attach/detach: binding/unbinding a driver to a dsi peripheral > Oh, I now realized/remembered that we can also have "real" dsi devices, when > the dsi peripheral is a child of the dsi bus device, and controlled via DSI > commands (i.e. not via i2c or memory mapped registers). Perhaps all/some of > this confusion in the code comes from the use of dsi device for both "real" > dsi devices and as a "dsi client", created by i2c/platform devices. I think for more complicated topology device links should help. Of course the dev links will not know about our hand-rolled driver binding, so maybe we need to fix that to make it work perfectly. But I don't think that's the case here. > With a "real" dsi device things work fine, as the driver only attaches and > detaches, and the device (un)registration is handled by the dsi host. > > Well, this turned out to be a bit of a rambling email... I don't have a > clear solution in mind. I think aside from maybe a missing device_get/put for attach/detach we have all the pieces, and it just needs a little bit of polish? We're almost 90% at a clean device driver model implementation, adding the last few bits is I think all we need? Or is there a catch? Cheers, Sima -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch