On Tue, Jan 21, 2025 at 12:27:29PM +0100, Luca Ceresoli wrote: > Hi Maxime, > > On Thu, 16 Jan 2025 13:26:25 +0100 > Maxime Ripard <mripard@xxxxxxxxxx> wrote: > > [...] > > > > And then there is the panel bridge. My understanding (which I'd love to > > > get clarified in case it is not accurate) is that DRM bridges expect to > > > always interact with "the next bridge", which cannot work for the last > > > bridge of course, and so the panel bridge wraps the panel pretending it > > > is a bridge. > > > > > > This software structure is clearly not accurately modeling the > > > hardware (panel is not bridge), > > > > We don't have a proper definition of what a bridge is, so as far as I'm > > concerned, everything is a bridge :) > > > > The name came from "external signal converters", but the API got reused > > to support so many hardware components it's not meaningful anymore. > > So if I'm getting your point here, drm_bridge is actually the base > class for DRM devices in OOP jargon, or a "DRM subunit" in V4L2 jargon. > OK, that's fine for me (except maybe for a "we should rename" thought). To be clear, I'm not sure it's worth renaming drm_bridge to something else, and I certainly don't consider it is a prerequisite to this series. > > > So far this approach has been working because devm and drmm ensure the > > > panel bridge would be dealloacted at some point. However the devm and drmm > > > release actions are associated to the consumer struct device (the panel > > > bridge consumer), so if the panel bridge is removed and the consumer is > > > not, deallocation won't happen. > > > > Oh, right, if one doesn't call drm_bridge_put(), that will result in a > > memory leak. The general topic we discuss and try to address here is > > memory safety, and a memory leak is considered safe. It's also going to > > get allocated only a couple of times anyway, so it's not a *huge* > > concern. > > > > And about how to actually fix it, there's two ways to go about it: > > > > * Either we do a coccinelle script and try to put all those > > drm_bridge_put() everywhere; > > > > * Or we create a devm/drmm action and drop the reference > > automatically. > > > > The latter is obviously less intrusive, we would need to deprecate > > devm_of_get_bridge() for it to be safe, and I'm not entirely sure it > > would be enough, but it might just work. > > > > > For hotplugging we cannot use drmm and devm and instead we use get/put, > > > to let the "next bridge" disappear with the previous one still present. > > > So the trivial idea is to add a drm_of_get_bridge(), similar to > > > {drmm,devm_drm}_of_get_bridge() except it uses plain > > > drm_panel_bridge_add() instead of devm/drmm variants. But then the > > > caller (which is the panel consumer) will have to dispose of the struct > > > drm_bridge pointer by calling: > > > > > > - drm_bridge_put() in case a) > > > - drm_panel_bridge_remove in case b) > > > > > > And that's the problem I need to solve. > > > > I'm not sure the problem is limited to panel_bridge. Your question is > > essentially: how do I make sure a driver-specific init is properly freed > > at drm_bridge_put time. This was done so far mostly at bridge remove > > time, but we obviously can't do that anymore. > > > > But we'd have the same issue if, say, we needed to remove a workqueue > > from a driver. > > > > I think we need a destroy() hook for bridges, just like we have for > > connectors for example that would deal with calling > > drm_panel_bridge_remove() if necessary, or any other driver-specific > > sequence. > > The .destroy hook looked appealing at first, however as I tried to > apply the idea to bridges I'm not sure it matches. Here's why. > > The normal (and sane) flow for a bridge is: > > A) probe > 1. allocate private struct embedding struct drm_bridge > (I have an _alloc() variant ready for v5 to improve this as you proposed) > 2. get resources, initialize struct fields > 3. drm_bridge_add(): publish bridge into global bridge_list > > Now the bridge can be found and pointers taken and used. We agree so far. > And on hardware removal, in reverse order: > > B) remove (hardware is hot-unplugged) > 3. unpublish bridge > 2. release resources, cleanup > 1. kfree private struct I think the sequence would rather be something like: B') remove 3. unpublish bridge 2. release device resources 1. release reference C') last put 2. release KMS resources 1. kfree private struct > Some drivers do real stuff in B2, so it is important that B3 happens > before B2, isn't it? We don't want other drivers to find and use a > bridge that is being dismantled, or afterwards. Yeah, B3/B'3 should definitely happen first. > B3 should normally happen by removing the bridge from the global > bridge_list, or other bridges might find it. However setting the "gone" > bool and teaching of_drm_find_bridge() & Co to skip bridges with > gone==true would allow to postpone the actual removal, if needed. > > With that said, with hotplugging there will be two distinct events: > > * hardware removal > * last ref is put > > The second event could happen way later than the first one. During the > time frame between the two events we need the bridge to be unpublished > and the bridge resources to be already released, as the hardware is > gone. We cannot do this at the last put, it's too late. > > So I think the only sane sequence is: > > * on hardware removal: > B3) unpublish bridge (drm_bridge_remove() or just set gone flag) > B2) free resources, deinit whatever needed > * when last ref is put > B1) kfree (likely via devm) No, devm will have destroyed it in B'2. We need to destroy it in the cleanup hook of kref_put > So, back to the .destroy hook, it would fit at B2, and not at the last > put. destroy would be called at C'2 time > However in that place it seems unnecessary. The actions "on hardware > removal" (B3, B2) are done by the driver remove function, so they are > already driver specific without any additional hook. I'm however fine > to add the hook on hardware removal in case there's a good reason I > missed. > > Do you think my reasoning is correct so far? > > If you don't, can you clarify at which events (hardware removal VS last > put) the various actions (drm_bridge_remove, set gone flag, calling > .destroy, free resources and deinint, kfree) should be done? I believe I did already, the gone flag should be set before B'2 Maxime > > > (Side note: I've been pondering on why the .destroy hook works for > connectors and would not for bridges. I think it's due to the global > bridge_list, or because of the different lifetime management based on > drmm for connectors, or both.) > > > It may look as if my discussion is about bridges in general and not > about the panel bridge. However before delving into how to dispose of > a panel bridge we need to sort out how to dispose of a bridge in the > first place. > > Luca > > -- > Luca Ceresoli, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com >
Attachment:
signature.asc
Description: PGP signature