Apologies for missing v1 ... On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > DRM hotplug bridge driver > ========================= > > DRM natively supports pipelines whose display can be removed, but all the > components preceding it (all the display controller and any bridges) are > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > This series adds support for DRM pipelines having a removable part after > the encoder, thus also allowing bridges to be removed and reconnected at > runtime, possibly with different components. > > This picture summarizes the DRM structure implemented by this series: > > .------------------------. > | DISPLAY CONTROLLER | > | .---------. .------. | > | | ENCODER |<--| CRTC | | > | '---------' '------' | > '------|-----------------' > | > |DSI HOTPLUG > V CONNECTOR > .---------. .--. .-. .---------. .-------. > | 0 to N | | _| _| | | 1 to N | | | > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > | | | | | | | | | | > '---------' '--' '-' '---------' '-------' > > [--- fixed components --] [----------- removable add-on -----------] > > Fixed components include: > > * all components up to the DRM encoder, usually part of the SoC > * optionally some bridges, in the SoC and/or as external chips > > Components on the removable add-on include: > > * one or more bridges > * a fixed connector (not one natively supporting hotplug such as HDMI) > * the panel So I think at a high level this design approach makes sense, but the implementation needs some serious thought. One big thing upfront though, we need to have a clear plan for the overlay hotunload issues, otherwise trying to make drm bridges hotpluggable makes no sense to me. Hotunload is very, very tricky, full of lifetime issues, and those need to be sorted out first or we're just trying to build a castle on quicksand. For bridges itself I don't think the current locking works. You're trying to really cleverly hide it all behind a normal-looking bridge driver, but there's many things beyond that which will blow up if bridges just disappear. Most importantly the bridge states part of an atomic update. Now in drm we have drm_connector as the only hotunpluggable thing, and it took years to sort out all the issues. I think we should either model the bridge hotunplug locking after that, or just outright reuse the connector locking and lifetime rules. I much prefer the latter personally. Anyway the big issues: - We need to refcount the hotpluggable bridges, because software (like atomic state updates) might hang onto pointers for longer than the bridge physically exists. Assuming that you can all tear it down synchronously will not work. If we reuse connector locking/lifetime then we could put the hotpluggable part of the bridge chain into the drm_connector, since that already has refcounting as needed. It would mean that finding the next bridge in the chain becomes a lot more tricky though. With that model we'd create a new connector every time the bridge is hotplugged, which I think is also the cleaner model (because you might plug in a hdmi connector after a panel, so things like the connector type change). - No notifiers please. The create a locking mess with inversions, and especially for hotunplug they create the illusion that you can synchronously keep up to date with hardware state. That's not possible. Fundamentally all bridge drivers which might be hotunplugged need to be able to cope with the hardware disappearing any momemnt. Most likely changes/fixes we need to make overlay hotunload work will impact how exactly this works all ... Also note that the entire dance around correctly stopping userspace from doing modesets on, see all the relevant changes in update_connector_routing(). Relying on hotplugging connectors will sort out a lot of these issues in a consistent way. - Related to this: You're not allowed to shut down hardware behind the user's back with drm_atomic_helper_shutdown. We've tried that approach with dp mst, it really pisses off userspace when a page_flip that it expected to work doesn't work. - There's also the design aspect that in atomic, only atomic_check is allowed to fail, atomic_commit must succeed, even when the hardware is gone. Using connectors and their refcounting should help with that. - Somewhat aside, but I noticed that the bridge->atomic_reset is in drm_bridge_attach, and that's kinda the wrong place. It should be in drm_mode_config_reset, like all the other ->atomic_reset hooks. That would make it a lot clearer that we need to figure out who/when ->atomic_reset should be called for hotplugged bridges, maybe as part of connector registration when the entire bridge and it's new connector is assembled? - Finally this very much means we need to rethink who/how the connector for a bridge is created. The new design is that the main driver creates this connector, once the entire bridge exists. But with hotplugging this gets a lot more complicated, so we might want to extract a pile of that encoder related code from drivers (same way dp mst helpers take care of connector creation too, it's just too much of a mess otherwise). The current bridge chaining infrastructure requires a lot of hand-rolled code in each bridge driver and the encoder, so that might be a good thing anyway. - Finally I think the entire bridge hotplug infrastructure should be irrespective of the underlying bus. Which means for the mipi dsi case we might also want to look into what's missing to make mipi dsi hotunpluggable, at least for the case where it's a proper driver. I think we should ignore the old bridge model where driver's stitched it all toghether using the component framework, in my opinion that approach should be deprecated. - Finally I think we should have a lot of safety checks, like only bridges which declare themselve to be hotunplug safe should be allowed as a part of the hotpluggable bridge chain part. All others must still be attached before the entire driver is registered with drm_dev_register. Or that we only allow bridges with the NO_CONNECTOR flag for drm_bridge_attach. There's probably a pile more fundamental issues I've missed, but this should get a good discussion started. -Sima -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch