Re: [PATCH v2 0/5] Add support for GE SUNH hot-pluggable connector (was: "drm: add support for hot-pluggable bridges")

Luca Ceresoli <luca.ceresoli@xxxxxxxxxxx> · Fri, 23 Aug 2024 12:39:03 +0200

Hello Sima,

these days I started looking in more detail into some of the topics you
had mentioned in your v2 review. I have questions about those I have
investigated, see below.

On Tue, 21 May 2024 14:01:19 +0200
Daniel Vetter <daniel@xxxxxxxx> wrote:

> On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote:
> > Hello Daniel,
> > 
> > On Thu, 16 May 2024 15:22:01 +0200
> > Daniel Vetter <daniel@xxxxxxxx> wrote:
> >   
> > > Apologies for missing v1 ...
> > > 
> > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:  
> > > > DRM hotplug bridge driver
> > > > =========================
> > > > 
> > > > DRM natively supports pipelines whose display can be removed, but all the
> > > > components preceding it (all the display controller and any bridges) are
> > > > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > > > 
> > > > This series adds support for DRM pipelines having a removable part after
> > > > the encoder, thus also allowing bridges to be removed and reconnected at
> > > > runtime, possibly with different components.
> > > > 
> > > > This picture summarizes the  DRM structure implemented by this series:
> > > > 
> > > >  .------------------------.
> > > >  |   DISPLAY CONTROLLER   |
> > > >  | .---------.   .------. |
> > > >  | | ENCODER |<--| CRTC | |
> > > >  | '---------'   '------' |
> > > >  '------|-----------------'
> > > >         |
> > > >         |DSI            HOTPLUG
> > > >         V              CONNECTOR
> > > >    .---------.        .--.    .-.        .---------.         .-------.
> > > >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> > > >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> > > >    |         |        |  |    | |        |         |         |       |
> > > >    '---------'        '--'    '-'        '---------'         '-------'
> > > > 
> > > >  [--- fixed components --]  [----------- removable add-on -----------]
> > > > 
> > > > Fixed components include:
> > > > 
> > > >  * all components up to the DRM encoder, usually part of the SoC
> > > >  * optionally some bridges, in the SoC and/or as external chips
> > > > 
> > > > Components on the removable add-on include:
> > > > 
> > > >  * one or more bridges
> > > >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> > > >  * the panel    
> > > 
> > > So I think at a high level this design approach makes sense,  
> > 
> > Good starting point :)
> >   
> > > but the
> > > implementation needs some serious thought. One big thing upfront though,
> > > we need to have a clear plan for the overlay hotunload issues, otherwise
> > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> > > very, very tricky, full of lifetime issues, and those need to be sorted
> > > out first or we're just trying to build a castle on quicksand.
> > > 
> > > For bridges itself I don't think the current locking works. You're trying
> > > to really cleverly hide it all behind a normal-looking bridge driver, but
> > > there's many things beyond that which will blow up if bridges just
> > > disappear. Most importantly the bridge states part of an atomic update.  
> > 
> > Surely possible as atomic updates are definitely not stimulated in my
> > use case. Can you recommend any testing tools to be able to trigger any
> > issues?  
> 
> Uh really hard ... You'd need to create an atomic commit that's blocked on
> a sync_file in-fence (so that you can extend the race window). And then
> hotunplug the bridge chain _before_ you signal that fence.
> 
> That's not going to cover all possible races, but at least a large chunk
> of the really big ones.
> 
> > The main setups I used for my testing so far are 'modetest -s' for my
> > daily work and a simple weston setup to periodically test a complete
> > user space stack.
> >   
> > > Now in drm we have drm_connector as the only hotunpluggable thing, and it
> > > took years to sort out all the issues. I think we should either model the
> > > bridge hotunplug locking after that, or just outright reuse the connector
> > > locking and lifetime rules. I much prefer the latter personally.
> > > 
> > > Anyway the big issues:
> > > 
> > > - We need to refcount the hotpluggable bridges, because software (like
> > >   atomic state updates) might hang onto pointers for longer than the
> > >   bridge physically exists. Assuming that you can all tear it down
> > >   synchronously will not work.
> > > 
> > >   If we reuse connector locking/lifetime then we could put the
> > >   hotpluggable part of the bridge chain into the drm_connector, since that
> > >   already has refcounting as needed. It would mean that finding the next
> > >   bridge in the chain becomes a lot more tricky though. With that model
> > >   we'd create a new connector every time the bridge is hotplugged, which I
> > >   think is also the cleaner model (because you might plug in a hdmi
> > >   connector after a panel, so things like the connector type change).

So, based on your dp__mst hint I added connector creation/removal in the
v3 I sent a few days ago. All other aspects in your list are unchanged
from the v2 you have reviewed.

Now I'm trying to tackle some other of the items you mentioned here,
and locking/lifetime is the hardest one for me to understand at the
moment.

If I get you right, you suggest making all the removable bridges
"owned" by the connector that gets hotplugged, the reason being that
locking and lifetime issues for the hotplug connectors have already
been sorted out, while that has not been done for bridges.

Correct?

Assuming the above is correct, I'm wondering whether this would be a
correct design or rather a shortcut to leverage the connector locking
instead of implementing bridge locking/lifetime. Note I'm not
criticizing, I'm really asking myself this question and I'd like to
know your position about that.

Again about putting the removable bridges inside the hotplug connector,
I'm trying to understand how that may happen in the device tree case.
The hot-pluggable bridge is populated starting from the DT code
(through i2c-core-of in my case being an I2C chip), and that code is
not aware that it is being instantiating a DRM device. So there is
nothing we can do before the bridge probe is called. The bridge probe
then does know much about any connectors. It also does not know about
hotplug: this is a design decision I made to make the regular bridge
drivers reusable without edits for the hotplug case, but it can be
waived if needed.

So the only way I currently to move the bridge inside the connector is
to catch it in the hotplug-bridge driver, when it gets notified about
the new bridge having appeared (was a notifier, I'm changing that, see
below). So the hotplug-bridge would need a function line
drm_connector_add_bridge(conn, br) to make the association.

Is this the direction of development you had in mind?

> > I have been investigating the option of adding/removing a connector
> > based on hot-plug/unplug events initially, see my reply to Maxime after
> > v1 [0]:
> >   
> > > The first approach is based on removing the drm_connector. My laptop
> > > uses the i915 driver, and I have observed that attaching/removing a
> > > USB-C dock with an HDMI connector connected to a monitor, a new
> > > drm_connector appears/disappears for the card. User space gets notified
> > > and the external monitor is enabled/disabled, just the way a desktop
> > > user would expect, so this is possible. I had a look at the driver but
> > > how this magic happens was not clear to me honestly.  
> > 
> > So if you could point to where/how this is done, I would certainly
> > reconsider that.  
> 
> Right now only the dp mst code uses hotplug/unplug (like you've observed
> in your testing with i915, usb-c docks are generally all dp mst). For code
> reading it might be best to start with the i915 dp mst code and then go
> through the helpers.
> 
> > > - No notifiers please. The create a locking mess with inversions, and
> > >   especially for hotunplug they create the illusion that you can
> > >   synchronously keep up to date with hardware state. That's not possible.
> > >   Fundamentally all bridge drivers which might be hotunplugged need to be
> > >   able to cope with the hardware disappearing any momemnt.  
> > 
> > Can you clarify this point? I'm sorry I fail to understand the
> > relationship between the use of notifiers and the ability of bridge
> > drivers to cope with hardware disappearing.
> > 
> > In this patch, the hotplug-bridge uses notifiers to be informed when
> > the following bridge is disappearing: which other way would you suggest
> > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
> > not meant to disappear, and it has no actual hardware that can go away,
> > being just a mechanical connector.  
> 
> Yeah you need code to handle that. My point is that using a notifier is
> the wrong design, because the notifier has it's own locking. Which tends
> to get in the way.

I went into this subject to see how drm_bridge_add() could inform the
interested bridges using a DRM-specific mechanism instead of standard
notifiers.

However I think to inform *the interested* bridges, at least in the
device tree case, there is a fundamental issue: DRM core has no idea
about the topology. Definitely not at drm_bridge_add() time, way before
drm_bridge_attach() where the 'previous' pointer introduces a minimum
of topology awareness in the DRM core.

Option 1 is to iterate over all the ports and endpoints and for every
remote-endpoint pointing to a bridge, inform the remote bridge about
that via a new optional callback in drm_bridge_funcs. That means likely
informing more bridges than needed, so when they get informed the
bridges will still have to check whether they are interested or not.

Pseudocode:

  void drm_bridge_add(struct drm_bridge *new_bridge)
  {
     // existing code unchanged

+    if (bridge->of_node) {
+       for_each_remote_endpoint(bridge->of_node) {
+          br = of_drm_find_bridge(remote_endpoint);
+          if (br && br->funcs.bridge_event_notify)
+              br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD,
+                                             new_bridge);
+       }
+    }
  }

That means informing both upstream and downstream bridges, which could
even be useful, but anyway there is no reliable way to pick only the
sink or source ports. It also implies having lots of of_*() calls which
iterate over the tree, which is not optimal, but it happens only at
add/removal so it's fine I guess.

Option 2 is be to just inform all the bridges (using the global
bridge_list). Pros and cons:

 * less of_*() calls to crawl around the remote-endpoints
 * simpler code
 * more bridges than needed would be informed, could be be an issue
   if many implement the .bridge_event_notify()
 * notifies even in the non-OF case too, not sure it's useful

What are your thoughts about these two options?

> Instead I think that code should be directly in core bridge code (I don't
> see the benefit of having this entirely in a separate driver), using drm
> locking to make sure there's no races.

Not sure I got what you mean here. Which one of the following?

 1. The entire hotplug-bridge driver should not exist, and the DRM
    core should handle it all (seems not doable, this driver has lots of
    device-specific details)
 2. The hotplug-driver should exist, but the code to attach/detach the
    pipeline on hotplug/unplug should be moved to the core, with the
    hotplug-driver providing callbacks for the device-specific aspects
 3. Same as 2, but additionally the hotplug-bridge should become a
    connector driver (hotplug-connector.c?) -- not sure it can decouple
    the two sides without a bridge however
 4. None of the above

> > On the opposite side, the following bridges are physically removable
> > and so their drivers will have to be fixed (if needed) to work when
> > removal happens, but I don't see how that relates to the DRM core
> > emitting a notification of such bridges being removed.  
> 
> Yeah it's not directly related, just my experience that people assume
> notifiers provide you more synchronization and race-preventation than they
> really do. So it's better to hand-roll to make it all really explicit.
> 
> > > - Related to this: You're not allowed to shut down hardware behind the
> > >   user's back with drm_atomic_helper_shutdown. We've tried that approach
> > >   with dp mst, it really pisses off userspace when a page_flip that it
> > >   expected to work doesn't work.
> > > 
> > > - There's also the design aspect that in atomic, only atomic_check is
> > >   allowed to fail, atomic_commit must succeed, even when the hardware is
> > >   gone. Using connectors and their refcounting should help with that.  
> > 
> > IIUC here you are suggesting again to remove the connector instead of
> > marking it "disconnected". So, as above, pointers on how that is
> > achieved would be helpful.  
> 
> See dp mst code. It's complex unfortunately, so there's some reading
> involved :-/
> >  
> > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in
> > >   drm_bridge_attach, and that's kinda the wrong place. It should be in
> > >   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
> > >   would make it a lot clearer that we need to figure out who/when  
> > >   ->atomic_reset should be called for hotplugged bridges, maybe as part of    
> > >   connector registration when the entire bridge and it's new connector is
> > >   assembled?
> > > 
> > > - Finally this very much means we need to rethink who/how the connector
> > >   for a bridge is created. The new design is that the main driver creates
> > >   this connector, once the entire bridge exists. But with hotplugging this
> > >   gets a lot more complicated, so we might want to extract a pile of that
> > >   encoder related code from drivers (same way dp mst helpers take care of
> > >   connector creation too, it's just too much of a mess otherwise).
> > > 
> > >   The current bridge chaining infrastructure requires a lot of hand-rolled
> > >   code in each bridge driver and the encoder, so that might be a good
> > >   thing anyway.
> > > 
> > > - Finally I think the entire bridge hotplug infrastructure should be
> > >   irrespective of the underlying bus. Which means for the mipi dsi case we
> > >   might also want to look into what's missing to make mipi dsi
> > >   hotunpluggable, at least for the case where it's a proper driver. I
> > >   think we should ignore the old bridge model where driver's stitched it
> > >   all toghether using the component framework, in my opinion that approach
> > >   should be deprecated.  
> > 
> > The DSI side was one of my headaches on writing this driver, and I
> > must say I dislike the code in hotplug_bridge_dsi_attach(), with the
> > need for an initial "dummy" attach and detach the first time. At first
> > sight I think we need a .update_format callback in struct
> > mipi_dsi_host_ops so the DSI host is aware of a format change.
> > 
> > Right now there are DSI host drivers which keep a copy of the struct
> > mipi_dsi_device pointer and read the format from there when starting a
> > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
> > to react to format changes, but keeping a pointer is bad when the DSI
> > device hot-unplugs, and the new format won't be seen until a
> > disable/enable cycle. So a new callback looks more robust overall.
> > 
> > Any opinion about this?  
> 
> Yeah I don't like the code either.
> 
> What might help for prototyping is if you start with a hotpluggeable
> bridge where the bridge is a i2c device. That way I think we should be
> able to benefit from the driver bind/unbind code that exists already.
> Although there's going to be issues with i2c hotunplug too in i2c core
> code.
> 
> Then lift whatever we learn there to our dsi code. But essentially I think
> we should model the driver core model a lot more, so I guess you could use
> any hotunplug capable bus as a template. Usb might be closest, since
> that's also a packet/message based interface, mostly at least?
> 
> > > - Finally I think we should have a lot of safety checks, like only bridges
> > >   which declare themselve to be hotunplug safe should be allowed as a part
> > >   of the hotpluggable bridge chain part. All others must still be attached
> > >   before the entire driver is registered with drm_dev_register.  
> > 
> > I'm fine with the principle of a "HOTPLUG" flag.
> > 
> > I wonder how that should be implemented, though. Based on my (surely
> > simplistic) understanding, right now bridges can be both added and
> > removed, but only in a specific sequence:
> > 
> >  1. add all bridges
> >  2. use the card
> >  3. remove all bridges
> >  4. EOF
> > 
> > We'd need to change to allow:
> > 
> >  1. add fixed bridges (including hotplug-bridge)
> >  2. add bridges on removable add-on
> >  3. use card
> >  4. remove bridges on removable add-on
> >  5. optionally goto 2
> >  6. remove fixed bridges (including hotplug-bridge)
> >  7. EOF
> > 
> > One naïve idea is that the DRM core keeps a flag whenever any fixed
> > bridge (no HOTLPUG flag) is removed and when that happens forbid adding
> > bridges as we are now at line 5.  
> 
> If a fixed bridge is removed while the driver is still in use (i.e. before
> drm_dev_unregister is called), that's a driver bug. Would be good to catch
> that, which is why I think a lot of all the bridge hotplug handling should
> be in core bridge code and not the separate hotplug driver, so that we can
> enforce all these constraints.
> 
> Also conceptually 3 can happen before 2 (but also before), and that's how
> dp mst works too. It does add all kinds of complications though ...
> 
> > Aside for tons of subtleties I'm surely missing, does this look a
> > proper approach? I'd not be surprised if there is something better and
> > more solid.  
> 
> Yeah roughly. If you look through dp mst code then that also adds all
> kinds of structures (since dp mst is a routed network really), not just
> the connectors. They also all come with refcounts (because the network is
> a tree), but like I said I think for your case we can avoid the per-bridge
> refcounts by relying on the connector refcount we have already.
> 
> But I might be overlooking something, and we need refcounting for each
> bridge like dp mst also needs refcounting for all its internal structures
> that map the entire hotpluggable display chain. If you want to read some
> dp mst code, these internal structures are ports/branches with struct
> drm_dp_mst_branch/port.
> 
> > >   Or that we only allow bridges with the NO_CONNECTOR flag for
> > >   drm_bridge_attach.
> > > 
> > > There's probably a pile more fundamental issues I've missed, but this
> > > should get a good discussion started.  
> > 
> > Sure, I think it has.
> > 
> > Bottom line, I'm clearly not seeing some issues that need to be
> > considered, and that do not show up for my use case. Based on my
> > limited DRM knowledge, this was expected, and I'm glad to work on those
> > issues with some practical indications about the path forward.  
> 
> Yeah for learning I think dp mst is best. It's a bit complex, but since
> you have a dock you should be able to play around and experiment with the
> code with some real hardware.
> 
> That should help to get a bit a feel for the complexity, since lots of
> opportunities for you to ask why/how locking/refcounting is used and
> against which potential issue they protect.
> 
> Cheers, Sima

Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com