On Mon, Oct 21, 2019 at 10:36:01PM -0400, Lyude Paul wrote: > This is a complicated one. Essentially, there's currently a problem in the MST > core that hasn't really caused any issues that we're aware of (emphasis on "that > we're aware of"): locking. > > When we go through and probe the link addresses and path resources in a > topology, we hold no locks when updating ports with said information. The > members I'm referring to in particular are: > > - ldps > - ddps > - mcs > - pdt > - dpcd_rev > - num_sdp_streams > - num_sdp_stream_sinks > - available_pbn > - input > - connector > > Now that we're handling UP requests asynchronously and will be using some of > the struct members mentioned above in atomic modesetting in the future for > features such as PBN validation, this is going to become a lot more important. > As well, the next few commits that prepare us for and introduce suspend/resume > reprobing will also need clear locking in order to prevent from additional > racing hilarities that we never could have hit in the past. > > So, let's solve this issue by using &mgr->base.lock, the modesetting > lock which currently only protects &mgr->base.state. This works > perfectly because it allows us to avoid blocking connection_mutex > unnecessarily, and we can grab this in connector detection paths since > it's a ww mutex. We start by having drm_dp_mst_handle_up_req() hold this > when updating ports. For drm_dp_mst_handle_link_address_port() things > are a bit more complicated. As I've learned the hard way, we can grab > &mgr->lock.base for everything except for port->connector. See, our > normal driver probing paths end up generating this rather obvious > lockdep chain: > > &drm->mode_config.mutex > -> crtc_ww_class_mutex/crtc_ww_class_acquire > -> &connector->mutex > > However, sysfs grabs &drm->mode_config.mutex in order to protect itself > from connector state changing under it. Because this entails grabbing > kn->count, e.g. the lock that the kernel provides for protecting sysfs > contexts, we end up grabbing kn->count followed by > &drm->mode_config.mutex. This ends up creating an extremely rude chain: > > &kn->count > -> &drm->mode_config.mutex > -> crtc_ww_class_mutex/crtc_ww_class_acquire > -> &connector->mutex > > I mean, look at that thing! It's just evil!!! This gross thing ends up > making any calls to drm_connector_register()/drm_connector_unregister() > impossible when holding any kind of modesetting lock. This is annoying > because ideally, we always want to ensure that > drm_dp_mst_port->connector never changes when doing an atomic commit or > check that would affect the atomic topology state so that it can > reliably and easily be used from future DRM DP MST helpers to assist > with tasks such as scanning through the current VCPI allocations and > adding connectors which need to have their allocations updated in > response to a bandwidth change or the like. > > Being able to hold &mgr->base.lock throughout the entire link probe > process would have been _great_, since we could prevent userspace from > ever seeing any states in-between individual port changes and as a > result likely end up with a much faster probe and more consistent > results from said probes. But without some rework of how we handle > connector probing in sysfs it's not at all currently possible. In the > future, maybe we can try using the sysfs locks to protect updates to > connector probing state and fix this mess. > > So for now, to protect everything other than port->connector under > &mgr->base.lock and ensure that we still have the guarantee that atomic > check/commit contexts will never see port->connector change we use a > silly trick. See: port->connector only needs to change in order to > ensure that input ports (see the MST spec) never have a ghost connector > associated with them. But, there's nothing stopping us from simply > throwing the entire port out and creating a new one in order to maintain > that requirement while still keeping port->connector consistent across > the lifetime of the port in atomic check/commit contexts. For all > intended purposes this works fine, as we validate ports in any contexts > we care about before using them and as such will end up reporting the > connector as disconnected until it's port's destruction finalizes. So, > we just do that in cases where we detect port->input has transitioned > from true->false. We don't need to worry about the other direction, > since a port without a connector isn't visible to userspace and as such > doesn't need to be protected by &mgr->base.lock until we finish > registering a connector for it. > > For updating members of drm_dp_mst_port other than port->connector, we > simply grab &mgr->base.lock in drm_dp_mst_link_probe_work() for already > registered ports, update said members and drop the lock before > potentially registering a connector and probing the link address of it's > children. > > Finally, we modify drm_dp_mst_detect_port() to take a modesetting lock > acquisition context in order to acquire &mgr->base.lock under > &connection_mutex and convert all it's users over to using the > .detect_ctx probe hooks. > > With that, we finally have well defined locking. > > Changes since v4: > * Get rid of port->mutex, stop using connection_mutex and just use our own > modesetting lock - mgr->base.lock. Also, add a probe_lock that comes > before this patch. > * Just throw out ports that get changed from an output to an input, and > replace them with new ports. This lets us ensure that modesetting > contexts never see port->connector go from having a connector to being > NULL. > * Write an extremely detailed explanation of what problems this is > trying to fix, since there's a _lot_ of context here and I honestly > forgot some of it myself a couple times. > * Don't grab mgr->lock when reading port->mstb in > drm_dp_mst_handle_link_address_port(). It's not needed. > > Cc: Juston Li <juston.li@xxxxxxxxx> > Cc: Imre Deak <imre.deak@xxxxxxxxx> > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > Cc: Harry Wentland <hwentlan@xxxxxxx> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > Cc: Sean Paul <sean@xxxxxxxxxx> > Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx> Overall makes sense to me. Thanks for the comprehensive commit message and comments, they definitely help :) Just one nit below, Reviewed-by: Sean Paul <sean@xxxxxxxxxx> > --- > .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 28 +-- > drivers/gpu/drm/drm_dp_mst_topology.c | 230 ++++++++++++------ > drivers/gpu/drm/i915/display/intel_dp_mst.c | 28 ++- > drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 +-- > drivers/gpu/drm/radeon/radeon_dp_mst.c | 24 +- > include/drm/drm_dp_mst_helper.h | 38 ++- > 6 files changed, 240 insertions(+), 140 deletions(-) > /snip > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c > index 11d842f0bff5..7bf4db91ff90 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c /snip > @@ -1912,35 +1984,40 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb, > { > struct drm_dp_mst_topology_mgr *mgr = mstb->mgr; > struct drm_dp_mst_port *port; > - bool created = false; > - int old_ddps = 0; > + int old_ddps = 0, ret; > + u8 new_pdt = DP_PEER_DEVICE_NONE; > + bool created = false, send_link_addr = false; > > port = drm_dp_get_port(mstb, port_msg->port_number); > if (!port) { > - port = kzalloc(sizeof(*port), GFP_KERNEL); > + port = drm_dp_mst_add_port(dev, mgr, mstb, > + port_msg->port_number); > if (!port) > return; > - kref_init(&port->topology_kref); > - kref_init(&port->malloc_kref); > - port->parent = mstb; > - port->port_num = port_msg->port_number; > - port->mgr = mgr; > - port->aux.name = "DPMST"; > - port->aux.dev = dev->dev; > - port->aux.is_remote = true; > - > - /* > - * Make sure the memory allocation for our parent branch stays > - * around until our own memory allocation is released > + created = true; > + } else if (port_msg->input_port && !port->input && port->connector) { > + /* Destroying the connector is impossible in this context, so > + * replace the port with a new one > */ > - drm_dp_mst_get_mstb_malloc(mstb); > + drm_dp_mst_topology_unlink_port(mgr, port); > + drm_dp_mst_topology_put_port(port); > > + port = drm_dp_mst_add_port(dev, mgr, mstb, > + port_msg->port_number); > + if (!port) > + return; > created = true; > } else { > + /* Locking is only needed when the port has a connector > + * exposed to userspace > + */ > + drm_modeset_lock(&mgr->base.lock, NULL); Random musing: It's kind of unfortunate that we don't have a void varient of drm_modeset_lock for when there's no acquire_ctx since we end up with a mix of drm_modeset_lock calls with and without return checking. /snip > @@ -3441,22 +3516,31 @@ EXPORT_SYMBOL(drm_dp_mst_hpd_irq); > /** > * drm_dp_mst_detect_port() - get connection status for an MST port > * @connector: DRM connector for this port > + * @ctx: The acquisition context to use for grabbing locks > * @mgr: manager for this port > - * @port: unverified pointer to a port > + * @port: pointer to a port > * > - * This returns the current connection state for a port. It validates the > - * port pointer still exists so the caller doesn't require a reference > + * This returns the current connection state for a port. "On error, this returns -errno" /snip > -- > 2.21.0 > -- Sean Paul, Software Engineer, Google / Chromium OS _______________________________________________ Nouveau mailing list Nouveau@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/nouveau