On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul <lyude@xxxxxxxxxx> wrote: > > Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I > accidentally introduced into DRM two years ago. > > Pretend we have a topology like this: > > |- DP-1: mst_primary > |- DP-4: active display > |- DP-5: disconnected > |- DP-6: active hub > |- DP-7: active display > |- DP-8: disconnected > |- DP-9: disconnected > > If we unplug DP-6, the topology starting at DP-7 will be destroyed but > it's payloads will live on in DP-1's VCPI allocations and thus require > removal. However, this removal currently fails because > drm_dp_update_payload_part1() will (rightly so) try to validate the port > before accessing it, fail then abort. If we keep going, eventually we > run the MST hub out of bandwidth and all new allocations will start to > fail (or in my case; all new displays just start flickering a ton). > > We could just teach drm_dp_update_payload_part1() not to drop the port > ref in this case, but then we also need to teach > drm_dp_destroy_payload_step1() to do the same thing, then hope no one > ever adds anything to the that requires a validated port reference in > drm_dp_destroy_connector_work(). Kind of sketchy. > > So let's go with a more clever solution: any port that > drm_dp_destroy_connector_work() interacts with is guaranteed to still > exist in memory until we say so. While said port might not be valid we > don't really care: that's the whole reason we're destroying it in the > first place! So, teach drm_dp_get_validated_port_ref() to use the all > mighty current_work() function to avoid attempting to validate ports > from the context of mgr->destroy_connector_work. I can't see any > situation where this wouldn't be safe, and this avoids having to play > whack-a-mole in the future of trying to work around port validation. > > Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx> > Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") > Reported-by: Jerry Zuo <Jerry.Zuo@xxxxxxx> > Cc: Jerry Zuo <Jerry.Zuo@xxxxxxx> > Cc: Harry Wentland <Harry.Wentland@xxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> # v4.6+ Hm, sounds very similar to the bug I pointed out in your "make vcpi alloc more atomic" series. Will this all interact nicely with the solution we've worked out there (where we need to delay at least the kfree(port) until the last vcpi allocation is released? -Daniel > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c > index 529414556962..08978ad72f33 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ > static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) > { > struct drm_dp_mst_port *rport = NULL; > + > mutex_lock(&mgr->lock); > - if (mgr->mst_primary) > - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port); > + /* > + * Port may or may not be 'valid' but we don't care about that when > + * destroying the port and we are guaranteed that the port pointer > + * will be valid until we've finished > + */ > + if (current_work() == &mgr->destroy_connector_work) { > + kref_get(&port->kref); > + rport = port; > + } else if (mgr->mst_primary) { > + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, > + port); > + } > mutex_unlock(&mgr->lock); > return rport; > } > -- > 2.19.1 > > _______________________________________________ > dri-devel mailing list > dri-devel@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch