(I am going to try responding to this tomorrow btw. I haven't been super busy this week, but this has been a surprisingly difficult email to respond to because I need to actually need to do a deep dive some of the MST helpers tomorrow to figure out more of the specifics on why I realized we couldn't just hot add/remove port->connector here). On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote: > [Public] > > > -----Original Message----- > > From: Lyude Paul <lyude@xxxxxxxxxx> > > Sent: Tuesday, August 24, 2021 5:18 AM > > To: Lin, Wayne <Wayne.Lin@xxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@xxxxxxx>; Wentland, Harry > > <Harry.Wentland@xxxxxxx>; Zuo, Jerry > > <Jerry.Zuo@xxxxxxx>; Wu, Hersen <hersenxs.wu@xxxxxxx>; Juston Li > > <juston.li@xxxxxxxxx>; Imre Deak <imre.deak@xxxxxxxxx>; > > Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>; Daniel Vetter > > <daniel.vetter@xxxxxxxx>; Sean Paul <sean@xxxxxxxxxx>; Maarten Lankhorst > > <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard <mripard@xxxxxxxxxx>; > > Thomas Zimmermann <tzimmermann@xxxxxxx>; > > David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>; Deucher, > > Alexander <Alexander.Deucher@xxxxxxx>; Siqueira, > > Rodrigo <Rodrigo.Siqueira@xxxxxxx>; Pillai, Aurabindo > > <Aurabindo.Pillai@xxxxxxx>; Bas Nieuwenhuizen > > <bas@xxxxxxxxxxxxxxxxxxx>; Cornij, Nikola <Nikola.Cornij@xxxxxxx>; Jani > > Nikula <jani.nikula@xxxxxxxxx>; Manasi Navare > > <manasi.d.navare@xxxxxxxxx>; Ankit Nautiyal <ankit.k.nautiyal@xxxxxxxxx>; > > José Roberto de Souza <jose.souza@xxxxxxxxx>; Sean > > Paul <seanpaul@xxxxxxxxxxxx>; Ben Skeggs <bskeggs@xxxxxxxxxx>; > > stable@xxxxxxxxxxxxxxx > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected > > end device > > > > [snip] > > > > I think I might still be misunderstanding something, some comments below > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote: > > > > > Hi Lyude, > > > > > > > > > > Really thankful for willing to explain in such details. Really > > > > > appreciate. > > > > > > > > > > I'm trying to fix some problems that observed after these 2 > > > > > patches > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector() > > > > > callback > > > > > * 72dc0f51591 drm/dp_mst: Remove > > > > > drm_dp_mst_topology_cbs.destroy_connector > > > > > > > > > > With above patches, we now change to remove dc_sink when connector > > > > > is about to be destroyed. However, we found out that connectors > > > > > won't get destroyed after hotplugs. Thus, after few times > > > > > hotplugs, we won't create any new dc_sink since number of sink is > > > > > exceeding our limitation. As the result of that, I'm trying to > > > > > figure out why the refcount of connectors won't get zero. > > > > > > > > > > Based on my analysis, I found out that if we connect a sst monitor > > > > > to a mst hub then connect the hub to the system, and then unplug > > > > > the sst monitor from the hub. E.g. > > > > > src - mst hub - sst monitor => src - mst hub (unplug) sst monitor > > > > > > > > > > Within this case, we won't try to put refcount of the sst monitor. > > > > > Which is what I tried to resolve by [PATCH 3/4]. > > > > > But here comes a problem which is confusing me that if I can > > > > > destroy connector in this case. By comparing to another case, if > > > > > now mst hub is connected with a mst monitor like this: > > > > > src - mst hub - mst monitor => src - mst hub (unplug) mst monitor > > > > > > > > > > We will put the topology refcount of mst monitor's branching unit > > > > > in and > > > > > drm_dp_port_set_pdt() and eventually call > > > > > drm_dp_delayed_destroy_port() to unregister the connector of the > > > > > logical port. So following the same rule, I think to dynamically > > > > > unregister a mst connector is what we want and should be > > > > > reasonable to also destroy sst connectors in my case. But this > > > > > conflicts the idea what we have here. We want to create connectors > > > > > for all output ports. > > > > > So if dynamically creating/destroying connectors is what we want, > > > > > when is the appropriate time for us to create one is what I'm > > > > > considering. > > > > > > > > > > Take the StartTech hub DP 1to4 DP output ports for instance. This > > > > > hub, internally, is constructed by 3 1-to-2 mst branch chips. 2 > > > > > output ports of 1st chip are hardwired to another 2 chips. It's > > > > > how it makes it to support 1-to-4 mst branching. So within this > > > > > case, the internal > > > > > 2 output ports of 1st chip is not connecting to a stream sink and > > > > > will never get connected to one. Thus, I'm thinking maybe the > > > > > best timing to attach a connector to a port is when the port is > > > > > connected, and the connected PDT is determined as a stream sink. > > > > > > > > > > Sorry if I misunderstand anything here and really thanks for your > > > > > time to shed light on this : ) Thanks Lyude. > > > > > > > > It's no problem, it is my job after all! Sorry for how long my > > > > responses have been taking, but my plate seems to be finally > > > > clearing up for the foreseeable future. > > > > > > > > That being said - it sounds like with this we still aren't actually > > > > clear on where the topology refcount leak is happening - only when > > > > it's happening, which says to me that's the issue we really need to > > > > be figuring out the cause of as opposed to trying to workaround it. > > > > > > > > Actually - refcount leaks is an issue I've ran into a number of > > > > times before in the past, so a while back I actually added some nice > > > > debugging features to assist with debugging leaks. If you enable the > > > > following options in your kernel config: > > > > > > > > CONFIG_EXPERT=y # This must be set first before the next option > > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y > > > > > > > > Unfortunately, I'm suddenly realizing after typing this that > > > > apparently I never bothered adding a way for us to debug the > > > > > refcounts of ports/mstbs that haven't been released yet - only the > > > > ones for ones that have. This shouldn't be difficult at all for me > > > > to add, so I'll send you a patch either today or at the start of > > > > next week to try debugging with using this, and then we can figure > > > > out where this leak is really coming from. > > > > > > Thanks Lyude! > > > > > > Sorry to bother you, but I would like to clarify this again. So it > > > sounds > > > > It's no problem! It's my job and I'm happy to help :). > > Thanks! > I would like to learn more from you as below : p > > > > > like you also agree that we should destroy associated connector > > > > Not quite. I think a better way of explaining this might be to point out > > that the lifetime of an MST port and its connector isn't supposed > > to be determined by whether or not it has something plugged into it - its > > lifetime is supposed to depend on whether there's a valid > > path from us down the MST topology to the port we're trying to reach. So > > an MSTB with ports that is unplugged would destroy all of > > its ports - but an unplugged port should just be the same as a > > disconnected DRM connector - even if the port itself is just hosting a > > branching device. > > This is the part a bit difficult to me. I treat DRM connector as the place > where we associate with a stream sink. So if the statement > is "All DP mst output ports are places we connect with stream sink", I would > say false to this since I can find the negative example when > output port is connected with mst branch device. Thus, looks like we could > only determine whether to create a connector for an output > port when the peer device type is known? > > > > Additionally - we don't want to try "delaying" connector creation either. > > In the modern world hotplugging is almost always reliable in > > normal situations, but even so there's still use cases for wanting force > > probing for analog devices on DP converters and just in general > > as it's a feature commonly used by developers or users working around > > monitors with problematic HPD issues or EDID issues. > > I think I understand that why we want to create connectors for all output > ports here. But under these mentioned use cases, aren't we still > capable to force connector to enable stream? MST hub with muti-functon > capability, it will enumerate connected virtual DP peer device. > For problematic HPD issues or EDID issues, their connection status is also > connected. > > My understanding of output port is it is an internal node to help construct > an end-to-end virtual channel between a stream source device > and a stream sink device. Creating connectors for internal nodes within a > virtual channel is a bit hard for me to get the idea. Please correct > me if I misunderstand anything here. Thanks Lyude! > > > > > when we unplug sst monitor from a mst hub in the case that I > > > described? In the case I described (unplug sst monitor), we only > > > receive CSN from the hub that notifying us the connection status of > > > one of its downstream output ports is changed to disconnected. There > > > is no topology refcount needed to be decreased on this disconnected > > > port but the malloc refcount. Since the output port is still declared > > > by > > > > Apologies - I misunderstood your original mail as implying that topology > > refcounts were being leaked - but it sounds like it's actually > > malloc refcounts being leaked instead? In any case - that means we're > > still tracing down a leak, just a malloc ref leak. > > > > But, this still doesn't totally make sense to me. Malloc refs only keep > > the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in > > memory. > > Nothing else is kept around, meaning the DRM connector (and I assume by > > proxy, the dc_sink) should both be getting dropped still > > and the only thing that should be leaked is a memory allocation. These > > things should instead be dropped once there's no longer any > > topology references around. So, are we _sure_ that the problem here is a > > missing > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()? > > Just my two cents, I don't think it's leak of malloc ref neither. As you > said, malloc ref is dealing with the last step to free port/mstb. > If there is still topology refcount on port/mstb in my case, we won't free > port/mstb. > > > > If we are unfortunately we don't have equivalent tools for malloc() > > tracing. I'm totally fine with trying to add some if we have trouble > > figuring out this issue, but I'm a bit suspicious of the commits you > > mentioned that introduced this problem. If the problem doesn't > > happen until those two commits, then it's something in the code changes > > there that are causing this problem. > > I think we probably also have the problem before these commits, but we > didn't notice this before. Just when we change to clean up all > things in dm_dp_mst_connector_destroy(), I start to try to figure out all > these things out. > > > > The main thing I'm suspicious of just from looking at changes in > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to > > amdgpu_dm_update_freesync_caps() that was previously in > > dm_dp_destroy_mst_connector() appears to be dropped and not re-added in > > (oh dear, this is a /very/ confusingly similar function > > Lol. I also have hard time on this.. > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this was > > intentional on my part, but does adding a call back to > > amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector() right > > before the dc_link_remove_remote_sink() call fix > > anything? > > > > As well, I'm far less suspicious of this one but does re-adding this > > hunk: > > > > aconnector->dc_sink = NULL; > > aconnector->dc_link->cur_link_settings.lane_count = 0; > > > > After dc_sink_release() fix anything either? > > So the main problem is we don't have chance to call > dc_link_remove_remote_sink() in the unplugging SST case. We only have chance > to > remove the remote sink of a link when unplug a mstb. > > > > > the mst hub, I think we shouldn't destroy the port. Actually, no > > > ports nor mst branch devices should get destroyed in this case I think. > > > The result of LINK_ADDRESS is still the same before/after removing the > > > sst monitor except the DisplayPort_Device_Plug_Status/ > > > Legacy_Device_Plug_Status. > > > > > > Hence, if you agree that we should put refcount of the connector of > > > the disconnected port within the unplugging sst monitor case to > > > release the allocated resource, it means we don't want to create > > > connectors for those disconnected ports. Which conflicts current flow > > > to create connectors for all declared output ports. > > > > > > Thanks again for your time Lyude! > > > > -- > > Cheers, > > Lyude Paul (she/her) > > Software Engineer at Red Hat > -- > Regards, > Wayne > -- Cheers, Lyude Paul (she/her) Software Engineer at Red Hat