OK - got sidetracked by an issue at work but I just resumed working on this today, should hopefully have it done at the start of next week at the latest (hooray for having time to do things upstream again! :). On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote: > [Public] > > > -----Original Message----- > > From: Lyude Paul <lyude@xxxxxxxxxx> > > Sent: Thursday, September 2, 2021 6:00 AM > > To: Lin, Wayne <Wayne.Lin@xxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@xxxxxxx>; Wentland, Harry > > <Harry.Wentland@xxxxxxx>; Zuo, Jerry > > <Jerry.Zuo@xxxxxxx>; Wu, Hersen <hersenxs.wu@xxxxxxx>; Juston Li > > <juston.li@xxxxxxxxx>; Imre Deak <imre.deak@xxxxxxxxx>; > > Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>; Daniel Vetter > > <daniel.vetter@xxxxxxxx>; Sean Paul <sean@xxxxxxxxxx>; Maarten Lankhorst > > <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard <mripard@xxxxxxxxxx>; > > Thomas Zimmermann <tzimmermann@xxxxxxx>; > > David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>; Deucher, > > Alexander <Alexander.Deucher@xxxxxxx>; Siqueira, > > Rodrigo <Rodrigo.Siqueira@xxxxxxx>; Pillai, Aurabindo > > <Aurabindo.Pillai@xxxxxxx>; Bas Nieuwenhuizen > > <bas@xxxxxxxxxxxxxxxxxxx>; Cornij, Nikola <Nikola.Cornij@xxxxxxx>; Jani > > Nikula <jani.nikula@xxxxxxxxx>; Manasi Navare > > <manasi.d.navare@xxxxxxxxx>; Ankit Nautiyal <ankit.k.nautiyal@xxxxxxxxx>; > > José Roberto de Souza <jose.souza@xxxxxxxxx>; Sean > > Paul <seanpaul@xxxxxxxxxxxx>; Ben Skeggs <bskeggs@xxxxxxxxxx>; > > stable@xxxxxxxxxxxxxxx > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected > > end device > > > > Actually - did some more thinking, and I think we shouldn't try to make > > changes like this until we actually know what the problem is > > here. I could try to figure out what the actual race conditions I was > > facing before with trying to add/destroy connectors based on PDT, > > but we still don't even actually have a clear idea of what's broken here. > > I'd much rather us figure out exactly how this leak is > > happening before considering making changes like this, because we have no > > way of knowing if we've properly fixed it or not if we > > don't know what the problem is in the first place. > > > > I'm still happy to write up the topology debugging stuff I mentioned to > > you if you think that would help you debug this issue - since > > that would make it a lot easier for you to track down what references are > > keeping a connector alive (and additkionally, where those > > references were taken in code. thanks > > stack_depot!) > Hi Lyude, > Sorry for late response. A bit busy on other stuff recently.. > > Really really thankful for all your help : ) I'm also glad to have the > debugging tool if it won’t bother you too much. But before debugging, > I need to have consensus with you about where do we expect to release > resource allocated for a stream sink when it's reported as > disconnected. Previous patch suggests releasing resource when connector is > destroyed which will happen when topology refcount > reaches zero (i.e. unplug mstb from topology). But when the case is > receiving CSN notifying connection change, we don't try to destroy > connector in this case now. And this is not caused by topology/malloc > refcount leak since I don't expect neither one of them get > decrease to zero under this case (topology of mstbs and ports is not > changed). Hence, my plan was to also try to destroy connector under > this case and the reason is reasonable to me as described in previous mail. > With this patch set, I can see connectors eventually get > successfully destroyed after userspace committing set_crtc() to free > connectors (although also need a fix on the connector refcount > grabbed by drm_client_modeset_probe() under specific scenario). > > I think the main problem I encountered here is that I couldn't find a place > that notify us to release resource allocated for a disconnected > stream sink when receive CSN. If we decide not to destroy connector under > this case, then I probably need some guidance about where > to do the release work. > > Thanks again Lyude! > > > > On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote: > > > (I am going to try responding to this tomorrow btw. I haven't been > > > super busy this week, but this has been a surprisingly difficult email > > > to respond to because I need to actually need to do a deep dive some > > > of the MST helpers tomorrow to figure out more of the specifics on why > > > I realized we couldn't just hot add/remove port->connector here). > > > > > > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote: > > > > [Public] > > > > > > > > > -----Original Message----- > > > > > From: Lyude Paul <lyude@xxxxxxxxxx> > > > > > Sent: Tuesday, August 24, 2021 5:18 AM > > > > > To: Lin, Wayne <Wayne.Lin@xxxxxxx>; > > > > > dri-devel@xxxxxxxxxxxxxxxxxxxxx > > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@xxxxxxx>; Wentland, > > > > > Harry <Harry.Wentland@xxxxxxx>; Zuo, Jerry <Jerry.Zuo@xxxxxxx>; > > > > > Wu, Hersen <hersenxs.wu@xxxxxxx>; Juston Li <juston.li@xxxxxxxxx>; > > > > > Imre Deak <imre.deak@xxxxxxxxx>; Ville Syrjälä > > > > > <ville.syrjala@xxxxxxxxxxxxxxx>; Daniel Vetter > > > > > <daniel.vetter@xxxxxxxx>; Sean Paul <sean@xxxxxxxxxx>; Maarten > > > > > Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard > > > > > <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>; > > > > > David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>; > > > > > Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Siqueira, Rodrigo > > > > > <Rodrigo.Siqueira@xxxxxxx>; Pillai, Aurabindo > > > > > <Aurabindo.Pillai@xxxxxxx>; Bas Nieuwenhuizen > > > > > <bas@xxxxxxxxxxxxxxxxxxx>; Cornij, Nikola <Nikola.Cornij@xxxxxxx>; > > > > > Jani Nikula <jani.nikula@xxxxxxxxx>; Manasi Navare > > > > > <manasi.d.navare@xxxxxxxxx>; Ankit Nautiyal > > > > > <ankit.k.nautiyal@xxxxxxxxx>; José Roberto de Souza > > > > > <jose.souza@xxxxxxxxx>; Sean Paul <seanpaul@xxxxxxxxxxxx>; Ben > > > > > Skeggs <bskeggs@xxxxxxxxxx>; stable@xxxxxxxxxxxxxxx > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for > > > > > connected end device > > > > > > > > > > [snip] > > > > > > > > > > I think I might still be misunderstanding something, some comments > > > > > below > > > > > > > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote: > > > > > > > > Hi Lyude, > > > > > > > > > > > > > > > > Really thankful for willing to explain in such details. > > > > > > > > Really appreciate. > > > > > > > > > > > > > > > > I'm trying to fix some problems that observed after these 2 > > > > > > > > patches > > > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove > > > > > > > > ->destroy_connector() callback > > > > > > > > * 72dc0f51591 drm/dp_mst: Remove > > > > > > > > drm_dp_mst_topology_cbs.destroy_connector > > > > > > > > > > > > > > > > With above patches, we now change to remove dc_sink when > > > > > > > > connector is about to be destroyed. However, we found out > > > > > > > > that connectors won't get destroyed after hotplugs. Thus, > > > > > > > > after few times hotplugs, we won't create any new dc_sink > > > > > > > > since number of sink is exceeding our limitation. As the > > > > > > > > result of that, I'm trying to figure out why the refcount of > > > > > > > > connectors won't get zero. > > > > > > > > > > > > > > > > Based on my analysis, I found out that if we connect a sst > > > > > > > > monitor to a mst hub then connect the hub to the system, and > > > > > > > > then unplug the sst monitor from the hub. E.g. > > > > > > > > src - mst hub - sst monitor => src - mst hub (unplug) sst > > > > > > > > monitor > > > > > > > > > > > > > > > > Within this case, we won't try to put refcount of the sst > > > > > > > > monitor. > > > > > > > > Which is what I tried to resolve by [PATCH 3/4]. > > > > > > > > But here comes a problem which is confusing me that if I can > > > > > > > > destroy connector in this case. By comparing to another > > > > > > > > case, if now mst hub is connected with a mst monitor like > > > > > > > > this: > > > > > > > > src - mst hub - mst monitor => src - mst hub (unplug) mst > > > > > > > > monitor > > > > > > > > > > > > > > > > We will put the topology refcount of mst monitor's branching > > > > > > > > unit in and > > > > > > > > drm_dp_port_set_pdt() and eventually call > > > > > > > > drm_dp_delayed_destroy_port() to unregister the connector of > > > > > > > > the logical port. So following the same rule, I think to > > > > > > > > dynamically unregister a mst connector is what we want and > > > > > > > > should be reasonable to also destroy sst connectors in my > > > > > > > > case. But this conflicts the idea what we have here. We want > > > > > > > > to create connectors for all output ports. > > > > > > > > So if dynamically creating/destroying connectors is what we > > > > > > > > want, when is the appropriate time for us to create one is > > > > > > > > what I'm considering. > > > > > > > > > > > > > > > > Take the StartTech hub DP 1to4 DP output ports for instance. > > > > > > > > This hub, internally, is constructed by 3 1-to-2 mst branch > > > > > > > > chips. 2 output ports of 1st chip are hardwired to another 2 > > > > > > > > chips. It's how it makes it to support 1-to-4 mst branching. > > > > > > > > So within this case, the internal > > > > > > > > 2 output ports of 1st chip is not connecting to a stream > > > > > > > > sink and will never get connected to one. Thus, I'm > > > > > > > > thinking maybe the best timing to attach a connector to a > > > > > > > > port is when the port is connected, and the connected PDT is > > > > > > > > determined as a stream sink. > > > > > > > > > > > > > > > > Sorry if I misunderstand anything here and really thanks for > > > > > > > > your time to shed light on this : ) Thanks Lyude. > > > > > > > > > > > > > > It's no problem, it is my job after all! Sorry for how long my > > > > > > > responses have been taking, but my plate seems to be finally > > > > > > > clearing up for the foreseeable future. > > > > > > > > > > > > > > That being said - it sounds like with this we still aren't > > > > > > > actually clear on where the topology refcount leak is > > > > > > > happening - only when it's happening, which says to me that's > > > > > > > the issue we really need to be figuring out the cause of as > > > > > > > opposed to trying to workaround it. > > > > > > > > > > > > > > Actually - refcount leaks is an issue I've ran into a number > > > > > > > of times before in the past, so a while back I actually added > > > > > > > some nice debugging features to assist with debugging leaks. > > > > > > > If you enable the following options in your kernel config: > > > > > > > > > > > > > > CONFIG_EXPERT=y # This must be set first before the next > > > > > > > option CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y > > > > > > > > > > > > > > Unfortunately, I'm suddenly realizing after typing this that > > > > > > > apparently I never bothered adding a way for us to debug the > > > > > > > > > > > refcounts of ports/mstbs that haven't been released yet - only > > > > > > > the ones for ones that have. This shouldn't be difficult at > > > > > > > all for me to add, so I'll send you a patch either today or at > > > > > > > the start of next week to try debugging with using this, and > > > > > > > then we can figure out where this leak is really coming from. > > > > > > > > > > > > Thanks Lyude! > > > > > > > > > > > > Sorry to bother you, but I would like to clarify this again. So > > > > > > it sounds > > > > > > > > > > It's no problem! It's my job and I'm happy to help :). > > > > > > > > Thanks! > > > > I would like to learn more from you as below : p > > > > > > > > > > > like you also agree that we should destroy associated connector > > > > > > > > > > Not quite. I think a better way of explaining this might be to > > > > > point out that the lifetime of an MST port and its connector isn't > > > > > supposed to be determined by whether or not it has something > > > > > plugged into it - its lifetime is supposed to depend on whether > > > > > there's a valid path from us down the MST topology to the port > > > > > we're trying to reach. So an MSTB with ports that is unplugged > > > > > would destroy all of its ports - but an unplugged port should just > > > > > be the same as a disconnected DRM connector - even if the port > > > > > itself is just hosting a branching device. > > > > > > > > This is the part a bit difficult to me. I treat DRM connector as the > > > > place where we associate with a stream sink. So if the statement is > > > > "All DP mst output ports are places we connect with stream sink", I > > > > would say false to this since I can find the negative example when > > > > output port is connected with mst branch device. Thus, looks like we > > > > could only determine whether to create a connector for an output > > > > port when the peer device type is known? > > > > > > > > > > Additionally - we don't want to try "delaying" connector creation > > > > > either. > > > > > In the modern world hotplugging is almost always reliable in > > > > > normal situations, but even so there's still use cases for wanting > > > > > force probing for analog devices on DP converters and just in > > > > > general as it's a feature commonly used by developers or users > > > > > working around monitors with problematic HPD issues or EDID issues. > > > > > > > > I think I understand that why we want to create connectors for all > > > > output ports here. But under these mentioned use cases, aren't we > > > > still capable to force connector to enable stream? MST hub with > > > > muti-functon capability, it will enumerate connected virtual DP peer > > > > device. > > > > For problematic HPD issues or EDID issues, their connection status > > > > is also connected. > > > > > > > > My understanding of output port is it is an internal node to help > > > > construct an end-to-end virtual channel between a stream source > > > > device and a stream sink device. Creating connectors for internal > > > > nodes within a virtual channel is a bit hard for me to get the idea. > > > > Please correct me if I misunderstand anything here. Thanks Lyude! > > > > > > > > > > > when we unplug sst monitor from a mst hub in the case that I > > > > > > described? In the case I described (unplug sst monitor), we only > > > > > > receive CSN from the hub that notifying us the connection status > > > > > > of one of its downstream output ports is changed to > > > > > > disconnected. There is no topology refcount needed to be > > > > > > decreased on this disconnected port but the malloc refcount. > > > > > > Since the output port is still declared by > > > > > > > > > > Apologies - I misunderstood your original mail as implying that > > > > > topology refcounts were being leaked - but it sounds like it's > > > > > actually malloc refcounts being leaked instead? In any case - that > > > > > means we're still tracing down a leak, just a malloc ref leak. > > > > > > > > > > But, this still doesn't totally make sense to me. Malloc refs only > > > > > keep the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in > > > > > memory. > > > > > Nothing else is kept around, meaning the DRM connector (and I > > > > > assume by proxy, the dc_sink) should both be getting dropped still > > > > > and the only thing that should be leaked is a memory allocation. > > > > > These things should instead be dropped once there's no longer any > > > > > topology references around. So, are we _sure_ that the problem > > > > > here is a missing > > > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()? > > > > > > > > Just my two cents, I don't think it's leak of malloc ref neither. As > > > > you said, malloc ref is dealing with the last step to free port/mstb. > > > > If there is still topology refcount on port/mstb in my case, we > > > > won't free port/mstb. > > > > > > > > > > If we are unfortunately we don't have equivalent tools for > > > > > malloc() tracing. I'm totally fine with trying to add some if we > > > > > have trouble figuring out this issue, but I'm a bit suspicious of > > > > > the commits you mentioned that introduced this problem. If the > > > > > problem doesn't happen until those two commits, then it's > > > > > something in the code changes there that are causing this problem. > > > > > > > > I think we probably also have the problem before these commits, but > > > > we didn't notice this before. Just when we change to clean up all > > > > things in dm_dp_mst_connector_destroy(), I start to try to figure > > > > out all these things out. > > > > > > > > > > The main thing I'm suspicious of just from looking at changes in > > > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to > > > > > amdgpu_dm_update_freesync_caps() that was previously in > > > > > dm_dp_destroy_mst_connector() appears to be dropped and not > > > > > re-added in (oh dear, this is a /very/ confusingly similar > > > > > function > > > > > > > > Lol. I also have hard time on this.. > > > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this > > > > > was intentional on my part, but does adding a call back to > > > > > amdgpu_dm_update_freesync_caps() into > > > > > dm_dp_destroy_mst_connector() right before the > > > > > dc_link_remove_remote_sink() call fix anything? > > > > > > > > > > As well, I'm far less suspicious of this one but does re-adding > > > > > this > > > > > hunk: > > > > > > > > > > aconnector->dc_sink = NULL; > > > > > aconnector->dc_link->cur_link_settings.lane_count = 0; > > > > > > > > > > After dc_sink_release() fix anything either? > > > > > > > > So the main problem is we don't have chance to call > > > > dc_link_remove_remote_sink() in the unplugging SST case. We only > > > > have chance to remove the remote sink of a link when unplug a mstb. > > > > > > > > > > > the mst hub, I think we shouldn't destroy the port. Actually, > > > > > > no ports nor mst branch devices should get destroyed in this > > > > > > case I think. > > > > > > The result of LINK_ADDRESS is still the same before/after > > > > > > removing the sst monitor except the > > > > > > DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status. > > > > > > > > > > > > Hence, if you agree that we should put refcount of the connector > > > > > > of the disconnected port within the unplugging sst monitor case > > > > > > to release the allocated resource, it means we don't want to > > > > > > create connectors for those disconnected ports. Which conflicts > > > > > > current flow to create connectors for all declared output ports. > > > > > > > > > > > > Thanks again for your time Lyude! > > > > > > > > > > -- > > > > > Cheers, > > > > > Lyude Paul (she/her) > > > > > Software Engineer at Red Hat > > > > -- > > > > Regards, > > > > Wayne > > > > > > > > > > > -- > > Cheers, > > Lyude Paul (she/her) > > Software Engineer at Red Hat > -- > Regards, > Wayne Lin > -- Cheers, Lyude Paul (she/her) Software Engineer at Red Hat