On Wed, 2022-08-10 at 03:28 +0000, Lin, Wayne wrote: > Hi Lyude, > Thanks for your time and sorry for late response! > > It's described in 5.6.1.3 of DP spec 2.0: > "MST branch device, in addition to waiting for the ACK from its immediate > Upstream device, should either wait for the ALLOCATE_PAYLOAD message > transaction with a PBN value equal to 0 from the MST Source device for > de-allocating the time slot assigned to the VC Payload that is routed to the > unplugged DFP or for 2 seconds, whichever occurs first." oooh! Thank you for posting this, I totally missed the bit that says "or for 2 seconds, whichever occurs first." That certainly explains a lot. > > > > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by > > > ports in stale topology") was trying to skip updating payload for a > > > target which is no longer existing in the current topology rooted at > > > mgr->mst_primary. I passed "mgr->mst_primary" to > > > drm_dp_mst_port_downstream_of_branch() previously. > > > Sorry, I might not fully understand the issue you've seen. Could you > > > elaborate on this more please? > > > > > > Thanks! > > > > I will have to double check this since it's been a month, but basically - the idea > > of having the topology references in the first place was to be the one check > > for figuring out whether something's in a topology or not. I've been thinking > > of maybe trying to replace it at some point, but I think we'd want to do it all > > over the helpers instead of just in certain spots. > > > > The other thing I noticed was that when I was rewriting this code, I noticed it > > seemed a lot like we had misunderstood the issue that was causing leaks in > > the first place. The BAD_PARAM we noticed indicates the payload we're > > trying to remove on the other end doesn't exist anymore, meaning the > > branch device in question got rid of any payloads it had active in response to > > the CSN. In testing though I found that payloads would be automatically > > released in situations where the last reachable port was marked as > > disconnected via a previous CSN, but was still reachable otherwise, and not in > > any other situation. This also seemed to match up with the excerpts in the DP > > spec that I found, so I assumed it was probably correct. > > IMHO, the main root cause with the commit 3769e4c0af5b ("drm/dp_mst: Avoid > to mess up payload table by ports in stale topology") is like what described in the > commit message. The problem I encountered was when I unplugged the primary > mst branch device from the system, upper layer didn't try to release stale streams > immediately. Instead, it started to gradually release stale streams when I plugged the > mst hub back to the system. In that case, if we didn't do the check to see whether > the current request for deallocating payload is for this time topology instance, > i.e. might be for the stale topology before I unplug, this deallocation will mess up > payload allocation for new topology instance. > > As for the CSN, it's a node broadcast request message and not a path message. > Referring to 2.14.6.1 of DP 2.0 spec: > "If the broadcast message is a node request, only the end devices, DP MST > Source or Sink devices (or DP MST Branch device if Source/Sink are not plugged), > process the request." > IMHO, payload should be controlled by source only, by ALLOCATE_PAYLOAD or > CLEAR_PAYLAOD_ID_TABLE message. > > > > > Also, I think using the DDPS field instead of trying to traverse the topology > > state (which might not have been fully updated yet in response to CSNs) > > might be a slightly better idea since DDPS may end up being updated before > > the port has been removed from our in-memory topology, which is kind of > > Thank you Lyude! Just want to confirm with you the below idea to see if I > understand it correctly. > The flow I thought would be (from Source perspective): > Receive CSN for notifying disconnection event => update physical topology > connection status (e.g. DDPS, put topology krefcount..) => send hotplug event to > userspace => userspace asks deallocating payloads for disconnected stream > sinks => put malloc krefcount of disconnected ports/mstbs => remove ports/mstb > from in-memory topology. > I suppose physical topology connection status is updated before sending hotplug > event to userspace and the in-memory topology still can be referred for stale > connection status before payload deallocation completes, i.e. which will put > malloc krefcount to eventually destroy disconnected devices in topology in-memory. > I mean, ideally, sounds like the topology in-memory should be reliable when > we send ALLOCATE_PAYLOAD as PBN=0. But I understand it definitely is not the > case if we have krefcount leak. mhm, I think you made me realize I'm overthinking this a bit now that I've seen the excerpt you mentioned above, along with the other excerpt about only the end devices being involved. The main reason I originally foresaw an issue with this is because the delay with updating the in-memory topology structure might put us slightly out of sync with the state of the hub on the other end - causing the hub to spit out an error. However - based on the excerpts you mentioned I think what I was seeing was mainly just the 2 second timeout causing things to be released properly - not specific behavior based on the location in the topology of the branch that was just unplugged like I originally assumed. I think in that case it probably does make more sense to go with your fix, so I'll likely drop this and rework the topology checks you had into this. > > Appreciate for your time and help Lyude! > no, thank you for your help! :) There aren't a whole ton of people who are this involved with MST so it's very useful to finally have another pair of eyes looking at all of this. > -- Cheers, Lyude Paul (she/her) Software Engineer at Red Hat