On Wed, Mar 01, 2023 at 02:58:50PM +0100, Johan Hovold wrote:
> On Tue, Jan 24, 2023 at 09:09:02AM +0100, Johan Hovold wrote:
> > On Mon, Jan 23, 2023 at 09:17:49AM -0800, Bjorn Andersson wrote:
> > > On Mon, Jan 23, 2023 at 05:01:45PM +0100, Johan Hovold wrote:
> > > > On Tue, Jan 17, 2023 at 09:04:39AM +0100, Johan Hovold wrote:
> > > > > On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> >
> > > > > > Perhaps we have shuffled other things around to avoid this bug? Either
> > > > > > way, let's put this on hold until further proof that it's still
> > > > > > reproducible.
> > > > >
> > > > > As I've mentioned off list, I haven't hit the apparent race I reported
> > > > > here:
> > > > >
> > > > > https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/
> > > > >
> > > > > since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
> > > > > could very well be that something has changed that fixes (or hides) the
> > > > > issue since.
> > > >
> > > > For unrelated reasons, I tried enabling async probing, and apart from
> > > > apparently causing the panel driver to probe defer indefinitely, I also
> > > > again hit the WARN_ON() I had added to catch this:
> > > >
> > > > [ 13.593235] WARNING: CPU: 0 PID: 125 at drivers/gpu/drm/drm_probe_helper.c:664 drm_kms_helper_hotplug_event+0x48/0x70 [drm_kms_helper]
> > >
> > > > So the bug still appears to be there (and the MSM DRM driver is fragile
> > > > and broken, but we knew that).
> > > >
> > >
> > > But the ordering between mode_config.funcs = !NULL and
> > > drm_kms_helper_poll_init() in msm_drm_init() seems pretty clear.
> > >
> > > And my testing shows that drm_kms_helper_poll_init() is the cause for
> > > getting bridge->hpd_cb != NULL.
> > >
> > > So the ordering seems legit, unless there's something else causing the
> > > assignment of bridge->hpd_cb to happen earlier in this scenario.
> >
> > I'm not saying that this patch is correct (indeed it doesn't seem to
> > be), but only that the bug I reported still appears to be present in
> > 6.2.
>
> So after debugging this issue a third time, I can conclude that it is
> still very much present in 6.2.
>
> It appears you looked at the linux-next tree when you concluded that
> this patch was not needed. In 6.2 the bridge->hpd_cb callback is set
> before mode_config.funcs is initialised as part of
> kms->funcs->hw_init(kms).
>
> The hpd DRM changes heading into 6.3 do appear to avoid the NULL-pointer
> dereference by moving the bridge->hpd_cb initialisation to
> drm_kms_helper_poll_init() as you mention above.
>
> The PMIC GLINK altmode driver still happily forwards notifications
> regardless of the DRM driver state though, which can lead to missed
> hotplug events. It seems you need to implement the
> hpd_enable()/disable() callbacks and either cache or not enable events
> in fw until the DRM driver is ready.
>

It's not clear to me what the expectation from the DRM framework is on
this point. We register a drm_bridge which is only capable of signaling
HPD events (DRM_BRIDGE_OP_HPD), not querying HPD state
(DRM_BRIDGE_OP_DETECT).

Does this imply that any such bridge must ensure that hpd events are
re-delivered once hpd_enable() has been invoked (we can't invoke it from
hpd_enable...)?

Is it reasonable to do this retriggering in the altmode driver? Or is it
the job of the TCPM (it seems reasonable to not send the PAN_EN message
until we get hpd_enable()...)?

Regards,
Bjorn
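
To make the "cache and re-deliver" option concrete, here is a rough
userspace model of the idea. It is not kernel code and not a proposed
patch: every name in it (fake_port, fake_fw_notification(),
fake_hpd_enable(), fake_replay_work() and so on) is made up for
illustration, and fake_notify_drm() merely stands in for the point where
a real driver would call drm_bridge_hpd_notify(). It assumes the last
firmware state is cached while HPD is disabled and replayed from a
deferred step rather than from hpd_enable() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the altmode driver's per-port state. */
struct fake_port {
	bool hpd_enabled;    /* true between hpd_enable() and hpd_disable() */
	bool cached_valid;   /* firmware has reported a state at least once */
	bool cached_state;   /* last connector state reported by firmware */
	bool replay_pending; /* models a scheduled work item */
	int delivered;       /* number of events forwarded to "DRM" */
	bool last_delivered; /* state carried by the most recent event */
};

/* Stand-in for the DRM notification; only records and prints the call. */
static void fake_notify_drm(struct fake_port *p, bool connected)
{
	/* Invariant of this sketch: never forward while HPD is disabled. */
	assert(p->hpd_enabled);
	p->delivered++;
	p->last_delivered = connected;
	printf("hpd -> drm: %s\n", connected ? "connected" : "disconnected");
}

/* Firmware notification path: always cache, forward only when enabled. */
static void fake_fw_notification(struct fake_port *p, bool connected)
{
	p->cached_state = connected;
	p->cached_valid = true;
	if (p->hpd_enabled) {
		/* A direct delivery makes any pending replay redundant. */
		p->replay_pending = false;
		fake_notify_drm(p, connected);
	}
}

/* hpd_enable(): does not notify directly, only requests a replay. */
static void fake_hpd_enable(struct fake_port *p)
{
	p->hpd_enabled = true;
	if (p->cached_valid)
		p->replay_pending = true;
}

/* hpd_disable(): stop forwarding and drop any replay not yet run. */
static void fake_hpd_disable(struct fake_port *p)
{
	p->hpd_enabled = false;
	p->replay_pending = false;
}

/* Models the deferred work that runs outside of hpd_enable(). */
static void fake_replay_work(struct fake_port *p)
{
	if (!p->replay_pending || !p->hpd_enabled)
		return;
	p->replay_pending = false;
	fake_notify_drm(p, p->cached_state);
}
```

In this model a notification that arrives before hpd_enable() is not
lost: it is cached, and the deferred step forwards it once HPD has been
enabled. Whether that deferral belongs in the altmode driver or whether
the event should instead be held back in firmware is exactly the open
question above; the sketch only illustrates the first variant.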