On Tue, Jan 24, 2023 at 09:09:02AM +0100, Johan Hovold wrote:
> On Mon, Jan 23, 2023 at 09:17:49AM -0800, Bjorn Andersson wrote:
> > On Mon, Jan 23, 2023 at 05:01:45PM +0100, Johan Hovold wrote:
> > > On Tue, Jan 17, 2023 at 09:04:39AM +0100, Johan Hovold wrote:
> > > > On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> > > >
> > > > > Perhaps we have shuffled other things around to avoid this bug? Either
> > > > > way, let's put this on hold until further proof that it's still
> > > > > reproducible.
> > > >
> > > > As I've mentioned off list, I haven't hit the apparent race I reported
> > > > here:
> > > >
> > > > https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/
> > > >
> > > > since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
> > > > could very well be that something has changed that fixes (or hides) the
> > > > issue since.
> > >
> > > For unrelated reasons, I tried enabling async probing, and apart from
> > > apparently causing the panel driver to probe defer indefinitely, I also
> > > again hit the WARN_ON() I had added to catch this:
> > >
> > > [ 13.593235] WARNING: CPU: 0 PID: 125 at drivers/gpu/drm/drm_probe_helper.c:664 drm_kms_helper_hotplug_event+0x48/0x70 [drm_kms_helper]
> > >
> > > So the bug still appears to be there (and the MSM DRM driver is fragile
> > > and broken, but we knew that).
> >
> > But the ordering between mode_config.funcs = !NULL and
> > drm_kms_helper_poll_init() in msm_drm_init() seems pretty clear.
> >
> > And my testing shows that drm_kms_helper_poll_init() is the cause for
> > getting bridge->hpd_cb != NULL.
> >
> > So the ordering seems legit, unless there's something else causing the
> > assignment of bridge->hpd_cb to happen earlier in this scenario.
>
> I'm not saying that this patch is correct (indeed it doesn't seem to
> be), but only that the bug I reported still appears to be present in
> 6.2.
>
> Now that I actually looked at this again, I realise that the reason I
> haven't seen it with 6.2 is more likely due to the fact that I'm now
> making sure to load the panel driver before the drm driver to avoid that
> unnecessary probe deferral.
>
> With async probing, I get the probe deferral again, and boom, I hit the
> same old NULL deref.
>
> I see there's a call to drm_kms_helper_poll_fini() in msm_drm_uninit()
> which should stop the polling, but perhaps there's still some corner
> case due to the unexpected probe (or rather component bind) deferral
> which we're hitting.

I guess the drm_kms_helper_poll_fini() bit is irrelevant here, as the call
comes from pmic_glink_altmode_worker() via drm_bridge_hpd_notify().

Perhaps the pmic_glink altmode driver simply isn't notified that the drm
device is gone again due to the late "probe" deferral or similar?

Johan
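For illustration only, the suspected sequence can be reduced to a simplified userspace C model. Every type and function below is hypothetical and is not a kernel API. The bind step assigns the funcs pointer before registering the HPD callback (the ordering Bjorn describes for msm_drm_init()), and the unbind step clears the callback only if the bridge side is told the drm device is gone. If it is not told, a later notification from the worker analogue reaches a callback whose device has mode_config_funcs == NULL, which is where the WARN_ON() fired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified userspace model of the suspected race. None of these types or
 * functions are kernel APIs; they only mimic the ordering under discussion.
 */
struct model_drm_dev {
	const void *mode_config_funcs; /* non-NULL only while the device is bound */
};

struct model_bridge {
	void (*hpd_cb)(void *data); /* set by the poll_init() analogue */
	void *hpd_data;
};

static const int model_funcs; /* dummy object standing in for the funcs table */
static int hotplug_events;    /* callbacks that found a bound device */
static int stale_callbacks;   /* callbacks that found mode_config_funcs == NULL */

/* Stands in for the hotplug-event path where the WARN_ON() fired. */
static void model_hpd_cb(void *data)
{
	struct model_drm_dev *dev = data;

	if (dev->mode_config_funcs == NULL) {
		stale_callbacks++; /* the real code would dereference NULL here */
		return;
	}
	hotplug_events++;
}

/* Mimics the bridge notify path: only calls a registered callback. */
static void model_hpd_notify(struct model_bridge *bridge)
{
	if (bridge->hpd_cb)
		bridge->hpd_cb(bridge->hpd_data);
}

/* Bind: funcs assigned first, then the callback is registered. */
static void model_bind(struct model_drm_dev *dev, struct model_bridge *bridge)
{
	assert(bridge->hpd_cb == NULL);
	dev->mode_config_funcs = &model_funcs;
	bridge->hpd_cb = model_hpd_cb;
	bridge->hpd_data = dev;
}

/*
 * Unbind after a late deferral. notify_bridge is a hypothetical switch for
 * whether the altmode side learns that the drm device is gone.
 */
static void model_unbind(struct model_drm_dev *dev, struct model_bridge *bridge,
			 bool notify_bridge)
{
	if (notify_bridge) {
		bridge->hpd_cb = NULL;
		bridge->hpd_data = NULL;
	}
	dev->mode_config_funcs = NULL;
}

/* Bind, unbind, then let the worker analogue fire one more notification. */
static int model_run(bool notify_bridge)
{
	struct model_drm_dev dev = { NULL };
	struct model_bridge bridge = { NULL, NULL };

	hotplug_events = 0;
	stale_callbacks = 0;
	model_bind(&dev, &bridge);
	model_hpd_notify(&bridge); /* device bound: normal hotplug event */
	model_unbind(&dev, &bridge, notify_bridge);
	model_hpd_notify(&bridge); /* worker fires after the deferral */
	return stale_callbacks;
}
```

In this sketch, model_run(false) returns 1 (one callback into a torn-down device), while model_run(true) returns 0; both runs record one normal hotplug event while bound.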