On Mon, Jan 23, 2023 at 09:17:49AM -0800, Bjorn Andersson wrote:
> On Mon, Jan 23, 2023 at 05:01:45PM +0100, Johan Hovold wrote:
> > On Tue, Jan 17, 2023 at 09:04:39AM +0100, Johan Hovold wrote:
> > > On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> > > > Perhaps we have shuffled other things around to avoid this bug? Either
> > > > way, let's this on hold until further proof that it's still
> > > > reproducible.
> > >
> > > As I've mentioned off list, I haven't hit the apparent race I reported
> > > here:
> > >
> > > https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/
> > >
> > > since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
> > > could very well be that something has changes that fixes (or hides) the
> > > issue since.
> >
> > For unrelated reasons, I tried enabling async probing, and apart from
> > apparently causing the panel driver to probe defer indefinitely, I also
> > again hit the WARN_ON() I had added to catch this:
> >
> > [ 13.593235] WARNING: CPU: 0 PID: 125 at drivers/gpu/drm/drm_probe_helper.c:664 drm_kms_helper_hotplug_event+0x48/0x70 [drm_kms_helper]
> >
> > So the bug still appears to be there (and the MSM DRM driver is fragile
> > and broken, but we knew that).
> >
>
> But the ordering between mode_config.funcs = !NULL and
> drm_kms_helper_poll_init() in msm_drm_init() seems pretty clear.
>
> And my testing shows that drm_kms_helper_poll_init() is the cause for
> getting bridge->hpd_cb != NULL.
>
> So the ordering seems legit, unless there's something else causing the
> assignment of bridge->hpd_cb to happen earlier in this scenario.

I'm not saying that this patch is correct (indeed it doesn't seem to be), only that the bug I reported still appears to be present in 6.2.
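To make the ordering argument concrete, here is a minimal, single-threaded userspace sketch of the invariant under discussion: the hotplug path dereferences mode_config.funcs, so the HPD callback should only be reachable between the point where funcs is set and the point where polling is stopped again. Everything in it (model_dev, model_init(), model_uninit(), run_correct_order(), run_stale_callback() and the rest) is hypothetical and merely stands in for the real msm/DRM code. It illustrates the intended ordering only and does not reproduce the actual race, which involves concurrent (async) probing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, NOT kernel code: the names only stand in
 * for dev->mode_config.funcs and bridge->hpd_cb from the discussion. */
struct model_dev;

struct model_funcs {
	void (*on_hotplug)(struct model_dev *dev);
};

struct model_dev {
	const struct model_funcs *funcs;       /* models mode_config.funcs */
	void (*hpd_cb)(struct model_dev *dev); /* models bridge->hpd_cb */
	int delivered;         /* hotplug events handled normally */
	int null_funcs_events; /* events that would have hit a NULL deref */
};

static void count_delivery(struct model_dev *dev)
{
	dev->delivered++;
}

static const struct model_funcs model_funcs_impl = { count_delivery };

/* Models the hotplug event path: it dereferences funcs, so instead of
 * crashing we record the bad event (a stand-in for the WARN_ON()). */
static void model_hotplug_event(struct model_dev *dev)
{
	if (!dev->funcs) {
		dev->null_funcs_events++;
		return;
	}
	dev->funcs->on_hotplug(dev);
}

/* Init order: set funcs first, then hook up the callback (the step the
 * thread attributes to drm_kms_helper_poll_init()). */
static void model_init(struct model_dev *dev)
{
	assert(dev->hpd_cb == NULL); /* no callback may be live yet */
	dev->funcs = &model_funcs_impl;
	dev->hpd_cb = model_hotplug_event;
}

/* Teardown is the mirror image: unhook the callback first (assumed here
 * to correspond to drm_kms_helper_poll_fini()), only then clear funcs. */
static void model_uninit(struct model_dev *dev)
{
	dev->hpd_cb = NULL;
	dev->funcs = NULL;
}

/* A bridge raising an HPD event if a callback is registered. */
static void model_bridge_hpd(struct model_dev *dev)
{
	if (dev->hpd_cb)
		dev->hpd_cb(dev);
}

/* Init, event, teardown, event, re-init (as when a deferred bind is
 * retried), event. Returns the number of events that saw NULL funcs. */
int run_correct_order(void)
{
	struct model_dev dev = { 0 };

	model_init(&dev);
	model_bridge_hpd(&dev);
	model_uninit(&dev);
	model_bridge_hpd(&dev); /* no callback registered: ignored */
	model_init(&dev);
	model_bridge_hpd(&dev);
	assert(dev.delivered == 2);
	return dev.null_funcs_events;
}

/* Broken teardown: funcs is cleared while the callback stays registered,
 * so the next HPD event reaches the hotplug path with funcs == NULL. */
int run_stale_callback(void)
{
	struct model_dev dev = { 0 };

	model_init(&dev);
	dev.funcs = NULL;       /* callback left behind */
	model_bridge_hpd(&dev); /* stale callback fires */
	assert(dev.delivered == 0);
	return dev.null_funcs_events;
}
```

In this model, run_correct_order() returns 0 and run_stale_callback() returns 1: any path that clears funcs without first unhooking the callback reopens the window. Whether the msm driver actually has such a path around a deferred component bind is the open question here, not something the sketch establishes.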
Now that I have actually looked at this again, I realise that the reason I haven't seen it with 6.2 is more likely that I'm now making sure to load the panel driver before the drm driver to avoid that unnecessary probe deferral.

With async probing, I get the probe deferral again, and boom, I hit the same old NULL deref.

I see there's a call to drm_kms_helper_poll_fini() in msm_drm_uninit() which should stop the polling, but perhaps there's still some corner case, due to the unexpected probe (or rather component bind) deferral, that we're hitting.

Johan