On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> On Fri, Jan 13, 2023 at 10:57:18AM +0200, Dmitry Baryshkov wrote:
> > On 13/01/2023 06:23, Dmitry Baryshkov wrote:
> > > On 13/01/2023 06:10, Bjorn Andersson wrote:
> > > > Invoking drm_bridge_hpd_notify() on a drm_bridge with a HPD-enabled
> > > > bridge_connector ends up in drm_bridge_connector_hpd_cb() calling
> > > > drm_kms_helper_hotplug_event(), which assumes that the associated
> > > > drm_device's mode_config.funcs is a valid pointer.
> > > >
> > > > But in the MSM DisplayPort driver the HPD enablement happens at bind
> > > > time and mode_config.funcs is initialized late in msm_drm_init(). This
> > > > means that there's a window for hot plug events to dereference a NULL
> > > > mode_config.funcs.
> > > >
> > > > Move the assignment of mode_config.funcs before the bind, to avoid this
> > > > scenario.
> > >
> > > Can we make the DP driver not report HPD events until enable_hpd()
> > > was called? I think this is what was fixed by your internal_hpd
> > > patchset.
> >
> > Or to express this in other words: I thought that internal_hpd already
> > deferred enabling hpd event reporting till the time when we need it, didn't
> > it?
> >
>
> I added a WARN_ON(1) in drm_bridge_hpd_enable() to get a sense of when
> this window of "opportunity" opens up, and here's the callstack:
>
> ------------[ cut here ]------------
> WARNING: CPU: 6 PID: 99 at drivers/gpu/drm/drm_bridge.c:1260 drm_bridge_hpd_enable+0x48/0x94 [drm]
> ...
> Call trace:
>  drm_bridge_hpd_enable+0x48/0x94 [drm]
>  drm_bridge_connector_enable_hpd+0x30/0x3c [drm_kms_helper]
>  drm_kms_helper_poll_enable+0xa4/0x114 [drm_kms_helper]
>  drm_kms_helper_poll_init+0x6c/0x7c [drm_kms_helper]
>  msm_drm_bind+0x370/0x628 [msm]
>  try_to_bring_up_aggregate_device+0x170/0x1bc
>  __component_add+0xb0/0x168
>  component_add+0x20/0x2c
>  dp_display_probe+0x40c/0x468 [msm]
>  platform_probe+0xb4/0xdc
>  really_probe+0x13c/0x300
>  __driver_probe_device+0xc0/0xec
>  driver_probe_device+0x48/0x204
>  __device_attach_driver+0x124/0x14c
>  bus_for_each_drv+0x90/0xdc
>  __device_attach+0xdc/0x1a8
>  device_initial_probe+0x20/0x2c
>  bus_probe_device+0x40/0xa4
>  deferred_probe_work_func+0x94/0xd0
>  process_one_work+0x1a8/0x3c0
>  worker_thread+0x254/0x47c
>  kthread+0xf8/0x1b8
>  ret_from_fork+0x10/0x20
> ---[ end trace 0000000000000000 ]---
>
> As drm_kms_helper_poll_init() is the last thing being called in
> msm_drm_init(), shifting around the mode_config.funcs assignment would not
> have any impact.
>
> Perhaps we have shuffled other things around to avoid this bug? Either
> way, let's put this on hold until further proof that it's still
> reproducible.

As I've mentioned off list, I haven't hit the apparent race I reported
here:

	https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/

since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
could very well be that something has changed that fixes (or hides) the
issue since.

Johan