Hi Liviu,
On 9/30/22 18:56, Liviu Dudau wrote:
On Wed, Sep 14, 2022 at 12:03:58AM +0200, Danilo Krummrich wrote:
Do you mind trying again with my v2 (although v2 shouldn't make a difference
for this issue) and provide the back-trace when it hangs?
Hi Danilo,
I've finally got a replacement Juno board that it is stable enough.
That's great!
I've tried your v2 on top of 7860d720a84c ("drm/msm: Fix build break with recent mm tree") which
is the head of drm-next today and rmmod hangs. /proc/<pid_of_rmmod>/stack shows:
Thanks for taking the time to test the patches and providing this
stacktrace.
[<0>] __synchronize_srcu.part.0+0x78/0xec
[<0>] synchronize_srcu+0xe0/0x134
[<0>] drm_dev_unplug+0x2c/0x60 [drm]
[<0>] hdlcd_drm_unbind+0x20/0xc0 [hdlcd]
[<0>] component_master_del+0xa4/0xc0
[<0>] hdlcd_remove+0x1c/0x2c [hdlcd]
[<0>] platform_remove+0x28/0x60
[<0>] device_remove+0x4c/0x80
[<0>] device_release_driver_internal+0x1e4/0x250
[<0>] driver_detach+0x50/0xe0
[<0>] bus_remove_driver+0x5c/0xbc
[<0>] driver_unregister+0x30/0x60
[<0>] platform_driver_unregister+0x14/0x20
[<0>] hdlcd_platform_driver_exit+0x1c/0xe40 [hdlcd]
[<0>] __arm64_sys_delete_module+0x18c/0x240
[<0>] invoke_syscall+0x48/0x114
[<0>] el0_svc_common.constprop.0+0xcc/0xec
[<0>] do_el0_svc+0x2c/0xc0
[<0>] el0_svc+0x2c/0x84
[<0>] el0t_64_sync_handler+0x11c/0x150
[<0>] el0t_64_sync+0x18c/0x190
I think I figured it out. I messed up two of the srcu read-side critical
sections by overlooking alternate return paths within those sections -
yikes!
I also found a potential use-after-free I accidentally introduced while
removing the patch in v2.
Finally, I added a patch to remove unnecessary calls to
drm_mode_config_cleanup() and replaced drm_mode_config_init() with
drmm_mode_config_init().
I will send you a v3 containing those fixes.
My quick guess would be that the mixing of managed and unmanaged APIs manages to
confuse the sleepable RCUs and we get the hang.
I guess you're referring to drm_crtc_init_with_planes() and providing a
.destroy callback in hdlcd_crtc_funcs. Actually, this .destroy callback
is handled in the same way as when initializing the crtc with
drmm_crtc_init_with_planes(), hence I don't think we have a mix of
managed and unmanaged APIs here.
drmm_crtc_init_with_planes() just calls __drm_crtc_init_with_planes()
and adds the cleanup via drmm_add_action_or_reset().
Having a .destroy callback registered via drm_crtc_init_with_planes()
ends up the same way, since drm_mode_config_init() simply calls
drmm_mode_config_init(), which also registers drm_mode_config_cleanup()
(ultimately calling the crtc->destroy callback) via
drmm_add_action_or_reset().
- Danilo
Will chat with Daniel Vetter next
week at XDC on what would be the best approach here.
Best regards,
Liviu