On Tue, May 2, 2017 at 5:01 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Fri, Apr 28, 2017 at 8:05 PM, Rob Clark <robdclark@xxxxxxxxx> wrote:
>> The ->preclose() hook is a good place to block for pending atomic
>> updates. We can't do this in ->postclose(), as it needs to happen
>> before drm_fb_release(). Otherwise, since we have already swapped
>> state (in the case of a non-blocking atomic update), this means that
>> the plane_state->fb will be released and cleared before we wait for
>> fences from the atomic-commit wq.
>>
>> There are probably more complex solutions possible. But since already
>> scheduled atomic update, possibly blocking on already scheduled gpu/etc
>> fences, will complete eventually (assuming nothing catches fire), so
>> the sanest thing seems to be just block until already scheduled atomic
>> updates complete before tearing things down.
>>
>> Fixes:
>>
>> WARNING: CPU: 1 PID: 69 at ../drivers/gpu/drm/drm_atomic_helper.c:1061 drm_atomic_helper_wait_for_fences+0xe0/0xf8
>> Modules linked in:
>>
>> CPU: 1 PID: 69 Comm: kworker/1:1 Tainted: G W 4.11.0-rc8+ #1187
>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>> Workqueue: events drm_mode_rmfb_work_fn
>> task: ffffffc036560d00 task.stack: ffffffc036550000
>> PC is at drm_atomic_helper_wait_for_fences+0xe0/0xf8
>> LR is at complete_commit.isra.1+0x44/0x1c0
>> pc : [<ffffff80084f6040>] lr : [<ffffff800854176c>] pstate: 20000145
>> sp : ffffffc036553b60
>> x29: ffffffc036553b60 x28: ffffffc0264e6a00
>> x27: ffffffc035659000 x26: 0000000000000000
>> x25: ffffffc0240e8000 x24: 0000000000000038
>> x23: 0000000000000000 x22: ffffff800858f200
>> x21: ffffffc0240e8000 x20: ffffffc02f56a800
>> x19: 0000000000000000 x18: 0000000000000000
>> x17: 0000000000000000 x16: 0000000000000000
>> x15: 0000000000000000 x14: ffffffc00a192700
>> x13: 0000000000000004 x12: 0000000000000000
>> x11: ffffff80089a1690 x10: 00000000000008f0
>> x9 : ffffffc036553b20 x8 : ffffffc036561650
>> x7 : ffffffc03fe6cb40 x6 : 0000000000000000
>> x5 : 0000000000000001 x4 : 0000000000000002
>> x3 : ffffffc035659000 x2 : ffffffc0240e8c80
>> x1 : 0000000000000000 x0 : ffffffc02adbe588
>>
>> ---[ end trace 13aeec77c3fb55e2 ]---
>> Call trace:
>> Exception stack(0xffffffc036553990 to 0xffffffc036553ac0)
>> 3980: 0000000000000000 0000008000000000
>> 39a0: ffffffc036553b60 ffffff80084f6040 0000000000004ff0 0000000000000038
>> 39c0: ffffffc0365539d0 ffffff800857e098 ffffffc036553a00 ffffff800857e1b0
>> 39e0: ffffffc036553a10 ffffff800857c554 ffffffc0365e8400 ffffffc0365e8400
>> 3a00: ffffffc036553a20 ffffff8008103358 000000000001aad7 ffffff800851b72c
>> 3a20: ffffffc036553a50 ffffff80080e9228 ffffffc02adbe588 0000000000000000
>> 3a40: ffffffc0240e8c80 ffffffc035659000 0000000000000002 0000000000000001
>> 3a60: 0000000000000000 ffffffc03fe6cb40 ffffffc036561650 ffffffc036553b20
>> 3a80: 00000000000008f0 ffffff80089a1690 0000000000000000 0000000000000004
>> 3aa0: ffffffc00a192700 0000000000000000 0000000000000000 0000000000000000
>> [<ffffff80084f6040>] drm_atomic_helper_wait_for_fences+0xe0/0xf8
>> [<ffffff800854176c>] complete_commit.isra.1+0x44/0x1c0
>> [<ffffff8008541c64>] msm_atomic_commit+0x32c/0x350
>> [<ffffff8008516230>] drm_atomic_commit+0x50/0x60
>> [<ffffff8008517548>] drm_atomic_remove_fb+0x158/0x250
>> [<ffffff80085186d0>] drm_framebuffer_remove+0x50/0x158
>> [<ffffff8008518818>] drm_mode_rmfb_work_fn+0x40/0x58
>> [<ffffff80080d5668>] process_one_work+0x1d0/0x378
>> [<ffffff80080d5a54>] worker_thread+0x244/0x488
>> [<ffffff80080db7fc>] kthread+0xfc/0x128
>> [<ffffff8008082ec0>] ret_from_fork+0x10/0x50
>>
>> Reported-by: Stanimir Varbanov <stanimir.varbanov@xxxxxxxxxx>
>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxx>
>> ---
>> The hunk that removes the comment about ->preclose() included in this
>> patch to challenge the assumption that ->preclose() shouldn't exist ;-)
>
> And I'm going to challenge your patch here. Both fences and
> framebuffers and atomic commits are refcounted. If you go boom on them
> when userspace closes the fd, you have a refcount bug. We don't fix
> those by flushing stuff :-)

So, it isn't a refcounting bug, but something much funnier..

It seems that mdp5 had a custom plane state with its own dup_state
function, pre-dating the addition of
__drm_atomic_helper_plane_duplicate_state(), and when the helper was
introduced mdp5 wasn't retrofitted to use it. That was all fine until
the fence pointer was added to the base plane_state struct. From then
on, plane_state->fence was getting copied over into the duplicated
plane_state. So the atomic rmfb code would sometimes manage to copy
the fence pointer if there was another pending update which had
already swapped state but not yet committed.

BR,
-R

> Please add a pair of get/put() calls at the right place instead.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
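
For anyone following along, the failure mode described in the reply can be modelled in plain user-space C. This is only a rough sketch, not the mdp5 or DRM code: every `toy_*` name below is hypothetical, `fb_id` and `zpos` are made-up fields, and `toy_helper_duplicate_state()` merely stands in for the idea behind `__drm_atomic_helper_plane_duplicate_state()` (copy the base state, then clear the per-commit fence pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the kernel types; all names are hypothetical. */
struct toy_fence {
    int signaled;
};

struct toy_plane_state {
    int fb_id;                /* stands in for plane_state->fb */
    struct toy_fence *fence;  /* stands in for plane_state->fence */
};

struct toy_mdp5_plane_state {
    struct toy_plane_state base;
    int zpos;                 /* made-up driver-private field */
};

/* Models a base-state helper: copy, then reset per-commit fields. */
static void toy_helper_duplicate_state(const struct toy_plane_state *cur,
                                       struct toy_plane_state *dup)
{
    memcpy(dup, cur, sizeof(*dup));
    dup->fence = NULL;  /* the fence belongs to the commit that set it */
}

/* Pre-helper pattern: a raw copy, so the fence pointer leaks into the dup. */
static struct toy_mdp5_plane_state *
toy_dup_raw(const struct toy_mdp5_plane_state *cur)
{
    struct toy_mdp5_plane_state *dup = malloc(sizeof(*dup));

    if (!dup)
        return NULL;
    memcpy(dup, cur, sizeof(*dup));
    return dup;
}

/* Retrofitted pattern: copy private fields, let the helper fix up the base. */
static struct toy_mdp5_plane_state *
toy_dup_with_helper(const struct toy_mdp5_plane_state *cur)
{
    struct toy_mdp5_plane_state *dup = malloc(sizeof(*dup));

    if (!dup)
        return NULL;
    memcpy(dup, cur, sizeof(*dup));
    toy_helper_duplicate_state(&cur->base, &dup->base);
    return dup;
}
```

With a current state whose `base.fence` is set, `toy_dup_raw()` hands back a duplicate that still points at the old fence, while `toy_dup_with_helper()` returns one with `fence == NULL`. The design point is that a driver subclassing the base state should route the base part through the shared helper, so fields added to the base struct later get handled without touching the driver.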