On Thu, Oct 11, 2018 at 08:27:43PM -0400, sunpeng.li@xxxxxxx wrote: > From: Leo Li <sunpeng.li@xxxxxxx> > > This fixes a general protection fault, caused by accessing the contents > of a flip_done completion object that has already been freed. It occurs > due to the preemption of a non-blocking commit worker thread W by > another commit thread X. X continues to clear its atomic state at the > end, destroying the CRTC commit object that W still needs. Switching > back to W and accessing the commit objects then leads to bad results. > > Worker W becomes preemptable when waiting for flip_done to complete. At > this point, a frequently occurring commit thread X can take over. Here's > an example where W is a worker thread that flips on both CRTCs, and X > does a legacy cursor update on both CRTCs: > > ... > 1. W does flip work > 2. W runs commit_hw_done() > 3. W waits for flip_done on CRTC 1 > 4. > flip_done for CRTC 1 completes > 5. W finishes waiting for CRTC 1 > 6. W waits for flip_done on CRTC 2 > > 7. > Preempted by X > 8. > flip_done for CRTC 2 completes > 9. X atomic_check: hw_done and flip_done are complete on all CRTCs > 10. X updates cursor on both CRTCs > 11. X destroys atomic state > 12. X done > > 13. > Switch back to W > 14. W waits for flip_done on CRTC 2 > 15. W raises general protection fault > > The error looks like so: > > general protection fault: 0000 [#1] PREEMPT SMP PTI > **snip** > Call Trace: > lock_acquire+0xa2/0x1b0 > _raw_spin_lock_irq+0x39/0x70 > wait_for_completion_timeout+0x31/0x130 > drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper] > amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu] > commit_tail+0x3d/0x70 [drm_kms_helper] > process_one_work+0x212/0x650 > worker_thread+0x49/0x420 > kthread+0xfb/0x130 > ret_from_fork+0x3a/0x50 > Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O) > gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt > fb_sys_fops ttm(O) drm(O) > > Note that i915 has this issue masked, since hw_done is signaled after > waiting for flip_done. Doing so will block the cursor update from > happening until hw_done is signaled, preventing the cursor commit from > destroying the state. > > Signed-off-by: Leo Li <sunpeng.li@xxxxxxx> Great analysis! Bugfix itself needs a bit more work though, see below. > --- > drivers/gpu/drm/drm_atomic_helper.c | 30 ++++++++++++++++++++++++++---- > 1 file changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > index 80be74d..efdf043 100644 > --- a/drivers/gpu/drm/drm_atomic_helper.c > +++ b/drivers/gpu/drm/drm_atomic_helper.c > @@ -1410,20 +1410,42 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev, > { > struct drm_crtc_state *new_crtc_state; > struct drm_crtc *crtc; > + struct drm_crtc_commit **commits; > int i; > + int num_crtc = dev->mode_config.num_crtc; > > + commits = kcalloc(num_crtc, sizeof(*commits), GFP_KERNEL); > + > + /* > + * Because we open ourselves to preemption by calling > + * wait_for_completion_timeout(), we need to get our own references to > + * the commit objects. This is so that a preempting commit won't free > + * them. > + */ The scheduler can preempt you at any point before here too, if you enable CONFIG_PREEMPT. It definitely can preempt/block in the kcalloc above, even if you have CONFIG_PREEMPT disabled. The scheduling point you identified below is simply the most likely. The underlying bug is that after drm_atomic_helper_commit_hw_done() we unblock the next commit worker (thread X in your example), and cannot assume anymore that the new state will stay around (since new state is released by the subsequent commit work run in thread X). Therefore your drm_crtc_commit_get here already can access freed memory. The correct fix here is to store the temporary reference before we call drm_atomic_helper_commit_hw_done(), and then use that in this function here. The problem is that this is somewhat tricky to pull off, since some drivers call wait_for_flip_done() before hw_done() (like i915). Here's what I'd do: - Add a new drm_crtc_commit pointer to struct __drm_crtcs_state. - Fill that out in drm_atomic_helper_setup_commit, and make sure you release the reference for it in drm_atomic_state_default_clear(). - Use that pointer instead in drm_atomic_helper_wait_for_flip_done() here. Cheers, Daniel > for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) { > - struct drm_crtc_commit *commit = new_crtc_state->commit; > + commits[i] = new_crtc_state->commit; > + if (commits[i]) > + drm_crtc_commit_get(commits[i]); > + } > + > + for (i = 0; i < num_crtc; i++) { > int ret; > > - if (!commit) > + if (!commits[i]) > continue; > > - ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ); > + ret = wait_for_completion_timeout(&commits[i]->flip_done, > + 10 * HZ); > if (ret == 0) > DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", > - crtc->base.id, crtc->name); > + commits[i]->crtc->base.id, > + commits[i]->crtc->name); > } > + > + for (i = 0; i < num_crtc; i++) > + if (commits[i]) > + drm_crtc_commit_put(commits[i]); > + kfree(commits); > } > EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done); > > -- > 2.7.4 > > _______________________________________________ > dri-devel mailing list > dri-devel@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel