On Thursday, July 23, 2020 6:32 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:

> On Thu, Jul 23, 2020 at 09:10:15PM +0000, Mazin Rezk wrote:
>
> > When amdgpu_dm_atomic_commit_tail is running in the workqueue,
> > drm_atomic_state_put will get called while amdgpu_dm_atomic_commit_tail is
> > running, causing a race condition where state (and then dm_state) is
> > sometimes freed while amdgpu_dm_atomic_commit_tail is running. This bug has
> > occurred since 5.7-rc1 and is well documented among polaris11 users [1].
> >
> > Prior to 5.7, this was not a noticeable issue since the freelist pointer
> > was stored at the beginning of dm_state (base), which was unused. After
> > changing the freelist pointer to be stored in the middle of the struct, the
> > freelist pointer overwrote the context, causing dc_state to become garbage
> > data and made the call to dm_enable_per_frame_crtc_master_sync dereference
> > a freelist pointer.
> >
> > This patch fixes the aforementioned issue by calling drm_atomic_state_get
> > in amdgpu_dm_atomic_commit before drm_atomic_helper_commit is called and
> > drm_atomic_state_put after amdgpu_dm_atomic_commit_tail is complete.
> >
> > According to my testing on 5.8.0-rc6, this should fix bug 207383 on
> > Bugzilla [1].
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=207383
>
> Nice work tracking this down!
>
> > Fixes: 3202fa62f ("slub: relocate freelist pointer to middle of object")
>
> I do, however, object to this Fixes tag. :) The flaw appears to have
> been with amdgpu_dm's reference tracking of "state" in the nonblocking
> case. (How this reference counting is supposed to work correctly, though,
> I'm not sure.) If I look at where the drm helper was split from being
> the default callback, it looks like this was what introduced the bug:
>
> da5c47f682ab ("drm/amd/display: Remove acrtc->stream")
>
> ?
> 3202fa62f certainly exposed it much more quickly, but there was a race
> even without 3202fa62f where something could have realloced the memory
> and written over it.
>
> --
> Kees Cook

Thanks, I'll be sure to avoid using 3202fa62f as the cause next time. I just
thought to do that because it was what made the use-after-free cause a
noticeable bug.

Also, by the way, I just realised the patch didn't completely solve the bug.
Sorry about that, making an LKML thread on this was hasty on my part. Should
I get further confirmation from the Bugzilla thread before submitting a patch
for this bug in the future?

Thanks,
Mazin Rezk

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
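
For readers following the thread, the get/put pairing described in the quoted
patch text can be sketched in plain userspace C. This is only an illustration
of the reference-counting idea, not the amdgpu patch itself (which, as noted
above, did not completely solve the bug). Every name below (demo_state,
demo_state_get, demo_state_put, demo_commit_tail, demo_nonblocking_commit) is
hypothetical, and a plain int stands in for the kernel's kref so the example
stays single-threaded and deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an atomic state object; not the kernel type. */
struct demo_state {
    int refcount;  /* assumption: plain int instead of the kernel's kref */
    bool *freed;   /* set to true when the last reference is dropped */
    int context;   /* payload that the deferred work reads */
};

static struct demo_state *demo_state_alloc(bool *freed_flag, int context)
{
    struct demo_state *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->refcount = 1;           /* reference held by the caller */
    s->freed = freed_flag;
    s->context = context;
    *freed_flag = false;
    return s;
}

static void demo_state_get(struct demo_state *s)
{
    s->refcount++;
}

static void demo_state_put(struct demo_state *s)
{
    if (--s->refcount == 0) {
        *s->freed = true;
        free(s);
    }
}

/* Deferred work: reads the payload, then drops the reference taken for it. */
static int demo_commit_tail(struct demo_state *s)
{
    int seen = s->context;

    demo_state_put(s);
    return seen;
}

/*
 * Models the ordering from the report: the caller's put arrives before the
 * deferred work runs. Because an extra reference was taken first, the object
 * is still alive when demo_commit_tail() reads it.
 * Returns the payload seen by the deferred work, or -1 on failure.
 */
static int demo_nonblocking_commit(bool *freed_flag)
{
    struct demo_state *s = demo_state_alloc(freed_flag, 42);
    bool alive_before_tail;

    if (!s)
        return -1;

    demo_state_get(s);                 /* reference owned by the deferred work */
    demo_state_put(s);                 /* caller's "early" put */
    alive_before_tail = !*freed_flag;  /* still alive: one reference remains */

    return alive_before_tail ? demo_commit_tail(s) : -1;
}
```

Without the demo_state_get() call, the early put would drop the count to zero
and free the object before the deferred work touched it, which is the
use-after-free pattern discussed in this thread.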