On Monday, July 27, 2020 1:40 AM, Mazin Rezk <mnrzk@xxxxxxxxxxxxxx> wrote: > This patch fixes a race condition that causes a use-after-free during > amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits > are requested and the second one finishes before the first. Essentially, > this bug occurs when the following sequence of events happens: > > 1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is > deferred to the workqueue. > > 2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is > deferred to the workqueue. > > 3. Commit #2 starts before commit #1, dm_state #1 is used in the > commit_tail and commit #2 completes, freeing dm_state #1. > > 4. Commit #1 starts after commit #2 completes, uses the freed dm_state > 1 and dereferences a freelist pointer while setting the context. > > Since this bug has only been spotted with fast commits, this patch fixes > the bug by clearing the dm_state instead of using the old dc_state for > fast updates. In addition, since dm_state is only used for its dc_state > and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found, > removing the dm_state should not have any consequences in fast updates. > > This use-after-free bug has existed for a while now, but only caused a > noticeable issue starting from 5.7-rc1 due to 3202fa62f ("slub: relocate > freelist pointer to middle of object") moving the freelist pointer from > dm_state->base (which was unused) to dm_state->context (which is > dereferenced). > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383 > Fixes: bd200d190f45 ("drm/amd/display: Don't replace the dc_state for fast updates") > Reported-by: Duncan <1i5t5.duncan@xxxxxxx> > Signed-off-by: Mazin Rezk <mnrzk@xxxxxxxxxxxxxx> > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ++++++++++++++----- > 1 file changed, 27 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 86ffa0c2880f..710edc70e37e 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -8717,20 +8717,38 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev, > * the same resource. If we have a new DC context as part of > * the DM atomic state from validation we need to free it and > * retain the existing one instead. > + * > + * Furthermore, since the DM atomic state only contains the DC > + * context and can safely be annulled, we can free the state > + * and clear the associated private object now to free > + * some memory and avoid a possible use-after-free later. > */ > - struct dm_atomic_state *new_dm_state, *old_dm_state; > > - new_dm_state = dm_atomic_get_new_state(state); > - old_dm_state = dm_atomic_get_old_state(state); > + for (i = 0; i < state->num_private_objs; i++) { > + struct drm_private_obj *obj = state->private_objs[i].ptr; > > - if (new_dm_state && old_dm_state) { > - if (new_dm_state->context) > - dc_release_state(new_dm_state->context); > + if (obj->funcs == adev->dm.atomic_obj.funcs) { > + int j = state->num_private_objs-1; > > - new_dm_state->context = old_dm_state->context; > + dm_atomic_destroy_state(obj, > + state->private_objs[i].state); > + > + /* If i is not at the end of the array then the > + * last element needs to be moved to where i was > + * before the array can safely be truncated. > + */ > + if (i != j) > + state->private_objs[i] = > + state->private_objs[j]; > > - if (old_dm_state->context) > - dc_retain_state(old_dm_state->context); > + state->private_objs[j].ptr = NULL; > + state->private_objs[j].state = NULL; > + state->private_objs[j].old_state = NULL; > + state->private_objs[j].new_state = NULL; > + > + state->num_private_objs = j; > + break; > + } > } > } > > -- > 2.27.0 > I have tested this on 5.8.0-rc6 w/ an RX 480 for 8 hours and I have not had the crash described in the Bugzilla thread. I will also be running this patch on my kernel for the next couple of days to further confirm that this is working. In addition, I will ask the users in the Bugzilla thread to test this patch and confirm that it works. My previous attempt to patch this bug ("amdgpu_dm: fix nonblocking atomic commit use-after-free") [1] was unsuccessful and this patch is meant to replace it. That said, this isn't necessarily a v2 of that patch, since this patch works very differently from the other patch. [1] https://lkml.org/lkml/2020/7/23/1123 Thanks, Mazin Rezk _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx