On 2020-07-23 5:10 p.m., Mazin Rezk wrote:
When amdgpu_dm_atomic_commit_tail is running in the workqueue,
drm_atomic_state_put will get called while amdgpu_dm_atomic_commit_tail is
running, causing a race condition where state (and then dm_state) is
sometimes freed while amdgpu_dm_atomic_commit_tail is running. This bug has
occurred since 5.7-rc1 and is well documented among polaris11 users [1].
Prior to 5.7, this was not a noticeable issue since the freelist pointer
was stored at the beginning of dm_state (base), which was unused. After
changing the freelist pointer to be stored in the middle of the struct, the
freelist pointer overwrote the context, causing dc_state to become garbage
data and made the call to dm_enable_per_frame_crtc_master_sync dereference
a freelist pointer.
This patch fixes the aforementioned issue by calling drm_atomic_state_get
in amdgpu_dm_atomic_commit before drm_atomic_helper_commit is called and
drm_atomic_state_put after amdgpu_dm_atomic_commit_tail is complete.
According to my testing on 5.8.0-rc6, this should fix bug 207383 on
Bugzilla [1].
[1] https://bugzilla.kernel.org/show_bug.cgi?id=207383
Fixes: 3202fa62f ("slub: relocate freelist pointer to middle of object")
Reported-by: Duncan <1i5t5.duncan@xxxxxxx>
Signed-off-by: Mazin Rezk <mnrzk@xxxxxxxxxxxxxx>
Thanks for the investigation and your patch. I appreciate the help in
trying to narrow down the root cause as this issue has been difficult to
reproduce on my setups.
Though I'm not sure this really resolves the issue - we make use of the
drm_atomic_helper_commit helper function from DRM which internally does
what you're doing with this patch:
	drm_atomic_state_get(state);
	if (nonblock)
		queue_work(system_unbound_wq, &state->commit_work);
	else
		commit_tail(state);
So even when it gets queued off to the unbound workqueue we still have a
reference on the state.
That reference gets dropped as part of commit tail helper in DRM as well:
	if (funcs && funcs->atomic_commit_tail)
		funcs->atomic_commit_tail(old_state);
	else
		drm_atomic_helper_commit_tail(old_state);
	commit_time_ms = ktime_ms_delta(ktime_get(), start);
	if (commit_time_ms > 0)
		drm_self_refresh_helper_update_avg_times(old_state,
						 (unsigned long)commit_time_ms,
						 new_self_refresh_mask);
	drm_atomic_helper_commit_cleanup_done(old_state);
	drm_atomic_state_put(old_state);
So instead of a use-after-free happening when we access the state, we get
a double-free later, at the end of commit tail in DRM.
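
To make the reference flow in the unpatched helper path concrete, here is a
small user-space model of it. This is only a sketch: struct model_state and
all model_* functions are hypothetical stand-ins, not kernel APIs, and the
only thing modeled is the reference count. It assumes the caller holds one
reference of its own and drops it after the commit call returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_atomic_state; only the refcount is modeled. */
struct model_state {
	int refcount;
	bool freed;
};

static void model_state_init(struct model_state *s)
{
	s->refcount = 1;	/* reference assumed to be held by the caller */
	s->freed = false;
}

static void model_state_get(struct model_state *s)
{
	assert(!s->freed);
	s->refcount++;
}

static void model_state_put(struct model_state *s)
{
	assert(s->refcount > 0);
	if (--s->refcount == 0)
		s->freed = true;	/* the real code would free the state here */
}

/* Models the end of drm_atomic_helper_commit(): take a reference for the tail. */
static void model_helper_commit(struct model_state *s)
{
	model_state_get(s);
	/* nonblocking case: the commit work would be queued here and run later */
}

/* Models the DRM commit tail helper: run the driver tail, then drop the reference. */
static void model_commit_tail(struct model_state *s)
{
	/* funcs->atomic_commit_tail(old_state) would run here */
	model_state_put(s);
}

/*
 * Walks the nonblocking sequence. Returns true if the state was still alive
 * while the tail was pending and was released once the tail finished.
 */
static bool model_nonblocking_commit(void)
{
	struct model_state s;
	bool alive_while_queued;

	model_state_init(&s);
	model_helper_commit(&s);
	model_state_put(&s);		/* caller drops its own reference */
	alive_while_queued = !s.freed;
	model_commit_tail(&s);		/* queued work runs, drops the helper's reference */

	return alive_while_queued && s.freed;
}
```

In this model the state outlives the caller's put and is only released by
the put at the end of the commit tail, which is the behavior described above
for the helper.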
What I think would be the right next step here is to determine what
sequence of IOCTLs and atomic commits is happening under your setup,
using a very verbose dmesg log. You can set a debug level for DRM in
your kernel parameters with something like:
drm.debug=0x54
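
For reference, the mask is a set of category bits. The sketch below decodes
a mask into category names. The bit assignments are as I recall them from
include/drm/drm_print.h and should be checked against your tree; the
function name decode_drm_debug and the name table are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Category names indexed by bit position. Assumed to match the DRM_UT_*
 * values in include/drm/drm_print.h; verify against your kernel version.
 */
static const char *const drm_debug_names[] = {
	"CORE",		/* 0x01 */
	"DRIVER",	/* 0x02 */
	"KMS",		/* 0x04 */
	"PRIME",	/* 0x08 */
	"ATOMIC",	/* 0x10 */
	"VBL",		/* 0x20 */
	"STATE",	/* 0x40 */
	"LEASE",	/* 0x80 */
	"DP",		/* 0x100 */
};

/*
 * Writes the space-separated names of the categories set in mask into buf
 * and returns how many were written. Stops early if buf is too small.
 */
static int decode_drm_debug(unsigned int mask, char *buf, size_t len)
{
	size_t used = 0;
	size_t bit;
	int count = 0;

	assert(buf != NULL);
	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (bit = 0; bit < sizeof(drm_debug_names) / sizeof(drm_debug_names[0]); bit++) {
		int n;

		if (!(mask & (1u << bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", drm_debug_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space, keep what fit so far */
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Under those assumed bit values, 0x54 selects the KMS, ATOMIC and STATE
categories.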
I don't see anything in amdgpu_dm.c that looks like it would be freeing
the state, so I suspect something in the core is doing this.
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 86ffa0c2880f..86d6652872f2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7303,6 +7303,7 @@ static int amdgpu_dm_atomic_commit(struct drm_device *dev,
* unset legacy_cursor_update
*/
+ drm_atomic_state_get(state);
Also note that if the drm_atomic_helper_commit() call below fails, the
reference taken here is never dropped and the structure is never freed.
So we should really be checking the return code below before trying to
do this, if at all.
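
A rough user-space sketch of what checking the return code could look like.
Everything named demo_* is hypothetical and only the reference count is
modeled. It assumes a failing helper returns before taking its own
reference and before any commit tail is scheduled, so nothing else would
drop the extra reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_atomic_state; only the refcount is modeled. */
struct demo_state {
	int refcount;
};

static void demo_state_get(struct demo_state *s)
{
	s->refcount++;
}

static void demo_state_put(struct demo_state *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/*
 * Stand-in for drm_atomic_helper_commit(). On failure it is assumed to
 * return without taking a reference or scheduling the commit tail.
 */
static int demo_helper_commit(struct demo_state *s, bool fail)
{
	if (fail)
		return -EINVAL;
	demo_state_get(s);	/* dropped later by the commit tail */
	return 0;
}

/* The patched commit path with the missing error handling added. */
static int demo_patched_commit(struct demo_state *s, bool fail)
{
	int ret;

	demo_state_get(s);	/* the extra reference the patch adds */
	ret = demo_helper_commit(s, fail);
	if (ret)
		demo_state_put(s);	/* tail never runs, so undo the get here */
	return ret;
}
```

On the failure path the count returns to its starting value; on success the
two extra references remain until the commit tail drops them.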
Regards,
Nicholas Kazlauskas
return drm_atomic_helper_commit(dev, state, nonblock);
/*TODO Handle EINTR, reenable IRQ*/
@@ -7628,6 +7629,8 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
if (dc_state_temp)
dc_release_state(dc_state_temp);
+
+ drm_atomic_state_put(state);
}
--
2.27.0