On Sun Sep 8, 2024 at 4:23 AM PDT, Tobias Jakobi wrote: > On 9/8/24 09:35, Christopher Snowhill wrote: > > > On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: > >> From: Tobias Jakobi <tjakobi@xxxxxxxxxxxxxxxxxxxxx> > >> > >> Hello, > >> > >> this fixes a nasty race condition in the set_drr() callbacks for DCN10 > >> and DCN35 that has existed now since quite some time, see this GitLab > >> issue for reference. > >> > >> https://gitlab.freedesktop.org/drm/amd/-/issues/3142 > >> > >> The report just focuses von DCN10, but the same problem also exists in > >> the DCN35 code. > > Does the problem not exist in the following references to funcs->set_drr? > > > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > Maybe. But the big difference I see here, is that in this code there > isn't even any kind of NULL check applied to tg. Or most of the members > of *pipe_ctx. If there really is the same kind of problem here, then one > would need to rewrite a bit more code to fix stuff. > > E.g. in the case of dcn31_hwseq.c, the questionable code is in > dcn31_reset_back_end_for_pipe(), which is static and only called from > dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap > callback. And this specific callback, from what I can see, is only > called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the > .apply_ctx_to_hw callback. The callback is only called from > dc_commit_state_no_check(). That one is static again, and called from > dc_commit_streams(). > > I could trace this even further. My point is: I don't think this is > called from any IRQ handler code. And given the depth and complexity of > the callgraph, I have to admit, that, at least at this point, this is a > bit over my head. > > Sure, I could now sprinkle a bunch of x != NULL in the code, but that > would be merely voodoo. And I usually try to have a theoretical basis > when I apply changes to code. > > Maybe if someone from the AMD display team could give some insight if > there still is potentially vulnerable code in some of the instances that > Christopher has posted, then I would gladly take a look. Sorry, I was taking a note from someone else who mentioned set_drr functions, and wasn't aware that none of the other implementations happen to be called from IRQ handlers. Thanks for looking into this. -Christopher > With best wishes, > Tobias > > > > >> With best wishes, > >> Tobias > >> > >> Tobias Jakobi (2): > >> drm/amd/display: Avoid race between dcn10_set_drr() and > >> dc_state_destruct() > >> drm/amd/display: Avoid race between dcn35_set_drr() and > >> dc_state_destruct() > >> > >> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- > >> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- > >> 2 files changed, 24 insertions(+), 16 deletions(-)