On 2020-02-27 6:56 a.m., Huang Rui wrote: > On Thu, Feb 27, 2020 at 06:39:03AM +0800, Tuikov, Luben wrote: >> Since commit "Move to a per-IB secure flag (TMZ)", >> we've been seeing hangs in GFX. Ray H. pointed out >> by sending a patch that we need to send FRAME >> CONTROL stop/start back-to-back, every time we >> flip the TMZ flag as per each IB we submit. That >> is, when we transition from TMZ to non-TMZ we have >> to send a stop with TMZ followed by a start with >> non-TMZ, and similarly for transitioning from >> non-TMZ into TMZ. >> >> This patch implements this, thus fixing the GFX >> hang. >> >> Signed-off-by: Luben Tuikov <luben.tuikov@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 87 +++++++++++++++++------- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +- >> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 15 ++-- >> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 13 ++-- >> 4 files changed, 79 insertions(+), 41 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> index 4b2342d11520..16d6df3304d3 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> @@ -216,40 +216,75 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, >> amdgpu_ring_emit_cntxcntl(ring, status); >> } >> >> - secure = false; >> + /* Find the first non-preamble IB. >> + */ >> for (i = 0; i < num_ibs; ++i) { >> ib = &ibs[i]; >> >> /* drop preamble IBs if we don't have a context switch */ >> - if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && >> - skip_preamble && >> - !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && >> - !amdgpu_mcbp && >> - !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ >> - continue; >> - >> - /* If this IB is TMZ, add frame TMZ start packet, >> - * else, turn off TMZ. >> - */ >> - if (ib->flags & AMDGPU_IB_FLAGS_SECURE && ring->funcs->emit_tmz) { >> - if (!secure) { >> - secure = true; >> - amdgpu_ring_emit_tmz(ring, true); >> - } >> - } else if (secure) { >> + if (!(ib->flags & AMDGPU_IB_FLAG_PREAMBLE) || >> + !skip_preamble || >> + (status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) || >> + amdgpu_mcbp || >> + amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ >> + break; >> + } >> + if (i >= num_ibs) >> + goto Done; >> + /* Setup initial TMZiness and send it off. >> + */ >> + secure = false; >> + if (job && ring->funcs->emit_frame_cntl) { >> + if (ib->flags & AMDGPU_IB_FLAGS_SECURE) >> + secure = true; >> + else >> secure = false; >> - amdgpu_ring_emit_tmz(ring, false); >> - } >> - >> - amdgpu_ring_emit_ib(ring, job, ib, status); >> - status &= ~AMDGPU_HAVE_CTX_SWITCH; >> + amdgpu_ring_emit_frame_cntl(ring, true, secure); >> } >> + amdgpu_ring_emit_ib(ring, job, ib, status); >> + status &= ~AMDGPU_HAVE_CTX_SWITCH; >> + i += 1; >> + /* Send the rest of the IBs. >> + */ >> + if (job && ring->funcs->emit_frame_cntl) { >> + for ( ; i < num_ibs; ++i) { >> + ib = &ibs[i]; >> + >> + /* drop preamble IBs if we don't have a context switch */ >> + if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && >> + skip_preamble && >> + !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && >> + !amdgpu_mcbp && >> + !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ >> + continue; > > Snip. > >> + >> + if (!!secure ^ !!(ib->flags & AMDGPU_IB_FLAGS_SECURE)) { >> + amdgpu_ring_emit_frame_cntl(ring, false, secure); >> + secure = !secure; >> + amdgpu_ring_emit_frame_cntl(ring, true, secure); >> + } > > That's pretty good optimization! I spend quit a few time to understand this. I know. I know you did. It's called experience. When I saw you v1, it was a cringe. Seriously? > >> >> - if (secure) { >> - secure = false; >> - amdgpu_ring_emit_tmz(ring, false); >> + amdgpu_ring_emit_ib(ring, job, ib, status); >> + status &= ~AMDGPU_HAVE_CTX_SWITCH; >> + } >> + amdgpu_ring_emit_frame_cntl(ring, false, secure); >> + } else { >> + for ( ; i < num_ibs; ++i) { >> + ib = &ibs[i]; >> + >> + /* drop preamble IBs if we don't have a context switch */ >> + if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && >> + skip_preamble && >> + !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && >> + !amdgpu_mcbp && >> + !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ >> + continue; >> + >> + amdgpu_ring_emit_ib(ring, job, ib, status); >> + status &= ~AMDGPU_HAVE_CTX_SWITCH; > > To pull the checking "job && ring->funcs->emit_frame_cntl" out of the loop, > that make the implemenation more duplicated like below, we have to write Yes, that is exactly what we want. As I explained before and will explain again, and again, and again, "job && ring->funcs->emit_frame_cntl" is static to the body of the loop, and as such can be pulled OUT of the loop and it should. This is a *formulaic* optimization exercise. Compilers and optimizers do it first. To extrapolate, consider that what you'd eventually have is a HUGE long long loop and everything inside it. Not good. Regards, Luben > the same codes multiple times. I hope the implementation is more simple and > readable. Please see my V2 patch that I just send out. We can try to use > minimum changes to fix the issue. > > for ( ; i < num_ibs; ++i) { > ib = &ibs[i]; > > /* drop preamble IBs if we don't have a context switch */ > if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && > skip_preamble && > !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && > !amdgpu_mcbp && > !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ > continue; > > ... > amdgpu_ring_emit_ib(ring, job, ib, status); > > Thanks, > Ray > _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx