On Thu, Feb 27, 2020 at 4:15 PM Luben Tuikov <luben.tuikov@xxxxxxx> wrote: > > On 2020-02-27 6:56 a.m., Huang Rui wrote: > > On Thu, Feb 27, 2020 at 06:39:03AM +0800, Tuikov, Luben wrote: > >> Since commit "Move to a per-IB secure flag (TMZ)", > >> we've been seeing hangs in GFX. Ray H. pointed out > >> by sending a patch that we need to send FRAME > >> CONTROL stop/start back-to-back, every time we > >> flip the TMZ flag as per each IB we submit. That > >> is, when we transition from TMZ to non-TMZ we have > >> to send a stop with TMZ followed by a start with > >> non-TMZ, and similarly for transitioning from > >> non-TMZ into TMZ. > >> > >> This patch implements this, thus fixing the GFX > >> hang. > >> > >> Signed-off-by: Luben Tuikov <luben.tuikov@xxxxxxx> > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 87 +++++++++++++++++------- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +- > >> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 15 ++-- > >> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 13 ++-- > >> 4 files changed, 79 insertions(+), 41 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >> index 4b2342d11520..16d6df3304d3 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >> @@ -216,40 +216,75 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, > >> amdgpu_ring_emit_cntxcntl(ring, status); > >> } > >> > >> - secure = false; > >> + /* Find the first non-preamble IB. > >> + */ > >> for (i = 0; i < num_ibs; ++i) { > >> ib = &ibs[i]; > >> > >> /* drop preamble IBs if we don't have a context switch */ > >> - if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && > >> - skip_preamble && > >> - !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && > >> - !amdgpu_mcbp && > >> - !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ > >> - continue; > >> - > >> - /* If this IB is TMZ, add frame TMZ start packet, > >> - * else, turn off TMZ. > >> - */ > >> - if (ib->flags & AMDGPU_IB_FLAGS_SECURE && ring->funcs->emit_tmz) { > >> - if (!secure) { > >> - secure = true; > >> - amdgpu_ring_emit_tmz(ring, true); > >> - } > >> - } else if (secure) { > >> + if (!(ib->flags & AMDGPU_IB_FLAG_PREAMBLE) || > >> + !skip_preamble || > >> + (status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) || > >> + amdgpu_mcbp || > >> + amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ > >> + break; > >> + } > >> + if (i >= num_ibs) > >> + goto Done; > >> + /* Setup initial TMZiness and send it off. > >> + */ > >> + secure = false; > >> + if (job && ring->funcs->emit_frame_cntl) { > >> + if (ib->flags & AMDGPU_IB_FLAGS_SECURE) > >> + secure = true; > >> + else > >> secure = false; > >> - amdgpu_ring_emit_tmz(ring, false); > >> - } > >> - > >> - amdgpu_ring_emit_ib(ring, job, ib, status); > >> - status &= ~AMDGPU_HAVE_CTX_SWITCH; > >> + amdgpu_ring_emit_frame_cntl(ring, true, secure); > >> } > >> + amdgpu_ring_emit_ib(ring, job, ib, status); > >> + status &= ~AMDGPU_HAVE_CTX_SWITCH; > >> + i += 1; > >> + /* Send the rest of the IBs. > >> + */ > >> + if (job && ring->funcs->emit_frame_cntl) { > >> + for ( ; i < num_ibs; ++i) { > >> + ib = &ibs[i]; > >> + > >> + /* drop preamble IBs if we don't have a context switch */ > >> + if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && > >> + skip_preamble && > >> + !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && > >> + !amdgpu_mcbp && > >> + !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ > >> + continue; > > > > Snip. > > > >> + > >> + if (!!secure ^ !!(ib->flags & AMDGPU_IB_FLAGS_SECURE)) { > >> + amdgpu_ring_emit_frame_cntl(ring, false, secure); > >> + secure = !secure; > >> + amdgpu_ring_emit_frame_cntl(ring, true, secure); > >> + } > > > > That's pretty good optimization! I spend quit a few time to understand this. > > I know. I know you did. It's called experience. > > When I saw you v1, it was a cringe. Seriously? It may be a good optimization but if it's hard to understand it makes the code harder to read and comprehend, that can lead to misinterpretation and maintenance burdens. It took me a while to understand what was intended here. Alex > > > > >> > >> - if (secure) { > >> - secure = false; > >> - amdgpu_ring_emit_tmz(ring, false); > >> + amdgpu_ring_emit_ib(ring, job, ib, status); > >> + status &= ~AMDGPU_HAVE_CTX_SWITCH; > >> + } > >> + amdgpu_ring_emit_frame_cntl(ring, false, secure); > >> + } else { > >> + for ( ; i < num_ibs; ++i) { > >> + ib = &ibs[i]; > >> + > >> + /* drop preamble IBs if we don't have a context switch */ > >> + if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && > >> + skip_preamble && > >> + !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && > >> + !amdgpu_mcbp && > >> + !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ > >> + continue; > >> + > >> + amdgpu_ring_emit_ib(ring, job, ib, status); > >> + status &= ~AMDGPU_HAVE_CTX_SWITCH; > > > > To pull the checking "job && ring->funcs->emit_frame_cntl" out of the loop, > > that make the implemenation more duplicated like below, we have to write > > Yes, that is exactly what we want. > > As I explained before and will explain again, and again, and again, > "job && ring->funcs->emit_frame_cntl" is static to the body of the loop, > and as such can be pulled OUT of the loop and it should. > > This is a *formulaic* optimization exercise. Compilers and optimizers do it first. > > To extrapolate, consider that what you'd eventually have is a HUGE long long loop > and everything inside it. Not good. > > Regards, > Luben > > > > the same codes multiple times. I hope the implementation is more simple and > > readable. Please see my V2 patch that I just send out. We can try to use > > minimum changes to fix the issue. > > > > for ( ; i < num_ibs; ++i) { > > ib = &ibs[i]; > > > > /* drop preamble IBs if we don't have a context switch */ > > if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && > > skip_preamble && > > !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) && > > !amdgpu_mcbp && > > !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */ > > continue; > > > > ... > > amdgpu_ring_emit_ib(ring, job, ib, status); > > > > Thanks, > > Ray > > > > _______________________________________________ > amd-gfx mailing list > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx