>-----Original Message----- >From: Christian König <ckoenig.leichtzumerken at gmail.com> >Sent: Monday, August 20, 2018 9:15 PM >To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >Cc: Liu, Monk <Monk.Liu at amd.com> >Subject: Re: [PATCH] amdgpu: fix multi-process hang issue > >Am 20.08.2018 um 05:35 schrieb Emily Deng: >> From: Monk Liu <Monk.Liu at amd.com> >> >> SWDEV-146499: hang during multi vulkan process testing >> >> cause: >> the second frame's PREAMBLE_IB have clear-state and LOAD actions, >> those actions ruin the pipeline that is still doing process in the >> previous frame's work-load IB. >> >> fix: >> need insert pipeline sync if have context switch for SRIOV (because >> only SRIOV will report PREEMPTION flag to UMD) >> >> Change-Id: If676da7a9b0147114cd76d19b6035ed8033de449 >> Signed-off-by: Monk Liu <Monk.Liu at amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 15 +++++++++++---- >> 1 file changed, 11 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> index 5c22cfd..66efa85 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >> @@ -95,6 +95,12 @@ void amdgpu_ib_free(struct amdgpu_device *adev, >struct amdgpu_ib *ib, >> amdgpu_sa_bo_free(adev, &ib->sa_bo, f); >> } >> >> +static bool amdgpu_ib_has_preamble(struct amdgpu_ring *ring, bool >ctx_switch) { >> + return (amdgpu_sriov_vf(ring->adev) && >> + ring->funcs->type == AMDGPU_RING_TYPE_GFX && >> + ctx_switch); >> +} >> + > >Well NAK, please merge that into amdgpu_ib_schedule() and drop the extra >check for GFX, that will apply to compute queues as well. > >> /** >> * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring >> * >> @@ -123,7 +129,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, >unsigned num_ibs, >> struct amdgpu_device *adev = ring->adev; >> struct amdgpu_ib *ib = &ibs[0]; >> struct dma_fence *tmp = NULL; >> - bool skip_preamble, need_ctx_switch; >> + bool need_ctx_switch; >> unsigned patch_offset = ~0; >> struct amdgpu_vm *vm; >> uint64_t fence_ctx; >> @@ -156,6 +162,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, >unsigned num_ibs, >> return -EINVAL; >> } >> >> + need_ctx_switch = ring->current_ctx != fence_ctx; >> + >> alloc_size = ring->funcs->emit_frame_size + num_ibs * >> ring->funcs->emit_ib_size; >> >> @@ -167,6 +175,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, >> unsigned num_ibs, >> >> if (ring->funcs->emit_pipeline_sync && job && >> ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) || >> + amdgpu_ib_has_preamble(ring, need_ctx_switch) || >> amdgpu_vm_need_pipeline_sync(ring, job))) { >> need_pipe_sync = true; >> >> @@ -200,8 +209,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, >unsigned num_ibs, >> amdgpu_asic_flush_hdp(adev, ring); >> } >> >> - skip_preamble = ring->current_ctx == fence_ctx; >> - need_ctx_switch = ring->current_ctx != fence_ctx; >> if (job && ring->funcs->emit_cntxcntl) { >> if (need_ctx_switch) >> status |= AMDGPU_HAVE_CTX_SWITCH; @@ -215,7 >+222,7 @@ int >> amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, >> >> /* drop preamble IBs if we don't have a context switch */ >> if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) && >> - skip_preamble && >> + !need_ctx_switch && >> !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) >&& >> !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, >Preamble CE ib >> must be inserted anyway */ > >Ok, please clean that up more carefully. All prerequisites which are not IB >dependent should come before the loop. > >BTW: The handling of AMDGPU_PREAMBLE_IB_PRESENT_FIRST in amdgpu_cs.c >is completely broken as well. Thanks, will refine more carefully. > >Regards, >Christian. > >> continue;