On Tue, Apr 07, 2020 at 04:26:19PM +0800, Liang, Prike wrote:
>
> > -----Original Message-----
> > From: Huang, Ray <Ray.Huang@xxxxxxx>
> > Sent: Tuesday, April 7, 2020 4:03 PM
> > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling, Felix
> > <Felix.Kuehling@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>;
> > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> > playback
> >
> > On Tue, Apr 07, 2020 at 01:49:43PM +0800, Liang, Prike wrote:
> > >
> > > > -----Original Message-----
> > > > From: Huang, Ray <Ray.Huang@xxxxxxx>
> > > > Sent: Friday, April 3, 2020 6:29 PM
> > > > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > > > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling, Felix
> > > > <Felix.Kuehling@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>;
> > > > amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with
> > > > video playback
> > > >
> > > > On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> > > > > On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Huang, Ray <Ray.Huang@xxxxxxx>
> > > > > > > Sent: Friday, April 3, 2020 2:27 PM
> > > > > > > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > > > > > > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Quan, Evan <Evan.Quan@xxxxxxx>;
> > > > > > > Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Kuehling,
> > > > > > > Felix <Felix.Kuehling@xxxxxxx>
> > > > > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend
> > > > > > > with video playback
> > > > > > >
> > > > > > > (+ Felix)
> > > > > > >
> > > > > > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > > > > > The system hangs during S3 because the SMU is left waiting on
> > > > > > > > GC, which does not respond to the CP_HQD_ACTIVE register
> > > > > > > > access request, and this issue can be fixed by
> > > > > > > > adding an RLC safe mode guard before each HQD map/unmap
> > > > > > > > retrieve operation.
> > > > > > >
> > > > > > > We need more information about the issue. Is the map/unmap
> > > > > > > done with MAP_QUEUES/UNMAP_QUEUES packets, by MMIO writes,
> > > > > > > or both?
> > > > > > >
> > > > > > [Prike] The hang occurs when MP1 tries to read register
> > > > > > RSMU_RESIDENCY_COUNTER_GC but gets no response from GFX,
> > > > > > because GFX is busy reading register CP_HQD_ACTIVE.
> > > > > > Moreover, the issue is not seen when GFXOFF is disabled, so
> > > > > > the register access is likely performed during the GFXOFF
> > > > > > CGPG/CGCG enter stage.
> > > > > > For this issue alone, it seems to be just an MMIO access
> > > > > > failure that occurs during the queue map/unmap status check.
> > > > > >
> > > > >
> > > > > When we start S3, we disable gfxoff at the start of suspend.
> > > > > From that point on, gfx should always be in the "on" state.
> > > > >
> > > > > > > From your patch, you only protect the kernel KIQ and user queues.
> > > > > > > What about the other kernel compute queues? HIQ?
> > > > > > >
> > > > > > [Prike] So far we have only found that the KIQ/CPQ/DIQ map/unmap
> > > > > > queries the CP_HQD_ACTIVE status by MMIO access, so only the KIQ
> > > > > > and some types of user queues are guarded for now. HIQ map and
> > > > > > unmap use the method of submitting a configuration packet.
> > > > > >
> > > > >
> > > > > KIQ init/uninit itself should always happen in the gfx on state.
> > > > > Can you check the result without adding enter/exit RLC safe mode
> > > > > around it?
> > > >
> > > > Wait... In your case, the system didn't load any user queues because
> > > > no ROCm-based application is running. So the issue is probably
> > > > caused by the KIQ init/uninit itself, can you confirm?
> > > [Prike] This improper register access happens while performing MQD
> > > destroy during the amdkfd suspend period.
> > > For the KIQ uninit process the RLC guard may not be needed, as
> > > GFX CGPG has already been disabled early in suspend.
> >
> > How about moving the gfxoff/cgpg disabling below ahead of
> > amdgpu_amdkfd_suspend?
> >
> > 	amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > 	amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> >
> > 	amdgpu_amdkfd_suspend(adev, !fbcon);
> >
> > We should disable gfxoff/cgpg first to avoid the MMIO access issue.
> >
> [Prike] Generally speaking, it's fine to un-gate CGPG before each GFX MMIO access.
> That should be no different from entering RLC safe mode.
> So you prefer the solution of moving the CGPG ungate to early in suspend, right?
>

Yes, that avoids any impact from the subsequent GC accesses.

Thanks,
Ray

> > Thanks,
> > Ray
> >
> > >
> > > If there is concern about over-guarding the other cases, I will send
> > > a patch to simplify it.
> > > >
> > > > Thanks,
> > > > Ray
> > > >
> > > > >
> > > > > Hi Felix, maybe we need to use packets with kiq to map all user queues.
> > > > >
> > > > > Thanks,
> > > > > Ray
> > > > >
> > > > > > > Thanks,
> > > > > > > Ray
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> > > > > > > > Tested-by: Mengbing Wang <Mengbing.Wang@xxxxxxx>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++++++
> > > > > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c             | 4 ++++
> > > > > > > >  2 files changed, 10 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > > > > index df841c2..e265063 100644
> > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > > > > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> > > > > > > >  	uint32_t *mqd_hqd;
> > > > > > > >  	uint32_t reg, hqd_base, data;
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > > > > >  	m = get_mqd(mqd);
> > > > > > > >
> > > > > > > >  	acquire_queue(kgd, pipe_id, queue_id);
> > > > > > > > @@ -299,6 +300,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> > > > > > > >
> > > > > > > >  	release_queue(kgd);
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_exit_safe_mode(adev);
> > > > > > > >  	return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
> > > > > > > >  	bool retval = false;
> > > > > > > >  	uint32_t low, high;
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > > > > >  	acquire_queue(kgd, pipe_id, queue_id);
> > > > > > > >  	act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > > > > > > >  	if (act) {
> > > > > > > > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
> > > > > > > >  			retval = true;
> > > > > > > >  	}
> > > > > > > >  	release_queue(kgd);
> > > > > > > > +	amdgpu_gfx_rlc_exit_safe_mode(adev);
> > > > > > > >  	return retval;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
> > > > > > > >  	uint32_t temp;
> > > > > > > >  	struct v9_mqd *m = get_mqd(mqd);
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > > > > >  	if (adev->in_gpu_reset)
> > > > > > > >  		return -EIO;
> > > > > > > >
> > > > > > > > @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
> > > > > > > >  	}
> > > > > > > >
> > > > > > > >  	release_queue(kgd);
> > > > > > > > +	amdgpu_gfx_rlc_exit_safe_mode(adev);
> > > > > > > >  	return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > > > index 1fea077..ee107d9 100644
> > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > > > @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
> > > > > > > >  	struct v9_mqd *mqd = ring->mqd_ptr;
> > > > > > > >  	int j;
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > > > > >  	/* disable wptr polling */
> > > > > > > >  	WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
> > > > > > > >
> > > > > > > > @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
> > > > > > > >  	if (ring->use_doorbell)
> > > > > > > >  		WREG32_FIELD15(GC, 0, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_exit_safe_mode(adev);
> > > > > > > >  	return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -3637,6 +3639,7 @@ static int gfx_v9_0_kiq_fini_register(struct amdgpu_ring *ring)
> > > > > > > >  	struct amdgpu_device *adev = ring->adev;
> > > > > > > >  	int j;
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > > > > >  	/* disable the queue if it's active */
> > > > > > > >  	if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
> > > > > > > >
> > > > > > > > @@ -3668,6 +3671,7 @@ static int gfx_v9_0_kiq_fini_register(struct amdgpu_ring *ring)
> > > > > > > >  	WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_WPTR_HI, 0);
> > > > > > > >  	WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_WPTR_LO, 0);
> > > > > > > >
> > > > > > > > +	amdgpu_gfx_rlc_exit_safe_mode(adev);
> > > > > > > >  	return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.7.4
> > > > > > > >
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
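
For reference, the ordering discussed in this thread (un-gate PG/CG before amdgpu_amdkfd_suspend, so the CP_HQD_ACTIVE read during MQD destroy does not happen while gating can still engage) can be sketched as a small standalone C model. This is only an illustration under stated assumptions: `struct fake_dev` and every `model_*` function are hypothetical stand-ins invented for this sketch, not the amdgpu driver's code, and the model simply counts any HQD register read issued while gating is still allowed as "unsafe".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for device state; NOT the real struct amdgpu_device. */
struct fake_dev {
	bool pg_gated;         /* power gating (GFXOFF/CGPG) may still engage */
	bool cg_gated;         /* clock gating (CGCG) may still engage */
	int unsafe_mmio_reads; /* reads issued while gating was still allowed */
};

/* Models the effect of amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE). */
static void model_ungate_pg(struct fake_dev *d)
{
	d->pg_gated = false;
}

/* Models the effect of amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE). */
static void model_ungate_cg(struct fake_dev *d)
{
	d->cg_gated = false;
}

/* Models an MMIO read of CP_HQD_ACTIVE; unsafe if any gating is still allowed. */
static void model_mmio_read_hqd_active(struct fake_dev *d)
{
	if (d->pg_gated || d->cg_gated)
		d->unsafe_mmio_reads++;
}

/* Models the KFD suspend path: MQD destroy checks the HQD status via MMIO. */
static void model_kfd_suspend(struct fake_dev *d)
{
	model_mmio_read_hqd_active(d);
}

/* Ordering assumed to be the problematic one: KFD suspend before un-gating. */
static int model_suspend_kfd_first(struct fake_dev *d)
{
	assert(d != NULL);
	model_kfd_suspend(d);
	model_ungate_pg(d);
	model_ungate_cg(d);
	return d->unsafe_mmio_reads;
}

/* Ordering proposed in the thread: un-gate first, then KFD suspend. */
static int model_suspend_ungate_first(struct fake_dev *d)
{
	assert(d != NULL);
	model_ungate_pg(d);
	model_ungate_cg(d);
	model_kfd_suspend(d);
	return d->unsafe_mmio_reads;
}
```

On a device that starts with both gating flags set, model_suspend_kfd_first() reports one unsafe read and model_suspend_ungate_first() reports none. The real driver's behaviour depends on hardware state that this model does not capture.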