RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only - Internal Distribution Only]

Ping ... this is a severe bug fix

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Liu, Monk
Sent: Monday, August 3, 2020 9:55 AM
To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

[AMD Official Use Only - Internal Distribution Only]

[AMD Official Use Only - Internal Distribution Only]

>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the
>>right place to stop the KIQ

KIQ (CPC) will never being stopped (the DISABLE on CPC is skipped for SRIOV ) for SRIOV in SW_FINI because SRIOV relies on KIQ to do world switch

But this is really a weird bug because even with the same approach it doesn't make KIQ (CP) hang on GFX9, only GFX10 need this patch ....

By now I do not have other good explanation or better fix than this one

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Friday, July 31, 2020 9:57 PM
To: Liu, Monk <Monk.Liu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the memory that was previously used for the KIQ ring buffer and overwrites it with something that's not a valid PM4 packet.

Regards,
  Felix

Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> KIQ will hang if we try below steps:
> modprobe amdgpu
> rmmod amdgpu
> modprobe amdgpu sched_hw_submission=4
>
> the cause is that due to KIQ is always living there even after we
> unload KMD thus when doing the realod of KMD KIQ will crash upon its
> register programed with different values with the previous
> configuration (the config like HQD addr, ring size, is easily changed
> if we alter the sched_hw_submission)
>
> the fix is we must inactive KIQ first before touching any of its
> registgers
>
> Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index db9f1e8..f571e25 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct
> amdgpu_ring *ring)  struct v10_compute_mqd *mqd = ring->mqd_ptr;  int
> j;
>
> +/* activate the queue */
> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
> +
>  /* disable wptr polling */
>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Cmonk.liu%40amd.com%7C4837e2d566b44af845f608d837503a3b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637320165018899834&amp;sdata=TED%2BkhlYyAIyTmLJAZBBBHHnE6PRg4fpUsZhD9ke%2BPU%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux