[AMD Official Use Only - General] If the behavior is correct, this patch looks like workaround HW reset not flushed the TLB or something can be workaround by adding a gpu TLB flush. Thanks, Feifei -----Original Message----- From: Koenig, Christian <Christian.Koenig@xxxxxxx> Sent: Tuesday, October 10, 2023 5:07 PM To: Xu, Feifei <Feifei.Xu@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb Hi Feifei, yeah, that is correct behavior. The GMC callback should *not* get called during resume in a reset, because the reset needs to take care of invalidating the TLB anyway. If the later doesn't work any more we need to re-iterate the reset procedure and not mess with this here. Regards, Christian. Am 10.10.23 um 04:27 schrieb Xu, Feifei: > [AMD Official Use Only - General] > > Yes, adev->gfx.is_poweron check will be applied in gmc_v11 callback, which will be called after the generic gmc part: amdgpu_gmc_flush_gpu_tlb() function. > But in commit: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2"), the flush is moved at a higher level amdgpu_gmc_flush_gpu_tlb() function. > > Thus the gmc_v11 callback will never be called in the resume because adev->reset_domain->sem not released and returned ahead. Adding a check of adev->gfx.is_poweron will let GFX11 not breaking ahead, like following: > > if (!down_read_trylock(&adev->reset_domain->sem) && //--> true in > gfx11 > +!adev->gfx.is_poweron) //-->false in gfx11, and the whole if statement will be false, not return ahead. The following gmc v11 callback will be executed later. > > Thanks, > Feifei > > -----Original Message----- > From: Zhang, Hawking <Hawking.Zhang@xxxxxxx> > Sent: Monday, October 9, 2023 4:58 PM > To: Xu, Feifei <Feifei.Xu@xxxxxxx>; Wang, Yang(Kevin) > <KevinYang.Wang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Koenig, Christian <Christian.Koenig@xxxxxxx> > Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip > flush_gpu_tlb > > [AMD Official Use Only - General] > > adev->gfx.is_poweron check should already be applied in IP specific (gmc v11) callback. If gfx is not power on, it does nothing but just returns. I didn't see how it helps resolve the issue if we just move the check from one function to another. > > Regards, > Hawking > > -----Original Message----- > From: Xu, Feifei <Feifei.Xu@xxxxxxx> > Sent: Monday, October 9, 2023 09:51 > To: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Koenig, Christian <Christian.Koenig@xxxxxxx>; Zhang, Hawking > <Hawking.Zhang@xxxxxxx> > Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip > flush_gpu_tlb > > [AMD Official Use Only - General] > > Hi, > >>> Based on your description, the above code should use "||" instead of >>> "&&", > && is to add more restriction here. To avoid skipping necessary TLB flush by return. > For Asics < GFX11, !adev->gfx.is_poweron is always true (this paremeter is intrudoced from GFX11), only depends on reset_domain->sem; For Asics = GFX11, !adev->gfx.is_poweron might be false (which gfx might already poweron in the reset), this will make the if () not ture, return will not be executed, thus flush TLB. > >>> And after merging code into one line may result in the lock not being released if the lock can be acquired success. > If !adev->gfx.is_poweron is true, the reset_domin->sem will not be down_read_trylock, thus could avoid this deadlock. > > Thanks, > Feifei > > -----Original Message----- > From: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx> > Sent: Sunday, October 8, 2023 9:36 PM > To: Xu, Feifei <Feifei.Xu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Xu, Feifei <Feifei.Xu@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; > Koenig, Christian <Christian.Koenig@xxxxxxx>; Zhang, Hawking > <Hawking.Zhang@xxxxxxx> > Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip > flush_gpu_tlb > > > -----Original Message----- > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of > Feifei Xu > Sent: Sunday, October 8, 2023 6:07 PM > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Xu, Feifei <Feifei.Xu@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; > Koenig, Christian <Christian.Koenig@xxxxxxx>; Zhang, Hawking > <Hawking.Zhang@xxxxxxx> > Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb > > To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb after reset on GFX11. > Gfxhub tlb flush need check if adev->gfx.is_poweron set. > > Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb > v2") > > Signed-off-by: Feifei Xu <Feifei.Xu@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > index 2f9bb86edd71..048d32edee88 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > @@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid, > /* > * A GPU reset should flush all TLBs anyway, so no need to do > * this while one is ongoing. > + * For GFX11, gfxhub flush check if adev->gfx.is_poweron is set. > */ > - if (!down_read_trylock(&adev->reset_domain->sem)) > + if (!down_read_trylock(&adev->reset_domain->sem) && > +!adev->gfx.is_poweron) > return; > > [Kevin]: > Based on your description, the above code should use "||" instead of "&&", And after merging code into one line may result in the lock not being released if the lock can be acquired success. > > Best Regards, > Kevin > > if (adev->gmc.flush_tlb_needs_extra_type_2) > -- > 2.34.1 > > >