On Wed, Jan 17, 2024 at 7:36 AM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > > Am 16.01.24 um 11:31 schrieb Friedrich Vock: > > On 16.01.24 08:03, Christian König wrote: > >> Am 15.01.24 um 12:18 schrieb Friedrich Vock: > >>> [SNIP] > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32(mmIH_RB_CNTL); > >>>>> + tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; > >>>>> + WREG32(mmIH_RB_CNTL, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>> > >>>> Well that is an extremely bad idea. We already reset the overflow > >>>> after reading the WPTR. > >>> > >>> This is not resetting the overflow bit. This is resetting a "clear > >>> overflow" bit. I don't have the hardware docs, but the name (and my > >>> observations) strongly suggest that setting this bit actually prevents > >>> the hardware from setting the overflow bit ever again. > >> > >> Well that doesn't make any sense at all. The hardware documentation > >> clearly states that this bit is write only and should always read as > >> zero. > >> > >> Setting this bit will clear the overflow flag in the WPTR register and > >> clearing it has no effect at all. > >> > >> I could only ping the hw engineer responsible for this block to double > >> check if the documentation is somehow outdated, but I really doubt so. > >> > > I see. I wish I had access to the documentation, > > Well, doesn't Valve has an NDA in place? > > > but I don't, so all I > > can do is tell you what I observe the hardware doing. I've tested this > > on both a Steam Deck (OSSYS 5.2.0) and an RX 6700 XT (OSSYS 5.0.3). On > > both systems, launching a bunch of shaders that cause page faults leads > > to lots of "[gfxhub] page fault" messages in dmesg, followed by an > > "amdgpu: IH ring buffer overflow". > > Well that is certainly a bug, maybe even the same thing we have seen on > Vega and MI. > > What we could do is to try to apply the same workaround to re-route the > page faults to a different IH ring. > > See those patches here as well: > > commit 516bc3d8dd7965f1a8a3ea453857f14d95971e62 > Author: Christian König <christian.koenig@xxxxxxx> > Date: Fri Nov 2 15:00:16 2018 +0100 > > drm/amdgpu: reroute VMC and UMD to IH ring 1 > > Page faults can easily overwhelm the interrupt handler. > > So to make sure that we never lose valuable interrupts on the > primary ring > we re-route page faults to IH ring 1. > > commit b849aaa41c914a0fd88003f88cb04420a873c624 > Author: Christian König <christian.koenig@xxxxxxx> > Date: Mon Mar 4 19:34:34 2019 +0100 > > drm/amdgpu: also reroute VMC and UMD to IH ring 1 on Vega 20 > > Same patch we alredy did for Vega10. Just re-route page faults to a > separate > ring to avoid drowning in interrupts. > > > > > If I re-launch the same set of shaders after the GPU has soft-recovered, > > the "amdgpu: IH ring buffer overflow" message is missing, even though > > the same amount of page faults should've been triggered at roughly the > > same rate. Running with this patch applied makes more "amdgpu: IH ring > > buffer overflow" messages appear after relaunching the faulting shaders > > (but not when processing any non-faulting work). > > That is actually the expected behavior. There should be a limit on the > number of faults written to the ring so that the ring never overflows. > > > > > The only possible conclusion I can draw from this is that clearing that > > bit *does* have an effect, and I don't think it's far-fetched to assume > > the IH ring buffer overflows still happen after re-launching the > > faulting shaders but go undetected so far. > > Well that can only mean that the hw documentation is incorrect. > > Either the value is not write only trigger bit as documented or we need > an additional read of the register for it to take effect or something > like this. > > >>> Right now, IH overflows, even if they occur repeatedly, only get > >>> registered once. If not registering IH overflows can trivially lead to > >>> system crashes, it's amdgpu's current handling that is broken. > >> > >> It's years that we last tested this but according to the HW > >> documentation this should work fine. > >> > >> What could potentially happen is that the IH has silenced the source > >> of the overflow. We never implemented resetting those, but in this > >> case that here won't help either. > >> > > If the IH silenced the page faults (which quite clearly cause the > > overflow here), then how are the page faults still logged in dmesg? > > There should be a hardware rate limit for the page faults, e.g. there > can only be X faults reported in N clock cycles and then a delay is > inserted. @Christian Koenig Is that tied to xnack (i.e., noretry)? The default is noretry=1 on gfx10.3 and newer. But it can be overridden. It was not set on some older kernels, maybe that is the problem? @Friedrich Vock does setting amdgpu.noretry=1 fix the issue? Alex > > >>> > >>> The possibility of a repeated IH overflow in between reading the wptr > >>> and updating the rptr is a good point, but how can we detect that at > >>> all? It seems to me like we can't set the OVERFLOW_CLEAR bit at all > >>> then, because we're guaranteed to miss any overflows that happen while > >>> the bit is set. > >> > >> When an IH overflow is signaled we clear that flag by writing 1 into > >> the OVERFLOW_CLEAR bit and skip one entry in the IH ring buffer. > >> > >> What can of course happen is that the IH ring buffer overflows more > >> than this single entry and we process IVs which are potentially > >> corrupted, but we won't miss any additional overflows since we only > >> start processing after resetting the flag. > >> > >> An IH overflow is also something you should *never* see in a > >> production system. This is purely for driver bringup and as fallback > >> when there is a severe incorrect programming of the HW. > >> > >> The only exception of that is page fault handling on MI products > >> because of a hardware bug, to mitigate this we are processing page > >> faults on a separate IH ring on those parts. > >> > >> On all other hw generations the IH should have some rate limit for the > >> number of faults generated per second, so that the CPU is always able > >> to catch up. > > > > I'm wondering if there is another bug in here somewhere. Your > > explanation of how it's supposed to work makes a lot of sense, but from > > what I can tell it doesn't work that way when I test it. > > > > From the printk_ratelimit stats it would seem like >2000 faults arrive > > in less than a second, so perhaps your theory about fault interrupt > > ratelimiting not working is correct (but it's hard for me to verify what > > is going on without the documentation). > > I'm going to ping the relevant engineer and putting someone on the task > to take a look. > > Thanks, > Christian. > > > > > Regards, > > Friedrich > > > >> > >> Regards, > >> Christian. > >> > >>> > >>> Regards, > >>> Friedrich > >>> > >>>> > >>>> When you clear the overflow again when updating the RPTR you could > >>>> loose another overflow which might have happened in between and so > >>>> potentially process corrupted IVs. > >>>> > >>>> That can trivially crash the system. > >>>> > >>>> Regards, > >>>> Christian. > >>>> > >>>>> } > >>>>> > >>>>> static int cik_ih_early_init(void *handle) > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > >>>>> index b8c47e0cf37a..076559668573 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > >>>>> @@ -215,7 +215,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_device > >>>>> *adev, > >>>>> tmp = RREG32(mmIH_RB_CNTL); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32(mmIH_RB_CNTL, tmp); > >>>>> - > >>>>> + ih->overflow = true; > >>>>> > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> @@ -266,7 +266,19 @@ static void cz_ih_decode_iv(struct amdgpu_device > >>>>> *adev, > >>>>> static void cz_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> + > >>>>> WREG32(mmIH_RB_RPTR, ih->rptr); > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32(mmIH_RB_CNTL); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32(mmIH_RB_CNTL, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> static int cz_ih_early_init(void *handle) > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c > >>>>> index aecad530b10a..1a5e668643d1 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c > >>>>> @@ -214,7 +214,7 @@ static u32 iceland_ih_get_wptr(struct > >>>>> amdgpu_device *adev, > >>>>> tmp = RREG32(mmIH_RB_CNTL); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32(mmIH_RB_CNTL, tmp); > >>>>> - > >>>>> + ih->overflow = true; > >>>>> > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> @@ -265,7 +265,19 @@ static void iceland_ih_decode_iv(struct > >>>>> amdgpu_device *adev, > >>>>> static void iceland_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> + > >>>>> WREG32(mmIH_RB_RPTR, ih->rptr); > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32(mmIH_RB_CNTL); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32(mmIH_RB_CNTL, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> static int iceland_ih_early_init(void *handle) > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c > >>>>> index d9ed7332d805..ce8f7feec713 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c > >>>>> @@ -418,6 +418,8 @@ static u32 ih_v6_0_get_wptr(struct amdgpu_device > >>>>> *adev, > >>>>> tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); > >>>>> + ih->overflow = true; > >>>>> + > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> } > >>>>> @@ -459,6 +461,7 @@ static void ih_v6_0_irq_rearm(struct > >>>>> amdgpu_device *adev, > >>>>> static void ih_v6_0_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> struct amdgpu_ih_regs *ih_regs; > >>>>> > >>>>> if (ih->use_doorbell) { > >>>>> @@ -472,6 +475,16 @@ static void ih_v6_0_set_rptr(struct > >>>>> amdgpu_device *adev, > >>>>> ih_regs = &ih->ih_regs; > >>>>> WREG32(ih_regs->ih_rb_rptr, ih->rptr); > >>>>> } > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> /** > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c > >>>>> index 8fb05eae340a..668788ad34d9 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c > >>>>> @@ -418,6 +418,8 @@ static u32 ih_v6_1_get_wptr(struct amdgpu_device > >>>>> *adev, > >>>>> tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); > >>>>> + ih->overflow = true; > >>>>> + > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> } > >>>>> @@ -459,6 +461,7 @@ static void ih_v6_1_irq_rearm(struct > >>>>> amdgpu_device *adev, > >>>>> static void ih_v6_1_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> struct amdgpu_ih_regs *ih_regs; > >>>>> > >>>>> if (ih->use_doorbell) { > >>>>> @@ -472,6 +475,16 @@ static void ih_v6_1_set_rptr(struct > >>>>> amdgpu_device *adev, > >>>>> ih_regs = &ih->ih_regs; > >>>>> WREG32(ih_regs->ih_rb_rptr, ih->rptr); > >>>>> } > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> /** > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c > >>>>> index e64b33115848..0bdac923cb4d 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c > >>>>> @@ -442,6 +442,7 @@ static u32 navi10_ih_get_wptr(struct > >>>>> amdgpu_device *adev, > >>>>> tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); > >>>>> + ih->overflow = true; > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> } > >>>>> @@ -483,6 +484,7 @@ static void navi10_ih_irq_rearm(struct > >>>>> amdgpu_device *adev, > >>>>> static void navi10_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> struct amdgpu_ih_regs *ih_regs; > >>>>> > >>>>> if (ih == &adev->irq.ih_soft) > >>>>> @@ -499,6 +501,16 @@ static void navi10_ih_set_rptr(struct > >>>>> amdgpu_device *adev, > >>>>> ih_regs = &ih->ih_regs; > >>>>> WREG32(ih_regs->ih_rb_rptr, ih->rptr); > >>>>> } > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> /** > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/si_ih.c > >>>>> index 9a24f17a5750..ff35056d2b54 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/si_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c > >>>>> @@ -119,6 +119,7 @@ static u32 si_ih_get_wptr(struct amdgpu_device > >>>>> *adev, > >>>>> tmp = RREG32(IH_RB_CNTL); > >>>>> tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; > >>>>> WREG32(IH_RB_CNTL, tmp); > >>>>> + ih->overflow = true; > >>>>> } > >>>>> return (wptr & ih->ptr_mask); > >>>>> } > >>>>> @@ -147,7 +148,18 @@ static void si_ih_decode_iv(struct amdgpu_device > >>>>> *adev, > >>>>> static void si_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> + > >>>>> WREG32(IH_RB_RPTR, ih->rptr); > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32(IH_RB_CNTL); > >>>>> + tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; > >>>>> + WREG32(IH_RB_CNTL, tmp); > >>>>> + } > >>>>> } > >>>>> > >>>>> static int si_ih_early_init(void *handle) > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c > >>>>> index 917707bba7f3..6f5090d3db48 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c > >>>>> @@ -218,6 +218,7 @@ static u32 tonga_ih_get_wptr(struct amdgpu_device > >>>>> *adev, > >>>>> tmp = RREG32(mmIH_RB_CNTL); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32(mmIH_RB_CNTL, tmp); > >>>>> + ih->overflow = true; > >>>>> > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> @@ -268,6 +269,8 @@ static void tonga_ih_decode_iv(struct > >>>>> amdgpu_device *adev, > >>>>> static void tonga_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> + > >>>>> if (ih->use_doorbell) { > >>>>> /* XXX check if swapping is necessary on BE */ > >>>>> *ih->rptr_cpu = ih->rptr; > >>>>> @@ -275,6 +278,16 @@ static void tonga_ih_set_rptr(struct > >>>>> amdgpu_device *adev, > >>>>> } else { > >>>>> WREG32(mmIH_RB_RPTR, ih->rptr); > >>>>> } > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32(mmIH_RB_CNTL); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32(mmIH_RB_CNTL, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> static int tonga_ih_early_init(void *handle) > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > >>>>> index d364c6dd152c..bb005924f194 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > >>>>> @@ -372,6 +372,7 @@ static u32 vega10_ih_get_wptr(struct > >>>>> amdgpu_device *adev, > >>>>> tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); > >>>>> + ih->overflow = true; > >>>>> > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> @@ -413,6 +414,7 @@ static void vega10_ih_irq_rearm(struct > >>>>> amdgpu_device *adev, > >>>>> static void vega10_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> struct amdgpu_ih_regs *ih_regs; > >>>>> > >>>>> if (ih == &adev->irq.ih_soft) > >>>>> @@ -429,6 +431,16 @@ static void vega10_ih_set_rptr(struct > >>>>> amdgpu_device *adev, > >>>>> ih_regs = &ih->ih_regs; > >>>>> WREG32(ih_regs->ih_rb_rptr, ih->rptr); > >>>>> } > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> /** > >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c > >>>>> b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c > >>>>> index ddfc6941f9d5..bb725a970697 100644 > >>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c > >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c > >>>>> @@ -420,6 +420,7 @@ static u32 vega20_ih_get_wptr(struct > >>>>> amdgpu_device *adev, > >>>>> tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); > >>>>> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > >>>>> WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); > >>>>> + ih->overflow = true; > >>>>> > >>>>> out: > >>>>> return (wptr & ih->ptr_mask); > >>>>> @@ -462,6 +463,7 @@ static void vega20_ih_irq_rearm(struct > >>>>> amdgpu_device *adev, > >>>>> static void vega20_ih_set_rptr(struct amdgpu_device *adev, > >>>>> struct amdgpu_ih_ring *ih) > >>>>> { > >>>>> + u32 tmp; > >>>>> struct amdgpu_ih_regs *ih_regs; > >>>>> > >>>>> if (ih == &adev->irq.ih_soft) > >>>>> @@ -478,6 +480,16 @@ static void vega20_ih_set_rptr(struct > >>>>> amdgpu_device *adev, > >>>>> ih_regs = &ih->ih_regs; > >>>>> WREG32(ih_regs->ih_rb_rptr, ih->rptr); > >>>>> } > >>>>> + > >>>>> + /* If we overflowed previously (and thus set the OVERFLOW_CLEAR > >>>>> bit), > >>>>> + * reset it here to detect more overflows if they occur. > >>>>> + */ > >>>>> + if (ih->overflow) { > >>>>> + tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl); > >>>>> + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, > >>>>> 0); > >>>>> + WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp); > >>>>> + ih->overflow = false; > >>>>> + } > >>>>> } > >>>>> > >>>>> /** > >>>>> -- > >>>>> 2.43.0 > >>>>> > >>>> > >> >