在 2024-06-17星期一的 15:59 +0200,Christian König写道: > Am 17.06.24 um 15:43 schrieb Icenowy Zheng: > > 在 2024-06-17星期一的 15:09 +0200,Christian König写道: > > > Am 17.06.24 um 15:03 schrieb Icenowy Zheng: > > > > 在 2024-06-17星期一的 14:35 +0200,Christian König写道: > > > > > Am 17.06.24 um 12:58 schrieb Icenowy Zheng: > > > > > > The duplication of EOP packets for GFX7/8, with the former > > > > > > one > > > > > > have > > > > > > seq-1 written and the latter one have seq written, seems to > > > > > > confuse > > > > > > some > > > > > > hardware platform (e.g. Loongson 7A series PCIe > > > > > > controllers). > > > > > > > > > > > > Make the content of the duplicated EOP packet the same with > > > > > > the > > > > > > real > > > > > > one, only masking any possible interrupts. > > > > > Well completely NAK to that, exactly that disables the > > > > > workaround. > > > > > > > > > > The CPU needs to see two different values written here. > > > > Why do the CPU need to see two different values here? Only the > > > > second > > > > packet will raise an interrupt before and after applying this > > > > patch, > > > > and the first packet's result should just be overriden on > > > > ordinary > > > > platforms. The CPU won't see the first one, until it's polling > > > > for > > > > the > > > > address for a very short interval, so short that the GPU CP > > > > couldn't > > > > execute 2 commands. > > > Yes exactly that. We need to make two writes, one with the old > > > value > > > (seq - 1) and a second with the real value (seq). > > > > > > Otherwise it is possible that a polling CPU would see the > > > sequence > > > before the second EOP is issued with results in incoherent view > > > of > > > memory. > > In this case shouldn't we write seq-1 before any work, and then > > write > > seq after work, like what is done in Mesa? > > No. This hw workaround requires that two consecutive write operations > happen directly behind each other on the PCIe bus with two different > values. > > To make the software logic around that work without any changes we > use > the values seq - 1 and seq because those are guaranteed to be > different > and not trigger any unwanted software behavior. > > Only then we can guarantee that we have a coherent view of system > memory. BTW is there any operation that could be taken to examine this specific workaround? Is there any case possible to reproduce? > > > As what I see, Mesa uses another command buffer to emit a > > EVENT_WRITE_EOP writing 0, and commit this command buffer before > > the > > real command buffer. > > > > > > Or do you mean the GPU needs to see two different values being > > > > written, > > > > or they will be merged into only one write request? > > > > > > > > Please give out more information about this workaround, > > > > otherwise > > > > the > > > > GPU hang problem on Loongson platforms will persist. > > > Well if Loongson can't handle two consecutive write operations to > > > the > > > same address with different values then you have a massive > > > platform > > > bug. > > I think the issue is triggered when two consecutive write > > operations > > and one IRQ is present, which is exactly the case of this function. > > Well then you have a massive platform bug. > > Two consecutive writes to the same bus address are perfectly legal > from > the PCIe specification and can happen all the time, even without this > specific hw workaround. > > Regards, > Christian. > > > > > > That is something which can happen all the time throughout the > > > operation. > > > > > > Regards, > > > Christian. > > > > > > > > Regards, > > > > > Christian. > > > > > > > > > > > Fixes: bf26da927a1c ("drm/amdgpu: add cache flush > > > > > > workaround to > > > > > > gfx8 emit_fence") > > > > > > Fixes: a2e73f56fa62 ("drm/amdgpu: Add support for CIK > > > > > > parts") > > > > > > Signed-off-by: Icenowy Zheng <uwu@xxxxxxxxxx> > > > > > > --- > > > > > > drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 12 +++++------- > > > > > > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 12 ++++-------- > > > > > > 2 files changed, 9 insertions(+), 15 deletions(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > > > > > index 541dbd70d8c75..778f27f1a34fe 100644 > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > > > > > @@ -2117,9 +2117,8 @@ static void > > > > > > gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 > > > > > > addr, > > > > > > { > > > > > > bool write64bit = flags & > > > > > > AMDGPU_FENCE_FLAG_64BIT; > > > > > > bool int_sel = flags & AMDGPU_FENCE_FLAG_INT; > > > > > > - /* Workaround for cache flush problems. First send > > > > > > a > > > > > > dummy > > > > > > EOP > > > > > > - * event down the pipe with seq one below. > > > > > > - */ > > > > > > + > > > > > > + /* Workaround for cache flush problems, send EOP > > > > > > twice. > > > > > > */ > > > > > > amdgpu_ring_write(ring, > > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP, > > > > > > 4)); > > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN | > > > > > > EOP_TC_ACTION_EN | > > > > > > @@ -2127,11 +2126,10 @@ static void > > > > > > gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 > > > > > > addr, > > > > > > EVENT_INDEX(5))); > > > > > > amdgpu_ring_write(ring, addr & 0xfffffffc); > > > > > > amdgpu_ring_write(ring, (upper_32_bits(addr) & > > > > > > 0xffff) > > > > > > > > > > > > > - DATA_SEL(1) | INT_SEL(0)); > > > > > > - amdgpu_ring_write(ring, lower_32_bits(seq - 1)); > > > > > > - amdgpu_ring_write(ring, upper_32_bits(seq - 1)); > > > > > > + DATA_SEL(write64bit ? 2 : > > > > > > 1) | > > > > > > INT_SEL(0)); > > > > > > + amdgpu_ring_write(ring, lower_32_bits(seq)); > > > > > > + amdgpu_ring_write(ring, upper_32_bits(seq)); > > > > > > > > > > > > - /* Then send the real EOP event down the pipe. */ > > > > > > amdgpu_ring_write(ring, > > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP, > > > > > > 4)); > > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN | > > > > > > EOP_TC_ACTION_EN | > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > > > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > > > > > > index 2f0e72caee1af..39a7d60f1fd69 100644 > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > > > > > > @@ -6153,9 +6153,7 @@ static void > > > > > > gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 > > > > > > addr, > > > > > > bool write64bit = flags & > > > > > > AMDGPU_FENCE_FLAG_64BIT; > > > > > > bool int_sel = flags & AMDGPU_FENCE_FLAG_INT; > > > > > > > > > > > > - /* Workaround for cache flush problems. First send > > > > > > a > > > > > > dummy > > > > > > EOP > > > > > > - * event down the pipe with seq one below. > > > > > > - */ > > > > > > + /* Workaround for cache flush problems, send EOP > > > > > > twice. > > > > > > */ > > > > > > amdgpu_ring_write(ring, > > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP, > > > > > > 4)); > > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN | > > > > > > EOP_TC_ACTION_EN | > > > > > > @@ -6164,12 +6162,10 @@ static void > > > > > > gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 > > > > > > addr, > > > > > > EVENT_INDEX(5))); > > > > > > amdgpu_ring_write(ring, addr & 0xfffffffc); > > > > > > amdgpu_ring_write(ring, (upper_32_bits(addr) & > > > > > > 0xffff) > > > > > > > > > > > > > - DATA_SEL(1) | INT_SEL(0)); > > > > > > - amdgpu_ring_write(ring, lower_32_bits(seq - 1)); > > > > > > - amdgpu_ring_write(ring, upper_32_bits(seq - 1)); > > > > > > + DATA_SEL(write64bit ? 2 : 1) | > > > > > > INT_SEL(0)); > > > > > > + amdgpu_ring_write(ring, lower_32_bits(seq)); > > > > > > + amdgpu_ring_write(ring, upper_32_bits(seq)); > > > > > > > > > > > > - /* Then send the real EOP event down the pipe: > > > > > > - * EVENT_WRITE_EOP - flush caches, send int */ > > > > > > amdgpu_ring_write(ring, > > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP, > > > > > > 4)); > > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN | > > > > > > EOP_TC_ACTION_EN | >