On Monday, 2024-06-17 at 16:42 +0200, Christian König wrote:
> On 17.06.24 at 16:30, Icenowy Zheng wrote:
> > On Monday, 2024-06-17 at 15:59 +0200, Christian König wrote:
> > > On 17.06.24 at 15:43, Icenowy Zheng wrote:
> > > > On Monday, 2024-06-17 at 15:09 +0200, Christian König wrote:
> > > > > On 17.06.24 at 15:03, Icenowy Zheng wrote:
> > > > > > On Monday, 2024-06-17 at 14:35 +0200, Christian König wrote:
> > > > > > > On 17.06.24 at 12:58, Icenowy Zheng wrote:
> > > > > > > > The duplication of EOP packets for GFX7/8, with the former
> > > > > > > > one having seq-1 written and the latter one having seq
> > > > > > > > written, seems to confuse some hardware platforms (e.g.
> > > > > > > > Loongson 7A series PCIe controllers).
> > > > > > > >
> > > > > > > > Make the content of the duplicated EOP packet the same as
> > > > > > > > the real one, only masking any possible interrupts.
> > > > > > > Well completely NAK to that, exactly that disables the
> > > > > > > workaround.
> > > > > > >
> > > > > > > The CPU needs to see two different values written here.
> > > > > > Why does the CPU need to see two different values here? Only
> > > > > > the second packet will raise an interrupt before and after
> > > > > > applying this patch, and the first packet's result should just
> > > > > > be overridden on ordinary platforms. The CPU won't see the
> > > > > > first one unless it's polling the address at a very short
> > > > > > interval, so short that the GPU CP couldn't execute 2 commands.
> > > > > Yes exactly that. We need to make two writes, one with the old
> > > > > value (seq - 1) and a second with the real value (seq).
> > > > >
> > > > > Otherwise it is possible that a polling CPU would see the
> > > > > sequence before the second EOP is issued, which results in an
> > > > > incoherent view of memory.
> > > > In this case shouldn't we write seq-1 before any work, and then
> > > > write seq after work, like what is done in Mesa?
> > > No. This hw workaround requires that two consecutive write
> > > operations happen directly behind each other on the PCIe bus with
> > > two different values.
> > Well to be honest the workaround code in Mesa seems to not be
> > working in this way ...
>
> Mesa doesn't have any workaround for that hw issue, the code there
> uses a quite different approach.

Ah? Commit bf26da927a1c ("drm/amdgpu: add cache flush workaround to
gfx8 emit_fence") says "Both PAL and Mesa use it for gfx8 too, so port
this commit to gfx_v8_0_ring_emit_fence_gfx", so maybe the workaround
should just be not necessary here?

>
> > > To make the software logic around that work without any changes
> > > we use the values seq - 1 and seq because those are guaranteed to
> > > be different and not trigger any unwanted software behavior.
> > >
> > > Only then we can guarantee that we have a coherent view of system
> > > memory.
> > Any more details about it?
>
> No, sorry. All I know is that it's a bug in the cache flush logic
> which can be worked around by issuing two writes behind each other to
> the same location.

So the issue is that the first EOP write does not properly flush the
cache? Could EVENT_WRITE be used instead of EVENT_WRITE_EOP in this
workaround to properly flush it without hurting the fence value?

>
> > BTW in this case, could I try to write it for 3 times instead of 2,
> > with seq-1, seq and seq?
>
> That could potentially work as well, but at some point we would need
> to increase the EOP ring buffer size or could run into performance
> issues.

Well I will try this.
I think the buffer is enlarged in the original workaround commit.

>
> > > > As far as I can see, Mesa uses another command buffer to emit an
> > > > EVENT_WRITE_EOP writing 0, and commits this command buffer
> > > > before the real command buffer.
> > >
> > > > > > Or do you mean the GPU needs to see two different values
> > > > > > being written, or they will be merged into only one write
> > > > > > request?
> > > > > >
> > > > > > Please give out more information about this workaround,
> > > > > > otherwise the GPU hang problem on Loongson platforms will
> > > > > > persist.
> > > > > Well if Loongson can't handle two consecutive write operations
> > > > > to the same address with different values then you have a
> > > > > massive platform bug.
> > > > I think the issue is triggered when two consecutive write
> > > > operations and one IRQ are present, which is exactly the case of
> > > > this function.
> > > Well then you have a massive platform bug.
> > >
> > > Two consecutive writes to the same bus address are perfectly legal
> > > from the PCIe specification and can happen all the time, even
> > > without this specific hw workaround.
> > Yes I know it, and I am not from Loongson, just some user trying to
> > mess around it.
>
> Well to be honest on a platform where even two consecutive writes to
> the same location don't work I would have strong doubts that it is
> stable in general.

Well I think the current situation is that the IRQ triggered by the
second EOP packet arrives before the second write is finished, not that
the second write is totally dropped.

>
> Regards,
> Christian.
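
For reference, the write sequences discussed in this thread can be
sketched as a small toy model in C. This is only an illustration, not
the amdgpu code: struct toy_eop, toy_emit_fence() and
toy_has_distinct_adjacent_writes() are hypothetical names, and putting
the interrupt only on the last packet of the three-write variant is an
assumption, since the thread does not say which packet would carry it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: these names are illustrative, not amdgpu symbols. */
struct toy_eop {
    uint64_t value;  /* value the packet writes to the fence address */
    int irq;         /* non-zero if the packet also raises an interrupt */
};

/*
 * Fill out[] with the fence writes for sequence number seq.
 * repeats == 1 models the existing scheme (seq - 1, then seq);
 * repeats == 2 models the three-write idea (seq - 1, seq, seq).
 * Assumption: only the final packet raises the interrupt.
 * Returns the number of packets emitted.
 */
static size_t toy_emit_fence(struct toy_eop *out, size_t cap,
                             uint64_t seq, unsigned repeats)
{
    size_t n = 0;

    assert(repeats >= 1 && cap >= (size_t)repeats + 1);

    /* Dummy write: previous sequence number, interrupt masked. */
    out[n++] = (struct toy_eop){ .value = seq - 1, .irq = 0 };

    /* Real write(s): only the last one signals the CPU. */
    for (unsigned i = 0; i < repeats; i++)
        out[n++] = (struct toy_eop){ .value = seq,
                                     .irq = (i + 1 == repeats) };

    return n;
}

/*
 * True if some pair of adjacent packets writes two different values,
 * which is the property the workaround is said to rely on.
 */
static int toy_has_distinct_adjacent_writes(const struct toy_eop *p, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (p[i - 1].value != p[i].value)
            return 1;
    return 0;
}
```

In this model the proposed patch (two packets that both write seq)
fails the toy_has_distinct_adjacent_writes() check, while both the
existing two-write sequence and the three-write variant pass it.
Whether the real hardware workaround is satisfied by the three-write
variant is exactly the open question in the thread.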