Am Donnerstag, den 02.04.2020, 18:07 +0100 schrieb Robert Beckett: > commit 4900dda90af2cb13bc1d4c12ce94b98acc8fe64e upstream > > Due to async need_flush updating via other buffer mapping, checking > gpu->need_flush in 3 places within etnaviv_buffer_queue can cause GPU > hangs. > > This occurs due to need_flush being false for the first 2 checks in that > function, so that the extra dword does not get accounted for, but by the > time we come to check for the third time, gpu->mmu->need_flish is true, > which outputs the flush instruction. This causes the prefetch during the > final link to be off by 1. This causes GPU hangs. Yep, there should have been a READ_ONCE on this state. :/ > It causes the ring to contain patterns like this: > > 0x40000005, /* LINK (8) PREFETCH=0x5,OP=LINK */ > 0x70040010, /* ADDRESS *0x70040010 */ > 0x40000002, /* LINK (8) PREFETCH=0x2,OP=LINK */ > 0x70040000, /* ADDRESS *0x70040000 */ > 0x08010e04, /* LOAD_STATE (1) Base: 0x03810 Size: 1 Fixp: 0 */ > 0x0000001f, /* GL.FLUSH_MMU := FLUSH_FEMMU=1,FLUSH_UNK1=1,FLUSH_UNK2=1,FLUSH_PEMMU=1,FLUSH_UNK4=1 */ > 0x08010e03, /* LOAD_STATE (1) Base: 0x0380C Size: 1 Fixp: 0 */ > 0x00000000, /* GL.FLUSH_CACHE := DEPTH=0,COLOR=0,TEXTURE=0,PE2D=0,TEXTUREVS=0,SHADER_L1=0,SHADER_L2=0,UNK10=0,UNK11=0,DESCRIPTOR_UNK12=0,DESCRIPTOR_UNK13=0 */ > 0x08010e02, /* LOAD_STATE (1) Base: 0x03808 Size: 1 Fixp: 0 */ > 0x00000701, /* GL.SEMAPHORE_TOKEN := FROM=FE,TO=PE,UNK28=0x0 */ > 0x48000000, /* STALL (9) OP=STALL */ > 0x00000701, /* TOKEN FROM=FE,TO=PE,UNK28=0x0 */ > 0x08010e00, /* LOAD_STATE (1) Base: 0x03800 Size: 1 Fixp: 0 */ > 0x00000000, /* GL.PIPE_SELECT := PIPE=PIPE_3D */ > 0x40000035, /* LINK (8) PREFETCH=0x35,OP=LINK */ > 0x70041000, /* ADDRESS *0x70041000 */ > > Here we see a link with prefetch of 5 dwords starting with the 3rd > instruction. It only loads the 5 dwords up and including the final > LOAD_STATE. It needs to include the final LINK instruction. > > This was seen on imx6q, and the fix is confirmed to stop the GPU hangs. > > The commit referenced inadvertently fixed this issue by checking > gpu->mmu->need_flush once at the start of the function. > Given that this commit is independant, and useful for all version, it > seems sensible to backport it to the stable branches. I agree. Without shared MMUs this doesn't really need to be sequence, but better just to backport this change, which has seen quite some testing, than creating yet another, slightly different, version of this function in the stable branches. Regards, Lucas