On Sat, Apr 01, 2017 at 05:48:55PM -0700, Matt Turner wrote:
> On Wed, Mar 29, 2017 at 8:56 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> > Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may
> > use to indicate that it wants the contents of this buffer preserved in
> > the error state (/sys/class/drm/cardN/error) following a GPU hang
> > involving this batch.
> >
> > Use this at your discretion, the contents of the error state, although
> > compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all
> > eternity (until the error state is destroyed).
> >
> > Based on an earlier patch by Ben Widawsky <ben@xxxxxxxxxxxx>
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
> > Cc: Matt Turner <mattst88@xxxxxxxxx>
> > Acked-by: Ben Widawsky <ben@xxxxxxxxxxxx>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > ---
>
> Thank you, Chris. With this in place (and a few patches from Ben
> rebased for libdrm and Mesa) I can disassemble the shader program from
> an error state.
>
> In this case, I turned off the end-of-thread bit on the sendc in order
> to cause a hang:
>
> render ring --- user = 0x00000000 fff75000
> pln(8) g124<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
> pln(8) g125<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
> pln(8) g126<1>F g5<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
> pln(8) g127<1>F g5.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
> sendc(8) null<1>UW g124<8,8,1>F
>     render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q };
> nop ;
> pln(16) g120<1>F g6<0,1,0>F g2<8,8,1>F { align1 1H compacted };
> pln(16) g122<1>F g6.4<0,1,0>F g2<8,8,1>F { align1 1H compacted };
> pln(16) g124<1>F g7<0,1,0>F g2<8,8,1>F { align1 1H compacted };
> pln(16) g126<1>F g7.4<0,1,0>F g2<8,8,1>F { align1 1H compacted };
> sendc(16) null<1>UW g120<8,8,1>F
>     render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H };
> illegal(1) { align1 1N };
>
> Presumably we would like to save more than just instruction buffers.
> Do we have a good way of discerning what each blob of data in the
> error state is?

The prechosen set are named (batch, ring, HW context, HW status,
semaphore). The user ones just have a nondescript 'user'.

My thinking was that either there would be an additional debug-only
(aub-esque) buffer added to the execbuf that contained all the useful
info to index the other buffers captured, or userspace puts a
header/footer into its captured batches.

I did consider the possibility of adding a tag through the execobject,
maybe 8 bits inside flags, but I prefer the approach of embedding
information into the buffers (much more flexible).

It is also possible to take the simulator route and decode the buffers
according to the current GPU state; the link between relocation
addresses and buffer addresses should be sufficient?
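
To make the header idea concrete, here is a rough sketch of what
userspace could stamp into each buffer it marks with
EXEC_OBJECT_CAPTURE, and how a decoder could then recognise a 'user'
blob in the error state. Everything in it (the magic value, the struct
layout, the kind enum, the function names, the 64-byte padding) is
invented for illustration; none of it exists in i915, libdrm or Mesa.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag; all names and values below are assumptions. */
#define CAPTURE_MAGIC       0x43415054u /* "CAPT" */
/* Pad the header so the payload stays 64-byte aligned (an assumption
 * about what the payload, e.g. a shader kernel, would want). */
#define CAPTURE_HEADER_SIZE 64u

enum capture_kind {
	CAPTURE_KIND_SHADER  = 1,
	CAPTURE_KIND_STATE   = 2,
	CAPTURE_KIND_SURFACE = 3,
};

struct capture_header {
	uint32_t magic;          /* CAPTURE_MAGIC if the blob is tagged */
	uint32_t kind;           /* enum capture_kind */
	uint32_t payload_offset; /* bytes from start of buffer to payload */
	uint32_t payload_size;   /* bytes of payload following the header */
};

/* Userspace side: write the header at the start of the buffer before
 * submission. Returns the payload offset, or 0 if the buffer is too
 * small to hold the header plus payload_size bytes. */
static size_t capture_stamp(void *buf, size_t len,
			    uint32_t kind, uint32_t payload_size)
{
	struct capture_header hdr = {
		.magic = CAPTURE_MAGIC,
		.kind = kind,
		.payload_offset = CAPTURE_HEADER_SIZE,
		.payload_size = payload_size,
	};

	if (len < CAPTURE_HEADER_SIZE ||
	    payload_size > len - CAPTURE_HEADER_SIZE)
		return 0;

	memset(buf, 0, CAPTURE_HEADER_SIZE);
	memcpy(buf, &hdr, sizeof(hdr));
	return CAPTURE_HEADER_SIZE;
}

/* Decoder side: given a decompressed 'user' blob from the error state,
 * check for the tag. Returns 0 and fills *out on success, -1 if the
 * blob is untagged or the header is inconsistent with the blob size. */
static int capture_identify(const void *blob, size_t len,
			    struct capture_header *out)
{
	if (len < sizeof(*out))
		return -1;

	memcpy(out, blob, sizeof(*out));
	if (out->magic != CAPTURE_MAGIC)
		return -1;
	if (out->payload_offset > len ||
	    out->payload_size > len - out->payload_offset)
		return -1;
	return 0;
}
```

A decoder would call capture_identify() on each 'user' section and
dispatch on hdr.kind, e.g. feeding CAPTURE_KIND_SHADER payloads to the
disassembler. Untagged blobs simply fail the magic check and can be
dumped raw.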
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx