---- On Mon, 04 Dec 2023 22:27:49 -0500 Dave Airlie wrote ----

> On Mon, 4 Dec 2023 at 05:04, Paul Dufresne <dufresnep@xxxxxxxx> wrote:
> >
> > In https://nouveau.freedesktop.org/KernelModuleParameters.html, there is:
> > Here is a list of engines:
> > DEVICE
> > DMAOBJ ...
> > PVP
> > SW
> > Also, in debug:
> > CLIENT
> > ...
> > Also, my interest is linked to the state of GPU graph given after a context switch timeout that looks like:
> > [ 1696.780305] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > [ 1696.780361] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.780422] nouveau 0000:01:00.0: fifo:000000:07[ ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
> > [ 1696.780476] nouveau 0000:01:00.0: fifo:000004:04[ ce0]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780529] nouveau 0000:01:00.0: fifo:000001:01[ mspdec]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780581] nouveau 0000:01:00.0: fifo:000002:02[ msppp]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780633] nouveau 0000:01:00.0: fifo:000003:03[ msvld]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780689] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.780744] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.780795] nouveau 0000:01:00.0: fifo:000000:00[ gr]: triggering mmu fault on 0x00
> > [ 1696.780835] nouveau 0000:01:00.0: fifo:000000:07[ ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
> > [ 1696.780942] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 00000100: mmu fault triggered
> > [ 1696.780987] nouveau 0000:01:00.0: fifo:000000:00[ gr]: c006e005: busy 1 faulted 1 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.781040] nouveau 0000:01:00.0: fifo:000000:0005:[Renderer[13701]] rc scheduled
> >
> > where I suspect ce2, is linked to PCE2.
> >
> > Is there a documentation that describes those "engines"?
>
> CE is copy engine.
> But this looks like an mmu fault on the GPU side, so some shader is
> doing something wrong most likely.
>
> Dave.
>

Sometimes the GPU MMU fault is on the gr engine, sometimes on the ce2 engine. But the driver is stable when using nouveau.noaccel=1 (I have not seen other kinds of errors either, such as deadlock detections, when using noaccel=1).

Looking at the code, I am beginning to think that noaccel=0 allows user-side channel creation, and so creates the need for context switching... not sure.
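
To see which engine shows up as busy or faulted across several timeouts, the per-engine status lines quoted above could be tallied with a small script. This is only a sketch: the regular expression is inferred from the log lines in this mail, not from the driver source, and the names `parse_status` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the quoted dmesg lines (an assumption, not taken
# from the nouveau source), e.g.:
#   fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
STATUS_RE = re.compile(
    r"fifo:(?P<runl>[0-9a-f]+):(?P<engn>[0-9a-f]+)\[\s*(?P<name>\w+)\]: "
    r"(?P<raw>[0-9a-f]{8}): busy (?P<busy>\d) faulted (?P<faulted>\d) "
    r"chsw (?P<chsw>\d) save (?P<save>\d) load (?P<load>\d) "
    r"chid (?P<prev>\d+)\*?\s*-> chid (?P<next>\d+)\*?"
)


def parse_status(line):
    """Return a dict for an engine status line, or None for any other line."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "engine": d["name"],
        "raw": int(d["raw"], 16),      # raw status word as printed in the log
        "busy": d["busy"] == "1",
        "faulted": d["faulted"] == "1",
        "chsw": d["chsw"] == "1",
        "prev_chid": int(d["prev"]),
        "next_chid": int(d["next"]),
    }


def summarize(lines):
    """Count, per engine name, how many status lines report busy / faulted."""
    busy = Counter()
    faulted = Counter()
    for line in lines:
        st = parse_status(line)
        if st is None:
            continue  # SCHED_ERROR, "triggering mmu fault", "rc scheduled", ...
        if st["busy"]:
            busy[st["engine"]] += 1
        if st["faulted"]:
            faulted[st["engine"]] += 1
    return busy, faulted
```

Passing the lines of a saved `dmesg` capture to `summarize()` gives two counters keyed by engine name (gr, ce2, ...), which makes it easier to compare runs where the fault lands on gr with runs where it lands on ce2.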