On 18.06.2016 13:56, Mads wrote:
> I removed the global env R600_DEBUG=nodma before this test, didn't seem
> to matter anyway...
>
> On 2016-06-18 13:36, Nicolai Hähnle wrote:
>
>> A sanity check is `grep radeonsi /proc/$pid/maps` -- if something
>> shows up, the driver was loaded into the process.
>
> dolphin has pid 560:
>
> $ grep radeonsi /proc/560/maps
> 7f7e70906000-7f7e7100a000 r-xp 00000000 00:0e 2125313
> /usr/lib64/mesa/radeonsi_dri.so
> 7f7e7100a000-7f7e71043000 rw-p 00703000 00:0e 2125313
> /usr/lib64/mesa/radeonsi_dri.so
>
> So that's something, I guess...
>
> So, newly compiled mesa from git with assertions/debug enabled:
>
> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
> libGL: pci id for fd 9: 1002:9874, driver radeonsi
> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
> libGL: Using DRI3 for screen 0
> Trying to convert empty KLocalizedString to QString.
> Cannot creat accessible child interface for object:
> PlacesView(0xb7adc0) index: 4
> QPixmap::scaled: Pixmap is a null pixmap
> QPixmap::scaled: Pixmap is a null pixmap
> (... repeating a few times, guessing there's missing icons in the
> themeset or something. dolphin itself does not crash...)

Okay, so since dolphin uses OpenGL for rendering as well, the problem
now is to figure out whether the VM fault comes from dolphin or from
the compositor. There are two approaches.

The first one is to just try your luck and capture an apitrace of
dolphin, and then see whether playing that apitrace back also produces
VM faults. If it does, great - upload the apitrace somewhere, and we
can hopefully get it fixed.

The second approach is to correlate the VM ID in

> dmesg:
> [ 78.873577] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
> [ 78.873590] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x0010151C
> [ 78.873592] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
> 0x0D0B7014
> [ 78.873595] VM fault (0x14, vmid 6) at page 1053980, write from
> 'SDM0' (0x53444d30) (183)

with the running processes. This can be done via tracing. As root:

  echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
  echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
  echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
  echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
  cat /sys/kernel/debug/tracing/trace_pipe

You'll get *lots* of output of the form

  compiz-2065 [000] .... 14927.891778: amdgpu_cs_ioctl: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
  compiz-2065 [000] .... 14927.891782: amd_sched_job: entity=ffff88023258f030, sched job=ffff880110dab2a0, fence=ffff880068509b80, ring=gfx, job count:0, hw job count:0
  gfx-172 [002] .... 14927.891802: amdgpu_sched_run_job: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
  gfx-172 [002] .... 14927.891809: amdgpu_vm_grab_id: vmid=5, ring=0

In this particular case, compiz submitted a CS (command stream), which
was then asynchronously sent and processed on the gfx ring with vmid=5.

The idea is to correlate the timestamps with those of the VM fault to
see which process is at fault. If you do this, please send a bit more
log context in attachments, because asynchronous execution can
occasionally make the logs difficult to interpret.

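If sifting through the trace by hand gets tedious, something along the
following lines might help. It is only a rough sketch: the function
names `vmid_assignments` and `last_owner` are made up for illustration,
and it assumes that an amdgpu_vm_grab_id event belongs to the job most
recently started by amdgpu_sched_run_job on the same scheduler thread,
which is what the sample output above suggests but is not guaranteed.
It also assumes the trace and dmesg timestamps are comparable on your
system.

```python
import re

# Matches the ftrace line layout shown in the sample output:
#   task-pid [cpu] flags timestamp: event: fields
LINE_RE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[\d+\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<rest>.*)$"
)
JOB_RE = re.compile(r"sched_job=([0-9a-f]+)")
VMID_RE = re.compile(r"vmid=(\d+)")


def vmid_assignments(lines):
    """Return (timestamp, vmid, submitting process) for each vm_grab_id event."""
    submitter = {}  # sched_job pointer -> "task-pid" that issued the CS ioctl
    running = {}    # scheduler thread pid -> sched_job it started most recently
    out = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        event, rest = m["event"], m["rest"]
        if event == "amdgpu_cs_ioctl":
            job = JOB_RE.search(rest)
            if job:
                submitter[job.group(1)] = f"{m['task']}-{m['pid']}"
        elif event == "amdgpu_sched_run_job":
            job = JOB_RE.search(rest)
            if job:
                running[m["pid"]] = job.group(1)
        elif event == "amdgpu_vm_grab_id":
            vmid = VMID_RE.search(rest)
            if vmid:
                # Assumption: the grab belongs to the job this thread last ran.
                job = running.get(m["pid"])
                owner = submitter.get(job, "unknown")
                out.append((float(m["ts"]), int(vmid.group(1)), owner))
    return out


def last_owner(assignments, vmid, fault_ts):
    """Latest assignment of `vmid` at or before `fault_ts`, or None."""
    earlier = [a for a in assignments if a[1] == vmid and a[0] <= fault_ts]
    return max(earlier, default=None)


# The four sample lines from the trace output above.
SAMPLE = """\
compiz-2065 [000] .... 14927.891778: amdgpu_cs_ioctl: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
compiz-2065 [000] .... 14927.891782: amd_sched_job: entity=ffff88023258f030, sched job=ffff880110dab2a0, fence=ffff880068509b80, ring=gfx, job count:0, hw job count:0
gfx-172 [002] .... 14927.891802: amdgpu_sched_run_job: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
gfx-172 [002] .... 14927.891809: amdgpu_vm_grab_id: vmid=5, ring=0
"""

for ts, vmid, owner in vmid_assignments(SAMPLE.splitlines()):
    print(f"{ts:.6f} vmid={vmid} submitted by {owner}")
```

To use it on a real capture, feed it the saved trace_pipe output and
call `last_owner` with the vmid and timestamp from the VM fault line in
dmesg. Treat the answer as a hint only, and still attach the raw logs.
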
Cheers,
Nicolai

> [ 78.873598] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08eab714
> [ 78.873600] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x0010151C
> [ 78.873602] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
> 0x0D0B7014
> [ 78.873604] VM fault (0x14, vmid 6) at page 1053980, write from
> 'SDM0' (0x53444d30) (183)
> [ 78.874141] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
> [ 78.874148] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x0010151C
> [ 78.874150] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
> 0x0D0B7014
> [ 78.874154] VM fault (0x14, vmid 6) at page 1053980, write from
> 'SDM0' (0x53444d30) (183)
> [ 78.874158] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08eab714
> [ 78.874160] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x0010151C
> [ 78.874162] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
> 0x0D0B7014
> [ 78.874164] VM fault (0x14, vmid 6) at page 1053980, write from
> 'SDM0' (0x53444d30) (183)
>
> - Mads