On 2016-06-18 14:30, Nicolai Hähnle wrote:
> The second approach is to correlate the VM ID in
>
>> dmesg:
>> [ 78.873577] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
>> [ 78.873590] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010151C
>> [ 78.873592] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
>> [ 78.873595] VM fault (0x14, vmid 6) at page 1053980, write from 'SDM0' (0x53444d30) (183)
>
> with the running processes. This can be done via tracing. As root:
>
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> cat /sys/kernel/debug/tracing/trace_pipe
>
> You'll get *lots* of output of the form
>
> compiz-2065 [000] .... 14927.891778: amdgpu_cs_ioctl: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
> compiz-2065 [000] .... 14927.891782: amd_sched_job: entity=ffff88023258f030, sched job=ffff880110dab2a0, fence=ffff880068509b80, ring=gfx, job count:0, hw job count:0
> gfx-172 [002] .... 14927.891802: amdgpu_sched_run_job: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
> gfx-172 [002] .... 14927.891809: amdgpu_vm_grab_id: vmid=5, ring=0
>
> In this particular case, compiz submitted a CS (command stream), which
> was then asynchronously sent and processed on the gfx ring with vmid=5.
>
> The idea is to correlate the timestamps with those of the VM fault to
> see which process is at fault. If you do this, please send a bit more
> log context in attachments, because asynchronous execution can
> occasionally make the logs difficult to interpret.
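
For anyone repeating this, the correlation step described above can be roughly sketched as below. This is only a sketch under assumptions: it expects the dmesg output and the trace_pipe output to have been saved to files first, and the helper names fault_vmids and grab_lines_for_vmid are made up for illustration. It only narrows down the candidate lines; the timestamps still have to be compared by eye, since the same VM ID can show up for more than one process.

```shell
#!/bin/bash
# Sketch only: fault_vmids and grab_lines_for_vmid are hypothetical helpers,
# not part of any amdgpu tooling.

# Print each distinct VM ID that appears in "VM fault (..., vmid N)" lines
# of a saved dmesg dump (first argument: path to the dump).
fault_vmids() {
    grep 'VM fault' "$1" | sed -n 's/.*vmid \([0-9][0-9]*\).*/\1/p' | sort -un
}

# Print the amdgpu_vm_grab_id trace lines for one VM ID (first argument),
# each with up to three preceding lines of context from the saved trace
# (second argument), so the amdgpu_cs_ioctl line naming the process is
# usually visible next to it.
grab_lines_for_vmid() {
    grep -B3 "amdgpu_vm_grab_id:[[:space:]]*vmid=$1," "$2"
}

# Possible use, with the file names from this thread:
#   for id in $(fault_vmids carrizo.dmesg); do
#       echo "== vmid $id =="
#       grab_lines_for_vmid "$id" carrizo.log
#   done
```

The trailing comma in the pattern is there so that a search for vmid=5 does not also match a two-digit ID starting with 5.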
> I made this script:
> #!/bin/bash
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> cat /sys/kernel/debug/tracing/trace_pipe >> carrizo.log &
> catpid=$!
> sudo -u htpc XAUTHORITY=/home/htpc/.Xauthority DISPLAY=:0 dolphin &
> dolphinpid=$!
> sleep 3
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 0 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> kill $catpid
> kill $dolphinpid

Attaching the tracelog and dmesg, hope you can make sense of it :)

- Mads
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: carrizo.dmesg
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160620/7043a68a/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: carrizo.log
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160620/7043a68a/attachment-0003.ksh>