That was quick! :)

On 2016-06-18 11:28, Nicolai Hähnle wrote:
>
> Since you've tried a lot of kernel variations, I'm tempted to look for
> the problem in Mesa. A couple of things you could try:
>
> 1) Run R600_DEBUG=testdma,check_vm glxgears (or any other GL app,
> really). This executes a DMA self-test. Observe whether there are any
> failures and whether you get VM faults associated to the run in dmesg.
> (The self-test runs indefinitely, until you Ctrl+C out of it.)

Last line before Ctrl+C:

342: dst = ( 80 x 104 x 1, 2D_TILED_THIN1), src = ( 1164 x 1940 x 1, 2D_TILED_THIN1), bpp = 16, BLITs: GFX = 30, DMA = 0, pass [343/343]

It didn't seem to cause any issues, and there were no messages in dmesg...

> 2) Start your desktop session with R600_DEBUG=nodma and see if that
> makes the VM faults go away. (Please make sure that the environment
> variable actually makes it through, by looking at /proc/$pid/environ,
> where $pid is the PID of kwin and other relevant processes.)

I set it globally, and I could see krunner's environ file containing R600_DEBUG=nodma.
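For anyone repeating the check in 2), here is a minimal Python sketch of reading a process environment from /proc/$pid/environ. The helper names (parse_environ, read_process_env) are made up for illustration, and it assumes a Linux /proc filesystem and permission to read the target process's file:

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated KEY=VALUE records of an environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty record after the trailing NUL
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_env(pid: int) -> dict:
    """Read /proc/<pid>/environ (hypothetical helper, Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

Calling read_process_env(pid).get("R600_DEBUG") with the PID of kwin or krunner should then return "nodma" if the variable made it through.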
I still get corruption, a graphical lock-up, and this output from dmesg after starting dolphin:

[ 1188.562864] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.562870] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.562872] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.562875] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.562879] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.562881] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010151F
[ 1188.562883] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.562885] VM fault (0x14, vmid 6) at page 1053983, write from 'SDM0' (0x53444d30) (183)
[ 1188.565159] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.565165] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.565168] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565170] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.565176] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.565178] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010150A
[ 1188.565180] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565182] VM fault (0x14, vmid 6) at page 1053962, write from 'SDM0' (0x53444d30) (183)
[ 1188.565187] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
[ 1188.565189] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.565191] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565193] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.572882] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.572887] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.572888] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572890] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.572895] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.572896] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010150A
[ 1188.572897] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572898] VM fault (0x14, vmid 6) at page 1053962, write from 'SDM0' (0x53444d30) (183)
[ 1188.572902] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
[ 1188.572903] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.572904] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572905] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)

> 3) Do dolphin and konsole use OpenGL directly in your setting, or is it
> just the compositor?
>

I don't think they're special...? I wouldn't know where to set up that kind of thing, so I'm guessing it's the compositor.

> 4) Something else I notice is that the page numbers of the VM faults
> are of the form 0x001xxxxx. This suggest a 32-bit address underflow,
> i.e. an address wraps around to a very large 32-bit number. Could you
> please install a version of Mesa with assertions enabled
> (--enable-debug in ./configure does the trick) and see if some check is
> triggered?

I'll do this next. It takes a while to build, so I'll reply as soon as I have it :)

It is a 64-bit system, though I have both 64-bit and 32-bit libs installed (I can't think of anything running that would be 32-bit...)

Thanks for the help!

- Mads
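
P.S. For reading the "VM fault" lines in the dmesg output above, a small Python sketch that shows the decimal page number in hex (so it can be compared with VM_CONTEXT1_PROTECTION_FAULT_ADDR) and turns the client ID into its four ASCII characters. The regular expression is only inferred from the lines in this mail, not taken from the kernel source, and decode_fault is a made-up name:

```python
import re

# Pattern inferred from the log lines in this mail (an assumption, not the
# kernel's authoritative format).
FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)\) at page (\d+), "
    r"(read|write) from '([^']*)' \((0x[0-9a-fA-F]+)\) \((\d+)\)"
)


def decode_fault(line: str):
    """Return the decoded fields of one 'VM fault' line, or None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    page = int(m.group(3))
    client = int(m.group(6), 16)
    return {
        "vmid": int(m.group(2)),
        "page": page,
        "page_hex": f"0x{page:08X}",  # same form as the FAULT_ADDR register
        "access": m.group(4),
        # The 32-bit client ID is four ASCII bytes, most significant first.
        "client_ascii": client.to_bytes(4, "big").decode("ascii", errors="replace"),
    }
```

For the first fault above, this gives page_hex 0x00101508 (matching the FAULT_ADDR line just before it) and client_ascii SDM0.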