[Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup

bugzilla-daemon@xxxxxxxxxxxxxxx · Sun, 29 Apr 2018 22:23:24 +0000

     Maciej S. Szmigiero
 changed
          bug 105113

            What
            Removed
            Added

           CC

           mail@maciej.szmigiero.name

            Comment # 2
              on bug 105113
              from  Maciej S. Szmigiero

        I've also hit this issue on "Oland PRO [Radeon R7 240/340] (rev 87)" with
mesa-18.1.0_rc2, llvm-6.0.0 and kernel 4.16.5.

The crash happens at "cl/program/execute/calls-struct.cl" from piglit as well.
It happens both from a X session and from a KMS console.

The exact crash looks like this:
[  171.969488] radeon 0000:20:00.0: GPU fault detected: 147 0x06106001
[  171.969489] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500030
[  171.969490] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x10060001
[  171.969491] VM fault (0x01, vmid 8) at page 5242928, read from CB (96)

Then the radeon driver tries to reset the GPU endlessly.
I've tried pcie_gen2=0, msi=0, dpm=0, hard_reset=1, vm_size=16 in various
combinations, nothing seems to help (msi=0 gives a ton of IOMMU errors, BTW).

Also have tried amdgpu which gives a similar crash (it looks like this
driver didn't attempt to reset the GPU afterwards):
[  435.596230] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596233] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596235] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08060002
[  435.596239] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (96)
[  435.596245] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596247] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596248] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596252] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596256] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596258] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596260] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08010002
[  435.596263] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (16)
[  435.596267] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002
[  435.596269] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596271] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596274] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596278] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002

This might be (also?) a kernel bug since a userspace program should not
be able to crash a GPU, regardless how incorrect command stream it sends
to one.

      You are receiving this mail because:

          You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel