Cc'ing some more people.

On Tue, 15 Sep 2020 at 23:07, Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
>
> Dear Andrew folks, dear Linux folks,
>
>
> With Linux 5.9-rc4 on a Dell OptiPlex 5080 with Intel Core i7-10700 CPU
> @ 2.90GHz, and external
>
> 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices,
> Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87)
>
> running graphical demanding applications glmark2 [1] and the Phoronix
> Test Suite [2] benchmark *pts/desktop-graphics* [3]
>
> $ git describe --tags
> v10.0.0m1-13-g0b5ddc3c0
>
> I got three general protection faults, and it restarted or froze (no
> input devices working, screen froze and even network card (no ping)).
>
> Here the system restarted itself:
>
> > kernel: general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > kernel: CPU: 2 PID: 9702 Comm: glmark2 Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > kernel: Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > kernel: RIP: 0010:free_block+0xdc/0x1f0
>
> Here it froze:
>
> > [14639.665745] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > [14639.675917] CPU: 15 PID: 23094 Comm: pvpython Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > [14639.684431] Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > [14639.691823] RIP: 0010:free_block+0xdc/0x1f0
>
> Here it froze:
>
> > kernel: general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > kernel: CPU: 15 PID: 23094 Comm: pvpython Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > kernel: Hardware name: Dell Inc.
OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > kernel: RIP: 0010:free_block+0xdc/0x1f0
>
> Running `scripts/decode_stacktrace.sh`:
>
> > linux-5.9_rc4-343.x86_64/source$ scripts/decode_stacktrace.sh vmlinux < optiplex-5080-linux-5.9-rc4-gp-pvpython.txt
> > [14528.718656] cgroup: fork rejected by pids controller in /user.slice/user-5272.slice/session-c6.scope
> > [14639.665745] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > [14639.675917] CPU: 15 PID: 23094 Comm: pvpython Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > [14639.684431] Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > [14639.691823] RIP: 0010:free_block (./include/linux/list.h:112 ./include/linux/list.h:135 ./include/linux/list.h:146 mm/slab.c:3336)
> > [14639.696006] Code: 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 4c 01 e8 48 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 48 08 48 8b 50 10 4c 8d 78 08 <48> 89 51 08 48 89 0a 4c 89 da 48 2b 50 28 4c 89 60 08 48 89 68 10
> > All code
> > ========
> >    0:   00 48 01            add    %cl,0x1(%rax)
> >    3:   d0 48 c1            rorb   -0x3f(%rax)
> >    6:   e8 0c 48 c1 e0      callq  0xffffffffe0c14817
> >    b:   06                  (bad)
> >    c:   4c 01 e8            add    %r13,%rax
> >    f:   48 8b 50 08         mov    0x8(%rax),%rdx
> >   13:   48 8d 4a ff         lea    -0x1(%rdx),%rcx
> >   17:   83 e2 01            and    $0x1,%edx
> >   1a:   48 0f 45 c1         cmovne %rcx,%rax
> >   1e:   48 8b 48 08         mov    0x8(%rax),%rcx
> >   22:   48 8b 50 10         mov    0x10(%rax),%rdx
> >   26:   4c 8d 78 08         lea    0x8(%rax),%r15
> >   2a:*  48 89 51 08         mov    %rdx,0x8(%rcx)   <-- trapping instruction
> >   2e:   48 89 0a            mov    %rcx,(%rdx)
> >   31:   4c 89 da            mov    %r11,%rdx
> >   34:   48 2b 50 28         sub    0x28(%rax),%rdx
> >   38:   4c 89 60 08         mov    %r12,0x8(%rax)
> >   3c:   48 89 68 10         mov    %rbp,0x10(%rax)
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0:   48 89 51 08         mov    %rdx,0x8(%rcx)
> >    4:   48 89 0a            mov    %rcx,(%rdx)
> >    7:   4c 89 da            mov    %r11,%rdx
> >    a:   48 2b 50 28         sub    0x28(%rax),%rdx
> >    e:   4c 89 60 08         mov
%r12,0x8(%rax)
> >   12:   48 89 68 10         mov    %rbp,0x10(%rax)
> > [14639.714747] RSP: 0018:ffffc9001c26fab8 EFLAGS: 00010046
> > [14639.719970] RAX: ffffea000d193600 RBX: 0000000080000000 RCX: dead000000000100
> > [14639.727099] RDX: dead000000000122 RSI: ffff88842d5f3ef0 RDI: ffff88842b440300
> > [14639.734225] RBP: dead000000000122 R08: ffffc9001c26fb30 R09: ffff88842b441280
> > [14639.741351] R10: 000000000000000f R11: ffff8883464d80c0 R12: dead000000000100
> > [14639.748477] R13: ffffea0000000000 R14: ffff88842d5f3ff0 R15: ffffea000d193608
> > [14639.755604] FS: 00007fd3b7e8f040(0000) GS:ffff88842d5c0000(0000) knlGS:0000000000000000
> > [14639.763692] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14639.769430] CR2: 00007fd344233548 CR3: 00000002f46aa003 CR4: 00000000007706e0
> > [14639.776556] PKRU: 55555554
> > [14639.779265] Call Trace:
> > [14639.781717] ___cache_free (mm/slab.c:3389 mm/slab.c:3455)
> > [14639.785463] kfree (./arch/x86/include/asm/irqflags.h:41 ./arch/x86/include/asm/irqflags.h:84 mm/slab.c:3757)
> > [14639.788432] kmem_freepages (mm/slab.h:266 mm/slab.h:437 mm/slab.c:1406)
> > [14639.792093] slab_destroy (mm/slab.c:1631)
> > [14639.795579] slabs_destroy (mm/slab.c:1639 (discriminator 12))
> > [14639.799152] ___cache_free (mm/slab.c:3406 mm/slab.c:3455)
> > [14639.802902] ?
_cond_resched (kernel/sched/core.c:6123)
> > [14639.806650] kfree (./arch/x86/include/asm/irqflags.h:41 ./arch/x86/include/asm/irqflags.h:84 mm/slab.c:3757)
> > [14639.809644] amdgpu_vram_mgr_del (drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:439) amdgpu
> > [14639.814524] ttm_bo_cleanup_memtype_use (drivers/gpu/drm/ttm/ttm_bo.c:866 drivers/gpu/drm/ttm/ttm_bo.c:367) ttm
> > [14639.819748] ttm_bo_put (./include/linux/dma-resv.h:226 drivers/gpu/drm/ttm/ttm_bo.c:612 ./include/linux/kref.h:65 drivers/gpu/drm/ttm/ttm_bo.c:624) ttm
> > [14639.823768] amdgpu_bo_unref (drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:861) amdgpu
> > [14639.828313] amdgpu_vm_free_table (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:953) amdgpu
> > [14639.833293] amdgpu_vm_free_pts (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:975) amdgpu
> > [14639.838097] amdgpu_vm_fini (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3119) amdgpu
> > [14639.842727] amdgpu_driver_postclose_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:1116) amdgpu
> > [14639.848387] drm_file_free.part.9 (drivers/gpu/drm/drm_file.c:292) drm
> > [14639.853263] drm_release (./arch/x86/include/asm/atomic.h:123 ./include/asm-generic/atomic-instrumented.h:749 drivers/gpu/drm/drm_file.c:496) drm
> > [14639.857183] __fput (fs/file_table.c:282)
> > [14639.860238] task_work_run (kernel/task_work.c:143 (discriminator 1))
> > [14639.863811] exit_to_user_mode_prepare (./include/linux/tracehook.h:188 kernel/entry/common.c:163 kernel/entry/common.c:190)
> > [14639.868602] syscall_exit_to_user_mode (./arch/x86/include/asm/atomic.h:29 ./include/asm-generic/atomic-instrumented.h:28 ./include/linux/jump_label.h:254 ./arch/x86/include/asm/nospec-branch.h:288 ./arch/x86/include/asm/entry-common.h:80 kernel/entry/common.c:131 kernel/entry/common.c:267)
> > [14639.873304] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:125)
> > [14639.878353] RIP: 0033:0x7fd3d715cb5f
> > [14639.881925] Code: 20 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec
10 e8 bc fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 15 89 d7 89 44 24 0c e8 fe fb ff ff 8b 44 24
> > All code
> > ========
> >    0:   20 00               and    %al,(%rax)
> >    2:   f7 d8               neg    %eax
> >    4:   64 89 02            mov    %eax,%fs:(%rdx)
> >    7:   b8 ff ff ff ff      mov    $0xffffffff,%eax
> >    c:   c3                  retq
> >    d:   66 0f 1f 44 00 00   nopw   0x0(%rax,%rax,1)
> >   13:   53                  push   %rbx
> >   14:   89 fb               mov    %edi,%ebx
> >   16:   48 83 ec 10         sub    $0x10,%rsp
> >   1a:   e8 bc fb ff ff      callq  0xfffffffffffffbdb
> >   1f:   89 df               mov    %ebx,%edi
> >   21:   89 c2               mov    %eax,%edx
> >   23:   b8 03 00 00 00      mov    $0x3,%eax
> >   28:   0f 05               syscall
> >   2a:*  48 3d 00 f0 ff ff   cmp    $0xfffffffffffff000,%rax   <-- trapping instruction
> >   30:   77 15               ja     0x47
> >   32:   89 d7               mov    %edx,%edi
> >   34:   89 44 24 0c         mov    %eax,0xc(%rsp)
> >   38:   e8 fe fb ff ff      callq  0xfffffffffffffc3b
> >   3d:   8b                  .byte 0x8b
> >   3e:   44                  rex.R
> >   3f:   24                  .byte 0x24
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0:   48 3d 00 f0 ff ff   cmp    $0xfffffffffffff000,%rax
> >    6:   77 15               ja     0x1d
> >    8:   89 d7               mov    %edx,%edi
> >    a:   89 44 24 0c         mov    %eax,0xc(%rsp)
> >    e:   e8 fe fb ff ff      callq  0xfffffffffffffc11
> >   13:   8b                  .byte 0x8b
> >   14:   44                  rex.R
> >   15:   24                  .byte 0x24
> > [14639.900667] RSP: 002b:00007fff07ed2f40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> > [14639.908229] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fd3d715cb5f
> > [14639.915354] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
> > [14639.922480] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000e
> > [14639.929607] R10: 000000000000000c R11: 0000000000000293 R12: 0000000005168450
> > [14639.936732] R13: 0000000000000008 R14: 00000000007c8290 R15: 00007fff07ed31c0
> > [14639.943859] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 amdgpu gpu_sched ttm input_leds x86_pkg_temp_thermal iosf_mbi led_class drm_kms_helper kvm_intel snd_hda_codec_hdmi drm
snd_hda_intel intel_gtt snd_intel_dspcfg kvm fb_sys_fops syscopyarea snd_hda_codec snd_hda_core sysfillrect wmi_bmof sysimgblt snd_pcm irqbypass wmi snd_timer snd deflate iTCO_wdt soundcore iTCO_vendor_support crc32c_intel efi_pstore video pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables unix ipv6 autofs4
> > [14639.996237] ---[ end trace c4d9d5f7e4b117a6 ]---
> > [14640.705681] RIP: 0010:free_block (./include/linux/list.h:112 ./include/linux/list.h:135 ./include/linux/list.h:146 mm/slab.c:3336)
> > [14640.709874] Code: 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 4c 01 e8 48 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 48 08 48 8b 50 10 4c 8d 78 08 <48> 89 51 08 48 89 0a 4c 89 da 48 2b 50 28 4c 89 60 08 48 89 68 10
> > All code
> > ========
> >    0:   00 48 01            add    %cl,0x1(%rax)
> >    3:   d0 48 c1            rorb   -0x3f(%rax)
> >    6:   e8 0c 48 c1 e0      callq  0xffffffffe0c14817
> >    b:   06                  (bad)
> >    c:   4c 01 e8            add    %r13,%rax
> >    f:   48 8b 50 08         mov    0x8(%rax),%rdx
> >   13:   48 8d 4a ff         lea    -0x1(%rdx),%rcx
> >   17:   83 e2 01            and    $0x1,%edx
> >   1a:   48 0f 45 c1         cmovne %rcx,%rax
> >   1e:   48 8b 48 08         mov    0x8(%rax),%rcx
> >   22:   48 8b 50 10         mov    0x10(%rax),%rdx
> >   26:   4c 8d 78 08         lea    0x8(%rax),%r15
> >   2a:*  48 89 51 08         mov    %rdx,0x8(%rcx)   <-- trapping instruction
> >   2e:   48 89 0a            mov    %rcx,(%rdx)
> >   31:   4c 89 da            mov    %r11,%rdx
> >   34:   48 2b 50 28         sub    0x28(%rax),%rdx
> >   38:   4c 89 60 08         mov    %r12,0x8(%rax)
> >   3c:   48 89 68 10         mov    %rbp,0x10(%rax)
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0:   48 89 51 08         mov    %rdx,0x8(%rcx)
> >    4:   48 89 0a            mov    %rcx,(%rdx)
> >    7:   4c 89 da            mov    %r11,%rdx
> >    a:   48 2b 50 28         sub    0x28(%rax),%rdx
> >    e:   4c 89 60 08         mov    %r12,0x8(%rax)
> >   12:   48 89 68 10         mov    %rbp,0x10(%rax)
> > [14640.728612] RSP: 0018:ffffc9001c26fab8 EFLAGS: 00010046
> > [14640.733834] RAX: ffffea000d193600 RBX: 0000000080000000 RCX: dead000000000100
> > [14640.740962] RDX: dead000000000122 RSI: ffff88842d5f3ef0 RDI: ffff88842b440300 >
> [14640.748092] RBP: dead000000000122 R08: ffffc9001c26fb30 R09: ffff88842b441280
> > [14640.755218] R10: 000000000000000f R11: ffff8883464d80c0 R12: dead000000000100
> > [14640.762348] R13: ffffea0000000000 R14: ffff88842d5f3ff0 R15: ffffea000d193608
> > [14640.769478] FS: 00007fd3b7e8f040(0000) GS:ffff88842d5c0000(0000) knlGS:0000000000000000
> > [14640.777558] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14640.783327] CR2: 00007fd344233548 CR3: 00000002f46aa003 CR4: 00000000007706e0
> > [14640.790476] PKRU: 55555554
> > [14661.818409] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > [14661.824340] rcu: 6-...0: (1 GPs behind) idle=83a/1/0x4000000000000000 softirq=545426/545427 fqs=1448
> > [14661.833636] (detected by 10, t=21025 jiffies, g=3736877, q=2158)
> > [14661.839726] Task dump for CPU 6:
> > [14661.842952] task:kworker/6:2 state:R running task stack: 0 pid: 7383 ppid: 2 flags:0x00004008
> > [14661.852856] Workqueue: events cache_reap
> > [14661.856779] Call Trace:
> > [14661.859230] ? cache_reap (mm/slab.c:3978)
> > [14661.862804] ? process_one_work (./arch/x86/include/asm/atomic.h:29 ./include/asm-generic/atomic-instrumented.h:28 ./include/linux/jump_label.h:254 ./include/linux/jump_label.h:264 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
> > [14661.866987] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14661.871254] ? worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
> > [14661.875087] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14661.879354] ? kthread (kernel/kthread.c:292)
> > [14661.882756] ? kthread_use_mm (kernel/kthread.c:245)
> > [14661.886589] ?
ret_from_fork (arch/x86/entry/entry_64.S:294)
> > [14726.905632] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > [14726.911561] rcu: 6-...0: (1 GPs behind) idle=83a/1/0x4000000000000000 softirq=545426/545427 fqs=1735
> > [14726.920856] (detected by 10, t=86112 jiffies, g=3736877, q=3398)
> > [14726.926946] Task dump for CPU 6:
> > [14726.930172] task:kworker/6:2 state:R running task stack: 0 pid: 7383 ppid: 2 flags:0x00004008
> > [14726.940076] Workqueue: events cache_reap
> > [14726.943994] Call Trace:
> > [14726.946445] ? cache_reap (mm/slab.c:3978)
> > [14726.950019] ? process_one_work (./arch/x86/include/asm/atomic.h:29 ./include/asm-generic/atomic-instrumented.h:28 ./include/linux/jump_label.h:254 ./include/linux/jump_label.h:264 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
> > [14726.954203] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14726.958470] ? worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
> > [14726.962307] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14726.966575] ? kthread (kernel/kthread.c:292)
> > [14726.969976] ? kthread_use_mm (kernel/kthread.c:245)
> > [14726.973809] ? ret_from_fork (arch/x86/entry/entry_64.S:294)
>
> Is that a known issue? Reproducing the problem often takes several
> hours, so some guidance on what to try would be great.
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://github.com/glmark2/glmark2
> [2]: https://phoronix-test-suite.com/
> [3]: https://openbenchmarking.org/suite/pts/desktop-graphics
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
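
A reading aid for the register dump in the report, offered as a guess rather than a diagnosis: RCX (dead000000000100) and RDX (dead000000000122) look like the list poison values that list_del() leaves in the next/prev pointers of an entry it has removed. If so, free_block() was unlinking a slab page that had already been taken off its list. The sketch below assumes the constants from include/linux/poison.h (LIST_POISON1 = 0x100 and LIST_POISON2 = 0x122, offset by the x86-64 default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000) and 48-bit virtual addresses; these should be checked against the actual kernel config. The function names are made up for illustration.

```python
# Assumed values, modelled on include/linux/poison.h with the x86-64 default
# CONFIG_ILLEGAL_POINTER_VALUE. Verify against the .config of the running kernel.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = 0x100 + POISON_POINTER_DELTA  # written to entry->next by list_del()
LIST_POISON2 = 0x122 + POISON_POINTER_DELTA  # written to entry->prev by list_del()


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def classify(value: int) -> str:
    """Give a rough label for a register value taken from an oops."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    return "canonical" if is_canonical(value) else "non-canonical"


if __name__ == "__main__":
    # Register values copied from the oops in the report.
    registers = {
        "RAX": 0xFFFFEA000D193600,
        "RCX": 0xDEAD000000000100,
        "RDX": 0xDEAD000000000122,
    }
    for name, value in registers.items():
        print(f"{name}: {value:016x} {classify(value)}")
```

Under those assumptions the trapping instruction, mov %rdx,0x8(%rcx), would be the next->prev = prev store in __list_del() with next already poisoned, which fits the list.h:112 location in the decoded trace.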