Another two spontaneous reboots today. The latest one occurred whilst I was
away from the computer; output is below. A different call trace this time.

No response to my previous report, so I'm adding some e-mail addresses from
get_maintainer.pl / git blame.

It's an old graphics card, and this time I note references to RAM and
memory. Is there any possibility it's hardware? Is there a GPU equivalent
to memtest86+?

On Tue, 28 Jul 2020, "Alan J. Wylie" <alan@xxxxxxxxxxx> writes:

> I've had several recent crashes of the nouveau kernel driver over the past
> month or so.
>
> My suspicion is that Firefox is causing it.
>
> The screen goes black and then the computer reboots.
>
> Nothing much in the syslogs, however I've managed to get netconsole output.
>
> It happens very infrequently and I'm afraid I don't know how to reproduce it,
> however I'll be more than happy to help by providing more information or
> debugging.
>
> Hardware:
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 640] (rev a1)
>
> Kernel:
> Linux frodo 5.7.10 #21 SMP PREEMPT Wed Jul 22 13:01:11 BST 2020 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
>
> Software:
> Recent Gentoo
> Nightly Firefox.
>
> [I] media-libs/mesa (20.0.8@04/07/20): OpenGL-like graphic library for Linux
> [I] x11-apps/mesa-progs (8.4.0@07/04/19): Mesa's OpenGL utility and demo programs (glxgears and glxinfo)
> [I] x11-drivers/xf86-video-nouveau (1.0.16@17/06/20): Accelerated Open Source driver for nVidia cards
> [I] x11-base/xorg-server (1.20.8-r1(0/1.20.8)@22/07/20): X.Org X servers

netconsole:

BUG: unable to handle page fault for address: 000000010050786b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 1084 Comm: X Not tainted 5.8.1 #25
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F12 05/30/2012
RIP: 0010:__kmalloc+0xb1/0x2c0
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Call Trace:
 nvif_object_init+0x7c/0x160 [nouveau]
 nvif_mem_init_type+0xc8/0x1b0 [nouveau]
 ? nvkm_vram_map+0x56/0x80 [nouveau]
 ? nvkm_uvmm_mthd+0x794/0x7c0 [nouveau]
 ? nvkm_vmm_get_locked+0x37f/0x540 [nouveau]
 nouveau_mem_vram+0xf1/0x1a0 [nouveau]
 nouveau_vram_manager_new+0x91/0xd0 [nouveau]
 ttm_bo_mem_space+0xd7/0x320 [ttm]
 ttm_bo_validate+0x12e/0x1a0 [ttm]
 ? drm_vma_offset_add+0x41/0x90 [drm]
 ? nv10_bo_put_tile_region+0x90/0x90 [nouveau]
 ttm_bo_init_reserved+0x2ad/0x320 [ttm]
 ttm_bo_init+0x89/0x100 [ttm]
 ? nv10_bo_put_tile_region+0x90/0x90 [nouveau]
 nouveau_bo_init+0xc1/0xf0 [nouveau]
 ? nv10_bo_put_tile_region+0x90/0x90 [nouveau]
 nouveau_gem_new+0xcf/0x120 [nouveau]
 ? nouveau_gem_new+0x120/0x120 [nouveau]
 nouveau_gem_ioctl_new+0x67/0xf0 [nouveau]
 ? nouveau_gem_new+0x120/0x120 [nouveau]
 drm_ioctl_kernel+0xcc/0x110 [drm]
 drm_ioctl+0x202/0x390 [drm]
 ? nouveau_gem_new+0x120/0x120 [nouveau]
 nouveau_drm_ioctl+0x91/0xd0 [nouveau]
 ksys_ioctl+0xa4/0xd0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x3e/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f71b5568dd7
Code: 00 00 90 48 8b 05 a9 40 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 40 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff1a291988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff1a2919d0 RCX: 00007f71b5568dd7
RDX: 00007fff1a2919d0 RSI: 00000000c0306480 RDI: 000000000000000a
RBP: 00000000c0306480 R08: 0000000000000000 R09: 00005575014822e0
R10: 00007f71b562d9e0 R11: 0000000000000246 R12: 00007fff1a2919d0
R13: 000000000000000a R14: 0000557500582e00 R15: 0000000000000000
Modules linked in: essiv authenc dm_crypt binfmt_misc netconsole configfs sha256_generic libsha256 cfg80211 8021q veth cpuid i2c_dev asus_atk0110 acpi_power_meter it87 hwmon_vid nouveau af_packet bridge stp evdev mxm_wmi llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic video snd_hda_intel ttm snd_intel_dspcfg drm_kms_helper snd_hda_codec snd_hda_core kvm_amd kvm snd_pcm syscopyarea snd_timer sysfillrect fam15h_power k10temp sysimgblt snd irqbypass fb_sys_fops soundcore i2c_piix4 wmi acpi_cpufreq softdog nfs nfsd auth_rpcgss lockd grace drm sunrpc drm_panel_orientation_quirks backlight agpgart usbhid ohci_pci ghash_clmulni_intel cryptd ehci_pci ohci_hcd sr_mod ehci_hcd cdrom xhci_pci xhci_hcd usbcore usb_common 8250 8250_base serial_core
CR2: 000000010050786b
---[ end trace 67649d0c2234e455 ]---
RIP: 0010:__kmalloc+0xb1/0x2c0
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x2b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 20 seconds..

And passed through decode_stacktrace.sh

# uname -a
Linux frodo 5.8.1 #25 SMP PREEMPT Tue Aug 11 19:47:00 BST 2020 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux

# /work/src.git/linux-stable/scripts/decode_stacktrace.sh /work/src.git/linux-stable/arch/x86/boot/compressed/vmlinux /work/src.git/linux-stable/ /lib/modules/5.8.1 < ~alan/nouveau/bug.001
BUG: unable to handle page fault for address: 000000010050786b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 1084 Comm: X Not tainted 5.8.1 #25
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F12 05/30/2012
RIP: 0010:__kmalloc (??:?)
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
All code
========
   0:   89 c8                   mov    %ecx,%eax
   2:   65 48 03 05 3f 29 df    add    %gs:0x53df293f(%rip),%rax        # 0x53df2949
   9:   53
   a:   48 8b 70 08             mov    0x8(%rax),%rsi
   e:   48 39 f2                cmp    %rsi,%rdx
  11:   75 e7                   jne    0xfffffffffffffffa
  13:   4c 8b 28                mov    (%rax),%r13
  16:   4d 85 ed                test   %r13,%r13
  19:   0f 84 d8 00 00 00       je     0xf7
  1f:   41 8b 47 20             mov    0x20(%r15),%eax
  23:   49 8b 3f                mov    (%r15),%rdi
  26:   48 8d 4a 08             lea    0x8(%rdx),%rcx
  2a:*  49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx            <-- trapping instruction
  2f:   4c 89 e8                mov    %r13,%rax
  32:   65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)
  37:   0f 94 c0                sete   %al
  3a:   84 c0                   test   %al,%al
  3c:   74 b9                   je     0xfffffffffffffff7
  3e:   41                      rex.B
  3f:   8b                      .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:   49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx
   5:   4c 89 e8                mov    %r13,%rax
   8:   65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)
   d:   0f 94 c0                sete   %al
  10:   84 c0                   test   %al,%al
  12:   74 b9                   je     0xffffffffffffffcd
  14:   41                      rex.B
  15:   8b                      .byte 0x8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Call Trace:
 nvif_object_init (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvif/object.c:279) nouveau
 nvif_mem_init_type (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvif/mem.c:72) nouveau
 ? nvkm_vram_map (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.c:49) nouveau
 ? nvkm_uvmm_mthd (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:218 /work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:340) nouveau
 ? nvkm_vmm_get_locked (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c:1769 (discriminator 4)) nouveau
 nouveau_mem_vram (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_mem.c:155) nouveau
 nouveau_vram_manager_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_ttm.c:76 /work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_ttm.c:59) nouveau
 ttm_bo_mem_space (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1068) ttm
 ttm_bo_validate (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1142 /work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1218) ttm
 ? drm_vma_offset_add (/work/src.git/linux-stable/drivers/gpu/drm/drm_vma_manager.c:215) drm
 ? nv10_bo_put_tile_region (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:134) nouveau
 ttm_bo_init_reserved (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1335) ttm
 ttm_bo_init (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1369) ttm
 ? nv10_bo_put_tile_region (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:134) nouveau
 nouveau_bo_init (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:317) nouveau
 ? nv10_bo_put_tile_region (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:134) nouveau
 nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:203) nouveau
 ? nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:263) nouveau
 nouveau_gem_ioctl_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:272) nouveau
 ? nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:263) nouveau
 drm_ioctl_kernel (/work/src.git/linux-stable/drivers/gpu/drm/drm_ioctl.c:793) drm
 drm_ioctl (/work/src.git/linux-stable/./include/linux/thread_info.h:119 /work/src.git/linux-stable/./include/linux/thread_info.h:152 /work/src.git/linux-stable/./include/linux/uaccess.h:151 /work/src.git/linux-stable/drivers/gpu/drm/drm_ioctl.c:888) drm
 ? nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:263) nouveau
 nouveau_drm_ioctl (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_drm.c:1120) nouveau
 ksys_ioctl (??:?)
 __x64_sys_ioctl (??:?)
 do_syscall_64 (??:?)
 entry_SYSCALL_64_after_hwframe (??:?)
RIP: 0033:0x7f71b5568dd7
Code: 00 00 90 48 8b 05 a9 40 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 40 0c 00 f7 d8 64 89 01 48
All code
========
   0:   00 00                   add    %al,(%rax)
   2:   90                      nop
   3:   48 8b 05 a9 40 0c 00    mov    0xc40a9(%rip),%rax        # 0xc40b3
   a:   64 c7 00 26 00 00 00    movl   $0x26,%fs:(%rax)
  11:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
  18:   c3                      retq
  19:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  20:   00 00 00
  23:   b8 10 00 00 00          mov    $0x10,%eax
  28:   0f 05                   syscall
  2a:*  48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax            <-- trapping instruction
  30:   73 01                   jae    0x33
  32:   c3                      retq
  33:   48 8b 0d 79 40 0c 00    mov    0xc4079(%rip),%rcx        # 0xc40b3
  3a:   f7 d8                   neg    %eax
  3c:   64 89 01                mov    %eax,%fs:(%rcx)
  3f:   48                      rex.W

Code starting with the faulting instruction
===========================================
   0:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
   6:   73 01                   jae    0x9
   8:   c3                      retq
   9:   48 8b 0d 79 40 0c 00    mov    0xc4079(%rip),%rcx        # 0xc4089
  10:   f7 d8                   neg    %eax
  12:   64 89 01                mov    %eax,%fs:(%rcx)
  15:   48                      rex.W
RSP: 002b:00007fff1a291988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff1a2919d0 RCX: 00007f71b5568dd7
RDX: 00007fff1a2919d0 RSI: 00000000c0306480 RDI: 000000000000000a
RBP: 00000000c0306480 R08: 0000000000000000 R09: 00005575014822e0
R10: 00007f71b562d9e0 R11: 0000000000000246 R12: 00007fff1a2919d0
R13: 000000000000000a R14: 0000557500582e00 R15: 0000000000000000
Modules linked in: essiv authenc dm_crypt binfmt_misc netconsole configfs sha256_generic libsha256 cfg80211 8021q veth cpuid i2c_dev asus_atk0110 acpi_power_meter it87 hwmon_vid nouveau af_packet bridge stp evdev mxm_wmi llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic video snd_hda_intel ttm snd_intel_dspcfg drm_kms_helper snd_hda_codec snd_hda_core kvm_amd kvm snd_pcm syscopyarea snd_timer sysfillrect fam15h_power k10temp sysimgblt snd irqbypass fb_sys_fops soundcore i2c_piix4 wmi acpi_cpufreq softdog nfs nfsd auth_rpcgss lockd grace drm sunrpc drm_panel_orientation_quirks backlight agpgart usbhid ohci_pci ghash_clmulni_intel cryptd ehci_pci ohci_hcd sr_mod ehci_hcd cdrom xhci_pci xhci_hcd usbcore usb_common 8250 8250_base serial_core
CR2: 000000010050786b
---[ end trace 67649d0c2234e455 ]---
RIP: 0010:__kmalloc (??:?)
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
All code
========
   0:   89 c8                   mov    %ecx,%eax
   2:   65 48 03 05 3f 29 df    add    %gs:0x53df293f(%rip),%rax        # 0x53df2949
   9:   53
   a:   48 8b 70 08             mov    0x8(%rax),%rsi
   e:   48 39 f2                cmp    %rsi,%rdx
  11:   75 e7                   jne    0xfffffffffffffffa
  13:   4c 8b 28                mov    (%rax),%r13
  16:   4d 85 ed                test   %r13,%r13
  19:   0f 84 d8 00 00 00       je     0xf7
  1f:   41 8b 47 20             mov    0x20(%r15),%eax
  23:   49 8b 3f                mov    (%r15),%rdi
  26:   48 8d 4a 08             lea    0x8(%rdx),%rcx
  2a:*  49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx            <-- trapping instruction
  2f:   4c 89 e8                mov    %r13,%rax
  32:   65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)
  37:   0f 94 c0                sete   %al
  3a:   84 c0                   test   %al,%al
  3c:   74 b9                   je     0xfffffffffffffff7
  3e:   41                      rex.B
  3f:   8b                      .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:   49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx
   5:   4c 89 e8                mov    %r13,%rax
   8:   65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)
   d:   0f 94 c0                sete   %al
  10:   84 c0                   test   %al,%al
  12:   74 b9                   je     0xffffffffffffffcd
  14:   41                      rex.B
  15:   8b                      .byte 0x8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x2b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 20 seconds..

--
Alan J. Wylie https://www.wylie.me.uk/

Dance like no-one's watching. / Encrypt like everyone is.
Security is inversely proportional to convenience
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
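
A rough sanity check on the numbers in the oops, offered as a sketch rather than a diagnosis. The trapping instruction is `mov 0x0(%r13,%rax,1),%rbx`, so the address it reads should be R13 + RAX, and that sum should match CR2. The register values below are copied from the dump; the helper names `effective_address` and `is_kernel_half` are hypothetical, and the kernel-half boundary assumes 4-level paging on x86-64.

```python
# Register values copied from the oops in this report.
R13 = 0x000000010050783B  # base register of the trapping mov
RAX = 0x0000000000000030  # index register of the trapping mov
R14 = 0xFFFF8DEF75C07480  # for comparison: looks like a normal kernel pointer
CR2 = 0x000000010050786B  # faulting address reported by the CPU

MASK64 = 0xFFFFFFFFFFFFFFFF
# Assumption: 4-level paging, so kernel addresses start at 0xffff800000000000.
KERNEL_HALF_START = 0xFFFF800000000000


def effective_address(base: int, index: int, scale: int = 1, disp: int = 0) -> int:
    """x86-64 effective address disp + base + index*scale, wrapped to 64 bits."""
    return (disp + base + index * scale) & MASK64


def is_kernel_half(addr: int) -> bool:
    """True if addr lies in the upper (kernel) half of the address space."""
    return addr >= KERNEL_HALF_START


fault = effective_address(R13, RAX)
print(hex(fault), fault == CR2)                  # 0x10050786b True
print(is_kernel_half(R13), is_kernel_half(R14))  # False True
```

So the faulting address is exactly R13 + RAX, and R13 is not a kernel-half address while R14 and R15 are. That is consistent with __kmalloc following a corrupted pointer, but these numbers alone cannot say whether the corruption came from faulty hardware or from a software bug.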