[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

Comment # 28 on bug 105733 from
(In reply to Andrey Grodzovsky from comment #25)

The same issue still happens here with both projects built from git. There is
also one warning that doesn't seem completely related:
Aug 23 20:41:20 archlinux kernel: ------------[ cut here ]------------
Aug 23 20:41:20 archlinux kernel: CPU update of VM recommended only for large BAR system
Aug 23 20:41:20 archlinux kernel: WARNING: CPU: 5 PID: 1092 at drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2606 amdgpu_vm_init+0x477/0x490 [amdgpu]
Aug 23 20:41:20 archlinux kernel: Modules linked in: bnep nct6775 hwmon_vid joydev btusb btrtl btbcm btintel bluetooth snd_usb_audio snd_usbmidi_lib snd_rawmidi input_leds snd_seq_device ecdh_generic mousedev nls_iso8859_1 nls_cp437 vfat fat btrfs zstd_compress libcrc32c zstd_decompress xxhash xor arc4 amdkfd amd_iommu_v2 amdgpu iwlmvm mac80211 edac_mce_amd led_class kvm_amd iwlwifi snd_hda_codec_realtek chash gpu_sched kvm snd_hda_codec_hdmi snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper irqbypass snd_hda_codec cfg80211 morus1280_avx2 drm morus1280_sse2 morus1280_glue morus640_sse2 morus640_glue snd_hda_core aegis256_aesni aegis128l_aesni aegis128_aesni igb snd_hwdep crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm pcbc snd_timer agpgart evdev ccp sp5100_tco aesni_intel snd syscopyarea i2c_algo_bit sysfillrect
Aug 23 20:41:20 archlinux kernel:  aes_x86_64 wmi_bmof mac_hid crypto_simd sysimgblt raid6_pq cryptd glue_helper fb_sys_fops soundcore k10temp i2c_piix4 dca rfkill rng_core wmi button acpi_cpufreq sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod uas usb_storage hid_uclogic hid_generic usbhid hid ahci libahci xhci_pci libata crc32c_intel xhci_hcd usbcore scsi_mod usb_common
Aug 23 20:41:20 archlinux kernel: CPU: 5 PID: 1092 Comm: Xorg.wrap Tainted: G          O      4.18.0-rc1-5024f8dfe478 #1
Aug 23 20:41:20 archlinux kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Gaming-ITX/ac, BIOS P3.40 11/07/2017
Aug 23 20:41:20 archlinux kernel: RIP: 0010:amdgpu_vm_init+0x477/0x490 [amdgpu]
Aug 23 20:41:20 archlinux kernel: Code: b8 08 d8 ff ff e8 79 89 7c e8 e9 ee fe ff ff 41 89 ef e9 e6 fe ff ff 48 c7 c7 08 65 f0 c0 c6 05 41 af 2b 00 01 e8 a3 8f 37 e8 <0f> 0b 0f b6 8b 60 01 00 00 e9 b4 fc ff ff e8 26 8d 37 e8 66 0f 1f
Aug 23 20:41:20 archlinux kernel: RSP: 0018:ffffacc2c8df7b60 EFLAGS: 00010286
Aug 23 20:41:20 archlinux kernel: RAX: 0000000000000000 RBX: ffff9b10f7bf9000 RCX: 0000000000000006
Aug 23 20:41:20 archlinux kernel: RDX: 0000000000000007 RSI: 0000000000000002 RDI: ffff9b10fe7564d0
Aug 23 20:41:20 archlinux kernel: RBP: ffff9b10f5640000 R08: 0000001856da5330 R09: 0000000000000036
Aug 23 20:41:20 archlinux kernel: R10: 0000000000000424 R11: 000000000006ad48 R12: ffff9b10f7bf90b8
Aug 23 20:41:20 archlinux kernel: R13: 000000000000000a R14: 0000000000000000 R15: 0000000000000000
Aug 23 20:41:20 archlinux kernel: FS:  00007fcf6cc95500(0000) GS:ffff9b10fe740000(0000) knlGS:0000000000000000
Aug 23 20:41:20 archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 23 20:41:20 archlinux kernel: CR2: 00007fcf6cb1d960 CR3: 00000007e1190000 CR4: 00000000003406e0
Aug 23 20:41:20 archlinux kernel: Call Trace:
Aug 23 20:41:20 archlinux kernel:  ? ida_simple_get+0x91/0xf0
Aug 23 20:41:20 archlinux kernel:  amdgpu_driver_open_kms+0x83/0x1d0 [amdgpu]
Aug 23 20:41:20 archlinux kernel:  drm_open+0x20b/0x440 [drm]
Aug 23 20:41:20 archlinux kernel:  drm_stub_open+0xaf/0xf0 [drm]
Aug 23 20:41:20 archlinux kernel:  chrdev_open+0xa3/0x1b0
Aug 23 20:41:20 archlinux kernel:  ? cdev_put.part.3+0x20/0x20
Aug 23 20:41:20 archlinux kernel:  do_dentry_open+0x1ab/0x2d0
Aug 23 20:41:20 archlinux kernel:  path_openat+0x31b/0x1440
Aug 23 20:41:20 archlinux kernel:  ? alloc_set_pte+0x1fd/0x4e0
Aug 23 20:41:20 archlinux kernel:  do_filp_open+0x93/0x100
Aug 23 20:41:20 archlinux kernel:  ? __check_object_size+0x9c/0x171
Aug 23 20:41:20 archlinux kernel:  do_sys_open+0x186/0x210
Aug 23 20:41:20 archlinux kernel:  do_syscall_64+0x4e/0x100
Aug 23 20:41:20 archlinux kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 23 20:41:20 archlinux kernel: RIP: 0033:0x7fcf6cbbc452
Aug 23 20:41:20 archlinux kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 f5 70 0d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
Aug 23 20:41:20 archlinux kernel: RSP: 002b:00007ffe9a15b0a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Aug 23 20:41:20 archlinux kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcf6cbbc452
Aug 23 20:41:20 archlinux kernel: RDX: 0000000000000002 RSI: 00007ffe9a15b180 RDI: 00000000ffffff9c
Aug 23 20:41:20 archlinux kernel: RBP: 00007ffe9a15b130 R08: 0000000000000000 R09: 0000000000000000
Aug 23 20:41:20 archlinux kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe9a15b180
Aug 23 20:41:20 archlinux kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Aug 23 20:41:20 archlinux kernel: ---[ end trace eb5bc55fd8b7f883 ]---


And then the same issue the OP posted:


Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: GPU fault detected: 147 0x00a60401 for process payday2_release pid 6643 thread amdgpu_cs:0 pid 6644
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x06ABF814
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x2B004001
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: VM fault (0x01, vmid 5, pasid 32776) at page 111933460, write from 'TC1' (0x54433100) (4)
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: GPU fault detected: 147 0x00a60401 for process payday2_release pid 6643 thread amdgpu_cs:0 pid 6644
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x06ABF814
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x2B004001
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: VM fault (0x01, vmid 5, pasid 32776) at page 111933460, write from 'TC1' (0x54433100) (4)
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: GPU fault detected: 147 0x00a60401 for process payday2_release pid 6643 thread amdgpu_cs:0 pid 6644
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x06ABF814
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x23004001
Aug 23 19:40:06 archlinux kernel: amdgpu 0000:0d:00.0: VM fault (0x01, vmid 1, pasid 32776) at page 111933460, write from 'TC1' (0x54433100) (4)
Aug 23 19:42:06 archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=519868, emitted seq=519871
Aug 23 19:42:06 archlinux kernel: [drm] GPU recovery disabled.
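The VM fault lines above can be cross-checked by hand. The minimal Python sketch below assumes the client ID is up to four ASCII characters packed most-significant-byte first with NUL padding, which matches 0x54433100 being printed as 'TC1'. It also checks that the printed page number is the VM_CONTEXT1_PROTECTION_FAULT_ADDR value written in decimal. The helper name `decode_client` is hypothetical and not part of amdgpu.

```python
# Hypothetical helper: decode the memory-client ID from an amdgpu VM fault line.
# Assumption: the 32-bit value holds up to four ASCII characters, most
# significant byte first, padded with NUL bytes.
def decode_client(mc_client: int) -> str:
    raw = mc_client.to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii")


FAULT_ADDR = 0x06ABF814  # VM_CONTEXT1_PROTECTION_FAULT_ADDR from the log
PAGE = 111933460         # "at page ..." from the VM fault line

print(decode_client(0x54433100))  # TC1
print(FAULT_ADDR == PAGE)         # True
```

The same check applies to the third fault, whose STATUS value differs (0x23004001) while the address and client are identical.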


It happens with pretty much any Vulkan application after some time, and with
Core-profile OpenGL applications too. It doesn't happen during normal desktop
usage with Chrome.

It happens on 4.18.3; these traces are from 4.18.0-rc1-5024f8dfe478.
X370 chipset (like OP)
RX 480 (same as OP)
Ryzen 7 1700x
Mesa 18.1.6
xorg 1.20.1
i3wm


_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
