Comment # 186
on bug 91880
from Chris Heald
I've been doing a lot of experimentation, and I've found a few more things that I think are probably related:

* I can force a system hard-lock by doing anything that disables a monitor. Notably, going full-screen under KDE/Xorg does this, but I can trigger it just as easily by disabling a monitor with xrandr. Fullscreen under GNOME doesn't seem to trigger the issue, which I suspect is because GNOME uses mutter for screen management.
* Occasionally, the system boots up and gets stuck with a 150MHz memory clock, rather than clocking up to the 1500MHz state. This causes the display corruption even if the sclk is set to 500MHz+. Setting the mclk mask manually fixes the display corruption.
* I've been experimenting with different kernels ranging from 4.4 to 4.16rc5. Earlier kernels feel more susceptible to hard-locking, though the later kernels aren't immune to it.
* I tried a fresh Ubuntu 16.04 LTS install, and while it did NOT exhibit the artifacting behavior, the system hard-locked within a few minutes of light desktop usage.

I've had a few classes of exceptions show up in kern.log:

On 4.4, my kde/wayland session hard-froze when moving a window, and produced a log like this:

kernel: [ 116.904013] radeon 0000:06:00.0: GPU fault detected: 146 0x0d8e040c
kernel: [ 116.904017] radeon 0000:06:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0001776C
kernel: [ 116.904019] radeon 0000:06:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E10400C
kernel: [ 116.904021] VM fault (0x0c, vmid 7) at page 96108, read from 'TC3' (0x54433300) (260)
kernel: [ 127.306156] radeon 0000:06:00.0: ring 0 stalled for more than 10404msec
kernel: [ 127.306164] radeon 0000:06:00.0: GPU lockup (current fence id 0x0000000000002419 last fence id 0x0000000000002431 on ring 0)
kernel: [ 127.357942] radeon 0000:06:00.0: Saved 2200 dwords of commands on ring 0.
kernel: [ 127.357961] radeon 0000:06:00.0: GPU softreset: 0x00000009
kernel: [ 127.357963] radeon 0000:06:00.0: GRBM_STATUS=0xF5D01028
kernel: [ 127.357965] radeon 0000:06:00.0: GRBM_STATUS2=0x50000008
kernel: [ 127.357968] radeon 0000:06:00.0: GRBM_STATUS_SE0=0xEC400002
kernel: [ 127.357970] radeon 0000:06:00.0: GRBM_STATUS_SE1=0xEC400002
kernel: [ 127.357972] radeon 0000:06:00.0: GRBM_STATUS_SE2=0x08000002
kernel: [ 127.357974] radeon 0000:06:00.0: GRBM_STATUS_SE3=0xEC000002
kernel: [ 127.357976] radeon 0000:06:00.0: SRBM_STATUS=0x20000040
kernel: [ 127.357978] radeon 0000:06:00.0: SRBM_STATUS2=0x00000000
kernel: [ 127.357980] radeon 0000:06:00.0: SDMA0_STATUS_REG = 0x46CEE557
kernel: [ 127.357982] radeon 0000:06:00.0: SDMA1_STATUS_REG = 0x46CEE557
kernel: [ 127.357984] radeon 0000:06:00.0: CP_STAT = 0x84228600
kernel: [ 127.357986] radeon 0000:06:00.0: CP_STALLED_STAT1 = 0x00000c00
kernel: [ 127.357988] radeon 0000:06:00.0: CP_STALLED_STAT2 = 0x40000000
kernel: [ 127.357991] radeon 0000:06:00.0: CP_STALLED_STAT3 = 0x00000400
kernel: [ 127.357993] radeon 0000:06:00.0: CP_CPF_BUSY_STAT = 0x00000006
kernel: [ 127.357995] radeon 0000:06:00.0: CP_CPF_STALLED_STAT1 = 0x00000003
kernel: [ 127.357997] radeon 0000:06:00.0: CP_CPF_STATUS = 0x80000063
kernel: [ 127.357999] radeon 0000:06:00.0: CP_CPC_BUSY_STAT = 0x00000000
kernel: [ 127.358001] radeon 0000:06:00.0: CP_CPC_STALLED_STAT1 = 0x00000000
kernel: [ 127.358003] radeon 0000:06:00.0: CP_CPC_STATUS = 0x00000000
kernel: [ 127.358005] radeon 0000:06:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: [ 127.358007] radeon 0000:06:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
kernel: [ 127.404670] radeon 0000:06:00.0: GRBM_SOFT_RESET=0x00010001
kernel: [ 127.404725] radeon 0000:06:00.0: SRBM_SOFT_RESET=0x00000100
kernel: [ 127.405874] radeon 0000:06:00.0: GRBM_STATUS=0x00003028
kernel: [ 127.405876] radeon 0000:06:00.0: GRBM_STATUS2=0x00000008
kernel: [ 127.405878] radeon 0000:06:00.0: GRBM_STATUS_SE0=0x00000006
kernel: [ 127.405880] radeon 0000:06:00.0: GRBM_STATUS_SE1=0x00000006
kernel: [ 127.405882] radeon 0000:06:00.0: GRBM_STATUS_SE2=0x00000006
kernel: [ 127.405884] radeon 0000:06:00.0: GRBM_STATUS_SE3=0x00000006
kernel: [ 127.405885] radeon 0000:06:00.0: SRBM_STATUS=0x20000A40
kernel: [ 127.405887] radeon 0000:06:00.0: SRBM_STATUS2=0x00000000
kernel: [ 127.405889] radeon 0000:06:00.0: SDMA0_STATUS_REG = 0x46CEE557
kernel: [ 127.405891] radeon 0000:06:00.0: SDMA1_STATUS_REG = 0x46CEE557
kernel: [ 127.405893] radeon 0000:06:00.0: CP_STAT = 0x00000000
kernel: [ 127.405893] radeon 0000:06:00.0: CP_STAT = 0x00000000
kernel: [ 127.405895] radeon 0000:06:00.0: CP_STALLED_STAT1 = 0x00000000
kernel: [ 127.405896] radeon 0000:06:00.0: CP_STALLED_STAT2 = 0x00000000
kernel: [ 127.405898] radeon 0000:06:00.0: CP_STALLED_STAT3 = 0x00000000
kernel: [ 127.405900] radeon 0000:06:00.0: CP_CPF_BUSY_STAT = 0x00000000
kernel: [ 127.405902] radeon 0000:06:00.0: CP_CPF_STALLED_STAT1 = 0x00000000
kernel: [ 127.405903] radeon 0000:06:00.0: CP_CPF_STATUS = 0x00000000
kernel: [ 127.405905] radeon 0000:06:00.0: CP_CPC_BUSY_STAT = 0x00000000
kernel: [ 127.405907] radeon 0000:06:00.0: CP_CPC_STALLED_STAT1 = 0x00000000
kernel: [ 127.405909] radeon 0000:06:00.0: CP_CPC_STATUS = 0x00000000
kernel: [ 127.405929] radeon 0000:06:00.0: GPU reset succeeded, trying to resume
kernel: [ 127.658172] [drm:ci_dpm_enable [radeon]] *ERROR* ci_start_dpm failed
kernel: [ 127.658189] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
kernel: [ 127.658194] [drm] probing gen 2 caps for device 1022:1453 = 733903/e
kernel: [ 127.658197] [drm] PCIE gen 3 link speeds already enabled
kernel: [ 127.664213] [drm] PCIE GART of 2048M enabled (table at 0x0000000000326000).
kernel: [ 127.664341] radeon 0000:06:00.0: WB enabled
kernel: [ 127.664344] radeon 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000200000c00 and cpu addr 0xffff8807f3799c00
kernel: [ 127.664346] radeon 0000:06:00.0: fence driver on ring 1 use gpu addr 0x0000000200000c04 and cpu addr 0xffff8807f3799c04
kernel: [ 127.664347] radeon 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000200000c08 and cpu addr 0xffff8807f3799c08
kernel: [ 127.664349] radeon 0000:06:00.0: fence driver on ring 3 use gpu addr 0x0000000200000c0c and cpu addr 0xffff8807f3799c0c
kernel: [ 127.664350] radeon 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000200000c10 and cpu addr 0xffff8807f3799c10
kernel: [ 127.664772] radeon 0000:06:00.0: fence driver on ring 5 use gpu addr 0x0000000000078b30 and cpu addr 0xffffc90003c38b30
kernel: [ 127.664933] radeon 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000200000c18 and cpu addr 0xffff8807f3799c18
kernel: [ 127.664934] radeon 0000:06:00.0: fence driver on ring 7 use gpu addr 0x0000000200000c1c and cpu addr 0xffff8807f3799c1c
kernel: [ 127.666482] [drm] ring test on 0 succeeded in 2 usecs
kernel: [ 127.666568] [drm] ring test on 1 succeeded in 2 usecs
kernel: [ 127.666586] [drm] ring test on 2 succeeded in 2 usecs
kernel: [ 127.666735] [drm] ring test on 3 succeeded in 3 usecs
kernel: [ 127.666745] [drm] ring test on 4 succeeded in 3 usecs
kernel: [ 127.692636] [drm] ring test on 5 succeeded in 1 usecs
kernel: [ 127.712543] [drm] UVD initialized successfully.
kernel: [ 127.813896] [drm] ring test on 6 succeeded in 708 usecs
kernel: [ 127.813920] [drm] ring test on 7 succeeded in 3 usecs
kernel: [ 127.813921] [drm] VCE initialized successfully.
kernel: [ 127.814029] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed

On 4.15.10-041510-generic, I left my computer running overnight and came back to it frozen with this in kern.log:

Mar 18 04:25:10 Gaia kernel: [ 559.092721] BUG: stack guard page was hit at 000000001ecd1fa8 (stack is 0000000020941864..00000000cf703fbf)
Mar 18 04:25:10 Gaia kernel: [ 559.092729] kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
Mar 18 04:25:10 Gaia kernel: [ 559.092733] Modules linked in: nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay xfrm_user xfrm4_tunnel tunnel4 l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel ipcomp xfrm_ipcomp udp_tunnel esp4 pppox ah4 af_key xfrm_algo xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib kvm_amd snd_hwdep kvm uvcvideo snd_seq_midi irqbypass snd_seq_midi_event snd_rawmidi crct10dif_pclmul videobuf2_vmalloc crc32_pclmul
Mar 18 04:25:10 Gaia kernel: [ 559.092784] videobuf2_memops videobuf2_v4l2 snd_seq ghash_clmulni_intel videobuf2_core snd_pcm pcbc videodev snd_seq_device media snd_timer joydev aesni_intel aes_x86_64 snd crypto_simd input_leds glue_helper serio_raw soundcore cryptd ccp k10temp shpchp mac_hid wmi_bmof sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu chash radeon i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 r8169 ahci mii libahci wmi gpio_amdpt gpio_generic
Mar 18 04:25:10 Gaia kernel: [ 559.092832] CPU: 5 PID: 7352 Comm: tail Tainted: G W 4.15.10-041510-generic #201803152130
Mar 18 04:25:10 Gaia kernel: [ 559.092834] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F10 12/01/2017
Mar 18 04:25:10 Gaia kernel: [ 559.092881] RIP: 0010:amdgpu_get_pp_num_states+0x88/0x120 [amdgpu]
Mar 18 04:25:10 Gaia kernel: [ 559.092884] RSP: 0018:ffffb3cb8a837ca8 EFLAGS: 00010282
Mar 18 04:25:10 Gaia kernel: [ 559.092888] RAX: 00000000000000d4 RBX: ffffb3cb8a837cac RCX: 0000000000000001
Mar 18 04:25:10 Gaia kernel: [ 559.092890] RDX: 0000000000000000 RSI: ffffffffc087a88c RDI: 0000000000000000
Mar 18 04:25:10 Gaia kernel: [ 559.092893] RBP: ffffb3cb8a837d20 R08: ffffffffc087a865 R09: ffff88c9ecebd98b
Mar 18 04:25:10 Gaia kernel: [ 559.092895] R10: 0000000000000000 R11: ffff88c9ecebd98a R12: ffff88c9ecebd000
Mar 18 04:25:10 Gaia kernel: [ 559.092898] R13: ffffffffc087a858 R14: 00000000000000d4 R15: 0000000000000993
Mar 18 04:25:10 Gaia kernel: [ 559.092901] FS: 00007fccb1787540(0000) GS:ffff88c9fe740000(0000) knlGS:0000000000000000
Mar 18 04:25:10 Gaia kernel: [ 559.092904] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 04:25:10 Gaia kernel: [ 559.092906] CR2: ffffb3cb8a838000 CR3: 00000004a30d0000 CR4: 00000000003406e0
Mar 18 04:25:10 Gaia kernel: [ 559.092909] Call Trace:
Mar 18 04:25:10 Gaia kernel: [ 559.092918] ? tty_insert_flip_string_fixed_flag+0x86/0xe0
Mar 18 04:25:10 Gaia kernel: [ 559.092925] dev_attr_show+0x23/0x60
Mar 18 04:25:10 Gaia kernel: [ 559.092931] sysfs_kf_seq_show+0xa3/0x130
Mar 18 04:25:10 Gaia kernel: [ 559.092935] kernfs_seq_show+0x27/0x30
Mar 18 04:25:10 Gaia kernel: [ 559.092939] seq_read+0xe5/0x430
Mar 18 04:25:10 Gaia kernel: [ 559.092943] kernfs_fop_read+0x137/0x180
Mar 18 04:25:10 Gaia kernel: [ 559.092948] __vfs_read+0x3a/0x170
Mar 18 04:25:10 Gaia kernel: [ 559.092954] ? security_file_permission+0xa1/0xc0
Mar 18 04:25:10 Gaia kernel: [ 559.092958] vfs_read+0x8e/0x130
Mar 18 04:25:10 Gaia kernel: [ 559.092962] SyS_read+0x55/0xc0
Mar 18 04:25:10 Gaia kernel: [ 559.092967] do_syscall_64+0x73/0x130
Mar 18 04:25:10 Gaia kernel: [ 559.092973] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Mar 18 04:25:10 Gaia kernel: [ 559.092976] RIP: 0033:0x7fccb12b5081
Mar 18 04:25:10 Gaia kernel: [ 559.092978] RSP: 002b:00007ffc17d84d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Mar 18 04:25:10 Gaia kernel: [ 559.092982] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007fccb12b5081
Mar 18 04:25:10 Gaia kernel: [ 559.092984] RDX: 0000000000002000 RSI: 00007ffc17d84db0 RDI: 0000000000000003
Mar 18 04:25:10 Gaia kernel: [ 559.092986] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fccb1313b40
Mar 18 04:25:10 Gaia kernel: [ 559.092988] R10: 00000000fffffff3 R11: 0000000000000246 R12: 00007ffc17d84db0
Mar 18 04:25:10 Gaia kernel: [ 559.092991] R13: 0000000000000003 R14: ffffffffffffffff R15: 000055e8f3b747e0
Mar 18 04:25:10 Gaia kernel: [ 559.092994] Code: c7 c2 7a a8 87 c0 be 00 10 00 00 4c 89 e7 e8 d0 08 90 d1 41 89 c7 8b 45 8c 85 c0 74 72 48 8d 5d 8c 45 31 f6 49 c7 c5 58 a8 87 c0 <42> 8b 44 b3 04 44 89 f1 4d 89 e8 83 f8 0a 74 2d 83 f8 02 49 c7
Mar 18 04:25:10 Gaia kernel: [ 559.093080] RIP: amdgpu_get_pp_num_states+0x88/0x120 [amdgpu] RSP: ffffb3cb8a837ca8
Mar 18 04:25:10 Gaia kernel: [ 559.093084] ---[ end trace dbba232a9ca4c5c7 ]---

Possibly related: if I `cat pp_num_states` from a terminal, I get a segmentation fault:

root@Gaia:~# cat /sys/class/drm/card0/device/pp_num_states
Segmentation fault

I'm going to continue to dig. Let me know what logs/tests/whatnot I can provide that would be useful.
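
For anyone trying to confirm the stuck memory clock described in this comment, here is a minimal Python sketch that reports which mclk level is active. It assumes the amdgpu sysfs file /sys/class/drm/card0/device/pp_dpm_mclk lists one level per line as "<index>: <freq>Mhz" with a trailing "*" on the active level; that format and the helper names (parse_dpm_levels, at_lowest_mclk) are assumptions for illustration, not something taken from this report. A single read cannot tell an idle GPU from a stuck one, so it is only meaningful while the GPU is under load.

```python
import os
import re
from typing import List, Tuple

# Assumed path; "card0" matches the pp_num_states path used in this comment.
MCLK_PATH = "/sys/class/drm/card0/device/pp_dpm_mclk"

# Assumed line format: "<index>: <freq>Mhz", optionally followed by " *"
# to mark the currently active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str) -> List[Tuple[int, int, bool]]:
    """Return (index, MHz, is_active) for every line that matches the format."""
    levels = []
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m:
            levels.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return levels


def at_lowest_mclk(text: str) -> bool:
    """True if the active level is the lowest of at least two listed levels."""
    levels = parse_dpm_levels(text)
    active = [mhz for _, mhz, is_active in levels if is_active]
    if len(levels) < 2 or not active:
        return False
    return active[0] == min(mhz for _, mhz, _ in levels)


if __name__ == "__main__":
    # Only touch sysfs if the file is actually present on this machine.
    if os.path.exists(MCLK_PATH):
        with open(MCLK_PATH) as f:
            content = f.read()
        for index, mhz, is_active in parse_dpm_levels(content):
            print(f"level {index}: {mhz} MHz{' (active)' if is_active else ''}")
        if at_lowest_mclk(content):
            print("memory clock is at its lowest level")
```

Running this while something is rendering and seeing the lowest level still marked active would match the 150MHz symptom above.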