Hi, current linux kernel commit 90450a06162e ("Merge tag 'rcu-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks") and the attached config cause the following BUG when booting on a reference AMD Zen4 development server: [ 59.995717] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input4 [ 60.033135] ast 0000:c2:00.0: vgaarb: deactivate vga console [ 60.066230] ast 0000:c2:00.0: [drm] Using default configuration [ 60.070342] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0 [ 60.072843] ast 0000:c2:00.0: [drm] AST 2600 detected [ 60.072851] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter [ 60.099891] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [ 60.115780] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on minor 0 [ 60.135643] fbcon: astdrmfb (fb0) is primary device [ 60.135649] fbcon: Deferring console take-over [ 60.146162] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device [ 60.331802] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input5 [ 60.405807] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0 [ 60.423774] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input6 [ 60.443170] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1 [ 60.460675] ast 0000:c2:00.0: vgaarb: deactivate vga console [ 60.479996] ast 0000:c2:00.0: [drm] Using default configuration [ 60.486603] ast 0000:c2:00.0: [drm] AST 2600 detected [ 60.492249] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter [ 60.499732] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [ 60.508955] BUG: unable to handle page fault for address: ffff8881e98109f0 [ 60.516623] #PF: supervisor write access in kernel mode [ 60.522449] #PF: error_code(0x0002) - not-present page [ 60.528168] PGD 8dbc01067 P4D 8dbc01067 PUD 104c984067 PMD 104c837067 PTE 800ffffe167ef060 [ 60.537394] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI [ 60.543805] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.6.0+ #3 [ 60.552251] Hardware name: AMD Corporation ONYX/ONYX, BIOS ROX100AB 09/14/2023 [ 60.560309] Workqueue: events work_for_cpu_fn [ 60.565173] RIP: 0010:enqueue_timer (/home/amd/git/linux/./include/linux/list.h:1034 /home/amd/git/linux/kernel/time/timer.c:605) [ 60.570129] Code: 44 00 00 55 48 89 e5 41 55 49 89 cd 41 54 49 89 fc 53 48 89 f3 89 d6 48 8d 84 f7 b0 00 00 00 48 8b 08 48 89 0b 48 85 c9 74 04 <48> 89 59 08 48 89 18 48 89 43 08 49 8d 44 24 68 48 0f ab 30 8b 4b All code ======== 0: 44 00 00 add %r8b,(%rax) 3: 55 push %rbp 4: 48 89 e5 mov %rsp,%rbp 7: 41 55 push %r13 9: 49 89 cd mov %rcx,%r13 c: 41 54 push %r12 e: 49 89 fc mov %rdi,%r12 11: 53 push %rbx 12: 48 89 f3 mov %rsi,%rbx 15: 89 d6 mov %edx,%esi 17: 48 8d 84 f7 b0 00 00 lea 0xb0(%rdi,%rsi,8),%rax 1e: 00 1f: 48 8b 08 mov (%rax),%rcx 22: 48 89 0b mov %rcx,(%rbx) 25: 48 85 c9 test %rcx,%rcx 28: 74 04 je 0x2e 2a:* 48 89 59 08 mov %rbx,0x8(%rcx) <-- trapping instruction 2e: 48 8 31: 48 89 43 08 mov %rax,0x8(%rbx) 35: 49 8d 44 24 68 lea 0x68(%r12),%rax 3a: 48 0f ab 30 bts %rsi,(%rax) 3e: 8b .byte 0x8b 3f: 4b rex.WXB Code starting with the faulting instruction =========================================== 0: 48 89 59 08 mov %rbx,0x8(%rcx) 4: 48 89 18 mov %rbx,(%rax) 7: 48 89 43 08 mov %rax,0x8(%rbx) b: 49 8d 44 24 68 lea 0x68(%r12),%rax 10: 48 0f ab 30 bts %rsi,(%rax) 14: 8b .byte 0x8b 15: 4b rex.WXB [ 60.591081] RSP: 0018:ffffc900000dbbe0 EFLAGS: 00010086 [ 60.596908] RAX: ffff888fd59e31b8 RBX: ffff8881ec87c9e8 RCX: ffff8881e98109e8 [ 60.604866] RDX: 0000000000000099 RSI: 0000000000000099 RDI: ffff888fd59e2c40 [ 60.612826] RBP: ffffc900000dbbf8 R08: 0000000000000001 R09: ffff888fd59e2c40 [ 60.620787] R10: 000000000000550d R11: 0000000000000000 R12: ffff888fd59e2c40 [ 60.628748] R13: 00000000ffff1640 R14: 00000000ffff163c R15: 0000000000000000 [ 60.636706] FS: 0000000000000000(0000) GS:ffff888fd5800000(0000) knlGS:0000000000000000 [ 60.645732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 60.652141] CR2: ffff8881e98109f0 CR3: 00000008d5e3c003 CR4: 0000000000770ef0 [ 60.660101] PKRU: 55555554 [ 60.663114] Call Trace: [ 60.665838] <TASK> [ 60.668174] ? show_regs (/home/amd/git/linux/arch/x86/kernel/dumpstack.c:479) [ 60.671971] ? __die (/home/amd/git/linux/arch/x86/kernel/dumpstack.c:421 /home/amd/git/linux/arch/x86/kernel/dumpstack.c:434) [ 60.675375] ? page_fault_oops (/home/amd/git/linux/arch/x86/mm/fault.c:707) [ 60.679942] ? search_bpf_extables (/home/amd/git/linux/kernel/bpf/core.c:765) [ 60.684800] ? enqueue_timer (/home/amd/git/linux/./include/linux/list.h:1034 /home/amd/git/linux/kernel/time/timer.c:605) [ 60.689077] ? srso_alias_return_thunk (/home/amd/git/linux/arch/x86/lib/retpoline.S:181) [ 60.694422] ? search_exception_tables (/home/amd/git/linux/kernel/extable.c:64) [ 60.699571] ? srso_alias_return_thunk (/home/amd/git/linux/arch/x86/lib/retpoline.S:181) [ 60.704917] ? kernelmode_fixup_or_oops (/home/amd/git/linux/arch/x86/mm/fault.c:762) [ 60.710256] ? __bad_area_nosemaphore (/home/amd/git/linux/arch/x86/mm/fault.c:860) [ 60.715505] ? bad_area_nosemaphore (/home/amd/git/linux/arch/x86/mm/fault.c:867) [ 60.720364] ? do_kern_addr_fault (/home/amd/git/linux/arch/x86/mm/fault.c:1227) [ 60.725030] ? exc_page_fault (/home/amd/git/linux/arch/x86/mm/fault.c:1503 /home/amd/git/linux/arch/x86/mm/fault.c:1561) [ 60.729503] ? asm_exc_page_fault (/home/amd/git/linux/./arch/x86/include/asm/idtentry.h:570) [ 60.734174] ? enqueue_timer (/home/amd/git/linux/./include/linux/list.h:1034 /home/amd/git/linux/kernel/time/timer.c:605) [ 60.738453] __mod_timer (/home/amd/git/linux/kernel/time/timer.c:635 /home/amd/git/linux/kernel/time/timer.c:1131) [ 60.742439] ? local_clock_noinstr (/home/amd/git/linux/kernel/sched/clock.c:301) [ 60.747202] add_timer (/home/amd/git/linux/kernel/time/timer.c:1245) [ 60.750798] __queue_delayed_work (/home/amd/git/linux/kernel/workqueue.c:1962) [ 60.755463] queue_delayed_work_on (/home/amd/git/linux/kernel/workqueue.c:1987) [ 60.760226] drm_kms_helper_poll_enable (/home/amd/git/linux/drivers/gpu/drm/drm_probe_helper.c:310) drm_kms_helper [ 60.767229] drm_kms_helper_poll_init (/home/amd/git/linux/drivers/gpu/drm/drm_probe_helper.c:914) drm_kms_helper [ 60.773936] ast_mode_config_init (/home/amd/git/linux/drivers/gpu/drm/ast/ast_mode.c:1931) ast [ 60.779382] ast_device_create (/home/amd/git/linux/drivers/gpu/drm/ast/ast_main.c:518) ast [ 60.784533] ast_pci_probe (/home/amd/git/linux/drivers/gpu/drm/ast/ast_drv.c:106) ast [ 60.789107] local_pci_probe (/home/amd/git/linux/drivers/pci/pci-driver.c:324) [ 60.793292] work_for_cpu_fn (/home/amd/git/linux/kernel/workqueue.c:5621) [ 60.797471] process_one_work (/home/amd/git/linux/kernel/workqueue.c:2630) [ 60.801941] ? process_one_work (/home/amd/git/linux/kernel/workqueue.c:2605) [ 60.806608] worker_thread (/home/amd/git/linux/kernel/workqueue.c:2697 /home/amd/git/linux/kernel/workqueue.c:2784) [ 60.810790] ? __pfx_worker_thread (/home/amd/git/linux/kernel/workqueue.c:2730) [ 60.815554] kthread (/home/amd/git/linux/kernel/kthread.c:388) [ 60.819151] ? __pfx_kthread (/home/amd/git/linux/kernel/kthread.c:341) [ 60.823331] ret_from_fork (/home/amd/git/linux/arch/x86/kernel/process.c:147) [ 60.827318] ? __pfx_kthread (/home/amd/git/linux/kernel/kthread.c:341) [ 60.831498] ret_from_fork_asm (/home/amd/git/linux/arch/x86/entry/entry_64.S:250) [ 60.835878] </TASK> [ 60.838309] Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 ast(+) i2c_algo_bit drm_shmem_helper hid_generic(+) drm_kms_helper uas dax_hmem nvme usbhid usb_storage drm hid ahci(+) libahci i2c_piix4 nvme_core wmi aesni_intel crypto_simd cryptd [ 60.867920] CR2: ffff8881e98109f0 [ 60.871616] ---[ end trace 0000000000000000 ]--- drivers/gpu/drm/drm_probe_helper.c:310 is the dev->mode_config.poll_running assignment here: void drm_kms_helper_poll_enable(struct drm_device *dev) { if (!dev->mode_config.poll_enabled || !drm_kms_helper_poll || dev->mode_config.poll_running) return; if (drm_kms_helper_enable_hpd(dev) || dev->mode_config.delayed_event) reschedule_output_poll_work(dev); dev->mode_config.poll_running = true; <<<<< HERE } EXPORT_SYMBOL(drm_kms_helper_poll_enable); If I revert commit f81bb0ac7872893241319ea82504956676ef02fd ("drm/ast: report connection status on Display Port."), the splat goes away: [ 60.603837] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input4 [ 60.651733] ast 0000:c2:00.0: vgaarb: deactivate vga console [ 60.659978] 4k 16711104 large 0 gb 0 x 1303[ffff888000097000-ffff8880a7ffe000] miss 383488 [ 60.669321] ok. [ 60.670497] ast 0000:c2:00.0: [drm] Using default configuration [ 60.677894] ast 0000:c2:00.0: [drm] AST 2600 detected [ 60.683545] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter [ 60.685381] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0 [ 60.691032] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [ 60.697172] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on minor 0 [ 60.729565] fbcon: astdrmfb (fb0) is primary device [ 60.729570] fbcon: Deferring console take-over [ 60.741322] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device [ 60.928226] ast 0000:c2:00.0: vgaarb: deactivate vga console [ 60.940376] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.0/0003:1D6B:0104.0002/input/input5 [ 60.965436] ast 0000:c2:00.0: [drm] Using default configuration [ 60.972051] ast 0000:c2:00.0: [drm] AST 2600 detected [ 60.977698] ast 0000:c2:00.0: [drm] Using ASPEED DisplayPort transmitter [ 60.985181] ast 0000:c2:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [ 61.000056] [drm] Initialized ast 0.1.0 20120228 for 0000:c2:00.0 on minor 0 [ 61.013486] fbcon: Deferring console take-over [ 61.016918] hid-generic 0003:1D6B:0104.0002: input,hidraw0: USB HID v1.01 Keyboard [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input0 [ 61.018454] ast 0000:c2:00.0: [drm] fb0: astdrmfb frame buffer device [ 61.040853] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input6 [ 61.059112] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1 [ 61.358397] input: OpenBMC virtual_input as /devices/pci0000:00/0000:00:07.1/0000:02:00.4/usb3/3-2/3-2.6/3-2.6:1.1/0003:1D6B:0104.0004/input/input7 [ 61.376885] hid-generic 0003:1D6B:0104.0004: input,hidraw1: USB HID v1.01 Mouse [OpenBMC virtual_input] on usb-0000:02:00.4-2.6/input1 This has happened before when drm_kms_helper_poll_init() was added to an ast connector_init(), see: commit 595cb5e0b832a3e100cbbdefef797b0c27bf725a Author: Kim Phillips <kim.phillips@xxxxxxx> Date: Thu Oct 21 10:30:06 2021 -0500 Revert "drm/ast: Add detect function support" I'm willing to test any proposed changes, esp. if it means not reverting this commit, too, because that will only likely lead to yet another BUG instance if/when another poll_init() gets added in the future. Should the FIXME described in reschedule_output_poll_work() be addressed? Thanks, Kim
Attachment:
ast-splat.config.gz
Description: application/gzip