Hi, I occasionally see the below trace with Linus' master on an AMD Milan system: [ 25.657322] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 25.665097] #PF: supervisor instruction fetch in kernel mode [ 25.671448] #PF: error_code(0x0010) - not-present page [ 25.677178] PGD 0 P4D 0 [ 25.680000] Oops: 0010 [#1] PREEMPT SMP NOPTI [ 25.684857] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc3+ #71 [ 25.692043] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021 [ 25.701069] RIP: 0010:0x0 [ 25.703993] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 25.711661] RSP: 0018:ffffc90000003eb8 EFLAGS: 00010002 [ 25.717488] RAX: 0000000000000101 RBX: 0000000000000101 RCX: 0000000000000000 [ 25.725449] RDX: 00000000fffef080 RSI: 0000000000000000 RDI: ffff88817751e350 [ 25.733408] RBP: ffffc90000003ed8 R08: 0000000000000000 R09: 00000000000002e6 [ 25.741368] R10: ffff88810dc8b2d0 R11: ffff88bf4de290b0 R12: ffff88817751e350 [ 25.749327] R13: 0000000000000000 R14: ffff88bf4de18dc0 R15: ffff88817751e350 [ 25.757286] FS: 0000000000000000(0000) GS:ffff88bf4de00000(0000) knlGS:0000000000000000 [ 25.766312] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.772720] CR2: ffffffffffffffd6 CR3: 0000000166650002 CR4: 0000000000770ef0 [ 25.780754] PKRU: 55555554 [ 25.783768] Call Trace: [ 25.786492] <IRQ> [ 25.788729] call_timer_fn+0x2e/0x130 [ 25.792814] run_timer_softirq+0x361/0x4a0 [ 25.797381] ? lapic_next_event+0x21/0x30 [ 25.801852] ? clockevents_program_event+0x8f/0xe0 [ 25.807195] __do_softirq+0xfb/0x2db [ 25.811183] irq_exit_rcu+0x98/0xd0 [ 25.815073] sysvec_apic_timer_interrupt+0xac/0xd0 [ 25.820415] </IRQ> [ 25.822751] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 25.828481] RIP: 0010:native_safe_halt+0xb/0x10 [ 25.833533] Code: 08 74 81 eb c0 cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 68 60 00 f4 c3 0f 1f 44 00 00 eb 07 0f 00 2d 59 68 60 00 fb f4 <c3> cc cc cc cc 0f 1f 44 00 00 55 48 89 e5 53 e8 b1 17 ff ff 66 90 [ 25.854485] RSP: 0018:ffffffff82803e10 EFLAGS: 00000206 [ 25.860311] RAX: ffffffff81c06c60 RBX: 0000000000000000 RCX: 0000000000000000 [ 25.868270] RDX: 0000000000000000 RSI: ffffffff825afb2f RDI: ffffffff825bda91 [ 25.876229] RBP: ffffffff82803e18 R08: 00000066a1710f94 R09: 0000000000000001 [ 25.884188] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff82813940 [ 25.892146] R13: ffffffff82813940 R14: 0000000000000000 R15: 0000000000000000 [ 25.900098] ? __sched_text_end+0x6/0x6 [ 25.904462] ? default_idle+0xe/0x20 [ 25.908446] arch_cpu_idle+0x15/0x20 [ 25.912432] default_idle_call+0x33/0xf0 [ 25.916804] do_idle+0x209/0x270 [ 25.920402] cpu_startup_entry+0x20/0x30 [ 25.924774] rest_init+0xd4/0xe0 [ 25.928370] arch_call_rest_init+0xe/0x1b [ 25.932840] start_kernel+0x6bc/0x6e2 [ 25.936922] x86_64_start_reservations+0x24/0x26 [ 25.942069] x86_64_start_kernel+0x75/0x79 [ 25.946635] secondary_startup_64_no_verify+0xb0/0xbb [ 25.952270] Modules linked in: intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm ipmi_si(+) ipmi_devintf wmi_bmof rapl ipmi_msghandler input_leds ccp k10temp mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect hid_generic aesni_intel sysimgblt crypto_simd fb_sys_fops cryptd cec usbhid nvme ahci drm e1000e nvme_core libahci hid i2c_piix4 wmi [ 26.017199] CR2: 0000000000000000 [ 26.020893] ---[ end trace 4ffc93f5c298732e ]--- [ 26.026040] RIP: 0010:0x0 [ 26.028958] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 26.036626] RSP: 0018:ffffc90000003eb8 EFLAGS: 00010002 [ 26.042557] RAX: 0000000000000101 RBX: 0000000000000101 RCX: 0000000000000000 [ 26.050506] RDX: 00000000fffef080 RSI: 0000000000000000 RDI: ffff88817751e350 [ 26.058464] RBP: ffffc90000003ed8 R08: 0000000000000000 R09: 00000000000002e6 [ 26.066423] R10: ffff88810dc8b2d0 R11: ffff88bf4de290b0 R12: ffff88817751e350 [ 26.074380] R13: 0000000000000000 R14: ffff88bf4de18dc0 R15: ffff88817751e350 [ 26.082330] FS: 0000000000000000(0000) GS:ffff88bf4de00000(0000) knlGS:0000000000000000 [ 26.091356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 26.097765] CR2: ffffffffffffffd6 CR3: 0000000166650002 CR4: 0000000000770ef0 [ 26.105722] PKRU: 55555554 [ 26.108736] Kernel panic - not syncing: Fatal exception in interrupt [ 26.116349] Kernel Offset: disabled [ 26.120237] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Adding 'nokaslr' and CONFIG_DEBUG_OBJECTS_TIMERS=y didn't make it more reproducible, but CONFIG_KASAN=y changed the signature to this use-after-free that I used to drive bisect: [ 40.503396] ================================================================== [ 40.514471] input: HID 0b1f:03e9 as /devices/pci0000:20/0000:20:08.1/0000:22:00.3/usb1/1-2/1-2.2/1-2.2:1.1/0003:0B1F:03E9.0004/input/input6 [ 40.521823] BUG: KASAN: use-after-free in enqueue_timer+0x4f/0x1e0 [ 40.536634] hid-generic 0003:0B1F:03E9.0004: input,hidraw1: USB HID v1.00 Mouse [HID 0b1f:03e9] on usb-0000:22:00.3-2.2/input1 [ 40.542696] Write of size 8 at addr ffff8881feb80358 by task kworker/0:2/786 [ 40.542705] CPU: 0 PID: 786 Comm: kworker/0:2 Not tainted 5.15.0-rc4+ #105 [ 40.572608] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021 [ 40.581639] Workqueue: events work_for_cpu_fn [ 40.586507] Call Trace: [ 40.589234] dump_stack_lvl+0x4a/0x5f [ 40.593324] print_address_description.constprop.0+0x28/0x150 [ 40.599750] ? enqueue_timer+0x4f/0x1e0 [ 40.604032] kasan_report.cold+0x82/0xdb [ 40.608435] ? ast_pci_probe+0xb1/0x100 [ast] [ 40.613305] ? enqueue_timer+0x4f/0x1e0 [ 40.617582] __asan_store8+0x6c/0x90 [ 40.621570] enqueue_timer+0x4f/0x1e0 [ 40.625654] internal_add_timer+0x9b/0xd0 [ 40.630127] ? enqueue_timer+0x1e0/0x1e0 [ 40.634501] ? debug_smp_processor_id+0x17/0x20 [ 40.639557] ? get_nohz_timer_target+0x6f/0x290 [ 40.644610] ? lock_timer_base+0xb6/0xe0 [ 40.648986] add_timer+0x29a/0x3a0 [ 40.652781] ? run_timer_softirq+0xa40/0xa40 [ 40.657547] ? preempt_count_sub+0x18/0xc0 [ 40.662120] ? _raw_spin_unlock_irqrestore+0x22/0x40 [ 40.667670] ? drm_connector_list_iter_next+0x1bb/0x230 [drm] [ 40.674163] __queue_delayed_work+0xdc/0x140 [ 40.678932] queue_delayed_work_on+0x6c/0x90 [ 40.683698] drm_kms_helper_poll_enable.part.0+0xf5/0x140 [drm_kms_helper] [ 40.691411] ? drm_kms_helper_poll_fini+0x40/0x40 [drm_kms_helper] [ 40.698344] ? __queue_work+0x640/0x640 [ 40.702623] ? debug_object_init+0x16/0x20 [ 40.707195] drm_kms_helper_poll_init+0xb8/0xc0 [drm_kms_helper] [ 40.713968] ast_mode_config_init+0x5f8/0x800 [ast] [ 40.719419] ? ast_crtc_reset+0xa0/0xa0 [ast] [ 40.724287] ast_device_create.cold+0xab6/0xbdd [ast] [ 40.729931] ast_pci_probe+0xba/0x100 [ast] [ 40.734602] ? __pm_runtime_resume+0x65/0xb0 [ 40.739366] ? ast_pm_suspend+0x70/0x70 [ast] [ 40.739887] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 40.744231] local_pci_probe+0x80/0xd0 [ 40.744237] ? pci_device_shutdown+0x90/0x90 [ 40.753180] ata7.00: supports DRM functions and may not be fully accessible [ 40.755316] work_for_cpu_fn+0x2e/0x50 [ 40.760086] ata7.00: ATA-11: Samsung SSD 860 PRO 256GB, RVM01B6Q, max UDMA/133 [ 40.767845] process_one_work+0x4b9/0x7f0 [ 40.767852] worker_thread+0x47a/0x6b0 [ 40.772971] ata7.00: disabling queued TRIM support [ 40.780081] kthread+0x209/0x240 [ 40.780085] ? process_one_work+0x7f0/0x7f0 [ 40.780089] ? set_kthread_struct+0x80/0x80 [ 40.784562] ata7.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 32), AA [ 40.788736] ret_from_fork+0x22/0x30 [ 40.796196] ata7.00: Features: Trust Dev-Sleep NCQ-sndrcv [ 40.802735] ata7.00: supports DRM functions and may not be fully accessible [ 40.807011] Allocated by task 786: [ 40.807015] kasan_save_stack+0x23/0x50 [ 40.815459] ata7.00: disabling queued TRIM support [ 40.818577] __kasan_kmalloc+0x83/0xa0 [ 40.818581] __kmalloc+0x2f7/0x4c0 [ 40.818585] __devm_drm_dev_alloc+0x2f/0xb0 [drm] [ 40.826718] ata7.00: configured for UDMA/133 [ 40.834032] ast_device_create+0x32/0x2c0 [ast] [ 40.838646] scsi 6:0:0:0: Direct-Access ATA Samsung SSD 860 1B6Q PQ: 0 ANSI: 5 [ 40.842107] ast_pci_probe+0xba/0x100 [ast] [ 40.842117] local_pci_probe+0x80/0xd0 [ 40.842121] work_for_cpu_fn+0x2e/0x50 [ 40.848592] ata7.00: Enabling discard_zeroes_data [ 40.851648] process_one_work+0x4b9/0x7f0 [ 40.855629] sd 6:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB) [ 40.860689] worker_thread+0x47a/0x6b0 [ 40.860694] kthread+0x209/0x240 [ 40.860696] ret_from_fork+0x22/0x30 [ 40.865913] sd 6:0:0:0: Attached scsi generic sg0 type 0 [ 40.870516] Freed by task 1190: [ 40.870519] kasan_save_stack+0x23/0x50 [ 40.870523] kasan_set_track+0x20/0x30 [ 40.871945] sd 6:0:0:0: [sda] Write Protect is off [ 40.871968] sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00 [ 40.872098] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 40.879881] ata7.00: disabled [ 40.884218] kasan_set_free_info+0x24/0x40 [ 40.884222] __kasan_slab_free+0xef/0x120 [ 40.884227] kfree+0xa1/0x3e0 [ 40.978619] drm_dev_release+0x58/0x60 [drm] [ 40.983446] devm_drm_dev_init_release+0x46/0x60 [drm] [ 40.989235] devm_action_release+0x2c/0x40 [ 40.993803] release_nodes+0x62/0x150 [ 40.997887] devres_release_all+0x10a/0x150 [ 41.002551] really_probe+0x177/0x660 [ 41.006636] __driver_probe_device+0x19e/0x230 [ 41.011592] driver_probe_device+0x4e/0xf0 [ 41.016161] __driver_attach+0x102/0x1d0 [ 41.020536] bus_for_each_dev+0xfb/0x150 [ 41.024911] driver_attach+0x2d/0x40 [ 41.028898] bus_add_driver+0x23f/0x300 [ 41.033177] driver_register+0xdc/0x170 [ 41.037455] __pci_register_driver+0xfc/0x110 [ 41.042314] ast_init+0x48/0x1000 [ast] [ 41.046598] do_one_initcall+0x99/0x2d0 [ 41.050877] do_init_module+0x10f/0x3c0 [ 41.055154] load_module+0x43fa/0x4710 [ 41.059336] __do_sys_finit_module+0x12a/0x1b0 [ 41.064291] __x64_sys_finit_module+0x43/0x50 [ 41.069149] do_syscall_64+0x3b/0xc0 [ 41.073137] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.080433] Last potentially related work creation: [ 41.085873] kasan_save_stack+0x23/0x50 [ 41.090151] kasan_record_aux_stack+0xac/0xc0 [ 41.095010] insert_work+0x37/0x170 [ 41.098899] __queue_work+0x208/0x640 [ 41.102982] queue_work_on+0x62/0x80 [ 41.106969] __drm_connector_put_safe+0x80/0xa0 [drm] [ 41.112675] drm_connector_list_iter_next+0x1a0/0x230 [drm] [ 41.118961] drm_mode_config_cleanup+0x12e/0x500 [drm] [ 41.124760] drm_mode_config_init_release+0xe/0x10 [drm] [ 41.130753] drm_managed_release+0x11c/0x240 [drm] [ 41.136164] drm_dev_release+0x46/0x60 [drm] [ 41.140992] devm_drm_dev_init_release+0x46/0x60 [drm] [ 41.146790] devm_action_release+0x2c/0x40 [ 41.151358] release_nodes+0x62/0x150 [ 41.155441] devres_release_all+0x10a/0x150 [ 41.160106] really_probe+0x177/0x660 [ 41.164190] __driver_probe_device+0x19e/0x230 [ 41.169147] driver_probe_device+0x4e/0xf0 [ 41.173716] __driver_attach+0x102/0x1d0 [ 41.178091] bus_for_each_dev+0xfb/0x150 [ 41.182466] driver_attach+0x2d/0x40 [ 41.186452] bus_add_driver+0x23f/0x300 [ 41.190729] driver_register+0xdc/0x170 [ 41.195006] __pci_register_driver+0xfc/0x110 [ 41.199868] ast_init+0x48/0x1000 [ast] [ 41.204151] do_one_initcall+0x99/0x2d0 [ 41.208429] do_init_module+0x10f/0x3c0 [ 41.212707] load_module+0x43fa/0x4710 [ 41.216888] __do_sys_finit_module+0x12a/0x1b0 [ 41.221843] __x64_sys_finit_module+0x43/0x50 [ 41.226704] do_syscall_64+0x3b/0xc0 [ 41.230691] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.237985] The buggy address belongs to the object at ffff8881feb80000 which belongs to the cache kmalloc-8k of size 8192 [ 41.251959] The buggy address is located 856 bytes inside of 8192-byte region [ffff8881feb80000, ffff8881feb82000) [ 41.265157] The buggy address belongs to the page: [ 41.270501] page:00000000eecae4f9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1feb80 [ 41.280985] head:00000000eecae4f9 order:3 compound_mapcount:0 compound_pincount:0 [ 41.289336] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) [ 41.297595] raw: 0017ffffc0010200 0000000000000000 dead000000000122 ffff88810004d180 [ 41.306236] raw: 0000000000000000 0000000000020002 00000001ffffffff 0000000000000000 [ 41.314875] page dumped because: kasan: bad access detected [ 41.322749] Memory state around the buggy address: [ 41.328093] ffff8881feb80200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 41.336152] ffff8881feb80280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 41.344211] >ffff8881feb80300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 41.352267] ^ [ 41.359066] ffff8881feb80380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 41.367123] ffff8881feb80400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 41.375181] ================================================================== That bisection led to this commit: commit aae74ff9caa8de9a45ae2e46068c417817392a26 Author: Ainux <ainux.wang@xxxxxxxxx> Date: Wed May 26 19:15:15 2021 +0800 drm/ast: Add detect function support The existence of the connector cannot be detected, so add the detect function to support. Signed-off-by: Ainux <ainux.wang@xxxxxxxxx> Signed-off-by: Thomas Zimmermann <tzimmermann@xxxxxxx> Link: https://patchwork.freedesktop.org/patch/msgid/20210526111515.40015-1-ainux.wang@xxxxxxxxx drivers/gpu/drm/ast/ast_mode.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) I confirmed that if I revert it from v5.15-rc4 (after reverting its dependent 572994bf18ff "drm/ast: Zero is missing in detect function"), the problem goes away. Full .config, dmesg attached. I can test any possible fixes... Thanks, Kim
Attachment:
aae74ff9caa8-failure.config.xz
Description: application/xz
Attachment:
f6274b06e326.dmesg.xz
Description: application/xz