[AMD Official Use Only - General]

Hi,

It is fixed in https://patchwork.freedesktop.org/patch/542647/?series=119384&rev=2
Could you check whether this patch is included?

Thanks,
Jiadong

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Mikhail Gavrilov
Sent: Wednesday, June 21, 2023 3:38 PM
To: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Linux List Kernel Mailing <linux-kernel@xxxxxxxxxxxxxxx>
Subject: [6.4-rc7][regression] slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]

Hi,
after commit 5b711e7f9c73e5ff44d6ac865711d9a05c2a0360 I see a KASAN bug report at every boot:

Backtrace:
[ 18.600551] ==================================================================
[ 18.600558] BUG: KASAN: slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[ 18.600943] Write of size 8 at addr ffff8881e4d3a098 by task kworker/8:1/133
[ 18.600952] CPU: 8 PID: 133 Comm: kworker/8:1 Tainted: G W L ------- --- 6.4.0-0.rc7.53.fc39.x86_64+debug #1
[ 18.600960] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.331 02/24/2023
[ 18.600966] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
[ 18.601253] Call Trace:
[ 18.601256] <TASK>
[ 18.601260] dump_stack_lvl+0x76/0xd0
[ 18.601267] print_report+0xcf/0x670
[ 18.601275] ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[ 18.601573] ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[ 18.601865] kasan_report+0xa8/0xe0
[ 18.601870] ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[ 18.602163] amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[ 18.602455] gfx_v9_0_ring_emit_ib_gfx+0x4cc/0xd50 [amdgpu]
[ 18.602767] ? amdgpu_sw_ring_ib_begin+0x1b4/0x3d0 [amdgpu]
[ 18.603061] amdgpu_ib_schedule+0x7cb/0x1570 [amdgpu]
[ 18.603354] gfx_v9_0_ring_test_ib+0x375/0x540 [amdgpu]
[ 18.603656] ? __pfx_gfx_v9_0_ring_test_ib+0x10/0x10 [amdgpu]
[ 18.603959] ? __pfx_lock_acquire+0x10/0x10
[ 18.603966] amdgpu_ib_ring_tests+0x2bc/0x490 [amdgpu]
[ 18.604260] amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[ 18.604544] process_one_work+0x888/0x1460
[ 18.604551] ? worker_thread+0x2c8/0x12c0
[ 18.604555] ? __pfx_process_one_work+0x10/0x10
[ 18.604562] worker_thread+0x104/0x12c0
[ 18.604567] ? __kthread_parkme+0xc1/0x1f0
[ 18.604573] ? __pfx_worker_thread+0x10/0x10
[ 18.604577] kthread+0x2ee/0x3c0
[ 18.604581] ? __pfx_kthread+0x10/0x10
[ 18.604586] ret_from_fork+0x2c/0x50
[ 18.604593] </TASK>
[ 18.604598] Allocated by task 466:
[ 18.604601] kasan_save_stack+0x33/0x60
[ 18.604606] kasan_set_track+0x25/0x30
[ 18.604610] __kasan_kmalloc+0x8f/0xa0
[ 18.604614] __kmalloc+0x62/0x160
[ 18.604618] amdgpu_ring_mux_init+0x6e/0x1b0 [amdgpu]
[ 18.604905] gfx_v9_0_sw_init+0xffe/0x2930 [amdgpu]
[ 18.605197] amdgpu_device_init+0x3c36/0x7fc0 [amdgpu]
[ 18.605476] amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
[ 18.605753] amdgpu_pci_probe+0x279/0x9a0 [amdgpu]
[ 18.606029] local_pci_probe+0xdd/0x190
[ 18.606034] pci_device_probe+0x23a/0x770
[ 18.606039] really_probe+0x3e2/0xb80
[ 18.606044] __driver_probe_device+0x18c/0x450
[ 18.606048] driver_probe_device+0x4a/0x120
[ 18.606052] __driver_attach+0x1e5/0x4a0
[ 18.606056] bus_for_each_dev+0x109/0x190
[ 18.606061] bus_add_driver+0x2a1/0x570
[ 18.606064] driver_register+0x134/0x460
[ 18.606069] do_one_initcall+0xd5/0x3b0
[ 18.606073] do_init_module+0x238/0x770
[ 18.606079] load_module+0x5581/0x6f10
[ 18.606082] __do_sys_init_module+0x1f2/0x220
[ 18.606086] do_syscall_64+0x60/0x90
[ 18.606091] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 18.606099] The buggy address belongs to the object at ffff8881e4d3a000 which belongs to the cache kmalloc-128 of size 128
[ 18.606106] The buggy address is located 24 bytes to the right of allocated 128-byte region [ffff8881e4d3a000, ffff8881e4d3a080)
[ 18.606115] The buggy address belongs to the physical page:
[ 18.606119] page:00000000024dbf3d refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1e4d3a
[ 18.606126] head:00000000024dbf3d order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 18.606132] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 18.606138] page_type: 0xffffffff()
[ 18.606143] raw: 0017ffffc0010200 ffff8881000428c0 dead000000000122 0000000000000000
[ 18.606148] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 18.606153] page dumped because: kasan: bad access detected
[ 18.606159] Memory state around the buggy address:
[ 18.606162]  ffff8881e4d39f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 18.606167]  ffff8881e4d3a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 18.606172] >ffff8881e4d3a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 18.606176]                              ^
[ 18.606180]  ffff8881e4d3a100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
[ 18.606184]  ffff8881e4d3a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 18.606189] ==================================================================
[ 18.606201] Disabling lock debugging due to kernel taint

From bisect log:
5b711e7f9c73e5ff44d6ac865711d9a05c2a0360 is the first bad commit
commit 5b711e7f9c73e5ff44d6ac865711d9a05c2a0360
Author: Jiadong Zhu <Jiadong.Zhu@xxxxxxx>
Date:   Thu May 25 18:42:15 2023 +0800

    drm/amdgpu: Implement gfx9 patch functions for resubmission

    Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
    during the resubmission scenario.

    Signed-off-by: Jiadong Zhu <Jiadong.Zhu@xxxxxxx>
    Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Cc: stable@xxxxxxxxxxxxxxx # 6.3.x

 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 80 +++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

It appears only on my laptop, an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007 (Radeon 6800M).
I didn't see this problem on my desktop with a Radeon 7900XTX or a Radeon 6900XT.

Is there anything else I can help with?

--
Best Regards,
Mike Gavrilov.