On Fri, Oct 04, 2024 at 02:25:07AM +0500, Mikhail Gavrilov wrote:
> On Thu, Oct 3, 2024 at 1:45 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@xxxxxxxxx> wrote:
> >
> > On Wed, Oct 2, 2024 at 10:56 PM Lorenzo Stoakes
> > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and
> > > CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more
> > > quickly (let us know if you do).
> >
> > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM_MAPLE_TREE'
> > # CONFIG_DEBUG_VM_MAPLE_TREE is not set
> > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM'
> > CONFIG_DEBUG_VM_IRQSOFF=y
> > CONFIG_DEBUG_VM=y
> > # CONFIG_DEBUG_VM_MAPLE_TREE is not set
> > # CONFIG_DEBUG_VM_RB is not set
> > CONFIG_DEBUG_VM_PGFLAGS=y
> > CONFIG_DEBUG_VM_PGTABLE=y
> > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_MAPLE_TREE'
> > # CONFIG_DEBUG_MAPLE_TREE is not set
> >
> > Fedora's kernel build uses only CONFIG_DEBUG_VM and it's enough for
> > reproducing this issue.
> > Anyway I enabled all three options. I'll try to live for a day without
> > steam launching. In a day I'll write whether it is reproducing without
> > steam or not.
>
> A day passed, and as expected, the problem did not occur until I launch Steam.
> But with suggested options the stacktrace looks different.
> Instead of "KASAN: slab-use-after-free in m_next+0x13b" I see this:
>
> [88841.586167] node00000000b4c54d84: data_end 9 != the last slot offset 8

Thanks. Looking into the attached dmesg, this appears to be identical to the
issue that Bert reported in the other thread.

The nature of it is that once the corruption happens, 'weird stuff' will
happen afterwards. Luckily, this debug mode lets us pick up on the original
corruption.

Bert is luckily able to reproduce it very reliably, so we have been able to
get a lot more information, but it's taking time to truly narrow it down.

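As a rough illustration of why the earliest report is the one to look at, here
is a minimal Python sketch that scans a saved dmesg for the first maple tree
validation message. The two patterns are taken only from lines quoted in this
thread ("data_end ... != the last slot offset ..." and "BUG at
mas_validate_..."), so treat them as assumptions rather than a complete list
of what the debug code can print. The function name first_maple_report is
hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Patterns copied from the two report lines quoted in this thread. Other
# maple tree debug messages may look different and are NOT covered here.
_PATTERNS = (
    re.compile(r"node[0-9a-f]+: data_end \d+ != the last slot offset \d+"),
    re.compile(r"BUG at mas_validate_\w+:\d+"),
)

# Matches the "[seconds.micros] message" shape of a dmesg line, with or
# without a leading email quote marker.
_TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def first_maple_report(lines: Iterable[str]) -> Optional[Tuple[float, str]]:
    """Return (timestamp, message) of the earliest matching line, or None."""
    best = None
    for line in lines:
        m = _TIMESTAMP.search(line)
        if not m:
            continue  # not a timestamped kernel log line
        stamp, message = float(m.group(1)), m.group(2).strip()
        if any(p.search(message) for p in _PATTERNS):
            # Keep the smallest timestamp seen, in case lines are out of order.
            if best is None or stamp < best[0]:
                best = (stamp, message)
    return best
```

Something like first_maple_report(open("dmesg.txt")) would then point at the
first validation failure, where dmesg.txt is a placeholder for wherever the
log was saved.
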
I am working flat out to try to resolve the issue. We have before/after maple
trees, and it seems that a certain operation is resulting in a corrupted maple
tree (duplicate 0x67ffffff entry).

It is proving very stubborn to reproduce locally, even in a controlled
environment where the maple tree is manually set up, but I am continuing my
efforts to do so as best I can! :)

Will respond here once we have a viable fix.

Thanks again for taking the time to report and to grab the debug maple tree,
it's very useful!

Cheers, Lorenzo

> [88841.586315] BUG at mas_validate_limits:7523 (1)
> [88841.586320] maple_tree(0000000067811125) flags 30F, height 3 root
> 0000000040e0c786
> [88841.586324] 0-ffffffffffffffff: node 000000009b462d47 depth 0 type
> 3 parent 00000000db18456d contents: 10000 11400000 1e000 1f000 1f000
> 75e15000 0 0 0 ffffffff00283000 | 09 09| 000000005518cec0 67FFFFFF
> 0000000085840a0a 79970FFF 00000000975349aa 79F50FFF 00000000afe6ddd8
> 7B140FFF 0000000083c903b1 7BB96FFF 00000000335e109c F605AFFF
> 000000007e7333d1 F6570FFF 00000000d8e9900e F6C92FFF 00000000250ada8a
> F76E1FFF 00000000e567baed
> [88841.586357] 0-67ffffff: node 000000005c64e204 depth 1 type 3
> parent 0000000069e1180e contents: 10000 0 0 0 0 0 0 0 0 0 | 05 00|
> 000000000cfac463 16FFFF 00000000f0522fec 400FFF 00000000cd8938b8
> 94FFFF 00000000d2bcb2e3 E9FFFF 00000000ed8d307e 173FFFF
> 0000000056285bf1 67FFFFFF 0000000000000000 0 0000000000000000 0
> 0000000000000000 0 0000000000000000
> [88841.586388] 0-16ffff: node 0000000037648f62 depth 2 type 1
> parent 00000000978387fd contents: 0000000000000000 FFFF
> 000000000bc2e123 10FFFF 0000000049345b43 11FFFF 000000008940e7cb
> 126FFF 000000007c2365c0 12FFFF 00000000cfc1c890 142FFF
> 00000000b64ae6ea 14FFFF 00000000f8f8f6c9 165FFF 000000008460c3ec
> 16FFFF 0000000000000000 0 0000000000000000 0 0000000000000000 0
> 0000000000000000 0 0000000000000000 0 0000000000000000 0
> 000000009d394510
> [88841.586413] 0-ffff: 0000000000000000
> [88841.586417] 10000-10ffff: 000000000bc2e123
> [88841.586420] 110000-11ffff: 0000000049345b43
> [88841.586424] 120000-126fff: 000000008940e7cb
> [88841.586428] 127000-12ffff: 000000007c2365c0
> [88841.586431] 130000-142fff: 00000000cfc1c890
> [88841.586435] 143000-14ffff: 00000000b64ae6ea
> [88841.586438] 150000-165fff: 00000000f8f8f6c9
> [88841.586442] 166000-16ffff: 000000008460c3ec
> [88841.586445] 170000-400fff: node 0000000030a5de34 depth 2 type 1
> parent 00000000161b9281 contents: 0000000090f8ff7b 171FFF
> 00000000a90cdf09 17FFFF 00000000ad657f59 190FFF 0000000026397ca7
> 19FFFF 000000003413c0f4 1B0FFF 000000000ca6dd7d 1BFFFF
> 00000000cf83b99b 1CEFFF 0000000096a06890 1CFFFF 00000000ed96cdbd
> 1E5FFF 00000000e6e9d2cb 1EFFFF 00000000bc54b9f4 1FFFFF
> 000000006e42b324 3DFFFF 00000000afd4728b 3FFFFF 0000000082572c0c
> 400FFF 0000000000000000 0 00000000e89e29fc
> [88841.586471] 170000-171fff: 0000000090f8ff7b
> [88841.586474] 172000-17ffff: 00000000a90cdf09
> [88841.586478] 180000-190fff: 00000000ad657f59
> [88841.586481] 191000-19ffff: 0000000026397ca7
> [88841.586485] 1a0000-1b0fff: 000000003413c0f4
> [88841.586511] 1b1000-1bffff: 000000000ca6dd7d
> [88841.586515] 1c0000-1cefff: 00000000cf83b99b
> [88841.586519] 1cf000-1cffff: 0000000096a06890
> [88841.586522] 1d0000-1e5fff: 00000000ed96cdbd
> [88841.586526] 1e6000-1effff: 00000000e6e9d2cb
> [88841.586529] 1f0000-1fffff: 00000000bc54b9f4
> [88841.586533] 200000-3dffff: 000000006e42b324
> [88841.586537] 3e0000-3fffff: 00000000afd4728b
> [88841.586540] 400000-400fff: 0000000082572c0c
> [88841.586544] 401000-94ffff: node 00000000f4ffb374 depth 2 type 1
> parent 000000005fb58d4e contents: 000000004eafabe6 403FFF
> 00000000104e2e73 404FFF 000000004dbe1ca9 406FFF 00000000ffb92c1b
> 407FFF 00000000cffd3517 409FFF 000000009ef45250 40FFFF
> 00000000373dd145 410FFF 00000000eaff67b3 50FFFF 000000002e632fe1
> 511FFF 000000001839285f 60FFFF 0000000043d54299 611FFF
> 00000000da2961ba 80FFFF 00000000155e68ba 8C9FFF 0000000010bfe63e
> 8CFFFF 00000000a4834cd3 94FFFF 000000000e628eae
> [88841.586569] 401000-403fff: 000000004eafabe6
> [88841.586572] 404000-404fff: 00000000104e2e73
> [88841.586576] 405000-406fff: 000000004dbe1ca9
> [88841.586579] 407000-407fff: 00000000ffb92c1b
> [88841.586583] 408000-409fff: 00000000cffd3517
> [88841.586586] 40a000-40ffff: 000000009ef45250
> [88841.586590] 410000-410fff: 00000000373dd145
> [88841.586594] 411000-50ffff: 00000000eaff67b3
> [88841.586597] 510000-511fff: 000000002e632fe1
> [88841.586601] 512000-60ffff: 000000001839285f
> [88841.586604] 610000-611fff: 0000000043d54299
> [88841.586608] 612000-80ffff: 00000000da2961ba
> [88841.586611] 810000-8c9fff: 00000000155e68ba
> [88841.586615] 8ca000-8cffff: 0000000010bfe63e
> [88841.586618] 8d0000-94ffff: 00000000a4834cd3
> ***
> [88841.592355] Pass: 3886705433 Run:3886705434
> [88841.592359] CPU: 22 UID: 1000 PID: 273842 Comm: rundll32.exe
> Tainted: G W L
> 6.11.0-rc6-13b-f8d112a4e657c65c888e6b8a8435ef61a66e4ab8+ #720
> [88841.592364] Tainted: [W]=WARN, [L]=SOFTLOCKUP
> [88841.592366] Hardware name: ASUS System Product Name/ROG STRIX
> B650E-I GAMING WIFI, BIOS 3040 09/12/2024
> [88841.592369] Call Trace:
> [88841.592372] <TASK>
> [88841.592376] dump_stack_lvl+0x84/0xd0
> [88841.592384] mt_validate+0x2932/0x2980
> [88841.592397] ? __pfx_mt_validate+0x10/0x10
> [88841.592408] validate_mm+0xa5/0x310
> [88841.592414] ? __pfx_validate_mm+0x10/0x10
> [88841.592427] vms_complete_munmap_vmas+0x572/0x9b0
> [88841.592431] ? __pfx_mas_prev+0x10/0x10
> [88841.592438] mmap_region+0x10f9/0x24a0
> [88841.592447] ? __pfx_mmap_region+0x10/0x10
> [88841.592450] ? __pfx_mark_lock+0x10/0x10
> [88841.592459] ? mark_lock+0xf5/0x16d0
> [88841.592474] ? mm_get_unmapped_area_vmflags+0x48/0xc0
> [88841.592482] ? security_mmap_addr+0x57/0x90
> [88841.592487] ? __get_unmapped_area+0x191/0x2c0
> [88841.592492] do_mmap+0x8cf/0xff0
> [88841.592500] ? __pfx_do_mmap+0x10/0x10
> [88841.592503] ? down_write_killable+0x19d/0x280
> [88841.592506] ? __pfx_down_write_killable+0x10/0x10
> [88841.592513] vm_mmap_pgoff+0x178/0x2f0
> [88841.592521] ? __pfx_vm_mmap_pgoff+0x10/0x10
> [88841.592524] ? lockdep_hardirqs_on+0x7c/0x100
> [88841.592528] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
> [88841.592537] __do_fast_syscall_32+0x86/0x110
> [88841.592540] ? kfree+0x257/0x3a0
> [88841.592547] ? audit_reset_context+0x8c5/0xee0
> [88841.592555] ? lockdep_hardirqs_on_prepare+0x171/0x400
> [88841.592558] ? __do_fast_syscall_32+0x92/0x110
> [88841.592561] ? lockdep_hardirqs_on+0x7c/0x100
> [88841.592564] ? __do_fast_syscall_32+0x92/0x110
> [88841.592571] ? lockdep_hardirqs_on_prepare+0x171/0x400
> [88841.592574] ? __do_fast_syscall_32+0x92/0x110
> [88841.592577] ? lockdep_hardirqs_on+0x7c/0x100
> [88841.592580] ? __do_fast_syscall_32+0x92/0x110
> [88841.592583] ? audit_reset_context+0x8c5/0xee0
> [88841.592590] ? lockdep_hardirqs_on_prepare+0x171/0x400
> [88841.592593] ? __do_fast_syscall_32+0x92/0x110
> [88841.592596] ? lockdep_hardirqs_on+0x7c/0x100
> [88841.592600] ? rcu_is_watching+0x12/0xc0
> [88841.592603] ? trace_irq_disable.constprop.0+0xce/0x110
> [88841.592609] do_fast_syscall_32+0x32/0x80
> [88841.592612] entry_SYSCALL_compat_after_hwframe+0x75/0x75
> [88841.592616] RIP: 0023:0xf7f3e5a9
> [88841.592632] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08
> 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 cd 0f
> 05 cd 80 <5d> 5a 59 c3 cc 90 90 90 2e 8d b4 26 00 00 00 00 8d b4 26 00
> 00 00
> [88841.592635] RSP: 002b:000000000050f450 EFLAGS: 00000256 ORIG_RAX:
> 00000000000000c0
> [88841.592639] RAX: ffffffffffffffda RBX: 0000000001b90000 RCX: 000000000001f000
> [88841.592641] RDX: 0000000000000000 RSI: 0000000000004032 RDI: 00000000ffffffff
> [88841.592644] RBP: 0000000000000000 R08: 000000000050f450 R09: 0000000000000000
> [88841.592646] R10: 0000000000000000 R11: 0000000000000256 R12: 0000000000000000
> [88841.592648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [88841.592658] </TASK>
> [88841.592668] 00000000b4c54d84[9] should not have entry 00000000f0273bd5
>
> Full kernel log attached here below as archive.
>
> --
> Best Regards,
> Mike Gavrilov.