On Thu, Oct 03, 2024 at 10:52:03PM +0100, Lorenzo Stoakes wrote:
> On Fri, Oct 04, 2024 at 02:25:07AM +0500, Mikhail Gavrilov wrote:
> > On Thu, Oct 3, 2024 at 1:45 AM Mikhail Gavrilov
> > <mikhail.v.gavrilov@xxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 2, 2024 at 10:56 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > > We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and
> > > > CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more
> > > > quickly (let us know if you do).
> > >
> > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM_MAPLE_TREE'
> > > # CONFIG_DEBUG_VM_MAPLE_TREE is not set
> > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM'
> > > CONFIG_DEBUG_VM_IRQSOFF=y
> > > CONFIG_DEBUG_VM=y
> > > # CONFIG_DEBUG_VM_MAPLE_TREE is not set
> > > # CONFIG_DEBUG_VM_RB is not set
> > > CONFIG_DEBUG_VM_PGFLAGS=y
> > > CONFIG_DEBUG_VM_PGTABLE=y
> > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_MAPLE_TREE'
> > > # CONFIG_DEBUG_MAPLE_TREE is not set
> > >
> > > Fedora's kernel build uses only CONFIG_DEBUG_VM and it's enough for
> > > reproducing this issue.
> > > Anyway I enabled all three options. I'll try to live for a day without
> > > steam launching. In a day I'll write whether it is reproducing without
> > > steam or not.
> >
> > A day passed, and as expected, the problem did not occur until I launch Steam.
> > But with suggested options the stacktrace looks different.
> > Instead of "KASAN: slab-use-after-free in m_next+0x13b" I see this:
> >
> > [88841.586167] node00000000b4c54d84: data_end 9 != the last slot offset 8
>
> Thanks, looking into the attached dmesg this looks to be identical to the
> issue that Bert reported in the other thread.
>
> The nature of it is that once the corruption happens 'weird stuff' will
> happen after this, luckily this debug mode lets us pick up on the original
> corruption.
>
> Bert is somehow, luckily, able to reproduce it very repeatably, so we have
> been able to get a lot more information, but it's taking time to truly
> narrow it down.
>
> Am working flat out to try to resolve the issue, we have before/after maple
> trees and it seems like a certain operation is resulting in a corrupted
> maple tree (duplicate 0x67ffffff entry).
>
> It is proving very very stubborn to be able to reproduce locally even in a
> controlled environment where the maple tree is manually set up, but am
> continuing my efforts to try to do so as best I can! :)
>
> Will respond here once we have a viable fix.

I cc'd (and tagged) you over there, but I have a fix for this problem, do
give it a try! [0]

[0]: https://lore.kernel.org/linux-mm/20241005064114.42770-1-lorenzo.stoakes@xxxxxxxxxx/

[snip]

Cheers, Lorenzo