On Tue, Sep 22, 2015 at 06:39:52PM -0700, Hugh Dickins wrote:
> On Tue, 22 Sep 2015, Andrey Konovalov wrote:
> > On Tue, Sep 22, 2015 at 8:54 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > > On Tue, 22 Sep 2015, Andrey Konovalov wrote:
> > >> If anybody comes up with a patch to fix the original issue I easily
> > >> can test it, since I'm hitting "BUG: Bad page state" in a second when
> > >> fuzzing with KTSAN and Trinity.
> > >
> > > This "BUG: Bad page state" sounds more serious, but I cannot track down
> > > your report of it: please repost - thanks - though on seeing it, I may
> > > well end up with no ideas.
> >
> > The report is below.
>
> Thanks.
>
> > I get it after a few seconds of running Trinity on a kernel with KTSAN
> > and targeting the mlock, munlock and madvise syscalls.
> > Sasha also observed a very similar crash a while ago
> > (https://lkml.org/lkml/2014/11/6/1055).
> > I didn't manage to reproduce this in a kernel build without KTSAN though.
> > The idea was that the data races KTSAN reports might be the explanation
> > of these crashes.
> >
> > BUG: Bad page state in process trinity-c15  pfn:281999
> > page:ffffea000a066640 count:0 mapcount:0 mapping: (null) index:0xd
> > flags: 0x20000000028000c(referenced|uptodate|swapbacked|mlocked)
> > page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > bad because of flags:
> > flags: 0x200000(mlocked)
> > Modules linked in:
> > CPU: 3 PID: 11190 Comm: trinity-c15 Not tainted 4.2.0-tsan #1295
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >  ffffffff821c3b70 0000000000000000 0000000100004741 ffff8800b857f948
> >  ffffffff81e9926c 0000000000000003 ffffea000a066640 ffff8800b857f978
> >  ffffffff811ce045 ffffffff821c3b70 ffffea000a066640 0000000000000001
> > Call Trace:
> >  [<     inline     >] __dump_stack lib/dump_stack.c:15
> >  [<ffffffff81e9926c>] dump_stack+0x63/0x81 lib/dump_stack.c:50
> >  [<ffffffff811ce045>] bad_page+0x115/0x1a0 mm/page_alloc.c:409
> >  [<     inline     >] free_pages_check mm/page_alloc.c:731
> >  [<ffffffff811cf3b8>] free_pages_prepare+0x2f8/0x330 mm/page_alloc.c:922
> >  [<ffffffff811d2911>] free_hot_cold_page+0x51/0x2b0 mm/page_alloc.c:1908
> >  [<ffffffff811d2bcf>] free_hot_cold_page_list+0x5f/0x100 mm/page_alloc.c:1956 (discriminator 3)
> >  [<ffffffff811dd9c1>] release_pages+0x151/0x300 mm/swap.c:967
> >  [<ffffffff811de723>] __pagevec_release+0x43/0x60 mm/swap.c:984
> >  [<     inline     >] pagevec_release include/linux/pagevec.h:69
> >  [<ffffffff811ef36a>] shmem_undo_range+0x4fa/0x9d0 mm/shmem.c:446
> >  [<ffffffff811ef86f>] shmem_truncate_range+0x2f/0x60 mm/shmem.c:540
> >  [<ffffffff811f15d5>] shmem_fallocate+0x555/0x6e0 mm/shmem.c:2086
> >  [<ffffffff812568d0>] vfs_fallocate+0x1e0/0x310 fs/open.c:303
> >  [<     inline     >] madvise_remove mm/madvise.c:326
> >  [<     inline     >] madvise_vma mm/madvise.c:378
> >  [<     inline     >] SYSC_madvise mm/madvise.c:528
> >  [<ffffffff81225548>] SyS_madvise+0x378/0x760 mm/madvise.c:459
> >  [<ffffffff8124ef36>] ? kt_atomic64_store+0x76/0x130 mm/ktsan/sync_atomic.c:161
> >  [<ffffffff81ea8691>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188
> > Disabling lock debugging due to kernel taint
>
> This is totally untested, and one of you may quickly prove me wrong;
> but I went in to fix your "Bad page state (mlocked)" by holding pte
> lock across the down_read_trylock of mmap_sem in try_to_unmap_one(),
> then couldn't see why it would need mmap_sem at all, given how mlock
> and munlock first assert intention by setting or clearing VM_LOCKED
> in vm_flags, then work their way up the vma, taking pte locks.
>
> Calling mlock_vma_page() under pte lock may look suspicious
> at first: but what it does is similar to clear_page_mlock(),
> which we regularly call under pte lock from page_remove_rmap().

Indeed. Looks fishy. But it will probably work.

It was not obvious to me what makes the clearing of VM_LOCKED visible
in try_to_unmap_one() without mmap_sem. After looking some more, it
seems we would rely on page_check_address() in try_to_unmap_one()
having acquire semantics and follow_page_mask() in
munlock_vma_pages_range() having release semantics. But I would prefer
to have something more explicit. That's a mess.

> I'd rather wait to hear whether this appears to work in practice,
> and whether you agree that it should work in theory, before writing
> the proper description. I'd love to lose that down_read_trylock.
>
> You mention how Sasha hit the "Bad page state (mlocked)" back in
> November: that was one of the reasons we reverted Davidlohr's
> i_mmap_lock_read to i_mmap_lock_write in unmap_mapping_range(),
> without understanding why it was needed. Yes, it would lock out
> a concurrent try_to_unmap(), whose setting of PageMlocked was not
> sufficiently serialized by the down_read_trylock of mmap_sem.
>
> But I don't remember the other reasons for that revert (and
> haven't looked very hard as yet): anyone else remember?
I hoped Davidlohr would come back with something after the revert, but
it never happened.

I think the reverted patch was responsible for most of the scalability
boost that the rwsem conversion gave i_mmap_lock...

-- 
 Kirill A. Shutemov