On Fri, Dec 08, 2023 at 06:54:19PM +0100, Vlastimil Babka wrote: > On 8/16/23 17:11, Matthew Wilcox (Oracle) wrote: > > We can use a bit in page[1].flags to indicate that this folio belongs > > to hugetlb instead of using a value in page[1].dtors. That lets > > folio_test_hugetlb() become an inline function like it should be. > > We can also get rid of NULL_COMPOUND_DTOR. > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > I think this (as commit 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")) is > causing the bug reported by Luis here: > https://bugzilla.kernel.org/show_bug.cgi?id=218227 Luis, please stop using bugzilla. If you'd sent email like a normal kernel developer, I'd've seen this bug earlier. > > page:000000009006bf10 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x3f8a0 pfn:0x1035c0 > > flags: 0x17fffc000000000(node=0|zone=2|lastcpupid=0x1ffff) > > page_type: 0xffffff7f(buddy) > > raw: 017fffc000000000 ffffe704c422f808 ffffe704c41ac008 0000000000000000 > > raw: 000000000003f8a0 0000000000000005 00000000ffffff7f 0000000000000000 > > page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((vo> > > ------------[ cut here ]------------ > > kernel BUG at include/linux/page-flags.h:314! > > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > CPU: 6 PID: 2435641 Comm: md5sum Not tainted 6.6.0-rc5 #2 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > RIP: 0010:folio_flags+0x65/0x70 > > Code: a8 40 74 de 48 8b 47 48 a8 01 74 d6 48 83 e8 01 48 39 c7 75 bd eb cb 48 8b 07 a8 40 75 c8 48 c7 c6 d8 a7 c3 89 e8 3b c7 fa ff <0f> 0b 66 0f > > > RSP: 0018:ffffad51c0bfb7a8 EFLAGS: 00010246 > > RAX: 000000000000015f RBX: ffffe704c40d7000 RCX: 0000000000000000 > > RDX: 0000000000000000 RSI: ffffffff89be8040 RDI: 00000000ffffffff > > RBP: 0000000000103600 R08: 0000000000000000 R09: ffffad51c0bfb658 > > R10: 0000000000000003 R11: ffffffff89eacb08 R12: 0000000000000035 > > R13: ffffe704c40d7000 R14: 0000000000000000 R15: ffffad51c0bfb930 > > FS: 00007f350c51b740(0000) GS:ffff9b62fbd80000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000555860919508 CR3: 00000001217fe002 CR4: 0000000000770ee0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > PKRU: 55555554 > > Call Trace: > > <TASK> > > ? die+0x32/0x80 > > ? do_trap+0xd6/0x100 > > ? folio_flags+0x65/0x70 > > ? do_error_trap+0x6a/0x90 > > ? folio_flags+0x65/0x70 > > ? exc_invalid_op+0x4c/0x60 > > ? folio_flags+0x65/0x70 > > ? asm_exc_invalid_op+0x16/0x20 > > ? folio_flags+0x65/0x70 > > ? folio_flags+0x65/0x70 > > PageHuge+0x67/0x80 > > isolate_migratepages_block+0x1c5/0x13b0 > > compact_zone+0x746/0xfc0 > > compact_zone_order+0xbb/0x100 > > try_to_compact_pages+0xf0/0x2f0 > > __alloc_pages_direct_compact+0x78/0x210 > > __alloc_pages_slowpath.constprop.0+0xac1/0xdb0 > > __alloc_pages+0x218/0x240 > > folio_alloc+0x17/0x50 > > It's because PageHuge() now does folio_test_hugetlb() which is documented to > assume caller holds a reference, but the test in > isolate_migratepages_block() doesn't. The code is there from 2021 by Oscar, > perhaps it could be changed to take a reference (and e.g. only test > PageHead() before), but it will be a bit involved as the > isolate_or_dissolve_huge_page() it calls has some logic based on the > refcount being zero/non-zero as well. Oscar, what do you think? > Also I wonder if any of the the other existing PageHuge() callers are also > affected because they might be doing so without a reference. I don't think the warning is actually wrong! We're living very dangerously here as PageHuge() could have returned a false positive before this change [1]. Then we assume that compound_nr() returns a consistent result (and compound_order() might, for example, return a value larger than 63, leading to UB). I think there's a place for a folio_test_hugetlb_unsafe(), but that would only remove the warning, and do nothing to fix all the unsafe usage. The hugetlb handling code in isolate_migratepages_block() doesn't seem to have any understanding that it's working with pages that can change under it. I can have a go at fixing it up, but maybe Oscar would prefer to do it? [1] terribly unlikely, but consider what happens if the page starts out as a non-hugetlb compound allocation. Then it is freed and reallocated; the new owner of the second page could have put enough information into it that tricked PageHuge() into returning true. Then we call isolate_or_dissolve_huge_page() which takes a lock, but doesn't take a refcount. Now it gets a bogus hstate and things go downhill from there ...)