On Fri, Mar 08, 2024 at 02:21:30PM +0000, Ryan Roberts wrote:
> > [ 247.788985] BUG: Bad page state in process usemem pfn:ae58c2
> > [ 247.789617] page: refcount:0 mapcount:0 mapping:00000000dc16b680 index:0x1
> > pfn:0xae58c2
> > [ 247.790129] aops:0x0 ino:dead000000000122
> > [ 247.790394] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
> > [ 247.790821] page_type: 0xffffffff()
> > [ 247.791052] raw: 0bfffc0000000000 0000000000000000 fffffc002a963090
> > fffffc002a963090
> > [ 247.791546] raw: 0000000000000001 0000000000000000 00000000ffffffff
> > 0000000000000000
> > [ 247.792258] page dumped because: non-NULL mapping
> > [ 247.792567] Modules linked in:
> > [ 247.792772] CPU: 0 PID: 2052 Comm: usemem Not tainted
> > 6.8.0-rc5-00456-g52fd6cd3bee5 #30
> > [ 247.793300] Hardware name: linux,dummy-virt (DT)
> > [ 247.793680] Call trace:
> > [ 247.793894] dump_backtrace+0x9c/0x100
> > [ 247.794200] show_stack+0x20/0x38
> > [ 247.794460] dump_stack_lvl+0x90/0xb0
> > [ 247.794726] dump_stack+0x18/0x28
> > [ 247.794964] bad_page+0x88/0x128
> > [ 247.795196] get_page_from_freelist+0xdc4/0x1280
> > [ 247.795520] __alloc_pages+0xe8/0x1038
...
> > My sense is that the first deferred split issue is now fully resolved once the
> > extra code above is reinserted, but we still have a second problem. Thoughts?

That seems likely ;-( It doesn't fit the same pattern as the ones we've
been looking at.

> bisect lands back on the same patch it always does; "mm: Allow non-hugetlb large
> folios to be batch processed". Without this change, I can't reproduce the above
> oops.
>
> With that change present, if I "re-narrow" the window as you suggested, I also
> can't reproduce the problem.

Ah, a pre-existing condition ;-(

> As far as I can tell, mapping is zeroed when the page is freed, and the same
> page checks are run at that point too. So mapping must be written to while
> the page is in the buddy? Perhaps something thinks it's still a tail page during
> split, but the buddy thinks it's been freed?

I'll stare at those codepaths; see if I can see anything.

> Also the mapping value 00000000dc16b680 is not a valid kernel address, I don't
> think. So surprised that get_kernel_nofault(host, &mapping->host) works.

Ah, you've been caught by hashed kernel pointers. You can tell because
the top 32 bits are 0. The real pointer is fffffc002a963090 (see the raw
dump).

Actually, I have a clue! The third and fourth word have the same value.
That's indicative of an empty list_head. And if this were LRU, that
would be the second and third word. And the PFN is congruent to 2
modulo 4. So this is the second tail page, and that's an empty
deferred_list. So how do we init a list_head after a folio gets freed?