On 2/7/25 10:45, Matthew Wilcox wrote:
> On Fri, Feb 07, 2025 at 10:34:52AM +0100, Miklos Szeredi wrote:
>> Seems like page allocation gets an inconsistent page (mapcount != -1)
>> in the report below.
>
> I think you're misreading the report. _mapcount is -1. Which means
> mapcount is 0.
>
>> > Feb 06 08:54:47 archvm kernel: BUG: Bad page state in process rnote pfn:67587
>> > Feb 06 08:54:47 archvm kernel: page: refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x67587

refcount of -1 doesn't look healthy either, should be 0 at this point?

>> > Feb 06 08:54:47 archvm kernel: flags: 0xfffffc8000020(lru|node=0|zone=1|lastcpupid=0x1fffff)
>> > Feb 06 08:54:47 archvm kernel: raw: 000fffffc8000020 dead000000000100 dead000000000122 0000000000000000
>
>                                        flags            lru.next         lru.prev         mapping
>
>> > Feb 06 08:54:47 archvm kernel: raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
>
>                                        index            private          mapcount:refcount memcg_data
>
>> > Feb 06 08:54:47 archvm kernel: page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
>
> So the problem is the lru flag is set.
>
>> > Feb 06 08:54:47 archvm kernel: dump_stack_lvl+0x5d/0x80
>> > Feb 06 08:54:47 archvm kernel: bad_page.cold+0x7a/0x91
>> > Feb 06 08:54:47 archvm kernel: __rmqueue_pcplist+0x200/0xc50
>> > Feb 06 08:54:47 archvm kernel: get_page_from_freelist+0x2ae/0x1740
>> > Feb 06 08:54:47 archvm kernel: __alloc_frozen_pages_noprof+0x184/0x330
>> > Feb 06 08:54:47 archvm kernel: alloc_pages_mpol+0x7d/0x160
>> > Feb 06 08:54:47 archvm kernel: folio_alloc_mpol_noprof+0x14/0x40
>> > Feb 06 08:54:47 archvm kernel: vma_alloc_folio_noprof+0x69/0xb0
>> > Feb 06 08:54:47 archvm kernel: do_anonymous_page+0x32a/0x8b0
>
> It's very weird, because PG_lru is also in PAGE_FLAGS_CHECK_AT_FREE.
> So it should already have been checked and not be set. I'm on holiday
> until Monday, so I'm not going to dive into this any further.

Could be a use-after-free of the page, which sets PG_lru again. The list
corruptions in __rmqueue_pcplist also suggest some page manipulation after
free. The -1 refcount suggests somebody was using the page while it was
freed due to refcount dropping to 0 and then did a put_page()?
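For readers decoding the two raw lines by hand, here is a minimal C sketch of the arithmetic, assuming an x86-64 build (64-bit little-endian words, with _mapcount and _refcount sharing the third word of the second raw line, _mapcount in the low 32 bits). The helper names are hypothetical and this is not kernel code. It checks that a stored _mapcount of -1 means a mapcount of 0, that the same word gives a refcount of -1, that the flags word has 0x20 set (the bit this dump decodes as lru), and that the two lru pointers equal LIST_POISON1/LIST_POISON2 (0x100 and 0x122 plus the x86-64 default POISON_POINTER_DELTA of 0xdead000000000000), the values list_del() writes into a removed entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers for reading dump_page() "raw:" words by hand.
 * Assumes x86-64: 64-bit little-endian words, with _mapcount and
 * _refcount sharing one word, _mapcount first (low 32 bits).
 * Illustrative only, not kernel code.
 */
#define TOY_PG_LRU_BIT   5                       /* 0x20, decoded as "lru" in this dump */
#define TOY_POISON_DELTA 0xdead000000000000ULL   /* x86-64 default */
#define TOY_LIST_POISON1 (0x100ULL + TOY_POISON_DELTA)
#define TOY_LIST_POISON2 (0x122ULL + TOY_POISON_DELTA)

/* Low 32 bits: stored _mapcount. Relies on two's-complement conversion,
 * as GCC and Clang implement it. */
static int32_t raw_mapcount_field(uint64_t word)
{
	return (int32_t)(uint32_t)(word & 0xffffffffULL);
}

/* High 32 bits: stored _refcount. */
static int32_t raw_refcount_field(uint64_t word)
{
	return (int32_t)(uint32_t)(word >> 32);
}

/* _mapcount is stored with a -1 bias, so -1 means "mapped 0 times". */
static int mapcount_from_field(int32_t stored)
{
	return stored + 1;
}

static bool lru_flag_set(uint64_t flags)
{
	return (flags >> TOY_PG_LRU_BIT) & 1;
}

/* list_del() leaves these two poison values in the removed entry. */
static bool looks_list_del_poisoned(uint64_t next, uint64_t prev)
{
	return next == TOY_LIST_POISON1 && prev == TOY_LIST_POISON2;
}

/* Apply the helpers to the words from the report above. */
static void decode_report(void)
{
	uint64_t flags = 0x000fffffc8000020ULL;
	uint64_t counts = 0xffffffffffffffffULL;

	assert(lru_flag_set(flags));
	assert(looks_list_del_poisoned(0xdead000000000100ULL,
				       0xdead000000000122ULL));
	assert(raw_mapcount_field(counts) == -1);
	assert(mapcount_from_field(raw_mapcount_field(counts)) == 0);
	assert(raw_refcount_field(counts) == -1);
}
```

Which half holds which counter only matters in general; in this report both halves are 0xffffffff, so _mapcount and _refcount both read as -1.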
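The use-after-free hypothesis in the reply above can be illustrated with a userspace toy model in C11. Everything here is hypothetical (toy_page, toy_alloc(), toy_put() and the other names are made up, not the kernel's struct page API), and a page is reduced to a refcount plus an lru flag. The owner's last put frees the page, a stale user then sets lru on the freed page and does one put too many, leaving exactly the state seen in the report: refcount -1 with lru set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the suspected sequence. Not the kernel's
 * struct page, put_page() or allocator; names are made up.
 */
struct toy_page {
	atomic_int refcount;  /* 0 once the page has been freed */
	bool lru;             /* stands in for PG_lru */
	bool freed;           /* set on the 1 -> 0 transition */
};

/* A freshly allocated page starts with one reference. */
static void toy_alloc(struct toy_page *p)
{
	atomic_init(&p->refcount, 1);
	p->lru = false;
	p->freed = false;
}

/* Drop a reference; the 1 -> 0 transition frees the page. This toy
 * version has no underflow check, so an extra put goes to -1. */
static void toy_put(struct toy_page *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1)
		p->freed = true;
}

/* Loosely mirrors a prep-time check: a page about to be handed out
 * must have a zero refcount and no lru flag. */
static bool toy_ok_to_allocate(struct toy_page *p)
{
	return atomic_load(&p->refcount) == 0 && !p->lru;
}

/* Hypothesised sequence: the owner frees the page, a stale user keeps
 * touching it (setting "PG_lru" again) and then puts it once more. */
static int toy_use_after_free(struct toy_page *p)
{
	toy_alloc(p);    /* refcount 1 */
	toy_put(p);      /* owner's last put: 1 -> 0, page freed */
	p->lru = true;   /* stale user touches the freed page */
	toy_put(p);      /* stale put: 0 -> -1 */

	assert(p->freed);
	assert(!toy_ok_to_allocate(p));
	return atomic_load(&p->refcount);
}
```

Without the stale user, the same page would pass the toy check after its last put (refcount 0, lru clear).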