On Tue, Aug 06, 2024 at 09:47:19AM +0100, Ryan Roberts wrote:
> Our CI is reporting an oops during boot on linux-next (next-20240806) on
> arm64. Bisect tells me that it is due to your commit cdc4ad36a871b ("fs:
> Convert aops->write_begin to take a folio"), but there is no link to a mail
> thread on the patch and I can't find it in lore.

You're looking in the wrong place ;-)

https://lore.kernel.org/linux-fsdevel/20240717154716.237943-22-willy@xxxxxxxxxxxxx/#Z31mm:shmem.c

> Anyway, I believe the issue is that you are doing this in shmem_write_begin():
>
> 	if (folio_test_has_hwpoisoned(folio)) {
>
> But folio could be small and I think that function is only safe for large
> folios? (AFAICT it is unconditionally looking at the flags in the second page?).
>
> Elsewhere in the file, this pattern is used:
>
> 	if (folio_test_hwpoison(folio) ||
> 	    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {

Ugh.  The hwpoison stuff is too complicated.  Because that's wrong too.
It should be ...

	if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio) ||
	    !folio_test_large(folio) && folio_test_hwpoison(folio))

right?  But that's a mouthful to write.  I'm tempted to rip it all out
and start again ...

> Here is the oops (pretty much as soon as we get into user space):
>
> [    0.623253] page: refcount:3 mapcount:0 mapping:00000000eebcb8cf index:0x0 pfn:0x18cc07
> [    0.624212] memcg:ffff000142023000
> [    0.624617] aops:shmem_aops ino:800 dentry name:"memfd:snapd-env-generator"
> [    0.625444] flags: 0xbfffe0000040005(locked|referenced|swapbacked|node=0|zone=2|lastcpupid=0x1ffff)
> [    0.626532] raw: 0bfffe0000040005 0000000000000000 dead000000000122 ffff000181dd0ac0
> [    0.627442] raw: 0000000000000000 0000000000000000 00000003ffffffff ffff000142023000
> [    0.628331] page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((void *)0)) && (uintptr_t)(&page->flags) != (uintptr_t)((void *)0) && __builtin_constant_p(*(const unsigned long *)(&page->flags))) ? const_test_bit(PG_head, &page->flags) : generic_test_bit(PG_head, &page->flags)))
> [    0.632106] ------------[ cut here ]------------
> [    0.632630] kernel BUG at include/linux/page-flags.h:308!

I'm glad I made it so noisy instead of silently checking something that's
not the flag we thought it was ...
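
For illustration only, the "mouthful" predicate could be wrapped in a single helper. The following is a userspace model, not kernel code: `struct model_folio` and every `model_*` name are hypothetical stand-ins invented for this sketch, and the `assert()` only imitates the noisy VM_BUG_ON seen in the oops. It assumes the semantics described in this thread, namely that the has_hwpoisoned flag is only meaningful for large folios.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	bool large;		/* models folio_test_large() */
	bool hwpoison;		/* models folio_test_hwpoison() */
	bool has_hwpoisoned;	/* models the flag kept in the second page */
};

static inline bool model_test_large(const struct model_folio *f)
{
	return f->large;
}

static inline bool model_test_hwpoison(const struct model_folio *f)
{
	return f->hwpoison;
}

/*
 * A small folio has no second page, so asking the question at all is a
 * bug. Fail loudly rather than read an unrelated flag.
 */
static inline bool model_test_has_hwpoisoned(const struct model_folio *f)
{
	assert(f->large);
	return f->has_hwpoisoned;
}

/*
 * Hypothetical helper (the name is made up for this sketch): the
 * predicate proposed above, written so that callers cannot pick the
 * wrong flag test for the folio size.
 */
static inline bool model_folio_contains_hwpoison(const struct model_folio *f)
{
	if (model_test_large(f))
		return model_test_has_hwpoisoned(f);
	return model_test_hwpoison(f);
}
```

A caller such as the check in shmem_write_begin() would then make one call and never touch the large-folio-only test on a small folio. Whether the kernel should grow a helper like this, or whether the hwpoison flags get reworked instead, is left open above.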