On 06/08/2024 16:16, Matthew Wilcox wrote: > On Tue, Aug 06, 2024 at 09:47:19AM +0100, Ryan Roberts wrote: >> Our CI is reporting an oops during boot on linux-next (next-20240806) on arm64. Bisect tells me that it is due to your commit cdc4ad36a871b ("fs: Convert aops->write_begin to take a folio"), but there is no link to a mail thread on the patch and I can't find it in lore. > > You're looking in the wrong place ;-) > > https://lore.kernel.org/linux-fsdevel/20240717154716.237943-22-willy@xxxxxxxxxxxxx/#Z31mm:shmem.c Ahha, cheers! > >> Anyway, I believe the issue is that you are doing this in shmem_write_begin(): >> >> if (folio_test_has_hwpoisoned(folio)) { >> >> But folio could be small and I think that function is only safe for large folios? (AFAICT it is unconditionally looking at the flags in the second page?). >> >> Elsewhere in the file, this pattern is used: >> >> if (folio_test_hwpoison(folio) || >> (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { > > Ugh. The hwpoison stuff is too complicated. Because that's wrong too. > It should be ... > > if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio) || > !folio_test_large(folio) && folio_test_hwpoison(folio)) Err... I clearly don't understand it properly. I guess you sometimes want to know if any page in the folio is poisioned, and sometimes if a specific page is poisoned? So // returns true if any page in the folio is hwpoisoned. // works for any folio (large or small). folio_test_hwpoison(folio); // returns true if the page at index within folio is hwpoisoned. // works for any folio (large or small). // BUGs if index out of range. folio_test_hwpoison_page(folio, index); Why isn't this the right interface? Why do we have a function that takes a folio but is only correct to call if the folio is large? > > right? But that's a mouthful to write. I'm tempted to rip it all out > and start again ... > >> Here is the oops (pretty much as soon as we get into user space): >> >> [ 0.623253] page: refcount:3 mapcount:0 mapping:00000000eebcb8cf index:0x0 pfn:0x18cc07 >> [ 0.624212] memcg:ffff000142023000 >> [ 0.624617] aops:shmem_aops ino:800 dentry name:"memfd:snapd-env-generator" >> [ 0.625444] flags: 0xbfffe0000040005(locked|referenced|swapbacked|node=0|zone=2|lastcpupid=0x1ffff) >> [ 0.626532] raw: 0bfffe0000040005 0000000000000000 dead000000000122 ffff000181dd0ac0 >> [ 0.627442] raw: 0000000000000000 0000000000000000 00000003ffffffff ffff000142023000 >> [ 0.628331] page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((void *)0)) && (uintptr_t)(&page->flags) != (uintptr_t)((void *)0) && __builtin_constant_p(*(const unsigned long *)(&page->flags))) ? const_test_bit(PG_head, &page->flags) : generic_test_bit(PG_head, &page->flags))) >> [ 0.632106] ------------[ cut here ]------------ >> [ 0.632630] kernel BUG at include/linux/page-flags.h:308! > > I'm glad I made it so noisy instead of silently checking something > that's not the flag we thought it was ...