Re: shmem folio changes have broken linux-next

Ryan Roberts <ryan.roberts@xxxxxxx> · Tue, 6 Aug 2024 16:29:34 +0100

On 06/08/2024 16:16, Matthew Wilcox wrote:
> On Tue, Aug 06, 2024 at 09:47:19AM +0100, Ryan Roberts wrote:
>> Our CI is reporting an oops during boot on linux-next (next-20240806) on arm64. Bisect tells me that it is due to your commit cdc4ad36a871b ("fs: Convert aops->write_begin to take a folio"), but there is no link to a mail thread on the patch and I can't find it in lore.
> 
> You're looking in the wrong place ;-)
> 
> https://lore.kernel.org/linux-fsdevel/20240717154716.237943-22-willy@xxxxxxxxxxxxx/#Z31mm:shmem.c

Ahha, cheers!

> 
>> Anyway, I believe the issue is that you are doing this in shmem_write_begin():
>>
>>   if (folio_test_has_hwpoisoned(folio)) {
>>
>> But folio could be small and I think that function is only safe for large folios? (AFAICT it is unconditionally looking at the flags in the second page?).
>>
>> Elsewhere in the file, this pattern is used:
>>
>>   if (folio_test_hwpoison(folio) ||
>>       (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
> 
> Ugh.  The hwpoison stuff is too complicated.  Because that's wrong too.
> It should be ...
> 
> if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio) ||
>     !folio_test_large(folio) && folio_test_hwpoison(folio))

Err... I clearly don't understand it properly. I guess you sometimes want to
know if any page in the folio is poisioned, and sometimes if a specific page is
poisoned? So

// returns true if any page in the folio is hwpoisoned.
// works for any folio (large or small).
folio_test_hwpoison(folio);

// returns true if the page at index within folio is hwpoisoned.
// works for any folio (large or small).
// BUGs if index out of range.
folio_test_hwpoison_page(folio, index);

Why isn't this the right interface? Why do we have a function that takes a folio
but is only correct to call if the folio is large?

> 
> right?  But that's a mouthful to write.  I'm tempted to rip it all out
> and start again ...
> 
>> Here is the oops (pretty much as soon as we get into user space):
>>
>> [    0.623253] page: refcount:3 mapcount:0 mapping:00000000eebcb8cf index:0x0 pfn:0x18cc07
>> [    0.624212] memcg:ffff000142023000
>> [    0.624617] aops:shmem_aops ino:800 dentry name:"memfd:snapd-env-generator"
>> [    0.625444] flags: 0xbfffe0000040005(locked|referenced|swapbacked|node=0|zone=2|lastcpupid=0x1ffff)
>> [    0.626532] raw: 0bfffe0000040005 0000000000000000 dead000000000122 ffff000181dd0ac0
>> [    0.627442] raw: 0000000000000000 0000000000000000 00000003ffffffff ffff000142023000
>> [    0.628331] page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((void *)0)) && (uintptr_t)(&page->flags) != (uintptr_t)((void *)0) && __builtin_constant_p(*(const unsigned long *)(&page->flags))) ? const_test_bit(PG_head, &page->flags) : generic_test_bit(PG_head, &page->flags)))
>> [    0.632106] ------------[ cut here ]------------
>> [    0.632630] kernel BUG at include/linux/page-flags.h:308!
> 
> I'm glad I made it so noisy instead of silently checking something
> that's not the flag we thought it was ...