Re: shmem folio changes have broken linux-next

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 06, 2024 at 09:47:19AM +0100, Ryan Roberts wrote:
> Our CI is reporting an oops during boot on linux-next (next-20240806) on arm64. Bisect tells me that it is due to your commit cdc4ad36a871b ("fs: Convert aops->write_begin to take a folio"), but there is no link to a mail thread on the patch and I can't find it in lore.

You're looking in the wrong place ;-)

https://lore.kernel.org/linux-fsdevel/20240717154716.237943-22-willy@xxxxxxxxxxxxx/#Z31mm:shmem.c

> Anyway, I believe the issue is that you are doing this in shmem_write_begin():
> 
>   if (folio_test_has_hwpoisoned(folio)) {
> 
> But folio could be small and I think that function is only safe for large folios? (AFAICT it is unconditionally looking at the flags in the second page?).
> 
> Elsewhere in the file, this pattern is used:
> 
>   if (folio_test_hwpoison(folio) ||
>       (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {

Ugh.  The hwpoison stuff is too complicated.  Because that's wrong too.
It should be ...

if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio) ||
    !folio_test_large(folio) && folio_test_hwpoison(folio))

right?  But that's a mouthful to write.  I'm tempted to rip it all out
and start again ...

> Here is the oops (pretty much as soon as we get into user space):
> 
> [    0.623253] page: refcount:3 mapcount:0 mapping:00000000eebcb8cf index:0x0 pfn:0x18cc07
> [    0.624212] memcg:ffff000142023000
> [    0.624617] aops:shmem_aops ino:800 dentry name:"memfd:snapd-env-generator"
> [    0.625444] flags: 0xbfffe0000040005(locked|referenced|swapbacked|node=0|zone=2|lastcpupid=0x1ffff)
> [    0.626532] raw: 0bfffe0000040005 0000000000000000 dead000000000122 ffff000181dd0ac0
> [    0.627442] raw: 0000000000000000 0000000000000000 00000003ffffffff ffff000142023000
> [    0.628331] page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((void *)0)) && (uintptr_t)(&page->flags) != (uintptr_t)((void *)0) && __builtin_constant_p(*(const unsigned long *)(&page->flags))) ? const_test_bit(PG_head, &page->flags) : generic_test_bit(PG_head, &page->flags)))
> [    0.632106] ------------[ cut here ]------------
> [    0.632630] kernel BUG at include/linux/page-flags.h:308!

I'm glad I made it so noisy instead of silently checking something
that's not the flag we thought it was ...




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux