Re: [PATCH 0/5] Remove some races around folio_test_hugetlb

On 2024/3/2 5:47, Matthew Wilcox (Oracle) wrote:
> Oscar and I have been exchanging a bit of email recently about the
> bug reported here:
> https://lore.kernel.org/all/ZXNhGsX32y19a2Xv@xxxxxxxxxxxxxxxxxxxx

Thanks for your patch.

> 
> I've come to the conclusion that folio_test_hugetlb() is just too fragile
> as it can give both false positives and false negatives, as well as
> resulting in the above bug.  With this patch series, it becomes a lot
> more robust.  In the memory-failure case, we always hold the hugetlb_lock
> so it's perfectly reliable.  In the compaction case, it's unreliable, but
> the failures are acceptable and we recheck after taking the hugetlb_lock.

I encountered a similar issue with the PageSwapCache() check while running memory-failure tests:

[66258.945079] page:00000000135e1205 refcount:1 mapcount:0 mapping:0000000000000000 index:0x9b pfn:0xa04e9a
[66258.949096] head:0000000038449724 order:9 entire_mapcount:1 nr_pages_mapped:0 pincount:0
[66258.949485] memcg:ffff95fb43379000
[66258.950334] anon flags: 0x6fffc00000a0068(uptodate|lru|head|mappedtodisk|swapbacked|node=1|zone=2|lastcpupid=0x3fff)
[66258.951212] page_type: 0xffffffff()
[66258.951882] raw: 06fffc0000000000 ffffc89628138001 dead000000000122 dead000000000400
[66258.952273] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[66258.952884] head: 06fffc00000a0068 ffffc896218a8008 ffffc89621680008 ffff95fb4349c439
[66258.953239] head: 0000000700000600 0000000000000000 00000001ffffffff ffff95fb43379000
[66258.953725] page dumped because: VM_BUG_ON_PAGE(PageTail(page))
[66258.954497] ------------[ cut here ]------------
[66258.954937] kernel BUG at include/linux/page-flags.h:313!
[66258.956502] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[66258.957001] CPU: 14 PID: 174237 Comm: page-types Kdump: loaded Not tainted 6.8.0-rc1-00162-gd162e170f118 #11
[66258.957001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[66258.958415] RIP: 0010:folio_flags.constprop.0+0x1c/0x50
[66258.958415] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 8b 57 08 48 89 f8 83 e2 01 74 12 48 c7 c6 a0 59 34 a7 48 89 c7 e8 b5 60 e8 ff 90 <0f> 0b 66 90 c3 cc cc cc cc f7 c7 ff 0f 00 00 75 1a 48 8b 17 83 e2
[66258.958415] RSP: 0018:ffffa0f38ae53e00 EFLAGS: 00000282
[66258.958415] RAX: 0000000000000033 RBX: 0000000000000000 RCX: ffff96031fd9c9c8
[66258.958415] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff96031fd9c9c0
[66258.958415] RBP: ffffc8962813a680 R08: ffffffffa7756f88 R09: 0000000000009ffb
[66258.962155] R10: 000000000000054a R11: ffffffffa7726fa0 R12: 06fffc0000000000
[66258.962155] R13: 0000000000000000 R14: 00007fff93bf1348 R15: 0000000000a04e9a
[66258.962155] FS:  00007f47cc5c4740(0000) GS:ffff96031fd80000(0000) knlGS:0000000000000000
[66258.962155] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66258.962155] CR2: 00007fff93c7b000 CR3: 0000000850c28000 CR4: 00000000000006f0
[66258.962155] Call Trace:
[66258.962155]  <TASK>
[66258.965730]  ? die+0x32/0x90
[66258.965730]  ? do_trap+0xdf/0x110
[66258.965730]  ? folio_flags.constprop.0+0x1c/0x50
[66258.965730]  ? do_error_trap+0x8b/0x110
[66258.965730]  ? folio_flags.constprop.0+0x1c/0x50
[66258.965730]  ? folio_flags.constprop.0+0x1c/0x50
[66258.965730]  ? exc_invalid_op+0x53/0x70
[66258.965730]  ? folio_flags.constprop.0+0x1c/0x50
[66258.965730]  ? asm_exc_invalid_op+0x1a/0x20
[66258.965730]  ? folio_flags.constprop.0+0x1c/0x50
[66258.965730]  stable_page_flags+0x210/0x940
[66258.965730]  kpageflags_read+0x97/0xf0
[66258.965730]  vfs_read+0xa0/0x370
[66258.965730]  __x64_sys_pread64+0x90/0xc0
[66258.965730]  do_syscall_64+0xcd/0x1e0
[66258.965730]  entry_SYSCALL_64_after_hwframe+0x6f/0x77
[66258.965730] RIP: 0033:0x7f47cc31274a
[66258.969711] Code: 44 24 78 00 00 00 00 e9 2b f1 ff ff 0f 1f 40 00 f3 0f 1e fa 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[66258.969711] RSP: 002b:00007fff93af1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
[66258.969711] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f47cc31274a
[66258.969711] RDX: 0000000000000008 RSI: 00007fff93bf1340 RDI: 0000000000000004
[66258.969711] RBP: 00007fff93af12e0 R08: 0000000000000001 R09: 8100000000a04e99
[66258.969711] R10: 00000000050274d0 R11: 0000000000000246 R12: 00007fff93cf1588
[66258.972680] R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f47cc609040
[66258.972680]  </TASK>
[66258.972680] Modules linked in: mce_inject hwpoison_inject

After debugging, I think the race below leads to the above panic:

 CPU1								CPU2
 kpageflags_read
  stable_page_flags
   PageSwapCache() checks a 4k page without holding a page refcount
    folio_test_swapcache(page_folio(page));
     folio_test_swapbacked(folio) && /* page is swapbacked. */

								 page is freed into buddy and merged into larger order.
								 page is allocated as THP tail page.

     test_bit(PG_swapcache, folio_flags(folio, 0)); /* BUG_ON PageTail check in folio_flags. It's a tail page now! */

So the PageSwapCache() test is fragile too. Any thoughts on how to fix this similar issue?
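
To make the window concrete, here is a minimal userspace sketch of the interleaving above. It is only a model: struct model_page, the MODEL_* bits and all three functions are hypothetical names invented for illustration, not kernel code, and the "racing" CPU2 step is simulated by an explicit call rather than by a real second thread. The single-snapshot variant is included purely for contrast; I am not claiming it is the right fix for stable_page_flags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the model; not the kernel's PG_* values. */
enum { MODEL_SWAPBACKED = 1u << 0, MODEL_SWAPCACHE = 1u << 1 };

/* Hypothetical stand-in for struct page. */
struct model_page {
	unsigned long flags;
	struct model_page *head;	/* non-NULL means "this is a tail page" */
};

/* Models CPU2: the page is freed, merged, and reallocated as a THP tail. */
static void model_become_tail(struct model_page *page, struct model_page *head)
{
	page->flags = 0;
	page->head = head;
}

/*
 * Models the two-step check done without a reference held.
 * If racing_head is non-NULL, the CPU2 step runs between the two reads.
 * Returns 1 or 0 for the test result, or -1 where the kernel would hit
 * VM_BUG_ON_PAGE(PageTail(page)).
 */
static int model_swapcache_racy(struct model_page *page,
				struct model_page *racing_head)
{
	assert(page != NULL);

	if (page->head != NULL)		/* resolve a tail to its head first */
		page = page->head;

	if (!(page->flags & MODEL_SWAPBACKED))	/* first read */
		return 0;

	if (racing_head != NULL)	/* the window: CPU2 runs here */
		model_become_tail(page, racing_head);

	if (page->head != NULL)		/* second access sees a tail page */
		return -1;

	return (page->flags & MODEL_SWAPCACHE) != 0;
}

/* Contrast: read the flags once and test both bits on the local copy. */
static int model_swapcache_snapshot(const struct model_page *page)
{
	unsigned long flags = page->flags;
	unsigned long both = MODEL_SWAPBACKED | MODEL_SWAPCACHE;

	return (flags & both) == both;
}
```

Calling model_swapcache_racy() with a NULL racing_head gives the normal answer; passing a head page makes the second access land on a tail page, which is the point where the real kernel trips the VM_BUG_ON_PAGE in folio_flags().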

Thanks.

> 
> The cost of this reliability is that we now consume the word I recently
> freed in folio->page[1].  I think this is acceptable; we've still gained
> a completely reliable folio_test_hugetlb() (which we didn't have before
> I started messing around with the folio dtors).  Non-hugetlb users
> can use large_id as a pointer to something else entirely, or even as a
> non-pointer, as long as they can guarantee it can't conflict (ie don't
> use it as a bitfield).
> 
> So far, this is working for me.  Some stress testing would be appreciated.
> 
> Matthew Wilcox (Oracle) (5):
>   hugetlb: Make folio_test_hugetlb safer to call
>   hugetlb: Add hugetlb_pfn_folio
>   memory-failure: Use hugetlb_pfn_folio
>   memory-failure: Reorganise get_huge_page_for_hwpoison()
>   compaction: Use hugetlb_pfn_folio in isolate_migratepages_block
> 
>  include/linux/hugetlb.h    | 13 ++-----
>  include/linux/mm.h         |  8 -----
>  include/linux/mm_types.h   |  4 ++-
>  include/linux/page-flags.h | 25 +++----------
>  kernel/vmcore_info.c       |  3 +-
>  mm/compaction.c            | 16 ++++-----
>  mm/huge_memory.c           | 10 ++----
>  mm/hugetlb.c               | 72 +++++++++++++++++++++++++++++---------
>  mm/memory-failure.c        | 14 +++++---
>  9 files changed, 87 insertions(+), 78 deletions(-)
> 




