On 30 May 2024, at 7:42, Johannes Weiner wrote:

> On Wed, May 29, 2024 at 09:04:25PM -0400, Johannes Weiner wrote:
>> Subject: [PATCH] mm: page_alloc: fix highatomic typing in multi-block buddies
>
> Argh, I dropped the reserve_highatomic_pageblock() caller update when
> removing the printks right before sending out. My apologies. Here is
> the fixed version:
>
> ---
>
> From 6aa9498ee0d7161b0605251116d16b18cd448552 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Wed, 29 May 2024 18:18:12 -0400
> Subject: [PATCH] mm: page_alloc: fix highatomic typing in multi-block buddies
>
> Christoph reports a page allocator splat triggered by xfstests:
>
> generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30
> [] XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273
> [] XFS (nvme0n1): Ending clean mount
> [] XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
> [] XFS (nvme1n1): Ending clean mount
> [] XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
> [] XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9
> [] XFS (nvme1n1): Ending clean mount
> [] ------------[ cut here ]------------
> [] page type is 3, passed migratetype is 1 (nr=512)
> [] WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0
> [] Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug]
> [] CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437
> [] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [] RIP: 0010:expand+0x1c5/0x1f0
> [] Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3
> [] RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082
> [] RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000
> [] RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff
> [] RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001
> [] R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009
> [] R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000
> [] FS: 00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000
> [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [] CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0
> [] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [] PKRU: 55555554
> [] Call Trace:
> [] <TASK>
> [] ? __warn+0x7b/0x120
> [] ? expand+0x1c5/0x1f0
> [] ? report_bug+0x191/0x1c0
> [] ? handle_bug+0x3c/0x80
> [] ? exc_invalid_op+0x17/0x70
> [] ? asm_exc_invalid_op+0x1a/0x20
> [] ? expand+0x1c5/0x1f0
> [] ? expand+0x1c5/0x1f0
> [] __rmqueue_pcplist+0x3a9/0x730
> [] get_page_from_freelist+0x7a0/0xf00
> [] __alloc_pages_noprof+0x153/0x2e0
> [] __folio_alloc_noprof+0x10/0xa0
> [] __filemap_get_folio+0x16b/0x370
> [] iomap_write_begin+0x496/0x680
>
> While trying to service a movable allocation (page type 1), the page
> allocator runs into a two-pageblock buddy on the movable freelist
> whose second block is typed as highatomic (page type 3).
>
> This inconsistency is caused by the highatomic reservation system
> operating on single pageblocks, while MAX_ORDER can be bigger than
> that - in this configuration, pageblock_order is 9 while
> MAX_PAGE_ORDER is 10. The test case is observed to make several
> adjacent order-3 requests with __GFP_DIRECT_RECLAIM cleared, which
> marks the surrounding block as highatomic. Upon freeing, the blocks
> merge into an order-10 buddy. When the highatomic pool is drained
> later on, this order-10 buddy gets moved back to the movable list, but
> only the first pageblock is marked movable again. A subsequent
> expand() of this buddy warns about the tail being of a different type.
>
> This is a long-standing bug that's surfaced by the recent block type
> warnings added to the allocator. The consequences seem mostly benign,
> it just results in odd behavior: the highatomic tail blocks are not
> properly drained, instead they end up on the movable list first, then
> go back to the highatomic list after an alloc-free cycle.
>
> To fix this, make the highatomic reservation code aware that
> allocations/buddies can be larger than a pageblock.
>
> While it's an old quirk, the recently added type consistency warnings
> seem to be the most prominent consequence of it. Set the Fixes: tag
> accordingly to highlight this backporting dependency.
>
> Fixes: e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting")
> Reported-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++----------------
>  1 file changed, 34 insertions(+), 16 deletions(-)
>

The changes look good to me. Thank you for the explanation to my question.

Reviewed-by: Zi Yan <ziy@xxxxxxxxxx>

--
Best Regards,
Yan, Zi
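
For anyone following the thread without the diff at hand, the mismatch described in the commit message can be reproduced in a small userspace model. This is only a sketch under stated assumptions: it is not the kernel code and not the patch itself, and every `model_*` / `MODEL_*` name is hypothetical. The only values taken from the report are pageblock_order 9, MAX_PAGE_ORDER 10, and the migratetype numbers 1 (movable) and 3 (highatomic) from the warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. Numeric values follow the splat:
 * "page type is 3, passed migratetype is 1". */
enum model_migratetype { MODEL_MOVABLE = 1, MODEL_HIGHATOMIC = 3 };

#define MODEL_PAGEBLOCK_ORDER 9   /* pageblock_order in the reported config */
#define MODEL_MAX_PAGE_ORDER  10  /* MAX_PAGE_ORDER in the reported config */

/* Number of pageblocks spanned by a free buddy of the given order. */
static unsigned long model_blocks_in_buddy(unsigned int order)
{
	assert(order <= MODEL_MAX_PAGE_ORDER);
	if (order <= MODEL_PAGEBLOCK_ORDER)
		return 1;
	return 1UL << (order - MODEL_PAGEBLOCK_ORDER);
}

/* Behaviour described as buggy: only the first pageblock is retyped. */
static void model_retype_first_block(int *block_type,
				     unsigned long first_block, int new_type)
{
	block_type[first_block] = new_type;
}

/* Behaviour the fix aims for: retype every pageblock the buddy covers. */
static void model_retype_whole_buddy(int *block_type,
				     unsigned long first_block,
				     unsigned int order, int new_type)
{
	unsigned long i, nr = model_blocks_in_buddy(order);

	for (i = 0; i < nr; i++)
		block_type[first_block + i] = new_type;
}

/* Index (relative to the buddy) of the first pageblock whose type differs
 * from the expected one, or -1 if the whole buddy is consistent. */
static long model_first_mismatch(const int *block_type,
				 unsigned long first_block,
				 unsigned int order, int expected)
{
	unsigned long i, nr = model_blocks_in_buddy(order);

	for (i = 0; i < nr; i++)
		if (block_type[first_block + i] != expected)
			return (long)i;
	return -1;
}
```

Starting from two highatomic pageblocks merged into one order-10 buddy, retyping with model_retype_first_block() leaves the second block typed highatomic, which is the condition the expand() warning reports. Retyping with model_retype_whole_buddy() leaves no mismatch.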