Hello,

On Mon, May 27, 2024 at 06:14:12AM -0700, Christoph Hellwig wrote:
> On Mon, May 27, 2024 at 01:58:25AM -0700, Christoph Hellwig wrote:
> > Hi all,
> >
> > when running xfstests on nfs against a local server I see warnings like
> > the ones above, which appear to have been added in commit
> > e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting").
>
> I've also reproduced this with xfstests on local xfs and no nfs in the
> loop:
>
> generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30
> [ 1204.969286] XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273
> [ 1204.993621] XFS (nvme0n1): Ending clean mount
> [ 1205.387032] XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
> [ 1205.412322] XFS (nvme1n1): Ending clean mount
> [ 1205.440388] XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
> [ 1205.808063] XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9
> [ 1205.827290] XFS (nvme1n1): Ending clean mount
> [ 1208.058931] ------------[ cut here ]------------
> [ 1208.059613] page type is 3, passed migratetype is 1 (nr=512)
> [ 1208.060402] WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0
> [ 1208.061352] Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug]
> [ 1208.062344] CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437
> [ 1208.063150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014

Thanks for the report. Could you please send me your .config? I'll try to reproduce it locally.

> [ 1208.064204] RIP: 0010:expand+0x1c5/0x1f0
> [ 1208.064625] Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3
> [ 1208.066555] RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082
> [ 1208.067111] RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000
> [ 1208.067872] RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 1208.068629] RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001
> [ 1208.069336] R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009
> [ 1208.070038] R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000
> [ 1208.070750] FS: 00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000
> [ 1208.071552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1208.072121] CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0
> [ 1208.072829] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1208.073527] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [ 1208.074225] PKRU: 55555554
> [ 1208.074507] Call Trace:
> [ 1208.074758] <TASK>
> [ 1208.074977] ? __warn+0x7b/0x120
> [ 1208.075308] ? expand+0x1c5/0x1f0
> [ 1208.075652] ? report_bug+0x191/0x1c0
> [ 1208.076043] ? handle_bug+0x3c/0x80
> [ 1208.076400] ? exc_invalid_op+0x17/0x70
> [ 1208.076782] ? asm_exc_invalid_op+0x1a/0x20
> [ 1208.077203] ? expand+0x1c5/0x1f0
> [ 1208.077543] ? expand+0x1c5/0x1f0
> [ 1208.077878] __rmqueue_pcplist+0x3a9/0x730

Ok, so the allocator takes a larger buddy off the freelist to satisfy a
smaller request, then puts the remainder back on the list. There is no
warning from del_page_from_free_list(), so the buddy's type and the type
of the list it was taken from are coherent. The warning happens when it
expands the remainder of the buddy and finds the tail block to be of a
different type.
Specifically, it takes a movable buddy (type 1) off the movable list, but
finds a tail block of it marked highatomic (type 3). I don't see how we
could have merged those during freeing, because the highatomic buddy would
have failed migratetype_is_mergeable().

Ah, but there DOES seem to be an issue with how we reserve highatomics:
reserving and unreserving happens one pageblock at a time, but MAX_ORDER
is usually bigger. If we rmqueue() an order-10 request,
reserve_highatomic_block() will only convert the first order-9 block in
it; the tail will remain the original type, which will produce a buddy of
mixed-type blocks upon freeing.

This doesn't fully explain the warning here. We'd expect to see it the
other way round: passing an assumed type of 3 (HIGHATOMIC) for a remainder
that is actually 1 (MOVABLE).

But the pageblock-based reservations look fishy. I'll cook up a patch to
make this range-based. It might just fix it in a way I'm not seeing yet.

> [ 1208.078285] get_page_from_freelist+0x7a0/0xf00
> [ 1208.078745] __alloc_pages_noprof+0x153/0x2e0
> [ 1208.079181] __folio_alloc_noprof+0x10/0xa0
> [ 1208.079603] __filemap_get_folio+0x16b/0x370
> [ 1208.080030] iomap_write_begin+0x496/0x680
> [ 1208.080441] iomap_file_buffered_write+0x17f/0x440
> [ 1208.080916] xfs_file_buffered_write+0x7e/0x2a0
> [ 1208.081374] vfs_write+0x262/0x440
> [ 1208.081717] __x64_sys_pwrite64+0x8f/0xc0
> [ 1208.082112] do_syscall_64+0x4f/0x120
> [ 1208.082487] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1208.082982] RIP: 0033:0x7f72ca4ce2b7
> [ 1208.083350] Code: 08 89 3c 24 48 89 4c 24 18 e8 15 f4 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 b
> [ 1208.085126] RSP: 002b:00007ffe56d1a930 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
> [ 1208.085867] RAX: ffffffffffffffda RBX: 0000000154400000 RCX: 00007f72ca4ce2b7
> [ 1208.086560] RDX: 0000000000400000 RSI: 00007f72c9401000 RDI: 0000000000000003
> [ 1208.087248] RBP: 0000000154400000 R08: 0000000000000000 R09: 00007ffe56d1a9d0
> [ 1208.087946] R10: 0000000154400000 R11: 0000000000000293 R12: 00000000ffffffff
> [ 1208.088639] R13: 00000000abc00000 R14: 0000000000000000 R15: 0000000000000551
> [ 1208.089340] </TASK>
> [ 1208.089565] ---[ end trace 0000000000000000 ]---