On 15.06.24 01:48, Chris Li wrote:
> This is the short-term solution "swap cluster order" listed on slide 8
> of my "Swap Abstraction" discussion at the recent LSF/MM conference.
>
> When commit 845982eb264bc "mm: swap: allow storage of all mTHP orders"
> was introduced, it only allocated the mTHP swap entries from the new
> empty cluster list. It has a fragmentation issue reported by Barry.
>
> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@xxxxxxxxxxxxxx/
>
> The mTHP allocation failure rate rises to almost 100% after a few
> hours in Barry's test run.
>
> The reason is that all the empty clusters have been exhausted while
> there are plenty of free swap entries in clusters that are not
> 100% free.
>
> Remember the swap allocation order in the cluster.
> Keep track of the per-order non-full cluster list for later allocation.
>
> This greatly improves the success rate of the mTHP swap allocation.
>
> There are some test numbers in the V1 thread of this series:
> https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@xxxxxxxxxx
>
> Reported-by: Barry Song <21cnbao@xxxxxxxxx>
> Signed-off-by: Chris Li <chrisl@xxxxxxxxxx>
> ---
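
For readers following along, the bookkeeping described in the quoted commit message can be sketched as a small user-space toy model. This is not the kernel code: `toy_cluster`, `toy_alloc()`, `toy_free()`, the list helpers, and the sizes are all hypothetical names and values chosen for illustration. Each cluster remembers the order it currently serves, partially used clusters sit on a per-order non-full list, and allocation tries that list before falling back to the empty-cluster list.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS     3   /* hypothetical: orders 0..2 */
#define CLUSTER_SLOTS 8   /* hypothetical: swap entries per cluster */
#define NR_CLUSTERS   2   /* hypothetical: size of the toy swap device */

struct toy_cluster {
	unsigned char used[CLUSTER_SLOTS]; /* 1 if the entry is allocated */
	int count;                         /* number of allocated entries */
	int order;                         /* order served, -1 while empty */
	struct toy_cluster *next;          /* link on the free or a nonfull list */
};

static struct toy_cluster clusters[NR_CLUSTERS];
static struct toy_cluster *free_list;          /* completely empty clusters */
static struct toy_cluster *nonfull[NR_ORDERS]; /* partially used, per order */

static void list_push(struct toy_cluster **head, struct toy_cluster *c)
{
	c->next = *head;
	*head = c;
}

static void list_remove(struct toy_cluster **head, struct toy_cluster *c)
{
	for (; *head; head = &(*head)->next) {
		if (*head == c) {
			*head = c->next;
			c->next = NULL;
			return;
		}
	}
}

static void toy_init(void)
{
	free_list = NULL;
	for (int o = 0; o < NR_ORDERS; o++)
		nonfull[o] = NULL;
	for (int i = NR_CLUSTERS - 1; i >= 0; i--) {
		clusters[i] = (struct toy_cluster){ .order = -1 };
		list_push(&free_list, &clusters[i]);
	}
}

/* Returns the first entry index of the allocation, or -1 on failure. */
static int toy_alloc(int order)
{
	int nr = 1 << order;
	struct toy_cluster *c = nonfull[order];

	if (!c) {                  /* no partially used cluster of this order */
		c = free_list;
		if (!c)
			return -1; /* no empty cluster left either */
		list_remove(&free_list, c);
		c->order = order;  /* remember the order in the cluster */
		list_push(&nonfull[order], c);
	}
	/* All chunks in a cluster share one order, so checking the first
	 * entry of each aligned chunk is enough. */
	for (int off = 0; off < CLUSTER_SLOTS; off += nr) {
		if (c->used[off])
			continue;
		for (int i = 0; i < nr; i++)
			c->used[off + i] = 1;
		c->count += nr;
		if (c->count == CLUSTER_SLOTS)
			list_remove(&nonfull[order], c); /* full: on no list */
		return (int)(c - clusters) * CLUSTER_SLOTS + off;
	}
	return -1; /* not reached: a nonfull cluster always has a free chunk */
}

/* The caller must pass an entry index previously returned by toy_alloc(). */
static void toy_free(int entry)
{
	struct toy_cluster *c = &clusters[entry / CLUSTER_SLOTS];
	int nr = 1 << c->order, off = entry % CLUSTER_SLOTS;
	int was_full = (c->count == CLUSTER_SLOTS);

	for (int i = 0; i < nr; i++)
		c->used[off + i] = 0;
	c->count -= nr;
	if (c->count == 0) {       /* empty again: back to the free list */
		if (!was_full)
			list_remove(&nonfull[c->order], c);
		c->order = -1;
		list_push(&free_list, c);
	} else if (was_full) {     /* has room again: rejoin its order's list */
		list_push(&nonfull[c->order], c);
	}
}
```

In this model, once every cluster has been touched, an order-2 request can still succeed as long as some order-2 cluster has a free chunk, which is the failure case the quoted description is aimed at. The model ignores locking, per-CPU state, and everything else the real allocator has to deal with.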

Running the cow.c selftest on mm-unstable with a bunch of debug config
options enabled, I get:

[ 25.236555] list_add corruption. prev->next should be next (ffff888105b5ad08), but was ffff888105b5ae78. (prev=ffff88812580b048).
[ 25.237432] ------------[ cut here ]------------
[ 25.237702] kernel BUG at lib/list_debug.c:32!
[ 25.237962] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 25.238288] CPU: 23 PID: 1264 Comm: cow Tainted: G W 6.10.0-rc4+ #301
[ 25.238720] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 25.239335] RIP: 0010:__list_add_valid_or_report+0x78/0xa0
[ 25.239646] Code: 6b ff 0f 0b 48 89 c1 48 c7 c7 c0 30 0e 83 e8 7f e5 6b ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 18 31 0e 83 e8 68 e5 6b ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 70 31 0e 83 e8 51 e5b
[ 25.240670] RSP: 0000:ffffc90002c87bd0 EFLAGS: 00010246
[ 25.240964] RAX: 0000000000000075 RBX: ffff888105b5ac00 RCX: 0000000000000000
[ 25.241362] RDX: 0000000000000000 RSI: ffff88885f9a1a00 RDI: ffff88885f9a1a00
[ 25.241762] RBP: ffff88810624de20 R08: 0000000000000000 R09: 0000000000000003
[ 25.242158] R10: ffffc90002c87a78 R11: ffffffff83b5b808 R12: 0000000000044000
[ 25.242556] R13: 0000000000044000 R14: ffff88810624e000 R15: ffff88812580bb00
[ 25.242960] FS: 00007f4fb364b740(0000) GS:ffff88885f980000(0000) knlGS:0000000000000000
[ 25.243413] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.243737] CR2: 00007f4fb343c000 CR3: 000000010a5dc000 CR4: 0000000000750ef0
[ 25.244145] PKRU: 55555554
[ 25.244303] Call Trace:
[ 25.244445] <TASK>
[ 25.244572] ? die+0x36/0x90
[ 25.244742] ? do_trap+0xdd/0x100
[ 25.244935] ? __list_add_valid_or_report+0x78/0xa0
[ 25.245211] ? __list_add_valid_or_report+0x78/0xa0
[ 25.245488] ? do_error_trap+0x81/0x110
[ 25.245710] ? __list_add_valid_or_report+0x78/0xa0
[ 25.245988] ? exc_invalid_op+0x50/0x70
[ 25.246211] ? __list_add_valid_or_report+0x78/0xa0
[ 25.246488] ? asm_exc_invalid_op+0x1a/0x20
[ 25.246737] ? __list_add_valid_or_report+0x78/0xa0
[ 25.247016] swapcache_free_entries+0x1ec/0x240
[ 25.247286] free_swap_slot+0xcc/0xe0
[ 25.247498] put_swap_folio+0xf3/0x3b0
[ 25.247720] delete_from_swap_cache+0x68/0x90
[ 25.247972] folio_free_swap+0xd0/0x200
[ 25.248201] do_swap_page+0xd95/0x12d0
[ 25.248418] ? __entry_text_end+0x101e45/0x101e49
[ 25.248695] ? srso_alias_return_thunk+0x5/0xfbef5
[ 25.248969] ? srso_alias_return_thunk+0x5/0xfbef5
[ 25.249246] ? __pte_offset_map+0x18e/0x270
[ 25.249490] __handle_mm_fault+0x915/0xf80
[ 25.249731] ? srso_alias_return_thunk+0x5/0xfbef5
[ 25.250010] handle_mm_fault+0x1d1/0x400
[ 25.250242] do_user_addr_fault+0x16f/0x790
[ 25.250485] exc_page_fault+0x83/0x260
[ 25.250706] asm_exc_page_fault+0x26/0x30

Maybe this is what Hugh already reported. I'll try reverting your patches
to see if that fixes these issues.

--
Cheers,

David / dhildenb