Re: [PATCH v2 0/2] mm: swap: mTHP swap allocator base on swap cluster order

David Hildenbrand <david@xxxxxxxxxx> · Tue, 18 Jun 2024 15:08:02 +0200

On 15.06.24 01:48, Chris Li wrote:
This is the short-term solution "swap cluster order" listed on
slide 8 of my "Swap Abstraction" discussion at the recent
LSF/MM conference.

When commit 845982eb264bc ("mm: swap: allow storage of all mTHP
orders") was introduced, it only allocated mTHP swap entries
from the new empty cluster list.  This has a fragmentation issue
reported by Barry.

https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@xxxxxxxxxxxxxx/

The mTHP allocation failure rate rises to almost 100% after a few
hours in Barry's test run.

The reason is that all the empty clusters have been exhausted while
there are plenty of free swap entries in the clusters that are
not 100% free.

Remember the swap allocation order in the cluster.
Keep track of the per-order non-full cluster list for later allocation.

This greatly improves the success rate of mTHP swap allocation.
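
To illustrate the idea described above, here is a minimal userspace
sketch, not the code from the patches. All names (sketch_cluster,
sketch_allocator, sketch_alloc, sketch_free, the SKETCH_* constants)
are hypothetical, the lists are singly linked for brevity, and the
policy of trying the per-order non-full list before the empty list is
an assumption made for the example. It only shows why remembering the
order per cluster lets an allocation succeed after the empty clusters
are gone.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_ORDERS     4   /* hypothetical: orders 0..3 */
#define SKETCH_CLUSTER_SLOTS 16  /* hypothetical entries per cluster */

struct sketch_cluster {
	unsigned int order;           /* order this cluster is serving */
	unsigned int used;            /* swap entries currently in use */
	struct sketch_cluster *next;  /* link in one of the lists below */
};

struct sketch_allocator {
	struct sketch_cluster *free;                      /* empty clusters */
	struct sketch_cluster *nonfull[SKETCH_NR_ORDERS]; /* partly used, per order */
};

static void sketch_push(struct sketch_cluster **head, struct sketch_cluster *c)
{
	c->next = *head;
	*head = c;
}

static void sketch_unlink(struct sketch_cluster **head, struct sketch_cluster *c)
{
	for (; *head; head = &(*head)->next) {
		if (*head == c) {
			*head = c->next;
			c->next = NULL;
			return;
		}
	}
}

/* Allocate 1 << order entries; returns the cluster used, NULL on failure. */
static struct sketch_cluster *sketch_alloc(struct sketch_allocator *a,
					   unsigned int order)
{
	unsigned int nr = 1u << order;
	struct sketch_cluster *c;

	assert(order < SKETCH_NR_ORDERS);

	c = a->nonfull[order];          /* reuse a partly used cluster first */
	if (!c) {
		c = a->free;            /* otherwise take an empty cluster */
		if (!c)
			return NULL;
		a->free = c->next;
		c->order = order;       /* remember the order in the cluster */
		c->used = 0;
		sketch_push(&a->nonfull[order], c);
	}

	c->used += nr;
	if (c->used == SKETCH_CLUSTER_SLOTS)
		sketch_unlink(&a->nonfull[order], c);   /* now full */
	return c;
}

/* Free one allocation of the cluster's remembered order. */
static void sketch_free(struct sketch_allocator *a, struct sketch_cluster *c)
{
	unsigned int nr = 1u << c->order;
	int was_full = (c->used == SKETCH_CLUSTER_SLOTS);

	assert(c->used >= nr);
	c->used -= nr;

	if (c->used == 0) {
		if (!was_full)
			sketch_unlink(&a->nonfull[c->order], c);
		sketch_push(&a->free, c);               /* empty again */
	} else if (was_full) {
		sketch_push(&a->nonfull[c->order], c);  /* usable again */
	}
}
```

In this sketch, once the free list is empty, an order-2 request can
still be served from a cluster on nonfull[2], while an order with no
partly used cluster fails.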

There are some test numbers in the V1 thread of this series:
https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@xxxxxxxxxx

Reported-by: Barry Song <21cnbao@xxxxxxxxx>
Signed-off-by: Chris Li <chrisl@xxxxxxxxxx>
---

Running the cow.c selftest with a bunch of debug config
settings enabled, I get on mm-unstable:

[   25.236555] list_add corruption. prev->next should be next (ffff888105b5ad08), but was ffff888105b5ae78. (prev=ffff88812580b048).
[   25.237432] ------------[ cut here ]------------
[   25.237702] kernel BUG at lib/list_debug.c:32!
[   25.237962] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   25.238288] CPU: 23 PID: 1264 Comm: cow Tainted: G        W          6.10.0-rc4+ #301
[   25.238720] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   25.239335] RIP: 0010:__list_add_valid_or_report+0x78/0xa0
[   25.239646] Code: 6b ff 0f 0b 48 89 c1 48 c7 c7 c0 30 0e 83 e8 7f e5 6b ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 18 31 0e 83 e8 68 e5 6b ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 70 31 0e 83 e8 51 e5b
[   25.240670] RSP: 0000:ffffc90002c87bd0 EFLAGS: 00010246
[   25.240964] RAX: 0000000000000075 RBX: ffff888105b5ac00 RCX: 0000000000000000
[   25.241362] RDX: 0000000000000000 RSI: ffff88885f9a1a00 RDI: ffff88885f9a1a00
[   25.241762] RBP: ffff88810624de20 R08: 0000000000000000 R09: 0000000000000003
[   25.242158] R10: ffffc90002c87a78 R11: ffffffff83b5b808 R12: 0000000000044000
[   25.242556] R13: 0000000000044000 R14: ffff88810624e000 R15: ffff88812580bb00
[   25.242960] FS:  00007f4fb364b740(0000) GS:ffff88885f980000(0000) knlGS:0000000000000000
[   25.243413] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   25.243737] CR2: 00007f4fb343c000 CR3: 000000010a5dc000 CR4: 0000000000750ef0
[   25.244145] PKRU: 55555554
[   25.244303] Call Trace:
[   25.244445]  <TASK>
[   25.244572]  ? die+0x36/0x90
[   25.244742]  ? do_trap+0xdd/0x100
[   25.244935]  ? __list_add_valid_or_report+0x78/0xa0
[   25.245211]  ? __list_add_valid_or_report+0x78/0xa0
[   25.245488]  ? do_error_trap+0x81/0x110
[   25.245710]  ? __list_add_valid_or_report+0x78/0xa0
[   25.245988]  ? exc_invalid_op+0x50/0x70
[   25.246211]  ? __list_add_valid_or_report+0x78/0xa0
[   25.246488]  ? asm_exc_invalid_op+0x1a/0x20
[   25.246737]  ? __list_add_valid_or_report+0x78/0xa0
[   25.247016]  swapcache_free_entries+0x1ec/0x240
[   25.247286]  free_swap_slot+0xcc/0xe0
[   25.247498]  put_swap_folio+0xf3/0x3b0
[   25.247720]  delete_from_swap_cache+0x68/0x90
[   25.247972]  folio_free_swap+0xd0/0x200
[   25.248201]  do_swap_page+0xd95/0x12d0
[   25.248418]  ? __entry_text_end+0x101e45/0x101e49
[   25.248695]  ? srso_alias_return_thunk+0x5/0xfbef5
[   25.248969]  ? srso_alias_return_thunk+0x5/0xfbef5
[   25.249246]  ? __pte_offset_map+0x18e/0x270
[   25.249490]  __handle_mm_fault+0x915/0xf80
[   25.249731]  ? srso_alias_return_thunk+0x5/0xfbef5
[   25.250010]  handle_mm_fault+0x1d1/0x400
[   25.250242]  do_user_addr_fault+0x16f/0x790
[   25.250485]  exc_page_fault+0x83/0x260
[   25.250706]  asm_exc_page_fault+0x26/0x30

This may be what Hugh already reported. I'll try reverting your patches
to see if that fixes these issues.
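
For readers unfamiliar with the "list_add corruption" message in the
log above: with list debugging enabled, the kernel verifies that the
two neighbours still point at each other before linking a new node
between them. The following is a simplified userspace sketch of that
kind of check, not the code in lib/list_debug.c. The sketch_* names
are hypothetical and the NULL checks are left out. It does not
establish what corrupted the list in this report.

```c
#include <stdbool.h>
#include <stdio.h>

struct list_node {
	struct list_node *next, *prev;
};

static void sketch_list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Return true only if prev and next are still adjacent and consistent. */
static bool sketch_list_add_valid(const struct list_node *new_node,
				  const struct list_node *prev,
				  const struct list_node *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "next->prev should be prev (%p), but was %p\n",
			(const void *)prev, (const void *)next->prev);
		return false;
	}
	if (prev->next != next) {
		/* This is the condition reported in the log above. */
		fprintf(stderr, "prev->next should be next (%p), but was %p\n",
			(const void *)next, (const void *)prev->next);
		return false;
	}
	if (new_node == prev || new_node == next) {
		fprintf(stderr, "double add of %p\n", (const void *)new_node);
		return false;
	}
	return true;
}

/* Insert new_node between prev and next; refuse if the list looks corrupt. */
static bool sketch_list_add(struct list_node *new_node,
			    struct list_node *prev, struct list_node *next)
{
	if (!sketch_list_add_valid(new_node, prev, next))
		return false;

	next->prev = new_node;
	new_node->next = next;
	new_node->prev = prev;
	prev->next = new_node;
	return true;
}

/* Append at the tail: the new node goes between head->prev and head. */
static bool sketch_list_add_tail(struct list_node *new_node,
				 struct list_node *head)
{
	return sketch_list_add(new_node, head->prev, head);
}
```

In this sketch, a tail insert fails with the "prev->next should be
next" message whenever the current tail node's next pointer no longer
points back at the list head, for example because the node was linked
into a second list without being removed from the first.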

--
Cheers,

David / dhildenb