This is the short term solutions "swap cluster order" listed in my "Swap Abstraction" discussion slice 8 in the recent LSF/MM conference. When commit 845982eb264bc "mm: swap: allow storage of all mTHP orders" is introduced, it only allocates the mTHP swap entries from the new empty cluster list. It has a fragmentation issue reported by Barry. https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@xxxxxxxxxxxxxx/ The reason is that all the empty clusters have been exhausted while there are plenty of free swap entries in the cluster that are not 100% free. Remember the swap allocation order in the cluster. Keep track of the per order non full cluster list for later allocation. The patch 3 of this series gives the swap SSD allocation a new separate code path from the HDD allocation. The new allocator use cluster list only and do not global scan swap_map[] without lock any more. This streamline the swap allocation for SSD. The code matches the execution flow much better. User impact: For users that allocate and free mix order mTHP swapping, It greatly improves the success rate of the mTHP swap allocation after the initial phase. It also performs faster when the swapfile is close to full, because the allocator can get the non full cluster from a list rather than scanning a lot of swap_map entries. This series still lacks the swap cache reclaim feature. The reclaim series of patches are under development and testing right now. Will post the mail list soon. For this reason, the patch 3 is consider RFC and not ready to merge. With Barry's mthp test program V2: Without: $ ./thp_swap_allocator_test -a Iteration 1: swpout inc: 32, swpout fallback inc: 192, Fallback percentage: 85.71% Iteration 2: swpout inc: 0, swpout fallback inc: 231, Fallback percentage: 100.00% Iteration 3: swpout inc: 0, swpout fallback inc: 227, Fallback percentage: 100.00% ... Iteration 98: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% Iteration 99: swpout inc: 0, swpout fallback inc: 215, Fallback percentage: 100.00% Iteration 100: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% $ ./thp_swap_allocator_test -a -s Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00% Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% .. Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00% Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00% Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00% $ ./thp_swap_allocator_test -s Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00% Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% .. Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00% Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00% Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00% $ ./thp_swap_allocator_test Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00% Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% .. Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00% Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00% Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00% With: $ ./thp_swap_allocator_test -a Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 2: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 3: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00% ... Iteration 98: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 99: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 100: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% $ ./thp_swap_allocator_test -a -s Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 2: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 3: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 4: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 5: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 6: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 7: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 8: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 9: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 10: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 11: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 12: swpout inc: 232, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 13: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 14: swpout inc: 223, swpout fallback inc: 3, Fallback percentage: 1.33% Iteration 15: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 16: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 17: swpout inc: 212, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 18: swpout inc: 234, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 19: swpout inc: 220, swpout fallback inc: 6, Fallback percentage: 2.65% Iteration 20: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 21: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 22: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 23: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 24: swpout inc: 232, swpout fallback inc: 1, Fallback percentage: 0.43% Iteration 25: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 26: swpout inc: 230, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 27: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 28: swpout inc: 225, swpout fallback inc: 1, Fallback percentage: 0.44% Iteration 29: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88% Iteration 30: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44% Iteration 31: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 32: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 33: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 34: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88% Iteration 35: swpout inc: 230, swpout fallback inc: 3, Fallback percentage: 1.29% Iteration 36: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 37: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 38: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 39: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% Iteration 40: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 41: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 42: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 43: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 44: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 45: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 46: swpout inc: 221, swpout fallback inc: 2, Fallback percentage: 0.90% Iteration 47: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 48: swpout inc: 220, swpout fallback inc: 1, Fallback percentage: 0.45% Iteration 49: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 50: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 51: swpout inc: 224, swpout fallback inc: 2, Fallback percentage: 0.88% Iteration 52: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 53: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 54: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 55: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 56: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88% Iteration 57: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 58: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 59: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44% Iteration 60: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% Iteration 61: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 62: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 63: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 64: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 65: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 66: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 67: swpout inc: 220, swpout fallback inc: 2, Fallback percentage: 0.90% Iteration 68: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 69: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 70: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 71: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 72: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 73: swpout inc: 218, swpout fallback inc: 5, Fallback percentage: 2.24% Iteration 74: swpout inc: 223, swpout fallback inc: 5, Fallback percentage: 2.19% Iteration 75: swpout inc: 222, swpout fallback inc: 7, Fallback percentage: 3.06% Iteration 76: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 77: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 78: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 79: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89% Iteration 80: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45% Iteration 81: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 82: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 83: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% Iteration 84: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 85: swpout inc: 213, swpout fallback inc: 1, Fallback percentage: 0.47% Iteration 86: swpout inc: 215, swpout fallback inc: 8, Fallback percentage: 3.59% Iteration 87: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45% Iteration 88: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 89: swpout inc: 222, swpout fallback inc: 6, Fallback percentage: 2.63% Iteration 90: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44% Iteration 91: swpout inc: 214, swpout fallback inc: 1, Fallback percentage: 0.47% Iteration 92: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 93: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 94: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89% Iteration 95: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45% Iteration 96: swpout inc: 223, swpout fallback inc: 4, Fallback percentage: 1.76% Iteration 97: swpout inc: 223, swpout fallback inc: 7, Fallback percentage: 3.04% Iteration 98: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44% Iteration 99: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% Iteration 100: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00% $ ./thp_swap_allocator_test Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 2: swpout inc: 134, swpout fallback inc: 98, Fallback percentage: 42.24% Iteration 3: swpout inc: 72, swpout fallback inc: 154, Fallback percentage: 68.14% Iteration 4: swpout inc: 40, swpout fallback inc: 183, Fallback percentage: 82.06% Iteration 5: swpout inc: 27, swpout fallback inc: 199, Fallback percentage: 88.05% Iteration 6: swpout inc: 22, swpout fallback inc: 202, Fallback percentage: 90.18% Iteration 7: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74% Iteration 8: swpout inc: 14, swpout fallback inc: 214, Fallback percentage: 93.86% Iteration 9: swpout inc: 5, swpout fallback inc: 221, Fallback percentage: 97.79% Iteration 10: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61% ... Iteration 97: swpout inc: 12, swpout fallback inc: 207, Fallback percentage: 94.52% Iteration 98: swpout inc: 8, swpout fallback inc: 219, Fallback percentage: 96.48% Iteration 99: swpout inc: 16, swpout fallback inc: 218, Fallback percentage: 93.16% Iteration 100: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61% $ ./thp_swap_allocator_test -s Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% Iteration 2: swpout inc: 84, swpout fallback inc: 148, Fallback percentage: 63.79% Iteration 3: swpout inc: 39, swpout fallback inc: 195, Fallback percentage: 83.33% Iteration 4: swpout inc: 16, swpout fallback inc: 217, Fallback percentage: 93.13% Iteration 5: swpout inc: 11, swpout fallback inc: 214, Fallback percentage: 95.11% Iteration 6: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61% ... Iteration 96: swpout inc: 5, swpout fallback inc: 225, Fallback percentage: 97.83% Iteration 97: swpout inc: 2, swpout fallback inc: 215, Fallback percentage: 99.08% Iteration 98: swpout inc: 2, swpout fallback inc: 220, Fallback percentage: 99.10% Iteration 99: swpout inc: 4, swpout fallback inc: 222, Fallback percentage: 98.23% Iteration 100: swpout inc: 3, swpout fallback inc: 221, Fallback percentage: 98.66% Kernel compile under tmpfs with cgroup memory.max = 2G. 12 core 24 hyperthreading, 32 jobs. HDD swap 3 runs average, 20G swap file: Without: user 4186.290 system 421.743 real 597.317 With: user 4113.897 system 413.123 real 659.543 SSD swap 10 runs average, 20G swap partition: Without: user 4736.810 system 500.921 real 250.243 With: user 4729.478 system 500.265 real 249.633 Two zram swap: zram0 1.4G zram1 20G. The idea is forcing the zram0 almost full then overflow to zram1: Two zram 10 runs average: Without: user 4600.693 system 384.105 real 238.735 With: user 4604.502 system 382.087 real 239.063 Reported-by: Barry Song <21cnbao@xxxxxxxxx> Signed-off-by: Chris Li <chrisl@xxxxxxxxxx> --- Changes in v4: - Remove a warning in patch 2. - Allocating from the free cluster list before the nonfull list. Revert the v3 behavior. - Add cluster_index and cluster_offset function. - Patch 3 has a new allocating path for SSD. - HDD swap allocation does not need to consider clusters any more. Changes in v3: - Using V1 as base. - Rename "next" to "list" for the list field, suggested by Ying. - Update comment for the locking rules for cluster fields and list, suggested by Ying. - Allocate from the nonfull list before attempting free list, suggested by Kairui. - Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@xxxxxxxxxx Changes in v2: - Abandoned. - Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@xxxxxxxxxx --- Chris Li (3): mm: swap: swap cluster switch to double link list mm: swap: mTHP allocate swap entries from nonfull list RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots() include/linux/swap.h | 30 ++-- mm/swapfile.c | 490 +++++++++++++++++++++++---------------------------- 2 files changed, 238 insertions(+), 282 deletions(-) --- base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc change-id: 20240523-swap-allocator-1534c480ece4 Best regards, -- Chris Li <chrisl@xxxxxxxxxx>