Chris Li <chrisl@xxxxxxxxxx> writes: > This is the short term solutions "swap cluster order" listed > in my "Swap Abstraction" discussion slice 8 in the recent > LSF/MM conference. > > When commit 845982eb264bc "mm: swap: allow storage of all mTHP > orders" is introduced, it only allocates the mTHP swap entries > from the new empty cluster list. It has a fragmentation issue > reported by Barry. > > https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@xxxxxxxxxxxxxx/ > > The reason is that all the empty clusters have been exhausted while > there are plenty of free swap entries in the cluster that are > not 100% free. > > Remember the swap allocation order in the cluster. > Keep track of the per order non full cluster list for later allocation. > > The patch 3 of this series gives the swap SSD allocation > a new separate code path from the HDD allocation. The new allocator > use cluster list only and do not global scan swap_map[] without lock > any more. > > This streamline the swap allocation for SSD. The code matches the execution > flow much better. > > User impact: For users that allocate and free mix order mTHP swapping, > It greatly improves the success rate of the mTHP swap allocation after the > initial phase. > > It also performs faster when the swapfile is close to full, because the > allocator can get the non full cluster from a list rather than scanning > a lot of swap_map entries. > > This series still lacks the swap cache reclaim feature. The reclaim series > of patches are under development and testing right now. Will post the > mail list soon. For this reason, the patch 3 is consider RFC and not > ready to merge. > > With Barry's mthp test program V2: > > Without: > $ ./thp_swap_allocator_test -a > Iteration 1: swpout inc: 32, swpout fallback inc: 192, Fallback percentage: 85.71% > Iteration 2: swpout inc: 0, swpout fallback inc: 231, Fallback percentage: 100.00% > Iteration 3: swpout inc: 0, swpout fallback inc: 227, Fallback percentage: 100.00% > ... > Iteration 98: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% > Iteration 99: swpout inc: 0, swpout fallback inc: 215, Fallback percentage: 100.00% > Iteration 100: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% > > $ ./thp_swap_allocator_test -a -s > Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% > Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00% > Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% > .. > Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00% > Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00% > Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00% > > $ ./thp_swap_allocator_test -s > Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% > Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00% > Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% > .. > Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00% > Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00% > Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00% > > $ ./thp_swap_allocator_test > Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00% > Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00% > Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00% > .. > Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00% > Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00% > Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00% > > With: > $ ./thp_swap_allocator_test -a > Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 2: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 3: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00% > ... > Iteration 98: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 99: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 100: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > > $ ./thp_swap_allocator_test -a -s > Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 2: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 3: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 4: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 5: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 6: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 7: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 8: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 9: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 10: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 11: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 12: swpout inc: 232, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 13: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 14: swpout inc: 223, swpout fallback inc: 3, Fallback percentage: 1.33% > Iteration 15: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 16: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 17: swpout inc: 212, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 18: swpout inc: 234, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 19: swpout inc: 220, swpout fallback inc: 6, Fallback percentage: 2.65% > Iteration 20: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 21: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 22: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 23: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 24: swpout inc: 232, swpout fallback inc: 1, Fallback percentage: 0.43% > Iteration 25: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 26: swpout inc: 230, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 27: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 28: swpout inc: 225, swpout fallback inc: 1, Fallback percentage: 0.44% > Iteration 29: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88% > Iteration 30: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44% > Iteration 31: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 32: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 33: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 34: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88% > Iteration 35: swpout inc: 230, swpout fallback inc: 3, Fallback percentage: 1.29% > Iteration 36: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 37: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 38: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 39: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% > Iteration 40: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 41: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 42: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 43: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 44: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 45: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 46: swpout inc: 221, swpout fallback inc: 2, Fallback percentage: 0.90% > Iteration 47: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 48: swpout inc: 220, swpout fallback inc: 1, Fallback percentage: 0.45% > Iteration 49: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 50: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 51: swpout inc: 224, swpout fallback inc: 2, Fallback percentage: 0.88% > Iteration 52: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 53: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 54: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 55: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 56: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88% > Iteration 57: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 58: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 59: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44% > Iteration 60: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% > Iteration 61: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 62: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 63: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 64: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 65: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 66: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 67: swpout inc: 220, swpout fallback inc: 2, Fallback percentage: 0.90% > Iteration 68: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 69: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 70: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 71: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 72: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 73: swpout inc: 218, swpout fallback inc: 5, Fallback percentage: 2.24% > Iteration 74: swpout inc: 223, swpout fallback inc: 5, Fallback percentage: 2.19% > Iteration 75: swpout inc: 222, swpout fallback inc: 7, Fallback percentage: 3.06% > Iteration 76: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 77: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 78: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 79: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89% > Iteration 80: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45% > Iteration 81: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 82: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 83: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% > Iteration 84: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 85: swpout inc: 213, swpout fallback inc: 1, Fallback percentage: 0.47% > Iteration 86: swpout inc: 215, swpout fallback inc: 8, Fallback percentage: 3.59% > Iteration 87: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45% > Iteration 88: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 89: swpout inc: 222, swpout fallback inc: 6, Fallback percentage: 2.63% > Iteration 90: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44% > Iteration 91: swpout inc: 214, swpout fallback inc: 1, Fallback percentage: 0.47% > Iteration 92: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 93: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 94: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89% > Iteration 95: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45% > Iteration 96: swpout inc: 223, swpout fallback inc: 4, Fallback percentage: 1.76% > Iteration 97: swpout inc: 223, swpout fallback inc: 7, Fallback percentage: 3.04% > Iteration 98: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44% > Iteration 99: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43% > Iteration 100: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00% > > $ ./thp_swap_allocator_test > Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 2: swpout inc: 134, swpout fallback inc: 98, Fallback percentage: 42.24% > Iteration 3: swpout inc: 72, swpout fallback inc: 154, Fallback percentage: 68.14% > Iteration 4: swpout inc: 40, swpout fallback inc: 183, Fallback percentage: 82.06% > Iteration 5: swpout inc: 27, swpout fallback inc: 199, Fallback percentage: 88.05% > Iteration 6: swpout inc: 22, swpout fallback inc: 202, Fallback percentage: 90.18% > Iteration 7: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74% > Iteration 8: swpout inc: 14, swpout fallback inc: 214, Fallback percentage: 93.86% > Iteration 9: swpout inc: 5, swpout fallback inc: 221, Fallback percentage: 97.79% > Iteration 10: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61% > ... > Iteration 97: swpout inc: 12, swpout fallback inc: 207, Fallback percentage: 94.52% > Iteration 98: swpout inc: 8, swpout fallback inc: 219, Fallback percentage: 96.48% > Iteration 99: swpout inc: 16, swpout fallback inc: 218, Fallback percentage: 93.16% > Iteration 100: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61% > > $ ./thp_swap_allocator_test -s > Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00% > Iteration 2: swpout inc: 84, swpout fallback inc: 148, Fallback percentage: 63.79% > Iteration 3: swpout inc: 39, swpout fallback inc: 195, Fallback percentage: 83.33% > Iteration 4: swpout inc: 16, swpout fallback inc: 217, Fallback percentage: 93.13% > Iteration 5: swpout inc: 11, swpout fallback inc: 214, Fallback percentage: 95.11% > Iteration 6: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61% > ... > Iteration 96: swpout inc: 5, swpout fallback inc: 225, Fallback percentage: 97.83% > Iteration 97: swpout inc: 2, swpout fallback inc: 215, Fallback percentage: 99.08% > Iteration 98: swpout inc: 2, swpout fallback inc: 220, Fallback percentage: 99.10% > Iteration 99: swpout inc: 4, swpout fallback inc: 222, Fallback percentage: 98.23% > Iteration 100: swpout inc: 3, swpout fallback inc: 221, Fallback percentage: 98.66% > > Kernel compile under tmpfs with cgroup memory.max = 2G. > 12 core 24 hyperthreading, 32 jobs. > > HDD swap 3 runs average, 20G swap file: > > Without: > user 4186.290 > system 421.743 > real 597.317 > > With: > user 4113.897 > system 413.123 > real 659.543 > > SSD swap 10 runs average, 20G swap partition: > > Without: > user 4736.810 > system 500.921 > real 250.243 > > With: > user 4729.478 > system 500.265 > real 249.633 > > Two zram swap: > zram0 1.4G zram1 20G. > The idea is forcing the zram0 almost > full then overflow to zram1: > > Two zram 10 runs average: > > Without: > user 4600.693 > system 384.105 > real 238.735 > > With: > user 4604.502 > system 382.087 > real 239.063 > > Reported-by: Barry Song <21cnbao@xxxxxxxxx> > Signed-off-by: Chris Li <chrisl@xxxxxxxxxx> > --- > Changes in v4: > - Remove a warning in patch 2. > - Allocating from the free cluster list before the nonfull list. Revert the v3 behavior. > - Add cluster_index and cluster_offset function. > - Patch 3 has a new allocating path for SSD. > - HDD swap allocation does not need to consider clusters any more. It appears that my comments in the following emails are ignored? https://lore.kernel.org/linux-mm/87bk3pzr5p.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ https://lore.kernel.org/linux-mm/874j9hzqr3.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > changes in v3: > - Using V1 as base. > - Rename "next" to "list" for the list field, suggested by Ying. > - Update comment for the locking rules for cluster fields and list, > suggested by Ying. > - Allocate from the nonfull list before attempting free list, suggested > by Kairui. > - Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@xxxxxxxxxx > > Changes in v2: > - Abandoned. > - Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@xxxxxxxxxx > > --- > Chris Li (3): > mm: swap: swap cluster switch to double link list > mm: swap: mTHP allocate swap entries from nonfull list > RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots() > > include/linux/swap.h | 30 ++-- > mm/swapfile.c | 490 +++++++++++++++++++++++---------------------------- > 2 files changed, 238 insertions(+), 282 deletions(-) > --- > base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc > change-id: 20240523-swap-allocator-1534c480ece4 > -- Best Regards, Huang, Ying