On Sat, May 25, 2024 at 5:17 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> This is the short-term solution "swap cluster order" listed
> on slide 8 of my "Swap Abstraction" discussion at the recent
> LSF/MM conference.
>
> When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> orders" was introduced, it only allocated mTHP swap entries
> from the new empty cluster list. That works well for PMD-size THP,
> but it has a serious fragmentation issue reported by Barry.
>
> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@xxxxxxxxxxxxxx/
>
> The mTHP allocation failure rate rises to almost 100% after a few
> hours in Barry's test run.
>
> The reason is that all the empty clusters have been exhausted while
> there are plenty of free swap entries in clusters that are
> not 100% free.
>
> Address this by remembering the swap allocation order in the cluster.
> Keep track of the per-order nonfull cluster list for later allocation.
>
> This greatly improves the success rate of the mTHP swap allocation.
> While I am still waiting for Barry's test result, I paste Kairui's test

Hi Chris,

Attached are the test results from a real phone using order-4 mTHP. The
results seem better overall, but after 7 hours, especially when the swap
device becomes full (soon some apps are killed to free memory and swap),
the fallback ratio still reaches 100%.

I haven't debugged this, but my guess is that a cluster's order can shift
between order 4 and order 0. Sometimes they all shift to order 0, and they
can hardly get back to order 4.
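To make the cover letter's fix and the guess above concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not code from mm/swapfile.c: the cluster count and size and the names `struct cluster`, `alloc_entries()` and `free_entries()` are hypothetical, and it scans an array where the patch keeps per-order nonfull lists. Each cluster remembers the order it last served; an order-N request is taken from a nonfull cluster of the same order, else from an empty cluster, and otherwise fails so the caller would fall back to order 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not mm/swapfile.c; real clusters are much larger. */
#define NR_CLUSTERS   2
#define CLUSTER_SLOTS 64

struct cluster {
	int order;                 /* order this cluster last allocated for */
	int count;                 /* used slots; 0 means the cluster is empty */
	bool used[CLUSTER_SLOTS];  /* per-slot allocation map */
};

static struct cluster clusters[NR_CLUSTERS];

/* Find a naturally aligned run of (1 << order) free slots, or -1. */
static int find_run(const struct cluster *c, int order)
{
	int n = 1 << order;

	for (int start = 0; start + n <= CLUSTER_SLOTS; start += n) {
		int i = 0;

		while (i < n && !c->used[start + i])
			i++;
		if (i == n)
			return start;
	}
	return -1;
}

/* Mark a run as used and tag the cluster with the order it now serves. */
static int take(int idx, int order)
{
	struct cluster *c = &clusters[idx];
	int start = find_run(c, order);

	if (start < 0)
		return -1;
	for (int i = 0; i < (1 << order); i++)
		c->used[start + i] = true;
	c->count += 1 << order;
	c->order = order;
	return idx * CLUSTER_SLOTS + start;
}

/*
 * Allocate (1 << order) contiguous slots: first from a nonfull cluster
 * already serving this order, then from an empty cluster. Returns -1 when
 * neither works, i.e. the caller would fall back to order-0 entries.
 */
static int alloc_entries(int order)
{
	for (int i = 0; i < NR_CLUSTERS; i++) {
		const struct cluster *c = &clusters[i];

		if (c->count > 0 && c->count < CLUSTER_SLOTS && c->order == order) {
			int off = take(i, order);

			if (off >= 0)
				return off;
		}
	}
	for (int i = 0; i < NR_CLUSTERS; i++)
		if (clusters[i].count == 0)
			return take(i, order);
	return -1;
}

/* Free a run; only a cluster whose count drops to 0 can serve a new order. */
static void free_entries(int off, int order)
{
	struct cluster *c = &clusters[off / CLUSTER_SLOTS];

	for (int i = 0; i < (1 << order); i++) {
		assert(c->used[off % CLUSTER_SLOTS + i]);  /* catch double frees */
		c->used[off % CLUSTER_SLOTS + i] = false;
	}
	c->count -= 1 << order;
}
```

In this model, once every cluster is tagged order 0 (as after many fallbacks on a nearly full device), an order-4 request fails even when an order-0 cluster has 16 aligned free slots, and it succeeds again only after a whole cluster has been emptied. That is consistent with the guess above, but the model cannot show what actually happened on the phone.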
> result here:
>
> I'm able to reproduce such an issue with a simple script (enabling all mTHP orders):
>
> modprobe brd rd_nr=1 rd_size=$(( 10 * 1024 * 1024))
> swapoff -a
> mkswap /dev/ram0
> swapon /dev/ram0
>
> rmdir /sys/fs/cgroup/benchmark
> mkdir -p /sys/fs/cgroup/benchmark
> cd /sys/fs/cgroup/benchmark
> echo 8G > memory.max
> echo $$ > cgroup.procs
>
> memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 -t 32 -B binary &
>
> /usr/local/bin/memtier_benchmark -S /tmp/memcached.socket \
> -P memcache_binary -n allkeys --key-minimum=1 \
> --key-maximum=18000000 --key-pattern=P:P -c 1 -t 32 \
> --ratio 1:0 --pipeline 8 -d 1024
>
> Before:
> Totals 48805.63 0.00 0.00 5.26045 1.19100 38.91100 59.64700 51063.98
> After:
> Totals 71098.84 0.00 0.00 3.60585 0.71100 26.36700 39.16700 74388.74
>
> And the fallback ratio dropped by a lot:
> Before:
> hugepages-32kB/stats/anon_swpout_fallback:15997
> hugepages-32kB/stats/anon_swpout:18712
> hugepages-512kB/stats/anon_swpout_fallback:192
> hugepages-512kB/stats/anon_swpout:0
> hugepages-2048kB/stats/anon_swpout_fallback:2
> hugepages-2048kB/stats/anon_swpout:0
> hugepages-1024kB/stats/anon_swpout_fallback:0
> hugepages-1024kB/stats/anon_swpout:0
> hugepages-64kB/stats/anon_swpout_fallback:18246
> hugepages-64kB/stats/anon_swpout:17644
> hugepages-16kB/stats/anon_swpout_fallback:13701
> hugepages-16kB/stats/anon_swpout:18234
> hugepages-256kB/stats/anon_swpout_fallback:8642
> hugepages-256kB/stats/anon_swpout:93
> hugepages-128kB/stats/anon_swpout_fallback:21497
> hugepages-128kB/stats/anon_swpout:7596
>
> (Still collecting more data; the successful swpouts mostly happened early, then fallbacks began to increase, reaching a nearly 100% failure rate)
>
> After:
> hugepages-32kB/stats/swpout:34445
> hugepages-32kB/stats/swpout_fallback:0
> hugepages-512kB/stats/swpout:1
> hugepages-512kB/stats/swpout_fallback:134
> hugepages-2048kB/stats/swpout:1
> hugepages-2048kB/stats/swpout_fallback:1
> hugepages-1024kB/stats/swpout:6
> hugepages-1024kB/stats/swpout_fallback:0
> hugepages-64kB/stats/swpout:35495
> hugepages-64kB/stats/swpout_fallback:0
> hugepages-16kB/stats/swpout:32441
> hugepages-16kB/stats/swpout_fallback:0
> hugepages-256kB/stats/swpout:2223
> hugepages-256kB/stats/swpout_fallback:6278
> hugepages-128kB/stats/swpout:29136
> hugepages-128kB/stats/swpout_fallback:52
>
> Reported-by: Barry Song <21cnbao@xxxxxxxxx>
> Tested-by: Kairui Song <kasong@xxxxxxxxxxx>
> Signed-off-by: Chris Li <chrisl@xxxxxxxxxx>
> ---
> Chris Li (2):
>       mm: swap: swap cluster switch to double link list
>       mm: swap: mTHP allocate swap entries from nonfull list
>
>  include/linux/swap.h |  18 ++--
>  mm/swapfile.c        | 252 +++++++++++++++++---------------------------------
>  2 files changed, 93 insertions(+), 177 deletions(-)
> ---
> base-commit: c65920c76a977c2b73c3a8b03b4c0c00cc1285ed
> change-id: 20240523-swap-allocator-1534c480ece4
>
> Best regards,
> --
> Chris Li <chrisl@xxxxxxxxxx>
>

Thanks
Barry
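The fallback ratios discussed in this thread can be recomputed from the per-size counters in the quoted results. The sketch below assumes the ratio for one mTHP size means fallback / (swpout + fallback); the helper names `parse_stat()`, `is_fallback()` and `fallback_pct()` are hypothetical. It accepts both counter spellings that appear in the quoted runs (`anon_swpout*` in Before, `swpout*` in After).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "hugepages-<size>/stats/<counter>:<value>" line in the form the
 * results are quoted in. Returns 1 on success, 0 otherwise.
 */
static int parse_stat(const char *line, char size[16], char name[32], long *val)
{
	return sscanf(line, "hugepages-%15[^/]/stats/%31[^:]:%ld",
		      size, name, val) == 3;
}

/* Matches both "anon_swpout_fallback" and "swpout_fallback". */
static int is_fallback(const char *name)
{
	const char *suffix = "_fallback";
	size_t n = strlen(name), k = strlen(suffix);

	return n >= k && strcmp(name + n - k, suffix) == 0;
}

/* Assumed definition: share (in %) of mTHP swap-outs of one size that fell back. */
static double fallback_pct(long swpout, long fallback)
{
	long total = swpout + fallback;

	assert(swpout >= 0 && fallback >= 0);
	return total ? 100.0 * (double)fallback / (double)total : 0.0;
}
```

For example, the Before numbers for 64kB (17644 swap-outs, 18246 fallbacks) give about 50.8%, and the After numbers (35495, 0) give 0%.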
[Attachment: chris-swap-patch.png (PNG image)]