Re: [PATCH v3 0/8] make slab shrink lockless

Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> · Mon, 27 Feb 2023 21:31:51 +0800

On 2023/2/27 03:51, Andrew Morton wrote:
> On Sun, 26 Feb 2023 22:46:47 +0800 Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:
>
>> Hi all,
>>
>> This patch series aims to make slab shrink lockless.
>
> What an awesome changelog.
>
>> 2. Survey
>> =========
>
> Especially this part.
>
> Looking through all the prior efforts and at this patchset I am not
> immediately seeing any statements about the overall effect upon
> real-world workloads.  For a good example, does this patchset
> measurably improve throughput or energy consumption on your servers?

Hi Andrew,

I re-tested on the following physical machine:

Architecture:        x86_64
CPU(s):              96
On-line CPU(s) list: 0-95
Model name:          Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz

I found that the explanation I gave in the cover letter for the hotspot
is wrong. The down_read_trylock() hotspot is not caused by trylock
failures, but simply by the atomic operation (cmpxchg) itself, which
leads to a significant reduction in IPC (insn per cycle).

To verify this, I did the following tests:

1. Run the following script to create down_read_trylock() hotspots:

```
#!/bin/bash

DIR="/root/shrinker/memcg/mnt"

do_create()
{
	mkdir -p /sys/fs/cgroup/memory/test
	mkdir -p /sys/fs/cgroup/perf_event/test
	echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
	for i in `seq 0 $1`;
	do
		mkdir -p /sys/fs/cgroup/memory/test/$i;
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
		echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
		mkdir -p $DIR/$i;
	done
}

do_mount()
{
	for i in `seq $1 $2`;
	do
		mount -t tmpfs $i $DIR/$i;
	done
}

do_touch()
{
	for i in `seq $1 $2`;
	do
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
		echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs;
		dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
	done
}

case "$1" in
  touch)
	do_touch $2 $3
	;;
  test)
	do_create 4000
	do_mount 0 4000
	do_touch 0 3000
	;;
  *)
	exit 1
	;;
esac
```

Save the above script, then run it with the test and touch arguments.

Then we can use the following perf command to view hotspots:

perf top -U -F 999

1) Before applying this patchset:

  32.31%  [kernel]           [k] down_read_trylock
  19.40%  [kernel]           [k] pv_native_safe_halt
  16.24%  [kernel]           [k] up_read
  15.70%  [kernel]           [k] shrink_slab
   4.69%  [kernel]           [k] _find_next_bit
   2.62%  [kernel]           [k] shrink_node
   1.78%  [kernel]           [k] shrink_lruvec
   0.76%  [kernel]           [k] do_shrink_slab

2) After applying this patchset:

  27.83%  [kernel]           [k] _find_next_bit
  16.97%  [kernel]           [k] shrink_slab
  15.82%  [kernel]           [k] pv_native_safe_halt
   9.58%  [kernel]           [k] shrink_node
   8.31%  [kernel]           [k] shrink_lruvec
   5.64%  [kernel]           [k] do_shrink_slab
   3.88%  [kernel]           [k] mem_cgroup_iter
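
As a rough illustration (not part of the original test script), the share
of samples spent in the two rwsem read-side functions before the patchset
can be summed from the figures above. The variable names below are
hypothetical, and only the top four lines of the "before" profile are
pasted in:

```shell
#!/bin/bash
# Sum the sample share of the rwsem read-side functions, using the
# "before" perf top figures quoted above (copied by hand, not captured live).
before_profile='32.31%  [kernel]  [k] down_read_trylock
19.40%  [kernel]  [k] pv_native_safe_halt
16.24%  [kernel]  [k] up_read
15.70%  [kernel]  [k] shrink_slab'

# awk converts "32.31%" to 32.31 when the field is used as a number.
rwsem_share=$(echo "$before_profile" | awk '
	$4 == "down_read_trylock" || $4 == "up_read" { sum += $1 }
	END { printf "%.2f\n", sum }')
echo "rwsem read-side share: ${rwsem_share}%"
```

Both symbols are absent from the "after" profile shown above.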

2. At the same time, we use the following perf command to capture IPC
information:

perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10

1) Before applying this patchset:

 Performance counter stats for 'system wide' (5 runs):

      454187219766      cycles                    test                      ( +-  1.84% )
       78896433101      instructions              test #    0.17  insn per cycle           ( +-  0.44% )

        10.0020430 +- 0.0000366 seconds time elapsed  ( +-  0.00% )

2) After applying this patchset:

 Performance counter stats for 'system wide' (5 runs):

      841954709443      cycles                    test                      ( +- 15.80% )  (98.69%)
      527258677936      instructions              test #    0.63  insn per cycle           ( +- 15.11% )  (98.68%)

          10.01064 +- 0.00831 seconds time elapsed  ( +-  0.08% )

We can see that IPC drops severely (to 0.17) when down_read_trylock()
is called at high frequency. After switching to SRCU, IPC returns to a
normal level (0.63).
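
As a quick cross-check, the IPC values reported by perf can be recomputed
from the raw counters as instructions / cycles. This is only a minimal
sketch; ipc() is a hypothetical helper and is not part of the test script
above:

```shell
#!/bin/bash
# Recompute IPC (instructions / cycles) from the perf stat counters above.
# ipc() is a hypothetical helper introduced only for this illustration.
ipc()
{
	# $1 = instructions, $2 = cycles
	awk -v insn="$1" -v cyc="$2" 'BEGIN { printf "%.2f\n", insn / cyc }'
}

before=$(ipc 78896433101 454187219766)    # counters from the "before" run
after=$(ipc 527258677936 841954709443)    # counters from the "after" run
echo "before: $before insn per cycle"
echo "after:  $after insn per cycle"
```

The results should match the "insn per cycle" column that perf stat
prints for each run.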

Thanks,
Qi