On 2020/8/26 9:19 AM, Daniel Jordan wrote:
> On Tue, Aug 25, 2020 at 11:26:58AM +0800, Alex Shi wrote:
>> On 2020/8/25 9:56 AM, Daniel Jordan wrote:
>>> Alex, do you have a pointer to the modified readtwice case?
>>
>> Sorry, no. My developer machine crashed, so I lost my container and the
>> modified case. I am struggling to get my container back from a problematic
>> account repository.
>>
>> But some testing scripts are here. Generally, the original readtwice case
>> runs one thread on each CPU. The new case runs one container per CPU, with
>> just one readtwice thread in each container.
>
> Ok, what you've sent so far gives me an idea of what you did. My readtwice
> changes were similar, except I used the cgroup interface directly instead of
> docker and shared a filesystem between all the cgroups whereas it looks like
> you had one per memcg. 30 second runs on 5.9-rc2 and v18 gave 11% more data
> read with v18. This was using 16 cgroups (32 dd tasks) on a 40 CPU, 2 socket
> machine.

I cleaned up my testing and made it reproducible with a Dockerfile and a case
patch, both attached.
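For reference, the docker-free variant Daniel describes above, one memcg per pair of dd readers with all readers sharing one filesystem, could be sketched roughly as below. CGROOT, DATA and NR_CG are placeholders of mine, not his actual script: on a real cgroup v1 machine CGROOT would be /sys/fs/cgroup/memory and DATA a large sparse file on the shared filesystem.

```shell
# Rough sketch only: create one memory cgroup per reader pair, move a
# subshell into it, and run two dd readers there (two tasks per cgroup,
# matching Daniel's 16-cgroup/32-task run). The placeholder defaults
# keep the sketch harmless outside a real cgroup hierarchy.
CGROOT=${CGROOT:-$(mktemp -d)}   # /sys/fs/cgroup/memory on a real machine
DATA=${DATA:-/dev/null}          # the shared sparse file on a real run
NR_CG=${NR_CG:-16}

for ((i = 0; i < NR_CG; i++)); do
	mkdir -p $CGROOT/readtwice-$i
	(
		# Move this subshell (BASHPID, not the parent's $$) into
		# the memcg, then start the two readers.
		echo $BASHPID > $CGROOT/readtwice-$i/cgroup.procs
		dd bs=4k if=$DATA of=/dev/null 2>/dev/null &
		dd bs=4k if=$DATA of=/dev/null 2>/dev/null &
		wait
	) &
done
wait
```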
Users can build a container from the file, and then do the testing like the
following:

#start some testing containers
for ((i = 0; i < 80; i++)); do
	docker run --privileged=true --rm lrulock bash -c " sleep 20000" &
done

#do testing env setup
for i in `docker ps | sed '1 d' | awk '{print $1 }'`; do
	docker exec --privileged=true -it $i bash -c "cd vm-scalability/; bash -x ./case-lru-file-readtwice m" &
done

#kick off testing
for i in `docker ps | sed '1 d' | awk '{print $1 }'`; do
	docker exec --privileged=true -it $i bash -c "cd vm-scalability/; bash -x ./case-lru-file-readtwice r" &
done

#show result
for i in `docker ps | sed '1 d' | awk '{print $1 }'`; do
	echo === $i ===
	docker exec $i bash -c 'cat /tmp/vm-scalability-tmp/dd-output-* ' &
done | grep MB | awk 'BEGIN {a=0;} { a+=$10 } END {print NR, a/(NR)}'

This time, on a 2-socket * 20-core * 2-HT machine, readtwice performance is
252% of the v5.9-rc2 kernel. A good surprise!

>
>>> Even better would be a description of the problem you're having in production
>>> with lru_lock. We might be able to create at least a simulation of it to show
>>> what the expected improvement of your real workload is.
>>
>> We are using thousands of memcgs on a machine, but as a simulation, I guess
>> the above case could be helpful to show the problem.
>
> Using thousands of memcgs to do what? Any particulars about the type of
> workload? Surely it's more complicated than page cache reads :)

Yes, the workloads are quite different across businesses: some use a lot of
CPU, some use a lot of memory, and some are mixed. The number of containers
also varies widely, from tens to hundreds to thousands.

>
>>> I ran a few benchmarks on v17 last week (sysbench oltp readonly, kerndevel from
>>> mmtests, a memcg-ized version of the readtwice case I cooked up) and then today
>>> discovered there's a chance I wasn't running the right kernels, so I'm redoing
>>> them on v18.
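As a sanity check on the result-aggregation step above, assuming the coreutils dd summary format in which the MB/s figure is the 10th whitespace-separated field, the grep/awk pipeline can be exercised on canned input:

```shell
# Feed two fabricated dd summary lines through the same pipeline used in
# the "show result" loop; it prints "<line count> <average MB/s>".
printf '%s\n' \
  '1073741824 bytes (1.1 GB, 1.0 GiB) copied, 10.0 s, 100 MB/s' \
  '1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.0 s, 200 MB/s' |
  grep MB | awk 'BEGIN {a=0;} { a+=$10 } END {print NR, a/(NR)}'
# prints: 2 150
```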
>
> Neither kernel compile nor git checkout in the root cgroup changed much, just
> 0.31% slower on elapsed time for the compile, so no significant regressions
> there. Now for sysbench again.
>

Thanks a lot for the testing report!

Alex
FROM centos:8
MAINTAINER Alexs
#WORKDIR /vm-scalability
#RUN yum update -y && yum groupinstall "Development Tools" -y && yum clean all
#examples https://www.linuxtechi.com/build-docker-container-images-with-dockerfile/
RUN yum install git xfsprogs patch make gcc -y && yum clean all && \
	git clone https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/ && \
	cd vm-scalability && make usemem
COPY readtwice.patch /vm-scalability/
RUN cd vm-scalability && patch -p1 < readtwice.patch
diff --git a/case-lru-file-readtwice b/case-lru-file-readtwice
index 85533b248634..57cb97d121ae 100755
--- a/case-lru-file-readtwice
+++ b/case-lru-file-readtwice
@@ -15,23 +15,30 @@
 
 . ./hw_vars
 
-for i in `seq 1 $nr_task`
-do
-	create_sparse_file $SPARSE_FILE-$i $((ROTATE_BYTES / nr_task))
-	timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-1-$i 2>&1 &
-	timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-2-$i 2>&1 &
-done
+OUT_DIR=$(hostname)-${nr_task}c-$(((mem + (1<<29))>>30))g
+TEST_CASES=${@:-$(echo case-*)}
+
+echo $((1<<30)) > /proc/sys/vm/max_map_count
+echo $((1<<20)) > /proc/sys/kernel/threads-max
+echo 1 > /proc/sys/vm/overcommit_memory
+#echo 3 > /proc/sys/vm/drop_caches
+
+
+i=1
+
+if [ "$1" == "m" ];then
+	mount_tmpfs
+	create_sparse_root
+	create_sparse_file $SPARSE_FILE-$i $((ROTATE_BYTES))
+	exit
+fi
+
+
+if [ "$1" == "r" ];then
+	(timeout --foreground -s INT ${runtime:-300} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-1-$i 2>&1)&
+	(timeout --foreground -s INT ${runtime:-300} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-2-$i 2>&1)&
+fi
 
 wait
 sleep 1
 
-for file in $TMPFS_MNT/dd-output-*
-do
-	[ -s "$file" ] || {
-		echo "dd output file empty: $file" >&2
-	}
-	cat $file
-	rm $file
-done
-
-rm `seq -f $SPARSE_FILE-%g 1 $nr_task`
diff --git a/hw_vars b/hw_vars
index 8731cefb9f57..ceeaa9f17c0b 100755
--- a/hw_vars
+++ b/hw_vars
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/sh -ex
 
 if [ -n "$runtime" ]; then
 	USEMEM="$CMD ./usemem --runtime $runtime"
@@ -43,7 +43,7 @@ create_loop_devices()
 	modprobe loop 2>/dev/null
 	[ -e "/dev/loop0" ] || modprobe loop 2>/dev/null
 
-	for i in $(seq 0 8)
+	for i in $(seq 0 104)
 	do
 		[ -e "/dev/loop$i" ] && continue
 		mknod /dev/loop$i b 7 $i
@@ -101,11 +101,11 @@ remove_sparse_root () {
 create_sparse_file () {
 	name=$1
 	size=$2
-	# echo "$name is of size $size"
+	echo "$name is of size $size"
 	$CMD truncate $name -s $size
 	# dd if=/dev/zero of=$name bs=1k count=1 seek=$((size >> 10)) 2>/dev/null
-	# ls $SPARSE_ROOT
-	# ls /tmp/vm-scalability/*
+	ls $SPARSE_ROOT
+	ls /tmp/vm-scalability/*
 }
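One detail worth noting in the patched case file: the OUT_DIR name rounds the machine's memory size to the nearest GiB. As a standalone sketch (mem here is a made-up byte count, not the variable hw_vars actually computes):

```shell
# Round a byte count to the nearest GiB by adding half a GiB (1<<29)
# before shifting right by 30, as in the OUT_DIR expression above.
mem=$((3 * (1<<30) + (1<<29)))   # 3.5 GiB of "memory"
echo $(((mem + (1<<29)) >> 30))g
# prints: 4g
```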