On Mon, 26 Apr 2021 at 18:33, Odin Ugedal <odin@xxxxxxxxxx> wrote:
>
> Hi,
>
> > Have you been able to reproduce this on mainline ?
>
> Yes. I have been debugging and testing with v5.12-rc8. After I found the
> suspected commit in ~v4.8, I compiled both v4.4.267 and v4.9.267, and was
> able to reproduce it on v4.9.267 but not on v4.4.267. It is also
> reproducible on the 5.11.16-arch1-1 kernel that my distro ships, and it is
> reproducible on all the machines I have tested.
>
> > When running the script below on v5.12, I'm not able to reproduce your
> > problem
>
> v5.12 is pretty fresh, so I have not tested on anything before v5.12-rc8.
> I did compile v5.12.0 now, and I am able to reproduce it there as well.

I meant any v5.12-rcX version; I just wanted to make sure this is still a
valid problem on the latest version.

> Which version did you try (the one for cgroup v1 or v2)? And/or did you
> try to run the inspection bpftrace script? If you tested the cgroup v1
> version, it will often end up at 50/50, 51/49 etc., and sometimes around
> 60/40, making it hard to verify without inspection.

I tried both the cgroup v1 and the cgroup v2 versions, but not the bpftrace
inspection script.

> I have attached a version of the "sub cgroup" example for cgroup v1 that
> also forces the process to start on cpu 1 (CPU_ME) and sends it over to
> cpu 0 (CPU) after attaching it to the new cgroup. That makes it evident
> each time. This example should also always end up with 50/50 per stress
> process, but "always" ends up more like 99/1.
>
> Can you confirm whether you are able to reproduce it with this version?

I confirm that I can see a 4 ms vs 204 ms running time ratio with the
script below. But when I look more closely at my trace (I have instrumented
the code), it seems that the two stress processes don't belong to the same
cgroup but remain in cg-1 and cg-2, which explains such a running time
difference. So your script doesn't reproduce the bug you want to highlight.
That being said, I can also see a difference in the cpu0 contribution to
tg_load. I'm going to look into it further.

> --- bash start
> CGROUP_CPU=/sys/fs/cgroup/cpu/slice
> CGROUP_CPUSET=/sys/fs/cgroup/cpuset/slice
> CGROUP_CPUSET_ME=/sys/fs/cgroup/cpuset/me
> CPU=0
> CPU_ME=1
>
> function run_sandbox {
>   local CG_CPUSET="$1"
>   local CG_CPU="$2"
>   local INNER_SHARES="$3"
>   local CMD="$4"
>
>   local PIPE="$(mktemp -u)"
>   mkfifo "$PIPE"
>   sh -c "read < $PIPE ; exec $CMD" &
>   local TASK="$!"
>   sleep .1
>
>   mkdir -p "$CG_CPUSET"
>   mkdir -p "$CG_CPU"/sub
>   tee "$CG_CPU"/sub/cgroup.procs <<< "$TASK"
>   tee "$CG_CPU"/sub/cpu.shares <<< "$INNER_SHARES"
>
>   tee "$CG_CPUSET"/cgroup.procs <<< "$TASK"
>
>   tee "$PIPE" <<< sandbox_done
>   rm "$PIPE"
> }
>
> mkdir -p "$CGROUP_CPU"
> mkdir -p "$CGROUP_CPUSET"
> mkdir -p "$CGROUP_CPUSET_ME"
>
> tee "$CGROUP_CPUSET"/cpuset.cpus <<< "$CPU"
> tee "$CGROUP_CPUSET"/cpuset.mems <<< "$CPU"
>
> tee "$CGROUP_CPUSET_ME"/cpuset.cpus <<< "$CPU_ME"
> echo $$ | tee "$CGROUP_CPUSET_ME"/cgroup.procs
>
> run_sandbox "$CGROUP_CPUSET" "$CGROUP_CPU/cg-1" 50000 "stress --cpu 1"
> run_sandbox "$CGROUP_CPUSET" "$CGROUP_CPU/cg-2" 2     "stress --cpu 1"
>
> read # press enter to clean up and stop all stress procs
> killall stress
> sleep .2
> rmdir /sys/fs/cgroup/cpuset/slice/
> rmdir /sys/fs/cgroup/cpu/slice/{cg-{1,2}{/sub,},}
> --- bash end
>
> Thanks
> Odin
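
As a side note, here is a quick way to double-check where the stress tasks
actually end up and how the CPU time is split between them. This is only a
rough sketch on top of the cgroup v1 script above (it assumes the
/sys/fs/cgroup/cpu/slice/... layout from that script and only uses
/proc/<pid>/cgroup plus standard pgrep/ps, nothing kernel-side):

--- bash start
# Rough sketch, not part of the reproducer: check which cpu cgroup the
# stress tasks really joined, and how much CPU time each one got.
# Paths assume the cgroup v1 hierarchy from the script above.
for PID in $(pgrep -x stress); do
  echo "=== pid $PID ==="
  # The cpu controller line should point at .../slice/cg-1/sub or
  # .../slice/cg-2/sub if the move into the sub cgroup worked.
  grep -E ':cpu(,cpuacct)?:' /proc/"$PID"/cgroup
  # Accumulated CPU time; on a fair run the two workers should be close.
  ps -o pid=,time=,%cpu= -p "$PID"
done

# The per-cgroup task lists can also be compared directly:
head /sys/fs/cgroup/cpu/slice/cg-{1,2}/cgroup.procs \
     /sys/fs/cgroup/cpu/slice/cg-{1,2}/sub/cgroup.procs
--- bash end

If the pids show up in cg-1 and cg-2 rather than in the sub groups, that
would match what I see in my trace.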