On Mon, Jul 10, 2023 at 7:44 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello.
>
> On Fri, Jun 30, 2023 at 04:22:28PM -0700, Ivan Babrou <ivan@xxxxxxxxxxxxxx> wrote:
> > As you might've noticed from the output, splitting the loop into two
> > makes the code run 10x faster.
>
> That is curious.
>
> > We're running Linux v6.1 (the output is from v6.1.25) with no patches
> > that touch the cgroup or mm subsystems, so you can assume vanilla
> > kernel.
>
> Have you watched for this on older kernels too?

We've been on v6.1 for quite a while now, but it's possible that we
weren't paying enough attention before to notice.

> > I am happy to try out patches or to do some tracing to help understand
> > this better.
>
> I see in your reproducer you tried swapping order of controllers
> flushed.
> Have you also tried flushing same controller twice (in the inner loop)?
> (Despite the expectation is that it shouldn't be different from half the
> scenario where ran two loops.)

Same controller twice is fast (whether it's mem + mem or cpu + cpu):

warm-up
completed: 17.24s [manual / cpu-stat + mem-stat]
completed:  1.02s [manual / mem-stat+mem-stat]
completed:  0.59s [manual / cpu-stat+cpu-stat]
completed:  0.44s [manual / mem-stat]
completed:  0.16s [manual / cpu-stat]

running
completed: 14.32s [manual / cpu-stat + mem-stat]
completed:  1.25s [manual / mem-stat+mem-stat]
completed:  0.42s [manual / cpu-stat+cpu-stat]
completed:  0.12s [manual / mem-stat]
completed:  0.50s [manual / cpu-stat]
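
For readers who do not have the reproducer at hand, here is a rough sketch of the kind of loops the labels above appear to compare. This is not the original reproducer. It assumes cgroup v2 is mounted at /sys/fs/cgroup, that "cpu-stat" and "mem-stat" refer to reading cpu.stat and memory.stat in every cgroup, and that "A + B" means both files are read in the same pass over the hierarchy. The helper names (list_cgroups, read_stats, timed, main) are hypothetical.

```python
import os
import time


def list_cgroups(root):
    """Return root and every directory below it (one entry per cgroup)."""
    return [dirpath for dirpath, _dirs, _files in os.walk(root)]


def read_stats(cgroup_dirs, names):
    """Single pass over the cgroups, reading every file in `names` from each.

    Returns the total number of bytes read so the work is not optimized away.
    """
    total = 0
    for d in cgroup_dirs:
        for name in names:
            try:
                with open(os.path.join(d, name), "rb") as f:
                    total += len(f.read())
            except OSError:
                # File missing (controller not enabled here) or the cgroup
                # went away while we were walking; skip it.
                pass
    return total


def timed(label, cgroup_dirs, names):
    """Time one pass and print it in the same shape as the output above."""
    start = time.monotonic()
    nbytes = read_stats(cgroup_dirs, names)
    elapsed = time.monotonic() - start
    print(f"completed: {elapsed:5.2f}s [manual / {label}]")
    return nbytes


def main(root="/sys/fs/cgroup"):
    # Assumed mount point; adjust for the system under test.
    dirs = list_cgroups(root)
    timed("cpu-stat + mem-stat", dirs, ["cpu.stat", "memory.stat"])
    timed("mem-stat+mem-stat", dirs, ["memory.stat", "memory.stat"])
    timed("cpu-stat+cpu-stat", dirs, ["cpu.stat", "cpu.stat"])
    timed("mem-stat", dirs, ["memory.stat"])
    timed("cpu-stat", dirs, ["cpu.stat"])
```

Calling main() on a host with many cgroups runs the five variants once. Timings from a sketch like this are only indicative, since interpreter overhead and page-cache state differ from the original setup.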