On Wed, Feb 26, 2020 at 05:46:32PM +0100, Michal Koutný wrote:
> On Tue, Feb 25, 2020 at 01:40:14PM -0500, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > Hm, this example doesn't change with my patch because there is no
> > "floating" protection that gets distributed among the siblings.
> Maybe it had changed even earlier and the example obsoleted.
>
> > In my testing with the above parameters, the equilibrium still comes
> > out to roughly this distribution.
> I'm attaching my test (10-times smaller) and I'm getting these results:
>
> > /sys/fs/cgroup/test.slice/memory.current:838750208
> > /sys/fs/cgroup/test.slice/pressure.service/memory.current:616972288
> > /sys/fs/cgroup/test.slice/test-A.slice/memory.current:221782016
> > /sys/fs/cgroup/test.slice/test-A.slice/B.service/memory.current:123428864
> > /sys/fs/cgroup/test.slice/test-A.slice/C.service/memory.current:93495296
> > /sys/fs/cgroup/test.slice/test-A.slice/D.service/memory.current:4702208
> > /sys/fs/cgroup/test.slice/test-A.slice/E.service/memory.current:155648
>
> (I'm running that on 5.6.0-rc2 + first two patches of your series.)
>
> That's IMO closer to the my simulation (1.16:0.84)
> than the example prediction (1.3:0.6)

I think you're correct about the moving points of equilibrium. I'm
going to experiment more with your script. I had written it off as
noise from LRU rotations, reclaim concurrency etc., but your script
shows that these points do shift around as the input parameters
change. This is useful, thanks.

But AFAICS, my patches here do not change or introduce this behavior,
so it's a longer-standing issue.

> > It's just to illustrate the pressure weight, not to reflect each
> > factor that can influence the equilibrium.
> But it's good to have some idea about the equilibrium when configuring
> the values.

Agreed.

> > > > @@ -6272,12 +6262,63 @@ struct cgroup_subsys memory_cgrp_subsys = {
> > > >  *   for next usage. This part is intentionally racy, but it's ok,
> > > >  *   as memory.low is a best-effort mechanism.
> > > Although it's a different issue but since this updates the docs I'm
> > > mentioning it -- we treat memory.min the same, i.e. it's subject to the
> > > same race, however, it's not meant to be best effort. I didn't look into
> > > outcomes of potential misaccounting but the comment seems to miss impact
> > > on memory.min protection.
> >
> > Yeah I think we can delete that bit.
> Erm, which part?
> Make the racy behavior undocumented or that it applies both memory.low
> and memory.min?

I'm honestly not sure what it's trying to say. Reclaim and charging
can happen concurrently, so the protection enforcement, whether it's
min or low, has always been racy. Even OOM itself is racy. Caching the
parental value while iterating over a group of siblings shouldn't
fundamentally alter that outcome. We do enough priority iterations and
reclaim loops from the allocator that we shouldn't see premature OOMs
or apply totally incorrect balances because of that.

So IMO the statement that "this is racy, but low is best-effort
anyway, so it's okay" is misleading. I think it would be more accurate
to say that reclaim is fundamentally racy, so a bit of additional
noise here doesn't matter.

Either way, I don't find this paragraph all that useful. If you think
it's informative, could you please let me know which important aspect
it communicates? Otherwise, I'm inclined to delete it.
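
For reference, here is a rough userspace sketch of where a static
prediction like the 1.3:0.6 split comes from, assuming the proportional
distribution of the parent's effective protection among overcommitted
siblings, i.e. elow(child) = min(low, parent_elow * low_usage /
siblings_low_usage) with low_usage = min(usage, low). The concrete
numbers (parent elow of 2G; B: low 3G, usage 2G; C: low 1G, usage 2G)
are made up to reproduce that ratio, they're not taken from Michal's
test:

/*
 * Rough sketch, not kernel code: the proportional split of a parent's
 * effective memory.low among siblings that together overcommit it.
 * The numbers are illustrative only.
 */
#include <stdio.h>

static unsigned long long min_ull(unsigned long long a, unsigned long long b)
{
	return a < b ? a : b;
}

int main(void)
{
	const unsigned long long G = 1024ULL * 1024 * 1024;
	unsigned long long parent_elow = 2 * G;
	unsigned long long usage[2] = { 2 * G, 2 * G };	/* B, C */
	unsigned long long low[2]   = { 3 * G, 1 * G };	/* B, C */
	unsigned long long low_usage[2], siblings_low_usage = 0;
	int i;

	for (i = 0; i < 2; i++) {
		/* Only the protected part of the usage competes for elow */
		low_usage[i] = min_ull(usage[i], low[i]);
		siblings_low_usage += low_usage[i];
	}

	for (i = 0; i < 2; i++) {
		unsigned long long elow = min_ull(low[i],
			parent_elow * low_usage[i] / siblings_low_usage);
		printf("%c: effective low = %.2fG\n",
		       'B' + i, (double)elow / G);
	}
	return 0;
}

That prints roughly 1.33G for B and 0.67G for C; comparing it to the
measured 1.16:0.84 above shows how far the runtime equilibrium can
drift from the static calculation.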