On Thu, Dec 19, 2019 at 03:07:17PM -0500, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> The effective protection of any given cgroup is a somewhat complicated
> construct that depends on the ancestor's configuration, siblings'
> configurations, as well as current memory utilization in all these
> groups.
I agree with that. It makes it a bit hard to determine the equilibrium
in advance.

> + * Consider the following example tree:
>   *
> + *        A      A/memory.low = 2G, A/memory.current = 6G
> + *       //\\
> + *      BC  DE   B/memory.low = 3G  B/memory.current = 2G
> + *               C/memory.low = 1G  C/memory.current = 2G
> + *               D/memory.low = 0   D/memory.current = 2G
> + *               E/memory.low = 10G E/memory.current = 0
>   *
> + * and memory pressure is applied, the following memory
> + * distribution is expected (approximately*):
>   *
> + *      A/memory.current = 2G
> + *      B/memory.current = 1.3G
> + *      C/memory.current = 0.6G
> + *      D/memory.current = 0
> + *      E/memory.current = 0
>   *
> + * *assuming equal allocation rate and reclaimability
I think the assumptions for this example don't hold (anymore). Because
the reclaim rate depends on the usage above protection, the siblings
won't be reclaimed equally, so the low_usage proportionality will change
over time and the equilibrium distribution is IMO different (I'm
attaching an Octave script to calculate it).

As the result depends on the initial usage, I don't think such a general
example can be given (for overcommit).

> @@ -6272,12 +6262,63 @@ struct cgroup_subsys memory_cgrp_subsys = {
>   * for next usage. This part is intentionally racy, but it's ok,
>   * as memory.low is a best-effort mechanism.
This is a different issue, but since this patch updates the docs I'm
mentioning it: we treat memory.min the same, i.e. it's subject to the
same race; however, it's not meant to be best-effort. I didn't look into
the outcomes of potential misaccounting, but the comment seems to miss
the impact on memory.min protection.

> @@ -6292,52 +6333,29 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> [...]
> +	if (parent == root) {
> +		memcg->memory.emin = memcg->memory.min;
> +		memcg->memory.elow = memcg->memory.low;
> +		goto out;
>  	}
Shouldn't this condition be 'if (parent == root_mem_cgroup)'? (I.e. 1st
level takes direct input, but 2nd and further levels redistribute only
what they really got from parent.)

Michal

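For reference, the static split in the quoted comment (1.3G / 0.6G) is what you get by scaling each sibling's low_usage by the parent's effective protection. A minimal sketch in Python, not kernel code: the function name static_low_split is made up for illustration, and it ignores memory.min, recursive protection and any reclaim dynamics.

```python
def static_low_split(parent_elow, low, current):
    """Split the parent's effective low protection among siblings.

    Hypothetical helper for illustration only; all values are in GB.
    """
    # low_usage: the part of each sibling's usage covered by its memory.low
    low_usage = [min(l, c) for l, c in zip(low, current)]
    total = sum(low_usage)
    if total == 0:
        return [0.0] * len(low)
    # Scale down only when the siblings overcommit the parent's protection
    scale = min(1.0, parent_elow / total)
    return [u * scale for u in low_usage]


# Siblings B, C, D, E from the quoted example; A's effective low is 2G
print(static_low_split(2.0, [3, 1, 0, 10], [2, 2, 2, 0]))
```

This only reproduces the first step; it says nothing about where the siblings end up once reclaim starts changing their usage.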
% run as: octave-cli script
%
% Input configurations
% -------------------
% E	parent effective protection
% n	nominal protection of siblings set at the given level
% c	current consumption -,,-

% example from effective_protection 3.
E = 2;
n = [3 1 0 10];
c = [2 2 2 0]; % this converges to [1.16 0.84 0 0]
% c = [6 2 2 0]; % keeps ratio [1.5 0.5 0 0]
% c = [5 2 2 0]; % mixed ratio [1.45 0.55 0 0]
% c = [8 2 2 0]; % mixed ratio [1.53 0.47 0 0]

% example from effective_protection 5.
%E = 2;
%n = [1 0];
%c = [2 1]; % coming from close to equilibrium -> [1.50 0.50]
%c = [100 100]; % coming from "infinity" -> [1.50 0.50]
%c = [2 2]; % coming from uniformity -> [1.33 0.67]

% example of recursion by default
%E = 2;
%n = [0 0];
%c = [2 1]; % coming from disbalance -> [1.33 0.67]
%c = [100 100]; % coming from "infinity" -> [1.00 1.00]
%c = [2 2]; % coming from uniformity -> [1.00 1.00]

% example by using infinities (_without_ recursive protection)
%E = 2;
%n = [1e7 1e7];
%c = [2 1]; % coming from disbalance -> [1.33 0.67]
%c = [100 100]; % coming from "infinity" -> [1.00 1.00]
%c = [2 2]; % coming from uniformity -> [1.00 1.00]

% Reclaim parameters
% ------------------

% Minimal reclaim amount (GB)
cluster = 4e-6;

% Reclaim coefficient (think of it as 0.5^sc->priority)
alpha = .1;

% Simulation parameters
% ---------------------
epsilon = 1e-7;
timeout = 1000;

% Simulation loop
% ---------------------
% Simulation assumes siblings consumed the initial amount of memory (w/out
% reclaim) and then the reclaim starts, all memory is reclaimable, i.e. treated
% same. It simulates only non-low reclaim and assumes all memory.min = 0.

ch = [];
eh = [];
rh = [];

for t = 1:timeout
	% low_usage
	u = min(c, n);
	siblings = sum(u);

	% effective_protection()
	protected = min(n, c); % start with nominal
	e = protected * min(1, E / siblings); % normalize overcommit

	% recursive protection
	unclaimed = max(0, E - siblings);
	parent_overuse = sum(c) - siblings;
	if (unclaimed > 0 && parent_overuse > 0)
		overuse = max(0, c - protected);
		e += unclaimed * (overuse / parent_overuse);
	endif

	% get_scan_count()
	r = alpha * c; % assume all memory is in a single LRU list

	% 1bc63fb1272b ("mm, memcg: make scan aggression always exclude protection")
	sz = max(e, c);
	r .*= (1 - (e+epsilon) ./ (sz+epsilon));

	% uncomment to debug prints
	% e, c, r

	% nothing to reclaim, reached equilibrium
	if max(r) < epsilon
		break;
	endif

	% SWAP_CLUSTER_MAX
	r = max(r, (r > epsilon) .* cluster);
	c = max(c - r, 0);

	ch = [ch ; c];
	eh = [eh ; e];
	rh = [rh ; r];
endfor

t
c, e
plot([ch, eh])
pause()
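
For anyone without Octave at hand, the same loop can be sketched in plain Python. This is a rough port under the same assumptions as the script (single LRU, all memory reclaimable, memory.min = 0). The function name simulate and its keyword defaults are just illustrative, it drops the history arrays and the plot, and it guards the division that Octave lets evaluate to Inf. The convergence values quoted in the script comments come from the Octave version and are not re-verified here.

```python
def simulate(E, n, c, alpha=0.1, cluster=4e-6, epsilon=1e-7, timeout=1000):
    """Iterate reclaim until no sibling is reclaimed any more.

    E: parent effective protection, n: nominal protections,
    c: initial consumption (all in GB). Returns (c, e) at the end.
    """
    c = [float(x) for x in c]
    e = [0.0] * len(c)
    for _ in range(timeout):
        # low_usage and effective_protection(): start with nominal,
        # then normalize overcommit against the parent's protection
        protected = [min(ni, ci) for ni, ci in zip(n, c)]
        siblings = sum(protected)
        scale = min(1.0, E / siblings) if siblings > 0 else 1.0
        e = [p * scale for p in protected]

        # recursive protection: hand out what the siblings did not claim
        unclaimed = max(0.0, E - siblings)
        parent_overuse = sum(c) - siblings
        if unclaimed > 0 and parent_overuse > 0:
            e = [ei + unclaimed * max(0.0, ci - pi) / parent_overuse
                 for ei, ci, pi in zip(e, c, protected)]

        # get_scan_count(): scan pressure excludes the protected part
        r = [alpha * ci * (1 - (ei + epsilon) / (max(ei, ci) + epsilon))
             for ci, ei in zip(c, e)]

        # nothing to reclaim, reached equilibrium
        if max(r) < epsilon:
            break

        # SWAP_CLUSTER_MAX: reclaim at least one cluster when reclaiming
        r = [max(ri, cluster) if ri > epsilon else ri for ri in r]
        c = [max(ci - ri, 0.0) for ci, ri in zip(c, r)]
    return c, e


# First configuration from the script (effective_protection example 3)
c, e = simulate(2.0, [3, 1, 0, 10], [2, 2, 2, 0])
print([round(x, 2) for x in c], [round(x, 2) for x in e])
```

Running it with the other commented-out configurations is a matter of changing the three arguments; the point is that the final split depends on the initial consumption c, not only on E and n.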