Re: [PATCH v2 2/3] mm: memcontrol: clean up and document effective low/min calculations

On Thu, Dec 19, 2019 at 03:07:17PM -0500, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> The effective protection of any given cgroup is a somewhat complicated
> construct that depends on the ancestor's configuration, siblings'
> configurations, as well as current memory utilization in all these
> groups.
I agree with that. It makes it a bit hard to determine the equilibrium
in advance.


> + *    Consider the following example tree:
>   *
> + *        A      A/memory.low = 2G, A/memory.current = 6G
> + *       //\\
> + *      BC  DE   B/memory.low = 3G  B/memory.current = 2G
> + *               C/memory.low = 1G  C/memory.current = 2G
> + *               D/memory.low = 0   D/memory.current = 2G
> + *               E/memory.low = 10G E/memory.current = 0
>   *
> + *    and memory pressure is applied, the following memory
> + *    distribution is expected (approximately*):
>   *
> + *      A/memory.current = 2G
> + *      B/memory.current = 1.3G
> + *      C/memory.current = 0.6G
> + *      D/memory.current = 0
> + *      E/memory.current = 0
>   *
> + *    *assuming equal allocation rate and reclaimability
I think the assumptions for this example don't hold (anymore).
Because the reclaim rate depends on the usage above protection, the
siblings won't be reclaimed equally, so the low_usage proportions will
change over time and the equilibrium distribution is IMO different (I'm
attaching an Octave script to calculate it).

Since the result depends on the initial usage, I don't think such a
general example can be given (for the overcommit case).
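
For reference, the 1.3G/0.6G in the quoted comment appear to correspond to
the effective protection computed once from the initial usage. A minimal
sketch of that single step follows. It is plain userspace C, not kernel
code; effective_low() and min_d() are hypothetical names used only for
illustration, and recursive protection and memory.min are ignored.

```c
#include <stddef.h>

/* Hypothetical helper for this sketch, not a kernel function. */
static double min_d(double a, double b)
{
	return a < b ? a : b;
}

/*
 * One-shot effective protection for n siblings, mirroring the
 * "normalize overcommit" step of the attached script:
 *
 *   low_usage[i] = min(usage[i], low[i])
 *   elow[i]      = low_usage[i] * min(1, parent_elow / sum(low_usage))
 *
 * Units are arbitrary (GB in the example). Assumption: no recursive
 * protection and no memory.min.
 */
static void effective_low(double parent_elow, const double *low,
			  const double *usage, double *elow, size_t n)
{
	double siblings = 0.0;
	double scale;
	size_t i;

	for (i = 0; i < n; i++)
		siblings += min_d(usage[i], low[i]);

	/* No protected usage at all: nothing to distribute. */
	scale = siblings > 0.0 ? min_d(1.0, parent_elow / siblings) : 0.0;

	for (i = 0; i < n; i++)
		elow[i] = min_d(usage[i], low[i]) * scale;
}
```

With parent_elow = 2, low = {3, 1, 0, 10} and usage = {2, 2, 2, 0} this
yields roughly {1.33, 0.67, 0, 0}. That is only the starting point: once
reclaim changes the usage, the proportions move, which is what the script
iterates on.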


> @@ -6272,12 +6262,63 @@ struct cgroup_subsys memory_cgrp_subsys = {
>   * for next usage. This part is intentionally racy, but it's ok,
>   * as memory.low is a best-effort mechanism.
This is a different issue, but since the patch updates the docs I'll
mention it: we treat memory.min the same way, i.e. it's subject to the
same race even though it's not meant to be best-effort. I didn't look
into the outcomes of potential misaccounting, but the comment seems to
miss the impact on memory.min protection.

> @@ -6292,52 +6333,29 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> [...]
> +	if (parent == root) {
> +		memcg->memory.emin = memcg->memory.min;
> +		memcg->memory.elow = memcg->memory.low;
> +		goto out;
>  	}
Shouldn't this condition be 'if (parent == root_mem_cgroup)'? (I.e. 1st
level takes direct input, but 2nd and further levels redistribute only
what they really got from parent.)


Michal

% run as: octave-cli script
%
% Input configurations
% -------------------
% E parent effective protection
% n nominal protection of siblings set at the given level
% c current consumption of siblings at the given level

% example from effective_protection 3.
E = 2;
n = [3 1 0 10];
c = [2 2 2 0];  % this converges to      [1.16 0.84 0 0]
% c = [6 2 2 0];  % keeps ratio          [1.5 0.5 0 0]
% c = [5 2 2 0];  % mixed ratio          [1.45 0.55 0 0]
% c = [8 2 2 0];  % mixed ratio          [1.53 0.47 0 0]

% example from effective_protection 5.
%E = 2;
%n = [1 0];
%c = [2 1];  % coming from close to equilibrium  -> [1.50 0.50]
%c = [100 100];  % coming from "infinity"        -> [1.50 0.50]
%c = [2 2];   % coming from uniformity            -> [1.33 0.67]

% example of recursion by default
%E = 2;
%n = [0 0];
%c = [2 1];  % coming from imbalance             -> [1.33 0.67]
%c = [100 100];  % coming from "infinity"        -> [1.00 1.00]
%c = [2 2];   % coming from uniformity           -> [1.00 1.00]

% example by using infinities (_without_ recursive protection)
%E = 2;
%n = [1e7 1e7];
%c = [2 1];  % coming from imbalance             -> [1.33 0.67]
%c = [100 100];  % coming from "infinity"        -> [1.00 1.00]
%c = [2 2];   % coming from uniformity           -> [1.00 1.00]

% Reclaim parameters
% ------------------

% Minimal reclaim amount (GB)
cluster = 4e-6;

% Reclaim coefficient (think of it as 0.5^sc->priority)
alpha = .1;

% Simulation parameters
% ---------------------
epsilon = 1e-7;
timeout = 1000;

% Simulation loop
% ---------------------
% The simulation assumes the siblings consumed the initial amount of memory
% (without reclaim) and then reclaim starts. All memory is reclaimable, i.e.
% treated the same. It simulates only non-low reclaim and assumes all
% memory.min = 0.

ch = [];
eh = [];
rh = [];

for t = 1:timeout
	% low_usage
	u = min(c, n);
	siblings = sum(u);

	% effective_protection()
	protected = min(n, c);                % start with nominal
	e = protected * min(1, E / siblings); % normalize overcommit

	% recursive protection
	unclaimed = max(0, E - siblings);
	parent_overuse = sum(c) - siblings;
	if (unclaimed > 0 && parent_overuse > 0)
		overuse = max(0, c - protected);
		e += unclaimed * (overuse / parent_overuse);
	endif

	% get_scan_count()
	r = alpha * c;             % assume all memory is in a single LRU list

	% 1bc63fb1272b ("mm, memcg: make scan aggression always exclude protection")
	sz = max(e, c);
	r .*= (1 - (e+epsilon) ./ (sz+epsilon));

	% debug prints (comment out to silence)
	e, c, r
	
	% nothing to reclaim, reached equilibrium
	if max(r) < epsilon
		break;
	endif

	% SWAP_CLUSTER_MAX
	r = max(r, (r > epsilon) .* cluster);
	c = max(c - r, 0);
	
	ch = [ch ; c];
	eh = [eh ; e];
	rh = [rh ; r];
endfor

t
c, e
plot([ch, eh])
pause()
