Re: [PATCH 1/2] Add mempressure cgroup

Kamezawa Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Tue, 08 Jan 2013 17:24:32 +0900

(2013/01/08 16:29), Anton Vorontsov wrote:
On Mon, Jan 07, 2013 at 05:51:46PM +0900, Kamezawa Hiroyuki wrote:
[...]
I'm just curious..

Thanks for taking a look! :)

[...]
+/*
+ * The window size is the number of scanned pages before we try to analyze
+ * the scanned/reclaimed ratio (or difference).
+ *
+ * It is used as a rate-limit tunable for the "low" level notification,
+ * and for averaging medium/oom levels. Using small window sizes can cause
+ * lot of false positives, but too big window size will delay the
+ * notifications.
+ */
+static const uint vmpressure_win = SWAP_CLUSTER_MAX * 16;
+static const uint vmpressure_level_med = 60;
+static const uint vmpressure_level_oom = 99;
+static const uint vmpressure_level_oom_prio = 4;
+

Hmm... isn't this window size too small ?
If vmscan cannot find a reclaimable page while scanning 2M of pages in a zone,
oom notify will be returned. Right ?

Yup, you are right, if we were not able to find anything within the window
size (which is 2M, but see below), then it is effectively the "OOM level".
The thing is, the vmpressure reports... the pressure. :) Or, the
allocation cost, and if the cost becomes high, it is no good.

The 2M is, of course, not ideal. And the "ideal" depends on many factors,
alike to vmstat. And, actually I dream about deriving the window size from
zone->stat_threshold, which would make the window automatically adjustable
for different "machine sizes" (as we do in calculate_normal_threshold(),
in vmstat.c).

But again, this is all "implementation details"; tunable stuff that we can
either adjust ourselves as needed, or try to be smart, i.e. apply some
heuristics, again, as in vmstat.

Hmm, I like automatic adjustment for things like this (but may be need to be tunable by
user). My concern is, for example, that if a qemu-kvm with pci-passthrough running on
a node using the most of memory on it, the interface will say "Hey it's near to OOM"
to users. We may need a complicated heuristics ;)

Anyway, your approach seems interesting to me but it seems peaky to usual users.
Uses should know what they should check (vmstat, zoneinfo, malloc latency ??) when they
get notify before rising real alarm. (not explained in the doc.)
For example, if the user takes care of usage of swap, he should check it.

I'm glad if you explain in Doc that this interface just makes a hint and notify status
of _recent_ vmscans of some amount of window. That means latency of recent memory allocations.
Users should confirm the real status and make the final judge by themselves.
The point is that this notify is important because it's quick and related to ongoing memory
allocation latency. But kernel is not sure there are long-standing heavy vm pressure.

I'm sorry if I misundestand the concept.

Thank you,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>