On 7/23/18 2:33 PM, David Rientjes wrote:
On Mon, 23 Jul 2018, David Rientjes wrote:
The huge zero page can be reclaimed under memory pressure and, if it is,
the kernel tries to allocate it again with gfp flags that attempt memory
compaction, which can become expensive. If we are constantly under memory
pressure, it gets freed and reallocated millions of times, always trying to
compact memory both directly and by kicking kcompactd in the background.
It likely should also be per node.
Have you benchmarked making the non-huge zero page per-node?
Not since we disable it :) I will, though. The more concerning issue for
us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
compacting memory for allocating the hzp in real time after it has been
reclaimed. We've observed this happening tens or hundreds of thousands
of times on some systems. It will be 2MB per node on x86 if the data
suggests we should make it NUMA aware; I don't think the cost is too high
to leave it persistently available even under memory pressure if
use_zero_page is enabled.
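
To make the free/reallocate churn described above concrete, here is a toy
user-space model. It is only a sketch under stated assumptions: every name in
it (hzp_model, hzp_get, hzp_put, hzp_shrink, hzp_simulate) is hypothetical and
is not a kernel symbol, and the "allocation" is just a counter standing in for
the real allocation and compaction work. It compares how often the page has to
be allocated with and without a persistent policy.

```c
#include <stdbool.h>

/* Toy user-space model; all names are hypothetical, not kernel symbols. */
struct hzp_model {
    bool allocated;        /* is the huge zero page currently present? */
    bool persistent;       /* proposed policy: never free it on reclaim */
    int users;             /* mappings currently referencing the page */
    unsigned long allocs;  /* number of (expensive) allocations so far */
};

/* Take a reference, allocating the page first if it was reclaimed. */
static void hzp_get(struct hzp_model *m)
{
    if (!m->allocated) {
        m->allocated = true;
        m->allocs++;       /* stands in for allocation plus compaction */
    }
    m->users++;
}

/* Drop a reference; the page stays around until reclaim runs. */
static void hzp_put(struct hzp_model *m)
{
    if (m->users > 0)
        m->users--;
}

/* Reclaim under memory pressure: frees an unused, non-persistent page. */
static void hzp_shrink(struct hzp_model *m)
{
    if (m->allocated && m->users == 0 && !m->persistent)
        m->allocated = false;
}

/* Run `cycles` rounds of use, release, reclaim; return the allocation count. */
static unsigned long hzp_simulate(bool persistent, unsigned long cycles)
{
    struct hzp_model m = { .persistent = persistent };

    for (unsigned long i = 0; i < cycles; i++) {
        hzp_get(&m);
        hzp_put(&m);
        hzp_shrink(&m);
    }
    return m.allocs;
}
```

In this model, hzp_simulate(false, n) performs one allocation per cycle, while
hzp_simulate(true, n) performs a single allocation for any n > 0. The real
cost per allocation depends on fragmentation and is not modelled here.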
Measuring access latency to 4GB of memory on Naples I observe ~6.7%
slower access latency intrasocket and ~14% slower intersocket.
use_zero_page is currently a simple thp flag, meaning it rejects writes
where val != !!val, so perhaps it would be best to overload it with
additional options? I can imagine 0x2 defining persistent allocation so
that the hzp is not freed when the refcount goes to 0 and 0x4 defining if
the hzp should be per node. Implementing persistent allocation fixes our
concern with it, so I'd like to start there. Comments?
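
A minimal sketch of what the overloaded value could look like, assuming the
bit assignments suggested above. Only the values 0x2 and 0x4 come from this
thread; the macro and function names are hypothetical, and treating 0x1 as the
enable bit and ignoring the modifier bits when it is clear are assumptions on
my part, not a settled interface.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names; only the values 0x2 and 0x4 come from the proposal. */
#define HZP_ENABLE      0x1UL  /* assumed: existing on/off meaning */
#define HZP_PERSISTENT  0x2UL  /* do not free when the refcount hits 0 */
#define HZP_PER_NODE    0x4UL  /* one huge zero page per NUMA node */
#define HZP_ALL         (HZP_ENABLE | HZP_PERSISTENT | HZP_PER_NODE)

/* Behaviour as described for a simple flag: accept only 0 or 1. */
static int hzp_check_bool(unsigned long val)
{
    return (val != !!val) ? -EINVAL : 0;
}

/* Sketch of the overloaded form: accept any combination of known bits. */
static int hzp_check_mask(unsigned long val)
{
    return (val & ~HZP_ALL) ? -EINVAL : 0;
}

/* Assumption: modifier bits only take effect when the enable bit is set. */
static bool hzp_is_persistent(unsigned long val)
{
    return (val & HZP_ENABLE) && (val & HZP_PERSISTENT);
}

static bool hzp_is_per_node(unsigned long val)
{
    return (val & HZP_ENABLE) && (val & HZP_PER_NODE);
}
```

With this layout, writing 3 would mean "enabled and persistent" and 7 would
add per-node pages, while existing users writing 0 or 1 keep today's meaning.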
Sounds worth trying to me :-) It might be worth making it persistent by
default. Keeping 2MB of memory unreclaimable does not sound harmful for use
cases that prefer to use THP.