On 04/24/2018 09:27 AM, Michal Hocko wrote:
> On Mon 23-04-18 11:29:21, Chris Friesen wrote:
>> On 04/22/2018 08:46 AM, Michal Hocko wrote:
>>> On Fri 20-04-18 11:43:07, Chris Friesen wrote:
>>>> The specific scenario I'm considering is that of a hypervisor host. I have
>>>> system management stuff running on the host that may need more than one
>>>> core, and currently these host tasks might be affined to cores from multiple
>>>> NUMA nodes. I'd like to put a cap on how much memory the host tasks can
>>>> allocate from each NUMA node in order to ensure that there is a guaranteed
>>>> amount of memory available for VMs on each NUMA node.
>>>> Is this possible, or are the knobs just not there?
>>>
>>> Not possible right now. What would be the policy when you reach the
>>> limit on one node? Fallback to other nodes? What if those hit the limit
>>> as well? OOM killer or an allocation failure?
>>
>> I'd envision it working exactly the same as the current memory cgroup, but
>> with the ability to specify optional per-NUMA-node limits in addition to
>> system-wide.
>
> OK, so you would have a per numa percentage of the hard limit?

I think it'd make more sense as a hard limit per NUMA node.

> But more
> importantly, note that the page allocation is done way before the charge
> so we do not have any control over where the memory gets allocated from
> so we would have to play nasty tricks in the reclaim to somehow balance
> NUMA charge pools.

Reading the docs on the memory controller it does seem a bit tricky. I had
envisioned some sort of "is there memory left in this group" check before
"approving" the memory allocation, but it seems it doesn't really work that way.
Chris
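
For illustration only, the ordering problem raised in the thread (the page is
picked from a node first, and the group is charged afterwards) can be sketched
as a toy model in Python. This is not kernel code and does not reflect the real
memory controller implementation: Node, Group, alloc_page, try_charge and the
per-node limit itself are all hypothetical names, and the fallback order is an
assumed simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    free_pages: int


@dataclass
class Group:
    # Hypothetical per-node hard limits in pages, keyed by node id.
    node_limit: dict
    charged: dict = field(default_factory=dict)

    def try_charge(self, node_id):
        """Charge one page against the node it already came from."""
        used = self.charged.get(node_id, 0)
        if used >= self.node_limit[node_id]:
            return False
        self.charged[node_id] = used + 1
        return True


def alloc_page(nodes, preferred):
    """Pick a node: preferred first, then the others in id order.
    The group's limits are not consulted at this point."""
    order = [preferred] + [n for n in sorted(nodes) if n != preferred]
    for node_id in order:
        if nodes[node_id].free_pages > 0:
            nodes[node_id].free_pages -= 1
            return node_id
    return None


def alloc_and_charge(nodes, group, preferred):
    node_id = alloc_page(nodes, preferred)      # step 1: placement
    if node_id is None:
        return "no-memory"
    if group.try_charge(node_id):               # step 2: charge
        return f"ok:node{node_id}"
    # Too late to steer the allocation elsewhere; hand the page back.
    nodes[node_id].free_pages += 1
    return f"over-limit:node{node_id}"


nodes = {0: Node(free_pages=4), 1: Node(free_pages=4)}
group = Group(node_limit={0: 2, 1: 2})
results = [alloc_and_charge(nodes, group, preferred=0) for _ in range(3)]
print(results)
```

In this toy model the third request goes over the node 0 limit even though
node 1 is untouched, because placement never sees the per-node charge. Getting
the "is there memory left in this group" behaviour would mean either checking
the limit before placement or rebalancing afterwards in reclaim, which is the
difficulty described above.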