On 20.09.24 11:06, Gregory Price wrote:
2. Coarse grained memory increases for 'normal' memory.
Can use memory hot-plug. Recovery of capacity likely to only be possible on
VM shutdown.
Is there a reason "movable" (ZONE_MOVABLE) is not an option, at least in
some setups? If not, why?
This seems like a bit of a muddied conversation.
Cleaning up my inbox ... well at least trying :)
"'normal' memory" has no defined meaning - so let's clear this up a bit.
There is:
* System-RAM (memory managed by kernel allocators)
* Special Purpose Memory (generally presented as DAX)
System-RAM is managed as zones - the relevant ones are:
* ZONE_NORMAL allows both movable and non-movable allocations
.. except in corner cases like MIGRATE_CMA :)
* ZONE_MOVABLE only allows movable allocations
(Caveat: this generally only applies to allocation, you can
violate this with stuff like pinning)
Note that long-term pinning is forbidden on MOVABLE, just like it is on
MIGRATE_CMA. So we try to ensure that common use cases cannot violate this.
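As a quick sysfs sketch of the zone choice above (the memory block number
is hypothetical; see Documentation/admin-guide/mm/memory-hotplug.rst):

```shell
# Hypothetical hotplugged memory block; the number is system-specific.
BLOCK=/sys/devices/system/memory/memory42

# Which zones this block could legally be onlined into:
cat $BLOCK/valid_zones

# Online into ZONE_MOVABLE (movable allocations only):
echo online_movable > $BLOCK/state

# Or online into ZONE_NORMAL (movable and non-movable allocations):
# echo online_kernel > $BLOCK/state
```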
Hotplug can be thought of as two discrete mechanisms
* Exposing capacity to the kernel (CXL DCD Transactions)
* Exposing capacity to allocators (mm/memory-hotplug.c)
1) if the intent is to primarily utilize dynamic capacity for VMs, then
the host does not need (read: should not need) to map the memory as
System-RAM in the host. The VMM should be made to consume it directly
via DAX or otherwise.
That capacity is almost by definition "Capital G Guaranteed" to be
reclaimable regardless of what the guest does. A VMM can force a guest
to let go of resources - that's its job.
2) if the intent is to provide dynamic capacity to a host as System-RAM, then
recoverability is dictated by system usage of that capacity. If onlined
into ZONE_MOVABLE, then if the system has avoided doing things like pinning
those pages it should *generally* be recoverable (but not guaranteed).
There is, of course, the use case of memory overcommit -- in which case
you would want 2). But likely that's out of the picture for this tagged
memory.
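The two paths can be illustrated with the daxctl tool from ndctl (device
name dax0.0 is hypothetical):

```shell
# Path 1): leave the capacity as a devdax device (/dev/dax0.0) and let
# the VMM mmap() it directly - the host kernel allocators never see it.

# Path 2): hand the capacity to the kernel as System-RAM instead;
# daxctl onlines the resulting memory blocks (ZONE_MOVABLE by default):
daxctl reconfigure-device --mode=system-ram dax0.0
```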
For the virtualization discussion:
Hotplug and recoverability are non-issues. The capacity should never be
exposed to system allocators and the VMM should be made to consume special
purpose memory directly. That's on the VMM/orchestration software to get right.
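A sketch of what "consume special purpose memory directly" can look like
with QEMU (paths and sizes are hypothetical):

```shell
# Back all guest RAM with a devdax device; the guest sees plain RAM
# while the host never onlines the capacity as System-RAM.
qemu-system-x86_64 \
    -m 16G \
    -object memory-backend-file,id=cxlmem,mem-path=/dev/dax0.0,size=16G,align=2M,share=on \
    -machine q35,memory-backend=cxlmem \
    ...
```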
For the host System-RAM discussion:
Auto-onlined hotplug capacity presently defaults to ZONE_NORMAL, but we
discussed (yesterday, at Plumbers) changing this default to ZONE_MOVABLE.
The only concern is when insufficient ZONE_NORMAL exists to support
ZONE_MOVABLE capacity - but this is unlikely to be the general scenario AND
can be mitigated w/ existing mechanisms.
It might be worthwhile looking at
Documentation/admin-guide/mm/memory-hotplug.rst "auto-movable" memory
onlining policy. It might not fit all use cases, though (just like
ZONE_MOVABLE doesn't)
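For reference, that policy is selectable via module parameters (a sketch
based on the memory-hotplug documentation; values are examples):

```shell
# Switch from the default "contig-zones" policy to "auto-movable":
echo auto-movable > /sys/module/memory_hotplug/parameters/online_policy

# Cap how much ZONE_MOVABLE the policy will create, as a
# MOVABLE:KERNEL ratio in percent (301 means 3.01:1):
echo 301 > /sys/module/memory_hotplug/parameters/auto_movable_ratio
```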
Manually onlined capacity defaults to ZONE_MOVABLE.
It would be nice to make this behavior consistent, since the general opinion
appears to be that this capacity should default to ZONE_MOVABLE.
It's much easier to shoot yourself in the foot with ZONE_MOVABLE,
that's why the default can be adjusted manually using "online_movable"
with e.g., memhp_default_state.
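Concretely (a config sketch; both knobs are documented in
Documentation/admin-guide/mm/memory-hotplug.rst):

```shell
# On the kernel command line: auto-online hotplugged memory as movable.
memhp_default_state=online_movable

# Or the equivalent at runtime:
echo online_movable > /sys/devices/system/memory/auto_online_blocks
```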
It's all a bit complicated, because there are various use cases and
mechanisms for memory hotplug ... IIRC RHEL defaults with its udev rules
to "ZONE_MOVABLE" on bare metal and "ZONE_NORMAL" in VMs. Except on
s390, where we default to "offline" (standby memory ....).
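A udev rule of that kind looks roughly like this (a sketch, not the
exact rule RHEL ships, which also checks the detected environment):

```
# e.g. /usr/lib/udev/rules.d/40-redhat.rules (sketch):
# online newly added memory blocks as ZONE_MOVABLE
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", \
    ATTR{state}="online_movable"
```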
I once worked on a systemd unit to make this configuration easier (and
avoid udev rules), and possibly more "automatic" depending on the
detected environment.
--
Cheers,
David / dhildenb