Re: [RFC] Virtualizing tagged disaggregated memory capacity (app specific, multi host shared)


 



On 20.09.24 11:06, Gregory Price wrote:
2. Coarse-grained memory increases for 'normal' memory.
     Can use memory hot-plug. Recovery of capacity likely to only be possible on
     VM shutdown.

Is there a reason "movable" (ZONE_MOVABLE) is not an option, at least in
some setups? If not, why?



This seems like a bit of a muddied conversation.

Cleaning up my inbox ... well at least trying :)


"'normal' memory" has no defined meaning, so let's clear this up a bit.

There is:
* System-RAM (memory managed by kernel allocators)
* Special Purpose Memory (generally presented as DAX)
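As a rough illustration of where these two kinds of memory show up on a host, here is a hypothetical helper (not an existing tool) that sums top-level `/proc/iomem` ranges labelled "System RAM" versus "Soft Reserved", the label the kernel typically uses for EFI special-purpose memory that is later handed to device DAX. It assumes the usual `start-end : name` line format and root privileges (unprivileged reads show zeroed addresses); CXL-provisioned capacity may instead appear under CXL window/region entries, depending on kernel and platform.

```python
import re
from collections import defaultdict

# Matches top-level /proc/iomem lines such as "100000000-17fffffff : System RAM".
# Indented (nested) child resources do not match the leading-hex anchor, so
# ranges like "Kernel code" or "dax0.0" are not double counted.
_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")

# Hypothetical selection of resource names to report.
_INTERESTING = ("System RAM", "Soft Reserved")


def summarize_iomem(text: str) -> dict:
    """Return total bytes per top-level resource name of interest."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        start, end, name = int(m.group(1), 16), int(m.group(2), 16), m.group(3)
        if name in _INTERESTING:
            totals[name] += end - start + 1  # iomem ranges are inclusive
    return dict(totals)


if __name__ == "__main__":
    with open("/proc/iomem") as f:
        print(summarize_iomem(f.read()))
```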
System-RAM is managed as zones; the relevant ones are:
* ZONE_NORMAL allows both movable and non-movable allocations

.. except in corner cases like MIGRATE_CMA :)

* ZONE_MOVABLE only allows movable allocations
   (Caveat: this generally only applies to allocation; you can
    violate this with stuff like pinning)

Note that long-term pinning is forbidden on MOVABLE, just like it is on MIGRATE_CMA. So we try to ensure that common use cases cannot violate this.
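The zone rules above, including the pinning caveats, can be summarized as a small lookup table. This is a simplified, hypothetical model for illustration only: the real decisions are made by the page allocator via GFP flags and migrate types, and the model ignores corner cases such as MIGRATE_CMA.

```python
from enum import Enum, auto


class Request(Enum):
    MOVABLE = auto()       # e.g. user pages the kernel can migrate
    UNMOVABLE = auto()     # e.g. most kernel allocations
    SHORT_PIN = auto()     # short-term pin of an already-placed page
    LONGTERM_PIN = auto()  # long-term pin (FOLL_LONGTERM-style)


# Simplified model: which requests may leave a page resident in each zone.
ZONE_RULES = {
    "ZONE_NORMAL": {Request.MOVABLE, Request.UNMOVABLE,
                    Request.SHORT_PIN, Request.LONGTERM_PIN},
    # Short-term pins are tolerated but temporarily block migration/offlining;
    # long-term pins are refused here (the kernel migrates the page out first).
    "ZONE_MOVABLE": {Request.MOVABLE, Request.SHORT_PIN},
}


def zone_allows(zone: str, request: Request) -> bool:
    """Return True if `zone` may hold a page for `request` in this model."""
    return request in ZONE_RULES[zone]
```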


Hotplug can be thought of as two discrete mechanisms
* Exposing capacity to the kernel (CXL DCD Transactions)
* Exposing capacity to allocators (mm/memory_hotplug.c)
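On Linux, the boundary between these two steps is visible per memory block under `/sys/devices/system/memory`: a block that has been added (by a CXL DCD transaction or any other hot-add path) but not yet exposed to the allocators shows `offline` in its `state` file. A minimal sketch of a hypothetical helper that enumerates blocks and their states; `root` is a parameter only so the functions can be exercised against a fake directory.

```python
import os
import re

MEMORY_ROOT = "/sys/devices/system/memory"


def memory_blocks(root: str = MEMORY_ROOT) -> dict:
    """Map memory block number -> state string (e.g. "online", "offline")."""
    blocks = {}
    for entry in os.listdir(root):
        m = re.fullmatch(r"memory(\d+)", entry)
        if not m:
            continue  # skip block_size_bytes, auto_online_blocks, uevent, ...
        with open(os.path.join(root, entry, "state")) as f:
            blocks[int(m.group(1))] = f.read().strip()
    return blocks


def block_size_bytes(root: str = MEMORY_ROOT) -> int:
    """Size of one memory block; sysfs reports it as hex without a 0x prefix."""
    with open(os.path.join(root, "block_size_bytes")) as f:
        return int(f.read().strip(), 16)
```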
1) if the intent is to primarily utilize dynamic capacity for VMs, then
    the host does not need (read: should not need) to map the memory as
    System-RAM in the host. The VMM should be made to consume it directly
    via DAX or otherwise.

    That capacity is almost by definition "Capital G Guaranteed" to be
    reclaimable regardless of what the guest does. A VMM can force a guest
    to let go of resources - that's its job.

2) if the intent is to provide dynamic capacity to a host as System-RAM, then
    recoverability is dictated by system usage of that capacity. If onlined
    into ZONE_MOVABLE, then if the system has avoided doing things like pinning
    those pages it should *generally* be recoverable (but not guaranteed).
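For case 2), "recoverable" in practice means the block can be offlined again: the kernel has to migrate every page out of it, and an unmovable or pinned page makes the request fail, e.g. with EBUSY. A minimal sketch of a hypothetical helper that attempts this through sysfs and reports whether the kernel refused; errors other than EBUSY are re-raised.

```python
import errno
import os


def try_offline_block(block: int, root: str = "/sys/devices/system/memory") -> bool:
    """Try to offline one memory block; return False if the kernel says busy."""
    path = os.path.join(root, f"memory{block}", "state")
    try:
        # The sysfs write (flushed when the file closes) triggers page migration.
        with open(path, "w") as f:
            f.write("offline")
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False  # something in the block could not be migrated
        raise
    return True
```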

There is, of course, the use case of memory overcommit -- in which case you would want 2). But likely that's out of the picture for this tagged memory.



For the virtualization discussion:

Hotplug and recoverability are a non-issue. The capacity should never be
exposed to system allocators and the VMM should be made to consume special
purpose memory directly. That's on the VMM/orchestration software to get right.
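As one example of a VMM consuming special purpose memory directly, QEMU's `memory-backend-file` can point at a device-DAX character device, so the capacity is never onlined as host System-RAM. The thread does not name a VMM; QEMU, the device path `/dev/dax0.0`, the size, and the 2M alignment below are assumptions for illustration, and `dax_backend_args` is a hypothetical helper.

```python
def dax_backend_args(dev: str, size: str, obj_id: str = "mem0",
                     align: str = "2M") -> list:
    """Build QEMU args that back guest RAM directly with a device-DAX node."""
    opts = [
        f"id={obj_id}",
        f"mem-path={dev}",  # e.g. /dev/dax0.0, never onlined on the host
        f"size={size}",
        "share=on",         # MAP_SHARED, so guest writes reach the device
        f"align={align}",   # should match the device-DAX alignment
    ]
    return ["-object", "memory-backend-file," + ",".join(opts)]
```

The backend would then be attached to the guest, e.g. with `-numa node,memdev=mem0` or a `pc-dimm` device; reclaiming the capacity is then a VMM/orchestration decision rather than a host allocator one.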


For the host System-RAM discussion:

Auto-onlined hotplug capacity presently defaults to ZONE_NORMAL, but we
discussed (yesterday, at Plumbers) changing this default to ZONE_MOVABLE.

The only concern is when insufficient ZONE_NORMAL exists to support
ZONE_MOVABLE capacity - but this is unlikely to be the general scenario AND
can be mitigated w/ existing mechanisms.
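The auto-onlining default is exposed at runtime through `/sys/devices/system/memory/auto_online_blocks` (and at boot through the `memhp_default_state=` kernel parameter), which accept `offline`, `online`, `online_kernel` and `online_movable`. A minimal sketch of a hypothetical helper that switches newly hot-added blocks to ZONE_MOVABLE by default; it needs root on a real system, and `root` is a parameter only for testing.

```python
import os

VALID_POLICIES = {"offline", "online", "online_kernel", "online_movable"}


def set_auto_online(policy: str, root: str = "/sys/devices/system/memory") -> None:
    """Set how newly hot-added memory blocks are onlined."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unsupported policy: {policy!r}")
    with open(os.path.join(root, "auto_online_blocks"), "w") as f:
        f.write(policy)
```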

It might be worthwhile looking at the "auto-movable" memory onlining policy in Documentation/admin-guide/mm/memory-hotplug.rst. It might not fit all use cases, though (just like ZONE_MOVABLE doesn't).
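Roughly, the auto-movable policy (selected with `memory_hotplug.online_policy=auto-movable`) onlines hot-added memory to ZONE_MOVABLE only while MOVABLE memory stays within `memory_hotplug.auto_movable_ratio` percent (301 by default) of early, non-hotplugged KERNEL memory, and uses a kernel zone otherwise. The function below is a heavily simplified, hypothetical model of that check; it ignores NUMA awareness (`auto_movable_numa_aware`) and memory groups, which the real code in mm/memory_hotplug.c takes into account.

```python
def auto_movable_zone(kernel_early_pages: int, movable_pages: int,
                      nr_pages: int, ratio_percent: int = 301) -> str:
    """Pick a zone for nr_pages of hot-added memory (simplified model)."""
    # Would onlining these pages keep MOVABLE <= ratio% of early KERNEL memory?
    if (movable_pages + nr_pages) * 100 <= kernel_early_pages * ratio_percent:
        return "ZONE_MOVABLE"
    return "ZONE_NORMAL"
```

For example, with 4 GiB of boot memory (1048576 pages of 4 KiB), hot-adding 12 GiB still fits under the default ratio and would go to ZONE_MOVABLE in this model, while 13 GiB would not.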


Manually onlined capacity defaults to ZONE_MOVABLE.

It would be nice to make this behavior consistent, since the general opinion
appears to be that this capacity should default to ZONE_MOVABLE.

It's much easier to shoot yourself in the foot with ZONE_MOVABLE; that's why the default has to be adjusted manually to "online_movable", e.g., via memhp_default_state.
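Onlining a single block by hand means writing the online type to that block's `state` file; the block's `valid_zones` file lists the zones it could be onlined to. A minimal sketch of two hypothetical helpers doing this through sysfs (root required on a real system; `root` is a parameter only for testing).

```python
import os

MEMORY_ROOT = "/sys/devices/system/memory"
ONLINE_TYPES = ("online", "online_kernel", "online_movable")


def valid_zones(block: int, root: str = MEMORY_ROOT) -> list:
    """Zones the block may be onlined to, e.g. ["Normal", "Movable"]."""
    with open(os.path.join(root, f"memory{block}", "valid_zones")) as f:
        return f.read().split()


def online_block(block: int, online_type: str = "online_movable",
                 root: str = MEMORY_ROOT) -> None:
    """Online one offline memory block with the requested online type."""
    if online_type not in ONLINE_TYPES:
        raise ValueError(f"unsupported online type: {online_type!r}")
    with open(os.path.join(root, f"memory{block}", "state"), "w") as f:
        f.write(online_type)
```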

It's all a bit complicated, because there are various use cases and mechanisms for memory hotplug ... IIRC, with its udev rules, RHEL defaults to "ZONE_MOVABLE" on bare metal and "ZONE_NORMAL" in VMs, except on s390, where we default to "offline" (standby memory ...).
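The per-environment defaults described here amount to a small decision table. Below is a hypothetical sketch of the kind of logic such udev rules could encode: environment detection (`arch`, `is_vm`) is left to the caller, the mapping of "ZONE_NORMAL in VMs" to `online_kernel` is an assumption, and the generated rule follows the common `SUBSYSTEM=="memory", ACTION=="add"` pattern rather than reproducing any distribution's actual rules.

```python
def default_online_type(arch: str, is_vm: bool) -> str:
    """Pick an online type from the environment (hypothetical policy)."""
    if arch.startswith("s390"):
        return "offline"         # standby memory stays offline
    if is_vm:
        return "online_kernel"   # ZONE_NORMAL inside guests
    return "online_movable"      # ZONE_MOVABLE on bare metal


def udev_rule(online_type: str) -> str:
    """Render a udev rule that onlines newly added memory blocks."""
    if online_type == "offline":
        return ""  # nothing to do; blocks simply stay offline
    return ('SUBSYSTEM=="memory", ACTION=="add", '
            f'ATTR{{state}}=="offline", ATTR{{state}}="{online_type}"')
```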

I once worked on a systemd unit to make this configuration easier (and avoid udev rules), and possibly more "automatic" depending on the detected environment.

--
Cheers,

David / dhildenb




