On Fri, Dec 20, 2024 at 03:59:58PM +0100, David Hildenbrand wrote:
> On 20.12.24 15:45, Gregory Price wrote:
> > When memory hotplug auto-online is enabled, hotplug memory blocks are
> > onlined into ZONE_NORMAL by default. The `memhp_default_state` boot
> > param allows runtime configuration, but no build-time config exists.
> + you can configure it at runtime.
>
> >
> > Add a build-time configuration option to change default hotplug zone.
> >
> > build config:
> > MEMHP_DEFAULT_TYPE
> >
> > Selections:
> > MEMHP_DEFAULT_TYPE_NORMAL => mhp_default_online_type = "online"
> > MEMHP_DEFAULT_TYPE_MOVABLE => mhp_default_online_type = "online_movable"
> >
> > When MEMORY_HOTPLUG_DEFAULT_ONLINE is disabled, MEMHP_DEFAULT_TYPE is
> > set to "offline" to match the current system behavior.
> >
> > ZONE_NORMAL still remains the default, because for systems with a large
> > amount of hotplug memory, defaulting it to ZONE_MOVABLE may result in
> > portions failing to online if sufficient ZONE_NORMAL memory does not
> > exist to describe it.
> >
>
> What's the use case?
>
> I'm hoping that we can move away from the compile-time option and let user
> space, who better knows what to do (especially with different kinds of
> memory having different requirements) configure auto-onlining or online
> manually (e.g., devdax).
>

At Meta we have a fairly complex boot process that goes through multiple
kernels before we reach the target kernel that runs workloads. Each of
those kernels may run health checks that expect the memory to be online.
The build switch makes this behavior consistent across all of those
kernels without having to carry the boot parameter in each one.

Eventually we'd like to move to udev, but that isn't feasible for us right
now given the state of CXL BIOS/platform/driver support: driver management
does not yet work reliably on all platforms and devices. This option gets
us where we're going while the rest catches up.
> For example, in RHEL we traditionally use udev rules, because we want a
> different behavior on bare-metal vs. VMs, but they are not particularly easy
> to extend to implement wilder policies.
>
> For a while I worked on a systemd unit [1] to configure+handle memory
> onlining so we can get rid of the udev rules we use in RHEL. But it only
> configured+handled having "one type of hotplugged memory".
>
> I'm planning on picking that up again at some point, to also make it
> possible to handle different policies for different memory types.
>
> For example, maybe someone wants to auto-online virtio-mem memory to
> ZONE_NORMAL, but let onlining of devdax memory be handled by the devdax
> utility (e.g. ZONE_MOVABLE). We can identify in some cases "what" memory was
> added using /proc/iomem.
>

Generally, I agree this is the way to go. But in those cases you're
probably not enabling MEMORY_HOTPLUG_DEFAULT_ONLINE anyway, so this
doesn't really affect that usage pattern.

~Gregory