Re: [LSF/MM/BPF TOPIC] Restricting or migrating unmovable kernel allocations from slow tier

Gregory Price <gourry@xxxxxxxxxx> · Sat, 1 Feb 2025 11:30:24 -0500

On Sun, Feb 02, 2025 at 12:13:23AM +0900, Hyeonggon Yoo wrote:
> On Sat, Feb 1, 2025 at 11:04 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > This all seems like a grand waste of time.  Don't do that.  Don't allow
> > kernel allocations from CXL at all. Don't build systems that have
> > vast quantities of CXL memory (or if you do, expose it as really fast
> > swap, not as memory).
> >
> 
> Hi, Matthew. Thank you for sharing your opinion.
> 
> I don't want to introduce too much complexity to MM due to CXL madness either,
> but I think at least we need to guide users who buy CXL hardware to avoid
> doing stupid things.
> 
> My initial subject was "Clearly documenting the use cases of
> memhp_default_state=online{,_kernel}" because at first glance,
> it was deemed usable for allowing kernel allocations from CXL,
> which turned out to be not after some evaluation.
>

This was the motivation for implementing the build-time switch for
memhp_default_state.  Distros and builders now have the flexibility
to choose the default online policy for hotplugged memory blocks.

https://lore.kernel.org/linux-mm/20241226182918.648799-1-gourry@xxxxxxxxxx/

I don't normally agree with Willy's hard takes on CXL, but I do agree
that it's generally not fit for kernel use.  I also share a general
skepticism that movement-based tiering is fundamentally better than
reclaim/swap semantics, though I have been convinced otherwise in some
scenarios, and I think treating CXL as super-fast swap gives up clear
performance benefits in many workloads.

Rather than asking whether we can make portions of the kernel more
amenable to movable allocations, I think it's more beneficial to focus
on whether we can reduce the ZONE_NORMAL cost of ZONE_MOVABLE capacity,
such as the struct page metadata (memmap) that describes it.  That
seems (to me) like the actual crux of this particular issue.

~Gregory