Re: [LSF/MM/BPF TOPIC] Restricting or migrating unmovable kernel allocations from slow tier

"Harry (Hyeonggon) Yoo" <42.hyeyoo@xxxxxxxxx> · Mon, 10 Feb 2025 11:33:47 +0900

On Fri, Feb 07, 2025 at 04:54:10AM -0500, Gregory Price wrote:
> On Fri, Feb 07, 2025 at 06:34:43PM +0900, Honggyu Kim wrote:
> > On 2/7/2025 5:57 PM, Gregory Price wrote:
> > 
> > > The default kernel stack size is like 16kb.  You'd need like 100,000
> > > threads to eat up 1.5GB, and 2048 threads only eats like 32MB.
> > > 
> > > It's not an interesting amount of memory if you have a 20TB system.
> > 
> > The amount might be small, but having those data in slow tier can
> > make performance degradation if it is heavily accessed.
> > 
> > The number of accesses isn't linearly corelated to the size of the
> > memory region.
> > 
> 
> Right, I started by saying:
> 
> [CXL is] "generally not fit for kernel use"
> 
> I have the opinion that CXL memory should be defaulted to ZONE_MOVABLE,

Agreed, when the ratio of slow to fast capacity makes it feasible.

> but I understand the pressure on ZONE_NORMAL means this may not be
> possible for large capacities.

Yes, I think this is when we start to consider having some ZONE_NORMAL
capacity on CXL memory.

> I don't think the solution is to make kernel memory migratable and allow
> kernel allocations on CXL.

IMHO the relevant questions here are:

Premise: Some ZONE_NORMAL capacity exists on CXL memory
         due to its large capacity.

Q1. How aggressively should the kernel avoid serving kernel allocations
from ZONE_NORMAL in the slow tier (and instead reclaim pages in the fast
tier)? e.g.:
  - Only when there's no easily reclaimable memory?
  - Or as a last resort before OOM?
  - Or should certain types of kernel allocations simply not be allowed
    from slow tier?

Q2. If kernel allocations are made from the slow tier anyway, would it be
worthwhile to migrate _certain types_ of kernel memory back to the fast tier
later, when free space becomes available? (sounds like a promotion policy)

> There's a reason most kernel allocations are not swappable.

Because most kernel allocations cannot be swapped, with a few exceptions.

However, there is non-LRU page migration functionality, which allows some
kernel allocations to be migrated.
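
As a rough illustration of why this only works for allocations whose
owner cooperates, here is a userspace model in C of the isolate /
migrate / putback steps that non-LRU migration relies on. Everything
here (struct kobj, struct movable_ops_model, promote_object()) is a
hypothetical stand-in written for this sketch and is not the kernel
API; the one assumption that matters is that the owner can find and
update every reference to the object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 64

/* Hypothetical kernel object with exactly one tracked reference. */
struct kobj {
	struct kobj **owner_slot; /* the only pointer to this object */
	bool isolated;
	int tier;                 /* 0 = fast, 1 = slow */
	unsigned char data[OBJ_SIZE];
};

/* Hypothetical callbacks supplied by the owner of the object. */
struct movable_ops_model {
	bool (*isolate)(struct kobj *obj);
	int (*migrate)(struct kobj *dst, struct kobj *src);
	void (*putback)(struct kobj *obj);
};

static bool model_isolate(struct kobj *obj)
{
	if (obj->isolated)
		return false; /* someone else is already moving it */
	obj->isolated = true;
	return true;
}

static int model_migrate(struct kobj *dst, struct kobj *src)
{
	if (!src->owner_slot)
		return -1; /* untracked reference: cannot move safely */
	memcpy(dst->data, src->data, OBJ_SIZE);
	dst->owner_slot = src->owner_slot;
	*dst->owner_slot = dst; /* redirect the owner to the new copy */
	dst->isolated = false;
	return 0;
}

static void model_putback(struct kobj *obj)
{
	obj->isolated = false; /* migration failed, object stays in place */
}

static const struct movable_ops_model model_ops = {
	.isolate = model_isolate,
	.migrate = model_migrate,
	.putback = model_putback,
};

/* Move src (slow tier) into dst (fast tier); returns 0 on success. */
int promote_object(const struct movable_ops_model *ops,
		   struct kobj *src, struct kobj *dst)
{
	int ret;

	if (!ops->isolate(src))
		return -1;
	ret = ops->migrate(dst, src);
	if (ret)
		ops->putback(src);
	return ret;
}
```

An object with no tracked owner_slot fails in migrate and is put back
unchanged, which is the model's version of an unmovable allocation.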

I don't understand why we shouldn't introduce more movable kernel memory
if that turns out to be beneficial.

-- 
Harry