Re: [LSF/MM/BPF TOPIC] Restricting or migrating unmovable kernel allocations from slow tier

On 01.02.25 14:29, Hyeonggon Yoo wrote:
> Hi,
>
> Byungchul and I would like to suggest a topic about the performance impact of
> kernel allocations on CXL memory.
>
> As CXL-enabled servers and memory devices are being developed, CXL-supported
> hardware is expected to continue emerging in the coming years.
>
> The Linux kernel supports hot-plugging CXL memory via dax/kmem functionality.
> The hot-plugged memory allows either unmovable kernel allocations
> (ZONE_NORMAL), or restricts them to movable allocations (ZONE_MOVABLE)
> depending on the hot-plug policy.
>
> Recently, Byungchul and I observed a measurable performance degradation with
> memhp_default_state=online compared to memhp_default_state=online_movable
> on a server where the ratio of memory capacity between DRAM and CXL is 1:2
> when running the llama.cpp workload with the default mempolicy.
> The workload performs LLM inference and pressures the memory subsystem
> due to its large working set size.
>
> Obviously, allowing kernel allocations from CXL memory degrades performance
> because kernel memory like page tables, kernel stacks, and slab allocations,
> is accessed frequently and may reside in physical memory with significantly
> higher access latency.
>
> However, as far as I can tell there are at least two reasons why we need to
> support ZONE_NORMAL for CXL memory (please add if there are more):
>    1. When hot-plugging a huge amount of CXL memory, the size of
>       the struct page array might not fit into DRAM
>       -> This could be relaxed with memmap_on_memory
There are some others, although most are less significant, and I tried documenting them here:

https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html#zone-movable-sizing-considerations


E.g., a 4 KiB page requires a single PTE (8 bytes) to be mapped into user
space, corresponding to roughly 0.2 % of the mapped memory. At least for
anonymous memory, PMD-sized THPs don't help, because we still have to
allocate the page table to be prepared for a PMD->PTE remapping. In the
worst case, the direct map requires another 0.2 % (but usually, we rely on
PMD mappings). So the page table usage depends on how you intend to use the
CXL memory (e.g., pagecache vs. anonymous memory).
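
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch in Python. The 4 KiB page size and the 8-byte PTE come from the example
above; the 64-byte struct page size is an assumption (the usual value on
64-bit builds, not something stated in this thread), and the function names
are made up for illustration. Upper-level page tables are ignored, so treat
the results as estimates only.

```python
PAGE_SIZE = 4096        # 4 KiB base page, as in the example above
PTE_SIZE = 8            # bytes per PTE, as in the example above
STRUCT_PAGE_SIZE = 64   # ASSUMPTION: typical sizeof(struct page) on 64-bit


def overhead_percent(per_page_bytes, page_size=PAGE_SIZE):
    """Metadata bytes per page, as a percentage of the page size."""
    return 100.0 * per_page_bytes / page_size


def overhead_bytes(region_bytes, per_page_bytes, page_size=PAGE_SIZE):
    """Metadata bytes needed to cover region_bytes with base pages."""
    pages = region_bytes // page_size
    return pages * per_page_bytes


def worst_case_unmovable(region_bytes):
    """Worst-case estimate of unmovable memory needed for a region.

    Assumes every page is mapped with its own PTE in user space and
    in the direct map (no PMD mappings), and ignores upper-level
    page tables.
    """
    return {
        "user_ptes": overhead_bytes(region_bytes, PTE_SIZE),
        "directmap_ptes": overhead_bytes(region_bytes, PTE_SIZE),
        "memmap": overhead_bytes(region_bytes, STRUCT_PAGE_SIZE),
    }


if __name__ == "__main__":
    region = 1 << 40  # hypothetical 1 TiB of hot-plugged memory
    print(f"PTE overhead:    {overhead_percent(PTE_SIZE):.3f} %")
    print(f"memmap overhead: {overhead_percent(STRUCT_PAGE_SIZE):.3f} %")
    for name, size in worst_case_unmovable(region).items():
        print(f"{name}: {size / (1 << 30):.1f} GiB")
```

Under these assumptions the memmap is the largest item by far, which is why
memmap_on_memory matters for the struct page array, while the page table
share depends on how much of the memory ends up mapped with PTEs.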


>    2. To hot-unplug CXL memory, pages in CXL memory should be migrated to DRAM,
>       which means sometimes some portion of CXL memory should be ZONE_NORMAL.

I don't quite understand that argument for ZONE_NORMAL.

--
Cheers,

David / dhildenb




