On Fri, Feb 07, 2025 at 04:20:24PM +0900, Byungchul Park wrote:
> On Sat, Feb 01, 2025 at 02:04:17PM +0000, Matthew Wilcox wrote:
>
> We can work with from the easiest object
> e.g. page table

It's more efficient and easier to change page sizes than it is to make
page tables migratable.  It's also easier to reclaim cold pages, which
eat up significantly more memory than the page tables (which describe
pages at ~8 bytes per page).

Also, there's quite a bit of literature showing that page tables landing
on remote nodes (cross-socket) have a negative performance impact.
Putting them on CXL makes the problem worse.

> struct page,

`struct page` is a structure that describes a physically addressed page.
It is commonly accessed by simply calling `pfn_to_page()`, which is a
fairly simple conversion (a bit more complex in sparsemem with sections).

This is used in a lockless manner to acquire page references all over
the kernel.  Making that migratable is... ambitious, to say the least.

> and kernel stack,

The default kernel stack size is around 16KB.  You'd need around 100,000
threads to eat up 1.5GB, and 2048 threads only eat around 32MB.  It's not
an interesting amount of memory if you have a 20TB system.

> When it comes to this topic, the most important thing is the collected
> *direction* from the community so that we can start the work under the
> *direction*.
>

My thoughts here are that memory tiering is the wrong tool for the
problem you are trying to solve.

Maybe there's a world in which we propose a ZONE_MEMDESC which is
exclusively used for `struct page` for a node.  At least then you could
design CXL capacities *around* that.

~Gregory