On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 12:04:37PM -0800, David Rientjes wrote:
> > On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> > > On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> > > > There is a lot of excitement around upcoming CXL type 3 memory expansion
> > > > devices and their cost savings potential. As the industry starts to
> > > > adopt this technology, one of the key components in strategic planning is
> > > > how the upstream Linux kernel will support various tiered configurations
> > > > to meet various user needs. I think it goes without saying that this is
> > > > quite interesting to cloud providers as well as other hyperscalers :)
> > >
> > > I'm not excited. I'm disappointed that people are falling for this scam.
> > > CXL is the ATM of this decade. The protocol is not fit for the purpose
> > > of accessing remote memory, adding 10ns just for an encode/decode cycle.
> > > Hands up everybody who's excited about memory latency increasing by 17%.
> >
> > Right, I don't think that anybody is claiming that we can leverage locally
> > attached CXL memory as though it were DRAM on the same or remote socket,
> > or that there won't be a noticeable impact to application performance
> > while the memory is still across the device.
> >
> > It does offer several cost savings benefits for offloading of cold memory,
> > though, if locally attached, and I think the support for that use case is
> > inevitable -- in fact, Linux has some sophisticated support for the
> > locally attached use case already.
> >
> > > Then there are the lies from the vendors who want you to buy switches.
> > > Not one of them is willing to guarantee you the worst case latency
> > > through their switches.
> >
> > I should have prefaced this thread by saying "locally attached CXL memory
> > expansion", because that's the primary focus of many of the folks on this
> > email thread :)
>
> That's a huge relief.
> I was not looking forward to the patches to add
> support for pooling (etc).
>
> Using CXL as cold-data-storage makes a certain amount of sense, although
> I'm not really sure why it offers an advantage over NAND. It's faster
> than NAND, but you still want to bring it back locally before operating
> on it. NAND is denser, and consumes less power while idle. NAND comes
> with a DMA controller to move the data instead of relying on the CPU to
> move the data around. And of course moving the data first to CXL and
> then to swap means that it's got to go over the memory bus multiple
> times, unless you're building a swap device which attaches to the
> other end of the CXL bus ...

This is **exactly** the type of discussion we're looking to have :)

There are some things that I've chatted informally with folks about that
I'd like to bring to the forum:

 - Decoupling CPU migration from memory migration for NUMA Balancing (or
   perhaps deprecating CPU migration entirely)

 - Allowing NUMA Balancing to do migration as part of a kthread
   asynchronous to the NUMA hint fault, in kernel context

 - Abstraction for future hardware devices that can provide an expanded
   view into page hotness that can be leveraged in different areas of the
   kernel, including as a backend for NUMA Balancing to replace NUMA hint
   faults

 - Per-container support for configuring balancing and memory migration

 - Opting certain types of memory into NUMA Balancing (like tmpfs) while
   leaving other types alone

 - Utilizing hardware accelerated memory migration as a replacement for
   the traditional migrate_pages() path when available

I could go code all of this up and spend an enormous amount of time doing
so only to get NAKed by somebody because I'm ripping out their critical
use case that I just didn't know about :)

There's also the question of whether DAMON should be the source of truth
for this or whether it should be decoupled.
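To make the cold-memory-offload idea above concrete, here is a toy model of the policy decision being discussed: track per-page access recency (fed, in the real kernel, by NUMA hint faults or a hardware hotness source), and pick demotion candidates for the slower CXL-backed node once they have been idle past a threshold. This is an illustrative sketch only, not kernel code; all names (`Page`, `demotion_candidates`, the 60-second threshold) are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Page:
    pfn: int
    last_access: float  # timestamp of the most recent observed access

def record_access(page: Page, now: float) -> None:
    # In-kernel, a NUMA hint fault (or a hardware hotness counter)
    # would be the thing updating this.
    page.last_access = now

def demotion_candidates(pages, now, cold_after_s=60.0):
    # Pages untouched for cold_after_s seconds become candidates to
    # migrate (e.g. via the migrate_pages() path) to the CXL node.
    return [p for p in pages if now - p.last_access >= cold_after_s]

pages = [Page(pfn=1, last_access=0.0), Page(pfn=2, last_access=95.0)]
record_access(pages[1], now=100.0)          # page 2 is hot again
cold = demotion_candidates(pages, now=100.0)
print([p.pfn for p in cold])                # only the idle page qualifies
```

The interesting design questions from the list above map onto this sketch directly: who updates `last_access` (hint faults vs. a hardware backend), and who runs `demotion_candidates` (the faulting task vs. an asynchronous kthread).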
My dream world would be one where we could discuss the various use cases
for locally attached CXL memory and determine, as a group, what the
shared, comprehensive "Linux vision" for it is, and do so before
LSF/MM/BPF. In a perfect world, we could block out an expanded MM session
in Salt Lake City to bring all of these concepts together, work out which
approaches sound reasonable vs unreasonable, and leave that conference
with a clear understanding of what needs to happen.