On Thu, 25 Jan 2024, Matthew Wilcox wrote:

> On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> > There is a lot of excitement around upcoming CXL type 3 memory expansion
> > devices and their cost savings potential.  As the industry starts to
> > adopt this technology, one of the key components in strategic planning is
> > how the upstream Linux kernel will support various tiered configurations
> > to meet various user needs.  I think it goes without saying that this is
> > quite interesting to cloud providers as well as other hyperscalers :)
>
> I'm not excited.  I'm disappointed that people are falling for this scam.
> CXL is the ATM of this decade.  The protocol is not fit for the purpose
> of accessing remote memory, adding 10ns just for an encode/decode cycle.
> Hands up everybody who's excited about memory latency increasing by 17%.
>

Right, I don't think anybody is claiming that we can leverage locally
attached CXL memory as though it were DRAM on the same or a remote socket,
or that there won't be a noticeable impact on application performance for
memory accessed across the device link.

It does offer several cost-saving benefits for offloading cold memory,
though, if locally attached, and I think support for that use case is
inevitable -- in fact, Linux already has some sophisticated support for
the locally attached case (a rough sketch of what the userspace side
looks like is at the bottom of this mail).

> Then there are the lies from the vendors who want you to buy switches.
> Not one of them are willing to guarantee you the worst case latency
> through their switches.
>

I should have prefaced this thread by saying "locally attached CXL memory
expansion", because that's the primary focus of many of the folks on this
email thread :)

FWIW, I fully agree with your evaluation of memory pooling and some of
the extensions provided by CXL 2.0.  I think a lot of the pooling
concepts are currently being overhyped, but that's just my personal
opinion.  Happy to talk about the advantages and disadvantages (as well
as the use cases), but I remain unconvinced on the memory pooling use
cases.

> The concept is wrong.  Nobody wants to tie all of their machines together
> into a giant single failure domain.  There's no possible redundancy
> here.  Availability is diminished; how do you upgrade firmware on a
> switch without taking it down?  Nobody can answer my contentions about
> contention either; preventing a single machine from hogging access to
> a single CXL endpoint seems like an unsolved problem.
>
> CXL is great for its real purpose of attaching GPUs and migrating memory
> back and forth in a software-transparent way.  We should support that,
> and nothing more.
>
> We should reject this technology before it harms our kernel and the
> entire industry.  There's a reason that SGI died.  Nobody wants to buy
> single image machines the size of a data centre.
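
Going back to the latency numbers at the top: the 17% figure is
consistent with assuming a local DRAM load latency of roughly 60ns (my
assumption for illustration, not a number from this thread):

	10ns / 60ns ~= 0.17

so an extra 10ns of encode/decode on every access is about a 17% hit
against a local-DRAM baseline; the ratio gets better or worse depending
on which baseline you measure against.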
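
On the "sophisticated support" point above: the kernel can already
enumerate a locally attached expander as its own CPU-less NUMA node and
demote cold pages to it (see /sys/kernel/mm/numa/demotion_enabled), and
userspace can place pages explicitly with the usual NUMA syscalls.  A
minimal sketch of the userspace side, assuming the CXL memory shows up
as node 2 (a made-up node id for illustration; check "numactl -H" for
the real topology):

/*
 * Sketch: explicitly demote one page of this process to a CXL-backed
 * NUMA node.  Node 2 is hypothetical; adjust for your topology.
 *
 * Build with:  gcc demote.c -o demote -lnuma
 */
#include <numaif.h>		/* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *page;

	if (posix_memalign(&page, psz, psz))
		return 1;
	memset(page, 0, psz);		/* fault it in; lands on a DRAM node */

	void *pages[1]  = { page };
	int   nodes[1]  = { 2 };	/* hypothetical CXL expander node */
	int   status[1] = { 0 };

	/* pid 0 == this process; MPOL_MF_MOVE only moves our own pages */
	if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) < 0) {
		perror("move_pages");
		return 1;
	}
	printf("page is now on node %d\n", status[0]);

	free(page);
	return 0;
}

On a tiered topology, status[0] should come back as the CXL node; the
kernel-side demotion path does effectively the same migration from the
reclaim path, with no userspace involvement.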