On Fri, Jan 24, 2025 at 05:58:09AM +0000, Matthew Wilcox wrote:
> On Wed, Jan 15, 2025 at 10:58:54AM -0800, Joshua Hahn wrote:
> > On machines with multiple memory nodes, interleaving page allocations
> > across nodes allows for better utilization of each node's bandwidth.
> > Previous work by Gregory Price [1] introduced weighted interleave, which
> > allowed for pages to be allocated across NUMA nodes according to
> > user-set ratios.
>
> I still don't get it. You always want memory to be on the local node or
> the fabric gets horribly congested and slows you right down. But you're
> not really talking about NUMA, are you? You're talking about CXL.
>
> And CXL is terrible for bandwidth. I just ran the numbers.
>
> On a current Intel top-end CPU, we're looking at 8x DDR5-4800 DIMMs,
> each with a bandwidth of 38.4GB/s for a total of 300GB/s.
>
> For each CXL lane, you take a lane of PCIe gen5 away. So that's
> notionally 32Gbit/s, or 4GB/s per lane. But CXL is crap, and you'll be
> lucky to get 3 cachelines per 256 byte packet, dropping you down to 3GB/s.
> You're not going to use all 80 lanes for CXL (presumably these CPUs are
> going to want to do I/O somehow), so maybe allocate 20 of them to CXL.
> That's 60GB/s, or a 20% improvement in bandwidth. On top of that,
> it's slow, with a minimum of 10ns latency penalty just from the CXL
> encode/decode penalty.
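
To make the arithmetic above easy to check, here is a minimal back-of-the-envelope
sketch. Every input is taken from Matthew's figures (38.4 GB/s per DDR5-4800 DIMM,
4 GB/s per PCIe gen5 lane, 3 usable 64-byte cachelines per 256-byte packet, 20
lanes given to CXL); nothing here is a measurement.

/*
 * Sanity-check of the bandwidth numbers quoted above.  All inputs are
 * Matthew's assumptions, not measured values.
 */
#include <stdio.h>

int main(void)
{
	double dimm_gbs = 38.4;			/* one DDR5-4800 DIMM */
	double dram_gbs = 8 * dimm_gbs;		/* 8 DIMMs: ~307 GB/s ("300" above) */
	double lane_gbs = 32.0 / 8;		/* PCIe gen5: 32 Gbit/s -> 4 GB/s */
	/* 3 x 64-byte cachelines of useful payload per 256-byte packet */
	double cxl_lane_gbs = lane_gbs * (3 * 64.0) / 256;	/* 3 GB/s */
	double cxl_gbs = 20 * cxl_lane_gbs;	/* 20 lanes -> 60 GB/s */

	printf("DRAM:            %6.1f GB/s\n", dram_gbs);
	printf("CXL (20 lanes):  %6.1f GB/s\n", cxl_gbs);
	printf("added bandwidth: %6.1f%%\n", 100 * cxl_gbs / dram_gbs);
	return 0;
}

Run as written, this prints ~307 GB/s of DRAM bandwidth, 60 GB/s of CXL
bandwidth, and a ~20% uplift, which matches the figures in the mail.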