Micron Confidential

Hi Huang,

Thank you for your comments. We will incorporate these suggestions in the next version.

Regards,
Srini

________________________________
From: Huang, Ying <ying.huang@intel.com>
Sent: Thursday, September 28, 2023 11:44 AM
To: Ravis OpenSrc
Cc: linux-mm@vger.kernel.org; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux-api@vger.kernel.org; luto@kernel.org; tglx@linutronix.de; mingo@redhat.com; bp@alien8.de; dietmar.eggemann@arm.com; vincent.guittot@linaro.org; dave.hansen@linux.intel.com; hpa@zytor.com; arnd@arndb.de; akpm@linux-foundation.org; x86@kernel.org; aneesh.kumar@linux.ibm.com; gregory.price@memverge.com; John Groves; Srinivasulu Thanneeru; Eishan Mirakhur; Vishal Tanna
Subject: [EXT] Re: [RFC PATCH 0/2] mm: mempolicy: Multi-tier interleaving

Hi, Ravi,

Thanks for the patch!

Ravi Jonnalagadda <ravis.opensrc@micron.com> writes:

> From: Ravi Shankar <ravis.opensrc@micron.com>
>
> Hello,
>
> The current interleave policy operates by interleaving page requests
> among nodes defined in the memory policy. To accommodate the
> introduction of memory tiers for various memory types (e.g., DDR, CXL,
> HBM, PMEM, etc.), a mechanism is needed for interleaving page requests
> across these memory types or tiers.

Why do we need interleaving page allocation among memory tiers? I think
that you need to make it more explicit. I guess that it's to increase
maximal memory bandwidth for workloads?

Yes, it is to increase the maximal memory bandwidth.

> This can be achieved by implementing an interleaving method that
> considers the tier weights.
> The tier weight will determine the proportion of nodes to select from
> those specified in the memory policy.
> A tier weight can be assigned to each memory type within the system.

What is the problem with the original interleaving? I think you need to
make it explicit too.

In the original approach, the page distribution is fixed at 1:1, and the user/admin cannot change it as required. The need for different ratios has become evident with the introduction of new memory tiers that cover a wide range of memory types.

With default interleaving, we observed lower memory bandwidth utilization than with the proposed approach at an 85:15 ratio when interleaving between DDR and CXL. We will capture this information in the next series.

> Hasan Al Maruf had put forth a proposal for interleaving between two
> tiers, namely the top tier and the low tier. However, this patch was
> not adopted due to constraints on the number of available tiers.
>
> https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/
>
> New proposed changes:
>
> 1. Introduce a sysfs entry to allow setting the interleave weight for each
> memory tier.
> 2. Each tier with a default weight of 1, indicating a standard 1:1
> proportion.
> 3. Distribute the weight of that tier in a uniform manner across all nodes.
> 4. Modifications to the existing interleaving algorithm to support the
> implementation of multi-tier interleaving based on tier-weights.
>
> This is in line with Huang, Ying's presentation in lpc22, 16th slide in
> https://lpc.events/event/16/contributions/1209/attachments/1042/1995//
> Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

Thanks for referring to the original work on this.
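
To make the four proposed changes above easier to discuss, here is a minimal userspace sketch in Python of how tier weights could translate into page placement. It is not the patch code. The function names, the integer rounding rule, and the choice to hand out a node's whole share as consecutive pages are all assumptions made for illustration; the cover letter only states that a tier's weight is distributed uniformly across its nodes.

```python
from collections import Counter
from itertools import islice


def node_weights(tier_weights, tier_nodes, policy_nodes):
    """Split each tier's weight evenly over its nodes that are in the policy."""
    weights = {}
    for tier, weight in tier_weights.items():
        nodes = [n for n in tier_nodes[tier] if n in policy_nodes]
        for node in nodes:
            # Assumed rounding rule (hypothetical): integer share, never below 1.
            weights[node] = max(1, weight // len(nodes))
    return weights


def weighted_interleave(weights):
    """Yield node ids forever: weights[n] consecutive pages per node, in node order."""
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


# Weights and topology taken from the usage example in the cover letter:
# DDR is tier4 on node 0, CXL is tier22 on node 1.
tier_weights = {"memory_tier4": 85, "memory_tier22": 15}
tier_nodes = {"memory_tier4": [0], "memory_tier22": [1]}

weights = node_weights(tier_weights, tier_nodes, policy_nodes={0, 1})
placement = Counter(islice(weighted_interleave(weights), 200))
print(weights, dict(placement))
```

With the 85:15 weights, 200 simulated page allocations place 170 pages on node 0 and 30 on node 1. With the default weight of 1 on both tiers, the same loop degenerates to the existing 1:1 round-robin.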

> Observed a significant increase (165%) in bandwidth utilization
> with the newly proposed multi-tier interleaving compared to the
> traditional 1:1 interleaving approach between DDR and CXL tier nodes,
> where 85% of the bandwidth is allocated to DDR tier and 15% to CXL
> tier with MLC -w2 option.

It appears that "mlc" isn't open source software. Better to use
open source software to test. And, even better to use more
practical workloads instead of a memory bandwidth/latency measurement
tool.

Sure, will try it.

> Usage Example:
>
> 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
> echo 85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
> echo 15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> 2. Interleave between DDR (tier4, node-0) and CXL (tier22, node-1) using numactl
> numactl -i0,1 mlc --loaded_latency W2
>
> Srinivasulu Thanneeru (2):
>   memory tier: Introduce sysfs for tier interleave weights.
>   mm: mempolicy: Interleave policy for tiered memory nodes
>
>  include/linux/memory-tiers.h |  27 ++++++++-
>  include/linux/sched.h        |   2 +
>  mm/memory-tiers.c            |  67 +++++++++++++++-------
>  mm/mempolicy.c               | 107 +++++++++++++++++++++++++++++++++--
>  4 files changed, 174 insertions(+), 29 deletions(-)

--
Best Regards,
Huang, Ying