Micron Confidential

________________________________
From: Huang, Ying <ying.huang@intel.com>
Sent: Friday, December 15, 2023 10:32 AM
To: Srinivasulu Opensrc
Cc: linux-cxl@vger.kernel.org; linux-mm@kvack.org; Srinivasulu Thanneeru;
    aneesh.kumar@linux.ibm.com; dan.j.williams@intel.com; gregory.price;
    mhocko@suse.com; tj@kernel.org; john@jagalactic.com; Eishan Mirakhur;
    Vinicius Tavares Petrucci; Ravis OpenSrc; Jonathan.Cameron@huawei.com;
    linux-kernel@vger.kernel.org
Subject: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers

<sthanneeru.opensrc@micron.com> writes:

> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>
> The memory tiers feature allows nodes with similar memory types
> or performance characteristics to be grouped together in a
> memory tier. However, there is currently no provision for
> moving a node from one tier to another on demand.
>
> This patch series aims to support node migration between tiers
> on demand by sysadmin/root user using the provided sysfs for
> node migration.
>
> To migrate a node to a tier, the corresponding node’s sysfs
> memtier_override is written with target tier id.
>
> Example: Move node2 to memory tier2 from its default tier(i.e 4)
>
> 1. To check current memtier of node2
> $cat  /sys/devices/system/node/node2/memtier_override
> memory_tier4
>
> 2. To migrate node2 to memory_tier2
> $echo 2 > /sys/devices/system/node/node2/memtier_override
> $cat  /sys/devices/system/node/node2/memtier_override
> memory_tier2
>
> Usecases:
>
> 1. Useful to move cxl nodes to the right tiers from userspace, when
>    the hardware fails to assign the tiers correctly based on
>    memorytypes.
>
>    On some platforms we have observed cxl memory being assigned to
>    the same tier as DDR memory. This is arguably a system firmware
>    bug, but it is true that tiers represent *ranges* of performance
>    and we believe it's important for the system operator to have
>    the ability to override bad firmware or OS decisions about tier
>    assignment as a fail-safe against potential bad outcomes.
>
> 2. Useful if we want interleave weights to be applied on memory tiers
>    instead of nodes.
> In a previous thread, Huang Ying <ying.huang@intel.com> thought
> this feature might be useful to overcome limitations of systems
> where nodes with different bandwidth characteristics are grouped
> in a single tier.
> https://lore.kernel.org/lkml/87a5rw1wu8.fsf@yhuang6-desk2.ccr.corp.intel.com/
>
> =============
> Version Notes:
>
> V2 : Changed interface to memtier_override from adistance_offset.
> memtier_override was recommended by
> 1. John Groves <john@jagalactic.com>
> 2. Ravi Shankar <ravis.opensrc@micron.com>
> 3. Brice Goglin <Brice.Goglin@inria.fr>

It appears that you ignored my comments for V1 as follows ...

https://lore.kernel.org/lkml/87o7f62vur.fsf@yhuang6-desk2.ccr.corp.intel.com/

Thank you Huang, Ying for pointing to this.
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

In the presentation above, the adistance_offsets are per memtype.
We believe that a per-node adistance_offset is more suitable and
flexible, since it can be changed for each node individually. If we keep
adistance_offset per memtype, then we cannot change it for a specific
node of a given memtype.

https://lore.kernel.org/lkml/87jzpt2ft5.fsf@yhuang6-desk2.ccr.corp.intel.com/

I guess that you need to move all NUMA nodes with same performance
metrics together?  If so, That is why we previously proposed to place
the knob in "memory_type"?
(From: Huang, Ying)

Yes, memory_type would group the related memories together as a single
tier. We should also have the flexibility to move individual nodes
between tiers, to address the issues described in the use cases above.

https://lore.kernel.org/lkml/87a5qp2et0.fsf@yhuang6-desk2.ccr.corp.intel.com/

This patch provides a way to move a node to the correct tier.
We have observed test setups where DRAM and CXL are placed in the
same tier (memory_tier4). With this patch, we can move the CXL node out
of the DRAM-linked tier4 and put it in the desired tier.

Regards,
Srini

--
Best Regards,
Huang, Ying

> V1 : Introduced adistance_offset sysfs.
>
> =============
>
> Srinivasulu Thanneeru (2):
>   base/node: Add sysfs for memtier_override
>   memory tier: Support node migration between tiers
>
>  Documentation/ABI/stable/sysfs-devices-node |  7 ++
>  drivers/base/node.c                         | 47 ++++++++++++
>  include/linux/memory-tiers.h                | 11 +++
>  include/linux/node.h                        | 11 +++
>  mm/memory-tiers.c                           | 85 ++++++++++++---------
>  5 files changed, 125 insertions(+), 36 deletions(-)
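
For reference, the check-then-migrate steps quoted in the cover letter could be wrapped in a small helper along the following lines. This is only a sketch under stated assumptions: memtier_override is the sysfs attribute proposed by this RFC, not an interface known to exist in mainline kernels, and the names move_node_to_tier and NODE_SYSFS_ROOT are hypothetical, introduced here purely for illustration.

```shell
#!/bin/sh
# Sketch only: memtier_override is the attribute proposed in this RFC.
# NODE_SYSFS_ROOT and move_node_to_tier are hypothetical names.
NODE_SYSFS_ROOT="${NODE_SYSFS_ROOT:-/sys/devices/system/node}"

move_node_to_tier() {
    node="$1"    # NUMA node number, e.g. 2
    tier="$2"    # target memory tier id, e.g. 2
    attr="$NODE_SYSFS_ROOT/node$node/memtier_override"

    # Bail out if the kernel does not expose the proposed attribute.
    if [ ! -e "$attr" ]; then
        echo "node$node: memtier_override not available" >&2
        return 1
    fi

    # Show the current tier, write the target tier id, then read it back.
    echo "before: $(cat "$attr")"
    echo "$tier" > "$attr" || return 1
    echo "after:  $(cat "$attr")"
}
```

On a kernel carrying the series, running `move_node_to_tier 2 2` as root would correspond to the node2 example in the cover letter. Checking for the attribute first keeps the helper harmless on kernels without the patches.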