Micron Confidential

Hi Huang, Ying,

My apologies for the wrong mail reply format; my mail client settings
got changed on my PC. Please find my comments inline below.

Regards,
Srini

> -----Original Message-----
> From: Huang, Ying <ying.huang@intel.com>
> Sent: Monday, December 18, 2023 11:26 AM
> To: gregory.price <gregory.price@memverge.com>
> Cc: Srinivasulu Opensrc <sthanneeru.opensrc@micron.com>;
> linux-cxl@vger.kernel.org; linux-mm@kvack.org;
> Srinivasulu Thanneeru <sthanneeru@micron.com>;
> aneesh.kumar@linux.ibm.com; dan.j.williams@intel.com;
> mhocko@suse.com; tj@kernel.org; john@jagalactic.com;
> Eishan Mirakhur <emirakhur@micron.com>;
> Vinicius Tavares Petrucci <vtavarespetr@micron.com>;
> Ravis OpenSrc <Ravis.OpenSrc@micron.com>;
> Jonathan.Cameron@huawei.com; linux-kernel@vger.kernel.org;
> Johannes Weiner <hannes@cmpxchg.org>; Wei Xu <weixugc@google.com>
> Subject: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers
>
> Gregory Price <gregory.price@memverge.com> writes:
>
> > On Fri, Dec 15, 2023 at 01:02:59PM +0800, Huang, Ying wrote:
> >> <sthanneeru.opensrc@micron.com> writes:
> >>
> >> > =============
> >> > Version Notes:
> >> >
> >> > V2 : Changed interface to memtier_override from adistance_offset.
> >> > memtier_override was recommended by
> >> > 1. John Groves <john@jagalactic.com>
> >> > 2. Ravi Shankar <ravis.opensrc@micron.com>
> >> > 3. Brice Goglin <Brice.Goglin@inria.fr>
> >>
> >> It appears that you ignored my comments for V1 as follows ...
> >>
> >> https://lore.kernel.org/lkml/87o7f62vur.fsf@yhuang6-desk2.ccr.corp.intel.com/

Thank you, Huang, Ying, for pointing this out.
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

In the presentation above, the adistance_offsets are per memtype.
We believe that a per-node adistance_offset is more suitable and
flexible, since it can be changed for each node individually. If we
keep adistance_offset per memtype, then we cannot change it for a
specific node of a given memtype.

> >> https://lore.kernel.org/lkml/87jzpt2ft5.fsf@yhuang6-desk2.ccr.corp.intel.com/

Yes, memory_type would group related memories together as a single tier.
We should also have the flexibility to move nodes between tiers, to
address the issues described in the use cases above.

> >> https://lore.kernel.org/lkml/87a5qp2et0.fsf@yhuang6-desk2.ccr.corp.intel.com/

This patch provides a way to move a node to the correct tier.
In test setups, we observed DRAM and CXL being placed under the same
tier (memory_tier4). By using this patch, we can move the CXL node out
of the DRAM tier (memory_tier4) and put it in the desired tier.

> >>
> >
> > Not speaking for the group, just chiming in because i'd discussed it
> > with them.
> >
> > "Memory Type" is a bit nebulous. Is a Micron Type-3 with performance X
> > and an SK Hynix Type-3 with performance Y a "Different type", or are
> > they the "Same Type" given that they're both Type 3 backed by some form
> > of DDR? Is socket placement of those devices relevant for determining
> > "Type"? Is whether they are behind a switch relevant for determining
> > "Type"? "Type" is frustrating when everything we're talking about
> > managing is "Type-3" with difference performance.
> >
> > A concrete example:
> > To the system, a Multi-Headed Single Logical Device (MH-SLD) looks
> > exactly the same as an standard SLD. I may want to have some
> > combination of local memory expansion devices on the majority of my
> > expansion slots, but reserve 1 slot on each socket for a connection to
> > the MH-SLD. As of right now: There is no good way to differentiate the
> > devices in terms of "Type" - and even if you had that, the tiering
> > system would still lump them together.
> >
> > Similarly, an initial run of switches may or may not allow enumeration
> > of devices behind it (depends on the configuration), so you may end up
> > with a static numa node that "looks like" another SLD - despite it being
> > some definition of "GFAM". Do number of hops matter in determining
> > "Type"?
>
> In the original design, the memory devices of same memory type are
> managed by the same device driver, linked with system in same way
> (including switches), built with same media. So, the performance is
> same too. And, same as memory tiers, memory types are orthogonal to
> sockets. Do you think the definition itself is clear enough?
>
> I admit "memory type" is a confusing name. Do you have some better
> suggestion?
>
> > So I really don't think "Type" is useful for determining tier placement.
> >
> > As of right now, the system lumps DRAM nodes as one tier, and pretty
> > much everything else as "the other tier". To me, this patch set is an
> > initial pass meant to allow user-control over tier composition while
> > the internal mechanism is sussed out and the environment develops.
>
> The patchset to identify the performance of memory devices and put them
> in proper "memory types" and memory tiers via HMAT has been merged by
> v6.7-rc1.
>
> 07a8bdd4120c (memory tiering: add abstract distance calculation algorithms management, 2023-09-26)
> d0376aac59a1 (acpi, hmat: refactor hmat_register_target_initiators(), 2023-09-26)
> 3718c02dbd4c (acpi, hmat: calculate abstract distance with HMAT, 2023-09-26)
> 6bc2cfdf82d5 (dax, kmem: calculate abstract distance with general interface, 2023-09-26)
>
> > In general, a release valve that lets you redefine tiers is very welcome
> > for testing and validation of different setups while the industry evolves.
> >
> > Just my two cents.
>
> --
> Best Regards,
> Huang, Ying