RE: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers


 



Micron Confidential

Hi Huang, Ying,

My apologies for the wrong mail reply format; my mail client settings got changed on my PC.
Please find my comments inline below.

Regards,
Srini


> -----Original Message-----
> From: Huang, Ying <ying.huang@intel.com>
> Sent: Monday, December 18, 2023 11:26 AM
> To: gregory.price <gregory.price@memverge.com>
> Cc: Srinivasulu Opensrc <sthanneeru.opensrc@micron.com>;
> linux-cxl@vger.kernel.org; linux-mm@kvack.org; Srinivasulu Thanneeru
> <sthanneeru@micron.com>; aneesh.kumar@linux.ibm.com;
> dan.j.williams@intel.com; mhocko@suse.com; tj@kernel.org;
> john@jagalactic.com; Eishan Mirakhur <emirakhur@micron.com>; Vinicius
> Tavares Petrucci <vtavarespetr@micron.com>; Ravis OpenSrc
> <Ravis.OpenSrc@micron.com>; Jonathan.Cameron@huawei.com;
> linux-kernel@vger.kernel.org; Johannes Weiner <hannes@cmpxchg.org>; Wei Xu
> <weixugc@google.com>
> Subject: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers
>
> Gregory Price <gregory.price@memverge.com> writes:
>
> > On Fri, Dec 15, 2023 at 01:02:59PM +0800, Huang, Ying wrote:
> >> <sthanneeru.opensrc@micron.com> writes:
> >>
> >> > =============
> >> > Version Notes:
> >> >
> >> > V2 : Changed interface to memtier_override from adistance_offset.
> >> > memtier_override was recommended by
> >> > 1. John Groves <john@jagalactic.com>
> >> > 2. Ravi Shankar <ravis.opensrc@micron.com>
> >> > 3. Brice Goglin <Brice.Goglin@inria.fr>
> >>
> >> It appears that you ignored my comments for V1 as follows ...
> >>
> >> https://lore.kernel.org/lkml/87o7f62vur.fsf@yhuang6-desk2.ccr.corp.intel.com/

Thank you, Huang, Ying, for pointing to this:
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

In the presentation above, the adistance_offsets are per memtype.
We believe that a per-node adistance_offset is more suitable and flexible,
since it can be changed for each node individually. If we keep adistance_offset
per memtype, then we cannot change it for a specific node of a given memtype.
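
To illustrate the difference, here is a toy Python model (not kernel code).
The node IDs, the memtype names, the base abstract distance values, and the
function names are all hypothetical; only the idea of a per-memtype versus a
per-node offset comes from the discussion above.

```python
# Toy model only: every name and number here is an illustrative assumption.
BASE_ADISTANCE = {"dram": 4608, "cxl": 4700}    # hypothetical per-memtype values
NODE_MEMTYPE = {0: "dram", 1: "cxl", 2: "cxl"}  # hypothetical topology


def adistance_per_memtype(node, type_offset):
    """Offset keyed by memtype: every node of that type shifts together."""
    memtype = NODE_MEMTYPE[node]
    return BASE_ADISTANCE[memtype] + type_offset.get(memtype, 0)


def adistance_per_node(node, node_offset):
    """Offset keyed by node: a single node can be shifted on its own."""
    return BASE_ADISTANCE[NODE_MEMTYPE[node]] + node_offset.get(node, 0)


# Per-memtype offset: CXL nodes 1 and 2 always move together.
per_type = [adistance_per_memtype(n, {"cxl": 1024}) for n in (1, 2)]
# Per-node offset: only node 2 moves, node 1 keeps its base value.
per_node = [adistance_per_node(n, {2: 1024}) for n in (1, 2)]
print(per_type, per_node)
```

With a per-memtype offset, both CXL nodes end up with the same value; with a
per-node offset, node 2 can be separated from node 1 even though they share a
memtype.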

> >>
> >> https://lore.kernel.org/lkml/87jzpt2ft5.fsf@yhuang6-desk2.ccr.corp.intel.com/

Yes, memory_type groups related memories together into a single tier.
We should also have the flexibility to move nodes between tiers, to address
the issues described in the use cases above.

> >>
> >> https://lore.kernel.org/lkml/87a5qp2et0.fsf@yhuang6-desk2.ccr.corp.intel.com/

This patch provides a way to move a node to the correct tier.
In our test setups, we observed DRAM and CXL being put under the same
tier (memory_tier4).
With this patch, we can move the CXL node out of the DRAM tier (memory_tier4)
and put it in the desired tier.
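
A small sketch of how two nodes can land in the same tier, and what an
override would change. This is a toy Python model, not kernel code. The chunk
constants are meant to mirror MEMTIER_CHUNK_BITS and MEMTIER_ADISTANCE_DRAM in
include/linux/memory-tiers.h as I understand them, and should be checked
against the tree. The CXL abstract distance, the node numbers, and the
override dictionary are hypothetical and do not represent the actual
interface of the patch.

```python
# Toy model, not kernel code.
MEMTIER_CHUNK_BITS = 10  # assumed to match the kernel header
MEMTIER_CHUNK_SIZE = 1 << MEMTIER_CHUNK_BITS
MEMTIER_ADISTANCE_DRAM = 4 * MEMTIER_CHUNK_SIZE + (MEMTIER_CHUNK_SIZE >> 1)


def tier_of(adistance):
    """Tier number is the abstract-distance chunk the value falls into."""
    return adistance >> MEMTIER_CHUNK_BITS


# Hypothetical setup: the CXL node's adistance lands in the DRAM chunk.
node_adistance = {0: MEMTIER_ADISTANCE_DRAM, 1: MEMTIER_ADISTANCE_DRAM + 100}


def node_tier(node, override=None):
    """Use the override tier for a node if one is given, else the computed one."""
    if override and node in override:
        return override[node]
    return tier_of(node_adistance[node])


before = {n: node_tier(n) for n in node_adistance}
after = {n: node_tier(n, override={1: 5}) for n in node_adistance}
print(before, after)
```

In this model both nodes start in tier 4; after the hypothetical override,
node 1 sits in tier 5 while node 0 stays where it was.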

> >>
> >
> > Not speaking for the group, just chiming in because i'd discussed it
> > with them.
> >
> > "Memory Type" is a bit nebulous.  Is a Micron Type-3 with performance X
> > and an SK Hynix Type-3 with performance Y a "Different type", or are
> > they the "Same Type" given that they're both Type 3 backed by some form
> > of DDR?  Is socket placement of those devices relevant for determining
> > "Type"?  Is whether they are behind a switch relevant for determining
> > "Type"? "Type" is frustrating when everything we're talking about
> > managing is "Type-3" with difference performance.
> >
> > A concrete example:
> > To the system, a Multi-Headed Single Logical Device (MH-SLD) looks
> > exactly the same as an standard SLD.  I may want to have some
> > combination of local memory expansion devices on the majority of my
> > expansion slots, but reserve 1 slot on each socket for a connection to
> > the MH-SLD.   As of right now: There is no good way to differentiate the
> > devices in terms of "Type" - and even if you had that, the tiering
> > system would still lump them together.
> >
> > Similarly, an initial run of switches may or may not allow enumeration
> > of devices behind it (depends on the configuration), so you may end up
> > with a static numa node that "looks like" another SLD - despite it being
> > some definition of "GFAM".  Do number of hops matter in determining
> > "Type"?
>
> In the original design, the memory devices of same memory type are
> managed by the same device driver, linked with system in same way
> (including switches), built with same media.  So, the performance is
> same too.  And, same as memory tiers, memory types are orthogonal to
> sockets.  Do you think the definition itself is clear enough?
>
> I admit "memory type" is a confusing name.  Do you have some better
> suggestion?
>
> > So I really don't think "Type" is useful for determining tier placement.
> >
> > As of right now, the system lumps DRAM nodes as one tier, and pretty
> > much everything else as "the other tier". To me, this patch set is an
> > initial pass meant to allow user-control over tier composition while
> > the internal mechanism is sussed out and the environment develops.
>
> The patchset to identify the performance of memory devices and put them
> in proper "memory types" and memory tiers via HMAT has been merged by
> v6.7-rc1.
>
>       07a8bdd4120c (memory tiering: add abstract distance calculation
>       algorithms management, 2023-09-26)
>       d0376aac59a1 (acpi, hmat: refactor hmat_register_target_initiators(),
>       2023-09-26)
>       3718c02dbd4c (acpi, hmat: calculate abstract distance with HMAT,
>       2023-09-26)
>       6bc2cfdf82d5 (dax, kmem: calculate abstract distance with general
>       interface, 2023-09-26)
>
> > In general, a release valve that lets you redefine tiers is very welcome
> > for testing and validation of different setups while the industry evolves.
> >
> > Just my two cents.
>
> --
> Best Regards,
> Huang, Ying




