Ravi Jonnalagadda <ravis.opensrc@xxxxxxxxxx> writes:

> Should Node based interleave solution be considered complex or not would probably
> depend on number of numa nodes that would be present in the system and whether
> we are able to setup the default weights correctly to obtain optimum bandwidth
> expansion.

Node based interleave is more complex than tier based interleave,
because you generally have fewer tiers than nodes.

>>
>>> Pros and Cons of Memory Tier based interleave:
>>> Pros:
>>> 1. Programming weight per initiator would apply for all the nodes in the tier.
>>> 2. Weights can be calculated considering the cumulative bandwidth of all
>>> the nodes in the tier and need to be programmed once for all the nodes in a
>>> given tier.
>>> 3. It may be useful in cases where numa nodes with similar latency and bandwidth
>>> characteristics increase, possibly with pooling use cases.
>>
>>4. simpler.
>>
>>> Cons:
>>> 1. If nodes with different bandwidth and latency characteristics are placed
>>> in same tier as seen in the current mainline kernel, it will be difficult to
>>> apply a correct interleave weight policy.
>>> 2. There will be a need for functionality to move nodes between different tiers
>>> or create new tiers to place such nodes for programming correct interleave weights.
>>> We are working on a patch to support it currently.
>>
>>Thanks! If we have such system, we will need this.
>>
>>> 3. For systems where each numa node is having different characteristics,
>>> a single node might end up existing in different memory tier, which would be
>>> equivalent to node based interleaving.
>>
>>No. A node can only exist in one memory tier.
>
> Sorry for the confusion what i meant was, if each node is having different
> characteristics, to program the memory tier weights correctly we need to place
> each node in a separate tier of it's own. So each memory tier will contain
> only a single node and the solution would resemble node based interleaving.
>
>>
>>> On newer systems where all CXL memory from different devices under a
>>> port are combined to form single numa node, this scenario might be
>>> applicable.
>>
>>You mean the different memory ranges of a NUMA node may have different
>>performance? I don't think that we can deal with this.
>
> Example Configuration: On a server that we are using now, four different
> CXL cards are combined to form a single NUMA node and two other cards are
> exposed as two individual numa nodes.
> So if we have the ability to combine multiple CXL memory ranges to a
> single NUMA node the number of NUMA nodes in the system would potentially
> decrease even if we can't combine the entire range to form a single node.

Sorry, I misunderstood your words. Yes, it's possible that there is one
tier for each node in some systems. But I guess we will have fewer tiers
than nodes in general.

--
Best Regards,
Huang, Ying

>>
>>> 4. Users may need to keep track of different memory tiers and what nodes are present
>>> in each tier for invoking interleave policy.
>>
>>I don't think this is a con. With node based solution, you need to know
>>your system too.
>>
>>>>
>>>>> Could you elaborate on the 'get what you pay for' usecase you
>>>>> mentioned?
>>>>
>>
>>--
>>Best Regards,
>>Huang, Ying
> --
> Best Regards,
> Ravi Jonnalagadda
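
To make the tier versus node comparison above concrete, here is a minimal
userspace sketch in Python. It is not kernel code and not the interface
proposed in this thread. It assumes that every node in a tier simply inherits
that tier's weight, and the tier names, node ids and weights are hypothetical.
It shows that tier based weights need one knob per tier, and that with one
node per tier the result is the same as node based weights.

```python
from itertools import islice


def node_weights_from_tiers(tiers, tier_weights):
    """Expand per-tier weights into per-node weights.

    Assumption for this sketch: each node inherits the weight of its tier.
    A node may belong to one tier only.
    """
    weights = {}
    for tier, nodes in tiers.items():
        for node in nodes:
            if node in weights:
                raise ValueError(f"node {node} appears in more than one tier")
            weights[node] = tier_weights[tier]
    return weights


def interleave(weights):
    """Yield node ids in weighted round-robin order, without end."""
    if not any(w > 0 for w in weights.values()):
        raise ValueError("at least one node needs a positive weight")
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


# Hypothetical layout: node 0 is DRAM, nodes 1 and 2 are CXL.
tiers = {"dram": [0], "cxl": [1, 2]}
tier_weights = {"dram": 3, "cxl": 1}  # two knobs cover three nodes
node_weights = node_weights_from_tiers(tiers, tier_weights)
first_pages = list(islice(interleave(node_weights), 5))

# One node per tier: the number of tier knobs equals the number of nodes,
# so this is effectively node based interleaving.
split_tiers = {"t0": [0], "t1": [1], "t2": [2]}
split_weights = {"t0": 3, "t1": 2, "t2": 1}
split_node_weights = node_weights_from_tiers(split_tiers, split_weights)
```

With the first layout, two weights are enough for three nodes. With the
second layout, three weights are needed for three nodes, which is the case
where the tier based solution resembles the node based one.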