On Tue, 2022-06-07 at 13:19 -0400, Johannes Weiner wrote:
> From: Hasan Al Maruf <hasanalmaruf@xxxxxx>
> 
> Existing interleave policy spreads out pages evenly across a set of
> specified nodes, i.e. 1:1 interleave. Upcoming tiered memory systems
> have CPU-less memory nodes with different peak bandwidth and
> latency-bandwidth characteristics. In such systems, we will want to
> use the additional bandwidth provided by lowtier memory for
> bandwidth-intensive applications. However, the default 1:1 interleave
> can lead to suboptimal bandwidth distribution.
> 
> Introduce an N:M interleave policy, where N pages allocated to the
> top-tier nodes are followed by M pages allocated to lowtier nodes.
> This provides the capability to steer the fraction of memory traffic
> that goes to toptier vs. lowtier nodes. For example, 4:1 interleave
> leads to an 80%/20% traffic breakdown between toptier and lowtier.
> 
> The ratios are configured through a new sysctl:
> 
> 	vm.numa_tier_interleave = toptier lowtier
> 
> We have run experiments on bandwidth-intensive production services on
> CXL-based tiered memory systems, where lowtier CXL memory has, when
> compared to the toptier memory directly connected to the CPU:
> 
> - ~half of the peak bandwidth
> - ~80ns higher idle latency
> - steeper latency vs. bandwidth curve
> 
> Results show that regular interleaving leads to a ~40% performance
> regression over baseline; 5:1 interleaving shows an ~8% improvement
> over baseline. We have found that the optimal distribution changes
> based on hardware characteristics: slower CXL memory will shift the
> optimal breakdown from 5:1 to (e.g.) 8:1.
> 
> The sysctl only applies to processes and vmas with an "interleave"
> policy and has no bearing on contexts using prefer or bind policies.
> 
> It defaults to a setting of "1 1", which represents even interleaving,
> and so is backward compatible with existing setups.
> 
> Signed-off-by: Hasan Al Maruf <hasanalmaruf@xxxxxx>
> Signed-off-by: Hao Wang <haowang3@xxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

In general, I think the use case is valid.  But we are changing memory
tiering now, including

- make memory tiering explicit
- support more than 2 tiers
- expose memory tiering via sysfs

Details can be found in the following threads,

https://lore.kernel.org/lkml/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@xxxxxxxxxxxxxx/
https://lore.kernel.org/lkml/20220603134237.131362-1-aneesh.kumar@xxxxxxxxxxxxx/

With these changes, we may need to revise your implementation.  For
example, put the interleave knobs in the memory tier sysfs interface,
support more than 2 tiers, etc.

Best Regards,
Huang, Ying

[snip]
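
[Editorial illustration of the N:M ratio described in the quoted changelog.
This is a minimal user-space sketch only; the function and variable names
(pick_tier, tier_interleave_top, tier_interleave_low) are hypothetical and
are not taken from the patch. It shows how splitting each period of N+M
allocations into N toptier slots followed by M lowtier slots yields the
80%/20% traffic breakdown the changelog cites for a 4:1 setting.]

	/* Hypothetical sketch, not the patch's implementation. */
	#include <stdio.h>

	static unsigned int tier_interleave_top = 4;	/* "N" in N:M */
	static unsigned int tier_interleave_low = 1;	/* "M" in N:M */

	/* Returns 1 for a toptier allocation, 0 for a lowtier one. */
	static int pick_tier(unsigned long alloc_seq)
	{
		unsigned int period = tier_interleave_top + tier_interleave_low;

		/* First N slots of each period go to toptier, the rest to lowtier. */
		return (alloc_seq % period) < tier_interleave_top;
	}

	int main(void)
	{
		unsigned long i, top = 0, low = 0;

		for (i = 0; i < 1000; i++)
			pick_tier(i) ? top++ : low++;

		/* With 4:1 this prints an 800/200 (80%/20%) split. */
		printf("toptier: %lu, lowtier: %lu\n", top, low);
		return 0;
	}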