On Wed, 2022-06-08 at 15:14 -0400, Johannes Weiner wrote:
> Hi Tim,
> 
> On Wed, Jun 08, 2022 at 11:15:27AM -0700, Tim Chen wrote:
> > On Tue, 2022-06-07 at 13:19 -0400, Johannes Weiner wrote:
> > >  /* Do dynamic interleaving for a process */
> > >  static unsigned interleave_nodes(struct mempolicy *policy)
> > >  {
> > >  	unsigned next;
> > >  	struct task_struct *me = current;
> > > 
> > > -	next = next_node_in(me->il_prev, policy->nodes);
> > > +	if (numa_tier_interleave[0] > 1 || numa_tier_interleave[1] > 1) {
> > 
> > When we have three memory tiers, do we expect an N:M:K policy?
> > Like interleaving between DDR5, DDR4 and PMEM memory.
> > Or do we expect an N:M policy still, by interleaving between two specific tiers?
> 
> In the context of the proposed 'explicit tiers' interface, I think it
> would make sense to have a per-tier 'interleave_ratio' knob. Because
> the ratio is configured based on hardware properties, it can be
> configured meaningfully for the entire tier hierarchy, even if
> individual tasks or vmas interleave over only a subset of nodes.

I think that makes sense. So if we have 3 tiers of memory whose bandwidth
ratio is 4:2:1, then it makes sense to interleave according to this ratio,
even if we choose to interleave over a subset of nodes. Say between tier 1
and tier 3, the interleave ratio will be 4:1, as I can read 4 lines of data
from tier 1 in the time it takes to read 1 line of data from tier 3.

> > The other question is whether we will need multiple interleave policies
> > depending on cgroup?
> > One policy could be interleave between tier1, tier2, tier3.
> > Another could be interleave between tier1 and tier2.
> 
> This is a good question.
> 
> One thing that has defined cgroup development in recent years is the
> concept of "work conservation". Moving away from fixed limits and hard
> partitioning, cgroups are increasingly configured with weights,
> priorities, and guarantees (cpu.weight, io.latency/io.cost.qos,
> memory.low). These weights and priorities are enforced when cgroups
> are directly competing over a resource; but if there is no contention,
> any active cgroup, regardless of priority, has full access to the
> surplus (which could be the entire host if the main load is idle).
> 
> With that background, yes, we likely want some way of prioritizing
> tier access when multiple cgroups are competing. But we ALSO want the
> ability to say that if resources are NOT contended, a cgroup should
> interleave memory over all tiers according to optimal bandwidth.
> 
> That means that regardless of what the competitive cgroup rules for
> tier access end up looking like, it makes sense to have global
> interleaving weights based on hardware properties as proposed here.
> 
> The effective cgroup IL ratio for each tier could then be something
> like cgroup.tier_weight[tier] * tier/interleave_weight.

Thanks. I agree that an interleave ratio that's proportional to the
hardware properties of each tier will suffice.

Tim
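
For illustration, here is a minimal user-space sketch of the proportional
(weighted round-robin) interleave discussed above, assuming per-tier weights
of 4:2:1. The names tier_weight[], il_state and pick_tier() are made up for
this sketch; this is not the kernel's numa_tier_interleave[] code or the
proposed sysfs interface.

/*
 * Sketch: weighted round-robin interleaving over memory tiers.
 * tier_weight[] holds hypothetical per-tier interleave weights
 * proportional to bandwidth, e.g. DDR5:DDR4:PMEM = 4:2:1.
 */
#include <stdio.h>

#define NR_TIERS 3

static const unsigned int tier_weight[NR_TIERS] = { 4, 2, 1 };

struct il_state {
	unsigned int cur_tier;	/* tier currently being filled */
	unsigned int cur_count;	/* pages placed on it this round */
};

/*
 * Place 'weight' consecutive pages on a tier before advancing to the
 * next allowed tier.  'allowed' is a non-empty bitmask of usable tiers;
 * restricting it to a subset keeps the remaining tiers in their original
 * proportion (e.g. tiers 0 and 2 alone still interleave 4:1).
 */
static unsigned int pick_tier(struct il_state *st, unsigned int allowed)
{
	while (!(allowed & (1u << st->cur_tier)) ||
	       st->cur_count >= tier_weight[st->cur_tier]) {
		st->cur_tier = (st->cur_tier + 1) % NR_TIERS;
		st->cur_count = 0;
	}
	st->cur_count++;
	return st->cur_tier;
}

int main(void)
{
	struct il_state st = { 0, 0 };
	unsigned int pages[NR_TIERS] = { 0 };

	/* Interleave 1000 pages over tiers 0 and 2 only */
	for (int i = 0; i < 1000; i++)
		pages[pick_tier(&st, (1u << 0) | (1u << 2))]++;

	for (int t = 0; t < NR_TIERS; t++)
		printf("tier %d: %u pages\n", t, pages[t]);
	return 0;
}

With the allowed mask limited to tiers 0 and 2, the 1000 pages split
800:200, i.e. the 4:1 ratio mentioned above for interleaving over a
subset of the hierarchy.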