Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving

On Wed, Oct 18, 2023 at 04:29:02PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@xxxxxxxxxxxx> writes:
> 
> > There are at least 5 proposals that i know of at the moment
> >
> > 1) mempolicy
> > 2) memory-tiers
> > 3) memory-block interleaving? (weighting among blocks inside a node)
> >    Maybe relevant if Dynamic Capacity devices arrive, but it seems
> >    like the wrong place to do this.
> > 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
> > 5) "just do it in hardware"
> 
> It may be easier to start with the use case.  What is the practical use
> cases in your mind that can not be satisfied with simple per-memory-tier
> weight?  Can you compare the memory layout with different proposals?
>

Before I delve in, one clarifying question: when you asked whether
weights should be part of node or memory-tiers, I took that to mean
whether they should be part of mempolicy or memory-tiers.

Were you suggesting that weights should actually be part of
drivers/base/node.c?

I had not considered that, but it seems reasonable and easy to
implement, and it would not require tying mempolicy.c to memory-tiers.c.



Beyond this, I think there have been 3 imagined use cases (now including
this one), plus a variant (d):

a)
numactl --weighted-interleave=Node:weight,0:16,1:4,...

b)
echo weight > /sys/.../memory-tiers/memtier/access0/interleave_weight
numactl --interleave=0,1

c)
echo weight > /sys/bus/node/node0/access0/interleave_weight
numactl --interleave=0,1

d)
Options b or c, but with --weighted-interleave=0,1 instead.
This requires libnuma changes to pick up, but it retains --interleave
as-is to avoid user confusion.
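
To make the 0:16,1:4 weights in (a) concrete, here is a minimal Python
sketch of one way weighted interleaving could distribute pages. It is an
illustration only, not the patch's implementation: the function name
weighted_interleave and the "weight consecutive pages per node per
round" ordering are assumptions made for this example.

```python
from collections import Counter
from itertools import islice


def weighted_interleave(weights):
    """Yield node ids forever; each node receives `weight` pages per round.

    `weights` maps node id -> positive integer weight (hypothetical model).
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty map of positive ints")
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


# Weights taken from the example in (a): node 0 -> 16, node 1 -> 4.
weights = {0: 16, 1: 4}

# Place 2000 pages (100 full rounds of 20 pages) and count pages per node.
placement = Counter(islice(weighted_interleave(weights), 2000))
print(dict(placement))
```

With these weights, node 0 receives 16 of every 20 pages and node 1
receives the remaining 4, so the split tracks the weight ratio.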

The downside of an approach like A (which was my original approach) is
that the weights cannot really change when a node is hotplugged. Tasks
would need to detect this and change their policy themselves.  That's
not a good solution.

However, in both the B and C designs, weights can be rebalanced in
response to any number of events.  Ultimately B and C are functionally
equivalent, but placing the weights in nodes is cleaner and more
intuitive.  If memory-tiers wants to use or change this information,
nothing prevents it.
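
The difference between A and B/C can be sketched as a small Python
model. This is only an illustration under stated assumptions: the
node_weights table, the effective_weights helper, and the rebalancing
event (node 1's weight going from 4 to 8) are all hypothetical and do
not correspond to real kernel or sysfs interfaces.

```python
from fractions import Fraction

# Hypothetical system-wide weight table, as in designs B and C.
node_weights = {0: 16, 1: 4}


def effective_weights(policy_nodes, table):
    """Look up current weights for the nodes named in a task's policy."""
    return {n: table[n] for n in policy_nodes if n in table}


def shares(weights):
    """Fraction of pages each node receives under the given weights."""
    total = sum(weights.values())
    return {n: Fraction(w, total) for n, w in weights.items()}


# Design A: the task copies the weights into its own policy at setup time.
per_task_policy = dict(node_weights)

# Designs B/C: the task only names nodes, as with --interleave=0,1.
policy_nodes = [0, 1]

# Some event (for example a hotplug) leads to a rebalance of node 1.
node_weights[1] = 8

print("A  :", shares(per_task_policy))                              # stale
print("B/C:", shares(effective_weights(policy_nodes, node_weights)))  # current
```

In this model the task under A keeps its original 16:4 split until it
rewrites its own policy, while the task under B/C picks up the new 16:8
split without any action on its part.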

Assuming this is your meaning, I agree and I will pivot to this.

~Gregory



