On Thu, Jun 09, 2022 at 08:03:26AM +0530, Aneesh Kumar K V wrote:
> On 6/8/22 11:46 PM, Johannes Weiner wrote:
> > On Wed, Jun 08, 2022 at 09:43:52PM +0530, Aneesh Kumar K V wrote:
> > > On 6/8/22 9:25 PM, Johannes Weiner wrote:
> > > > Hello,
> > > >
> > > > On Wed, Jun 08, 2022 at 10:11:31AM -0400, Johannes Weiner wrote:
> > > > > On Fri, Jun 03, 2022 at 07:12:29PM +0530, Aneesh Kumar K.V wrote:
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > > +#ifndef _LINUX_MEMORY_TIERS_H
> > > > > > +#define _LINUX_MEMORY_TIERS_H
> > > > > > +
> > > > > > +#ifdef CONFIG_TIERED_MEMORY
> > > > > > +
> > > > > > +#define MEMORY_TIER_HBM_GPU 0
> > > > > > +#define MEMORY_TIER_DRAM 1
> > > > > > +#define MEMORY_TIER_PMEM 2
> > > > > > +
> > > > > > +#define MEMORY_RANK_HBM_GPU 300
> > > > > > +#define MEMORY_RANK_DRAM 200
> > > > > > +#define MEMORY_RANK_PMEM 100
> > > > > > +
> > > > > > +#define DEFAULT_MEMORY_TIER MEMORY_TIER_DRAM
> > > > > > +#define MAX_MEMORY_TIERS 3
> > > > >
> > > > > I understand the names are somewhat arbitrary, and the tier ID space
> > > > > can be expanded down the line by bumping MAX_MEMORY_TIERS.
> > > > >
> > > > > But starting out with a packed ID space can get quite awkward for
> > > > > users when new tiers - especially intermediate tiers - show up in
> > > > > existing configurations. I mentioned in the other email that DRAM !=
> > > > > DRAM, so new tiers seem inevitable already.
> > > > >
> > > > > It could make sense to start with a bigger address space and spread
> > > > > out the list of kernel default tiers a bit within it:
> > > > >
> > > > > MEMORY_TIER_GPU         0
> > > > > MEMORY_TIER_DRAM        10
> > > > > MEMORY_TIER_PMEM        20
> > > >
> > > > Forgive me if I'm asking a question that has been answered. I went
> > > > back to earlier threads and couldn't work it out - maybe there were
> > > > some off-list discussions? Anyway...
> > > >
> > > > Why is there a distinction between tier ID and rank? I understand that
> > > > rank was added because tier IDs were too few. But if rank determines
> > > > ordering, what is the use of a separate tier ID? IOW, why not make the
> > > > tier ID space wider and have the kernel pick a few spread out defaults
> > > > based on known hardware, with plenty of headroom to be future proof?
> > > >
> > > >   $ ls tiers
> > > >   100                           # DEFAULT_TIER
> > > >   $ cat tiers/100/nodelist
> > > >   0-1                           # conventional numa nodes
> > > >
> > > >   <pmem is onlined>
> > > >
> > > >   $ grep . tiers/*/nodelist
> > > >   tiers/100/nodelist:0-1        # conventional numa
> > > >   tiers/200/nodelist:2          # pmem
> > > >
> > > >   $ grep . nodes/*/tier
> > > >   nodes/0/tier:100
> > > >   nodes/1/tier:100
> > > >   nodes/2/tier:200
> > > >
> > > >   <unknown device is online as node 3, defaults to 100>
> > > >
> > > >   $ grep . tiers/*/nodelist
> > > >   tiers/100/nodelist:0-1,3
> > > >   tiers/200/nodelist:2
> > > >
> > > >   $ echo 300 >nodes/3/tier
> > > >   $ grep . tiers/*/nodelist
> > > >   tiers/100/nodelist:0-1
> > > >   tiers/200/nodelist:2
> > > >   tiers/300/nodelist:3
> > > >
> > > >   $ echo 200 >nodes/3/tier
> > > >   $ grep . tiers/*/nodelist
> > > >   tiers/100/nodelist:0-1
> > > >   tiers/200/nodelist:2-3
> > > >
> > > > etc.
> > >
> > > tier ID is also used as device id memtier.dev.id. It was discussed that we
> > > would need the ability to change the rank value of a memory tier. If we make
> > > rank value same as tier ID or tier device id, we will not be able to support
> > > that.
> >
> > Is the idea that you could change the rank of a collection of nodes in
> > one go? Rather than moving the nodes one by one into a new tier?
> >
> > [ Sorry, I wasn't able to find this discussion. AFAICS the first
> >   patches in RFC4 already had the struct device { .id = tier }
> >   logic. Could you point me to it? In general it would be really
> >   helpful to maintain summarized rationales for such decisions in the
> >   cover letter to make sure things don't get lost over many, many
> >   threads, conferences, and video calls. ]
>
> Most of the discussion happened not in the patch review email threads.
>
> RFC: Memory Tiering Kernel Interfaces (v2)
> https://lore.kernel.org/linux-mm/CAAPL-u_diGYEb7+WsgqNBLRix-nRCk2SsDj6p9r8j5JZwOABZQ@xxxxxxxxxxxxxx
>
> RFC: Memory Tiering Kernel Interfaces (v4)
> https://lore.kernel.org/linux-mm/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@xxxxxxxxxxxxxx

I read the RFCs, the discussions and your code. It's still not clear
why the tier/device ID and the rank need to be two separate,
user-visible things. There is only one tier of a given rank, so why
can't the rank be the unique device id? dev->id = 100. One number. Or
use a unique device id allocator if large numbers are causing problems
internally. But I don't see an explanation why they need to be two
different things, let alone two different things in the user ABI.
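
To make the "one number" idea concrete, here is a minimal userspace
sketch. It is not kernel code and not taken from the patch series.
NR_NODES, DEFAULT_TIER, node_tier[] and all helper names are
hypothetical, chosen only to mirror the sysfs example quoted above. It
assumes a single sparse tier ID per node that doubles as the ordering
key, and it deliberately takes no position on whether a larger number
means a faster or a slower tier.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES	4	/* hypothetical: nodes 0-3 as in the quoted example */
#define DEFAULT_TIER	100	/* hypothetical default, as in the quoted example */

/* One number per node: the tier ID is also the ordering key. */
static int node_tier[NR_NODES];

/* Every node starts out in the default tier. */
static void tiers_init(void)
{
	for (size_t i = 0; i < NR_NODES; i++)
		node_tier[i] = DEFAULT_TIER;
}

/* Models "echo <tier> >nodes/<nid>/tier"; returns 0, or -1 on bad input. */
static int node_set_tier(size_t nid, int tier)
{
	if (nid >= NR_NODES || tier < 0)
		return -1;
	node_tier[nid] = tier;
	return 0;
}

/* Counts the nodes in a tier; a tier is listed only while it has nodes. */
static size_t tier_nr_nodes(int tier)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_NODES; i++)
		if (node_tier[i] == tier)
			n++;
	return n;
}

/* Orders two nodes by tier ID alone: negative, zero or positive. */
static int node_tier_cmp(size_t a, size_t b)
{
	assert(a < NR_NODES && b < NR_NODES);
	return (node_tier[a] > node_tier[b]) - (node_tier[a] < node_tier[b]);
}
```

Replaying the quoted sequence against this model, moving node 3 to 300
and then to 200 only ever touches node_tier[3]. An intermediate tier
such as 150 can be introduced later without renumbering anything,
which is the headroom argument for a spread-out ID space.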