Re: [PATCH v5 1/9] mm/demotion: Add support for explicit memory tiers

On Thu, Jun 09, 2022 at 08:03:26AM +0530, Aneesh Kumar K V wrote:
> On 6/8/22 11:46 PM, Johannes Weiner wrote:
> > On Wed, Jun 08, 2022 at 09:43:52PM +0530, Aneesh Kumar K V wrote:
> > > On 6/8/22 9:25 PM, Johannes Weiner wrote:
> > > > Hello,
> > > > 
> > > > On Wed, Jun 08, 2022 at 10:11:31AM -0400, Johannes Weiner wrote:
> > > > > On Fri, Jun 03, 2022 at 07:12:29PM +0530, Aneesh Kumar K.V wrote:
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > > +#ifndef _LINUX_MEMORY_TIERS_H
> > > > > > +#define _LINUX_MEMORY_TIERS_H
> > > > > > +
> > > > > > +#ifdef CONFIG_TIERED_MEMORY
> > > > > > +
> > > > > > +#define MEMORY_TIER_HBM_GPU	0
> > > > > > +#define MEMORY_TIER_DRAM	1
> > > > > > +#define MEMORY_TIER_PMEM	2
> > > > > > +
> > > > > > +#define MEMORY_RANK_HBM_GPU	300
> > > > > > +#define MEMORY_RANK_DRAM	200
> > > > > > +#define MEMORY_RANK_PMEM	100
> > > > > > +
> > > > > > +#define DEFAULT_MEMORY_TIER	MEMORY_TIER_DRAM
> > > > > > +#define MAX_MEMORY_TIERS  3
> > > > > 
> > > > > I understand the names are somewhat arbitrary, and the tier ID space
> > > > > can be expanded down the line by bumping MAX_MEMORY_TIERS.
> > > > > 
> > > > > But starting out with a packed ID space can get quite awkward for
> > > > > users when new tiers - especially intermediate tiers - show up in
> > > > > existing configurations. I mentioned in the other email that DRAM !=
> > > > > DRAM, so new tiers seem inevitable already.
> > > > > 
> > > > > It could make sense to start with a bigger address space and spread
> > > > > out the list of kernel default tiers a bit within it:
> > > > > 
> > > > > MEMORY_TIER_GPU		0
> > > > > MEMORY_TIER_DRAM	10
> > > > > MEMORY_TIER_PMEM	20
> > > > 
> > > > Forgive me if I'm asking a question that has been answered. I went
> > > > back to earlier threads and couldn't work it out - maybe there were
> > > > some off-list discussions? Anyway...
> > > > 
> > > > Why is there a distinction between tier ID and rank? I understand that
> > > > rank was added because tier IDs were too few. But if rank determines
> > > > ordering, what is the use of a separate tier ID? IOW, why not make the
> > > > tier ID space wider and have the kernel pick a few spread out defaults
> > > > based on known hardware, with plenty of headroom to be future proof.
> > > > 
> > > >     $ ls tiers
> > > >     100				# DEFAULT_TIER
> > > >     $ cat tiers/100/nodelist
> > > >     0-1				# conventional numa nodes
> > > > 
> > > >     <pmem is onlined>
> > > > 
> > > >     $ grep . tiers/*/nodelist
> > > >     tiers/100/nodelist:0-1	# conventional numa
> > > >     tiers/200/nodelist:2		# pmem
> > > > 
> > > >     $ grep . nodes/*/tier
> > > >     nodes/0/tier:100
> > > >     nodes/1/tier:100
> > > >     nodes/2/tier:200
> > > > 
> > > >     <unknown device is online as node 3, defaults to 100>
> > > > 
> > > >     $ grep . tiers/*/nodelist
> > > >     tiers/100/nodelist:0-1,3
> > > >     tiers/200/nodelist:2
> > > > 
> > > >     $ echo 300 >nodes/3/tier
> > > >     $ grep . tiers/*/nodelist
> > > >     tiers/100/nodelist:0-1
> > > >     tiers/200/nodelist:2
> > > >     tiers/300/nodelist:3
> > > > 
> > > >     $ echo 200 >nodes/3/tier
> > > >     $ grep . tiers/*/nodelist
> > > >     tiers/100/nodelist:0-1	
> > > >     tiers/200/nodelist:2-3
> > > > 
> > > > etc.
> > > 
> > > tier ID is also used as device id memtier.dev.id. It was discussed that we
> > > would need the ability to change the rank value of a memory tier. If we make
> > > rank value same as tier ID or tier device id, we will not be able to support
> > > that.
> > 
> > Is the idea that you could change the rank of a collection of nodes in
> > one go? Rather than moving the nodes one by one into a new tier?
> > 
> > [ Sorry, I wasn't able to find this discussion. AFAICS the first
> >    patches in RFC4 already had the struct device { .id = tier }
> >    logic. Could you point me to it? In general it would be really
> >    helpful to maintain summarized rationales for such decisions in the
> >    coverletter to make sure things don't get lost over many, many
> >    threads, conferences, and video calls. ]
> 
> Most of the discussion did not happen in the patch review email threads.
> 
> RFC: Memory Tiering Kernel Interfaces (v2)
> https://lore.kernel.org/linux-mm/CAAPL-u_diGYEb7+WsgqNBLRix-nRCk2SsDj6p9r8j5JZwOABZQ@xxxxxxxxxxxxxx
> 
> RFC: Memory Tiering Kernel Interfaces (v4)
> https://lore.kernel.org/linux-mm/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@xxxxxxxxxxxxxx

I read the RFCs, the discussions and your code. It's still not clear
why the tier/device ID and the rank need to be two separate,
user-visible things. There is only one tier of a given rank, so why
can't the rank be the unique device id? dev->id = 100. One number. Or use a
unique device id allocator if large numbers are causing problems
internally. But I don't see an explanation why they need to be two
different things, let alone two different things in the user ABI.
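
To make the "one number" model concrete, here is a rough user-space
sketch in C. It is not kernel code and not taken from the patch series.
Every name in it (node_tier, node_online, node_set_tier, tier_nodelist,
tier_list, MAX_NODES) is hypothetical, and DEFAULT_TIER = 100 is only
the value used in the example quoted above. The assumption is that a
tier exists for as long as at least one node carries its ID, and that
the ID alone decides where the tier sorts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES	8	/* hypothetical limit for this sketch */
#define DEFAULT_TIER	100	/* default used in the example in this thread */

/* node_tier[n] is the tier ID of node n; 0 means the node is offline. */
static int node_tier[MAX_NODES];

/* Move a node into a tier, like "echo 300 >nodes/3/tier". */
static int node_set_tier(int node, int tier)
{
	if (node < 0 || node >= MAX_NODES || tier <= 0)
		return -1;
	node_tier[node] = tier;
	return 0;
}

/* Online a node of unknown type: it lands in the default tier. */
static int node_online(int node)
{
	return node_set_tier(node, DEFAULT_TIER);
}

/* Format the nodes of one tier as a range list such as "0-1,3". */
static int tier_nodelist(int tier, char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	while (n < MAX_NODES) {
		int start, end, w;

		if (node_tier[n] != tier) {
			n++;
			continue;
		}
		/* Extend the run while the next node is in the same tier. */
		start = n;
		while (n + 1 < MAX_NODES && node_tier[n + 1] == tier)
			n++;
		end = n++;
		if (start == end)
			w = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", start);
		else
			w = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", start, end);
		if (w < 0 || (size_t)w >= len - off)
			return -1;	/* output did not fit */
		off += (size_t)w;
	}
	return 0;
}

/*
 * Collect the IDs of all populated tiers in ascending order.
 * Returns the number of tiers, or -1 if 'out' is too small.
 */
static int tier_list(int *out, int max)
{
	int count = 0;

	for (int n = 0; n < MAX_NODES; n++) {
		int t = node_tier[n], i;

		if (t == 0)
			continue;
		/* Find the sorted position; skip IDs already recorded. */
		for (i = 0; i < count && out[i] < t; i++)
			;
		if (i < count && out[i] == t)
			continue;
		if (count == max)
			return -1;
		memmove(&out[i + 1], &out[i],
			(size_t)(count - i) * sizeof(*out));
		out[i] = t;
		count++;
	}
	return count;
}
```

Replaying the shell session from the earlier mail against this sketch:
onlining nodes 0, 1 and 3 with node_online() and putting node 2 in tier
200 gives "0-1,3" for tier 100; node_set_tier(3, 300) makes a third
tier appear; node_set_tier(3, 200) makes it disappear again and tier
200 reads "2-3". Nothing in that flow needs a second identifier.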



