>-----Original Message----- >From: Martin Kletzander <mkletzan@xxxxxxxxxx> >Sent: Wednesday, March 31, 2021 12:21 AM >To: Zhong, Luyao <luyao.zhong@xxxxxxxxx> >Cc: Daniel P. Berrangé <berrange@xxxxxxxxxx>; libvir-list@xxxxxxxxxx >Subject: Re: [PATCH v4 0/3] introduce 'restrictive' mode in numatune > >On Tue, Mar 30, 2021 at 08:53:19AM +0000, Zhong, Luyao wrote: >> >> >>> -----Original Message----- >>> From: Martin Kletzander <mkletzan@xxxxxxxxxx> >>> Sent: Thursday, March 25, 2021 10:28 PM >>> To: Daniel P. Berrangé <berrange@xxxxxxxxxx> >>> Cc: Zhong, Luyao <luyao.zhong@xxxxxxxxx>; libvir-list@xxxxxxxxxx >>> Subject: Re: [PATCH v4 0/3] introduce 'restrictive' mode in >>> numatune >>> >>> On Thu, Mar 25, 2021 at 02:14:47PM +0000, Daniel P. Berrangé wrote: >>> >On Thu, Mar 25, 2021 at 03:10:56PM +0100, Martin Kletzander wrote: >>> >> On Thu, Mar 25, 2021 at 09:11:02AM +0000, Zhong, Luyao wrote: >>> >> > >>> >> > >>> >> > > -----Original Message----- >>> >> > > From: Martin Kletzander <mkletzan@xxxxxxxxxx> >>> >> > > Sent: Thursday, March 25, 2021 4:46 AM >>> >> > > To: Daniel P. Berrangé <berrange@xxxxxxxxxx> >>> >> > > Cc: Zhong, Luyao <luyao.zhong@xxxxxxxxx>; >>> >> > > libvir-list@xxxxxxxxxx >>> >> > > Subject: Re: [PATCH v4 0/3] introduce 'restrictive' >>> >> > > mode in numatune >>> >> > > >>> >> > > On Tue, Mar 23, 2021 at 09:48:02AM +0000, Daniel P. Berrangé wrote: >>> >> > > >On Tue, Mar 23, 2021 at 10:59:02AM +0800, Luyao Zhong wrote: >>> >> > > >> Before this patch set, numatune only has three memory modes: >>> >> > > >> static, interleave and prefered. These memory policies are >>> >> > > >> ultimately set by mbind() system call. >>> >> > > >> >>> >> > > >> Memory policy could be 'hard coded' into the kernel, but >>> >> > > >> none of above policies fit our requirment under this case. >>> >> > > >> mbind() support default memory policy, but it requires a >>> >> > > >> NULL nodemask. So obviously setting allowed memory nodes is >cgroups' >>> mission under this case. >>> >> > > >> So we introduce a new option for mode in numatune named >>> 'restrictive'. >>> >> > > >> >>> >> > > >> <numatune> >>> >> > > >> <memory mode="restrictive" nodeset="1-4,^3"/> >>> >> > > >> <memnode cellid="0" mode="restrictive" nodeset="1"/> >>> >> > > >> <memnode cellid="2" mode="restrictive" nodeset="2"/> >>> >> > > >> </numatune> >>> >> > > > >>> >> > > >'restrictive' is rather a wierd name and doesn't really tell >>> >> > > >me what the memory policy is going to be. As far as I can >>> >> > > >tell from the patches, it seems this causes us to not set any >>> >> > > >memory alllocation policy at all. IOW, we're using some >>> >> > > >undefined host default >>> policy. >>> >> > > > >>> >> > > >Given this I think we should be calling it either "none" or "default" >>> >> > > > >>> >> > > >>> >> > > I was against "default" because having such option possible, >>> >> > > but the actual default being different sounds stupid. >>> >> > > Similarly "none" sounds like no restrictions are applied or >>> >> > > that it is the same as if nothing was specified. It is funny >>> >> > > to imagine the situation when I am explaining to someone how to >achieve this solution: >>> >> > > >>> >> > > "The default is 'strict', you need to explicitly set it to 'default'." >>> >> > > >>> >> > > or >>> >> > > >>> >> > > "What setting did you use?" >>> >> > > "None" >>> >> > > "As in no mode or in mode='none'?" >>> >> > > >>> >> > > As I said before, please come up with any name, but not these >>> >> > > that are IMHO actually more confusing. >>> >> > > >>> >> > >>> >> > Hi Daniel and Martin, thanks for your reply, just as Martin said >>> >> > current default mode is "strict", so "default" was deprecated at >>> >> > the beginning when I proposed this change. And actually we have >>> >> > cgroups restricting the memory resource so could we call this a >>> >> > "none" mode? I still don't have a better name. ☹ >>> >> > >>> >> >>> >> Me neither as figuring out the names when our names do not >>> >> precisely map to anything else (since we are using multiple >>> >> solutions to get as close to the desired result as possible) is >>> >> difficult because there is no similar pre-existing setting. And using anything >like "cgroups-only" >>> >> would limit us in the future, probably. >>> > >>> >What I'm still really missing in this series is a clear statement of >>> >what the problem with the current modes is, and what this new mode >>> >provides to solve it. The documentation for the new XML attribute is >>> >not clear on this and neither are the commit messages. There's a >>> >pointer to an enourmous mailing list thread, but reading through >>> >50 messages is a not a viable way to learn the answer. >>> > >>> >I'm not even certain that we should be introducing a new mode value >>> >at all, as opposed to a separate attribute. >>> > >>> >>> Yes, Luyao, could you summarize the reason for the new mode? I think >>> that the difference in behaviour between using cgroups and memory >>> binding as opposed to just using cgroups should be enough for others >>> to be able to figure out when to use this mode and when not. >>> >>Sure. >>Let me give a concrete use case first. There is a new feature in kernel >>but not merged yet, we call it memory tiering. >>(https://lwn.net/Articles/802544/). If memory tiering is enabled on >>host, DRAM is top tier memory, and PMEM(persistent memory) is second >>tier memory, PMEM is shown as numa node without cpu. Pages can be >>migrated between DRAM node and PMEM node based on DRAM pressure and >how >>cold/hot they are. *this memory policy* is implemented in kernel. So we >>need a default mode here, but from libvirt's perspective, the "defaut" >>mode is "strict", it's not MPOL_DEFAULT >>(https://man7.org/linux/man-pages/man2/mbind.2.html) defined in kernel. >>Besides, to make memory tiering works well, cgroups setting is >>necessary, since it restricts that the pages can only be migrated between the >DRAM and PMEM nodes that we specified (NUMA affinity support). >> >>Except for upper use case, we might have some scenarios that only requires >cgroups restriction. >>That's why "restrictive" mode is proposed. >> >>In a word, if a user requires default mode(MPOL_DEFAULT) and require >>cgroups to restrict memory allocation, "restrictive" mode will be useful. >> > >Yeah, I also seem to recall something about the fact that just using cgroups with >multiple nodes in the nodeset makes kernel decide on which node (out of those in >the restricted set) to allocate on, but specifying "strict" basically allocates it >sequentially (on the first one until it is full, then on the next one and so on). I do >not have anything to back this, so do you remember if this was that the case as >well or does my memory serve me poorly? > Yeah, exactly. 😊 cpuset.mems just specify the list of memory nodes on which the processes are allowed to allocate memory. https://man7.org/linux/man-pages/man7/cpuset.7.html This link gives a detailed introduction of "strict" mode: https://man7.org/linux/man-pages/man2/mbind.2.html >>BR, >>Luyao >> >>> >Regards, >>> >Daniel >>> >-- >>> >|: https://berrange.com -o- >https://www.flickr.com/photos/dberrange :| >>> >|: https://libvirt.org -o- https://fstop138.berrange.com :| >>> >|: https://entangle-photo.org -o- >>> https://www.instagram.com/dberrange :|