Tvrtko Ursulin <tursulin@xxxxxxxxxx> writes:

> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
>
> Since balancing mode was added in
> bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes"),
> it was possible to set this mode but it wouldn't be shown in
> /proc/<pid>/numa_maps since there was no support for it in the
> mpol_to_str() helper.
>
> Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it
> would be displayed as 'default' due to a workaround introduced a few years
> earlier in
> 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps").
>
> To tidy this up we implement two changes:
>
> First we introduce a new internal flag MPOL_F_KERNEL and with it mark the
> kernel's internal default and fallback policies (for tasks and/or VMAs
> with no explicit policy set). By doing this we generalise the current
> special casing and replace the incorrect 'default' with the correct
> 'bind'.
>
> Secondly, we add a string representation and corresponding handling for
> MPOL_F_NUMA_BALANCING. We do this by adding a sparse mapping array of
> flags to names. With the sparseness being the downside, but with the
> advantage of generalising and removing the "policy" from flags display.

Please split these two changes into two patches, because we will need to
backport the first one to the -stable kernels.

> End result:
>
> $ numactl -b -m 0-1,3 cat /proc/self/numa_maps
> 555559580000 bind=balancing:0-1,3 file=/usr/bin/cat mapped=3 active=0 N0=3 kernelpagesize_kB=16
> ...
>
> v2:
>  * Fully fix by introducing MPOL_F_KERNEL.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
> References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps")
> Cc: Huang Ying <ying.huang@xxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  include/uapi/linux/mempolicy.h |  1 +
>  mm/mempolicy.c                 | 44 ++++++++++++++++++++++++----------
>  2 files changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index 1f9bb10d1a47..bcf56ce9603b 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -64,6 +64,7 @@ enum {
>  #define MPOL_F_SHARED  (1 << 0) /* identify shared policies */
>  #define MPOL_F_MOF     (1 << 3) /* this policy wants migrate on fault */
>  #define MPOL_F_MORON   (1 << 4) /* Migrate On protnone Reference On Node */
> +#define MPOL_F_KERNEL  (1 << 5) /* Kernel's internal policy */
>
>  /*
>   * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index aec756ae5637..8ecc6d9f100a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -134,6 +134,7 @@ enum zone_type policy_zone = 0;
>  static struct mempolicy default_policy = {
>      .refcnt = ATOMIC_INIT(1), /* never free it */
>      .mode = MPOL_LOCAL,
> +    .flags = MPOL_F_KERNEL,
>  };
>
>  static struct mempolicy preferred_node_policy[MAX_NUMNODES];
> @@ -3095,7 +3096,7 @@ void __init numa_policy_init(void)
>          preferred_node_policy[nid] = (struct mempolicy) {
>              .refcnt = ATOMIC_INIT(1),
>              .mode = MPOL_PREFERRED,
> -            .flags = MPOL_F_MOF | MPOL_F_MORON,
> +            .flags = MPOL_F_MOF | MPOL_F_MORON | MPOL_F_KERNEL,
>              .nodes = nodemask_of_node(nid),
>          };
>      }
> @@ -3150,6 +3151,12 @@ static const char * const policy_modes[] =
>      [MPOL_PREFERRED_MANY] = "prefer (many)",
>  };
>
> +static const char * const policy_flags[] = {
> +    [ilog2(MPOL_F_STATIC_NODES)] = "static",
> +    [ilog2(MPOL_F_RELATIVE_NODES)] = "relative",
> +    [ilog2(MPOL_F_NUMA_BALANCING)] = "balancing",
> +};
> +
>  #ifdef CONFIG_TMPFS
>  /**
>   * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
> @@ -3293,17 +3300,18 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
>   * @pol: pointer to mempolicy to be formatted
>   *
>   * Convert @pol into a string. If @buffer is too short, truncate the string.
> - * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the
> - * longest flag, "relative", and to display at least a few node ids.
> + * Recommend a @maxlen of at least 42 for the longest mode, "weighted
> + * interleave", the longest flag, "balancing", and to display at least a few
> + * node ids.
>   */
>  void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
>  {
>      char *p = buffer;
>      nodemask_t nodes = NODE_MASK_NONE;
>      unsigned short mode = MPOL_DEFAULT;
> -    unsigned short flags = 0;
> +    unsigned long flags = 0;
>
> -    if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) {
> +    if (!(pol->flags & MPOL_F_KERNEL)) {

Can we avoid introducing a new flag?  Would the following code work?

    if (pol && pol != &default_policy && !(pol->mode != MPOL_PREFERRED) &&
        !(pol->flags & MPOL_F_MORON))

But I think that this is kind of fragile.  A flag is better.  But
personally, I don't think MPOL_F_KERNEL is a good name; maybe
MPOL_F_DEFAULT?
>          mode = pol->mode;
>          flags = pol->flags;
>      }
> @@ -3328,15 +3336,25 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
>      p += snprintf(p, maxlen, "%s", policy_modes[mode]);
>
>      if (flags & MPOL_MODE_FLAGS) {
> -        p += snprintf(p, buffer + maxlen - p, "=");
> +        unsigned int bit, cnt = 0;
>
> -        /*
> -         * Currently, the only defined flags are mutually exclusive
> -         */
> -        if (flags & MPOL_F_STATIC_NODES)
> -            p += snprintf(p, buffer + maxlen - p, "static");
> -        else if (flags & MPOL_F_RELATIVE_NODES)
> -            p += snprintf(p, buffer + maxlen - p, "relative");
> +        for_each_set_bit(bit, &flags, ARRAY_SIZE(policy_flags)) {
> +            if (bit <= ilog2(MPOL_F_KERNEL))
> +                continue;
> +
> +            if (cnt == 0)
> +                p += snprintf(p, buffer + maxlen - p, "=");
> +            else
> +                p += snprintf(p, buffer + maxlen - p, ",");
> +
> +            if (WARN_ON_ONCE(!policy_flags[bit]))
> +                p += snprintf(p, buffer + maxlen - p, "bit%u",
> +                              bit);
> +            else
> +                p += snprintf(p, buffer + maxlen - p,
> +                              policy_flags[bit]);
> +            cnt++;
> +        }

Please refer to commit 2291990ab36b ("mempolicy: clean-up mpol-to-str()
mempolicy formatting") for the original format.

>      }
>
>      if (!nodes_empty(nodes))

--
Best Regards,
Huang, Ying