PATCH 10/15 - Mempolicy:  MPOL_PREFERRED cleanups for "local allocation"

Against:  2.6.25-rc8-mm1

V4 -> V5:
+ change mpol_to_str() to show "local" policy for MPOL_PREFERRED with
  preferred_node == -1.  libnuma wrappers and numactl use the term
  "local allocation", so let's use it here.

V3 -> V4:
+ updated Documentation/vm/numa_memory_policy.txt to better explain
  [I think] the "local allocation" feature of MPOL_PREFERRED.

V2 -> V3:
+ renamed get_nodemask() to get_policy_nodemask() to more closely
  match what it's doing.

V1 -> V2:
+ renamed get_zonemask() to get_nodemask().  Mel Gorman suggested this
  was a valid "cleanup".

Here are three "cleanups" for MPOL_PREFERRED behavior when
v.preferred_node < 0 -- i.e., "local allocation":

1) [do_]get_mempolicy() calls the now renamed get_policy_nodemask() to
   fetch the nodemask associated with a policy.  Currently,
   get_policy_nodemask() returns the set of nodes with memory when the
   policy mode is MPOL_PREFERRED and the preferred_node is < 0.  Change
   it to return an empty nodemask, as this is what was specified to
   achieve "local allocation".

2) When a task is moved into a [new] cpuset, mpol_rebind_policy() is
   called to adjust any task and vma policy nodes to be valid in the
   new cpuset.  However, when the policy is MPOL_PREFERRED and the
   preferred_node is < 0, no rebind is necessary.  The "local
   allocation" indication is valid in any cpuset.  Existing code will
   "do the right thing" because node_remap() will just return the
   argument node when it is outside of the valid range of node ids.
   However, I think it is clearer and cleaner to skip the remap
   explicitly in this case.

3) mpol_to_str() produces a printable, "human readable" string from a
   struct mempolicy.  For MPOL_PREFERRED with preferred_node < 0, show
   "local", as this indicates that allocation stays local as the task
   migrates among nodes.  Note that this matches the usage of "local
   allocation" in libnuma and numactl.
   Without this change, I believe that node_set() [via set_bit()] will
   set bit 31, resulting in a misleading display.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@xxxxxx>

 mm/mempolicy.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: linux-2.6.25-rc8-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/mm/mempolicy.c	2008-04-02 17:47:37.000000000 -0400
+++ linux-2.6.25-rc8-mm1/mm/mempolicy.c	2008-04-02 17:47:41.000000000 -0400
@@ -645,11 +645,9 @@ static void get_policy_nodemask(struct m
 		*nodes = p->v.nodes;
 		break;
 	case MPOL_PREFERRED:
-		/* or use current node instead of memory_map? */
-		if (p->v.preferred_node < 0)
-			*nodes = node_states[N_HIGH_MEMORY];
-		else
+		if (p->v.preferred_node >= 0)
 			node_set(p->v.preferred_node, *nodes);
+		/* else return empty node mask for local allocation */
 		break;
 	default:
 		BUG();
@@ -804,7 +802,7 @@ int do_migrate_pages(struct mm_struct *m
 	int err = 0;
 	nodemask_t tmp;
 
-	down_read(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 
 	err = migrate_vmas(mm, from_nodes, to_nodes, flags);
 	if (err)
@@ -1949,10 +1947,12 @@ void numa_default_policy(void)
 }
 
 /*
- * Display pages allocated per node and memory policy via /proc.
+ * "local" is pseudo-policy:  MPOL_PREFERRED with preferred_node == -1
+ * Used only for mpol_to_str()
  */
+#define MPOL_LOCAL (MPOL_INTERLEAVE + 1)
 static const char * const policy_types[] =
-	{ "default", "prefer", "bind", "interleave" };
+	{ "default", "prefer", "bind", "interleave", "local" };
 
 /*
  * Convert a mempolicy into a string.
@@ -1963,6 +1963,7 @@ static inline int mpol_to_str(char *buff
 {
 	char *p = buffer;
 	int l;
+	int nid;
 	nodemask_t nodes;
 	unsigned short mode;
 	unsigned short flags = pol ? pol->flags : 0;
@@ -1979,7 +1980,11 @@ static inline int mpol_to_str(char *buff
 
 	case MPOL_PREFERRED:
 		nodes_clear(nodes);
-		node_set(pol->v.preferred_node, nodes);
+		nid = pol->v.preferred_node;
+		if (nid < 0)
+			mode = MPOL_LOCAL;	/* pseudo-policy */
+		else
+			node_set(nid, nodes);
 		break;
 
 	case MPOL_BIND:
@@ -1994,8 +1999,8 @@ static inline int mpol_to_str(char *buff
 	}
 
 	l = strlen(policy_types[mode]);
-	if (buffer + maxlen < p + l + 1)
-		return -ENOSPC;
+	if (buffer + maxlen < p + l + 1)
+		return -ENOSPC;
 
 	strcpy(p, policy_types[mode]);
 	p += l;
@@ -2094,6 +2099,9 @@ static inline void check_huge_range(stru
 }
 #endif
 
+/*
+ * Display pages allocated per node and memory policy via /proc.
+ */
 int show_numa_map(struct seq_file *m, void *v)
 {
 	struct proc_maps_private *priv = m->private;