[PATCH 10/15] Mempolicy: MPOL_PREFERRED cleanups for "local allocation"


Against: 2.6.25-rc8-mm1

V4 -> V5:
+  change mpol_to_str() to show "local" policy for MPOL_PREFERRED with
   preferred_node == -1.  libnuma wrappers and numactl use the term
   "local allocation", so let's use it here.

V3 -> V4:
+  updated Documentation/vm/numa_memory_policy.txt to better explain
   [I think] the "local allocation" feature of MPOL_PREFERRED.

V2 -> V3:
+  renamed get_nodemask() to get_policy_nodemask() to more closely
   match what it's doing.

V1 -> V2:
+  renamed get_zonemask() to get_nodemask().  Mel Gorman suggested this
   was a valid "cleanup".

Here are three "cleanups" for MPOL_PREFERRED behavior when
v.preferred_node < 0, i.e., "local allocation":

1)  [do_]get_mempolicy() calls the now renamed get_policy_nodemask()
    to fetch the nodemask associated with a policy.  Currently,
    get_policy_nodemask() returns the set of nodes with memory when
    the policy mode is MPOL_PREFERRED and the preferred_node is < 0.
    Change to return an empty nodemask, as this is what was specified
    to achieve "local allocation".

2)  When a task is moved into a [new] cpuset, mpol_rebind_policy() is
    called to adjust any task and vma policy nodes to be valid in the
    new cpuset.  However, when the policy is MPOL_PREFERRED and the
    preferred_node is < 0, no rebind is necessary: the "local allocation"
    indication is valid in any cpuset.  Existing code will "do the right
    thing" because node_remap() just returns the argument node when it is
    outside the valid range of node ids.  Still, I think it is clearer
    and cleaner to skip the remap explicitly in this case.

3)  mpol_to_str() produces a printable, "human readable" string from a
    struct mempolicy.  For MPOL_PREFERRED with preferred_node < 0, show
    "local", since this policy means allocation on whichever node the
    task is running on as it migrates among nodes.  This matches the
    usage of "local allocation" in libnuma and numactl.  Without this
    change, I believe that node_set() [via set_bit()] will set bit 31,
    resulting in a misleading display.

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@xxxxxx>

 mm/mempolicy.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: linux-2.6.25-rc8-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/mm/mempolicy.c	2008-04-02 17:47:37.000000000 -0400
+++ linux-2.6.25-rc8-mm1/mm/mempolicy.c	2008-04-02 17:47:41.000000000 -0400
@@ -645,11 +645,9 @@ static void get_policy_nodemask(struct m
 		*nodes = p->v.nodes;
 		break;
 	case MPOL_PREFERRED:
-		/* or use current node instead of memory_map? */
-		if (p->v.preferred_node < 0)
-			*nodes = node_states[N_HIGH_MEMORY];
-		else
+		if (p->v.preferred_node >= 0)
 			node_set(p->v.preferred_node, *nodes);
+		/* else return empty node mask for local allocation */
 		break;
 	default:
 		BUG();
@@ -804,7 +802,7 @@ int do_migrate_pages(struct mm_struct *m
 	int err = 0;
 	nodemask_t tmp;
 
-  	down_read(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 
 	err = migrate_vmas(mm, from_nodes, to_nodes, flags);
 	if (err)
@@ -1949,10 +1947,12 @@ void numa_default_policy(void)
 }
 
 /*
- * Display pages allocated per node and memory policy via /proc.
+ * "local" is pseudo-policy:  MPOL_PREFERRED with preferred_node == -1
+ * Used only for mpol_to_str()
  */
+#define MPOL_LOCAL (MPOL_INTERLEAVE + 1)
 static const char * const policy_types[] =
-	{ "default", "prefer", "bind", "interleave" };
+	{ "default", "prefer", "bind", "interleave", "local" };
 
 /*
  * Convert a mempolicy into a string.
@@ -1963,6 +1963,7 @@ static inline int mpol_to_str(char *buff
 {
 	char *p = buffer;
 	int l;
+	int nid;
 	nodemask_t nodes;
 	unsigned short mode;
 	unsigned short flags = pol ? pol->flags : 0;
@@ -1979,7 +1980,11 @@ static inline int mpol_to_str(char *buff
 
 	case MPOL_PREFERRED:
 		nodes_clear(nodes);
-		node_set(pol->v.preferred_node, nodes);
+		nid = pol->v.preferred_node;
+		if (nid < 0)
+			mode = MPOL_LOCAL;	/* pseudo-policy */
+		else
+			node_set(nid, nodes);
 		break;
 
 	case MPOL_BIND:
@@ -1994,8 +1999,8 @@ static inline int mpol_to_str(char *buff
 	}
 
 	l = strlen(policy_types[mode]);
- 	if (buffer + maxlen < p + l + 1)
- 		return -ENOSPC;
+	if (buffer + maxlen < p + l + 1)
+		return -ENOSPC;
 
 	strcpy(p, policy_types[mode]);
 	p += l;
@@ -2094,6 +2099,9 @@ static inline void check_huge_range(stru
 }
 #endif
 
+/*
+ * Display pages allocated per node and memory policy via /proc.
+ */
 int show_numa_map(struct seq_file *m, void *v)
 {
 	struct proc_maps_private *priv = m->private;
--
To unsubscribe from this list: send the line "unsubscribe linux-numa" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
