Re: [PART3 Patch 00/14] introduce N_MEMORY

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 14 Nov 2012 11:52:27 -0800

On Fri, 02 Nov 2012 15:41:55 +0800
Wen Congyang <wency@xxxxxxxxxxxxxx> wrote:

> At 11/02/2012 05:36 AM, David Rientjes Wrote:
> > On Thu, 1 Nov 2012, Wen Congyang wrote:
> > 
> >>> This doesn't describe why we need the new node state, unfortunately.  It 
> >>
> >> 1. Somethimes, we use the node which contains the memory that can be used by
> >>    kernel.
> >> 2. Sometimes, we use the node which contains the memory.
> >>
> >> In case1, we use N_HIGH_MEMORY, and we use N_MEMORY in case2.
> >>
> > 
> > Yeah, that's clear, but the question is still _why_ we want two different 
> > nodemasks.  I know that this part of the patchset simply introduces the 
> > new nodemask because the name "N_MEMORY" is more clear than 
> > "N_HIGH_MEMORY", but there's no real incentive for making that change by 
> > introducing a new nodemask where a simple rename would suffice.
> > 
> > I can only assume that you want to later use one of them for a different 
> > purpose: those that do not include nodes that consist of only 
> > ZONE_MOVABLE.  But that change for MPOL_BIND is nacked since it 
> > significantly changes the semantics of set_mempolicy() and you can't break 
> > userspace (see my response to that from yesterday).  Until that problem is 
> > addressed, then there's no reason for the additional nodemask so nack on 
> > this series as well.

I cannot locate "my response to that from yesterday".  Specificity, please!

> 
> I still think that we need two nodemasks: one store the node which has memory
> that the kernel can use, and one store the node which has memory.
> 
> For example:
> 
> ==========================
> static void *__meminit alloc_page_cgroup(size_t size, int nid)
> {
> 	gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
> 	void *addr = NULL;
> 
> 	addr = alloc_pages_exact_nid(nid, size, flags);
> 	if (addr) {
> 		kmemleak_alloc(addr, size, 1, flags);
> 		return addr;
> 	}
> 
> 	if (node_state(nid, N_HIGH_MEMORY))
> 		addr = vzalloc_node(size, nid);
> 	else
> 		addr = vzalloc(size);
> 
> 	return addr;
> }
> ==========================
> If the node only has ZONE_MOVABLE memory, we should use vzalloc().
> So we should have a mask that stores the node which has memory that
> the kernel can use.
> 
> ==========================
> static int mpol_set_nodemask(struct mempolicy *pol,
> 		     const nodemask_t *nodes, struct nodemask_scratch *nsc)
> {
> 	int ret;
> 
> 	/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
> 	if (pol == NULL)
> 		return 0;
> 	/* Check N_HIGH_MEMORY */
> 	nodes_and(nsc->mask1,
> 		  cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
> ...
> 		if (pol->flags & MPOL_F_RELATIVE_NODES)
> 			mpol_relative_nodemask(&nsc->mask2, nodes,&nsc->mask1);
> 		else
> 			nodes_and(nsc->mask2, *nodes, nsc->mask1);
> ...
> }
> ==========================
> If the user specifies 2 nodes: one has ZONE_MOVABLE memory, and the other one doesn't.
> nsc->mask2 should contain these 2 nodes. So we should hava a mask that store the node
> which has memory.
> 
> There maybe something wrong in the change for MPOL_BIND. But this patchset is needed.

Well, let's discuss the userspace-visible non-back-compatible mpol
change.  What is it, why did it happen, what is its impact, is it
acceptable?

I grabbed "PART1" and "PART2", but that's as far as I got with the six
memory hotplug patch series.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>