On Thu, Aug 29, 2013 at 11:56:57AM +0100, Mel Gorman wrote:
> > I thought it was, we crashed somewhere suspiciously close, but no. You
> > need shared mpols for this to actually trigger and the NUMA stuff
> > doesn't use that.
> >
>
> Ah, so this is a red herring?

Yeah, but I still think it's an actual bug. It seems the easiest way to
trigger this would be to:

  - create a task that constantly allocates pages
  - have said task have an MPOL_INTERLEAVE task policy
  - put said task into a cpuset
  - using a different task (your shell for example) flip the cpuset's
    mems_allowed back and forth.

This would have the shell task constantly rebind (in two steps) our
allocating task's INTERLEAVE policy.

> > I used whatever nodemask.h did to detect end-of-bitmap and they use
> > MAX_NUMNODES. See __next_node() and for_each_node() like.
> >
>
> The check does prevent us going off the end of the bitmap but does not
> necessarily return an online node.

Right, but it's guaranteed to return a 'valid' node. I don't think it
returning an offline node is a problem, we'll find it empty and fail the
page allocation.

> > MAX_NUMNODES doesn't assume contiguous numbers since it's the actual size
> > of the bitmap, nr_online_nodes would however.
> >
>
> I intended to say nr_node_ids, the same size as buffers such as the
> task_numa_buffers. If we ever return a nid > nr_node_ids here then
> task_numa_fault would corrupt memory. However, it should be possible for
> node_weight to exceed nr_node_ids except maybe during node hot-remove so
> it's not the problem.

The nodemask situation seems somewhat more confused than the cpumask
case; how would we ever return a nid > nr_node_ids? Corrupt nodemask?

In the cpumask case we use the runtime limit nr_cpu_ids for all bitmap
operations, arguably we should make the nodemask stuff do the same.

Fewer bits to iterate is always good; with MAX_NUMNODES=64
(x86_64-defconfig) we will still iterate all 64 bits, even though it's
unlikely to have more than 1, let alone more than 8, nodes.

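To make the iteration-bound point concrete, here is a rough userspace
sketch, not kernel code. The names nodemask_model_t, next_node_model()
and walk_nodes() are made up for illustration, and the mask is assumed
to fit in a single 64-bit word. It walks a mask bit by bit up to a given
limit and counts how many bits were examined. The real nodemask helpers
sit on top of find_next_bit(), which scans a word at a time, so the
naive loop here overstates the cost; it only shows how the end-of-bitmap
bound (MAX_NUMNODES versus a runtime limit like nr_node_ids) changes the
amount of work.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time bitmap size; 64 matches the x86_64-defconfig value quoted above. */
#define MAX_NUMNODES 64

/* Hypothetical stand-in for nodemask_t: a single 64-bit word. */
typedef struct {
	uint64_t bits;
} nodemask_model_t;

/*
 * Return the first set bit strictly after 'nid', or 'limit' if there is
 * none. 'limit' doubles as the end-of-bitmap marker, the way MAX_NUMNODES
 * does for the nodemask.h helpers. '*steps' counts the bits examined.
 */
static int next_node_model(int nid, const nodemask_model_t *mask,
			   int limit, int *steps)
{
	for (int n = nid + 1; n < limit; n++) {
		(*steps)++;
		if (mask->bits & (UINT64_C(1) << n))
			return n;
	}
	return limit;
}

/*
 * Walk every set bit below 'limit'. Stores the number of nodes visited
 * in '*visited' and returns the total number of bits examined.
 */
static int walk_nodes(const nodemask_model_t *mask, int limit, int *visited)
{
	int steps = 0;

	assert(limit > 0 && limit <= MAX_NUMNODES);

	*visited = 0;
	for (int n = next_node_model(-1, mask, limit, &steps);
	     n < limit;
	     n = next_node_model(n, mask, limit, &steps))
		(*visited)++;

	return steps;
}
```

With a two-node mask (bits 0 and 1 set), walk_nodes() with limit
MAX_NUMNODES examines all 64 bits to visit 2 nodes, while a limit of 2
(standing in for nr_node_ids) examines only 2. In both cases the value
returned at the end of the walk equals the limit, so a caller checking
nid >= limit never runs off the end of the bitmap.
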
> > So I explicitly didn't use the node_isset() test because that's more
> > likely to trigger than the nid >= MAX_NUMNODES test. It's fine to return
> > a node that isn't actually part of the mask anymore -- a race is a race
> > anyway.
>
> Yeah and as long as it's < nr_node_ids it should be ok within the task
> numa fault handling as well.

Right, I'm just a tad confused on how we could ever get a nid >=
nr_node_ids except from a prior bug (corrupted nodemask).