On Fri, 2 Mar 2012, Peter Zijlstra wrote:

> Also, for the write side it doesn't really matter, changing mems_allowed
> should be rare and is an 'expensive' operation anyway.
>

It's very expensive even without memory barriers since the page allocator
wraps itself in {get,put}_mems_allowed() until a page or NULL is returned
and an update to current's set of allowed mems can stall indefinitely
trying to change the nodemask during this time.  The thread changing
cpuset.mems is holding cgroup_mutex the entire time, which locks out
changes, including adding additional nodes to current's set of allowed
mems.  If direct reclaim takes a long time or an oom killed task fails to
exit quickly (or the allocation is __GFP_NOFAIL and we just spin
indefinitely holding get_mems_allowed()), then it's not uncommon to see a
write to cpuset.mems taking minutes while holding the mutex, if it ever
actually returns at all.
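To make the stall described above concrete, here is a minimal userspace sketch of the pattern. It is a toy model, not kernel code: struct task_sketch, the sketch_* functions, and the alloc_steps and max_attempts parameters are all hypothetical names invented for illustration. It assumes the writer can only change the nodemask while the task is outside its get/put section, and it leaves out locking, memory barriers, and cgroup_mutex entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state; not the kernel's task_struct. */
struct task_sketch {
    int change_disable;          /* > 0 while inside a get/put section */
    unsigned long mems_allowed;  /* bitmask of allowed memory nodes */
};

/* Reader side: the allocator marks that it depends on mems_allowed. */
static void sketch_get_mems_allowed(struct task_sketch *t)
{
    t->change_disable++;
}

static void sketch_put_mems_allowed(struct task_sketch *t)
{
    assert(t->change_disable > 0);  /* must pair with a previous get */
    t->change_disable--;
}

/* Writer side: one attempt. Returns false if the task is still allocating. */
static bool sketch_try_change_mems(struct task_sketch *t, unsigned long newmems)
{
    if (t->change_disable > 0)
        return false;
    t->mems_allowed = newmems;
    return true;
}

/*
 * Models a write to cpuset.mems racing with one long allocation.
 * The allocation holds its get/put section for alloc_steps writer attempts.
 * Returns the number of failed attempts before the update landed, or -1 if
 * it never landed within max_attempts (the "never returns" case, e.g. an
 * allocation that keeps spinning).
 */
static int sketch_write_cpuset_mems(struct task_sketch *t, unsigned long newmems,
                                    int alloc_steps, int max_attempts)
{
    int retries = 0;

    sketch_get_mems_allowed(t);          /* allocation starts */
    for (int i = 0; i < max_attempts; i++) {
        if (i == alloc_steps)
            sketch_put_mems_allowed(t);  /* page or NULL finally returned */
        if (sketch_try_change_mems(t, newmems))
            return retries;
        retries++;
    }
    return -1;                           /* writer is still stuck */
}
```

In this model every failed attempt is time the writer would spend holding the mutex in the real system, so the cost of the write is bounded only by how long the allocation stays inside its get/put section.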