Nick Piggin reported that the allocator may see an empty nodemask when
cpuset's mems are being changed. The problem is as follows: cpuset updates
task->mems_allowed and the mempolicy by first setting all new bits in the
nodemask and later clearing all old, disallowed bits. But the allocator may
load one word of the mask before the new bits are set and then load another
word of the mask after the old bits are cleared; in that case the allocator
sees an empty nodemask. This happens only on kernels that cannot store a
nodemask_t atomically (MAX_NUMNODES > BITS_PER_LONG).

However, I found that there is also a problem on kernels that can store a
nodemask_t atomically: while cpuset's mems are being changed, the allocator
may fail to find a node to allocate a page from, even though there is plenty
of free memory. I can reproduce it with the attached program using the
following steps:

    # mkdir /dev/cpuset
    # mount -t cpuset cpuset /dev/cpuset
    # mkdir /dev/cpuset/1
    # echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
    # echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
    # echo $$ > /dev/cpuset/1/tasks
    # numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
      (<nr_tasks> = max(nr_cpus - 1, 1))
    # killall -s SIGUSR1 cpuset_mem_hog
    # ./change_mems.sh

Several hours later, an OOM occurs even though there is plenty of free
memory.

The problem is the following:

    task1                           task2
    mmap()                                                         mems=1
      Can alloc page on node0? NO                                  mems=1
                                    change mems from 1 to 0
                                      set all new bits             mems=0-1
                                      clear all disallowed bits    mems=0
      Can alloc page on node1? NO                                  mems=0
      ...
      can't alloc page
        goto oom

task1 checks node0 while mems is still 1 and node1 after mems has already
become 0, so it never finds an allowed node, even though mems itself is never
empty.

This patchset fixes both problems.

Thanks
Miao
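The two races described above can be replayed deterministically in a small userspace C model. This is a sketch under stated assumptions, not kernel code and not the fix in this patchset: model_nodemask_t, model_torn_read() and model_racy_alloc_node() are hypothetical names, a multi-word nodemask is modeled as two 64-bit words, and the interleavings are hard-coded to match the scenarios described above rather than produced by real concurrent tasks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code. A nodemask spanning two
 * 64-bit words stands in for nodemask_t when MAX_NUMNODES > BITS_PER_LONG.
 */
#define MODEL_BITS_PER_WORD 64
#define MODEL_WORDS 2

typedef struct {
	uint64_t bits[MODEL_WORDS];
} model_nodemask_t;

static void model_set_node(model_nodemask_t *m, int node)
{
	m->bits[node / MODEL_BITS_PER_WORD] |= (uint64_t)1 << (node % MODEL_BITS_PER_WORD);
}

static void model_clear_node(model_nodemask_t *m, int node)
{
	m->bits[node / MODEL_BITS_PER_WORD] &= ~((uint64_t)1 << (node % MODEL_BITS_PER_WORD));
}

static bool model_node_isset(const model_nodemask_t *m, int node)
{
	return (m->bits[node / MODEL_BITS_PER_WORD] >> (node % MODEL_BITS_PER_WORD)) & 1;
}

static bool model_mask_empty(const model_nodemask_t *m)
{
	for (int i = 0; i < MODEL_WORDS; i++)
		if (m->bits[i])
			return false;
	return true;
}

/*
 * Problem 1 (non-atomic nodemask_t): mems changes from node 0 (word 0) to
 * node 64 (word 1). The reader copies word 1 before cpuset sets the new bit
 * and word 0 after cpuset clears the old one. *final_mask receives the
 * shared mask after the update; the return value is what the reader saw.
 */
static model_nodemask_t model_torn_read(model_nodemask_t *final_mask)
{
	model_nodemask_t shared = {{0, 0}};
	model_nodemask_t seen = {{0, 0}};

	model_set_node(&shared, 0);     /* mems=0 */
	seen.bits[1] = shared.bits[1];  /* reader loads word 1: still 0 */
	model_set_node(&shared, 64);    /* cpuset sets all new bits: mems=0,64 */
	model_clear_node(&shared, 0);   /* cpuset clears old bits: mems=64 */
	seen.bits[0] = shared.bits[0];  /* reader loads word 0: now 0 */

	*final_mask = shared;
	return seen;                    /* empty, although shared never was */
}

/*
 * Problem 2 (atomic one-word mask): replays the task1/task2 timeline with
 * nodes 0 and 1. The allocator tests each node against the current
 * mems_allowed, and task2's update lands between the two tests.
 * Returns the chosen node, or -1 where the kernel would go to OOM.
 */
static int model_racy_alloc_node(void)
{
	uint64_t mems_allowed = (uint64_t)1 << 1;  /* mems=1 */

	for (int node = 0; node < 2; node++) {
		if ((mems_allowed >> node) & 1)
			return node;  /* a page would be allocated on this node */
		if (node == 0) {
			/* task2: change mems from 1 to 0 */
			mems_allowed |= (uint64_t)1 << 0;     /* set all new bits: mems=0-1 */
			mems_allowed &= ~((uint64_t)1 << 1);  /* clear disallowed bits: mems=0 */
		}
	}
	return -1;  /* no allowed node was seen: goto oom */
}
```

In this model, model_torn_read() returns an empty mask even though the shared mask always has at least one node set, and model_racy_alloc_node() returns -1 although node 1 was allowed when the scan started and node 0 was allowed by the time it finished.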
Attachment:
reproduce_prog.tar.gz