On Fri 31-05-19 20:51:05, Yang Shi wrote:
>
>
> On 5/30/19 11:41 PM, Michal Hocko wrote:
> > On Thu 30-05-19 14:57:46, Yang Shi wrote:
> > > Hi folks,
> > >
> > >
> > > As what we discussed about page demotion for PMEM at LSF/MM, the demotion
> > > should respect to the mempolicy and allowed mems of the process which the
> > > page (anonymous page only for now) belongs to.
> > cpusets memory mask (aka mems_allowed) is indeed tricky and somehow
> > awkward. It is inherently an address space property and I never
> > understood why we have it per _thread_. This just doesn't make any
> > sense to me. This just leads to weird corner cases. What should happen
> > if different threads disagree about the allocation affinity while
> > working on a shared address space?
>
> I'm supposed (just my guess) such restriction should just apply for the
> first allocation. Just like memcg charge, who does it first, whose policy
> gets applied.

I am not really sure that was the deliberate design choice. Maybe
somebody has a different recollection though.

> > > The vma that the page is mapped to can be retrieved from rmap walk easily,
> > > but we need know the task_struct that the vma belongs to. It looks there is
> > > not such API, and container_of seems not work with pointer member.
> > I do not think this is a good idea. As you point out in the reply we
> > have that for memcgs but we really hope to get rid of mm->owner there
> > as well. It is just more tricky there. Moreover such a reverse mapping
> > would be incorrect. Just think of a disagreeing yet overlapping cpusets
> > for different threads mapping the same page.
> >
> > Is it such a big deal to document that the node migrate is not
> > compatible with cpusets?
>
> Not only cpuset, but get_vma_policy() also needs find task_struct from vma.
> Currently, get_vma_policy() just uses "current", so it just returns the
> current process's mempolicy if the vma doesn't have mempolicy. For the node
> migrate case, "current" is definitely not correct.
>
> It looks there is not an easy way to workaround it unless we claim node
> migrate is not compatible with both cpusets and mempolicy.

yep, it seems so.
-- 
Michal Hocko
SUSE Labs
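
For readers following the container_of point quoted above, here is a minimal
userspace sketch of why the reverse lookup does not work. Everything in it is
an assumption made for illustration: the toy_* structs are hypothetical
stand-ins and not the kernel's task_struct, mm_struct or vm_area_struct, and
the macro is a simplified version of the kernel's container_of. It only shows
that container_of can recover an enclosing object from an embedded member,
while a pointer member shared by several threads gives nothing to subtract
from.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical toy types; NOT the real kernel definitions. */
struct toy_mm {
    int users;
};

struct toy_vma {
    struct toy_mm *vm_mm;   /* pointer member: the mm lives elsewhere */
};

struct toy_task {
    int pid;
    struct toy_mm *mm;      /* pointer member, shared between threads */
};

struct toy_embedded {
    int id;
    struct toy_mm mm;       /* embedded member: stored inside the struct */
};

/* Returns 1 when container_of recovers the object that embeds the mm. */
int embedded_case_works(void)
{
    struct toy_embedded e = { .id = 7 };
    struct toy_mm *mm = &e.mm;

    return container_of(mm, struct toy_embedded, mm) == &e;
}

/*
 * Returns 1 when two distinct tasks reference the same mm that the vma
 * points to. The mm address is then unrelated to either task's address,
 * so no offset arithmetic can pick "the" owning task.
 */
int shared_mm_is_ambiguous(void)
{
    struct toy_mm mm = { .users = 2 };
    struct toy_task t1 = { .pid = 1, .mm = &mm };
    struct toy_task t2 = { .pid = 2, .mm = &mm };
    struct toy_vma vma = { .vm_mm = &mm };

    return vma.vm_mm == t1.mm && vma.vm_mm == t2.mm && &t1 != &t2;
}

/* Runs both checks; aborts via assert if either one fails. */
void run_checks(void)
{
    assert(embedded_case_works());
    assert(shared_mm_is_ambiguous());
}
```

Calling run_checks() from a small main() exercises both cases. The second
check mirrors the objection in the thread: several threads with possibly
different cpusets or mempolicies can sit behind one address space, so a
vma-to-task reverse mapping would have to pick one of them arbitrarily.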