On Mon 10-10-22 17:48:42, Zhongkun He wrote:
> There is usecase that System Management Software(SMS) want to give a
> memory policy to other processes to make better use of memory.
>
> The information about how to use memory is not known to the app.
> Instead, it is known to the userspace daemon(SMS), and that daemon
> will decide the memory usage policy based on different factors.

Please add some explanation why the cpuset interface is not usable for
that usecase.

> To solve the issue, this patch introduces a new syscall
> pidfd_set_mempolicy(2). it sets the NUMA memory policy of the thread
> specified in pidfd.
>
> In current process context there is no locking because only the process
> accesses its own memory policy, so task_work is used in
> pidfd_set_mempolicy() to update the mempolicy of the process specified
> in pidfd, avoid using locks and race conditions.

Why cannot you alter kernel_set_mempolicy (and do_set_mempolicy) to
accept a task rather than operate on current? I have to really say that
I dislike the task_work approach because it detaches the syscall from
the actual operation and the caller simply doesn't know when the
operation has been completed.

>
> The API is as follows,
>
> long pidfd_set_mempolicy(int pidfd, int mode,
>                          const unsigned long __user *nmask,
>                          unsigned long maxnode,
>                          unsigned int flags);
>
> Set's the [pidfd] task's "task/process memory policy". The pidfd argument
> is a PID file descriptor (see pidfd_open(2) man page) that specifies the
> process to which the mempolicy is to be applied. The flags argument is
> reserved for future use; currently, this argument must be specified as 0.
> Please see the set_mempolicy(2) man page for more details about
> other's arguments.

Please also describe the security model.

-- 
Michal Hocko
SUSE Labs