On 3/8/23 06:42, Daniel Dao wrote:
Hi all,

We encountered EINVAL when enabling cpuset in cgroupv2 when io_uring worker threads are running. Here are the steps to reproduce the failure on kernel 6.1.14:

1. Remove cpuset from subtree_control

   > for d in $(find /sys/fs/cgroup/ -maxdepth 1 -type d); do echo '-cpuset' | sudo tee -a $d/cgroup.subtree_control; done
   > cat /sys/fs/cgroup/cgroup.subtree_control
   cpu io memory pids

2. Run any applications that utilize the uring worker thread pool. I used https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-io_uring-worker-pool

   > cargo run -- -a -w 2 -t 2

3. Enabling cpuset will return EINVAL

   > echo '+cpuset' | sudo tee -a /sys/fs/cgroup/cgroup.subtree_control
   +cpuset
   tee: /sys/fs/cgroup/cgroup.subtree_control: Invalid argument

We traced this down to task_can_attach that will return EINVAL when it encounters kthreads with PF_NO_SETAFFINITY, which io_uring worker threads have. This seems like an unexpected interaction when enabling cpuset for the subtrees that contain kthreads.

We are currently considering a workaround to try to enable cpuset in root subtree_control before any io_uring applications can start, hence failure to enable cpuset is localized to only cgroup with io_uring kthreads. But this is cumbersome.

Any suggestions would be very much appreciated.
Whenever you echo "+cpuset" to cgroup.subtree_control to enable cpuset, the tasks in the child cgroups are implicitly moved from the parent cpuset to the child cpusets. However, that move fails if any task has the PF_NO_SETAFFINITY flag set, because task_can_attach() checks for this flag and returns EINVAL.

One possible solution is for cpuset to ignore tasks with PF_NO_SETAFFINITY set during an implicit move. In other words, the implicit move would be allowed to proceed without touching such a task, while an explicit move through cgroup.procs would still be rejected.
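
Until something like that is in place, it may help to see which threads in a cgroup carry the flag. The following is only a rough user-space sketch, not kernel code. It assumes PF_NO_SETAFFINITY is 0x04000000 (the value in include/linux/sched.h of recent kernels; please check your own tree) and that the task flags are field 9 of /proc/<tid>/stat. The helper names are made up for illustration.

```shell
#!/bin/sh
# Sketch: list threads in a cgroup that have PF_NO_SETAFFINITY set.
# Assumption: the flag value is 0x04000000; verify against your kernel.
PF_NO_SETAFFINITY=$((0x04000000))

# Succeeds if the task flags value in $1 has PF_NO_SETAFFINITY set.
has_no_setaffinity() {
    [ $(( $1 & $PF_NO_SETAFFINITY )) -ne 0 ]
}

# Print the flags field (field 9) of one /proc/<tid>/stat line passed as $1.
# comm may contain spaces or ')', so strip everything through the last ") ".
stat_flags() {
    set -- ${1##*\) }
    # Remaining fields: state ppid pgrp session tty_nr tpgid flags ...
    echo "$7"
}

# Print "tid comm" for every flagged thread in the cgroup directory $1.
# The body runs in a subshell so its variables stay local.
list_pinned_threads() (
    while read -r tid; do
        # A thread may exit between the listing and the read; skip it.
        line=$(cat "/proc/$tid/stat" 2>/dev/null) || continue
        if has_no_setaffinity "$(stat_flags "$line")"; then
            echo "$tid $(cat "/proc/$tid/comm" 2>/dev/null)"
        fi
    done < "$1/cgroup.threads"
)
```

After sourcing the file, something like `list_pinned_threads /sys/fs/cgroup/<your-cgroup>` should print the threads that would make the implicit move fail. Ordinary kernel threads in the root cgroup will also show up, which is expected.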