On Tue, 2024-09-10 at 13:42 -0400, Waiman Long wrote:
>
> On 9/10/24 13:11, Felix Moessbauer wrote:
> > The io worker threads are userland threads that just never exit to
> > the userland. By that, they are also assigned to a cgroup (the group
> > of the creating task).
>
> The io-wq task is not actually assigned to a cgroup. To belong to a
> cgroup, its pid has to be present to the cgroup.procs of the
> corresponding cgroup, which is not the case here.

Hi, thanks for jumping in. As said, I'm not too familiar with the
internals of the io worker threads. Nonetheless, the kernel presents the
cgroup assignment quite consistently. This, however, contradicts your
statement above. Example:

pid    tid
648460 648460 SCHED_OTHER 20 S 0 0-1 ./test/wq-aff.t
648460 648461 SCHED_OTHER 20 S 1 1   iou-sqp-648460
648460 648462 SCHED_OTHER 20 S 0 0   iou-wrk-648461

When I now check cgroup.procs, I see only 648460, which is expected, as
this is the process (with its main thread). Checking cgroup.threads
shows all three tids.

When checking the other way round, I get the same information:

$ cat /proc/648460/task/648461/cgroup
0::/user.slice/user-1000.slice/session-1.scope
$ cat /proc/648460/task/648462/cgroup
0::/user.slice/user-1000.slice/session-1.scope

Now I'm wondering whether this is just presented incorrectly, or whether
these tasks do indeed belong to the mentioned cgroup.

> My understanding is that you are just restricting the CPU affinity to
> follow the cpuset of the corresponding user task that creates it. The
> CPU affinity (cpumask) is just one of the many resources controlled by
> a cgroup. That probably needs to be clarified.

That's clear. Looking at the bigger picture, I want to ensure that the io
workers do not break out of the cgroup limits (I called it "ambient"
before, similar to the capabilities), because this breaks the isolation
assumption.
In our case, we are mostly interested in not leaving the cpuset, as we
use that to partition the system into realtime and non-realtime parts.

> Besides cpumask, the cpuset controller also controls the node mask of
> the memory nodes allowed.

Yes, and that is especially important as some memory can be "closer" to
the IOs than others.

Best regards,
Felix

-- 
Siemens AG, Technology Linux Expert Center