Re: [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker

On Tue, 2024-09-10 at 13:42 -0400, Waiman Long wrote:
> 
> On 9/10/24 13:11, Felix Moessbauer wrote:
> > The io worker threads are userland threads that just never exit to
> > the userland. By that, they are also assigned to a cgroup (the group
> > of the creating task).
> 
> The io-wq task is not actually assigned to a cgroup. To belong to a
> cgroup, its pid has to be present in the cgroup.procs of the
> corresponding cgroup, which is not the case here.

Hi, thanks for jumping in. As I said, I'm not too familiar with the
internals of the io worker threads. Nonetheless, the kernel reports the
cgroup assignment quite consistently, which contradicts your statement
above. Example:

pid     tid
648460  648460  SCHED_OTHER   20  S    0  0-1  ./test/wq-aff.t
648460  648461  SCHED_OTHER   20  S    1  1    iou-sqp-648460
648460  648462  SCHED_OTHER   20  S    0  0    iou-wrk-648461

When I check cgroup.procs, I see only 648460, which is expected, as
this is the process (with its main thread). Checking cgroup.threads
shows all three TIDs.
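
The same cross-check can be scripted. This is a minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup and that the "0::" path from /proc/PID/cgroup resolves under that mount; `cgroup_membership` and `CGROUP_ROOT` are hypothetical names. It reports whether the PID appears in cgroup.procs and whether each TID of the process appears in cgroup.threads.

```sh
#!/bin/sh
# Sketch (hypothetical helper): how one process and its threads show up in
# cgroup v2 interface files. Assumes cgroup v2 mounted at /sys/fs/cgroup.
# Uses `local`, which dash, bash and busybox sh all support.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

cgroup_membership() {
    local pid=$1 cg dir t tid
    # The cgroup v2 entry in /proc/PID/cgroup has the form "0::/path".
    cg=$(sed -n 's/^0:://p' "/proc/$pid/cgroup") || return 1
    dir="$CGROUP_ROOT$cg"
    [ -n "$cg" ] && [ -r "$dir/cgroup.procs" ] || return 1
    # cgroup.procs lists thread-group IDs (processes) only.
    if grep -qx "$pid" "$dir/cgroup.procs"; then
        echo "procs $pid listed"
    else
        echo "procs $pid missing"
    fi
    # cgroup.threads lists every TID that is a member of the cgroup.
    for t in "/proc/$pid/task/"*; do
        tid=${t##*/}
        if grep -qx "$tid" "$dir/cgroup.threads" 2>/dev/null; then
            echo "threads $tid listed"
        else
            echo "threads $tid missing"
        fi
    done
}
```

For example, `cgroup_membership 648460` should print one "procs" line and one "threads" line per TID; with the observation above, that would be a single listed PID and three listed TIDs.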

When checking the other way round, I get the same information:
$ cat /proc/648460/task/648461/cgroup
0::/user.slice/user-1000.slice/session-1.scope
$ cat /proc/648460/task/648462/cgroup
0::/user.slice/user-1000.slice/session-1.scope
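
The per-thread view can also be collected in one pass. This is a sketch assuming a Linux /proc; `thread_report` is a hypothetical helper, not the tool that produced the table above. For every TID it prints the PID, the TID, Cpus_allowed_list from the thread's status file, the cgroup v2 path (the "0::" line) and the thread name, separated by tabs.

```sh
#!/bin/sh
# Sketch (hypothetical helper): per-thread affinity and cgroup v2 path,
# read from /proc/PID/task/TID/{comm,status,cgroup}.
thread_report() {
    local pid=$1 t tid comm cpus cg
    for t in "/proc/$pid/task/"*; do
        tid=${t##*/}
        comm=$(cat "$t/comm" 2>/dev/null) || continue   # thread may have exited
        cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' "$t/status" 2>/dev/null)
        cg=$(sed -n 's/^0:://p' "$t/cgroup" 2>/dev/null)
        # Columns: pid, tid, allowed CPUs, cgroup v2 path ("?" if none), name
        printf '%s\t%s\t%s\t%s\t%s\n' "$pid" "$tid" "$cpus" "${cg:-?}" "$comm"
    done
}
```

Running `thread_report 648460` would list the main thread, the iou-sqp-* thread and the iou-wrk-* thread side by side, making it easy to see that all three report the same cgroup path while their CPU lists differ.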

Now I'm wondering: is this just presented incorrectly, or do these
tasks indeed belong to that cgroup?

> My understanding is that you are just restricting the CPU affinity to
> follow the cpuset of the corresponding user task that creates it. The
> CPU affinity (cpumask) is just one of the many resources controlled by
> a cgroup. That probably needs to be clarified.

That's clear. Looking at the bigger picture, I want to ensure that the
io workers do not break out of the cgroup limits (I called these
"ambient" before, similar to the capabilities), because that would
break the isolation assumption. In our case, we are mostly interested
in not leaving the cpuset, as we use it to partition the system into
realtime and non-realtime parts.

> 
> Besides cpumask, the cpuset controller also controls the node mask of
> the memory nodes allowed.

Yes, and that is especially important, as some memory can be "closer"
to the I/O devices than other memory.
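
Since both masks come from the cpuset, one way to look for "break-outs" from userspace is to compare each thread's Cpus_allowed_list and Mems_allowed_list (from /proc/PID/task/TID/status) against the cgroup's cpuset.cpus.effective and cpuset.mems.effective. This is a sketch under stated assumptions: cgroup v2 at /sys/fs/cgroup, the cpuset controller enabled for the process's cgroup, and hypothetical helper names (`expand_list`, `within_cpuset`). It only observes the resulting masks; it says nothing about how the patch itself computes them.

```sh
#!/bin/sh
# Sketch (hypothetical helpers): check that every thread of a process stays
# inside its cgroup's cpuset. Assumes cgroup v2 at /sys/fs/cgroup with the
# cpuset controller enabled for the process's cgroup.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

# Expand a kernel list such as "0-3,8,10-11" into one number per line.
expand_list() {
    local IFS=, part
    for part in $1; do
        seq "${part%-*}" "${part#*-}"   # "8" expands to "seq 8 8"
    done
}

# within_cpuset PID FIELD FILE
#   FIELD: status field, e.g. Cpus_allowed_list or Mems_allowed_list
#   FILE:  cpuset file,  e.g. cpuset.cpus.effective or cpuset.mems.effective
# Prints threads whose mask has entries outside FILE.
# Returns 0 if all threads are inside, 1 if any is not, 2 if FILE is unreadable.
within_cpuset() {
    local pid=$1 field=$2 file cg allowed t mask n extra rc=0
    cg=$(sed -n 's/^0:://p' "/proc/$pid/cgroup") || return 2
    [ -n "$cg" ] || return 2
    file="$CGROUP_ROOT$cg/$3"
    [ -r "$file" ] || return 2
    allowed=$(expand_list "$(cat "$file")")
    for t in "/proc/$pid/task/"*; do
        mask=$(sed -n "s/^$field:[[:space:]]*//p" "$t/status" 2>/dev/null)
        [ -n "$mask" ] || continue          # thread exited or field missing
        extra=
        for n in $(expand_list "$mask"); do
            printf '%s\n' "$allowed" | grep -qx "$n" || extra="${extra:+$extra }$n"
        done
        if [ -n "$extra" ]; then
            echo "tid ${t##*/}: $field has $extra outside $3"
            rc=1
        fi
    done
    return $rc
}
```

For example, `within_cpuset 648460 Cpus_allowed_list cpuset.cpus.effective` checks the CPU side and `within_cpuset 648460 Mems_allowed_list cpuset.mems.effective` the memory-node side. A non-empty report would point to a thread, such as an iou-wrk-* worker, whose mask is not a subset of the cpuset.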

Best regards,
Felix

-- 
Siemens AG, Technology
Linux Expert Center
