On 10/22/20 12:16, Morten Rasmussen wrote:
> On Wed, Oct 21, 2020 at 03:31:53PM +0100, Qais Yousef wrote:
> > On 10/21/20 15:33, Morten Rasmussen wrote:
> > > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > > one, though not as easy as automatic task placement by the scheduler (my
> > > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > > not a strong preference for any).
> > >
> > > Automatic task placement by the scheduler would mean giving up the
> > > requirement that the user-space affinity mask must always be honoured.
> > > Is that on the table?
> > >
> > > Killing aarch32 tasks with an empty intersection between the user-space
> > > mask and aarch32_mask is not really "automatic" and would require the
> > > aarch32 capability to be exposed anyway.
> >
> > I just noticed this nasty corner case too.
> >
> > Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9
> >
> > "If such a task had been bound to some subset of its cpuset using the
> > sched_setaffinity() call, the task will be allowed to run on any CPU allowed in
> > its new cpuset, negating the effect of the prior sched_setaffinity() call."
> >
> > So user space must put the tasks into a valid cpuset to fix the problem. Or
> > make the scheduler behave like the affinity is associated with a cpuset.
> >
> > Can user space put the task into the correct cpuset without a race? Clone3
> > syscall learnt to specify a cgroup to attach to when forking. Should we do the
> > same for execve()?
>
> Putting a task in a cpuset overrides any affinity mask applied by a
> previous cpuset or sched_setaffinity() call. I wouldn't call it a corner
> case though. Android user-space is exploiting it all the time on some
> devices through the foreground, background, and top-app cgroups.

Yep. What I was referring to is that if we go the route of the kernel fixing
the affinity automatically, that cpuset behavior will be problematic.
In this case fixing the affinity at the task level will not be enough because
cpusets will override it. So catering for that is another complication to take
into account.

> If a tasks fork() the child task will belong to the same cgroup
> automatically. If you execve() you retain the previous affinity mask and
> cgroup. So putting parent task about to execve() into aarch32 into a
> cpuset with only aarch32 CPUs should be enough to never have the task or
> any of its child tasks SIGKILLED.
>
> A few simple experiments with fork() and execve() seems to confirm that.

+1

This made me wonder what happens when SCHED_RESET_ON_FORK is set. It only
resets the scheduling policy and priority, not the affinity mask. So we're good.

> I don't see any changes needed in the kernel. Changing cgroup through
> clone could of course fail if user-space specifies an unsuitable cgroup.
> Fixing that would be part of fixing the cpuset setup in user-space which
> is required anyway.

+1

Thanks

--
Qais Yousef
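
For anyone who wants to repeat the fork()/execve() experiment mentioned above,
here is a minimal sketch in Python, assuming a Linux host. The helper name
affinity_after_fork_exec() is made up for illustration and is not taken from
the thread. It only checks that a sched_setaffinity() mask is retained across
fork() and execve(); it does not exercise cpusets, which need a mounted cgroup
hierarchy and suitable privileges.

```python
import os
import sys


def affinity_after_fork_exec(cpus):
    """Pin this process to `cpus`, fork() + execve() a child, and return
    the set of CPUs the exec'ed child reports as its affinity mask.

    Hypothetical helper for illustration only; Linux-specific.
    """
    original = os.sched_getaffinity(0)
    read_fd, write_fd = os.pipe()
    os.sched_setaffinity(0, cpus)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: route stdout into the pipe, then exec a fresh
            # interpreter that prints its own affinity mask.
            try:
                os.close(read_fd)
                os.dup2(write_fd, 1)
                os.execv(sys.executable, [
                    sys.executable, "-c",
                    "import os; "
                    "print(','.join(map(str, sorted(os.sched_getaffinity(0)))))",
                ])
            finally:
                # Only reached if the exec itself failed.
                os._exit(127)
        # Parent: read what the child reported, then reap it.
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            reported = pipe.read().strip()
        os.waitpid(pid, 0)
    finally:
        # Restore the parent's original mask.
        os.sched_setaffinity(0, original)
    return {int(cpu) for cpu in reported.split(",") if cpu}


if __name__ == "__main__":
    one_cpu = {min(os.sched_getaffinity(0))}
    print("requested:", one_cpu, "child saw:", affinity_after_fork_exec(one_cpu))
```

Moving the parent into a different cpuset before the exec would be the
interesting follow-up, since per the cpusets documentation quoted above that
is what negates the earlier sched_setaffinity() call.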