Re: [RFC PATCH v2 4/4] arm64: Export id_aar64fpr0 via sysfs

Morten Rasmussen <morten.rasmussen@xxxxxxx> · Thu, 22 Oct 2020 12:16:24 +0200

On Wed, Oct 21, 2020 at 03:31:53PM +0100, Qais Yousef wrote:
> On 10/21/20 15:33, Morten Rasmussen wrote:
> > On Wed, Oct 21, 2020 at 01:15:59PM +0100, Catalin Marinas wrote:
> > > one, though not as easy as automatic task placement by the scheduler (my
> > > first preference, followed by the id_* regs and the aarch32 mask, though
> > > not a strong preference for any).
> > 
> > Automatic task placement by the scheduler would mean giving up the
> > requirement that the user-space affinity mask must always be honoured.
> > Is that on the table?
> > 
> > Killing aarch32 tasks with an empty intersection between the user-space
> > mask and aarch32_mask is not really "automatic" and would require the
> > aarch32 capability to be exposed anyway.
> 
> I just noticed this nasty corner case too.
> 
> 
> Documentation/admin-guide/cgroup-v1/cpusets.rst: Section 1.9
> 
> "If such a task had been bound to some subset of its cpuset using the
> sched_setaffinity() call, the task will be allowed to run on any CPU allowed in
> its new cpuset, negating the effect of the prior sched_setaffinity() call."
> 
> So user space must put the tasks into a valid cpuset to fix the problem. Or
> make the scheduler behave like the affinity is associated with a cpuset.
> 
> Can user space put the task into the correct cpuset without a race? Clone3
> syscall learnt to specify a cgroup to attach to when forking. Should we do the
> same for execve()?

Putting a task in a cpuset overrides any affinity mask applied by a
previous cpuset or sched_setaffinity() call. I wouldn't call it a corner
case though. Android user-space is exploiting it all the time on some
devices through the foreground, background, and top-app cgroups.

If a task forks, the child task automatically belongs to the same
cgroup. If a task calls execve(), it retains its previous affinity mask
and cgroup. So putting the parent task that is about to execve() an
aarch32 binary into a cpuset containing only aarch32-capable CPUs should
be enough to ensure that neither the task nor any of its children is
ever SIGKILLed.
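A minimal Python sketch of such a launcher follows. It assumes the cpuset already exists and the caller is allowed to write to it. AARCH32_CPUSET, join_cgroup() and exec_in_cgroup() are hypothetical names, and the real path depends on where the cpuset controller is mounted and how user space named the group.

```python
import os
from typing import List, Optional

# Hypothetical cpuset containing only the AArch32-capable CPUs; the real
# path depends on the cpuset mount point and the group name user space chose.
AARCH32_CPUSET = "/sys/fs/cgroup/cpuset/aarch32"


def join_cgroup(cgroup_dir: str, pid: Optional[int] = None) -> None:
    """Move `pid` (default: the calling process) into `cgroup_dir`."""
    if pid is None:
        pid = os.getpid()
    # Writing a PID to cgroup.procs migrates every thread of that process.
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")


def exec_in_cgroup(cgroup_dir: str, path: str, argv: List[str]) -> None:
    """Join `cgroup_dir`, then replace this process image with `path`."""
    join_cgroup(cgroup_dir)
    # execve() keeps both the cgroup membership and the affinity mask, so the
    # new program and any children it fork()s stay on the cpuset's CPUs.
    os.execv(path, argv)
```

A launcher would call exec_in_cgroup(AARCH32_CPUSET, path, argv) with the aarch32 binary's path and arguments. Because the migration happens before execve(), the new image never runs outside the cpuset. If the write fails (for example, due to missing permissions), OSError is raised and the exec does not happen.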

A few simple experiments with fork() and execve() seem to confirm that.
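For reference, here is a minimal Python sketch of that kind of experiment (Linux only; the helper name is hypothetical). It checks only the affinity mask, not cgroup membership. A child restricts itself, fork()s a grandchild, and the grandchild execve()s a fresh interpreter that prints the mask it inherited.

```python
import os
import sys
from typing import Set

# Program run after execve(): prints the CPUs it is allowed to run on.
_REPORT = "import os; print(','.join(map(str, sorted(os.sched_getaffinity(0)))))"


def affinity_seen_after_exec(cpus: Set[int]) -> Set[int]:
    """Restrict a child to `cpus`, let it fork() a grandchild that execve()s
    a fresh interpreter, and return the mask that new image reports."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 127  # exit code used if anything below raises
        try:
            os.close(r)
            os.sched_setaffinity(0, cpus)  # changes only this child's mask
            gpid = os.fork()
            if gpid == 0:
                os.dup2(w, 1)  # exec'd program's stdout goes into the pipe
                os.execv(sys.executable, [sys.executable, "-c", _REPORT])
            os.close(w)
            _, st = os.waitpid(gpid, 0)
            status = os.waitstatus_to_exitcode(st)
        finally:
            os._exit(status)  # never fall back into the parent's code path
    os.close(w)
    with os.fdopen(r) as f:
        out = f.read()
    _, st = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(st) != 0:
        raise RuntimeError("child or grandchild failed")
    return {int(c) for c in out.split(",")}
```

Calling affinity_seen_after_exec({0}) should return {0} on a machine where CPU 0 is allowed. The mask set before fork() survives both the second fork() and the execve().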

I don't see any changes needed in the kernel. Changing the cgroup
through clone3() could of course still go wrong if user space specifies
an unsuitable cgroup, but fixing that is part of fixing the cpuset setup
in user space, which is required anyway.

Morten