On Wed, Oct 21, 2020 at 06:19:48PM +0100, Will Deacon wrote:
> On Wed, Oct 21, 2020 at 05:18:37PM +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 04:37:38PM +0100, Will Deacon wrote:
> > > On Wed, Oct 21, 2020 at 04:10:06PM +0100, Catalin Marinas wrote:
> > > > On Wed, Oct 21, 2020 at 03:45:43PM +0100, Will Deacon wrote:
> > > > > On Wed, Oct 21, 2020 at 03:09:46PM +0100, Catalin Marinas wrote:
> > > > > > Anyway, if the task placement is entirely off the table, the next thing
> > > > > > is asking applications to set their own mask and kill them if they do
> > > > > > the wrong thing. Here I see two possibilities for killing an app:
> > > > > >
> > > > > > 1. When it ends up scheduled on a non-AArch32-capable CPU
> > > > > >
> > > > > That sounds fine to me. If we could do the exception return and take a
> > > > > SIGILL, that's what we'd do, but we can't so we have to catch it before.
> > > > >
> > > > Indeed, the illegal ERET doesn't work for this scenario.
> > > >
> > > > > > 2. If the user cpumask (bar the offline CPUs) is not a subset of the
> > > > > > aarch32_mask
> > > > > >
> > > > > > Option 1 is simpler but 2 would be slightly more consistent.
> > > > > >
> > > > > I disagree -- if we did this for something like fpsimd, then the consistent
> > > > > behaviour would be to SIGILL on the cores without the instructions.
> > > > >
> > > > For fpsimd it makes sense since the main ISA is still available and the
> > > > application may be able to do something with the signal. But here we
> > > > can't do much since the entire AArch32 mode is not supported. That's why
> > > > we went for SIGKILL instead of SIGILL but thinking of it, after execve()
> > > > the signals are reset to SIG_DFL so SIGILL cannot be ignored.
> > > >
> > > > I think it depends on whether you look at this fault as a part of ISA
> > > > not being available or as the overall application not compatible with
> > > > the system it is running on. If the latter, option 2 above makes more
> > > > sense.
> > > >
> > > Hmm, I'm not sure I see the distinction in practice: you still have a binary
> > > application that cannot run on all CPUs in the system. Who cares if some of
> > > the instructions work?
> > >
> > The failure would be more predictable rather than the app running for a
> > while and randomly getting SIGKILL. If it only fails on execve or
> > sched_setaffinity, it may be easier to track down (well, there's the CPU
> > hotplug as well that can change the cpumask intersection outside the
> > user process control).

Migration between cpusets is another failure scenario where the app can
get SIGKILL randomly.

> But it's half-baked, because the moment the 32-bit task changes its affinity
> mask then you're back in the old situation. That's why I'm saying this
> doesn't add anything, because the rest of the series is designed entirely
> around delivering SIGKILL at the last minute rather than preventing us
> getting to that situation in the first place. The execve() case feels to me
> like we're considering doing something because we can, rather than because
> it's actually useful.

Agree.
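
For the record, a rough sketch of what the "option 2" check discussed above
could look like on top of the generic cpumask helpers. The aarch32_mask
argument, the helper name and the hook point (execve() of a compat task
and/or sched_setaffinity()) are placeholders for illustration only, not what
the series actually implements:

#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/sched/signal.h>

/*
 * Return true if every *online* CPU the task is allowed to run on is
 * AArch32-capable, i.e. the user cpumask (bar the offline CPUs) is a
 * subset of aarch32_mask.
 */
static bool aarch32_affinity_ok(struct task_struct *p,
				const struct cpumask *aarch32_mask)
{
	cpumask_var_t effective;
	bool ok;

	if (!alloc_cpumask_var(&effective, GFP_KERNEL))
		return false;

	/* "bar the offline CPUs" */
	cpumask_and(effective, p->cpus_ptr, cpu_online_mask);

	ok = cpumask_subset(effective, aarch32_mask);

	free_cpumask_var(effective);
	return ok;
}

/* e.g. at execve() of a 32-bit binary, or on sched_setaffinity() */
if (!aarch32_affinity_ok(current, aarch32_mask))
	force_sig(SIGKILL);

An empty online intersection would also pass cpumask_subset() here, so a
real implementation would need to decide separately what to do in that
corner case.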