Re: [PATCH v2 3/5] arm64: signal: Improve POR_EL0 handling to avoid uaccess failures

Kevin Brodsky <kevin.brodsky@xxxxxxx> · Fri, 25 Oct 2024 17:34:24 +0200

On 25/10/2024 13:33, Dave Martin wrote:
> On Fri, Oct 25, 2024 at 10:24:41AM +0200, Kevin Brodsky wrote:
>>>>>>> @@ -907,6 +964,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
>>>>>>>  {
>>>>>>>  	struct pt_regs *regs = current_pt_regs();
>>>>>>>  	struct rt_sigframe __user *frame;
>>>>>>> +	struct user_access_state ua_state;
>>>>>>>  
>>>>>>>  	/* Always make any pending restarted system calls return -EINTR */
>>>>>>>  	current->restart_block.fn = do_no_restart_syscall;
>>>>>>> @@ -923,12 +981,14 @@ SYSCALL_DEFINE0(rt_sigreturn)
>>>>>>>  	if (!access_ok(frame, sizeof (*frame)))
>>>>>>>  		goto badframe;
>>>>>>>  
>>>>>>> -	if (restore_sigframe(regs, frame))
>>>>>>> +	if (restore_sigframe(regs, frame, &ua_state))
>>>>>>>  		goto badframe;
>>>>>>>  
>>>>>>>  	if (restore_altstack(&frame->uc.uc_stack))
>>>>>>>  		goto badframe;
>>>>>>>  
>>>>>>> +	restore_user_access_state(&ua_state);
>>>>>>> +
>>>>>>>  	return regs->regs[0];
>>>>>>>  
>>>>>>>  badframe:
>>>>>> The saving part I'm fine with. For restoring, I was wondering whether we
>>>>>> can get a more privileged POR_EL0 if reading the frame somehow failed.
>>>>>> This is largely theoretical, there are other ways to attack like
>>>>>> writing POR_EL0 directly than unmapping/remapping the signal stack.
>>>>>>
>>>>>> What I'd change here is always restore_user_access_state() to
>>>>>> POR_EL0_INIT. Maybe just initialise ua_state above and add the function
>>>>>> call after the badframe label.
>>>>> I'm not sure I understand. When we enter this function, POR_EL0 is set
>>>>> to whatever the signal handler set it to (POR_EL0_INIT by default).
>>>>> There are then two cases:
>>>>> 1) Everything succeeds, including reading the saved POR_EL0 from the
>>>>> frame. We then call restore_user_access_state(), setting POR_EL0 to the
>>>>> value we've read, and return to userspace.
>>>>> 2) Any uaccess fails (for instance reading POR_EL0). In that case we
>>>>> leave POR_EL0 unchanged and deliver SIGSEGV.
>>>>>
>>>>> In case 2 POR_EL0 is most likely already set to POR_EL0_INIT, or
>>>>> whatever the signal handler set it to. It's not clear to me that forcing
>>>>> it to POR_EL0_INIT helps much. Either way it's doubtful that the SIGSEGV
>>>>> handler will be able to recover, since the new signal frame we will
>>>>> create for it may be a mix of interrupted state and signal handler state
>>>>> (depending on exactly where we fail).
>>>> If the SIGSEGV delivery succeeds, returning would restore the POR_EL0
>>>> set up by the previous signal handler, potentially more privileged. Does
>>>> it matter? Can it return all the way to the original context?
>> What we store into the signal frame when delivering that SIGSEGV is a
>> mixture of the original state (up to the point of failure) and the
>> signal handler's state (what we couldn't restore). It's hard to reason
>> about how that SIGSEGV handler could possibly handle this, but in any
>> case it would have to massage its signal frame so that the next
>> sigreturn does the right thing. Restoring only part of the frame records
>> is bound to cause trouble and that's true for POR_EL0 as well - I doubt
>> there's much value in special-casing it.
> This feels like a simplification?
>
> We can leave a mix of restored and unrestored state when generating the
> SIGSEGV signal frame, providing that those changes will make no
> difference when the rt_sigreturn is replayed.

I'm not sure I understand what this means. If the SIGSEGV handler were
to sigreturn without touching its signal frame, things are likely to
explode: it may be returning to the point where the original handler
called sigreturn, for instance (if the first uaccess failed during that
sigreturn call).

> POR_EL0 will make a difference, though.
>
> The POR_EL0 image in the SIGSEGV signal frame needs be the same value
> that caused the original rt_sigreturn to barf (if this is what caused
> the barf).  It should be up to the SIGSEGV handler to decide what (if
> anything) to do about that.  The kernel can't know what userspace
> intended.

Unless I'm missing something this is exactly what happens now: what we
store in the SIGSEGV frame is the POR_EL0 value the original handler was
using.

> Note that for this to work, the SIGSEGV stack (whether main or
> alternate) must be accessible with POR_EL0_INIT permissions, or the
> SIGSEGV handler must start with a (gross) asm shim to establish a
> usable POR_EL0.  But that's not really our problem here.

This is indeed orthogonal - the SIGSEGV handler will be run with
POR_EL0_INIT, like any other handler. The value we store in the frame is
unrelated.

> (I'm not saying that the kernel necessarily fails to do this -- I
> haven't checked -- but just trying to understand the problem here.)
>
>
> The actual problem here is that if the SIGSEGV handler wants to bail
> out with a siglongjmp(), there is no way to determine the correct value
> of POR_EL0 to restore.

Correct, but again this is true of any other record - for instance TPIDR2.

> I wonder whether POR_EL0 should be saved in sigjmp_buf (depending on
> whether sigjmp_buf is horribly inextensible and also full up, or merely
> horribly inextensible).

It very much feels that this is the case - if a handler relies on
longjmp() or setcontext() to restore a known state, then POR_EL0 should
be part of that state.

>
> Does anyone know whether PKRU in sigjmp_buf on x86?

I can't say for sure but I don't see PKRU being handled in
setjmp/longjmp in glibc at least.

Kevin