Re: [PATCH v11 25/39] arm64/signal: Expose GCS state in signal frames

Catalin Marinas <catalin.marinas@xxxxxxx> · Fri, 23 Aug 2024 16:59:11 +0100

On Fri, Aug 23, 2024 at 11:25:30AM +0100, Mark Brown wrote:
> On Fri, Aug 23, 2024 at 10:37:19AM +0100, Catalin Marinas wrote:
> > On Thu, Aug 22, 2024 at 02:15:28AM +0100, Mark Brown wrote:
> 
> > > +	gcs_preserve_current_state();
> > > +	gcspr = current->thread.gcspr_el0 - 8;
> 
> > > +	__put_user_error(gcspr, &ctx->gcspr, err);
> 
> > Do we actually need to store the gcspr value after the cap token has
> > been pushed or just the value of the interrupted context? If we at some
> > point get a sigaltshadowstack() syscall, the saved GCS wouldn't point to
> > the new stack but rather the original one. Unwinders should be able to
> > get the actual GCSPR_EL0 register, no need for the sigcontext to point
> > to the new shadow stack.
> 
> We could store either the cap token or the interrupted GCSPR_EL0 (the
> address below the cap token).  It felt more joined up to go with the cap
> token since notionally signal return is consuming the cap token but
> either way would work, we could just add an offset when looking at the
> pointer.

In a hypothetical sigaltshadowstack() scenario, would the cap go on the
new signal shadow stack or on the old one? I assume on the new one but
in sigcontext we'd save the original GCSPR_EL0. In such hypothetical
case, the original GCSPR_EL0 would not need 8 subtracted.

I need to think some more about this. The gcs_restore_signal() function
makes sense, it starts with the current GCSPR_EL0 on the signal stack
and consumes the token, adds 8 to the shadow stack pointer. The
restore_gcs_context() one is confusing as it happens before consuming
the cap token and assumes that the GCSPR_EL0 value actually points to
the signal stack. If we ever implement an alternative shadow stack, the
original GCSPR_EL0 of the interrupted context would be lost. I know it's
not planned for now but the principles should be the same. The
sigframe.uc should store the interrupted state.

To me the order for sigreturn should be first to consume the cap token,
validate it etc. and then restore GCSPR_EL0 to whatever was saved in the
sigframe.uc prior to the signal being delivered.

> > Also in gcs_signal_entry() in the previous patch, we seem to subtract 16
> > rather than 8.
> 
> We need to not only place a cap but also a GCS frame for the sigreturn
> trampoline, the sigreturn trampoline isn't part of the interrupted
> context so isn't included in the signal frame but it needs to have a
> record on the GCS so that the signal handler doesn't just generate a GCS
> fault if it tries to return to the trampoline.  This means that the
> GCSPR_EL0 that is set for the signal handler needs to move two entries,
> one for the cap token and one for the trampoline.

Yes, this makes sense.

> > What I find confusing is that both restore_gcs_context() and
> > gcs_restore_signal() seem to touch current->thread.gcspr_el0 and the
> > sysreg. Which one takes priority? I should probably check the branch out
> > to see the end result.
> 
> restore_gcs_context() is loading values from the signal frame in memory
> (which will only happen if a GCS context is present) then
> gcs_restore_signal() consumes the token at the top of the stack.  The
> split is because userspace can skip the restore_X_context() functions
> for the optional signal frame elements by removing them from the context
> but we want to ensure that we always consume a token.

I agree we should always consume a token but this should be done from
the actual hardware GCSPR_EL0 value on the sigreturn call rather than
the one restored from sigframe.uc. The restoring should be the last
step.

-- 
Catalin