Re: [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3()

Mark Brown <broonie@xxxxxxxxxx> · Tue, 13 Aug 2024 19:58:26 +0100

On Tue, Aug 13, 2024 at 05:25:47PM +0100, Catalin Marinas wrote:

> However, the x86 would be slightly inconsistent here between clone() and
> clone3(). I guess it depends how you look at it. The classic clone()
> syscall, if one doesn't pass CLONE_VM but does set new stack, there's no
> new shadow stack allocated which I'd expect since it's a new stack.
> Well, I doubt anyone cares about this scenario. Are there real cases of
> !CLONE_VM but with a new stack?

ISTR the concerns were around someone being clever with vfork() but I
don't remember anything super concrete.  In terms of the inconsistency
here that was actually another thing that came up - if userspace
specifies a stack for clone3() it'll just get used even with CLONE_VFORK
so it seemed to make sense to do the same thing for the shadow stack.
This was part of the thinking when we were looking at it, if you can
specify a regular stack you should be able to specify a shadow stack.

> > > I'm confused that we need to consume the token here. I could not find
> > > the default shadow stack allocation doing this, only setting it via
> > > create_rstor_token() (or I did not search enough). In the default case,

> > As discussed for a couple of previous versions if we don't have the
> > token and userspace can specify any old shadow stack page as the shadow
> > stack this allows clone3() to be used to overwrite the shadow stack of
> > another thread, you can point to a shadow stack page which is currently

> IIUC, the kernel-allocated shadow stack will have the token always set
> while the user-allocated one will be cleared. I was looking to

No, when the kernel allocates we don't bother with tokens at all.  We
only look for and clear a token with the user specified shadow stack.

> understand the inconsistency between these two cases in terms of the
> final layout of the new shadow stack: one with the token, the other
> without. I can see the need for checking but maybe start with requiring
> it to be 0 and setting the token before returning, for consistency with
> clone().

The layout should be the same, the shadow stack will point to where the
token would be - the only difference is if we checked to see if there
was a token there.  Since we either clear the token on use or allocate a
fresh page in both cases the value there will be 0.

> In the kernel-allocated shadow stack, is the token used for anything? I
> can see it's used for signal delivery and return but I couldn't figure
> out what it is used for in a thread's shadow stack.

For arm64 we place differently formatted tokens there during signal
handling, and a token is placed at the top of the stack as part of the
architected stack pivoting instructions (and a token at the destination
consumed).  I believe x86 has the same pivoting behaviour but ICBW.  A
user specified shadow stack is handled in a very similar way to what
would happen if the newly created thread immediately pivoted to the
specified stack.

> Also, can one not use the clone3() to point to the clone()-allocated
> shadow stack? Maybe that's unlikely as an app tends to stick to one
> syscall flavour or the other.

A valid token will only be present on an inactive stack.  If a thread
pivots away from a kernel allocated stack then another thread could be
started using the original kernel allocated stack, any program doing
this should think carefully about the lifecycle of the kernel allocated
stack but it's possible.  If a thread has not pivoted away from it's
stack then there won't be a token at the top of the stack and it won't
be possible to pivot to it.

> > > > +		/*
> > > > +		 * For CLONE_VFORK the child will share the parents
> > > > +		 * shadow stack.  Make sure to clear the internal
> > > > +		 * tracking of the thread shadow stack so the freeing
> > > > +		 * logic run for child knows to leave it alone.
> > > > +		 */
> > > > +		if (clone_flags & CLONE_VFORK) {
> > > > +			shstk->base = 0;
> > > > +			shstk->size = 0;
> > > > +			return 0;
> > > > +		}

> > > I think we should leave the CLONE_VFORK check on its own independent of
> > > the clone3() arguments. If one passes both CLONE_VFORK and specific
> > > shadow stack address/size, they should be ignored (or maybe return an
> > > error if you want to make it stricter).

> > This is existing logic from the current x86 code that's been reindented
> > due to the addition of explicitly specified shadow stacks, it's not new
> > behaviour.  It is needed to stop the child thinking it has the parent's
> > shadow stack in the CLONE_VFORK case.

> I figured that. But similar to the current !CLONE_VM behaviour where no
> new shadow stack is allocated even if a new stack is passed to clone(),
> I was thinking of something similar here for consistency: don't set up a
> shadow stack in the CLONE_VFORK case or at least allow it only if a new
> stack is being set up (if we extend this to clone(), it would be a small
> ABI change).

We could restrict specifying a shadow stack to only be supported when a
regular stack is also specified, if we're doing that I'd prefer to do it
in all cases rather than only for vfork() since that reduces the number
of special cases and we don't restrict normal stacks like that.

> > This is again all existing behaviour for the case where the user has not
> > specified a shadow stack reindented, as mentioned above if the user has
> > specified one explicitly then we just do what we were asked.  The
> > existing behaviour is to only create a new shadow stack for the child in
> > the CLONE_VM case and leave the child using the same shadow stack as the
> > parent in the copied mm for !CLONE_VM.

> I guess I was rather questioning the current choices than the new
> clone3() ABI. But even for the new clone3() ABI, does it make sense to
> set up a shadow stack if the current stack isn't changed? We'll end up
> with a lot of possible combinations that will never get tested but
> potentially become obscure ABI. Limiting the options to the sane choices
> only helps with validation and unsurprising changes later on.

OTOH if we add the restrictions it's more code (and more test code) to
check, and thinking about if we've missed some important use case.  Not
that it's a *huge* amount of code, like I say I'd not be too unhappy
with adding a restriction on having a regular stack specified in order
to specify a shadow stack.
Attachment:
signature.asc

Description: PGP signature