Re: [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3()

Mark Brown <broonie@xxxxxxxxxx> · Sat, 10 Aug 2024 00:06:12 +0100

On Fri, Aug 09, 2024 at 07:19:26PM +0100, Catalin Marinas wrote:
> On Thu, Aug 08, 2024 at 09:15:25AM +0100, Mark Brown wrote:

> > +	/* This should really be an atomic cmpxchg.  It is not. */
> > +	if (access_remote_vm(mm, addr, &val, sizeof(val),
> > +			     FOLL_FORCE) != sizeof(val))
> > +		goto out;

> If we restrict the shadow stack creation only to the CLONE_VM case, we'd
> not need the remote vm access, it's in the current mm context already.
> More on this below.

The discussion in previous iterations was that it seemed better to allow
even surprising use cases since it simplifies the analysis of what we
have covered.  If the user has specified a shadow stack we just do what
they asked for and let them worry about if it's useful.

> > +	if (val != expected)
> > +		goto out;

> I'm confused that we need to consume the token here. I could not find
> the default shadow stack allocation doing this, only setting it via
> create_rstor_token() (or I did not search enough). In the default case,
> is the user consuming it? To me the only difference should been the
> default allocation vs the one passed by the user via clone3(), with the
> latter maybe requiring the user to set the token initially.

As discussed for a couple of previous versions if we don't have the
token and userspace can specify any old shadow stack page as the shadow
stack this allows clone3() to be used to overwrite the shadow stack of
another thread, you can point to a shadow stack page which is currently
in use and then run some code that causes shadow stack writes.  This
could potentially then in turn be used as part of a bigger exploit
chain, probably it's hard to get anything beyond just causing the other
thread to fault but won't be impossible.

With a kernel allocated shadow stack this is not an issue since we are
placing the shadow stack in new memory, userspace can't control where we
place it so it can't overwrite an existing shadow stack.

> > +		/*
> > +		 * For CLONE_VFORK the child will share the parents
> > +		 * shadow stack.  Make sure to clear the internal
> > +		 * tracking of the thread shadow stack so the freeing
> > +		 * logic run for child knows to leave it alone.
> > +		 */
> > +		if (clone_flags & CLONE_VFORK) {
> > +			shstk->base = 0;
> > +			shstk->size = 0;
> > +			return 0;
> > +		}

> I think we should leave the CLONE_VFORK check on its own independent of
> the clone3() arguments. If one passes both CLONE_VFORK and specific
> shadow stack address/size, they should be ignored (or maybe return an
> error if you want to make it stricter).

This is existing logic from the current x86 code that's been reindented
due to the addition of explicitly specified shadow stacks, it's not new
behaviour.  It is needed to stop the child thinking it has the parent's
shadow stack in the CLONE_VFORK case.

> > -	/*
> > -	 * For !CLONE_VM the child will use a copy of the parents shadow
> > -	 * stack.
> > -	 */
> > -	if (!(clone_flags & CLONE_VM))
> > -		return 0;
> > +		/*
> > +		 * For !CLONE_VM the child will use a copy of the
> > +		 * parents shadow stack.
> > +		 */
> > +		if (!(clone_flags & CLONE_VM))
> > +			return 0;

> Is the !CLONE_VM case specific only to the default shadow stack
> allocation? Sorry if this has been discussed already (or I completely
> forgot) but I thought we'd only implement this for the thread creation
> case. The typical fork() for a new process should inherit the parent's
> layout, so applicable to the clone3() with the shadow stack arguments as
> well (which should be ignored or maybe return an error with !CLONE_VM).

This is again all existing behaviour for the case where the user has not
specified a shadow stack reindented, as mentioned above if the user has
specified one explicitly then we just do what we were asked.  The
existing behaviour is to only create a new shadow stack for the child in
the CLONE_VM case and leave the child using the same shadow stack as the
parent in the copied mm for !CLONE_VM.

> > @@ -2790,6 +2808,8 @@ pid_t kernel_clone(struct kernel_clone_args *args)
> >  	 */
> >  	trace_sched_process_fork(current, p);
> >  
> > +	shstk_post_fork(p, args);

> Do we need this post fork call? Can we not handle the setup via the
> copy_thread() path in shstk_alloc_thread_stack()?

It looks like we do actually have the new mm in the process before we
call copy_thread() so we could move things into there though we'd loose
a small bit of factoring out of the error handling (at one point I had
more code factored out but right now it's quite small, looking again we
could also factor out the get_task_mm()/mput()).  ISTR having the new
process' mm was the biggest reason for this initially but looking again
I'm not sure why that was.  It does still feel like even the small
amount that's factored out currently is useful though, a bit less
duplication in the architecture code which feels welcome here.
Attachment:
signature.asc

Description: PGP signature