Re: [PATCH RFT v3 0/5] fork: Support shadow stacks in clone3()

Szabolcs Nagy <Szabolcs.Nagy@xxxxxxx> · Tue, 21 Nov 2023 12:21:37 +0000

The 11/21/2023 11:17, Christian Brauner wrote:
> On Mon, Nov 20, 2023 at 11:54:28PM +0000, Mark Brown wrote:
> > The kernel has recently added support for shadow stacks, currently
> > x86 only using their CET feature but both arm64 and RISC-V have
> > equivalent features (GCS and Zicfiss respectively), I am actively
> > working on GCS[1].  With shadow stacks the hardware maintains an
> > additional stack containing only the return addresses for branch
> > instructions which is not generally writeable by userspace and ensures
> > that any returns are to the recorded addresses.  This provides some
> > protection against ROP attacks and making it easier to collect call
> > stacks.  These shadow stacks are allocated in the address space of the
> > userspace process.
> > 
> > Our API for shadow stacks does not currently offer userspace any
> > flexiblity for managing the allocation of shadow stacks for newly
> > created threads, instead the kernel allocates a new shadow stack with
> > the same size as the normal stack whenever a thread is created with the
> > feature enabled.  The stacks allocated in this way are freed by the
> > kernel when the thread exits or shadow stacks are disabled for the
> > thread.  This lack of flexibility and control isn't ideal, in the vast
> > majority of cases the shadow stack will be over allocated and the
> > implicit allocation and deallocation is not consistent with other
> > interfaces.  As far as I can tell the interface is done in this manner
> > mainly because the shadow stack patches were in development since before
> > clone3() was implemented.
> > 
> > Since clone3() is readily extensible let's add support for specifying a
> > shadow stack when creating a new thread or process in a similar manner
> 
> So while I made clone3() readily extensible I don't want it to ever
> devolve into a fancier version of a prctl().
> 
> I would really like to see a strong reason for allowing userspace to
> configure the shadow stack size at this point in time.
> 
> I have a few questions that are probably me just not knowing much about
> shadow stacks so hopefully I'm not asking you write a thesis by
> accident:
> 
> (1) What does it mean for a shadow stack to be over allocated and is
>     over-allocation really that much of a problem out in the wild that
>     we need to give I userspace a knob to control a kernel security
>     feature?

over-allocation: allocating 100M shadow stack (RLIMIT_DATA, RLIMIT_AS)
for a thread that keeps arrays of data on the stack, instead of say 8k,
and thus running into resource limits.

under-allocation: small thread stack with runaway recursion, but large
sigaltstack: the stack overflow handler can run out of space because it
uses the same shadow stack as the thread.

> (2) With what other interfaces is implicit allocation and deallocation
>     not consistent? I don't understand this argument. The kernel creates
>     a shadow stack as a security measure to store return addresses. It
>     seems to me exactly that the kernel should implicitly allocate and
>     deallocate the shadow stack and not have userspace muck around with
>     its size?

the kernel is not supposed to impose stack size policy or a particular
programming model that limits the stack management options nor prevent
the handling of stack overflows.

> (3) Why is it safe for userspace to request the shadow stack size? What
>     if they request a tiny shadow stack size? Should this interface
>     require any privilege?

user can allocate huge or tiny stacks already.

and i think userspace can take control over shadow stack management:
it can disable signals, start a clone child with stack_size == 1 page,
map_shadow_stack and switch to it, enable signals. however this is
complicated, leaks 1 page of kernel allocated shadow stack (+reserved
guard page, i guess userspace could unmap, not sure if that works
currently) and requires additional syscalls.

> (4) Why isn't the @stack_size argument I added for clone3() enough?
>     If it is specified can't the size of the shadow stack derived from it?

shadow stack only contains return addresses so it is proportional
to the number of stack frames, not the stack size and it must
account for sigaltstack too, not just the thread stack.

if you make minimal assumptions about stack usage and ignore the
sigaltstack issue then the worst case shadow stack requirement
is indeed proportional to the stack_size, but this upper bound
can be pessimistic and userspace knows the tradeoffs better.

> 
> And my current main objection is that shadow stacks were just released
> to userspace. There can't be a massive amount of users yet - outside of
> maybe early adopters.

no upstream libc has code to enable shadow stacks at this point
so there are exactly 0 users in the open. (this feature requires
runtime support)

the change is expected to allow wider deployability. (e.g. not
just in glibc)

> 
> The fact that there are other architectures that bring in a similar
> feature makes me even more hesitant. If they have all agreed _and_
> implemented shadow stacks and have unified semantics then we can
> consider exposing control knobs to userspace that aren't implicitly
> architecture specific currently.
> 
> So I don't have anything against the patches per obviously but with the
> wider context.
> 
> Thanks!