Re: [PATCH RFT v3 0/5] fork: Support shadow stacks in clone3()

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 21 Nov 2023 11:17:45 +0100

On Mon, Nov 20, 2023 at 11:54:28PM +0000, Mark Brown wrote:
> The kernel has recently added support for shadow stacks, currently
> x86 only using their CET feature but both arm64 and RISC-V have
> equivalent features (GCS and Zicfiss respectively), I am actively
> working on GCS[1].  With shadow stacks the hardware maintains an
> additional stack containing only the return addresses for branch
> instructions which is not generally writeable by userspace and ensures
> that any returns are to the recorded addresses.  This provides some
> protection against ROP attacks and making it easier to collect call
> stacks.  These shadow stacks are allocated in the address space of the
> userspace process.
> 
> Our API for shadow stacks does not currently offer userspace any
> flexiblity for managing the allocation of shadow stacks for newly
> created threads, instead the kernel allocates a new shadow stack with
> the same size as the normal stack whenever a thread is created with the
> feature enabled.  The stacks allocated in this way are freed by the
> kernel when the thread exits or shadow stacks are disabled for the
> thread.  This lack of flexibility and control isn't ideal, in the vast
> majority of cases the shadow stack will be over allocated and the
> implicit allocation and deallocation is not consistent with other
> interfaces.  As far as I can tell the interface is done in this manner
> mainly because the shadow stack patches were in development since before
> clone3() was implemented.
> 
> Since clone3() is readily extensible let's add support for specifying a
> shadow stack when creating a new thread or process in a similar manner

So while I made clone3() readily extensible I don't want it to ever
devolve into a fancier version of a prctl().

I would really like to see a strong reason for allowing userspace to
configure the shadow stack size at this point in time.

I have a few questions that are probably me just not knowing much about
shadow stacks so hopefully I'm not asking you write a thesis by
accident:

(1) What does it mean for a shadow stack to be over allocated and is
    over-allocation really that much of a problem out in the wild that
    we need to give I userspace a knob to control a kernel security
    feature?
(2) With what other interfaces is implicit allocation and deallocation
    not consistent? I don't understand this argument. The kernel creates
    a shadow stack as a security measure to store return addresses. It
    seems to me exactly that the kernel should implicitly allocate and
    deallocate the shadow stack and not have userspace muck around with
    its size?
(3) Why is it safe for userspace to request the shadow stack size? What
    if they request a tiny shadow stack size? Should this interface
    require any privilege?
(4) Why isn't the @stack_size argument I added for clone3() enough?
    If it is specified can't the size of the shadow stack derived from it?

And my current main objection is that shadow stacks were just released
to userspace. There can't be a massive amount of users yet - outside of
maybe early adopters.

The fact that there are other architectures that bring in a similar
feature makes me even more hesitant. If they have all agreed _and_
implemented shadow stacks and have unified semantics then we can
consider exposing control knobs to userspace that aren't implicitly
architecture specific currently.

So I don't have anything against the patches per obviously but with the
wider context.

Thanks!