Re: [PATCH v4 05/17] add support for Clang's Shadow Call Stack (SCS)

Sami Tolvanen <samitolvanen@xxxxxxxxxx> · Mon, 4 Nov 2019 10:25:24 -0800

On Mon, Nov 4, 2019 at 4:31 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > +/*
> > + * In testing, 1 KiB shadow stack size (i.e. 128 stack frames on a 64-bit
> > + * architecture) provided ~40% safety margin on stack usage while keeping
> > + * memory allocation overhead reasonable.
> > + */
> > +#define SCS_SIZE     1024
>
> To make it easier to reason about type promotion rules (and avoid that
> we accidentaly mask out high bits when using this to generate a mask),
> can we please make this 1024UL?

Sure.

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu)
> >       raw_spin_lock_irqsave(&idle->pi_lock, flags);
> >       raw_spin_lock(&rq->lock);
> >
> > +     scs_task_reset(idle);
>
> Could we please do this next to the kasan_unpoison_task_stack() call,
> Either just before, or just after?
>
> They're boot addressing the same issue where previously live stack is
> being reused, and in general I'd expect them to occur at the same time
> (though I understand idle will be a bit different).

Good point, I'll move this.

> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -58,6 +58,7 @@
> >  #include <linux/profile.h>
> >  #include <linux/psi.h>
> >  #include <linux/rcupdate_wait.h>
> > +#include <linux/scs.h>
> >  #include <linux/security.h>
> >  #include <linux/stop_machine.h>
> >  #include <linux/suspend.h>
>
> This include looks extraneous.

I added this to sched.h, because most of the includes used in
kernel/sched appear to be there, but I can move this to
kernel/sched/core.c instead.

> > +static inline void *__scs_base(struct task_struct *tsk)
> > +{
> > +     /*
> > +      * We allow architectures to use the shadow_call_stack field in
> > +      * struct thread_info to store the current shadow stack pointer
> > +      * during context switches.
> > +      *
> > +      * This allows the implementation to also clear the field when
> > +      * the task is active to avoid keeping pointers to the current
> > +      * task's shadow stack in memory. This can make it harder for an
> > +      * attacker to locate the shadow stack, but also requires us to
> > +      * compute the base address when needed.
> > +      *
> > +      * We assume the stack is aligned to SCS_SIZE.
> > +      */
>
> How about:
>
>         /*
>          * To minimize risk the of exposure, architectures may clear a
>          * task's thread_info::shadow_call_stack while that task is
>          * running, and only save/restore the active shadow call stack
>          * pointer when the usual register may be clobbered (e.g. across
>          * context switches).
>          *
>          * The shadow call stack is aligned to SCS_SIZE, and grows
>          * upwards, so we can mask out the low bits to extract the base
>          * when the task is not running.
>          */
>
> ... which I think makes the lifetime and constraints a bit clearer.

Sounds good to me, thanks.

> > +     return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1));
>
> We usually use unsigned long ratehr than uintptr_t. Could we please use
> that for consistency?
>
> The kernel relies on sizeof(unsigned long) == sizeof(void *) tree-wide,
> so that doesn't cause issues for us here.
>
> Similarly, as suggested above, it would be easier to reason about this
> knowing that SCS_SIZE is an unsigned long. While IIUC we'd get sign
> extension here when it's promoted, giving the definition a UL suffix
> minimizes the scope for error.

OK, I'll switch to unsigned long.

> > +/* Keep a cache of shadow stacks */
> > +#define SCS_CACHE_SIZE 2
>
> How about:
>
> /* Matches NR_CACHED_STACKS for VMAP_STACK */
> #define NR_CACHED_SCS 2
>
> ... which explains where the number came from, and avoids confusion that
> the SIZE is a byte size rather than number of elements.

Agreed, that sounds better.

> > +static void scs_free(void *s)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < SCS_CACHE_SIZE; i++)
> > +             if (this_cpu_cmpxchg(scs_cache[i], 0, s) == 0)
> > +                     return;
>
> Here we should compare to NULL rather than 0.

Ack.

> > +void __init scs_init(void)
> > +{
> > +     cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL,
> > +             scs_cleanup);
>
> We probably want to do something if this call fails. It looks like we'd
> only leak two pages (and we'd be able to use them if/when that CPU is
> brought back online. A WARN_ON() is probably fine.

fork_init() in kernel/fork.c lets this fail quietly, but adding a
WARN_ON seems fine.

I will include these changes in v5.

Sami