Re: [PATCH v7 33/41] x86/shstk: Introduce map_shadow_stack syscall

Deepak Gupta <debug@xxxxxxxxxxxx> · Thu, 9 Mar 2023 13:08:17 -0800

On Thu, Mar 09, 2023 at 07:39:41PM +0000, Edgecombe, Rick P wrote:
On Thu, 2023-03-09 at 10:55 -0800, Deepak Gupta wrote:
On Thu, Mar 02, 2023 at 05:22:07PM +0000, Szabolcs Nagy wrote:
> The 02/27/2023 14:29, Rick Edgecombe wrote:
> > Previously, a new PROT_SHADOW_STACK was attempted,
>
> ...
> > So rather than repurpose two existing syscalls (mmap, madvise) that
> > don't quite fit, just implement a new map_shadow_stack syscall to allow
> > userspace to map and setup new shadow stacks in one step. While ucontext
> > is the primary motivator, userspace may have other unforeseen reasons to
> > setup its own shadow stacks using the WRSS instruction. Towards this
> > provide a flag so that stacks can be optionally setup securely for the
> > common case of ucontext without enabling WRSS. Or potentially have the
> > kernel set up the shadow stack in some new way.
>
> ...
> > The following example demonstrates how to create a new shadow stack with
> > map_shadow_stack:
> > void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN);
>
> i think
>
> mmap(addr, size, PROT_READ, MAP_ANON|MAP_SHADOW_STACK, -1, 0);
>
> could do the same with less disruption to users (new syscalls
> are harder to deal with than new flags). it would do the
> guard page and initial token setup too (there is no flag for
> it but could be squeezed in).

Discussion on this topic in v6

https://lore.kernel.org/all/20230223000340.GB945966@xxxxxxxxxxxxxxxxxxxxx/

Again, I know the earlier CET patches had a protection flag, and that due to
pushback on the mailing list a special syscall was adopted instead, because
no one else had shadow stacks.

Seeing the response from Szabolcs, I am assuming arm64 would also want to
use mmap to manufacture shadow stacks. For reference, the RFC patches for
risc-v shadow stack use a new protection flag, PROT_SHADOWSTACK.

https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@xxxxxxxxxxxx/

I know the earlier conclusion was to let this go and refactor later as
support for other arches trickles in. But having thought more about it, I
think it may be messy from the user mode point of view as well, since user
mode would need to know about two different ways of creating a shadow stack:
one a special syscall (in current libc) and the other `mmap` (whenever the
future refactor happens).

If it's not too late, it would be wiser to take the `mmap` approach rather
than the special `syscall` approach.

There are sort of two things intermixed here when we talk about a
PROT_SHADOW_STACK.

One is: what is the interface for specifying how the shadow stack
should be provisioned with data? Right now there are two ways
supported, all zero or with an X86 shadow stack restore token at the
end. Then there was already some conversation about a third type. In
which case the question would be is using mmap MAP_ flags the right
place for this? How many types of initialization will be needed in the
end and what is the overlap between the architectures?
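
To make the two provisioning modes concrete, here is a minimal sketch that
fills an ordinary buffer (not a real shadow stack) either with zeros or with
zeros plus a restore token in the last slot. The flag name and the helper are
hypothetical. The token layout is an assumption for illustration: the x86-64
token is taken to hold the address just above its own slot with bit 0 set to
mark 64-bit mode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag for this sketch; not the kernel's definition. */
#define SKETCH_SET_TOKEN 0x1u

/*
 * Initialise a plain buffer of 8-byte slots the way the two modes are
 * described: all zero, or all zero with a restore token in the last slot.
 * Assumption: the token value is the address just above the token slot,
 * with bit 0 set to mark 64-bit mode.
 * Returns the index of the token slot, or -1 if no token was written.
 */
static long sketch_init_shstk(uint64_t *buf, size_t nslots, unsigned int flags)
{
    memset(buf, 0, nslots * sizeof(*buf));

    if (!(flags & SKETCH_SET_TOKEN) || nslots == 0)
        return -1;

    size_t slot = nslots - 1;
    buf[slot] = (uint64_t)(uintptr_t)(buf + nslots) | 1u;
    return (long)slot;
}
```

A third initialization type would add another flag bit here, which is the
growth the question above is about.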

First of all, arches can choose whether or not to have a token at the bottom.

A token serves the following purposes:
 - It allows one to put a desired value in the shadow stack pointer in a safe/secure manner.
   Note: x86 doesn't provide any opcode encoding to load a value into the SSP register, so
   a token is kind of a necessity because x86 doesn't easily allow writing to the shadow stack.

 - A token at the bottom acts as a marker / barrier and can be useful in debugging.

 - If (and a big *if*) we ever reach a point in the future where the return address is only
   pushed on the shadow stack (x86 should have motivation to do this because of fewer uops
   on call/ret), a token at the bottom (bottom means lower address) is a sure-shot way of
   getting a fault when the stack is exhausted.

The current RISCV zisslpcfi proposal doesn't define CPU-based tokens because it's RISC.
It provides mechanisms with which software can define the token format for itself.
I'm not sure what ARM is doing.

Now coming to the point of all zero vs. shadow stack token:
why not always have a token at the bottom?

In the case of x86, why the need for two ways, rather than always having a token at the
bottom? The way x86 is going, user mode is responsible for establishing the shadow stack,
so whenever a shadow stack is created the x86 kernel implementation could always place a
token at the base/bottom.

Now user mode can do the following:
 - If it has access to WRSS, it can go ahead and create a token of its choosing,
   overwrite the kernel-created token, and then do RSTORSSP on its own token.

 - If it doesn't have access to WRSS (and doesn't need to create its own token), it can do
   RSTORSSP on the kernel-created token. As soon as it does, no other thread in the process
   can restore to it. On `fork`, you get the same un-restorable token.

So why not always have a token at the bottom?
This is my plan for the riscv implementation as well (to have a token at the bottom).

The other thing is: should shadow stack memory creation be tightly
controlled? For example in x86 we limit this to anonymous memory, etc.
Some reasons for this are x86 specific, but some are not. So if we
disallow most of the options why allow the interface to take them? And
then you are in the position of carefully maintaining a list of not-allowed
options instead of letting a list of allowed options sit there.

I am new to the linux kernel and thus may not be following the argument about
limiting to anonymous memory.

Why is limiting it to anonymous memory a problem? IIRC, ARM's PROT_MTE is applicable
only to anonymous memory. I can probably find a few more examples.

Eventually the syscall will also go ahead and use memory management code to
perform the mapping, so I didn't understand the reasoning here. Just as the
syscall can limit it to anonymous memory, why can't mmap do the same if it
sees PROT_SHADOWSTACK?
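
As a rough sketch of what I mean, the check below refuses a shadow stack
request unless it is anonymous. This is ordinary userspace C written only to
illustrate the idea, not kernel code. The helper name is hypothetical, and the
bit value chosen for the flag is made up (the riscv RFC names the flag
PROT_SHADOWSTACK, but I am not quoting its value here).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>

/* Made-up bit value for this sketch only. */
#define SKETCH_PROT_SHADOWSTACK 0x40

/*
 * Hypothetical validation step: if the caller asks for a shadow stack,
 * accept only anonymous mappings (no backing file descriptor).
 * Returns 0 if the request may proceed, -EINVAL otherwise.
 */
static int sketch_check_shstk_mmap(int prot, int flags, int fd)
{
    if (!(prot & SKETCH_PROT_SHADOWSTACK))
        return 0;           /* ordinary mapping, nothing extra to check */

    if (!(flags & MAP_ANONYMOUS) || fd != -1)
        return -EINVAL;     /* file-backed shadow stacks are refused */

    return 0;
}
```

The same shape would extend to the other restrictions being discussed; whether
that ends up as an allow-list or a deny-list is the open question.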

The only benefit I've heard is that it saves creating a new syscall,
but it also saves several MAP_ flags. That, and that the RFC for riscv
did a PROT_SHADOW_STACK to start. So, yes, two people asked the same
question, but I'm still not seeing any benefits. Can you give the pros
and cons please?

Again, the way the syscall will limit it to anonymous memory, why can't mmap do the same?
There is precedent for it (PROT_MTE, for example, is applicable only to anonymous memory).

So if it can be done, then why introduce a new syscall?

BTW, in glibc map_shadow_stack is called from arch code. So I think
userspace wise, for this to affect other architectures there would need
to be some code that could do things generically, with somehow the
shadow stack pivot abstracted but the shadow stack allocation not.

Agreed, yes, it can be done in a way that doesn't put a tax on other architectures.

But what about fragmentation within x86? Will x86 always choose to use the system call
method to map shadow stacks? If a future refactor results in x86 also using the `mmap`
method, isn't it a mess for x86 glibc to figure out what to do: whether to use the
system call or `mmap`?
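
To illustrate the kind of logic libc could end up carrying, here is a minimal
sketch of a wrapper that tries one allocation path and falls back to the other
when the kernel reports the syscall as missing. Everything here is
hypothetical: the backend signature, the wrapper, and the two stand-in
backends, which only imitate "syscall not implemented" and "mapping
succeeded" and do not touch real shadow stacks.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend: returns a mapping, or NULL with errno set. */
typedef void *(*sketch_shstk_backend)(size_t size);

/*
 * Try the dedicated-syscall path first; fall back to the mmap path only
 * when the kernel says the syscall does not exist (ENOSYS).
 */
static void *sketch_alloc_shstk(sketch_shstk_backend via_syscall,
                                sketch_shstk_backend via_mmap,
                                size_t size)
{
    void *p = via_syscall(size);

    if (p != NULL || errno != ENOSYS)
        return p;           /* success, or a real failure worth reporting */

    return via_mmap(size);  /* syscall missing: try the other interface */
}

/* Stand-in backends for illustration only. */
static char sketch_fake_stack[64];

static void *sketch_backend_missing(size_t size)
{
    (void)size;
    errno = ENOSYS;         /* imitate a kernel without this interface */
    return NULL;
}

static void *sketch_backend_ok(size_t size)
{
    if (size > sizeof(sketch_fake_stack)) {
        errno = ENOMEM;
        return NULL;
    }
    return sketch_fake_stack;
}
```

Even in this small form, the caller has to know which path produced the stack
if the two interfaces ever differ in token placement or guard page handling.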