Re: [PATCH RFT v4 0/5] fork: Support shadow stacks in clone3()

Catalin Marinas <catalin.marinas@xxxxxxx> · Thu, 30 Nov 2023 19:00:58 +0000

Hi Mark,

Thanks for putting this together and sorry it took me some time to catch
up (well, still not fully, so rather more questions below).

On Tue, Nov 28, 2023 at 06:22:38PM +0000, Mark Brown wrote:
> Since clone3() is readily extensible let's add support for specifying a
> shadow stack when creating a new thread or process in a similar manner
> to how the normal stack is specified, keeping the current implicit
> allocation behaviour if one is not specified either with clone3() or
> through the use of clone().  Unlike normal stacks only the shadow stack
> size is specified, similar issues to those that lead to the creation of
> map_shadow_stack() apply.

My hope when looking at the arm64 patches was that we can completely
avoid the kernel allocation/deallocation of the shadow stack since it
doesn't need to do this for the normal stack either. Could someone
please summarise why we dropped the shadow stack pointer after v1? IIUC
there was a potential security argument but I don't think it was a very
strong one. Also what's the threat model for this feature? I thought
it's mainly mitigating stack corruption. If some rogue code can do
syscalls, we have bigger problems than clone3() taking a shadow stack
pointer.

My (probably wrong) mental model was that libc can do an mmap() for
normal stack, a map_shadow_stack() for the shadow one and invoke
clone3() with both these pointers and sizes. There is an overhead of an
additional syscall but if some high-performance app needs to spawn
threads quickly, it would most likely do some pooling.

I'm not against clone3() getting a shadow_stack_size argument but asking
some more questions. If we won't pass a pointer as well, is there any
advantage in expanding this syscall vs a specific prctl() option? Do we
need a different size per thread or do all threads have the same shadow
stack size? A new RLIMIT doesn't seem to map well though, it is more
like an upper limit rather than a fixed/default size (glibc I think uses
it for thread stacks but bionic or musl don't AFAIK).

Another dumb question on arm64 - is GCSPR_EL0 writeable by the user? If
yes, can the libc wrapper for threads allocate a shadow stack via
map_shadow_stack() and set it up in the thread initialisation handler
before invoking the thread function?

Thanks.

-- 
Catalin