On Thu, Apr 23, 2020 at 11:28:40AM -0700, Kees Cook wrote: > On Wed, Apr 22, 2020 at 04:51:34PM -0700, Sami Tolvanen wrote: > > On Wed, Apr 22, 2020 at 06:39:47PM +0100, Will Deacon wrote: > > > On Mon, Apr 20, 2020 at 02:18:30PM -0700, Sami Tolvanen wrote: > > > > On Mon, Apr 20, 2020 at 06:17:28PM +0100, Will Deacon wrote: > > > > > > + * The shadow call stack is aligned to SCS_SIZE, and grows > > > > > > + * upwards, so we can mask out the low bits to extract the base > > > > > > + * when the task is not running. > > > > > > + */ > > > > > > + return (void *)((unsigned long)task_scs(tsk) & ~(SCS_SIZE - 1)); > > > > > > > > > > Could we avoid forcing this alignment it we stored the SCS pointer as a > > > > > (base,offset) pair instead? That might be friendlier on the allocations > > > > > later on. > > > > > > > > The idea is to avoid storing the current task's shadow stack address in > > > > memory, which is why I would rather not store the base address either. > > > > > > What I mean is that, instead of storing the current shadow stack pointer, > > > we instead store a base and an offset. We can still clear the base, as you > > > do with the pointer today, and I don't see that the offset is useful to > > > an attacker on its own. > > > > I see what you mean. However, even if we store the base address + > > the offset, we still need aligned allocation if we want to clear > > the address. This would basically just move __scs_base() logic to > > cpu_switch_to() / scs_save(). > > Okay, so, I feel like this has gotten off into the weeds, or I'm really > dense (or both). :) Going back to the original comment: > > > > > > Could we avoid forcing this alignment it we stored the SCS > > > > > pointer as a (base,offset) pair instead? That might be friendlier > > > > > on the allocations later on. > > I think there was some confusion about mixing the "we want to be able to > wipe the value" combined with the masking in __scs_base(). These are > unrelated, as was correctly observed with "We can still clear the base". Having just tried to implement this, it turns out they *are* related and we can't still clear the base, I was wrong about that :( See below. > What I don't understand here is the suggestion to store two values: > > Why is two better than storing one? With one, we only need a single access. > > Why would storing the base be "friendlier on the allocations later on"? > This is coming out of a single kmem cache, in 1K chunks. They will be > naturally aligned to 1K (unless redzoing has been turned on for some > slab debugging reason). The base masking is a way to avoid needing to > store two values, and only happens at task death. Fair enough about the kmem_cache, although I'm still worried about these things getting bigger in future and the alignment having to increase at the same time. We also have a bunch of static/percpu allocations that don't use this cache. Also, since you mentioned the lack of redzoning, isn't it a bit dodgy allocating blindly out of the kmem_cache? It means we don't have a redzone or a guard page, so if you can trigger something like a recursion bug then could you scribble past the SCS before the main stack overflows? Would this clobber somebody else's SCS? The vmap version that I asked Sami to drop is at least better in this regard, although the guard page is at the wrong end of the stack and we just hope that the allocation below us didn't pass VM_NO_GUARD. Looks like the same story for vmap stack :/ > Storing two values eats memory for all tasks for seemingly no meaningful > common benefit. What am I missing here? I would like to remove the alignment requirements for the static and percpu allocations. AFAICT, the only reason the alignment is needed is because you want to convert an SCS pointer into the base pointer. The only reason *that* is needed is because of the questionable wiping of the pointer in the thread_info, but I really don't see the benefit of this. Unlike a crypto secret (which was your analogy), the SCS pointer is stored in memory in at least the following situations: * The task isn't running * The task is running in userspace * The task is running a vCPU in KVM * We're calling into EFI * On exception entry from EL1, as part of stacking x18 * During CPU suspend If we split the pointer in two (base, offset) then we could leave the base live in the thread_info, not require alignment of the stacks (which may allow for unconditional redzoning?) and then just update the offset value on context switch, which could be trivially checked as part of the existing stack overflow checking on kernel entry. The base and offset can live in the same cacheline and be loaded with ldp, so I don't see there being an access cost compared to a single variable. Am I missing something (modulo us not agreeing on the utility of wiping the pointer)? Will