On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote:
> On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> [...]
> > +#define add_random_kstack_offset() do {                               \
> > +       if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> > +                               &randomize_kstack_offset)) {            \
> > +               u32 offset = this_cpu_read(kstack_offset);              \
> > +               u8 *ptr = __builtin_alloca(offset & 0x3FF);             \
> > +               asm volatile("" : "=m"(*ptr));                          \
> > +       }                                                               \
> > +} while (0)
>
> clang generates better code here if the mask is stack-aligned -
> otherwise it needs to round the stack pointer / the offset:

Interesting. I was hoping to avoid needing to know the architecture
stack alignment (leaving it up to the compiler).

>
> $ cat alloca_align.c
> #include <alloca.h>
> void callee(void);
>
> void alloca_blah(unsigned long rand) {
>   asm volatile(""::"r"(alloca(rand & MASK)));
>   callee();
> }
> $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
> $ objdump -d alloca_align.o
> [...]
>    0:   55                      push   %rbp
>    1:   48 89 e5                mov    %rsp,%rbp
>    4:   81 e7 ff 03 00 00       and    $0x3ff,%edi
>    a:   83 c7 0f                add    $0xf,%edi
>    d:   83 e7 f0                and    $0xfffffff0,%edi
>   10:   48 89 e0                mov    %rsp,%rax
>   13:   48 29 f8                sub    %rdi,%rax
>   16:   48 89 c4                mov    %rax,%rsp
>   19:   e8 00 00 00 00          callq  1e <alloca_blah+0x1e>
>   1e:   48 89 ec                mov    %rbp,%rsp
>   21:   5d                      pop    %rbp
>   22:   c3                      retq
> $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
> $ objdump -d alloca_align.o
> [...]
>    0:   55                      push   %rbp
>    1:   48 89 e5                mov    %rsp,%rbp
>    4:   48 89 e0                mov    %rsp,%rax
>    7:   81 e7 f0 03 00 00       and    $0x3f0,%edi
>    d:   48 29 f8                sub    %rdi,%rax
>   10:   48 89 c4                mov    %rax,%rsp
>   13:   e8 00 00 00 00          callq  18 <alloca_blah+0x18>
>   18:   48 89 ec                mov    %rbp,%rsp
>   1b:   5d                      pop    %rbp
>   1c:   c3                      retq
> $
>
> (From a glance at the assembly, gcc seems to always assume that the
> length may be misaligned.)

Right -- this is why I didn't bother with it, since it didn't seem to
notice what I'd already done to the alloca() argument. (But from what I
could measure on cycle counts, the additional ALU didn't seem to really
make much difference ... it _would_ be nice to avoid it, of course.)

> Maybe this should be something along the lines of
> __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
> appropriate definitions of the stack alignment mask depending on the
> architecture's choice of stack alignment for kernel code).

Is that explicitly selected anywhere in the kernel? I thought the
alignment was left up to the compiler (as in I've seen bugs fixed where
the kernel had to deal with the alignment choices the compiler was
making...)

--
Kees Cook
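
For illustration only (not part of the patch): a small userspace sketch of
why the 0x3f0 mask lets clang drop the add/and pair seen in the first
disassembly. It assumes a 16-byte stack alignment, matching the
"and $0xfffffff0" in the listing; the helper names below are hypothetical,
and aligned_mask() only mimics the idea behind the suggested
ARCH_STACK_ALIGN_MASK, which does not exist under that name as far as
this thread establishes.

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next multiple of align (align must be a power of two).
 * This is the same arithmetic as the "add $0xf; and $0xfffffff0" pair. */
static uint32_t round_up_pow2(uint32_t len, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Hypothetical stand-in for "0x3ff & ARCH_STACK_ALIGN_MASK": clear the
 * low bits of mask so every masked value is a multiple of align. */
static uint32_t aligned_mask(uint32_t mask, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return mask & ~(align - 1);
}

/* Count the inputs in [0, limit) for which (rand & mask) is not already a
 * multiple of align, i.e. for which the rounding step changes the length. */
static uint32_t count_needing_round(uint32_t mask, uint32_t align,
				    uint32_t limit)
{
	uint32_t n = 0;

	for (uint32_t rand = 0; rand < limit; rand++) {
		uint32_t len = rand & mask;

		if (round_up_pow2(len, align) != len)
			n++;
	}
	return n;
}
```

With a 16-byte alignment, aligned_mask(0x3ff, 16) gives 0x3f0, and no
input needs rounding under that mask, so the compiler can prove the
rounding is a no-op. The price is 4 fewer bits of offset entropy
(64 possible offsets instead of 1024), although the rounding already
collapses the 0x3ff case to 16-byte steps on such a target.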