On Mon, Jun 22, 2020 at 04:07:11PM -0700, Kees Cook wrote:
> On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote:
> > On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote:
> > > +
> > > +#define add_random_kstack_offset() do {					\
> > > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > > +				&randomize_kstack_offset)) {		\
> > > +		u32 offset = this_cpu_read(kstack_offset);		\
> > > +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> > > +		asm volatile("" : "=m"(*ptr));				\
> > > +	}								\
> > > +} while (0)
> >
> > This feels a little fragile. ptr doesn't escape the block, so the
> > compiler is free to restore the stack immediately after this block. In
> > fact, given that all you've said is that the asm modifies *ptr, but
> > nothing uses that output, the compiler could eliminate the whole thing,
> > no?
> >
> > https://godbolt.org/z/HT43F5
> >
> > gcc restores the stack immediately, if no function calls come after it.
> >
> > clang completely eliminates the code if no function calls come after.
>
> nothing uses the stack in your example. And adding a barrier (which is
> what the "=m" is, doesn't change it.

Yeah, I realized that that was what's going on. And clang isn't actually
DCE'ing it, it's taking advantage of the red zone since my alloca was
small enough.

But I still don't see anything _stopping_ the compiler from optimizing
this better in the future. The "=m" is not a barrier: it just informs
the compiler that the asm produces an output value in *ptr (and no other
outputs). If nothing can consume that output, it doesn't stop the
compiler from freeing the allocation immediately after the asm instead
of at the end of the function.

I'm talking about something like
	asm volatile("" : : "r" (ptr) : "memory");
which tells the compiler that the asm may change memory arbitrarily.

Here, we don't really use it as a barrier, but to tell the compiler that
the asm may have stashed the value of ptr somewhere in memory, so it's
not free to reuse the space that ptr pointed to until the function
returns (unless it can prove that nothing accesses memory, not just that
nothing accesses ptr).
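
To make the comparison concrete, here is a rough userspace sketch of the two forms side by side. It is not the kernel macro: fake_kstack_offset, kstack_offset_mask() and the probe_*() helpers are names I made up for illustration, standing in for this_cpu_read(kstack_offset) and the syscall path, and the static branch is left out entirely. It assumes gcc or clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU kstack_offset variable. */
static uint32_t fake_kstack_offset = 0x1234;

/* Same masking as the quoted macro: keep the low 10 bits (0..1023). */
static size_t kstack_offset_mask(uint32_t offset)
{
	return offset & 0x3FF;
}

/*
 * Form from the quoted patch: the asm only claims to write *ptr.
 * Nothing reads that output, so the compiler is not obliged to keep
 * the allocation live after the asm statement.
 */
#define add_offset_output_only() do {					\
	uint8_t *ptr =							\
		__builtin_alloca(kstack_offset_mask(fake_kstack_offset)); \
	__asm__ volatile("" : "=m"(*ptr));				\
} while (0)

/*
 * Form suggested above: ptr is an input and "memory" is clobbered, so
 * the compiler has to assume the asm may have saved ptr somewhere and
 * that any later memory access might reach the allocation.
 */
#define add_offset_escape() do {					\
	uint8_t *ptr =							\
		__builtin_alloca(kstack_offset_mask(fake_kstack_offset)); \
	__asm__ volatile("" : : "r"(ptr) : "memory");			\
} while (0)

/* Reports roughly where a callee's frame lands on the stack. */
__attribute__((noinline)) static uintptr_t callee_stack_addr(void)
{
	volatile char marker = 0;

	return (uintptr_t)&marker;
}

/* No offset applied: reference point for the other two probes. */
__attribute__((noinline)) static uintptr_t probe_baseline(void)
{
	/* volatile store keeps the call out of tail position */
	volatile uintptr_t addr = callee_stack_addr();

	return addr;
}

__attribute__((noinline)) static uintptr_t probe_output_only(void)
{
	add_offset_output_only();
	volatile uintptr_t addr = callee_stack_addr();

	return addr;
}

__attribute__((noinline)) static uintptr_t probe_escape(void)
{
	add_offset_escape();
	volatile uintptr_t addr = callee_stack_addr();

	return addr;
}
```

Comparing what probe_output_only() and probe_escape() return against probe_baseline() at -O2 (or just reading the generated code) shows whether the callee's frame actually moved. What you see depends on the compiler and version, which is sort of the point: only the second form gives the compiler a reason it has to respect.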