On 10/3/22 14:36, Edgecombe, Rick P wrote: >>> +static inline pte_t pte_clear_cow(pte_t pte) >>> +{ >>> + /* >>> + * _PAGE_COW is unnecessary on !X86_FEATURE_SHSTK kernels. >>> + * See the _PAGE_COW definition for more details. >>> + */ >>> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) >>> + return pte; >>> + >>> + /* >>> + * PTE is getting copied-on-write, so it will be dirtied >>> + * if writable, or made shadow stack if shadow stack and >>> + * being copied on access. Set they dirty bit for both >>> + * cases. >>> + */ >>> + pte = pte_set_flags(pte, _PAGE_DIRTY); >>> + return pte_clear_flags(pte, _PAGE_COW); >>> +} >> These X86_FEATURE_SHSTK checks make me uneasy. Maybe use the >> _PAGE_COW >> logic for all machines with 64-bit entries. It will get you much more >> coverage and more universal rules. > Yes, I didn't like them either at first. The reasoning originally was > that _PAGE_COW is a bit more work and it might show up for some > benchmark. > > Looking at this again though, it is just a few more operations on > memory that is already getting touched either way. It must be a very > tiny amount of impact if any. I'm fine removing them. Having just one > set of logic around this would make it easier to reason about. > > Dave, any thoughts on this? The cpu_feature_enabled(X86_FEATURE_SHSTK) checks enable both compile-time and runtime optimization. What makes this even more fun is: +#ifdef CONFIG_X86_SHADOW_STACK +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif which I think means that the pte_clear_flags() goes away if CONFIG_X86_SHADOW_STACK is disabled. So, what Rick posted here ends up doing the following with: | X86_FEATURE_SHSTK=1 | X86_FEATURE_SHSTK=0 ==========+=====================+======================== CONFIG=n | compiled out | compiled out CONFIG=y | set/clear | boot-time patched out If we pull the cpu_feature_enabled() out, I think we end up getting behavior like this: | X86_FEATURE_SHSTK=1 | X86_FEATURE_SHSTK=0 ==========+=====================+======================== CONFIG=n | set _PAGE_DIRTY | set _PAGE_DIRTY CONFIG=y | set/clear | set/clear It ends up adding instruction overhead (set _PAGE_DIRTY) to two cases where it completely compiled out before. It also adds runtime overhead (the "tiny amount of impact" you mentioned) to set/clear where it would have runtime patched out before. None of this is a deal breaker in terms of runtime overhead. But, I do think the benefits of the cpu_feature_enabled() are worth it, even if it's just an optimization. You could move it to the end of the series and we can debate it on its own merits if you want.