On Fri, 2018-08-31 at 18:29 +0200, Peter Zijlstra wrote: > On Fri, Aug 31, 2018 at 08:58:39AM -0700, Dave Hansen wrote: > > > > On 08/31/2018 08:48 AM, Yu-cheng Yu wrote: > > > > > > To trigger a race in ptep_set_wrprotect(), we need to fork from one of > > > three pthread siblings. > > > > > > Or do we measure only how much this affects fork? > > > If there is no racing, the effect should be minimal. > > We don't need a race. > > > > I think the cmpxchg will be slower, even without a race, than the code > > that was there before. The cmpxchg is a simple, straightforward > > solution, but we're putting it in place of a plain memory write, which > > is suboptimal. > Note quite, the clear_bit() is LOCK prefixed. With the updated ptep_set_wrprotect() below, I did MADV_WILLNEED to a shadow stack of 8 MB, then 10,000 fork()'s, but could not prove it is more or less efficient than the other. So can we say this is probably fine in terms of efficiency? Yu-cheng --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1203,7 +1203,36 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER + pte_t new_pte, pte = READ_ONCE(*ptep); + + /* + * Some processors can start a write, but end up + * seeing a read-only PTE by the time they get + * to the Dirty bit. In this case, they will + * set the Dirty bit, leaving a read-only, Dirty + * PTE which looks like a Shadow Stack PTE. + * + * However, this behavior has been improved and + * will not occur on processors supporting + * Shadow Stacks. Without this guarantee, a + * transition to a non-present PTE and flush the + * TLB would be needed. + * + * When changing a writable PTE to read-only and + * if the PTE has _PAGE_DIRTY_HW set, we move + * that bit to _PAGE_DIRTY_SW so that the PTE is + * not a valid Shadow Stack PTE. + */ + do { + new_pte = pte_wrprotect(pte); + new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >> + _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW; + new_pte.pte &= ~_PAGE_DIRTY_HW; + } while (!try_cmpxchg(ptep, &pte, new_pte)); +#else clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); +#endif }