On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
>> +					    unsigned long addr, pte_t *ptep)
>> +{
>> +	bool rw;
>> +
>> +	rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>> +	if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
>> +		struct mm_struct *mm = vma->vm_mm;
>> +		pte_t pte;
>> +
>> +		if (rw && (atomic_read(&mm->mm_users) > 1))
>> +			pte = ptep_clear_flush(vma, addr, ptep);
> Why are you clearing the pte?

I found my notes on the subject. :)

Here's the sequence that causes the problem.  This could happen any time
we try to take a PTE from read-write to read-only.  P=Present, W=Write,
D=Dirty:

CPU0 does a write, sees PTE with P=1,W=1,D=0
CPU0 decides to set D=1
CPU1 comes in and sets W=0
CPU0 does locked operation to set D=1
	CPU0 sees P=1,W=0,D=0
	CPU0 sets back P=1,W=0,D=1
CPU0 loads P=1,W=0,D=1 into the TLB
CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
is generated because of the write fault.

The problem with this is that we end up with a shadowstack-PTE
(Write=0,Dirty=1) where we didn't want one.  This, unfortunately,
imposes extra TLB flushing overhead on the R/W->R/O transitions that
did not exist before shadowstack enabling.

Yu-cheng, could you please add this to the patch description?