On Wed, 2018-07-18 at 17:06 -0700, Dave Hansen wrote:
> > > > -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> > > > +static inline bool can_follow_write(pte_t pte, unsigned int flags,
> > > > +                                    struct vm_area_struct *vma)
> > > >  {
> > > > -        return pte_write(pte) ||
> > > > -                ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> > > > +        if (!is_shstk_mapping(vma->vm_flags)) {
> > > > +                if (pte_write(pte))
> > > > +                        return true;
> > > Let me see if I can say this another way.
> > >
> > > The bigger issue is that these patches change the semantics of
> > > pte_write().  Before these patches, it meant that you *MUST* have this
> > > bit set to write to the page controlled by the PTE.  Now, it means: you
> > > can write if this bit is set *OR* the shadowstack bit combination is set.
> > Here, we only figure out (1) if the page is pointed to by a writable PTE;
> > or (2) if the page is pointed to by a RO PTE (data or SHSTK) and it has
> > been copied and it still exists.  We are not trying to determine if the
> > SHSTK PTE is writable (we know it is not).
> Please think about the big picture.  I'm not just talking about this
> patch, but about every use of pte_write() in the kernel.
>
> > > That's the fundamental problem.  We need some code in the kernel that
> > > logically represents the concept of "is this PTE a shadowstack PTE or a
> > > PTE with the write bit set", and we will call that pte_write(), or maybe
> > > pte_writable().
> > >
> > > You *have* to somehow rectify this situation.  We can absolutely not
> > > leave pte_write() in its current, ambiguous state where it has no real
> > > meaning or where it is used to mean _both_ things depending on context.
> > True, the processor can always write to a page through a shadow stack
> > PTE, but it must do that with a CALL instruction.  Can we define a
> > write operation as: MOV r1, *(r2).  Then we don't have any doubt on
> > pte_write() any more.
> No, we can't just move the target. :)
>
> You can define it this way, but then you also need to go to every spot
> in the kernel that calls pte_write() (and _PAGE_RW in fact) and audit it
> to ensure it means "mov ..." and not push.

Which pte_write() do you think is right?

bool is_shstk_pte(pte) {
        return (_PAGE_RW not set) && (_PAGE_DIRTY_HW set);
}

int pte_write_1(pte) {
        return (_PAGE_RW set) && !is_shstk_pte(pte);
}

int pte_write_2(pte) {
        return (_PAGE_RW set) || is_shstk_pte(pte);
}

Yu-cheng
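
For reference, the two candidates can be modelled in plain userspace C to see where they disagree. This is only a sketch: model_pte_t, the MODEL_* masks, and pte_write_candidates_differ() are hypothetical names made up for illustration, the bit positions are stand-ins, and none of this is the kernel's actual pte_t handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only. The masks below are illustrative stand-ins
 * (assumptions), not the kernel's _PAGE_RW / _PAGE_DIRTY_HW definitions. */
#define MODEL_PAGE_RW       (UINT64_C(1) << 1)
#define MODEL_PAGE_DIRTY_HW (UINT64_C(1) << 6)

typedef uint64_t model_pte_t;	/* hypothetical stand-in for pte_t */

/* Shadow-stack encoding as stated in the mail: RW clear, hardware dirty set. */
static bool is_shstk_pte(model_pte_t pte)
{
	return !(pte & MODEL_PAGE_RW) && (pte & MODEL_PAGE_DIRTY_HW);
}

/* Candidate 1: "writable" means the RW bit is set (MOV-style writes only). */
static bool pte_write_1(model_pte_t pte)
{
	return (pte & MODEL_PAGE_RW) && !is_shstk_pte(pte);
}

/* Candidate 2: "writable" means RW is set or the PTE is a shadow-stack PTE. */
static bool pte_write_2(model_pte_t pte)
{
	return (pte & MODEL_PAGE_RW) || is_shstk_pte(pte);
}

/* True when the two candidate definitions give different answers. */
static bool pte_write_candidates_differ(model_pte_t pte)
{
	return pte_write_1(pte) != pte_write_2(pte);
}
```

In this model the two definitions agree on three of the four RW/dirty combinations and differ only on the shadow-stack encoding (RW clear, dirty set), where pte_write_1() returns false and pte_write_2() returns true. Note also that the !is_shstk_pte() term in pte_write_1() can never change the result, because a PTE with RW set is by definition not a shadow-stack PTE.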