On Tue, Mar 16, 2021 at 08:10:35AM -0700, Yu-cheng Yu wrote:
> There is essentially no room left in the x86 hardware PTEs on some OSes
> (not Linux). That left the hardware architects looking for a way to
> represent a new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
>
> The reason it's lightly used is that Dirty=1 is normally set by hardware
> and cannot normally be set by hardware on a Write=0 PTE. Software must
> normally be involved to create one of these PTEs, so software can simply
> opt to not create them.
>
> In places where Linux normally creates Write=0, Dirty=1, it can use the
> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This
> clearly separates shadow stack from other data, and results in the
> following:
>
> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>     The user page is in a R/O VMA, and get_user_pages() needs a writable
>     copy. The page fault handler creates a copy of the page and sets
>     the new copy's PTE as Write=0 and Cow=1.
> (c) A shadow stack PTE: (Write=0, Dirty=1)
> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>     When a shadow stack page is being shared among processes (this happens
>     at fork()), its PTE is made Dirty=0, so the next shadow stack access
>     causes a fault, and the page is duplicated and Dirty=1 is set again.
>     This is the COW equivalent for shadow stack pages, even though it's
>     copy-on-access rather than copy-on-write.
> (e) A page where the processor observed a Write=1 PTE, started a write, set
>     Dirty=1, but then observed a Write=0 PTE. That's possible today, but
>     will not happen on processors that support shadow stack.
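For illustration only, here is a minimal, hypothetical C sketch of the encoding described in the list above. The sk_* names and the bit chosen for the software COW flag (bit 58) are assumptions made for this sketch, not the patch's actual definitions; only R/W (bit 1) and Dirty (bit 6) are the real x86 PTE bit positions. It shows write-protecting a dirty PTE moving Dirty into Cow (cases (a), (b) and (d)), making it writable moving Cow back into Dirty, and Write=0, Dirty=1 being left to mean shadow stack (case (c)).

```c
#include <stdint.h>

/* Real x86 PTE bits: R/W is bit 1, Dirty is bit 6. */
#define SK_PAGE_RW    (UINT64_C(1) << 1)
#define SK_PAGE_DIRTY (UINT64_C(1) << 6)
/* Hypothetical software-defined COW bit; position is an assumption. */
#define SK_PAGE_COW   (UINT64_C(1) << 58)

typedef uint64_t sk_pte_t;

enum sk_pte_kind {
	SK_WRITABLE,       /* Write=1 */
	SK_RO_CLEAN,       /* Write=0, Dirty=0, Cow=0 */
	SK_RO_COW,         /* Write=0, Cow=1: cases (a), (b), (d) */
	SK_SHADOW_STACK,   /* Write=0, Dirty=1: case (c) */
};

static enum sk_pte_kind sk_pte_kind(sk_pte_t pte)
{
	if (pte & SK_PAGE_RW)
		return SK_WRITABLE;
	if (pte & SK_PAGE_DIRTY)
		return SK_SHADOW_STACK;
	if (pte & SK_PAGE_COW)
		return SK_RO_COW;
	return SK_RO_CLEAN;
}

/* "Dirty" for ordinary purposes means either hardware Dirty or Cow. */
static int sk_pte_dirty(sk_pte_t pte)
{
	return (pte & (SK_PAGE_DIRTY | SK_PAGE_COW)) != 0;
}

/* Never create Write=0, Dirty=1 for normal data: use Cow instead. */
static sk_pte_t sk_pte_mkdirty(sk_pte_t pte)
{
	return (pte & SK_PAGE_RW) ? (pte | SK_PAGE_DIRTY) : (pte | SK_PAGE_COW);
}

/* Clear Write and move Dirty over to Cow. */
static sk_pte_t sk_pte_wrprotect(sk_pte_t pte)
{
	pte &= ~SK_PAGE_RW;
	if (pte & SK_PAGE_DIRTY) {
		pte &= ~SK_PAGE_DIRTY;
		pte |= SK_PAGE_COW;
	}
	return pte;
}

/* Set Write and move Cow back to hardware Dirty. */
static sk_pte_t sk_pte_mkwrite(sk_pte_t pte)
{
	pte |= SK_PAGE_RW;
	if (pte & SK_PAGE_COW) {
		pte &= ~SK_PAGE_COW;
		pte |= SK_PAGE_DIRTY;
	}
	return pte;
}
```

With these helpers, write-protecting a writable dirty PTE yields Write=0, Cow=1, which still reads as dirty, and making it writable again restores the original Write=1, Dirty=1 value.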
>
> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
> pmd and pud.
>
> After this, there are six free bits left in the 64-bit PTE, and no more
> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
> implemented for the 32-bit kernel.
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>

-- 
 Kirill A. Shutemov