On Thu, Sep 29, 2022 at 03:29:07PM -0700, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
>
> There is essentially no room left in the x86 hardware PTEs on some OSes
> (not Linux). That left the hardware architects looking for a way to
> represent a new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0,Dirty=1.
>
> The reason it's lightly used is that Dirty=1 is normally set _before_ a
> write. A write with a Write=0 PTE would typically only generate a fault,
> not set Dirty=1. Hardware can (rarely) both set Write=1 *and* generate the

s/Write/Dirty/

> fault, resulting in a Dirty=0,Write=1 PTE. Hardware which supports shadow

s/Dirty=0,Write=1/Write=0,Dirty=1/

> stacks will no longer exhibit this oddity.
>
> The kernel should avoid inadvertently creating shadow stack memory because
> it is security sensitive. So given the above, all it needs to do is avoid
> manually crating Write=0,Dirty=1 PTEs in software.

Whichever way around you choose (Write first or Dirty first), please be
consistent.

> In places where Linux normally creates Write=0,Dirty=1, it can use the
> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
> words, whenever Linux needs to create Write=0,Dirty=1, it instead creates
> Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1. This
> clearly separates shadow stack from other data, and results in the
> following:
>
> (a) (Write=0,Cow=1,Dirty=0) A modified, copy-on-write (COW) page.
>     Previously when a typical anonymous writable mapping was made COW via
>     fork(), the kernel would mark it Write=0,Dirty=1. Now it will instead
>     use the Cow bit.
> (b) (Write=0,Cow=1,Dirty=0) A R/O page that has been COW'ed. The user page
>     is in a R/O VMA, and get_user_pages() needs a writable copy. The page
>     fault handler creates a copy of the page and sets the new copy's PTE
>     as Write=0 and Cow=1.
> (c) (Write=0,Cow=0,Dirty=1) A shadow stack PTE.
> (d) (Write=0,Cow=1,Dirty=0) A shared shadow stack PTE. When a shadow stack
>     page is being shared among processes (this happens at fork()), its PTE
>     is made Dirty=0, so the next shadow stack access causes a fault, and
>     the page is duplicated and Dirty=1 is set again. This is the COW
>     equivalent for shadow stack pages, even though it's copy-on-access
>     rather than copy-on-write.
> (e) (Write=0,Cow=0,Dirty=1) A Cow PTE created when a processor without
>     shadow stack support set Dirty=1.

Please restructure this (and the comment) something like:

 (Write=0,Dirty=0,Cow=1):

   - copy_present_pte(): A modified copy-on-write page.
   - ...

 (Write=0,Dirty=1,Cow=0):

   - FEATURE_CET: Shadow Stack entry
   - !FEATURE_CET: see the above Cow=1 cases
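
To make the proposed grouping concrete, here is a minimal user-space sketch of
how the two bit patterns could be told apart. It is not taken from the patch:
the names PTE_WRITE, PTE_DIRTY, PTE_COW, enum pte_kind and classify_pte() are
hypothetical, and the Cow bit position in particular is only a placeholder.
Only the Write (bit 1) and Dirty (bit 6) positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits. */
#define PTE_WRITE (1ULL << 1)
#define PTE_DIRTY (1ULL << 6)
/* ASSUMPTION: placeholder software bit, not the real _PAGE_COW position. */
#define PTE_COW   (1ULL << 58)

enum pte_kind { KIND_OTHER, KIND_COW, KIND_SHADOW_STACK };

/*
 * Classify a PTE by its Write/Dirty/Cow bits alone.
 * cpu_has_shstk stands in for the FEATURE_CET check.
 */
static enum pte_kind classify_pte(uint64_t pte, bool cpu_has_shstk)
{
	if (pte & PTE_WRITE)
		return KIND_OTHER;	/* writable: not one of the cases above */

	/*
	 * (Write=0,Dirty=0,Cow=1): cases (a), (b) and (d). The bits are
	 * identical for all three, so they cannot be told apart here.
	 */
	if (pte & PTE_COW)
		return KIND_COW;

	/*
	 * (Write=0,Dirty=1,Cow=0): a shadow stack entry with FEATURE_CET,
	 * otherwise the same meaning as the Cow=1 cases.
	 */
	if (pte & PTE_DIRTY)
		return cpu_has_shstk ? KIND_SHADOW_STACK : KIND_COW;

	return KIND_OTHER;
}
```

With this, classify_pte(PTE_DIRTY, true) yields KIND_SHADOW_STACK while
classify_pte(PTE_DIRTY, false) yields KIND_COW, which is the
FEATURE_CET / !FEATURE_CET split suggested above.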