On Fri, 2022-10-14 at 11:41 +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2022 at 03:29:07PM -0700, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> >
> > There is essentially no room left in the x86 hardware PTEs on some OSes
> > (not Linux). That left the hardware architects looking for a way to
> > represent a new memory type (shadow stack) within the existing bits.
> > They chose to repurpose a lightly-used state: Write=0,Dirty=1.
> >
> > The reason it's lightly used is that Dirty=1 is normally set _before_ a
> > write. A write with a Write=0 PTE would typically only generate a fault,
> > not set Dirty=1. Hardware can (rarely) both set Write=1 *and* generate the
>
> s/Write/Dirty/

Oops, yes.

>
> > fault, resulting in a Dirty=0,Write=1 PTE. Hardware which supports shadow
>
> s/Dirty=0,Write=1/Write=0,Dirty=1/

Ok, I'll scrub the series for the order.

>
> > stacks will no longer exhibit this oddity.
> >
> > The kernel should avoid inadvertently creating shadow stack memory because
> > it is security sensitive. So given the above, all it needs to do is avoid
> > manually creating Write=0,Dirty=1 PTEs in software.
>
> Whichever way around you choose, please be consistent.
>
> > In places where Linux normally creates Write=0,Dirty=1, it can use the
> > software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
> > words, whenever Linux needs to create Write=0,Dirty=1, it instead creates
> > Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1. This
> > clearly separates shadow stack from other data, and results in the
> > following:
> >
> > (a) (Write=0,Cow=1,Dirty=0) A modified, copy-on-write (COW) page.
> >     Previously when a typical anonymous writable mapping was made COW via
> >     fork(), the kernel would mark it Write=0,Dirty=1. Now it will instead
> >     use the Cow bit.
> > (b) (Write=0,Cow=1,Dirty=0) A R/O page that has been COW'ed. The user page
> >     is in a R/O VMA, and get_user_pages() needs a writable copy. The page
> >     fault handler creates a copy of the page and sets the new copy's PTE
> >     as Write=0 and Cow=1.
> > (c) (Write=0,Cow=0,Dirty=1) A shadow stack PTE.
> > (d) (Write=0,Cow=1,Dirty=0) A shared shadow stack PTE. When a shadow stack
> >     page is being shared among processes (this happens at fork()), its PTE
> >     is made Dirty=0, so the next shadow stack access causes a fault, and
> >     the page is duplicated and Dirty=1 is set again. This is the COW
> >     equivalent for shadow stack pages, even though it's copy-on-access
> >     rather than copy-on-write.
> > (e) (Write=0,Cow=0,Dirty=1) A Cow PTE created when a processor without
> >     shadow stack support set Dirty=1.
>
> Please restructure this (and the comment) something like:
>
>
> (Write=0,Dirty=0,Cow=1):
>
>  - copy_present_pte(): A modified copy-on-write page.
>  - ...
>
>
> (Write=0,Dirty=1,Cow=0):
>
>  - FEATURE_CET: Shadow Stack entry
>  - !FEATURE_CET: see the above Cow=1 cases

Yes, I incorporated feedback from your earlier comment. Sorry for bad
communication.
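
For illustration only, the encoding discussed above can be sketched as
standalone C. This is not code from the series: the sk_* names are
hypothetical, and the position of the Cow bit (58) is an assumption picked
for the sketch. Only the Write (bit 1) and Dirty (bit 6) positions are the
architectural x86 ones. Combinations the list above does not describe, such
as Dirty=1 together with Cow=1, are simply treated as "modified, read-only"
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask names; these are not the kernel's _PAGE_* macros. */
#define SK_WRITE (UINT64_C(1) << 1)   /* x86 R/W bit */
#define SK_DIRTY (UINT64_C(1) << 6)   /* x86 Dirty bit */
#define SK_COW   (UINT64_C(1) << 58)  /* ASSUMED software-available bit */

enum sk_kind {
	SK_KIND_WRITABLE,      /* Write=1 */
	SK_KIND_SHADOW_STACK,  /* Write=0,Dirty=1,Cow=0 with shadow stack support */
	SK_KIND_MODIFIED_RO,   /* Write=0 and modified (Cow=1, or legacy Dirty=1) */
	SK_KIND_CLEAN_RO,      /* Write=0,Dirty=0,Cow=0 */
};

/* Classify a PTE value along the lines of the quoted (a)-(e) list. */
static enum sk_kind sk_classify(uint64_t pte, bool cpu_has_shstk)
{
	if (pte & SK_WRITE)
		return SK_KIND_WRITABLE;
	if (pte & SK_COW)
		return SK_KIND_MODIFIED_RO;
	if (pte & SK_DIRTY)
		/* Without shadow stack support this is just a dirty R/O page. */
		return cpu_has_shstk ? SK_KIND_SHADOW_STACK : SK_KIND_MODIFIED_RO;
	return SK_KIND_CLEAN_RO;
}

/* Write-protect a PTE, moving Dirty into Cow so that the result is never
 * Write=0,Dirty=1. */
static uint64_t sk_wrprotect(uint64_t pte)
{
	pte &= ~SK_WRITE;
	if (pte & SK_DIRTY) {
		pte &= ~SK_DIRTY;
		pte |= SK_COW;
	}
	return pte;
}

/* Record "modified" on a PTE: hardware Dirty if writable, Cow otherwise. */
static uint64_t sk_mkdirty(uint64_t pte)
{
	return (pte & SK_WRITE) ? (pte | SK_DIRTY) : (pte | SK_COW);
}
```

With helpers shaped like this, write-protecting a dirty writable entry
yields Write=0,Cow=1 rather than Write=0,Dirty=1, so ordinary software paths
would not produce a shadow stack encoding by accident.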