On 2/10/2021 11:42 AM, Kees Cook wrote:
On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote:
There is essentially no room left in the x86 hardware PTEs on some OSes
(not Linux). That left the hardware architects looking for a way to
represent a new memory type (shadow stack) within the existing bits.
They chose to repurpose a lightly-used state: Write=0, Dirty=1.
The reason it's lightly used is that Dirty=1 is normally set by hardware,
and hardware cannot normally set it on a Write=0 PTE. Software must
normally be involved to create one of these PTEs, so software can simply
opt to not create them.
In places where Linux normally creates Write=0, Dirty=1, it can use the
software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This
clearly separates shadow stack from other data, and results in the
following:
(a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
(b) A R/O page that has been COW'ed: (Write=0, Cow=1)
The user page is in a R/O VMA, and get_user_pages() needs a writable
copy. The page fault handler creates a copy of the page and sets
the new copy's PTE as Write=0 and Cow=1.
(c) A shadow stack PTE: (Write=0, Dirty=1)
(d) A shared shadow stack PTE: (Write=0, Cow=1)
When a shadow stack page is being shared among processes (this happens
at fork()), its PTE is made Dirty=0, so the next shadow stack access
causes a fault, and the page is duplicated and Dirty=1 is set again.
This is the COW equivalent for shadow stack pages, even though it's
copy-on-access rather than copy-on-write.
(e) A page where the processor observed a Write=1 PTE, started a write, set
Dirty=1, but then observed a Write=0 PTE. That's possible today, but
will not happen on processors that support shadow stack.
Define _PAGE_COW and update pte_*() helpers and apply the same changes to
pmd and pud.
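As a rough illustration of the Dirty/Cow swap described above, here is a minimal user-space sketch. The SK_* bit positions and the sk_* helper names are made up for this example; they are not the kernel's definitions or the helpers this patch touches. It only models the rule that write-protecting a dirty PTE moves Dirty into the software Cow bit, so Write=0, Dirty=1 stays reserved for shadow stack.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only (assumption, not the kernel's values). */
#define SK_RW    (1u << 1)
#define SK_DIRTY (1u << 6)
#define SK_COW   (1u << 10)	/* hypothetical software-defined bit */

/*
 * Write-protect a PTE: clear RW, and if the PTE was dirty, move that
 * state from the hardware Dirty bit to the software Cow bit so the
 * result is never Write=0, Dirty=1.
 */
uint32_t sk_wrprotect(uint32_t flags)
{
	flags &= ~SK_RW;
	if (flags & SK_DIRTY) {
		flags &= ~SK_DIRTY;
		flags |= SK_COW;
	}
	return flags;
}

/* Only Write=0, Dirty=1 denotes a shadow stack PTE. */
int sk_is_shstk(uint32_t flags)
{
	return (flags & (SK_RW | SK_DIRTY)) == SK_DIRTY;
}

/* Software's view of "dirty": hardware Dirty or software Cow. */
int sk_dirty(uint32_t flags)
{
	return (flags & (SK_DIRTY | SK_COW)) != 0;
}
```

With this model, write-protecting a Write=1, Dirty=1 PTE yields Write=0, Cow=1, which still counts as dirty but is not mistaken for shadow stack.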
I still find this commit confusing, mostly due to _PAGE_COW being 0
without CET enabled. Shouldn't this just get changed universally? Why
should this change depend on CET?
For example, in...
static inline int pte_write(pte_t pte)
{
	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
		return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY);
	else
		return pte_flags(pte) & _PAGE_RW;
}
There are four cases:
(a) RW=1, Dirty=1 -> writable
(b) RW=1, Dirty=0 -> writable
(c) RW=0, Dirty=0 -> not writable
(d) RW=0, Dirty=1 -> shadow stack, or not-writable if !X86_FEATURE_SHSTK
Case (d) counts as writable (as shadow stack) only when shadow stack is
enabled; otherwise it is not writable. With the shadow stack feature, the
usual dirty, copy-on-write PTE becomes RW=0, Cow=1.
We could make this change universal, but then every usual dirty,
copy-on-write PTE would need the Dirty/Cow swap, whether or not CET is
enabled. Is that desirable?
--
Yu-cheng
[...]