On Mon, 2023-01-23 at 10:50 +0100, David Hildenbrand wrote:
> On 19.01.23 22:22, Rick Edgecombe wrote:
> > The x86 Control-flow Enforcement Technology (CET) feature includes a new
> > type of memory called shadow stack. This shadow stack memory has some
> > unusual properties, which requires some core mm changes to function
> > properly.
> >
> > Since shadow stack memory can be changed from userspace, is both
> > VM_SHADOW_STACK and VM_WRITE. But it should not be made conventionally
> > writable (i.e. pte_mkwrite()). So some code that calls pte_mkwrite()
> > needs to be adjusted.
> >
> > One such case is when memory is made writable without an actual write
> > fault. This happens in some mprotect operations, and also prot_numa
> > faults. In both cases code checks whether it should be made
> > (conventionally) writable by calling
> > vma_wants_manual_pte_write_upgrade().
> >
> > One way to fix this would be have code actually check if memory is also
> > VM_SHADOW_STACK and in that case call pte_mkwrite_shstk(). But since
> > most memory won't be shadow stack, just have simpler logic and skip this
> > optimization by changing vma_wants_manual_pte_write_upgrade() to not
> > return true for VM_SHADOW_STACK_MEMORY. This will simply handle all
> > cases of this type.
> >
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Tested-by: Pengfei Xu <pengfei.xu@xxxxxxxxx>
> > Tested-by: John Allen <john.allen@xxxxxxx>
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> > ---
>
> Instead of having these x86-shadow stack details all over the MM space,
> was the option explored to handle this more in arch specific code?
>
> IIUC, one way to get it working would be
>
> 1) Have a SW "shadowstack" PTE flag.
> 2) Have an "SW-dirty" PTE flag, to store "dirty=1" when "write=0".

I don't think that idea came up.
So vma->vm_page_prot would have the SW shadow stack flag for
VM_SHADOW_STACK, and pte_mkwrite() could do the Write=0,Dirty=1 part.
It seems like it should work.

>
> pte_mkwrite(), pte_write(), pte_dirty ... can then make decisions based
> on the "shadowstack" PTE flag and hide all these details from core-mm.
>
> When mapping a shadowstack page (new page, migration, swapin, ...),
> which can be obtained by looking at the VMA flags, the first thing you'd
> do is set the "shadowstack" PTE flag.

I guess the downside is that it uses an extra software bit. But the
other positive is that it's less error prone, so that someone writing
core-mm code won't introduce a change that makes shadow stack VMAs
Write=1 if they don't know to also check for VM_SHADOW_STACK.
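
To make the idea a bit more concrete, here is a rough user-space model of
how the helpers might behave with the two software flags. This is only a
sketch under assumptions: the bit positions, the PTE_SW_SHSTK and
PTE_SW_DIRTY names and the pte_mkshstk() helper are made up for
illustration, and none of it is the real x86 PTE layout or the kernel's
actual helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE. Bit positions are invented for this model. */
typedef uint64_t pte_t;

#define PTE_WRITE    (1ULL << 0)  /* hardware Write bit */
#define PTE_DIRTY    (1ULL << 1)  /* hardware Dirty bit */
#define PTE_SW_SHSTK (1ULL << 2)  /* hypothetical SW "shadowstack" flag */
#define PTE_SW_DIRTY (1ULL << 3)  /* hypothetical SW flag: dirty while Write=0 */

/* Would be set first when mapping a page into a VM_SHADOW_STACK VMA. */
static inline pte_t pte_mkshstk(pte_t pte) { return pte | PTE_SW_SHSTK; }
static inline bool pte_shstk(pte_t pte) { return (pte & PTE_SW_SHSTK) != 0; }

/* A shadow stack PTE counts as writable when it is Write=0,Dirty=1. */
static inline bool pte_write(pte_t pte)
{
	if (pte_shstk(pte))
		return (pte & (PTE_WRITE | PTE_DIRTY)) == PTE_DIRTY;
	return (pte & PTE_WRITE) != 0;
}

/* Dirty state lives in either the hardware or the software bit. */
static inline bool pte_dirty(pte_t pte)
{
	return (pte & (PTE_DIRTY | PTE_SW_DIRTY)) != 0;
}

/* Core-mm calls this without knowing about shadow stack at all. */
static inline pte_t pte_mkwrite(pte_t pte)
{
	bool dirty = pte_dirty(pte);

	pte &= ~(PTE_WRITE | PTE_DIRTY | PTE_SW_DIRTY);
	if (pte_shstk(pte))
		return pte | PTE_DIRTY;	/* Write=0,Dirty=1, never Write=1 */
	return pte | PTE_WRITE | (dirty ? PTE_DIRTY : 0);
}

/* Write=0 must not keep hardware Dirty=1, so park dirty in the SW bit. */
static inline pte_t pte_wrprotect(pte_t pte)
{
	bool dirty = pte_dirty(pte);

	pte &= ~(PTE_WRITE | PTE_DIRTY | PTE_SW_DIRTY);
	return pte | (dirty ? PTE_SW_DIRTY : 0);
}

static inline pte_t pte_mkdirty(pte_t pte)
{
	if (pte_write(pte))
		return pte | PTE_DIRTY;
	return pte | PTE_SW_DIRTY;
}

/* Small usage example: the same pte_mkwrite() call covers both kinds of PTE. */
static inline void shstk_model_example(void)
{
	pte_t normal = pte_mkwrite(0);
	pte_t shstk = pte_mkwrite(pte_mkshstk(0));

	assert((normal & PTE_WRITE) != 0);
	assert((shstk & PTE_WRITE) == 0 && (shstk & PTE_DIRTY) != 0);
	assert(pte_write(normal) && pte_write(shstk));
}
```

In this model a writable shadow stack PTE always reports dirty, and the
caller never has to check for VM_SHADOW_STACK before calling
pte_mkwrite(), which is the "less error prone" part.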