On Mon, 2022-10-03 at 10:18 -0700, Kees Cook wrote:
> On Thu, Sep 29, 2022 at 03:28:58PM -0700, Rick Edgecombe wrote:
> > [...]
> > +Overview
> > +========
> > +
> > +Control-flow Enforcement Technology (CET) is term referring to several
> > +related x86 processor features that provides protection against control
> > +flow hijacking attacks. The HW feature itself can be set up to protect
> > +both applications and the kernel. Only user-mode protection is implemented
> > +in the 64-bit kernel.
>
> This likely needs rewording, since it's not strictly true any more:
> IBT is supported in kernel-mode now (CONFIG_X86_IBT).

Yep, thanks.

>
> > +CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is
> > +a secondary stack allocated from memory and cannot be directly modified by
> > +applications. When executing a CALL instruction, the processor pushes the
> > +return address to both the normal stack and the shadow stack. Upon
> > +function return, the processor pops the shadow stack copy and compares it
> > +to the normal stack copy. If the two differ, the processor raises a
> > +control-protection fault. Indirect branch tracking verifies indirect
> > +CALL/JMP targets are intended as marked by the compiler with 'ENDBR'
> > +opcodes. Not all CPU's have both Shadow Stack and Indirect Branch Tracking
> > +and only Shadow Stack is currently supported in the kernel.
> > +
> > +The Kconfig options is X86_SHADOW_STACK, and it can be disabled with
> > +the kernel parameter clearcpuid, like this: "clearcpuid=shstk".
> > +
> > +To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
> > +or later are required. To build a CET-enabled application, GLIBC v2.28 or
> > +later is also required.
> > +
> > +At run time, /proc/cpuinfo shows CET features if the processor supports
> > +CET.
>
> Maybe call them out by name: shstk ibt

Ok.

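As a rough illustration of the /proc/cpuinfo check discussed above, a userspace test for the two flag names could look something like the sketch below. The flag names "shstk" and "ibt" are the ones suggested in this thread. The helper names (line_has_flag, cpuinfo_has_flag) are made up for this example, and the exact strings a given kernel prints may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated word in `line`. */
bool line_has_flag(const char *line, const char *flag)
{
	assert(line != NULL && flag != NULL && *flag != '\0');

	size_t len = strlen(flag);
	const char *p = line;

	while ((p = strstr(p, flag)) != NULL) {
		/* A match only counts if it is delimited on both sides. */
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[len] == '\0' || p[len] == ' ' ||
			    p[len] == '\t' || p[len] == '\n';

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/*
 * Scan a cpuinfo-style stream and report whether the first "flags" line
 * contains `flag`. Assumes the flags line fits in the local buffer.
 */
bool cpuinfo_has_flag(FILE *f, const char *flag)
{
	char buf[8192];

	while (fgets(buf, sizeof(buf), f) != NULL) {
		if (strncmp(buf, "flags", 5) == 0)
			return line_has_flag(buf, flag);
	}
	return false;
}
```

A caller would open /proc/cpuinfo with fopen() and call cpuinfo_has_flag() once per flag name. Matching whole words avoids false positives from longer flag names that merely contain the string.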
>
> > +CET arch_prctl()'s
> > +==================
> > +
> > +Elf features should be enabled by the loader using the below arch_prctl's.
> > +
> > +arch_prctl(ARCH_CET_ENABLE, unsigned int feature)
> > +    Enable a single feature specified in 'feature'. Can only operate on
> > +    one feature at a time.
>
> Does this mean only 1 bit out of the 32 may be specified?

Yes, exactly.

>
> > +
> > +arch_prctl(ARCH_CET_DISABLE, unsigned int feature)
> > +    Disable features specified in 'feature'. Can only operate on
> > +    one feature at a time.
> > +
> > +arch_prctl(ARCH_CET_LOCK, unsigned int features)
> > +    Lock in features at their current enabled or disabled status.
>
> How is the "features" argument processed here?

Yes, this should have more info. The kernel keeps a mask of features
that are "locked". The mask passed in is ORed with the existing value,
so any feature whose bit is set here can no longer be enabled or
disabled afterwards. Bits that are unset in the passed mask are ignored.

>
> > [...]
> > +Proc status
> > +===========
> > +To check if an application is actually running with shadow stack, the
> > +user can read the /proc/$PID/arch_status. It will report "wrss" or
> > +"shstk" depending on what is enabled.
>
> TIL about "arch_status". :) Why is this a separate file? "status" is
> already has unique field names.

It looks like "status" only has arch-agnostic feature status today.
Maybe that's the reason? CET seems to fit there though.

>
> > +Fork
> > +----
> > +
> > +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
> > +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
> > +shadow access triggers a page fault with the shadow stack access bit set
> > +in the page fault error code.
> > +
> > +When a task forks a child, its shadow stack PTEs are copied and both the
> > +parent's and the child's shadow stack PTEs are cleared of the dirty bit.
> > +Upon the next shadow stack access, the resulting shadow stack page fault
> > +is handled by page copy/re-use.
> > +
> > +When a pthread child is created, the kernel allocates a new shadow stack
> > +for the new thread.
>
> Perhaps speak to the ASLR characteristics of the shstk here?

It behaves just like mmap(). I can add some info.

>
> Also, it seems if there is a "Fork" section, there should be an "Exec"
> section? I suspect it would be short: shstk is disabled when execve() is
> called and must be re-enabled from userspace, yes?

Sure, I can add some info.