On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx> wrote:
On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <yu-cheng.yu@xxxxxxxxx> wrote:
+
+ cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+ if (!cet) {
+ /*
+ * This is an unlikely case where the task is
+ * CET-enabled, but CET xstate is in INIT.
+ */
+ WARN_ONCE(1, "CET is enabled, but no xstates");
"unlikely" doesn't really cover this.
+ fpregs_unlock();
+ goto sigsegv;
+ }
+
+ if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
+ cet->user_ssp += 8;
This looks buggy. The condition should be "if SHSTK is on, then add 8
to user_ssp". If the result is noncanonical, then some appropriate
exception should be generated, probably by the FPU restore code -- see
below. You should be checking the SHSTK_EN bit, not SSP.
The code now checks if shadow stack is on (yes, it should check SHSTK_EN
bit, I will fix it.), then adds 8 to user_ssp. If the result is
canonical, then it sets the corresponding xstate.
If the resulting address is not canonical, the kernel does not know what
the address should be either. I think the best action to take is doing
nothing about the shadow stack pointer, and let the application return
and get a control protection fault. The application should have not got
into such situation in the first place; if it does, it should fault.
Also, can you point me to where any of these canonicality rules are
documented in the SDM? I looked and I can't find them.
The SDM is not very explicit. It should have been.
This reminds me: this code in extable.c needs to change.
__visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr)
{
regs->ip = ex_fixup_addr(fixup);
WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
FPU registers.",
(void *)instruction_pointer(regs));
__copy_kernel_to_fpregs(&init_fpstate, -1);
Now that we have supervisor states like CET, this is buggy. This
should do something intelligent like initializing all the *user* state
and trying again. If that succeeds, a signal should be sent rather
than just corrupting the task. And if it fails, then perhaps some
actual intelligence is needed. We certainly should not just disable
CET because something is wrong with the CET MSRs.
Yes, but it needs more thought. Maybe a separate patch and more discussion?
Yu-cheng