[PATCH] x86/fpu: don't let PTRACE_SETREGSET set xcomp_bv

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Eric Biggers <ebiggers@xxxxxxxxxx>

On x86, userspace can use ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE,
&iov) to set a task's extended state registers.  Registers can be set to
any value, but the kernel assumes that the xregs_state itself remains
valid in the sense that the CPU can restore it.

However, in the case where the kernel is using the non-compacted
extended state format (which it does whenever the XSAVES instruction is
unavailable), userspace could also set the xcomp_bv field of the
xstate_header to any value.  But all bits in that field are reserved in
the non-compacted case, so when switching to a task with nonzero
xcomp_bv, the XRSTOR instruction failed with a #GP fault.  This caused
the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit and, since the
error is otherwise ignored, could also leak registers from the task
previously executing on the CPU.

Fix the bug by ignoring the user-specified value of xcomp_bv in the
non-compacted case and instead always setting it to 0.

Note: we used to clear all xstate registers if XRSTOR failed, but this
was removed by commit 9ccc27a5d297 ("x86/fpu: Remove error return values
from copy_kernel_to_*regs() functions").  It perhaps should be added
back, to prevent this class of bug from causing an information leak.

This bug was found by syzkaller, which hit the above-mentioned
WARN_ON_FPU():

    WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
    RIP: 0010:__switch_to+0x5b5/0x5d0
    RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
    RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
    RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
    RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
    R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
    FS:  00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
    Call Trace:
    Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f

The following C program reproduces the bug quite reliably.  The expected
behavior is that the program spin forever with no output.  However, on a
buggy kernel running on a processor with the "xsave" feature but without
the "xsaves" feature (e.g. Sandy Bridge through Broadwell for Intel),
within a second or two the program reports that the xmm registers were
corrupted, i.e. were not restored correctly.  With
CONFIG_X86_DEBUG_FPU=y it also hits the above kernel warning.

    #define _GNU_SOURCE
    #include <stdbool.h>
    #include <inttypes.h>
    #include <linux/elf.h>
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/uio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int pid = fork();
        uint64_t xstate[512];
        struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };

        if (pid == 0) {
            bool tracee = true;
            for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
                tracee = (fork() != 0);
            uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x12345678 : 0xDEADBEEF };
            asm volatile("   movdqu %0, %%xmm0\n"
                         "   mov %0, %%rbx\n"
                         "1: movdqu %%xmm0, %0\n"
                         "   mov %0, %%rax\n"
                         "   cmp %%rax, %%rbx\n"
                         "   je 1b\n"
                         : "+m" (xmm0) : : "rax", "rbx", "xmm0");
            printf("BUG: xmm registers corrupted!  tracee=%d, xmm0=%08X%08X%08X%08X\n",
                   tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
        } else {
            usleep(100000);
            ptrace(PTRACE_ATTACH, pid, 0, 0);
            wait(NULL);
            ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
            xstate[65] = -1;
	    ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
	    ptrace(PTRACE_CONT, pid, 0, 0);
	    wait(NULL);
	}
	return 1;
    }

Fixes: 0b29643a5843 ("x86/xsaves: Change compacted format xsave area header")
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Kevin Hao <haokexin@xxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
Cc: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>    [v3.17+]
Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
---
 arch/x86/kernel/fpu/regset.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index b188b16841e3..718b791bc037 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -131,11 +131,15 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 
 	fpu__activate_fpstate_write(fpu);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		ret = copyin_to_xsaves(kbuf, ubuf, xsave);
-	else
+	} else {
 		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
 
+		/* xcomp_bv must be zero when using uncompacted format */
+		xsave->header.xcomp_bv = 0;
+	}
+
 	/*
 	 * In case of failure, mark all states as init:
 	 */
-- 
2.14.1.581.gf28d330327-goog




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]