On Thu, Feb 26, 2015 at 8:28 AM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
> On Thu, Feb 26, 2015 at 10:32 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Tue, Feb 24, 2015 at 7:23 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>>> On Tue, Feb 24, 2015 at 3:08 PM, Denys Vlasenko
>>> <vda.linux@xxxxxxxxxxxxxx> wrote:
>>>> On Tue, Feb 24, 2015 at 9:02 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>> This currently fails in 32-bit kernels (at least in qemu):
>>>>>>
>>>>>> / # ./es_test
>>>>>> Allocated GDT index 7
>>>>>> [FAIL] ES changed from 0x3b to 0x7b
>>>>>> [FAIL] ES was corrupted 1000/1000 times
>>>>>> / # uname -a
>>>>>> Linux (none) 4.0.0-rc1 #1 SMP Tue Feb 24 16:41:58 CET 2015 i686 GNU/Linux
>>>>>
>>>>> Want to send a patch? I'll get it in a few days if no one beats me.
>>>>
>>>> I have no patch, sorry (in fact, I failed to find where is the relevant
>>>> 32-bit counterpart).
>>>>
>>>> It's just security people asked me to backport this and I wondered
>>>> maybe I should wait a bit on this one, since fix for 32-bit ought
>>>> to appear as well.
>>>
>>> For 32-bit kernel, userspace DS and ES are saved at syscall/interrupt
>>> entry time and reloaded on exit, unlike in 64-bit where they are saved
>>> and loaded at context switch time. Therefore 32-bit is not affected
>>> by the issue this patch addresses.
>>>
>>> It looks to me though, that the ES test program doesn't actually test
>>> what the patch fixes - the segment attributes, like the base address.
>>> It tests just the selector, which shouldn't change across a kernel
>>> entry (with a few exceptions, like signals). If the test is failing,
>>> then it is a different issue from what this patch addresses.
>>
>> It tests it indirectly. The 64-bit code sets the selector to zero if
>> it fails to reload it. Testing the ES base is awkward because it
>> can't be done in 64-bit code at all.
>
> I figured out why Denys got the failure. usleep() makes a syscall via
> sysenter. The sysenter path saves es/ds, but does not restore them
> before sysexit like the int80/iret path would. That leaves them as
> USER_DS that the kernel loaded for itself. I believe this was an
> intentional optimization, assuming the vdso would only be called from
> programs conforming to the ELF ABI.

Makes sense. The attached variant passes, so I think we're fine.

--Andy

/*
 * Copyright (c) 2014-2015 Andy Lutomirski
 * GPL v2
 */

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <err.h>
#include <asm/ldt.h>
#include <sys/syscall.h>

static unsigned short GDT3(int idx)
{
	return (idx << 3) | 3;
}

static int create_tls(int idx, unsigned int base)
{
	struct user_desc desc = {
		.entry_number    = idx,
		.base_addr       = base,
		.limit           = 0xfffff,
		.seg_32bit       = 1,
		.contents        = 0, /* Data, grow-up */
		.read_exec_only  = 0,
		.limit_in_pages  = 1,
		.seg_not_present = 0,
		.useable         = 0,
	};

	if (syscall(SYS_set_thread_area, &desc) != 0)
		err(1, "set_thread_area");

	return desc.entry_number;
}

int main()
{
	int idx = create_tls(-1, 0);
	printf("Allocated GDT index %d\n", idx);

	unsigned short orig_es;
	asm volatile ("mov %%es,%0" : "=rm" (orig_es));

	int errors = 0;
	int total = 1000;
	for (int i = 0; i < total; i++) {
		struct timespec req = {
			.tv_sec = 0,
			.tv_nsec = 100000,
		};
		int ret;

		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));

		/*
		 * Force rescheduling.  On 32-bit kernels, fast syscalls
		 * destroy DS and ES, so force int 80.
		 */
		asm volatile ("int $0x80"
			      : "=a" (ret)
			      : "a" (SYS_nanosleep), "b" (&req), "c" (0));

		unsigned short es;
		asm volatile ("mov %%es,%0" : "=rm" (es));
		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
		if (es != GDT3(idx)) {
			if (errors == 0)
				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
				       GDT3(idx), es);
			errors++;
		}
	}

	if (errors) {
		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
		return 1;
	} else {
		printf("[OK]\tES was preserved\n");
		return 0;
	}
}
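
For anyone reading the selector values in the failure output quoted above (0x3b and 0x7b), here is a minimal sketch of how an x86 segment selector breaks down into its fields. It is not part of the attached test; the names decode_selector(), make_selector() and print_selector() are hypothetical helpers made up for illustration. Only the bit layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) is taken from the architecture, and GDT3() in the test is the special case with TI = 0 and RPL = 3.

```c
#include <assert.h>
#include <stdio.h>

/*
 * x86 segment selector layout:
 *   bits 15..3  index into the descriptor table
 *   bit  2      TI, table indicator (0 = GDT, 1 = LDT)
 *   bits 1..0   RPL, requested privilege level
 *
 * All names below are hypothetical, for illustration only.
 */
struct selector_fields {
	unsigned int index;
	unsigned int ti;
	unsigned int rpl;
};

/* Split a selector into its three fields. */
static struct selector_fields decode_selector(unsigned short sel)
{
	struct selector_fields f = {
		.index = sel >> 3,
		.ti    = (sel >> 2) & 1,
		.rpl   = sel & 3,
	};
	return f;
}

/* Inverse of decode_selector(); GDT3(idx) equals make_selector(idx, 0, 3). */
static unsigned short make_selector(unsigned int index, unsigned int ti,
				    unsigned int rpl)
{
	assert(index < 8192 && ti <= 1 && rpl <= 3);
	return (unsigned short)((index << 3) | (ti << 2) | rpl);
}

/* Print one selector in decoded form. */
static void print_selector(const char *label, unsigned short sel)
{
	struct selector_fields f = decode_selector(sel);

	printf("%s: 0x%hx -> index %u, %s, RPL %u\n",
	       label, sel, f.index, f.ti ? "LDT" : "GDT", f.rpl);
}
```

Read this way, 0x3b is GDT index 7 at RPL 3, which matches the "Allocated GDT index 7" line, and 0x7b is GDT index 15 at RPL 3, the USER_DS value that Brian describes the sysenter path leaving in ES.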