Re: [PATCH 3.2 055/152] x86_64, switch_to(): Load TLS descriptors before switching DS and ES

On Thu, Feb 26, 2015 at 8:28 AM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
> On Thu, Feb 26, 2015 at 10:32 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Tue, Feb 24, 2015 at 7:23 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>>> On Tue, Feb 24, 2015 at 3:08 PM, Denys Vlasenko
>>> <vda.linux@xxxxxxxxxxxxxx> wrote:
>>>> On Tue, Feb 24, 2015 at 9:02 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>> This currently fails in 32-bit kernels (at least in qemu):
>>>>>>
>>>>>> / # ./es_test
>>>>>> Allocated GDT index 7
>>>>>> [FAIL]    ES changed from 0x3b to 0x7b
>>>>>> [FAIL]    ES was corrupted 1000/1000 times
>>>>>> / # uname -a
>>>>>> Linux (none) 4.0.0-rc1 #1 SMP Tue Feb 24 16:41:58 CET 2015 i686 GNU/Linux
>>>>>
>>>>> Want to send a patch?  I'll get it in a few days if no one beats me.
>>>>
>>>> I have no patch, sorry (in fact, I failed to find the relevant
>>>> 32-bit counterpart).
>>>>
>>>> It's just that security people asked me to backport this, and I
>>>> wondered whether I should wait a bit on this one, since a fix for
>>>> 32-bit ought to appear as well.
>>>
>>> For 32-bit kernel, userspace DS and ES are saved at syscall/interrupt
>>> entry time and reloaded on exit, unlike in 64-bit where they are saved
>>> and loaded at context switch time.  Therefore 32-bit is not affected
>>> by the issue this patch addresses.
>>>
>>> It looks to me though, that the ES test program doesn't actually test
>>> what the patch fixes - the segment attributes, like the base address.
>>> It tests just the selector, which shouldn't change across a kernel
>>> entry (with a few exceptions, like signals).  If the test is failing,
>>> then it is a different issue from what this patch addresses.
>>
>> It tests it indirectly.  The 64-bit code sets the selector to zero if
>> it fails to reload it.  Testing the ES base is awkward because it
>> can't be done in 64-bit code at all.
>
> I figured out why Denys got the failure.  usleep() makes a syscall via
> sysenter.  The sysenter path saves es/ds, but does not restore them
> before sysexit like the int80/iret path would.  That leaves them as
> USER_DS that the kernel loaded for itself.  I believe this was an
> intentional optimization, assuming the vdso would only be called from
> programs conforming to the ELF ABI.

Makes sense.  The attached variant passes, so I think we're fine.

--Andy
/*
 * Copyright (c) 2014-2015 Andy Lutomirski
 * GPL v2
 */

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <err.h>
#include <asm/ldt.h>
#include <sys/syscall.h>

static unsigned short GDT3(int idx)
{
	return (idx << 3) | 3;
}

static int create_tls(int idx, unsigned int base)
{
	struct user_desc desc = {
		.entry_number    = idx,
		.base_addr       = base,
		.limit           = 0xfffff,
		.seg_32bit       = 1,
		.contents        = 0, /* Data, grow-up */
		.read_exec_only  = 0,
		.limit_in_pages  = 1,
		.seg_not_present = 0,
		.useable         = 0,
	};

	if (syscall(SYS_set_thread_area, &desc) != 0)
		err(1, "set_thread_area");

	return desc.entry_number;
}

int main(void)
{
	int idx = create_tls(-1, 0);
	printf("Allocated GDT index %d\n", idx);

	unsigned short orig_es;
	asm volatile ("mov %%es,%0" : "=rm" (orig_es));

	int errors = 0;
	int total = 1000;
	for (int i = 0; i < total; i++) {
		struct timespec req = {
			.tv_sec = 0,
			.tv_nsec = 100000,
		};
		int ret;

		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));

		/*
		 * Force rescheduling.  On 32-bit kernels, fast syscalls
		 * destroy DS and ES, so force int 80.
		 */
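		/*
		 * The "memory" clobber tells the compiler that the kernel
		 * reads req through the pointer in ebx.
		 */
		asm volatile ("int $0x80"
			      : "=a" (ret)
			      : "a" (SYS_nanosleep), "b" (&req),
				"c" (0)
			      : "memory");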

		unsigned short es;
		asm volatile ("mov %%es,%0" : "=rm" (es));
		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
		if (es != GDT3(idx)) {
			if (errors == 0)
				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
				       GDT3(idx), es);
			errors++;
		}
	}

	if (errors) {
		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
		return 1;
	} else {
		printf("[OK]\tES was preserved\n");
		return 0;
	}
}
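
For readers following the selector values in the failure output quoted above (0x3b and 0x7b), here is a minimal sketch that splits a selector into its architectural fields: descriptor index in bits 15..3, table indicator in bit 2, and requested privilege level in bits 1..0. It is the inverse of GDT3() in the test program. The names selector_fields, decode_selector, print_selector and check_thread_values are hypothetical helpers for illustration only; they are not part of the attached test or of any kernel header.

```c
#include <assert.h>
#include <stdio.h>

/* Architectural layout of an x86 segment selector. */
struct selector_fields {
	unsigned int index;	/* descriptor table index, bits 15..3 */
	unsigned int ti;	/* table indicator, bit 2: 0 = GDT, 1 = LDT */
	unsigned int rpl;	/* requested privilege level, bits 1..0 */
};

/* Hypothetical helper: split a selector into its three fields. */
struct selector_fields decode_selector(unsigned short sel)
{
	struct selector_fields f = {
		.index = sel >> 3,
		.ti    = (sel >> 2) & 1,
		.rpl   = sel & 3,
	};
	return f;
}

/* Hypothetical helper: print a selector in human-readable form. */
void print_selector(const char *name, unsigned short sel)
{
	struct selector_fields f = decode_selector(sel);

	printf("%s = 0x%hx: %s index %u, RPL %u\n",
	       name, sel, f.ti ? "LDT" : "GDT", f.index, f.rpl);
}

/* Self-check against the two selector values quoted in the thread. */
void check_thread_values(void)
{
	struct selector_fields tls = decode_selector(0x3b);
	struct selector_fields other = decode_selector(0x7b);

	/* 0x3b is the TLS selector the test loaded: GDT index 7, RPL 3. */
	assert(tls.index == 7 && tls.ti == 0 && tls.rpl == 3);
	/* 0x7b is the value ES changed to: GDT index 15, RPL 3. */
	assert(other.index == 15 && other.ti == 0 && other.rpl == 3);
}
```

Calling print_selector("ES", 0x3b) reports GDT index 7, which matches the "Allocated GDT index 7" line in the failing run. 0x7b decodes to GDT index 15 with RPL 3; the thread identifies that value as the USER_DS selector the kernel left loaded after the sysenter path.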
