RE: [PATCH] MIPS: VDSO: Match data page cache colouring when D$ aliases

Hi guys,

I have looked a bit more at this issue, found a way to reproduce it, and taken
a closer look at both James Hogan's and Paul Burton's patches.

--------------------------------------oOo--------------------------------------

The Problem
-----------

Our MIPS 24KEc has a 32 KByte, 4-way set-associative dcache, i.e. 8 KBytes per
way. Had Linux used a page size of 8 KBytes, we wouldn't have this dcache
aliasing problem.

With a 4 Kbytes page size, however, it must be ensured that the color of
user-land pages is the same as the color of kernel-space pages when memory is
shared between the two.

In our case, this means that NOT only bits 11:0 (page-aligned) of the addresses
must be identical, but also bit 12 (making them color-aligned).
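
As a rough sketch of the arithmetic above (this is not kernel code, and the
function names are made up for illustration), the number of page colors follows
from the cache geometry. It assumes a virtually indexed dcache whose way size
and page size are powers of two:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by one way of a set-associative cache. */
static uint32_t way_size(uint32_t cache_bytes, uint32_t ways)
{
    assert(ways != 0 && cache_bytes % ways == 0);
    return cache_bytes / ways;
}

/* Number of page colors; 1 means pages cannot alias in the cache. */
static uint32_t num_colors(uint32_t cache_bytes, uint32_t ways,
                           uint32_t page_bytes)
{
    uint32_t ws = way_size(cache_bytes, ways);

    return ws > page_bytes ? ws / page_bytes : 1;
}

/* Color of a virtual address: the cache index bits above the page offset. */
static uint32_t page_color(uint32_t vaddr, uint32_t cache_bytes,
                           uint32_t ways, uint32_t page_bytes)
{
    return (vaddr / page_bytes) % num_colors(cache_bytes, ways, page_bytes);
}
```

With the 24KEc numbers, num_colors(32 * 1024, 4, 4096) gives 2, so bit 12 alone
selects the color. With 8 KByte pages it gives 1 and no aliasing is possible.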

To expose the problem, we must therefore arrange for the kernel's mapping of
the VDSO data to have a different page color than the user-land mapping.

When a program loads, the first data that gets allocated is for glibc's loader
(ld-2.27.so in our case), and the next thing that gets allocated is the two
pages needed for VDSO ([vvar] and [vdso]).

Therefore, the page color of user-space VDSO highly depends on the size of
the loader's requested data. In my original post
(https://www.linux-mips.org/archives/linux-mips/2017-06/msg00621.html), I wrote
that it started happening when compiling glibc with
'-fasynchronous-unwind-tables'. This may have changed the loader's data size to
go from an even number of pages to an odd number of pages or vice versa, thereby
making the color of the subsequent VDSO user-space mapping likely to be
different from the kernel's.

A change in the Linux kernel may also produce this, because it can change the
page color of the address where 'vdso_data' (declared in
arch/mips/kernel/vdso.c) starts.

For completeness, here's a snippet from the pagemap for some random process:

Section Name    Perm Virt Start Virt End   Virt Size  Phys Size
--------------- ---- ---------- ---------- ---------- ----------
...
                rwxp 0x77d04000 0x77d0e000      40960      36864
[vvar]          r--p 0x77d0e000 0x77d0f000       4096          0
[vdso]          r-xp 0x77d0f000 0x77d10000       4096       4096
/lib/ld-2.27.so r-xp 0x77d10000 0x77d11000       4096       4096
/lib/ld-2.27.so rwxp 0x77d11000 0x77d12000       4096       4096
[stack]         rwxp 0x7ff81000 0x7ffa2000     135168      28672

--------------------------------------oOo--------------------------------------

Modify kernel to provoke the issue
----------------------------------

In order to provoke the problem, we must first figure out whether the color of
'vdso_data' in kernel-space is different from [vvar] in user-space.

The attached patch named 'vdso-chk-1.patch' prints the address of &vdso_data and
the corresponding user-land address, followed by "aligned" if bit 12 is
identical in both and "NOT ALIGNED" if not.

If "aligned" is printed for all started processes, I suggest trying the attached
patch named 'vdso-chk-2.patch'. This will declare a dummy variable in vdso.c
that will cause the linker to place vdso_data at a differently colored page.
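
The comparison itself is small. The following is a user-space sketch of the
kind of check described above, not the contents of the attached patch. The
helper names and the COLOR_MASK constant are assumptions that only hold for our
cache geometry (8 KBytes per way, 4 KByte pages):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 is the only color bit for an 8 KByte way and 4 KByte pages. */
#define COLOR_MASK 0x1000u

/* True if both addresses index the same half of a cache way. */
static bool same_color(uint32_t user_addr, uint32_t kernel_addr)
{
    return ((user_addr ^ kernel_addr) & COLOR_MASK) == 0;
}

/* Verdict string in the same wording the check patch prints. */
static const char *color_verdict(uint32_t user_addr, uint32_t kernel_addr)
{
    return same_color(user_addr, kernel_addr) ? "aligned" : "NOT ALIGNED";
}
```

For data_addr = 0x77f60000 and &vdso_data = 0x80525000, bit 12 is 0 in the
first and 1 in the second, so same_color() returns false.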

--------------------------------------oOo--------------------------------------

Reproduce
---------

Once the kernel prints "NOT ALIGNED", you can attempt to provoke the actual
error.
I've attached a program, 'provoke.c', that will print to stderr whenever two
consecutive timestamps are received out of order from the kernel.

To increase the chance of errors occurring, run many instances of the program
in parallel. The following shell command will create 50 simultaneous instances
of it:
    $ for i in $(seq 50); do provoke > /dev/null & done

An example of a snippet of the output when it goes wrong:
    ...
    [   46.926329] tgid = 171, pid = 171, comm =    timeofday: data_addr = 0x77f60000, &vdso_data = 0x80525000, &dummy = 0x80524000 => NOT ALIGNED
    [   46.986344] tgid = 172, pid = 172, comm =    timeofday: data_addr = 0x77126000, &vdso_data = 0x80525000, &dummy = 0x80524000 => NOT ALIGNED
    [   47.070821] tgid = 173, pid = 173, comm =    timeofday: data_addr = 0x7701c000, &vdso_data = 0x80525000, &dummy = 0x80524000 => NOT ALIGNED
    [   47.090460] tgid = 170, pid = 170, comm =    timeofday: data_addr = 0x779c2000, &vdso_data = 0x80525000, &dummy = 0x80524000 => NOT ALIGNED
    [   47.138366] tgid = 174, pid = 174, comm =    timeofday: data_addr = 0x77f60000, &vdso_data = 0x80525000, &dummy = 0x80524000 => NOT ALIGNED
    [   47.166330] tgid = 175, pid = 175, comm =    timeofday: data_addr = 0x77406000, &vdso_data = 0x80525000, &dummy = 0x80524000 => NOT ALIGNED
    tid = 126: Ran 10000 times. error_cnt = 0, success_cnt = 10000
    tid = 130: Ran 10000 times. error_cnt = 0, success_cnt = 10000
    Error: tid = 174: clock_gettime(): Prev = 56043, Cur = 53056, diff = -2987
    Error: tid = 161: clock_gettime(): Prev = 56247, Cur = 53060, diff = -3187
    Error: tid = 168: clock_gettime(): Prev = 56251, Cur = 53064, diff = -3187
    Error: tid = 137: clock_gettime(): Prev = 56255, Cur = 53068, diff = -3187
    Error: tid = 175: clock_gettime(): Prev = 56259, Cur = 53072, diff = -3187
    Error: tid = 129: clock_gettime(): Prev = 56263, Cur = 53076, diff = -3187
    tid = 129: Ran 10000 times. error_cnt = 1, success_cnt = 9999
    Error: tid = 155: clock_gettime(): Prev = 56267, Cur = 53078, diff = -3189
    Error: tid = 165: clock_gettime(): Prev = 56271, Cur = 53080, diff = -3191
    ...

--------------------------------------oOo--------------------------------------

Trying out James Hogan's patch
------------------------------

With the error-producing version of vdso-chk-X.patch applied, apply James'
patch and run the 'provoke' program again.

This works since kernel- and user-space coloring always becomes identical.

--------------------------------------oOo--------------------------------------

Trying out Paul Burton's patch
------------------------------

With the error-producing version of vdso-chk-X.patch applied, apply Paul's patch
and run the 'provoke' program again.

This also works.

Paul's patch allocates twice the amount of needed VM, but I guess that's fine,
as it's also less intrusive (no changes to mmap.c).
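
To convince myself of the arithmetic, here is a user-space sketch that mirrors
the two added lines of Paul's patch. It is an illustration only: the helper
names are made up, addresses are plain 32-bit integers, and align_mask stands
in for shm_align_mask, which I assume to be 0x1fff (way size minus one) on our
CPU:

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of (mask + 1); mask must be of the form 2^n - 1. */
static uint32_t align_mask_up(uint32_t x, uint32_t mask)
{
    assert((mask & (mask + 1)) == 0);
    return (x + mask) & ~mask;
}

/*
 * Given the base returned by the allocator, return the user address of the
 * data page after the color-matching adjustment.
 */
static uint32_t colored_data_addr(uint32_t base, uint32_t kernel_data,
                                  uint32_t gic_size, uint32_t align_mask)
{
    base = align_mask_up(base, align_mask);          /* color 0 boundary  */
    base += (kernel_data - gic_size) & align_mask;   /* kernel's color    */
    return base + gic_size;                          /* skip the GIC page */
}
```

Since the aligned base is 0 modulo (align_mask + 1), the returned address is
congruent to kernel_data modulo (align_mask + 1), whatever gic_size is. For
base = 0x77f5f000, kernel_data = 0x80525000 and align_mask = 0x1fff, the result
is 0x77f61000 both with gic_size = 0 and with gic_size = 0x1000.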

Regards,
René Nielsen

-----Original Message-----
From: Paul Burton [mailto:paul.burton@xxxxxxxx] 
Sent: 30. august 2018 20:01
To: Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>; Rene Nielsen <rene.nielsen@xxxxxxxxxxxxx>; Hauke Mehrtens <hauke@xxxxxxxxxx>
Cc: linux-mips@xxxxxxxxxxxxxx; Paul Burton <paul.burton@xxxxxxxx>; James Hogan <jhogan@xxxxxxxxxx>; stable@xxxxxxxxxxxxxxx
Subject: [PATCH] MIPS: VDSO: Match data page cache colouring when D$ aliases

When a system suffers from dcache aliasing a user program may observe stale VDSO data from an aliased cache line. Notably this can break the expectation that clock_gettime(CLOCK_MONOTONIC, ...) is, as its name suggests, monotonic.

In order to ensure that users observe updates to the VDSO data page as intended, align the user mappings of the VDSO data page such that their cache colouring matches that of the virtual address range which the kernel will use to update the data page - typically its unmapped address within kseg0.

This ensures that we don't introduce aliasing cache lines for the VDSO data page, and therefore that userland will observe updates without requiring cache invalidation.

Signed-off-by: Paul Burton <paul.burton@xxxxxxxx>
Reported-by: Hauke Mehrtens <hauke@xxxxxxxxxx>
Reported-by: Rene Nielsen <rene.nielsen@xxxxxxxxxxxxx>
Reported-by: Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>
Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
Cc: James Hogan <jhogan@xxxxxxxxxx>
Cc: linux-mips@xxxxxxxxxxxxxx
Cc: stable@xxxxxxxxxxxxxxx # v4.4+
---
Hi Alexandre,

Could you try this out on your Ocelot system? Hopefully it'll solve the problem just as well as James' patch but doesn't need the questionable change to arch_get_unmapped_area_common().

Thanks,
    Paul
---
 arch/mips/kernel/vdso.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 019035d7225c..5fb617a42335 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -13,6 +13,7 @@
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/ioport.h>
+#include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -20,6 +21,7 @@

 #include <asm/abi.h>
 #include <asm/mips-cps.h>
+#include <asm/page.h>
 #include <asm/vdso.h>

 /* Kernel-provided data used by the VDSO. */
@@ -128,12 +130,30 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
        vvar_size = gic_size + PAGE_SIZE;
        size = vvar_size + image->size;

+       /*
+        * Find a region that's large enough for us to perform the
+        * colour-matching alignment below.
+        */
+       if (cpu_has_dc_aliases)
+               size += shm_align_mask + 1;
+
        base = get_unmapped_area(NULL, 0, size, 0, 0);
        if (IS_ERR_VALUE(base)) {
                ret = base;
                goto out;
        }

+       /*
+        * If we suffer from dcache aliasing, ensure that the VDSO data page is
+        * coloured the same as the kernel's mapping of that memory. This
+        * ensures that when the kernel updates the VDSO data userland will see
+        * it without requiring cache invalidations.
+        */
+       if (cpu_has_dc_aliases) {
+               base = __ALIGN_MASK(base, shm_align_mask);
+               base += ((unsigned long)&vdso_data - gic_size) & shm_align_mask;
+       }
+
        data_addr = base + gic_size;
        vdso_addr = data_addr + PAGE_SIZE;

--
2.18.0

Attachment: vdso-chk-1.patch
Description: vdso-chk-1.patch

Attachment: vdso-chk-2.patch
Description: vdso-chk-2.patch

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <inttypes.h>

// for i in $(seq 50); do provoke > /dev/null & done

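/* sig_atomic_t is the type guaranteed safe to write from a signal handler. */
static volatile sig_atomic_t run = 1;

static void ctrl_c_handler(int sig)
{
    (void)sig; /* unused */
    run = 0;
}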

static uint64_t milliseconds(void)
{
    struct timespec time;
    if (clock_gettime(CLOCK_MONOTONIC, &time) == 0) {
        return ((uint64_t)time.tv_sec * 1000ULL) + (time.tv_nsec / 1000000);
    }

    fprintf(stderr, "clock_gettime() failed: %s\n", strerror(errno));
    exit(-1);
}

int main(void)
{
    uint32_t i, error_cnt = 0;
    uint64_t prev_time = 0;
    int      tid = syscall(SYS_gettid);

    signal(SIGINT, ctrl_c_handler);

    for (i = 0; i < 10000 && run; i++) {
        uint64_t cur_time = milliseconds();

        if (cur_time < prev_time) {
            error_cnt++;
            fprintf(stderr, "Error: tid = %d: clock_gettime(): Prev = %" PRIu64
                    ", Cur = %" PRIu64 ", diff = -%" PRIu64 "\n",
                    tid, prev_time, cur_time, prev_time - cur_time);
        } else {
            fprintf(stdout, "Info:  tid = %d: clock_gettime(): Prev = %" PRIu64
                    ", Cur = %" PRIu64 ", diff =  %" PRIu64 "\n",
                    tid, prev_time, cur_time, cur_time - prev_time);
        }

        prev_time = cur_time;
    }

    fprintf(stderr, "tid = %d: Ran %u times. error_cnt = %u, success_cnt = %u\n",
            tid, i, error_cnt, i - error_cnt);

    return error_cnt ? -1 : 0;
}

