Re: Dirty/Access bits vs. page content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 23, 2014 at 11:41 AM, Jan Kara <jack@xxxxxxx> wrote:
>
> Now I'm not sure how to fix Linus' patches. For all I care we could just
> rip out pte dirty bit handling for file mappings. However last time I
> suggested this you corrected me that tmpfs & ramfs need this. I assume this
> is still the case - however, given we unconditionally mark the page dirty
> for write faults, where exactly do we need this?

Honza, you're missing the important part: it does not matter one whit
that we unconditionally mark the page dirty, when we do it *early*,
and it can be then be marked clean before it's actually clean!

The problem is that page cleaning can clean the page when there are
still writers dirtying the page. Page table tear-down removes the
entry from the page tables, but it's still there in the TLB on other
CPU's. So other CPU's are possibly writing to the page, when
clear_page_dirty_for_io() has marked it clean (because it didn't see
the page table entries that got torn down, and it hasn't seen the
dirty bit in the page yet).

I'm including Dave Hansen's "racewrite.c" with his commentary:

 "This is a will-it-scale test-case which handles all the thread creation
  and CPU binding for me: https://github.com/antonblanchard/will-it-scale
  .  Just stick the test case in tests/.  I also loopback-mounted a ramfs
  file as an ext4 filesystem on /mnt to make sure the writeback could
  happen fast.

  This reproduces the bug pretty darn quickly and with as few as 4 threads
  running like this:  ./racewrite_threads -t 4 -s 999"

and

 "It reproduces in about 5 seconds on my 4770 on an unpatched kernel.  It
  also reproduces on a _normal_ filesystem and doesn't apparently need the
  loopback-mounted ext4 ramfs file that I was trying before."

so this can actually be triggered.

              Linus
#define _GNU_SOURCE
#define _XOPEN_SOURCE 500
#include <sched.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

#define BUFLEN 4096

static char wistmpfile[] = "/mnt/willitscale.XXXXXX";

char *testcase_description = "Same file pwrite";

char *buf;
#define FILE_SIZE (4096*1024)
void testcase_prepare(void)
{
	int fd = mkstemp(wistmpfile);

	assert(fd >= 0);
	assert(pwrite(fd, "X", 1, FILE_SIZE-1) == 1);
	buf = mmap(NULL, FILE_SIZE, PROT_READ|PROT_WRITE,
			       MAP_SHARED, fd, 0);
	assert(buf != (void *)-1);
	close(fd);
}

void testcase(unsigned long long *iterations)
{
	int cpu = sched_getcpu();
	int fd = open(wistmpfile, O_RDWR);
	off_t offset = sched_getcpu() * BUFLEN;
	long counter = 0;
	long counterread = 0;
	long *counterbuf = (void *)&buf[offset];

	printf("offset: %ld\n", offset);
	printf("       buf: %p\n", buf);
	printf("counterbuf: %p\n", counterbuf);
	assert(fd >= 0);

	while (1) {
		int ret;
		if (cpu == 1) {
			ret = madvise(buf, FILE_SIZE, MADV_DONTNEED);
			continue;
		}
	
		*counterbuf = counter;
		posix_fadvise(fd, offset, BUFLEN, POSIX_FADV_DONTNEED);
		ret = pread(fd, &counterread, sizeof(counterread), offset);
		assert(ret == sizeof(counterread));

		if (counterread != counter) {
			printf("cpu: %d\n", cpu);
			printf("    counter %ld\n", counter);
			printf("counterread %ld\n", counterread);
			printf("*counterbuf %ld\n", *counterbuf);
			while(1);
		}
		counter++;	
		(*iterations)++;
	}
}

void testcase_cleanup(void)
{
	unlink(wistmpfile);
}

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]