TLB batching breaks MADV_DONTNEED

Something seems to be really wrong with all these TLB flush batching
mechanisms that are all around the kernel. Here is another example, which
was not addressed by the recently submitted patches.

Consider what happens when two MADV_DONTNEED calls run concurrently.
According to the man page, "After a successful MADV_DONTNEED operation …
subsequent accesses of pages in the range will succeed, but will result in …
zero-fill-on-demand pages for anonymous private mappings."
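
For reference, the single-threaded case does behave as documented. Below is a minimal sketch, assuming Linux and a private anonymous mapping; the helper name read_after_dontneed() and the LEN constant are made up for illustration and are not part of the test that follows.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN	(4096)		/* illustrative length, one 4 KiB page */

/*
 * Hypothetical helper: fill a private anonymous mapping with 8s, drop it
 * with MADV_DONTNEED, and return the first byte read back (-1 on error).
 */
int read_after_dontneed(void)
{
	char *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int val;

	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 8, LEN);			/* populate the page */
	if (madvise(buf, LEN, MADV_DONTNEED) != 0) {
		munmap(buf, LEN);
		return -1;
	}

	val = buf[0];				/* faults in a fresh zero page */
	munmap(buf, LEN);
	return val;
}
```

With no concurrent madvise() in flight, the read after MADV_DONTNEED is expected to return 0, which is the behavior the two-thread test violates.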

However, the test below, which does MADV_DONTNEED in two threads, reads “8”
and not “0” when reading the memory following MADV_DONTNEED. It happens
since one of the threads clears the PTE, but defers the TLB flush for some
time (until it finishes changing 16k PTEs). The main thread sees the PTE
already non-present and does not flush the TLB.

I think there is a need for a batching scheme that considers whether
mmap_sem is taken for write/read/nothing and the change to the PTE.
Unfortunately, I do not have the time to do it right now.

Am I missing something? Thoughts?


---


#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <pthread.h>
#include <string.h>

#define PAGE_SIZE	(4096)
#define N_PAGES		(65536)

volatile int sync_step = 0;
volatile char *p;

/* Read the x86 time-stamp counter (assumes a 64-bit unsigned long). */
static inline unsigned long rdtsc(void)
{
	unsigned long hi, lo;
	__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
	return lo | (hi << 32);
}

static inline void wait_rdtsc(unsigned long cycles)
{
	unsigned long tsc = rdtsc();

	while (rdtsc() - tsc < cycles);
}

void *big_madvise_thread(void *ign)
{
	sync_step = 1;
	while (sync_step != 2);
	/* Zap the whole range; the TLB flush is batched and comes late. */
	madvise((void*)p, PAGE_SIZE * N_PAGES, MADV_DONTNEED);
	return NULL;
}

int main(void)
{
	pthread_t aux_thread;

	p = mmap(0, PAGE_SIZE * N_PAGES, PROT_READ|PROT_WRITE,
		 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset((void*)p, 8, PAGE_SIZE * N_PAGES);

	pthread_create(&aux_thread, NULL, big_madvise_thread, NULL);
	while (sync_step != 1);

	*p = 8;		// Cache in TLB
	sync_step = 2;
	wait_rdtsc(100000);	// Let the other thread clear the first PTE
	madvise((void*)p, PAGE_SIZE, MADV_DONTNEED);
	printf("Result : %d\n", *p);	// Expected 0; prints 8 when the race hits

	pthread_join(aux_thread, NULL);
	return 0;
}
