Something seems to be really wrong with the TLB flush batching mechanisms that are scattered around the kernel. Here is another example, which was not addressed by the recently submitted patches.

Consider what happens when two MADV_DONTNEED calls run concurrently. According to the man page, "After a successful MADV_DONTNEED operation … subsequent accesses of pages in the range will succeed, but will result in … zero-fill-on-demand pages for anonymous private mappings." However, the test below, which issues MADV_DONTNEED from two threads, reads "8" rather than "0" from the memory after MADV_DONTNEED returns.

This happens because one of the threads clears the PTE but defers the TLB flush for some time (until it finishes changing 16k PTEs). The main thread then sees that the PTE is already non-present and does not flush the TLB, so its stale TLB entry still maps the old page.

I think there is a need for a batching scheme that considers whether mmap_sem is taken for write, for read, or not at all, as well as the change made to the PTE. Unfortunately, I do not have the time to do it right now.

Am I missing something? Thoughts?
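To make the interleaving easier to follow, here is a toy user-space model of the race. It is separate from the reproducer and is not kernel code: the struct, the function names, and in particular the `flush_pending` flag and the `honor_pending` switch are all hypothetical, and the model treats a TLB entry as simply caching the value of the page it maps. It is only a sketch of the ordering problem and of one possible direction for a fix, not a proposal for the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: one PTE and one cached translation per CPU. */
enum { NCPUS = 2, ZERO_PAGE = 0, DATA_PAGE = 8 };

struct tlb_entry {
    bool valid;
    int value;                    /* content of the page the entry maps */
};

struct model {
    bool pte_present;             /* the shared page-table entry */
    int pte_value;                /* content of the currently mapped page */
    struct tlb_entry tlb[NCPUS];  /* per-CPU cached translation */
    bool flush_pending;           /* hypothetical: a deferred flush is outstanding */
};

/* Read through the TLB; a miss on a non-present PTE zero-fills on demand. */
static int cpu_read(struct model *m, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    if (m->tlb[cpu].valid)
        return m->tlb[cpu].value;             /* TLB hit, possibly stale */
    if (!m->pte_present) {                    /* page fault: map a zero page */
        m->pte_present = true;
        m->pte_value = ZERO_PAGE;
    }
    m->tlb[cpu].valid = true;
    m->tlb[cpu].value = m->pte_value;
    return m->pte_value;
}

static void flush_all(struct model *m)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        m->tlb[cpu].valid = false;
    m->flush_pending = false;
}

/* Large madvise: clears the PTE now and flushes only at the end of the batch. */
static void big_madvise_clear(struct model *m)
{
    m->pte_present = false;
    m->flush_pending = true;
}

/* Small madvise: flushes only if it changed the PTE itself, unless told to
 * honor another thread's pending flush (the hypothetical fix). */
static void small_madvise(struct model *m, bool honor_pending)
{
    if (m->pte_present) {
        m->pte_present = false;
        flush_all(m);
        return;
    }
    if (honor_pending && m->flush_pending)
        flush_all(m);
}

/* Replays the problematic ordering; returns what CPU 1 reads afterwards. */
int simulate(bool honor_pending)
{
    struct model m = { .pte_present = true, .pte_value = DATA_PAGE };

    cpu_read(&m, 1);                   /* main thread caches the translation */
    big_madvise_clear(&m);             /* CPU 0 clears the PTE, flush deferred */
    small_madvise(&m, honor_pending);  /* CPU 1 finds the PTE non-present */
    int seen = cpu_read(&m, 1);        /* read right after madvise returns */
    flush_all(&m);                     /* CPU 0 finally finishes its batch */
    return seen;
}
```

In this model `simulate(false)` returns 8, matching the stale read described above, while `simulate(true)` returns 0 because the second caller flushes on behalf of the batch that is still in flight.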
---

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <pthread.h>
#include <string.h>

#define PAGE_SIZE (4096)
#define N_PAGES (65536)

volatile int sync_step = 0;
volatile char *p;

/* Read the x86 time-stamp counter. */
static inline unsigned long rdtsc(void)
{
	unsigned long hi, lo;

	__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
	return lo | (hi << 32);
}

/* Busy-wait for roughly the given number of TSC cycles. */
static inline void wait_rdtsc(unsigned long cycles)
{
	unsigned long tsc = rdtsc();

	while (rdtsc() - tsc < cycles)
		;
}

/* Aux thread: zap the whole range, which batches and defers the TLB flush. */
void *big_madvise_thread(void *ign)
{
	sync_step = 1;
	while (sync_step != 2)
		;
	madvise((void *)p, PAGE_SIZE * N_PAGES, MADV_DONTNEED);
	return NULL;
}

int main(void)
{
	pthread_t aux_thread;

	p = mmap(0, PAGE_SIZE * N_PAGES, PROT_READ|PROT_WRITE,
		 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	memset((void *)p, 8, PAGE_SIZE * N_PAGES);

	pthread_create(&aux_thread, NULL, big_madvise_thread, NULL);
	while (sync_step != 1)
		;

	*p = 8;		// Cache in TLB
	sync_step = 2;
	wait_rdtsc(100000);

	/* The PTE may already be cleared here, so no TLB flush is issued. */
	madvise((void *)p, PAGE_SIZE, MADV_DONTNEED);
	printf("Result : %d\n", *p);
	return 0;
}