On Wed, Nov 4, 2015 at 12:00 PM, Shaohua Li <shli@xxxxxxxxxx> wrote:
>
> The new proposal tries to fix the TLB issue. We introduce two madvise verbs:
>
> MARK_FREE. Userspace notifies the kernel that the memory range can be
> discarded. The kernel just records the range at this stage. Should memory
> pressure happen, page reclaim can free the memory directly, regardless of the
> PTE state.
>
> MARK_NOFREE. Userspace notifies the kernel that the memory range will be
> reused soon. The kernel deletes the record and prevents page reclaim from
> discarding the memory. If the memory hasn't been reclaimed, userspace will
> access the old memory; otherwise, normal page fault handling applies.
>
> The point is to let userspace tell the kernel whether memory can be
> discarded, instead of depending on the PTE dirty bit as MADV_FREE does. With
> these, no TLB flush is required until page reclaim actually frees the memory
> (page reclaim needs to do the TLB flush for MADV_FREE too). It still
> preserves the lazy-free merit of MADV_FREE.
>
> Compared to MADV_FREE, reusing memory with the new proposal isn't
> transparent, e.g. userspace must call MARK_NOFREE. But it's easy to use the
> new API in jemalloc.
>

I can't speak to the usefulness of this or to other arches, but on x86
(unless you have nohz_full or similar enabled), a pair of syscalls should be
*much* faster than an IPI or a page fault.

I don't know how expensive it is to write to a clean page or to access an
unaccessed page on x86. I'm sure it's not free (there's memory bandwidth if
nothing else), but it could be very cheap.

--Andy
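For readers following the thread, here is a toy, userspace-only model of the semantics described in the quoted proposal. MARK_FREE and MARK_NOFREE were only proposed verbs, so this sketch does not call madvise at all; every name in it (sim_mm, sim_mark_free, sim_mark_nofree, sim_reclaim, sim_access) and the page count and size are hypothetical. It models only the bookkeeping: MARK_FREE merely records a range, reclaim discards recorded pages whether or not they were written, MARK_NOFREE deletes the record, and touching a discarded page behaves like a fault that maps a fresh zero page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for the toy model, not kernel values. */
#define SIM_NPAGES 16
#define SIM_PAGE_SIZE 64

struct sim_mm {
    unsigned char pages[SIM_NPAGES][SIM_PAGE_SIZE];
    bool marked_free[SIM_NPAGES]; /* the kernel's record of MARK_FREE ranges */
    bool present[SIM_NPAGES];     /* false once reclaim has discarded the page */
    size_t faults;                /* count of simulated page faults */
};

static void sim_init(struct sim_mm *mm) {
    memset(mm, 0, sizeof *mm);
    for (size_t i = 0; i < SIM_NPAGES; i++)
        mm->present[i] = true;
}

/* Access a page; a discarded page is faulted back in as a zero page. */
static unsigned char *sim_access(struct sim_mm *mm, size_t page) {
    assert(page < SIM_NPAGES);
    if (!mm->present[page]) {
        memset(mm->pages[page], 0, SIM_PAGE_SIZE);
        mm->present[page] = true;
        mm->faults++;
    }
    return mm->pages[page];
}

/* MARK_FREE (hypothetical model): only record the range, no PTE change. */
static void sim_mark_free(struct sim_mm *mm, size_t first, size_t count) {
    assert(first + count <= SIM_NPAGES);
    for (size_t i = first; i < first + count; i++)
        mm->marked_free[i] = true;
}

/* MARK_NOFREE (hypothetical model): delete the record so reclaim skips it. */
static void sim_mark_nofree(struct sim_mm *mm, size_t first, size_t count) {
    assert(first + count <= SIM_NPAGES);
    for (size_t i = first; i < first + count; i++)
        mm->marked_free[i] = false;
}

/* Memory pressure: discard every recorded page regardless of dirty state.
 * A real kernel would unmap the page and flush the TLB at this point.
 * Returns the number of pages discarded. */
static size_t sim_reclaim(struct sim_mm *mm) {
    size_t freed = 0;
    for (size_t i = 0; i < SIM_NPAGES; i++) {
        if (mm->marked_free[i] && mm->present[i]) {
            mm->present[i] = false;
            freed++;
        }
    }
    return freed;
}
```

In this model, an allocator would call sim_mark_free when it retires a run of pages and sim_mark_nofree before handing them out again; a page that survived keeps its old bytes with no fault, while a reclaimed page comes back zeroed via one simulated fault. That is the non-transparency the proposal mentions: reuse requires the explicit MARK_NOFREE call, which replaces the dirty-bit check that MADV_FREE relies on.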