On Wed, Oct 31, 2012 at 5:50 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> Hello,
>
> On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
>> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > On Tue, 30 Oct 2012 10:29:54 +0900
>> > Minchan Kim <minchan@xxxxxxxxxx> wrote:
>> >
>> > > This patch introduces new madvise behavior MADV_VOLATILE and
>> > > MADV_NOVOLATILE for anonymous pages. It differs from
>> > > John Stultz's version, which considers only tmpfs, while this patch
>> > > considers only anonymous pages, so this cannot cover John's one.
>> > > If the idea below proves reasonable, I hope we can unify both
>> > > concepts by madvise/fadvise.
>> > >
>> > > The rationale is as follows.
>> > > Many allocators call munmap(2) when the user calls free(3) if ptr is
>> > > in an mmaped area. But munmap isn't cheap because it has to clean up
>> > > all pte entries and unlink a vma, so the overhead increases
>> > > linearly with the mmaped area's size.
>> >
>> > Presumably the userspace allocator will internally manage memory in
>> > large chunks, so the munmap() call frequency will be much lower than
>> > the free() call frequency. So the performance gains from this change
>> > might be very small.
>>
>> I don't think I strictly understand the motivation from a
>> malloc-standpoint here.
>>
>> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
>> to perform discards on Linux. For any reasonable allocator (short
>> of binding malloc --> mmap, free --> unmap) this seems a better
>> choice.
>>
>> Note also from a performance stand-point I doubt any allocator (which
>> cares about performance) is going to want to pay the cost of even a
>> null syscall around typical malloc/free usage (consider: a tcmalloc
>
> Good point.
>
>> malloc/free pair is currently <20ns). Given then that this cost is
>> amortized once you start doing discards on larger blocks MADV_DONTNEED
>> seems a preferable interface:
>> - You don't need to reconstruct an arena when you do want to allocate
>>   since there's no munmap/mmap for the region to change about
>> - There are no syscalls involved in later reallocating the block.
>
> The above benefits apply to MADV_VOLATILE, too.
> But as you pointed out, there is a little more overhead than DONTNEED
> because the allocator should call madvise(MADV_NOVOLATILE) before allocation.
> Although madvise(NOVOLATILE) just marks a vma flag, it does need mmap_sem
> and could be a problem on parallel malloc/free workloads, as KOSAKI pointed out.
>
> In such a case, we can change the semantics so malloc doesn't need to call
> madvise(NOVOLATILE) before allocating. Then the page fault handler has to
> check whether the page fault was caused by an access to a volatile vma. If so,
> it could return a zero page instead of SIGBUS and mark the vma as no longer
> volatile.

I think being able to determine whether the backing was discarded
(about an atomic transition to non-volatile) would be a required
property to make this useful for non-malloc use-cases.

>
>>
>> The only real additional cost is address-space. Are you strongly
>> concerned about the 32-bit case?
>
> No. I believe allocators have logic to clean these up once the address space is
> almost full.
>
> Thanks, Paul.
>
> --
> Kind regards,
> Minchan Kim
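
For readers following the thread, here is a minimal sketch of the MADV_DONTNEED discard pattern described above: the mapping stays in place, the contents are dropped, and the block can be reused later with no further syscall. It is not tcmalloc's code. The helper names (arena_map, arena_discard, discard_demo) are hypothetical, and it assumes Linux, where private anonymous pages read back as zero-fill after MADV_DONTNEED.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map a private anonymous region of len bytes. */
void *arena_map(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: drop the page contents but keep the mapping (vma). */
int arena_discard(void *p, size_t len)
{
    return madvise(p, len, MADV_DONTNEED);
}

/*
 * Returns 0 if the region reads back as zero after the discard and can be
 * written again without a new mmap(); -1 on syscall failure; 1 otherwise.
 */
int discard_demo(void)
{
    const size_t len = (size_t)1 << 20;
    unsigned char *p = arena_map(len);
    if (p == NULL)
        return -1;

    memset(p, 0xAB, len);               /* dirty every page */

    if (arena_discard(p, len) != 0) {
        munmap(p, len);
        return -1;
    }

    /* Assumption (Linux): discarded private anonymous pages are zero-filled
     * on the next access. */
    int ok = (p[0] == 0 && p[len - 1] == 0);

    p[0] = 1;                           /* reuse: page is faulted in on demand */
    ok = ok && (p[0] == 1);

    munmap(p, len);
    return ok ? 0 : 1;
}
```

Calling discard_demo() from a small main() and checking for a 0 return is enough to try it. Note that this interface gives the caller no way to learn whether the kernel actually reclaimed the pages, which is the property asked for above for non-malloc use-cases.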